The dazzling technological achievement of the Comédie-Française Registers Project raises a correspondingly mundane question: what now?1 In other words, now that the daily receipt registers scrupulously kept by administrators of the Comédie-Française theater troupe between 1680 and 1793 have been made publicly available in the form of an online searchable database, what kinds of questions might scholars and others interested in the early modern French theater ask of this treasure trove of now-easily accessible historical evidence?2 Of course this question is only superficially mundane and, on the one hand, its answer is obvious. The Comédie-Française Registers Project (hereafter, CFRP) makes it possible to sort through immense quantities of data quickly according to search parameters designed to reveal historical trends and patterns in the theater’s daily activities. These questions might range from the more predictable – which plays were performed most often, by which authors, at what house capacity? – to the more subtle – is it “possible to trace the radicalization of the king’s subjects over the course of the centur[ies] by studying their theatrical tastes”?3 In this sense, the CFRP is an invaluable tool of historical reconstruction and analysis.
But, on the other hand, the “what now” question raises another set of far more complicated issues involving print culture, data analysis, pattern recognition, the phenomenology of scholarly observation, knowledge-production, and even the nature of the humanities themselves. To be sure, the kind of historical research enabled by the CFRP has always been possible. The account registers, long held in the Bibliothèque-Musée de la Comédie-Française, have lent themselves to a highly productive, more conventional form of data analysis that has yielded important works of scholarship, including Jules Bonnassies’s nineteenth-century La Comédie-Française: histoire administrative (1658-1757), H. C. Lancaster’s The Comédie-Française, 1680-1701. Plays, Actors, Spectators, Finances (1941), Lancaster’s classic nine-volume A History of French Dramatic Literature in the Seventeenth Century (1929-1942), as well as, more recently, Sabine Chaouche’s La mise en scène du repertoire à la Comédie-Française, 1680-1815, and Jan Clarke’s work on the earlier Guénégaud theater.4 But the CFRP, specifically, offers us something different. By bringing to bear on the register books the digital media technologies of high-res imaging, algorithmic sorting and cross-listing, exportable data sets, and searchable databases, the CFRP makes it possible to discover tendencies and structures in the registers that might not otherwise have been perceptible to more traditional forms of archival investigation. It holds out the tantalizing possibility that a new kind of knowledge is within our grasp, not only with respect to the Comédie-Française registers, but, more broadly, to the very principles of inquiry in the humanities. Debates in the recent and evolving field of digital humanities in fact often turn on the notion that computer modeling illuminates the hidden recesses of the arts and humanities just as the microscope and telescope did for the physical sciences during the Renaissance.5
In the discussion that follows, I want to make two simple, straightforward observations. The first is that humanities archive initiatives like the CFRP compel us to generate visualizations of the information they contain. Second, these data visualizations propose, naturally, to tell us something new. They reveal things we did not know, could not perceive, or maybe didn’t even know we were looking for. In the case of the CFRP, our visualizations take so-called raw data (“Raw Data” is one of the tabs housed on the CFRP database menu), gathered initially from the high-resolution scans of the daily receipt registers [Fig. 1], abstract them, and display them visually in graphic form. [Fig. 2]6
When we look at the visualization in Figure 2 we learn, for example, the frequency with which an author represented in the registers saw his plays produced, by decade. If we employ a different set of variables and ask a different set of questions, we are able to see the receipts by author relative to annual average revenue. [Fig. 3]
If we focus on the production of one author, say, Molière, we can see the distribution of Molière’s grandes pièces by year and by day of the week. [Fig. 4]
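The kind of abstraction at work in these figures can be made concrete with a minimal sketch. It is not the CFRP's own code, and the handful of records below are invented for illustration (they are not drawn from the registers); the function names are likewise hypothetical. The sketch simply counts performances by author and decade, and by day of the week for a single author, which is the sort of tabulation that sits behind a chart like Figure 2 or Figure 4.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows (performance date, author); NOT actual register data.
PERFORMANCES = [
    (date(1681, 1, 5), "Molière"),
    (date(1684, 3, 12), "Molière"),
    (date(1689, 11, 20), "Racine"),
    (date(1692, 2, 3), "Molière"),
    (date(1695, 6, 9), "Corneille"),
    (date(1701, 4, 17), "Racine"),
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def decade_of(day):
    """Return the first year of the decade containing the given date."""
    return day.year // 10 * 10


def performances_by_author_and_decade(rows):
    """Count performances for each (author, decade) pair."""
    return Counter((author, decade_of(day)) for day, author in rows)


def performances_by_weekday(rows, author):
    """Count one author's performances by day of the week."""
    return Counter(WEEKDAYS[day.weekday()] for day, a in rows if a == author)


counts = performances_by_author_and_decade(PERFORMANCES)

# A crude text "visualization": one mark per performance.
for (author, decade), n in sorted(counts.items()):
    print(f"{author:10} {decade}s {'#' * n}")
```

Even in so small an example, the choices that shape what becomes visible (grouping by decade rather than by season, counting performances rather than receipts) are made before anything is displayed.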
In each case, we are engaged in an act of seeing that is qualitatively different, at least on the surface, from the practices of sifting, sorting, collating – in short, reading – we are accustomed to doing in the archives. I would suggest that seeing (literary) data is not an insignificant scholarly and conceptual situation. Is a data-driven approach to literary study possible without the visualizations that make the data, paradoxically, legible? What kind of being do data have if they are not visually displayed?
I want to consider the specifically visual form that digital humanities projects like the CFRP tend to take and wonder briefly if the prevalence of visual interface provides an opportunity to consider once again the core principles of literary investigation, including meaning and interpretation, rationality, selfhood, consciousness, and even the concept of the human itself. Bernard Stiegler has recently written that digitized and purely computational capitalism has come to define the twenty-first-century Anthropocene and requires new ways of imagining wealth and value in a future that combats what he calls the entropy of total digital automation.7 Along these lines, I want to consider how digital technologies in the humanities might require us to imagine and formulate possible futures for concepts of the literary and of literary studies. To put it somewhat crudely, are the digital methods of information visualization merely a tool with which to see the literary or are they the conditions of a new understanding of literary study? Does the specifically literary visualization have an aesthetic? As a form of image, does it have affect, for example? Does it have its own agency? Does it, as the new media theorist Matthew Fuller asks of the artist Kurt Schwitters, have a material poetics? 
Does it, as Fuller puts it, “make the world and take part in it, and at the same time, synthesize, block, or make possible other worlds [which] gently slip into, swell across, or mutate those we are apparently content that we live in [?]”8 How, in Mark Hansen’s formulation, might we accept that, “rather than finding instantiation in a privileged technical form (including the computer interface),” the digital image “now demarcates the very process through which the body, in conjunction with the various apparatuses for rendering information perceptible, gives form to or in-forms information[?]”9 In other words, is the literary visualization constitutive of, rather than merely representative of, the objects of literary investigation?10
I. Data Anxiety
I want to emphasize what many scholars have already pointed out, namely, that digital analysis is far from merely prosthetic. It is not a critical tool simply applied to or enacted upon historical and textual data, nor do those data emerge untouched and untransformed by its application or enactment. (It seems worth repeating that in this important respect automated computer analysis is no different from its more traditional, archival forebear.) There exists an immense body of scholarship that gives the lie to an assumption prevalent in the popular media that data themselves are somehow pre-analytical, pre-factual, pre-ideological, and that big data analytics especially promise today to reveal natural truths about our world that have previously been invisible to human observation.11 As Rob Kitchin puts it, in one representative study, so-called “raw” data and the infrastructures that gather them are never raw, but are instead socially, politically, and materially situated. They are “complex sociotechnical systems that are embedded within a larger institutional landscape of researchers, institutions and corporations, constituting essential tools in the production of knowledge, governance and capital.”12 Data have meaning only to the extent that they have been given what we might call narrative shape. As theorists have shown, the meaningfulness of data requires the establishment of correlations and linkages between data events: big data, for example, is “less about data that is big than it is about a capacity to search, aggregate, and cross-reference large data sets.”13 Accurate predictive analysis, the holy grail of data analytics (anticipated – and problematized – in Philip K. Dick’s 1956 short story about predictive policing, “The Minority Report”), depends precisely on the algorithmic derivation of patterns in the personal data that are regularly and automatically collected about us.14
The application of digital modeling to the arts and humanities, however, brings into sharp focus, and, I would suggest, distills to its essence, a central epistemological anxiety underlying the often breathless promotion of data analytics and the promise they hold out for transformative advancement in the areas of scientific research, big business, marketing, and national security. The arts and humanities generate – and embrace; indeed, are constituted by – ways of knowing that very often cannot be quantified and, as such, exemplify a stymying byproduct of data technologies. Time and again, scholars and commentators identify what we might call a kind of epistemological other lurking within or beside the apparently bias-free, absolutely objective knowledges generated by the “big data revolution.” For every article published in Wired magazine announcing that theoretical modeling is obsolete, that hypothesis and experimentation are quaint and out of date, that the inferences of causal logic have been displaced by the truths of algorithmic correlation – “with enough data, the numbers speak for themselves”15 – there is a corresponding round of hand-wringing brought on by the sense that, in Kate Crawford’s apt formulation, “it will never be enough.”16 Google collects data and its algorithms reveal patterns, but we don’t always know what they mean; our security agencies either will never have enough data to predict the next terrorist plot, or they will have so much data that they will forever be forced into irrelevant wild-goose chases, and, again, fail to predict the next terrorist plot; as our “haystack of stored information,” in Frank Pasquale’s words, increases exponentially, automatically, and indefinitely, we can only hope that it “may someday reveal a needle.”17 Data collection requires mechanisms for “coping with abundance, exhaustiveness and variety, timeliness and dynamism, messiness and uncertainty, high relationality, and the fact that much [data] are generated with no specific question in mind.”18 What do we do with all of this data (a question social media sites and marketing firms are all too happy to answer)?19 In other words, and to return to our original question, what now?
On this understanding, the humanities would appear to provide a unique opportunity to test the limits of research derived from data analytics. The cultural material grouped under the disciplinary heading of the humanities is perhaps the purest expression of big data’s seductive appeal. For if the conceptual phenomena that make art what it is are by definition unquantifiable and therefore unplottable, the successful application of digital technologies to the ineffable mysteries of art is of mutual epistemological benefit to both humanists and big data apologists: the former can hope to solve seemingly intractable and previously unaskable problems of explanation while the latter get to tout the unprecedented insights made possible by algorithmic modeling. What if the affective secrets of the color red could be derived by way of its collection and measurement in exhaustive data sets? What if the Kantian sublime could be mapped on a line graph or a scatter plot? Surely this is the kind of ambition held out most provocatively in the humanities by the notorious proclamations of Franco Moretti since the publication of his Atlas of the European Novel in 1998.20 “[W]e have never really tried to read the entire volume of the literary past,” Moretti writes elsewhere and a decade later. Quantitative analysis, he concludes, “is a small step in that direction.”21
To get a sense of the methodological hope that data analytics provokes, as well as the epistemological anxiety that haunts its application, we need only look very briefly at several recent studies that have followed Moretti’s influential example and have employed many of the same techniques of data analysis. (Let me also be very clear that my intention is not to impugn the value of this innovative, fascinating, and revealing scholarship. It is instead to use this work as a way to illustrate what I think are several of the richest and most interesting questions we might ask about the present and future – as well as the past – of literary studies. And though there is much to be said on the topic, I am not concerned, for now, with engaging the equally important body of scholarship seeking to complicate, and sometimes to question the validity of, data-driven literary textual analysis and literary history. Crucial issues pertaining to the situation of the digital humanities in the neoliberal academic economy, to the role of cultural and political critique in humanities data analysis, to the relation of theory and computation, to the necessary historicization of the social imaginary transmitted by digital culture and literacy, and, perhaps most important, in my view, to the nature of “literariness” itself are left aside in the present context.)22
In her fine computational analysis of Victorian poetics, for example, Natalie Houston has shown what we can learn about the cultural history of Victorian texts, as well as their bibliographic, visual, and linguistic codes, by transforming existing catalog metadata for hundreds of nineteenth-century books of poetry into a searchable and visualizable database. Scholars can discover, she writes, “what patterns of growth or decline in the publishing of poetry are visible over time; how the proportion of male to female poets varies by publisher, decade, or kind of poetry; and the ways in which poetry’s publication was dispersed throughout Victorian print culture.”23 The visualizations Houston presents give us the power to move beyond our perceptual limitations as human organisms: digital reading is a method “of literary research and interpretation that draw[s] upon computational analysis to move beyond human limitations of vision, memory, and attention.”24 As such, they embody what Catelijne Coopmans has called the capacity of visual analytics software “to reveal what has hitherto been hidden, to give access to what is otherwise inaccessible.”25 The advantage of the literary data visualization, in other words, is that it reveals what we could not have known otherwise; Houston’s digital method makes trends in poetry publication visible.
Similarly, Matthew Wilkins’s visualization of the number of new fiction works in thousands published in the U.S. between 1940 and 2010 provides a graphic demonstration, again, of our human, biological limitations. We cannot possibly read the vast quantities of books published each year, nor, he suggests, can our traditional reliance on the relatively few works in our literary canons hope to tell us anything representative about a given era of literary and cultural production. “Our time is finite,” he tells us, quite rightly, of course, and therefore
[w]e need to do less close reading and more of anything and everything else that might help us extract information from and about texts as indicators of larger cultural issues. That includes bibliometrics and book historical work, data mining and quantitative text analysis, economic study of the book trade and of other cultural industries, geospatial analysis, and so on.26
Wilkins’s thoughtful discussion addresses the real impossibility of the long-standing human-scholarly dream of thoroughness, coverage, exhaustiveness – in short, of mastery. We should do “more algorithmic and quantitative analysis of piles of text much too large to tackle ‘directly’.” As a result, he tells us, we will be able to “break the grip” of the tiny, arbitrarily chosen canons on which we rely and get down to the real business of thinking about large-scale cultural trends.
Finally, the work being done by a group of researchers at the Stanford Literary Lab demonstrates not only what insights it is statistically possible to discern from the rapidly growing size of digitized literary corpora, but also the same hope for mastery we detect in Wilkins’s essay. In one experiment, the Stanford researchers plot the British novel between 1770 and 1830 according to the criteria of “popularity” and “prestige.” Drawing on the Raven and Garside bibliography, the authors compared the number of novels reprinted in the British Isles during this period or translated into French or German (a measure, they write, of “popularity”) with the twentieth-century scholarship on these same works listed in the Dictionary of National Biography and the Modern Language Association database (an index therefore of “prestige”). The results of this comparison, displayed in two of the visualizations included in their results, demonstrate, as the authors put it, that “at this point, an empirical cartography of the literary field was no longer a daydream.”27
Here, the authors’ data visualization succeeds in ways that Pierre Bourdieu’s well-known and influential diagram of the French literary field at the end of the nineteenth century, to which the authors explicitly compare their own visualizations, could not.28 Bourdieu’s chart, the authors point out, is artificially regular in distribution because it included no empirical evidence. The work of the Stanford researchers, by contrast, literally visualizes – their measurement “makes you ‘see’ the process of canonization”29 – the decrease in popularity along the horizontal axis with the passing years while prestige accelerates along the vertical axis. Like a conventional astronomy lab, the Stanford Literary Lab finds itself on the brink of revolutionary breakthrough: “we used to work on a couple of hundred nineteenth-century novels, and now we can analyze thousands of them, tens of thousands, tomorrow hundreds of thousands. It’s a moment of euphoria, for quantitative literary history: like having a telescope that makes you see entirely new galaxies.”30 The dual nature of the nineteenth-century literary canon is now made “visible”31 when previously hidden transformations are mined from the depths of the evidence.
II. A Resistance to Theory
Franco Moretti’s own work is of course also full of what he calls literary maps, graphs, and trees. As is his wont, Moretti casts the importance of the literary data visualization in the starkest terms. In a 2011 pamphlet titled “Network Theory, Plot Analysis,” also generated through the Stanford Literary Lab research collective and later published in New Left Review, Moretti examines Hamlet and discovers, to his surprise, as he says, that “Horatio has a function in the play, but not a motivation. No aim, no emotions – no language, really, worthy of Hamlet. I can think of no other character that is so central to a Shakespeare play, and so flat in its style.”32 This insight, Moretti tells us, is drawn from what he calls network theory, a “theory that studies connections within large groups of objects.”33 In the case of Horatio, Moretti asks rhetorically whether he really needed his network theory to produce his discussion. “No,” he responds,
I did not need network theory; but I probably needed networks. I had been thinking about Horatio for some time – but I had never ‘seen’ his position within Hamlet’s field of forces until I looked at the network of the play. ‘Seen’ is the keyword here. What I took from network theory were less concepts than visualization: the possibility of extracting characters and interactions from a dramatic structure, and turning them into a set of signs that I could see at a glance, in a two-dimensional space.34
Moretti’s essay includes fifty-seven visualizations that are intended to demonstrate a number of previously unrecognizable relations and patterns in the plot of Shakespeare’s play.
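What it means to extract characters and interactions and turn them into “a set of signs” can be suggested by a minimal sketch. The scene groupings below are invented for illustration and are not Moretti’s data; linking characters who merely share a scene is a simplifying assumption of mine, not his stated criterion; and the function names are hypothetical. The sketch builds a small network and counts how many distinct characters each figure is connected to.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative only: characters grouped by (hypothetical) shared scenes.
# These groupings are NOT taken from Moretti's pamphlet or from the play text.
SCENES = [
    {"Hamlet", "Horatio", "Ghost"},
    {"Hamlet", "Claudius", "Gertrude"},
    {"Hamlet", "Horatio"},
    {"Claudius", "Laertes"},
]


def build_network(scenes):
    """Map each unordered pair of characters to the number of scenes they share."""
    edges = defaultdict(int)
    for scene in scenes:
        # sorted() gives each pair a stable (alphabetical) orientation.
        for a, b in combinations(sorted(scene), 2):
            edges[(a, b)] += 1
    return dict(edges)


def degree(edges):
    """Count the distinct characters each character is linked to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {name: len(others) for name, others in neighbours.items()}


edges = build_network(SCENES)
degrees = degree(edges)

# List characters from most to least connected.
for name, d in sorted(degrees.items(), key=lambda item: (-item[1], item[0])):
    print(f"{name:10} {d}")
```

A list of degrees is still a table to be read; Moretti’s point is that the same relations, once drawn in two-dimensional space, can be seen at a glance.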
It is unfair to Moretti to suggest that his proposed measurement and visualization instruments argue against theory in favor of charting and graphing, particularly since he makes a case elsewhere for the theoretical implications of literary computation and his analyses are far more subtle than I present them here. Moreover, Moretti’s work is representative neither of digital humanities today nor of information visualization, and so I admit to using him as a bit of a convenient straw man to make a specific point.35 Moretti’s suggestion that theory can be dispensed with because the real action is to be found in quantification and its visualization brings to mind a notorious 2008 essay by Chris Anderson in Wired called “The End of Theory.” In the era of the “data deluge,” of big data, Anderson tells us, we can forgo the search for models: “The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.”36 Moretti’s work evinces something of the epistemological seductiveness of big data. As danah boyd and Kate Crawford put it, big data are a phenomenon defined not only by specific technologies and modes of analysis, but by cultural mythology as well. 
Big data maximize “computation power and algorithmic accuracy” to identify patterns in huge quantities of digital information, but they also propagate, in their words, “the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.”37 Whence the promise and threat of big data: big data as the potential to solve previously intractable problems in medical and climate change research, for example, but also as the source of troubling invasions of privacy and increased state and corporate control.
In the area of literary investigation, Moretti’s analyses propose to bring order and rigor to what had been “random and unsystematic.”38 Daring us to be scandalized, Moretti writes very early on in 2000 that close reading is “a theological exercise – very solemn treatment of very few texts taken very seriously – whereas what we really need is a little pact with the devil: we know how to read texts, now let’s learn how not to read them.” Theological is an interesting word here, of course, not only, however, because it carries with it the taint of fuzziness and mysticism, and therefore of the unquantifiable, but because it formulates the paradoxical hope, often shared by champions of big data, that quantification and algorithmic analysis open the way to a new divinity: the possibility of total knowledge. I would propose, however, that for all of the quantitative insight they bring, and for all of the speed with which they bring it, data visualizations that emerge from algorithmic analysis might be more provocatively understood as the site of our engagement with, rather than an exhaustion of, the ineffable capacity of the literary to create the new, and of literary studies to create new concepts in the present, as well as potential images of the future, from the objects of the past.
III. Events of the New
By today’s standards, the information made available by the CFRP does not qualify as big data. At a time when global IP traffic hovers around 100 EB per month (1 EB, or exabyte = 1 million TB, or terabytes), the data held in the 113 years of Comédie-Française registers is decidedly small.39 In this respect, the resources of the CFRP do not of course carry with them the revolutionary promise of big data in the twenty-first century: to challenge and transform, as Viktor Mayer-Schönberger and Kenneth Cukier suggest, “our most basic understanding of how to make decisions and comprehend reality.”40 The fact remains, however, that digital humanities initiatives like the CFRP are premised on the notion that they, like data analytics more broadly, generate the new. By new, I do not mean (only) that they offer the student of the French theater a new information platform with which to examine the historical record. Nor do I mean (only) that they herald, as the authors of Digital_Humanities propose, “a fundamental shift in the perception of the core creative activities of being human, in which the values and knowledges of the humanities are seen as crucial for shaping every domain of culture and society,” or even, as the same authors continue, that contemporary digital humanities challenge the primacy of the text as the fundamental unit of inquiry while prioritizing design “as an integral component of research, transmedia crisscrossings, and an expanded concept of the sensorium of humanistic knowledge.”41 I refer instead, and in far broader terms, to the ontological conditions whereby the convergence(s) of digital technologies and literary art stage an event of the new.
As the emergent site of evolving modes of knowing, data-driven approaches to literary study do not reveal that which is already out there – “truths” hidden away in a digitized archive, for example – but rather express the conditions through which forces of human and nonhuman, organic and inorganic origin create the new, a situation that is not without similarity to the becoming of art. Elizabeth Grosz puts this beautifully in Chaos, Territory, Art when she writes that what art elicits are
not so much representations, perceptions, images that are readily at hand, recognizable, directly interpretable, identifiable, as does the cliché or popular opinion, good sense, or calculation: rather, they produce and generate sensations never before experienced, perceptions of what has never been perceived before or perhaps cannot be perceived otherwise.42
To think the digital data archive as an event of the new interrogates its status as a given, as an object that is already in the world, as precisely a datum in the etymological sense of “what is given.”43 To consider the humanities database as the creation of the new along the lines Grosz presents is also to uncouple data from the subject and to abstract their deployment in research and scholarship as singularities, as becomings, as what Gilles Deleuze calls “expressions.” Deleuze’s concept of “expression,” which he worked out throughout his career, in, for example, The Logic of Sense and Expressionism in Philosophy: Spinoza, is perhaps best described in the first of his cinema books, The Movement-Image, where it is connected to the image. In an understanding that he derives from the logician C. S. Peirce, Deleuze suggests that expression is not the expression of something for someone, but rather expression: the power to express. Embodied in the image – specifically, the cinematic affection-image exemplified by the close-up shot – expression is devoid of spatio-temporal actualization. Instead, it is pure potential for the event of the new.44 As Brian Massumi puts it, images are incomplete when we understand them only in semantic or semiotic terms, when we fail to see them as expression-events.45
By considering data visualizations as events of “our information- and image-based late capitalist culture,”46 we distinguish them from both their object and subject of representation, and understand them as problematizing the relation itself of object and subject. They are neither an image of nor an image for, but rather the opening into and of the new. Software, as Jussi Parikka writes, is the “potentiality for new connections” in both human and nonhuman agencies, the nonhuman materiality of computers and the human perceptions of digital imaging. Code, Parikka continues, “enacts, enables, and produces effects and affects: relations across scales, tapping into human social relations, relations with machines, relations inside machines – code, and software events as well . . . . Software, then, is not only a black box for input, but a process of modulation in itself, the poetics of potentiality in action.”47 To wit, Wendy Hui Kyong Chun has shown that the copyright laws governing the commodification of software in the twentieth century were predicated on the notion that the immateriality of software could be reconceived as “a thing” external to the individual and therefore put at stake the distinction between internal and external, tangible and intangible, subject and object.48 By considering the software program as a thing, we become aware not of a world of objects, but of subject-object relations understood in their contingency, as the product of historically situated networks, relations, materialities, and embodiments.49 As Heidegger puts it, the English word “thing” conveys most faithfully the Latin res as a matter for discourse, “that which concerns somebody, an affair, a contested matter, a case at law.”50 It is, in short, an event.
In the vocabulary of digital computation, the generation of the new, of new concepts, may well take the form of graphical user interface, of which the data visualization is an important example. We tend to think of interface as the material shape taken by computer screens and displays, or the menu bars and app icons organized and visualized by software code so that we may navigate the abstraction of computational processes. As Johanna Drucker and others who study interface theory and design have shown, however, interface is not specific to information technology, just as visualization is not the sole property of quantitative analysis. It is rather, as Drucker puts it, the very condition of encountering the world. It is, she writes, a boundary-space “through which we imagine our lives into being and give knowledge its forms of expression.”51 It is not a portal through which, in Drucker’s words, information passes like fast-food at a drive-through window. As an element of design, therefore, interface uses “algorithmic methods to play with texts experimentally, generatively, or ‘deformatively’ to discover,” as Alan Liu writes, “alternative ways of meaning that are not so much true to preexisting signals as riffs on those signals. The common goal is to banish, or at least crucially delay, human ideation at the formative onset of interpretation.”52 Before they signify, data visualizations express. How might we understand them in terms of what they do or make rather than what they represent? When Fuller writes that “[s]oftware will need to be seen to do what it does, not do what something else does,”53 we hear an echo of Deleuze when he writes, quoting Jean-Luc Godard, that the film image is “not a just image, just an image.”54
Interface brings interpretation and engineering environments into contact. When we think the visualization as a creative boundary-space rather than merely the site of pattern recognition, humanistic interpretation and information flows work together as a collective system. This collective, which, in my view, we might understand not only as a kind of hybrid mode of analysis embracing both reading and automation, but as a mode of scholarly collaboration as well, begins to engage what Katherine Hayles, in her most recent work, calls the “cognitive nonconscious.” Pushed beyond its usual identification with thought, Hayles writes,
cognition in some instances may be located in the system rather than an individual participant, an important change from a model of cognition centered on the self. As a general concept, the term ‘cognitive nonconscious’ does not specify whether the cognition occurs inside the mental world of the participant, between participants, or within the system as a whole . . . . It . . . operates across and within the full spectrum of cognitive agents: humans, animals, technical devices.55
For those digital humanists who think about graphic interface as a form of making and doing rather than a static mode of recognition, a process-oriented mode of literary visualization perhaps promotes and reinforces a total cognitive system wherein getting one’s hands dusty in the archives is a gesture of information engineering and the data-driven bar graph is a humanistic act of thinking.