Article Open Access

The Signal in the Noise: Hermeneutics and/of Computational Literary Sociology

  • Anna Muenchrath
Published/Copyright: March 6, 2025

Abstract

In literary sociology, texts have often been treated as a product of human history, and therefore as symptoms of the forces that have shaped them. This essay considers a single strain of computational literary studies that uses predictive modeling to locate differences between two corpora of texts, determining, for example, what makes a text literary, reviewable, or prize-worthy. This type of computational literary studies is sociological in nature but changes the epistemological status of the text from a symptom of the conditions of its production to a source of evidence supporting claims about human selection. In the ongoing »method wars«, some proponents of postcritique have allied themselves with computational literary studies due to its perceived machinic ability to read objectively and on the surface of texts; however, as this essay argues, the scale of computational methods and their ability to produce strong, generalizable theories complicate not only this alliance, but also the distinction between critique (characterized in this essay with reference to Jameson) and postcritique (characterized in this essay with reference to Sedgwick, Latour, Felski, Love, Best and Marcus), which rests on what role the text should play in our accounts of history.

This essay responds to the seemingly confounding position that computational literary sociology takes within this debate by arguing that the relationship between the text and history is accounted for in predictive modeling in the first instance through a statistical hermeneutic. This hermeneutic, as is shown through a historicization of computational literary sociology’s assumptions about humans and their relationship to the world, is inherently paranoid, imagining that human behavior emits historical signals that are retrievable from amidst the noise of the data. This essay pays particular attention to this construction – »signal in the noise« – in order to argue that computational literary sociology is fundamentally concerned with demystifying and finding meaning within the randomness of human behavior, which is why it cannot be so easily divorced from the tradition of critique.

By conducting a brief survey of work in computational literary sociology (Dalen-Oskam, English, Moretti, So, Underwood), the essay argues that there are numerous ways of making meaning of the signal in the noise. One of these is to treat it as a symbol requiring interpretation, usually leading to an explanation that relies on the existence of something like the political unconscious. Another is to refuse critique in order to trace or assemble the actions that manufactured the appearance of the signal. In either case, the second instance of hermeneutical maneuvering that occurs in computational literary sociology is an answer to the question: how did the signal get into the noise? Also revealed through this survey is that attending to the noise, rather than the signal, is a productive way to leverage a unique affordance of computational literary sociology – its visualization of the relationship between the corpus and the signal that appears to order it.

Ultimately this essay argues that despite computational literary sociology’s reconsideration of the role of the text, it continues, in the critical tradition, to treat the text as a material instantiation of history. Although computational literary sociology is not as postcritical as some might have thought, it still enables postcritical readings of a model’s outputs. Furthermore, the essay finds that what can become visible in computational literary studies is, first, the way that statistics has come to act – like ideology critique before it – as a nearly ubiquitous and therefore infrequently questioned tool for understanding human behavior and, second, that what is valuable in thinking computationally is the ability to pay attention to the »noise«, without which statistical models would necessarily flounder.

Mathematics, which most of us see as the most factual of all sciences, constitutes the most colossal metaphor imaginable, and must be judged, aesthetically as well as intellectually, in terms of the success of this metaphor.

– Norbert Wiener in The Human Use of Human Beings (1950)

»Some novels undoubtedly do allegorize their conditions of production«, Lee Konstantinou (2023) writes in a recent review, »but does adopting a sociological perspective require us to believe that all (or even many) do?« Konstantinou is writing here about Dan Sinykin’s work on the production of fiction in the age of conglomeration. Sinykin uses archival research to compellingly argue that US publishing houses bought by large conglomerates in the 20th century realigned their publishing aims and presents close readings of novels by Danielle Steel, Stephen King, and Toni Morrison (among others) to demonstrate how these changing conditions of production manifest themselves in fiction. As Konstantinou notes, the allegorical relationship between literary sociological findings and the literature it takes as its object is de rigueur, to be found in literary sociological work from Pierre Bourdieu to Mark McGurl. Readings of these relationships are often convincing, charismatic, and undoubtedly pleasurable to both read and write (as I know from personal experience). They function to connect the scale of the social with the scale of the text – an institution, time period, or political context with a plot, sentence, or word.

Whereas literary sociology has often used the text as a post facto allegory for its conditions of production, relatively new methods of quantitative literary sociology, particularly the predictive modeling facet of its computational strand, which is my sole focus here, reverse this causality in order to say something about the social context of literary production based on data from within the text itself. Predictive modeling trains a program on a portion of two distinct groups of texts, for example prize-winning novels and non-prize-winning novels. The model is then tested on the remaining texts in those two groups, to see if it can accurately predict which novels belong in which group based only on textual features like the frequency of words. Anything over fifty percent accuracy means that the program is better than a random guess at sorting the texts. Scholars can then query the model to find which features of the texts were more likely to appear in one group or the other. For example, using predictive modeling, Ted Underwood and Jordan Sellers employ a word-counting algorithm to identify differences in texts that were reviewed in prestigious venues versus those that were not in order to »capture a truly generalizable relationship between language and reception« and record changes in this relationship over time (2016, 326). Elsewhere, Underwood uses a similar model to identify differences between texts that librarians have labelled »fiction« and »nonfiction«. In both cases, Underwood is careful to clarify that what is being produced is a »perspectival model« – a measurement not of what should be reviewed or an essential difference between fiction and nonfiction, but rather a model of how reviewers and librarians have historically read and sorted texts (2019, 36).
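
In miniature, the train-and-test procedure reads something like the following sketch. Everything here is an illustrative stand-in I have invented – the toy corpora, the six-word vocabulary, and the nearest-centroid classifier – not the regularized models actually used in studies like Underwood and Sellers’:

```python
from collections import Counter

def features(text, vocab):
    """Bag-of-words feature vector: relative frequency of each vocabulary word."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in vocab]

def centroid(vectors):
    """Mean feature vector of a group of texts."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(vec, cent_a, cent_b):
    """Assign a text to whichever group's centroid is closer (squared distance)."""
    da = sum((x - c) ** 2 for x, c in zip(vec, cent_a))
    db = sum((x - c) ** 2 for x, c in zip(vec, cent_b))
    return "A" if da <= db else "B"

# Toy corpora standing in for, e.g., reviewed vs. unreviewed novels.
group_a = ["the sublime sorrow of the heart",
           "sorrow and sublime longing",
           "the heart in sublime sorrow"]
group_b = ["the case of the missing gun",
           "a gun and a missing case",
           "the missing gun in the case"]
vocab = ["sublime", "sorrow", "heart", "gun", "missing", "case"]

# Train on all but the last text of each group; hold the rest out for testing.
cent_a = centroid([features(t, vocab) for t in group_a[:-1]])
cent_b = centroid([features(t, vocab) for t in group_b[:-1]])

# Accuracy on the held-out texts; anything above 0.5 beats a random guess.
held_out = [(group_a[-1], "A"), (group_b[-1], "B")]
correct = sum(classify(features(t, vocab), cent_a, cent_b) == label
              for t, label in held_out)
accuracy = correct / len(held_out)
```

Held-out accuracy above 0.5 is what licenses the next step described above: querying the model for the features – here, relative word frequencies – that distinguish one group from the other.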

In this sense, this particular kind of perspectival or predictive modeling is primarily sociological, using evidence from a corpus of texts to support claims about the texts’ production, consecration, or reception. What separates texts selected by prize juries from texts not selected by prize juries? Not: what is the essence of a prize-winning (or good or literary) novel? Karina van Dalen-Oskam, who holds the first professorial chair in computational literary studies (a position whose founding in 2012 also represents the first use of the term), is concerned with the question of whether literary quality can be measured through computational methods and therefore found in evidence from within texts themselves (2021, 18). But rather than finding an objective answer to the question of what makes literary fiction, what van Dalen-Oskam is really measuring is »what linguistic features correlate with readers’ perception of literary quality« (ibid., 9). The variables being tested in computational models – published by a big press or a small one, reviewed or not, considered by readers to be literary or not – are matters of human selection. The question predictive modeling answers is: what are readers experiencing in these texts that causes them to categorize them in particular ways? Calling the field computational literary sociology (rather than studies), as I am doing here, attempts to bring attention to the fact that the object of study here is really human behavior and its complicated interaction with culture, which it at once shapes and is shaped by.

The foundational assumption on which computational literary sociology relies is something that Franco Moretti, one of the original adopters of quantitative methods in literary studies, makes explicit, which is that the literary system »doesn’t remain outside the text: it’s embedded well into its form«, even if we understand »form« here loosely to include the number of times particular words, types of words, or combinations of words appear in texts (2013, 58 sq.). This tenet, broadly speaking, is also what authorizes allegorical sociological readings in the first place. But here the status of the text changes: rather than dramatizing the conflict of the literary field that conditioned their production, texts become a posteriori evidence of it. Texts take on a new epistemological role, and their »reading« (by a computer model) precedes sociological insights, rather than follows them.

The reexamination of the status of texts in our scholarly approach to them is also the central issue in an ongoing debate in literary studies over the continued value of critical versus what have been called postcritical approaches.[1] Scholars on both sides of this debate have positioned sociological approaches as methodologically valuable, although computational literary sociology is often, for reasons I’ll outline below, coopted by those in favor of a postcritical stance toward texts. Yet the binaries produced in discussions about the value of critique – surface vs. depth, distant vs. close, paranoid vs. reparative – don’t map straightforwardly onto the ways texts are being »read« by scholars using predictive models to make sociological claims.

Rather than continuing to rehash the well-worn merits and limitations of computational modeling in literary studies, the remainder of this essay responds to the opening put forward by postcritical advocates of computational literary sociology by asking the questions: What is the computational literary sociologist’s stance towards the text (is it reparative, suspicious, distant, close)? How should we understand computational literary sociology with respect to the tradition of critique and the tendencies of postcritique? And what does treating the text as a source of data tell us about the stakes of reading critically versus postcritically?

Although postcritical scholars have allied themselves with computational approaches due to the perceived objectivity and surface orientation of machinic reading, the first part of this essay maps what have been called the »method wars« along the axes of scale and the »strength« of respective theories in order to problematize computational literary sociology’s relationship with critique. Instead, the second part of the essay argues, computational literary sociology, in an initial interpretive act, employs an inherently suspicious statistical hermeneutic rooted in historical assumptions about human behavior, its measurability, and concomitant optimization. Once this statistical signal has been received, computational literary sociology (often) tasks itself with a second act of interpretation in order to discern how the signal got into the noise. The third part of this essay surveys selected computational literary sociological projects in order to show that both critical and postcritical interpretations of the signal abound. Ultimately, this essay argues, computational literary sociology’s most valuable contribution to the debate between critique and postcritique is not surface reading, but rather a reconsideration of the relationship between statistics and human history, which seems most readily accessible when we turn our attention to the noise, rather than its statistical signal.

1 Critique and Postcritique

The tradition of hermeneutics, according to Paul Ricœur, originates between two poles (1970, 26 sq.). One, rooted in scriptural exegesis, restores meaning through interpretation; the other, traceable in the work of Marx, Freud and Nietzsche and called by Ricœur the hermeneutics of suspicion, demystifies in order to uncover hidden meaning. As this hermeneutic has been taken up by literary scholars, texts and the language they are made of have become symptomatic of meanings available for decoding by the critic. In his articulation of the tenets of what is often called ideology critique, Fredric Jameson elaborates on Louis Althusser’s identification of history as an absent cause in order to argue that this cause is »inaccessible to us except in textual form« and that »our approach to it […] necessarily passes through its prior textualization, its narrativization in the political unconscious« (1982, 35). This treatment of texts as symptomatic of an absent historical cause implicitly authorizes readings like Sinykin’s, where texts produced in the context of increasing conglomeration appear as symptoms of these changes. As Sinykin explains it: »I do not mean that authors kept these questions [about profitability, spin-offs, and other pressures related to conglomeration, A. M.] in mind when writing […]. Success depended on recognition by something like a system, so much so that fiction itself, when published by conglomerates, came to display, seen as a whole, a systematic intelligence, a systematic authorship« (2023, 103). The scale shift here, from individual text to historical publishing context, is made possible through something like Jameson’s political unconscious, which allows the »fiction itself« to reveal something about the context of its production. This constitutes a »negative hermeneutic function« insofar as it engages in »demystifying« or »unmasking« a consciousness not immediately visible on the surface of the text (Jameson 1982, 291).

This hermeneutic tradition has had a privileged relationship with the work of literary sociology coming out of US literary studies departments. In fact, in James English’s 2010 account, the sociology of literature has suffered as an independent field in the US because Jamesonian approaches to texts have made every literary scholar a sociologist, whose aim, across »postcolonial studies, queer theory, new historicism«, was »to provide an account of literary texts and practices by reference to the social forces of their production, the social meanings of their formal particulars, and the social effects of their circulation and reception« (viii). Jameson himself suggests that for Marxism »literary and cultural analysis is a social science« (1981, 297). Most importantly, what Jameson values in critique as he outlines it is that it »liberates us from the empirical object, […] displacing our attention« from the text toward »its constitution as an object and its relationship to the other objects thus constituted« (ibid., emphasis original). Contrary to the postcritical position that frames ideological critique as a reduction of the text to a repeated symptom of the same hidden cause, Jameson here articulates an expansion of the text to include its political and sociological context. However, the desire for liberation from the text as »empirical object« betrays the fact that the text acts as a mere material witness to the larger historical conflict whose elucidation is the scholar’s real aim.

Since around the turn of this century, and particularly as a result of its perceived appropriation by the right in the form of conspiracy theories, critique has increasingly been problematized as scholars seek alternative, non-suspicious relationships to texts. In her contribution to this genre, Eve Kosofsky Sedgwick desires a positive hermeneutic – reparative rather than paranoid – in which the text not only reveals previously hidden knowledge (which, in Sedgwick’s telling, is often not very hidden after all), but also provides the reader with something new, like knowledge of how best to »move among [knowledge’s] causes and effects« (2003, 124). Sociologist Bruno Latour similarly provides a model for a positive hermeneutic in which »the critic is not the one who debunks, but the one who assembles«, where humanities scholarship becomes »associated with more, not with less, with multiplication, not subtraction« (2004, 246, 248). This emphasis on quantity – the rhetoric of a critic as someone who »adds reality to matters of fact« rather than »subtract[s] reality« (ibid., 232) – seems to anticipate the introduction of quantitative literary sociology into this debate. The positivist aspect of postcritical approaches often goes hand in hand with attention to a text’s surface, which, in Stephen Best and Sharon Marcus’s explication, requires taking the text on its own terms, accepting that it does as it says rather than treating it as a symptom to be decoded or as an opaque container of hidden meanings (2009, 1 sq.).

Both critique and postcritique assert themselves as particularly sociological, but at different epistemological scales: for Jameson, textual critique constitutes a kind of sociology, whereas for Rita Felski, who has promoted a Latourian approach to literary studies, tracing social connections forms the basis for claims about textual meaning (2015, 11). Felski insists that the »connections and mediations« that produce, circulate, and remediate a text must be »tracked down and described« in order to produce »detailed accounts of the actors, groupings, assemblies, and networks« that would provide evidence for the causal connections between the text and history that ideological critique takes for granted (ibid.). The text as such is an active part of a network, which, in concert with various other actors (publishers, editors, readers, books, etc.), creates literary history, rather than presenting as a symptom of that network.

Computational approaches have been identified as a »worthy fellow traveler in the move from depth to surface reading« (Bode 2023, 519; see also Love 2010, 382 sq.). Best and Marcus present computation as a way of avoiding a hermeneutic of suspicion as »computers are weak interpreters but potent describers, anatomizers, taxonomists« (2009, 11 sq., 17). Underwood also sees computational modeling as descriptive and »deeply compatible with an antifoundational approach to interpretation« (2019, 67). »Computer-generated quantitative scholarship«, Felski writes, is »not free of the long shadow of suspicion, but [it] does not orient [itself] toward critique as [its] primary rationale and vindication« (2015, 26). Computational literary sociology, then, is regularly allied with surface reading, description, and non-suspicious reading practices, primarily because computational models are not understood to have the capacity for hermeneutic maneuvering – they take texts literally, word for word.

Postcritical scholars posit that ultimately it is the specificity and (lack of) strength of their theories that sets them apart from those engaged in critique, a distinction often described in terms of scale. Sedgwick, following Silvan Tomkins, uses the language of weak and strong theories (language not meant to be normative) in order to classify the scalability of causal explanations (2003, 133 sq.). Theories that offer repeatable large-scale explanations for global phenomena such as ideology critique are strong theories (to adapt her language to existing allegorical sociological readings: if everything can be understood as manifesting the conflict of the literary field, the conflict of the literary field is everywhere [ibid., 135]). Those that provide small-scale explanations for local phenomena – little more than a »description of the phenomena which [the theory] purports to explain« – are weak. Sedgwick herself argues for a weak reparative reading practice with minimal explanatory power. Felski similarly urges postcritical scholars to take not the bird’s-eye view of »the social« but rather to construct it from the ground up (she wants not to »soar like an eagle«, but to »trudge along like an ANT«) (2015, 157 sq.).

Computational literary sociology observes texts at a scale inaccessible to human readers (much more similar to an eagle), framing losses of specificity as a cost for generalizability and the »strength« of the theories it produces (Piper 2016, 6). Moretti famously calls this approach »distant reading«, where distance is »a specific form of knowledge« (2007, 1, emphasis original), accessible only through an approach that takes texts as empirical instances of larger historical phenomena. What has emerged here is a portrait of a method at odds with the postcritical one – the portrait of a reader that, limited by its computational objectivity, remains on the surface, taking texts only as they are. In addition to operating on a distant scale, aiming thereby to produce strong and generalizable theories, computational literary sociology also treats texts as empirical objects, data points in the investigation of the distinguishing features of a corpus. In other words, texts become material witnesses in an unfolding historical drama, recalling the symptomatic treatment of the text against which postcritical scholars normally orient themselves. Is it possible that, counter to postcritical scholars’ embrace of computational methods, computers really aren’t postcritical after all? Is the computer a hermeneut?

2 The Statistical Hermeneutic

By taking the very small scale – punctuation marks, words, word groupings – as its basis for claims about the large scale – genres, periods, reception – computational literary sociology blows up the importance of small parts of a text while making descriptions of the social appear manageable. This relationship between the small and large scale in computational literary sociology is revealed in the first instance through statistical modeling. But the belief that taking a representative sample and running a statistical regression – a technique that determines the strength and direction, positive or negative, of the relationship between a dependent variable and one or more independent variables – can provide insights into human behavior or cultural production is, like the assumptions made by critique or postcritique, historically contingent. It relies on and shapes a particular kind of human subjecthood and stance toward the text that appears objective only insofar as it remains unhistoricized.

When used in computational literary sociology, statistical modeling operates under the twin assumptions that something meaningful about cultural production and reception can be described through quantitative measurement and that statistics elicit meaning from within data that cannot be gleaned by the human mind alone (Nelson 2022, 857 sq.). Statistical methods are not particularly recent, dating in some respects to Galileo, who used them to determine which of a set of measurements of difficult-to-observe phenomena was most probably correct (Smith 2019, 76). When, as Simone Murray (2023) writes, »scientific positivism exerted epistemological force across disciplines«, sociology underwent a process, »whereby the anecdotalism of the Victorian man of letters was being replaced by statistical data newly available to industrialised, increasingly urbanised societies with urgent issues of public health, underemployment and mass education in need of evidence-based policy solutions«. Within this historical context, statistics came to be understood quite differently. Rather than being treated as a way to interpret measurements, statistics became a way to interpret the society that those numbers aimed to measure. This transformation introduced a normative and ideological angle into the observation and analysis of human behavior (and is what made statistics ideal for use in eugenics [Smith 2019, 84; see also Gould 1981]).

This transformation required a new belief in what has become a fundamental conceit of statistical models of society, namely that meaningful norms, patterns, or distributions are created by human behavior, and that these are discoverable through statistical modeling. Just as in the European Enlightenment, when humans conveniently became autonomous creatures designed to use reason to discover the world just as the world was becoming rational and law-abiding, the statistical revolution produced an image of society extremely well-matched to the tools recently engaged to measure it.

This statistical hermeneutic has, with the rise of rapid, algorithmic computation of data, begun to shape our quotidian realities, registering within our daily lives in the language of metrics, predictions, and optimization; it has become the defining feature of the way we conceptualize the world in the computational age. Norbert Wiener, writing in 1950, traces this understanding to the discovery of quantum physics (1989, 10 sq.). Where physics used to be concerned with what will always happen, suddenly, physics became concerned with »what will happen with an overwhelming probability« (ibid.). In a discipline that aims to observe and measure reality, »chance has been admitted, not merely as a mathematical tool for physics, but as part of its warp and weft«, altering the way we understand our relationship to the world (ibid.). Kate Crawford sees the origin of a »faith that mathematical formalisms would help us understand humans as a society« in military-funded research into signal processing and optimization during World War II (2021, 213). »The belief that accurate prediction is fundamentally about reducing the complexity of the world gave rise to an implicit theory of the social«, Crawford writes of this paradigm shift, »find the signal in the noise and make order from disorder« (ibid.). The popular application of statistics in explaining human behavior is now so ubiquitous that it has become banal. Consider, for example, the rise of statistics in the sports industry, where »savvy application of large databases of detailed historical data allows […] analysts to more precisely quantify (and, at the negotiation table, price) the past and potential future impact of a player, lineup, or tactic on outcomes, adjusting for numerous variables, which are multiplying as new tools measure ever more granularly what happens during games« (Burks 2023, 56).
Our conceptualization of the world as made meaningful by statistics has brought with it a nearly insatiable desire for optimization made visible in sports and elsewhere with the constant passive capturing of data like keystrokes made, steps taken, and pages turned, as well as the implementation of computational aids toward efficiency and profit maximization.

Cultural and literary production are not immune from these effects. Richard Jean So writes that cultural institutions are using data to »analyze, market and sell novels« (2020, 184). Melanie Walsh and Maria Antoniak suggest that Amazon Studios’ productions are made with an eye toward Goodreads data, citing the filmed adaptations of Philip K. Dick’s The Man in the High Castle and J. R. R. Tolkien’s The Lord of the Rings trilogy (2021, 273 sq.). That Amazon Publishing uses customer data (which it presumably models statistically in order to make understandable at a human scale) in making publishing decisions is, in fact, part of its marketing strategy (Montgomery 2019). We could, therefore, imagine a world in which literary production was primarily controlled not by human selection, but by the same types of predictive models that determine, for example, what sets apart a prize-winning novel. In this scenario, predictive models that determine what distinguishes a prize-winning novel would eventually become tautological: their outputs would become the same as their inputs. The data points would cluster ever more closely around the regression line, and there would be no noise: only a signal being transmitted and received by a machine. The noisy variation – the messiness of human agency – is the problem that statistical models aim to »solve«.

In modern computational literary sociology, statistical patterns are identified through tools that convert the blurriness, uncertainty, and variability of outcomes that occur when measuring human behavior into a clearer, more certain, single abstraction. The p-value, for example, is measured against a predetermined threshold (conventionally 0.05) that signals whether or not a model’s output counts as statistically significant. Like language, this threshold is arbitrary but conventional; the p-value measured against it communicates to practitioners the relative strength of the pattern they have identified. By rewriting a textual corpus on the scale of human understanding, this process – homologized by the regression line produced through statistical conventions and traced over a collection of dots – acts as a hermeneutic, indicating the presence of meaning where there was none before. This statistical hermeneutic allows scholars to uncover or produce – the distinction turns out to be precisely the one generating the debate between critique and postcritique – a signal in the noise, for units like genre and period. Underwood writes of this process that »numbers can extract a trend from noisy variation« (2018, 350, emphasis A. M.). For So, data is »a lot of noise« that can be turned into »hard, meaningful signals« (2020, 18).
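
The mechanics of this extraction can be made concrete with a toy sketch (invented data, not drawn from any of the studies discussed here): a least-squares slope plays the part of the regression line traced over a collection of dots, and a permutation test stands in for the conventional significance check against the 0.05 threshold.

```python
import random

def slope(xs, ys):
    """Least-squares regression slope: the 'signal' traced over the noisy dots."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

random.seed(0)
xs = list(range(30))
# A weak trend (0.3 per step) buried in noise several times its size.
ys = [0.3 * x + random.gauss(0, 3) for x in xs]

observed = slope(xs, ys)

# Permutation test: how often does shuffled (signal-free) data produce a slope
# at least as strong as the observed one? That proportion is the p-value.
trials = 2000
extreme = 0
for _ in range(trials):
    shuffled = random.sample(ys, len(ys))  # shuffle destroys any real relationship
    if abs(slope(xs, shuffled)) >= abs(observed):
        extreme += 1
p_value = extreme / trials

significant = p_value < 0.05  # the arbitrary-but-conventional threshold
```

Because shuffling the y-values severs whatever relationship the data contained, the proportion of shuffles that match or exceed the observed slope estimates how easily noise alone could have produced the apparent signal.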

The repetition of the phrase »signal in the noise« conjures up the idea of deep space exploration – of listening for some sign of intelligence in the seemingly random thrum of the universe. Is there something out there? If a signal were to be received, signifying the presence of intelligence, it would then also become a symbol, requiring interpretation in order to become meaningful. On the one hand, a signal transmits information on its surface: on or off, stop or go, presence or absence. A symbol, on the other, always stands for something. A statistical model’s output functions as a signal that tells us that there is information to be gleaned from the visual noise. But it is also a symbol (Andrew Piper calls it a translation [2018, x]) that asks us to interpret that information and turn it into (human, usually causal) meaning, understandable in (human) language. The frequent use of the word signal centers the (fairly new) assumption that seemingly entropic human behavior always already contains invisible information, but obscures the fact that, as Tarleton Gillespie writes, these signals are »hieroglyphs: shaped by the tool by which they are carved, requiring of priestly interpretation« (2014, 190). In other words, the output of statistical models contains both: a signal that produces order from disorder, and a symbol that prompts renewed hermeneutic attention to fulfill the promise of meaning that the signal extends.

The outputs of computational models, in mapping correlations with diverse relative strengths, provide a (historically contingent) description of the world, hence their perceived affinity with surface reading. Yet, to consider this hermeneutic postcritical misses that statistics is always already a strong theory that creates global, large-scale explanations through the demystification of seemingly random noise. Underwood’s use of the word »extract« demonstrates that computational models test relationships between variables internal to a corpus in order to unearth a signal that was presumably there all along.

Perhaps counterintuitively, it is the second hermeneutic maneuver, the human translation – the work of interpreting the output as symbol – that acts as a »black box« in computational literary sociology. Whereas the assumptions made in the collection of data, the variables used, and the code for the models can be made explicit, the verbal narrative that aims to explain what the signal means requires – like critical readings of texts – an interpretive leap. The impulse to generalization, far from having been introduced by computational methods, has been part of the aims of literary studies since at least the advent of new historicism (Piper 2016, 4 sq.). But the interpretive leap required to move between the small and the large – suggesting, for example, that a dialogue in a Shakespeare play dramatizes the conflict of the play’s production at the Globe – is not in fact overcome by the suggestion that the changing frequency of a particular group of words or type of sentence across twentieth-century mystery novels tells us something about the desires of readers, the pressures of the market, or the changing structure of the publishing industry. The direction of cause and effect is different here – how do textual markers that have created a signal at scale signify something about the social, rather than how does the social signify within the text – but the mechanism and the interpretive leap are the same. Moretti reveals this unity when he calls distant reading the »completion« of the hermeneutic project: »because the ›abstract objects‹ computation produces – these objects no one experiences directly, but which we all somehow know to take into account – are exactly what hermeneutics wants to raise to the level of consciousness« (2017, 7). We are starting on the opposite bank, but we still need to ford the river.
Yes, computational literary sociology operates at scale, but there is a fundamental homology between critique and computation in their shared acts of interpreting a signal that becomes evidence of some meaningful (but hidden) historical cause. The question the statistical model as symbol asks is: how did the signal get into the noise?

3 Reading the Signal in the Noise

Interpretation does not always mean critique, and there is a range of approaches to interpreting the outputs presented by these models and therefore answering the question of how the signal got into the noise. It is precisely because these models are not »objective«, but tell us something about subjective assessments of prestige, literariness, fictionality, or whatever else, that the question of how these subjective assessments make their way into the language of the texts becomes interesting and important. What mechanism produces this signal?

For Underwood the question isn’t answerable based on a reading of the model’s output alone. The models themselves provide description, but there is no proof »that the inner workings of the model mirror deep structures of the world« (2019, 190, emphasis A. M.). This turning away from depth echoes that of postcritique, where meaning is made through description and connection (»predictive models describe a relationship between social and textual evidence« [ibid., 24, emphasis A. M.]), rather than by unearthing a relationship between texts and the world. Together with Sellers, Underwood creates a model that is able to predict with 79.2 % accuracy whether or not a piece of fiction or poetry published between 1820 and 1919 was reviewed in a prestigious venue. In doing so, they confirm that »diction is meaningfully related to reception« (2016, 328, emphasis A. M.). But this is where the questions begin rather than end: »how can a statistical model be right 79.2 percent of the time? And what was the secret to getting reviewed in this period? We have to«, they write, »content ourselves with a rough answer« (ibid., 330). The answer is rough because it contains too many variables for us to understand in their complexity, so Sellers and Underwood provide an interpretation that points, from among the model’s outputs, to concrete language as a key indicator of reviewability.

It is clear that the results of this model incite interpretation, even if its makers finally abjure the act itself. Part of what is meaningful to Sellers and Underwood is the discovery that a single model can be used to predict the reviewability of texts across a whole century, which they interpret to mean that tastes change more slowly than scholars previously thought. They also discover that the »model trained on 1845–69 sees works from the 1870s as more likely to be reviewed than the works it was trained on« (Underwood/Sellers 2016, 336 sq.). This provokes the »wild efflorescence of interpretive work« that Nick Seaver (2022, 113), an anthropologist of algorithms, says is prompted by the outputs of computational models:

We didn’t expect to see this, and we don’t want to claim that we understand why it happens. We might speculate, for instance, that standards tend to drift upward because critics and authors respond directly to pressure from reviewers or because they imitate, and slightly exaggerate, the standards already implicit in prominent examples. In that case, synchronic standards would produce diachronic change. But causality could also work the other way: a long-term pattern of diachronic change could itself create synchronic standards if readers in each decade formed their criteria of literary distinction partly by contrasting »the latest thing« to »the embarrassing past.« In fact, causal arrows could point in both of these directions. (Underwood/Sellers 2016, 336 sq.)

Underwood and Sellers ultimately say that »causal processes are hard to trace in detail« and »we [don’t] actually need a causal explanation of this phenomenon« (ibid., 337). Here, the signal has been received, but its significance remains uncertain.

Karina van Dalen-Oskam is similarly reticent in offering causal explanations for her computational findings, although her descriptions of them often highlight illuminating outliers. For example, after describing the finding that statistical models detect patterns in texts that allow them to attribute authorship – a situation in which we know the signal being produced is meaningful because we can corroborate its production – she acknowledges that »we cannot yet determine exactly how it works, because there are so many different variables involved. For the time being, however, most researchers assume that the differences can partly be traced back to the unique development of the language skills of each individual author« (2021, 21). Even when we know that the signal is an accurate representation of reality, how it got there requires interpretation, because we can’t say definitively why one author writes differently from another, and yet consistently with her past self. Van Dalen-Oskam’s interpretive approach is often to highlight cases that models do not predict correctly. In discussing predictive modeling work done with her student Andreas van Cranenburgh, she writes that »it is fascinating to see which titles were being misjudged [by the model]« (ibid., 265), and she suggests that models that predict genre could be productively compared to the genre labels assigned by the marketing divisions of publishing companies in order to »show where tension arises between marketing strategy and textual form and content« (ibid., 183). Here it is not the signal, but the texts that can’t be correctly ordered by the signal, that might produce the most meaning.

Moretti is much more willing to cross the interpretive gap that exists between the signal and the symbol. Unlike Underwood, Moretti usually proposes an a priori interpretation of his findings, namely that through readers’ selections, signals in quantitative data are produced through a sort of evolutionary competition of forms in which they »fight for the limited resources of the market« (2013, 103). »If there is a pattern in the data«, Moretti writes of the signal, »it’s because behind it there is a form which repeats itself over and over again« (2017, 6, emphasis original). Moretti’s evolutionary explanation has been critiqued elsewhere (see for example Prendergast 2005), and Moretti’s explanatory emphasis on readers’ choices self-consciously ignores the possible effects of »publishing, and distribution and their various appendices (reviewing, advertising, etc.)« (2013, 140 sq.).

Yet Moretti’s approach retains the form of a hermeneutics of suspicion, epitomized in his language which places meaning behind the signal, and in which the signal needs to be demystified in order to reveal the conflict of the literary marketplace. Moretti, however, acknowledges that his interpretations are generally just hypotheses because it is unclear that readers, who are supposedly selecting the features they prefer, are even able to see the variables that the models identify (the conceit, remember, is that we cannot see these repeated features without the leverage that computation affords us). Again, the relationship between the signal and the signal’s inscription is a »black box« (this is, in fact, the term Moretti himself uses [2013, 144]). Whereas Underwood suggests that this interpretive gap might be bridged by qualitative scholarship (2019, 139), Moretti suggests that an answer might lie in cognitive science, which could tell us how the many different components of texts are interpreted without our conscious knowledge by the human brain, and therefore how the signal gets into the noise in the first place (2013, 144).

In a study that correlates the setting of novels (future, present, or past) with whether they are bestselling or prize-winning, English interprets the signal first critically and then postcritically to arrive at two different (although not necessarily incompatible) meanings. After describing his results – that around 1980 historical fiction became frequently awarded by prize committees while becoming less frequent on bestseller lists – English states that: »it remains, of course, to explain this transformation« (2016, 411, emphasis original). He first explains it by engaging »Jameson’s causal argument about the cultural effects of neoliberalism since the 1970s« (ibid., 412). However, after butting up against the problem that ideology critique would read in any signal English would have found »the favored tropes of the late capitalist era as symptoms of false historical consciousness« (ibid.), he turns instead to Latourian Actor-Network theory by tracing reviews in prestigious journals and watershed prizes, building a network that ends up centering on the publication of Salman Rushdie’s Midnight’s Children. Here, English suggests that »the critical preference for novels set in the past may not have been imposed by grand, hidden forces but was constructed by a fairly small number of well-placed actors and their concrete interactions at the ground level« (ibid., 415). English’s self-conscious reinterpretation here highlights the necessity of interpreting the model’s statistical description of the world, while also revealing the agnosticism of computational literary sociology in terms of its allegiance to either a critical or postcritical interpretation of that description.

So’s Redlining Culture, which combines qualitative and computational approaches to make claims about race in the corpus of fiction published by Random House in the second half of the twentieth century, is instructive for its treatment of a model’s »noise«. So takes a large-scale view to establish that racial inequality exists in Random House’s publishing lists, describing this corpus »at a cognitive scale well beyond what a single person can observe or read« (2020, 6). The model that So develops is able to predict which books are bestsellers and which are by Black writers 93 percent of the time, which »means that bestselling novels emit a textual signal«, one that »we likely didn’t see before«, »that makes it different from other kinds of novels, novels by black authors in particular« (ibid., 124 sq.). This textual signal (or at least part of it) is adverbs: bestselling novels use them at a higher rate than novels by Black authors do. So interprets this accumulation of adverbs in bestselling novels as »a constant deferral of signification« that renders the voice of the American bestseller »an overthinking one« (ibid., 124). His reading here is of what the signal means for the texts, but what does it mean for the production of those texts? How did this adverbial signal enter the noise? Here is So’s interpretation:

Most likely the institutional and textual dynamic exist within a feedback loop. Each day, an editor reads dozens of submissions in search of the next blockbuster or bestselling novel. Our analysis says that this editor, consciously or unconsciously, does this work in part by noticing a set of textual features that are simultaneously very common in previous bestsellers and very uncommon in novels written by black writers (ibid., 125 sq.).

There is something like a political unconscious at work here that has inscribed individual historical action such that it is able to be read traditionally as dramatized in the overthinking voice of the bestseller, as well as newly in the meaning of the signal produced by So’s model.

So then, like van Dalen-Oskam, leverages a particular affordance of the data visualization, namely its ability to identify outliers (the noise), to show that Octavia Butler’s Parable of the Sower is incorrectly identified by the model as both a bestseller and a prizewinner even though it belongs in the other group of texts (novels by Black authors). So, in this case, gets close to the text to perform a reading in which the features (primarily adverbs and lack of racialized language) that the model usually attributes to bestsellers and prize winners rather than novels by Black authors are deployed by the text to subvert the unequal reality of the literary field. Although it may seem like the novel just »adopts the semantic and syntactical style of the bestseller or prizewinner«, So claims that this reading would be »an error in human interpretation« (ibid., 139). Rather, the novel »uses [these] syntactical features – […] otherwise marked by the machine as not distinctive of novels by black authors – precisely to underscore the presence of race in its story frame, albeit in a radically transformed manner« (ibid.). We can’t say how these features – invisible at the human scale, at least consciously – got into the text, but with the help of the machine we can identify them and then interpret them through close reading. So here moves dialectically between the distant and the close, between statistical models, readings of novels, and the archive to suggest interpretations of signals at different scales. He also leverages statistical models to center his attention on the »noise« that becomes a new sort of signal, allowing So to read against, rather than with, the pattern of inequality that he finds.

4 Conclusions

What is computational literary sociology’s stance towards the text? This question is complicated by the fact that the literary text itself often disappears – it is not one of the scales that computational methods usually take on. Instead, they engage in the creation of a signal, epitomized in the example of the data visualization, that connects the scale of the word with the scale of the corpus. The statistical hermeneutic that conditions the inscription of this signal is inherently paranoid, revealing the presence of meaning that was being emitted by human acts of selection all along. The signal/symbol becomes the new object of interpretation, either eliciting critique, in which the symbol’s meaning is demystified in order to reveal the invisible forces that inscribed it, or prompting surface reading, in which the correlation made visible by the signal is described in a way that draws new connections between previously unattached actors or variables. Despite recentering the text as the origin of the data on which it stakes its claims, computational literary sociology doubles down on the text as a material instantiation of the relationship between human agency and history.

The prevailing assumption that computational literary sociology is unproblematically aligned with postcritical attitudes toward the text misses, due to the fetish of the signal, first, that computational models are themselves never »objective« descriptions of the world but are mediated through a statistical hermeneutic perhaps as ideologically conditioned as critique itself, and, second, that this assumption itself engages in a type of technological determinism which supposes that these methods can’t be (productively) put to critical uses. Underwood aims to minimize the otherness of computational literary sociology for humanities scholars when he writes that the »twists« of humanistic research can be provided by the theories of critique in which signs prove to mean other than what we thought, or by statistical analysis that shows us that there was really a signal in the noise all along (2019, 166 sq.). When we read the signal in the noise as proving to mean other than what we thought, these two may even at times entirely overlap. For many practitioners of computational literary sociology, the answer to the question »what produces the signal in the noise?« often seems to be something like »the political unconscious«.

This is in no way to disparage either computational literary sociology or critique (or postcritique, for that matter), or to suggest that postcritical scholars who use computational methods are somehow mistaken. It is simply to suggest that the attention that has been paid to the interpretive gap between statistical models and history in critiques of biased/incomplete data collections, errors in digitization, and non-reproducible experiments (a gap that exists in a slightly different form in traditional literary studies as well) would be productively supplemented with closer attention to the nature of the gap itself, as well as to the diversity of complementary methods with which the results of these models can be interpreted. To return to the epigraph in which Wiener identifies mathematics as a metaphor, the sciences are always already ideological in the same way that our reading practices are (even, in my view, those that position themselves against ideological reading) – which ends up being all the more reason to use them in our pursuit of knowledge of ourselves and our cultural production.

Depending on how we read data visualizations – whether on the surface or behind it – we can produce an »assembling« of connections between previously disparate data points, or demystify the symbols that encode previously hidden relationships across long historical narratives. We can describe computational outputs, or we can critique them. But regardless of our reading method what makes these models meaningful is noise. In the hypothetical example of the model that becomes meaningless once its inputs are also selected by a model, not only do the noise and signal become one and the same, but surface reading and critique also become the same. A tracing of a network of actors based on the noise-less input/output of a machine is identical with a narrative uncovering its hidden motives (it has none). The tension, then, between description and critique is in the noise – the ineradicable instability of human selection – and its treatment. Access to and ability to think with this noise in a historical moment that inordinately values statistical optimization is perhaps the most valuable feature of computational approaches to literary sociology today.

References

Best, Stephen/Sharon Marcus, Surface Reading. An Introduction, Representations 108:1 (2009), 1–21, DOI: 10.1525/rep.2009.108.1.1.

Bode, Katherine, What’s the Matter with Computational Literary Studies?, Critical Inquiry 49:4 (2023), 507–529, DOI: 10.1086/724943.

Burks, Tosten, Carnivalesque. On the Ways We See Basketball, Los Angeles Review of Books 39 (2023), 55–59.

Crawford, Kate, Atlas of AI. Power, Politics, and the Planetary Costs of Artificial Intelligence, New Haven, CT 2021, DOI: 10.12987/9780300252392.

English, James F., Everywhere and Nowhere. The Sociology of Literature After »the Sociology of Literature«, New Literary History 41:2 (2010), v–xxiii, DOI: 10.1353/nlh.2010.0005.

English, James F., Now, Not Now. Counting Time in Contemporary Fiction Studies, Modern Language Quarterly 77:3 (2016), 395–418, DOI: 10.1215/00267929-3570667.

Felski, Rita, The Limits of Critique, Chicago, IL 2015, DOI: 10.7208/chicago/9780226294179.001.0001.

Gillespie, Tarleton, The Relevance of Algorithms, in: Tarleton Gillespie/Pablo J. Boczkowski/Kirsten A. Foot (eds.), Media Technologies. Essays on Communication, Materiality, and Society, Cambridge, MA 2014, 167–193, DOI: 10.7551/mitpress/9780262525374.003.0009.

Gould, Stephen Jay, The Mismeasure of Man, New York 1981.

Jameson, Fredric, The Political Unconscious, Ithaca, NY 1982.

Konstantinou, Lee, The Sociology of Literature Comes of Age. Two New Books Investigate How Capitalism and Culture Collide, The Chronicle of Higher Education, 23.10.2023, https://www.chronicle.com/article/the-sociology-of-literature-comes-of-age.

Latour, Bruno, Why Has Critique Run out of Steam? From Matters of Fact to Matters of Concern, Critical Inquiry 30:2 (2004), 225–248, DOI: 10.1086/421123.

Love, Heather, Close but Not Deep. Literary Ethics and the Descriptive Turn, New Literary History 41:2 (2010), 371–391, DOI: 10.1353/nlh.2010.0007.

Montgomery, Blake, The Amazon Publishing Juggernaut, The Atlantic, 08.08.2019.

Moretti, Franco, Graphs, Maps, Trees. Abstract Models for Literary History, London/New York 2007.

Moretti, Franco, Distant Reading, London 2013.

Moretti, Franco, Patterns and Interpretation, Literary Lab Pamphlet 15 (2017), https://litlab.stanford.edu/pamphlets/.

Murray, Simone, Between Impressions and Data. Negotiating Literary Value at the Humanities/Social Sciences Frontier, Australian Literary Studies 38:2 (2023), https://www.australianliterarystudies.com.au/articles/between-impressions-and-data-negotiating-literary-value-at-the-humanitiessocial-sciences-frontier, DOI: 10.20314/als.aa3f8c1d48.

Nelson, Laura K., Situated Knowledges and Partial Perspectives. A Framework for Radical Objectivity in Computational Social Science and Computational Humanities, New Literary History 53/54 (2022), 853–877, DOI: 10.1353/nlh.2022.a898331.

Piper, Andrew, There Will Be Numbers, Journal of Cultural Analytics 1:1 (2016), 1–10, DOI: 10.22148/16.006.

Piper, Andrew, Enumerations. Data and Literary Study, Chicago, IL/London 2018, DOI: 10.7208/chicago/9780226568898.001.0001.

Prendergast, Christopher, Evolution and Literary History, New Left Review 34 (2005), 40–62.

Ricœur, Paul, Freud and Philosophy. An Essay on Interpretation [1965], transl. by Denis Savage, New Haven, CT/London 1970.

Seaver, Nick, Computing Taste. Algorithms and the Makers of Music Recommendation, Chicago, IL 2022, DOI: 10.7208/chicago/9780226822969.001.0001.

Sedgwick, Eve Kosofsky, Paranoid Reading and Reparative Reading, or, You’re so Paranoid, You Probably Think This Essay is About You, in: E. K. S., Touching Feeling. Affect, Pedagogy, Performativity, Durham, NC 2003, 123–151, DOI: 10.1215/9780822384786-005.

Sinykin, Dan, Big Fiction. How Conglomeration Changed the Publishing Industry and American Literature, New York 2023, DOI: 10.7312/siny19294.

Smith, Robert Elliott, Rage Inside the Machine. The Prejudice of Algorithms, and How to Stop the Internet Making Bigots of Us All, London 2019.

So, Richard Jean, Redlining Culture. A Data History of Racial Inequality and Postwar Fiction, New York 2020, DOI: 10.7312/so--19772.

Underwood, Ted, Why Literary Time is Measured in Minutes, English Literary History 85:2 (2018), 341–365, DOI: 10.1353/elh.2018.0013.

Underwood, Ted, Distant Horizons. Digital Evidence and Literary Change, Chicago, IL 2019, DOI: 10.7208/chicago/9780226612973.001.0001.

Underwood, Ted/Jordan Sellers, The Longue Durée of Literary Prestige, Modern Language Quarterly 77:3 (2016), 321–344, DOI: 10.1215/00267929-3570634.

van Dalen-Oskam, Karina, The Riddle of Literary Quality. A Computational Approach, Amsterdam 2021.

Walsh, Melanie/Maria Antoniak, The Goodreads »Classics«. A Computational Study of Readers, Amazon, and Crowdsourced Amateur Criticism, Journal of Cultural Analytics 6:2 (2021), 243–287, DOI: 10.22148/001c.22221.

Wiener, Norbert, The Human Use of Human Beings. Cybernetics and Society [1950], London 1989.

Published Online: 2025-03-06
Published in Print: 2025-03-31

© 2025 the author(s), published by Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
