by Lillian Lu
Published by Simon & Schuster, 2017 | 288 pages
What can we understand about literature through numbers? Quite a bit, it turns out. In 1963, for example, statisticians Frederick Mosteller and David Wallace settled a long-standing historical debate, using statistical techniques to attribute 12 disputed essays from The Federalist Papers to James Madison rather than Alexander Hamilton (who also claimed to have written them). With a convincing analysis of the two writers’ go-to connective phrases, Mosteller and Wallace solved a literary problem simply by counting. More recently, a growing body of literary criticism has deployed statistical methods, mapping tools, and innovative graphics to reveal patterns in literature.
These methods, which literary critic Franco Moretti dubbed “distant reading,” are not universally celebrated. Quantitative approaches tend to ruffle critics who prize “close reading,” or intensive analysis of small selections of text, a practice that remains central to literary studies. Imaginative attention to the particularities of a literary text falls by the wayside when we zoom out to look at broader, statistical patterns. Literature’s transformative powers get swallowed by numbers. Or so the argument goes.
This argument isn’t entirely wrong. But accepting it wholesale would be a great loss for anyone who cares about literature. Distant reading has been around for a while, consistently producing results that supplement our understanding of literature. In 1851, mathematician Augustus De Morgan proposed checking authorship of the Epistles by comparing average word length across writings attributed to Paul. Since Moretti coined the term two decades ago, distant reading has flourished, with fascinating conclusions. Researchers at Stanford Literary Lab have mapped literary London in the nineteenth century, suggested that typical paragraphs have between two and four topics, and analyzed the movements of literary genre through time. Other quantitative analyses have considered paragraph length, mapped the characters in Hamlet, and investigated the relative proximity of particular emotions.
Such findings might seem over-specialized for a general audience. By contrast, Ben Blatt’s eminently accessible Nabokov’s Favorite Word Is Mauve offers a series of introductory examples of quantitative readings of literature. Each of the book’s nine chapters analyzes a different question using mathematical tools. Blatt’s questions include whether male writers represent women as often as female writers represent men (they don’t), what linguistic differences separate British and American writers (American writers of British fan fiction over-use words like “bloke” and “brilliant”), and what we can know about a book by its cover (the publishing clout of the author, among other things). Each of these results suggests that simple quantification produces interesting insights worthy of critical attention.
At his best, Blatt asks questions that seem too easy to be worth pursuing, then brings out their nuances through easy-to-follow computations and colorful graphics. In Blatt’s first and most straightforward chapter, for example, he investigates the effects of adverb usage on the quality and popularity of prose, comparing percentages of adverbs ending in “ly” in a variety of contexts. His conclusion is predictable: following Hemingway, Blatt argues against the use of most “ly” adverbs. But this result is interwoven with fascinating breakdowns that address the question from new angles. Blatt compares usage of adverbs by fan fiction writers to that of professional writers, with the striking result that the differences among professional writers’ adverb usage pale in comparison to the massive uptick in adverb usage among nonprofessional writing communities. Blatt’s forays within a single author’s works are even more illuminating: he ranks the novels of writers like Faulkner by adverb usage, and these rankings mirror many critics’ rankings of the novels with stunning accuracy. Adverb usage, in Blatt’s framework, looks like a decent proxy for editing: that is, for attentive rereading and rewriting, which tend to eliminate inessential adverbs. This result is both exciting and new.
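For readers curious what such a count involves, here is a minimal sketch in Python. It is not Blatt’s actual procedure, which this review does not detail; it assumes a crude suffix test, so words like “family” or “only” are miscounted as adverbs, and the function names are invented for illustration.

```python
import re


def ly_adverb_rate(text):
    """Return the percentage of words in `text` that end in "ly".

    Assumption: a bare suffix test stands in for real part-of-speech
    tagging, so non-adverbs such as "family" are counted too.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return 0.0
    # Skip very short words such as "fly" or "ply".
    ly_words = [w for w in words if w.endswith("ly") and len(w) > 3]
    return 100.0 * len(ly_words) / len(words)


def rank_by_ly_rate(books):
    """Order titles from lowest to highest "ly" rate.

    `books` is a hypothetical mapping of title -> full text.
    """
    return sorted(books, key=lambda title: ly_adverb_rate(books[title]))
```

A ranking produced this way could then be set beside a critical ranking of the same novels, which is the kind of comparison described above.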
While critiques of distant reading can be over-conservative, the quantitative study of literature is plagued by real methodological issues, from which Blatt’s work is by no means immune. Confining readerly attention exclusively to counts of words and punctuation becomes a strain by book’s end. Blatt scans novels for cliffhangers, for example, by searching for chapters concluding with question marks and exclamation points. It’s true that the term “cliffhanger” was coined to describe endings that leave readers excited or with unanswered questions. But Thomas Hardy needed neither form of punctuation when he inspired the term with A Pair of Blue Eyes in 1873. “Knight felt himself in the presence of a personalized loneliness,” concludes Hardy’s installment, leaving the novel’s hero, Henry Knight, dangling off a cliff. Blatt’s chosen sample, the Hardy Boys series, unsurprisingly behaves much better: its chapters are riddled with exclamations and surprises, and series writers increasingly tended to end them with punctuation to match.
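The limits of the punctuation test are easy to see in a small sketch. The Python below is an illustration of the heuristic as described here, not Blatt’s own code; the function names are hypothetical, and the only rule it applies is whether a chapter’s last sentence ends in “?” or “!”.

```python
def ends_on_punctuation_cue(chapter):
    """True if the chapter's final mark is "?" or "!".

    Trailing whitespace and closing quotation marks are ignored so that
    a line of dialogue such as “Look out!” still counts.
    """
    stripped = chapter.rstrip().rstrip("\"'”’)")
    return stripped.endswith(("?", "!"))


def cue_share(chapters):
    """Fraction of chapters flagged by the punctuation heuristic."""
    if not chapters:
        return 0.0
    return sum(ends_on_punctuation_cue(c) for c in chapters) / len(chapters)


# Hardy's installment ending, quoted above, closes on a period,
# so the heuristic does not flag it.
hardy = "Knight felt himself in the presence of a personalized loneliness."
print(ends_on_punctuation_cue(hardy))
```

A chapter that ends on a quiet declarative sentence, however suspenseful, passes through this filter unnoticed.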
As this example suggests, literary critics are far from theorizing how distant and close reading should rely on one another and, in particular, how we can best enlist statistical results to understand our lived experiences as readers. As Moretti puts it of his character mapping of Hamlet, “the point, of course, was not to present Hamlet’s centrality as a surprise; it was exactly the opposite: had the new approach not found Hamlet at the center of the play, its plausibility would have disintegrated.” From Moretti to Blatt, digital humanists tend to ask questions whose answers they know in order to demonstrate the reliability of their methods. But if we ask computers literary questions whose answers we think we know, what use is distant reading? And where does our trust fall when distant readings contradict our expectations? These puzzles are familiar; they date back to questions raised during the earliest studies of probability, when the math of supposed rationality delivered some head-scratching results, and forward to current conversations surrounding algorithmic trading. Like probability mathematics and algorithmic trading, distant reading is here to stay as a tool in the literary-critical toolbox. But our distrust of the results will remain. That nagging distrust makes digital humanism frustrating—but also generative. Its results sharply defamiliarize literary texts, rendering them newly strange, newly in need of close and careful rereading.
Margaret Kolb is a lecturer in the English department at the University of California, Berkeley, where she is completing a book manuscript tracing the near-simultaneous rises of the novel and probability mathematics.