We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
Speakers often do not state requests directly but employ innuendos such as ‘Would you like to see my etchings?’ Though such indirectness seems puzzlingly inefficient, it can be explained by a theory of the strategic speaker, who seeks plausible deniability when he or she is uncertain of whether the hearer is cooperative or antagonistic. A paradigm case is bribing a policeman who may be corrupt or honest: A veiled bribe may be accepted by the former and ignored by the latter. Everyday social interactions can have a similar payoff structure (with emotional rather than legal penalties) whenever a request is implicitly forbidden by the relational model holding between speaker and hearer (e.g., bribing an honest maître d’, where the reciprocity of the bribe clashes with his authority). Even when a hearer’s willingness is known, indirect speech offers higher-order plausible deniability by preempting certainty, gossip, and common knowledge of the request. In supporting experiments, participants judged the intentions and reactions of characters in scenarios that involved fraught requests varying in politeness and directness.
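The payoff structure of the bribery case can be illustrated with a minimal sketch. It is not the model from the paper: the function names, the three strategies, and all cost values are hypothetical, and it assumes for simplicity that a corrupt officer always recognizes and accepts a veiled bribe while an honest officer always ignores it and simply issues the ticket.

```python
def expected_payoffs(p_corrupt, ticket=-100.0, bribe=-40.0, arrest=-1000.0):
    """Expected payoff to the speaker for each strategy.

    p_corrupt: speaker's estimated probability that the officer is corrupt.
    ticket, bribe, arrest: hypothetical costs (negative payoffs), chosen only
    for illustration.
    """
    p_honest = 1.0 - p_corrupt
    return {
        # Saying nothing: the speaker always pays the ticket.
        "silent": ticket,
        # Direct bribe: a corrupt officer accepts, an honest one arrests.
        "direct": p_corrupt * bribe + p_honest * arrest,
        # Veiled bribe: a corrupt officer accepts, an honest one ignores it
        # (assumed), so the speaker just pays the ticket.
        "veiled": p_corrupt * bribe + p_honest * ticket,
    }


def best_strategy(p_corrupt, **costs):
    """Return the strategy with the highest expected payoff."""
    payoffs = expected_payoffs(p_corrupt, **costs)
    return max(payoffs, key=payoffs.get)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.9):
        print(p, expected_payoffs(p), best_strategy(p))
```

Under these assumptions the veiled bribe never does worse than the direct bribe as long as arrest is costlier than the ticket, and never does worse than silence as long as the bribe is cheaper than the ticket, which is the sense in which indirectness pays when the hearer's type is uncertain.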
Words, grammar, and phonology are linguistically distinct, yet their neural substrates are difficult to distinguish in macroscopic brain regions. We investigated whether they can be separated in time and space at the circuit level using intracranial electrophysiology (ICE), namely by recording local field potentials from populations of neurons using electrodes implanted in language-related brain regions while people read words verbatim or grammatically inflected them (present/past or singular/plural). Neighboring probes within Broca’s area revealed distinct neuronal activity for lexical (~200 milliseconds), grammatical (~320 milliseconds), and phonological (~450 milliseconds) processing, identically for nouns and verbs, in a region activated in the same patients and task in functional magnetic resonance imaging. This suggests that a linguistic processing sequence predicted on computational grounds is implemented in the brain in fine-grained spatiotemporally patterned activity.
Why do compounds containing regular plurals, such as ‘rats-infested,’ sound so much worse than corresponding compounds containing irregular plurals, such as ‘mice-infested’? Berent and Pinker (2007) reported five experiments showing that this theoretically important effect hinges on the morphological structure of the plurals, not their phonological properties, as had been claimed by Haskell, MacDonald, and Seidenberg (2003). In this note we reply to a critique by these authors. We show that the connectionist model they invoke to explain the data has nothing to do with compounding but exploits fortuitous properties of adjectives, and that our experimental results disconfirm explicit predictions the authors had made. We also present new analyses which answer the authors’ methodological objections. We conclude that the interaction of compounding with regularity is a robust effect, unconfounded with phonology or semantics.