We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
Speakers often do not state requests directly but employ innuendos such as “Would you like to see my etchings?” Though such indirectness seems puzzlingly inefficient, it can be explained by a theory of the strategic speaker, who seeks plausible deniability when he or she is uncertain of whether the hearer is cooperative or antagonistic. A paradigm case is bribing a policeman who may be corrupt or honest: A veiled bribe may be accepted by the former and ignored by the latter. Everyday social interactions can have a similar payoff structure (with emotional rather than legal penalties) whenever a request is implicitly forbidden by the relational model holding between speaker and hearer (e.g., bribing an honest maitre d’, where the reciprocity of the bribe clashes with his authority). Even when a hearer’s willingness is known, indirect speech offers higher-order plausible deniability by preempting certainty, gossip, and common knowledge of the request. In supporting experiments, participants judged the intentions and reactions of characters in scenarios that involved fraught requests varying in politeness and directness.
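The payoff structure of the policeman example can be illustrated with a minimal sketch. This is not the authors' formal model: the cost values, the names `Costs`, `expected_payoffs`, and `best_strategy`, and the simplification that a corrupt officer always recognizes and accepts a veiled bribe while an honest officer always ignores it are assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    """Hypothetical costs to the speaker (illustrative values, not from the source)."""
    ticket: float = 100.0   # paying the fine
    bribe: float = 50.0     # paying off a corrupt officer
    arrest: float = 1000.0  # being arrested for an overt bribe


def expected_payoffs(p_corrupt: float, costs: Costs = Costs()) -> dict:
    """Expected payoff of each speech strategy, given P(officer is corrupt)."""
    p_honest = 1.0 - p_corrupt
    return {
        # Say nothing: the speaker always pays the ticket.
        "none": -costs.ticket,
        # Overt bribe: accepted if corrupt, arrest if honest.
        "direct": p_corrupt * -costs.bribe + p_honest * -costs.arrest,
        # Veiled bribe: accepted if corrupt, ignored (ticket only) if honest.
        "veiled": p_corrupt * -costs.bribe + p_honest * -costs.ticket,
    }


def best_strategy(p_corrupt: float, costs: Costs = Costs()) -> str:
    """Return the strategy with the highest expected payoff."""
    payoffs = expected_payoffs(p_corrupt, costs)
    return max(payoffs, key=payoffs.get)
```

With these illustrative numbers and `p_corrupt = 0.5`, the veiled bribe has the highest expected payoff (-75, against -100 for saying nothing and -525 for an overt bribe). Under the stated assumptions, the veiled bribe is never worse than either alternative whenever the bribe costs less than the ticket and the ticket costs less than arrest, which is the sense in which deniability pays when the hearer's type is uncertain.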
Words, grammar, and phonology are linguistically distinct, yet their neural substrates are difficult to distinguish in macroscopic brain regions. We investigated whether they can be separated in time and space at the circuit level using intracranial electrophysiology (ICE), namely by recording local field potentials from populations of neurons using electrodes implanted in language-related brain regions while people read words verbatim or grammatically inflected them (present/past or singular/plural). Neighboring probes within Broca’s area revealed distinct neuronal activity for lexical (~200 milliseconds), grammatical (~320 milliseconds), and phonological (~450 milliseconds) processing, identically for nouns and verbs, in a region activated in the same patients and task in functional magnetic resonance imaging. This suggests that a linguistic processing sequence predicted on computational grounds is implemented in the brain in fine-grained spatiotemporally patterned activity.