
The 400 million word Corpus of Historical American English (1810–2009)

  • Mark Davies
Publisher: John Benjamins Publishing Company
A chapter from the book English Historical Linguistics 2010

Abstract

The 400-million-word Corpus of Historical American English (1810–2009) provides researchers with an extremely robust set of data for Late Modern English. The corpus is composed of fiction, magazines, newspapers, and nonfiction books, and its genre balance stays roughly the same from decade to decade. Because of its size and its advanced architecture and interface, it allows researchers to examine an extremely wide range of changes, many of which could not be studied with a small corpus of 2–4 million words. These include lexical shifts (the frequency of any word or phrase by decade, and mass comparison of all words across periods), morphological shifts (via wildcards and pattern matching), syntactic shifts (made possible by highly accurate lemmatization and part-of-speech tagging), and semantic change (by comparing collocates over time, and through searches that draw on the integrated thesaurus and customized word lists).
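To make the listed query types concrete, here is a minimal Python sketch of what three of them compute: per-decade frequency normalized per million words, wildcard matching of word forms, and collocate counts compared across decades. It assumes a tiny hypothetical corpus of (decade, text) pairs; `TOY_CORPUS` and all function names are invented for illustration and are not the COHA interface. The naive tokenizer does no lemmatization or part-of-speech tagging, so it cannot reproduce the syntactic searches the abstract describes.

```python
import re
from collections import Counter, defaultdict
from fnmatch import fnmatchcase

# Hypothetical toy data: (decade, text) pairs standing in for corpus texts.
# This is NOT the COHA interface; it only illustrates the kind of query.
TOY_CORPUS = [
    (1850, "The telegraph was a new wonder. The telegraph office was busy."),
    (1850, "She sent a letter by post."),
    (1950, "He made a phone call. The phone rang again."),
    (1950, "A telegraph arrived late."),
]

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase and keep runs of letters only (no lemmas, no POS tags)."""
    return TOKEN_RE.findall(text.lower())


def per_million_by_decade(corpus, word):
    """Frequency of `word` per million tokens, for each decade."""
    hits, sizes = Counter(), Counter()
    for decade, text in corpus:
        tokens = tokenize(text)
        sizes[decade] += len(tokens)
        hits[decade] += tokens.count(word)
    # Normalizing by decade size makes decades of unequal size comparable.
    return {d: hits[d] / sizes[d] * 1_000_000 for d in sorted(sizes)}


def wildcard_matches_by_decade(corpus, pattern):
    """Count every word form matching a shell-style wildcard, per decade."""
    result = defaultdict(Counter)
    for decade, text in corpus:
        for tok in tokenize(text):
            if fnmatchcase(tok, pattern):
                result[decade][tok] += 1
    return result


def collocates_by_decade(corpus, node, span=2):
    """Count words within `span` tokens of `node`, separately per decade.

    Windows are not bounded by sentence ends, since punctuation is dropped.
    """
    result = defaultdict(Counter)
    for decade, text in corpus:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok != node:
                continue
            lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
            for j in range(lo, hi):
                if j != i:
                    result[decade][tokens[j]] += 1
    return result


if __name__ == "__main__":
    print(per_million_by_decade(TOY_CORPUS, "telegraph"))
    print(dict(wildcard_matches_by_decade(TOY_CORPUS, "p*")))
    print(dict(collocates_by_decade(TOY_CORPUS, "telegraph")))
```

Comparing the collocate counters of two decades is the crude analogue of tracking semantic change through collocates; a real study would also apply an association measure rather than raw counts.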

Downloaded on 2 January 2026 from https://www.degruyterbrill.com/document/doi/10.1075/cilt.325.11dav/html