The 400 million word Corpus of Historical American English (1810–2009)
Mark Davies
Abstract
The 400 million word Corpus of Historical American English (1810–2009) provides researchers with an extremely robust set of data for Late Modern English. The corpus is composed of fiction, magazines, newspapers, and nonfiction books, and its genre balance stays roughly the same from decade to decade. Because of its size and its advanced architecture and interface, it allows researchers to look at an extremely wide range of changes, many of which could not be studied with a small 2–4 million word corpus. These include lexical changes (via the frequency of any word or phrase by decade and mass comparison of all words in different periods), morphological shifts (via wildcards and pattern matching), syntactic shifts (made searchable by very accurate lemmatization and part-of-speech tagging), and semantic change (by comparing collocates over time, as well as searches that use data from the integrated thesaurus and customized word lists).
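As a rough illustration of what "frequency of any word or phrase by decade" involves, the following minimal Python sketch counts a word per decade and normalizes it to a rate per million tokens, so that decades of different sizes can be compared. It is not the corpus's own interface or architecture: the function names, the whitespace tokenization, and the toy sample texts are all assumptions made for this example.

```python
from collections import Counter


def decade_of(year: int) -> int:
    """Map a year to the first year of its decade, e.g. 1815 -> 1810."""
    return year - year % 10


def per_million_by_decade(docs, target):
    """Return {decade: occurrences of `target` per million tokens}.

    `docs` is an iterable of (year, text) pairs. Tokenization here is a
    naive lowercase whitespace split (an assumption for illustration).
    """
    hits = Counter()    # occurrences of the target word per decade
    totals = Counter()  # total token count per decade
    target = target.lower()
    for year, text in docs:
        tokens = text.lower().split()
        decade = decade_of(year)
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(target)
    return {
        d: hits[d] * 1_000_000 / totals[d]
        for d in sorted(totals)
        if totals[d]
    }


# Toy data, invented for the example (not taken from the corpus).
sample = [
    (1815, "the stage coach was late"),
    (1818, "a coach and a coach house"),
    (1995, "the coach praised the team"),
]
freqs = per_million_by_decade(sample, "coach")
for decade, rate in freqs.items():
    print(decade, round(rate, 1))
```

Normalizing by the decade's total token count is what makes the per-decade figures comparable; raw counts alone would mostly reflect how much text each decade contains.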
Chapters in this book
- Prelim pages i
- Table of contents v
- Foreword & Acknowledgements vii
- Introduction 1
- Norse influence on English in the light of general contact linguistics 15
- The Germanic roots of the Old English sound system 43
- Monetary policy and Old English dialects 73
- The order and schedule of nominal plural formation transfer in three Southern dialects of Early Middle English 95
- The temporal and regional contexts of the numeral ‘two’ in Middle English 115
- Grammaticalisation, contact and corpora 131
- Discourse organization and the rise of final then in the history of English 153
- The origins of how come and what…for 177
- “Providing/provided that” 197
- Prefer 215
- The 400 million word Corpus of Historical American English (1810–2009) 231
- Gender change from Old to Middle English 263
- “Please tilt me-ward by return of post” 289
- Multilingualism in the vocabulary of dress and textiles in late medieval Britain 313
- “No man entreth in or out” 327
- Beyond questions and answers 349
- The demise of gog and cock and their phraseologies in dramatic discourse 369
- Index 383