Computationally Discriminating Literary from Non-Literary Texts
Max M. Louwerse, Nick Benesh and Bin Zhang
Abstract
This chapter presents three computational linguistic methods for discriminating literary from non-literary texts. In the first study, hierarchical clustering of results obtained from Latent Semantic Analysis grouped literary and non-literary texts into separate clusters. The second study used the frequencies of bigrams shared across the texts and classified literary versus non-literary texts with 100% accuracy. The third study used unigrams and classified 94% of the texts correctly. The final two studies, which used a larger sample of texts, showed that the high classification performance cannot be attributed to specific texts. These findings provide evidence that literature can be distinguished from non-literature with high accuracy using relatively simple computational linguistic techniques.
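To make the idea of "shared bigrams" concrete, the following is a minimal Python sketch of counting the bigrams two texts have in common. It is an illustration only: the abstract does not describe the chapter's tokenization, normalization, or classification procedure, so the whitespace tokenizer, the lowercasing, and the function names `bigram_counts` and `shared_bigram_count` are all assumptions rather than the authors' method.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs (assumed: lowercase, whitespace tokens)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def shared_bigram_count(text_a, text_b):
    """Sum, over bigrams present in both texts, the smaller of the two counts."""
    shared = bigram_counts(text_a) & bigram_counts(text_b)  # multiset intersection
    return sum(shared.values())


# Hypothetical toy inputs, not data from the chapter.
a = "the cat sat on the mat"
b = "the cat lay on the mat"
print(shared_bigram_count(a, b))
```

A pairwise score of this kind could feed a clustering or classification step, though how the chapter turns bigram frequencies into a literary versus non-literary decision is not stated here.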
Chapters in this book
- Prelim pages i
- Table of contents v
- Introduction ix
Part I. Theoretical and philosophical perspectives
- Studying literature and being empirical: A multifaceted conjunction 7
- Empirical research into the processing of free indirect discourse and the imperative of ecological validity 21
- Notes toward a new philology 35
- A theory of expressive reading 49
Part II. Psychology, foregrounding and literature
- Textual and extra-textual manipulations in the empirical study of literary response 75
- Foregrounding and feeling in response to narrative 89
- Two levels of foregrounding in literary narratives 103
- Narrative empathy and inter-group relations 113
- Effects of reading on knowledge, social abilities, and selfhood: Theory and empirical studies 127
- Imagining what could happen: Effects of taking the role of a character on social cognition 139
Part III. Computers and the humanities
- An automated text analysis: Willie Van Peer's academic contributions 161
- Computationally Discriminating Literary from Non-Literary Texts 175
- Metaphors and software-assisted cognitive stylistics 193
- Searching for style in modern American poetry 211
- The laws governing the history of poetry 229
- Consolidating empirical method in data-assisted stylistics: Towards a corpus-attested glossary of literary terms. 243
Part IV. REDES Project: The new generation
- Empirical evaluation: Towards an automated index of lexical variety 271
- Language allergy: Myth or reality 283
- Proper names in the translation of The Lord of the Rings 297
- Threat and geographical distance: the case of North Korea 309
- The Apology of Popular Fiction: Everyday Uses of Literature in Poland 317
- Afterword. A Matter of versifying: Tradition, innovation and the sonnet form in English 329
- About the contributors 343
- Index of authors 351
- Index of keywords 355