Chapter 7. Automatic text classification of disciplinary texts
René Venegas
Abstract
This study classifies the academic texts of the PUCV-2006 Corpus of Spanish using two automatic classification methods, Multinomial Naive Bayes and Support Vector Machine, and compares their performance. Both methods rely on lexical-semantic content words shared across the academic texts in the corpus. Each identifies a small group of shared words that, weighted statistically, help assign a new text to one of the four disciplinary areas represented in the corpus. The results show that Support Vector Machine classifies academic texts efficiently. With this method, the disciplinary domain of an academic text could be identified automatically from a reduced number of shared content lexemes, with high performance even in highly refined disciplines such as Psychology and Social Work.
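As a rough illustration of the first of the two methods named in the abstract, the sketch below implements a minimal Multinomial Naive Bayes classifier over content lexemes. It is not the chapter's implementation: the toy documents, the lexemes, the smoothing value `alpha`, and the function names `train_mnb` and `classify` are all assumptions made for this example, and only two of the four disciplinary areas are shown. Support Vector Machine is not covered here.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenised documents.

    docs   -- list of token lists (content lexemes only)
    labels -- list of class names, one per document
    alpha  -- additive (Laplace) smoothing constant; an assumed value
    """
    vocab = {w for doc in docs for w in doc}
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    model = {}
    for label, n_docs in docs_per_class.items():
        total = sum(word_counts[label].values())
        denom = total + alpha * len(vocab)
        log_prior = math.log(n_docs / len(docs))
        # Smoothed log-probability of each vocabulary word given the class.
        log_like = {
            w: math.log((word_counts[label][w] + alpha) / denom) for w in vocab
        }
        model[label] = (log_prior, log_like)
    return model


def classify(model, doc):
    """Return the class with the highest log-posterior for a token list."""
    def score(label):
        log_prior, log_like = model[label]
        # Words never seen in training are ignored.
        return log_prior + sum(log_like[w] for w in doc if w in log_like)
    return max(model, key=score)


# Invented toy data, for illustration only (not taken from PUCV-2006).
docs = [
    ["conducta", "terapia", "paciente"],
    ["terapia", "cognitivo", "conducta"],
    ["comunidad", "familia", "intervención"],
    ["familia", "política", "comunidad"],
]
labels = ["psychology", "psychology", "social_work", "social_work"]

model = train_mnb(docs, labels)
print(classify(model, ["terapia", "paciente"]))
print(classify(model, ["familia", "comunidad"]))
```

In this sketch the per-class word weights are smoothed log-probabilities, so a new text is assigned to the class whose shared lexemes give it the highest total weight.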
Chapters in this book
- Prelim pages i
- Table of contents v
- Foreword ix
- About the authors xi
- Introduction 1
- Acknowledgments 5
- Chapter 1. Discourse genres, academic and professional discourses 7
- Chapter 2. Written discourse genres 17
- Chapter 3. Discourse genres in the PUCV-2006 Academic and Professional Corpus of Spanish 37
- Chapter 4. Academic and professional genres 65
- Chapter 5. University academic genres 83
- Chapter 6. Multi-dimensional analysis of an academic corpus in Spanish 101
- Chapter 7. Automatic text classification of disciplinary texts 121
- Chapter 8. Rhetorical organisation of Textbooks 143
- Chapter 9. The Textbook genre and its rhetorical organisation in four scientific disciplines 171
- Chapter 10. The Disciplinary Text genre as a means for accessing disciplinary knowledge 189
- Chapter 11. Academic discourse comprehension in Spanish and English 213
- Chapter 12. Corollary 233
- References 239
- Index 253