29 Word classes and corpus linguistics
Lorenzo Gregori, Walter Paci and Massimo Moneglia
Abstract
This chapter discusses the relevance of part-of-speech (PoS) tagsets for disambiguating word occurrences and presents the main computational linguistics (CL) standards for annotating word classes in Romance corpora. We highlight the availability of corpora and CL tools for studying the quantitative distribution of PoS in language usage, demonstrating the feasibility of this perspective for Romance languages. In both written and spoken varieties, open-class words turn out to be similarly distributed in Italian, Spanish, Portuguese, and French, while their quantitative variation depends mainly on the linguistic register. Moreover, quantitative trends in the relative frequency of PoS were found in the open-class lexicon for both language varieties. We also examine how contextual needs influence the relative frequency of word classes. Lastly, we discuss some linguistic phenomena observed in spoken corpora, showing that PoS-tagging algorithms still face open challenges.
Chapters in this book
- Frontmatter I
- Manuals of Romance Linguistics V
- Table of Contents VII
- Abbreviations XI
- Introduction 1
I. Romance word classes: theoretical and historical foundations
- 1 Theoretical foundation for a classification of words 13
- 2 How to classify words 41
- 3 Word classes in the history of Western grammar 69
- 4 Parts of speech in the Romance grammars of the Renaissance 97
II. Word classes in the major Romance languages
- 5 Nouns 117
- 6 Adjectives 147
- 7 Determiners 177
- 8 Pronouns 207
- 9 Quantifiers 237
- 10 Negation and negative expressions 265
- 11 Verb classes 297
- 12 Auxiliary verbs 325
- 13 Grammatical categories of the verb 345
- 14 Verbal categories expressing syntactic dependencies 367
- 15 Adverbs 401
- 16 Focalisers 431
- 17 Modal particles 449
- 18 Prepositions 471
- 19 Conjunctions 499
III. Word classes in smaller Romance varieties
- 20 Word classes in Occitan 527
- 21 Word classes in Sardinian 549
- 22 Word classes in Romansh 575
- 23 Word classes in Ladin 607
- 24 Word classes in Northern Italian dialects 633
- 25 Word classes in Southern Italian dialects 661
- 26 Word classes in Romance-related Creoles 689
IV. Romance word classes and their interfaces: new horizons
- 27 Word classes and psycholinguistics 725
- 28 Word classes and learner varieties 743
- 29 Word classes and corpus linguistics 769
- 30 Word classes and neurolinguistics 797
- 31 Exploring the behaviour of connectives within a textometric perspective 819
- Index 843