
29 Word classes and corpus linguistics

Lorenzo Gregori, Walter Paci and Massimo Moneglia
This chapter is in the book Manual of Romance Word Classes

Abstract

This chapter discusses the relevance of part-of-speech (PoS) tagsets for disambiguating word occurrences and presents the main computational linguistics (CL) standards for annotating word classes in Romance corpora. We highlight the availability of corpora and CL tools for studying the quantitative distribution of PoS in language usage, demonstrating the feasibility of this perspective for Romance languages. We found that, in both the written and the spoken variety, open-class words are similarly distributed in Italian, Spanish, Portuguese, and French, while their quantitative variation depends mainly on linguistic register. Moreover, quantitative trends in the relative frequency of PoS were found in the open-class lexicon of both varieties. We also examine how contextual needs influence the relative frequency of word classes. Lastly, we discuss some linguistic phenomena observed in spoken corpora, showing that PoS-tagging algorithms still face open challenges.
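
The quantitative perspective described above rests on counting PoS tags in an annotated corpus. The following is a minimal sketch, not the authors' procedure: it assumes a corpus in CoNLL-U format tagged with the Universal Dependencies UPOS tagset (the chapter surveys several annotation standards and may rely on others), and it computes the share of each open-class tag within the open-class total. The function names and the toy Italian sentence are hypothetical illustrations, not data or tools from the chapter.

```python
from collections import Counter

# Open-class UPOS tags as grouped in the Universal Dependencies guidelines.
# Assumption: the corpus uses UD UPOS; other tagsets need a different set.
OPEN_CLASS = {"ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"}


def conllu_pairs(lines):
    """Yield (FORM, UPOS) pairs from CoNLL-U lines.

    Skips comments, blank lines, multiword-token ranges (IDs like "4-5")
    and empty nodes (IDs like "8.1"), so each syntactic word counts once.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        yield cols[1], cols[3]


def open_class_distribution(tagged_tokens):
    """Return the relative frequency of each open-class tag.

    Frequencies are relative to the open-class total, not to all tokens.
    """
    counts = Counter(upos for _, upos in tagged_tokens if upos in OPEN_CLASS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical toy sentence: "Il gatto dorme sul divano."
# Columns: ID, FORM, LEMMA, UPOS; the remaining six CoNLL-U fields are "_".
rows = [
    ["1", "Il", "il", "DET"],
    ["2", "gatto", "gatto", "NOUN"],
    ["3", "dorme", "dormire", "VERB"],
    ["4-5", "sul", "_", "_"],  # multiword token, split into su + il
    ["4", "su", "su", "ADP"],
    ["5", "il", "il", "DET"],
    ["6", "divano", "divano", "NOUN"],
    ["7", ".", ".", "PUNCT"],
]
lines = ["# text = Il gatto dorme sul divano."]
lines += ["\t".join(row + ["_"] * 6) for row in rows]

dist = open_class_distribution(conllu_pairs(lines))
print(dist)  # NOUN about 0.67, VERB about 0.33
```

Normalizing by the open-class total is one choice among several; dividing by the full token count instead would also fold the share of function words and punctuation into the figures, which matters when comparing registers.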

Downloaded on 9.12.2025 from https://www.degruyterbrill.com/document/doi/10.1515/9783110746389-030/html