23. Combined statistical and grammatical criteria for the retrieval of phraseological units in an electronic corpus
José-Manuel Pazos Bretaña
Abstract
The aim of this study is to refine and optimise the mainly statistical and distributional approach to the automatic extraction of phraseological units (PUs) from text corpora by introducing minimal linguistic elements (lemmatisation and grammatical tagging). These operations were first tested on the same corpora as in our previous research (Pamies & Pazos 2003, 2004). This provided us with a new set of results, which we compared with the previous ones. We found that the detection ability had improved substantially, especially for verb + noun and verb + adjective collocations. The methodology was then applied to a larger corpus. Again, the results were encouraging, with phraseological densities of up to 64.5% for the verb + noun category.
Chapters in this book
- Prelim pages
- Table of contents
- List of contributors
- Acknowledgements
- Preface
- Introduction: The many faces of phraseology
Part I. Phraseology: theory, typology and terminology
- 1. Phraseology and linguistic theory: A brief survey
- 2. Disentangling the phraseological web
- 3. A unified approach to semantic frames and collocational patterns
- 4. Processing of idioms and idiom modifications: A view from cognitive linguistics
- 5. A very complex criterion of fixedness: Non-compositionality
- 6. Reassessing the canon: 'Fixed' phrases in general reference corpora
Part II. Corpus-based analyses of phraseological units
- 7. Adjective + Noun sequences in attributive or NP-final positions: Observations on lexicalization
- 8. Phrasal similes in the BNC
- 9. Foot and Mouth: The phrasal patterns of two frequent nouns
- 10. The Good Lord and his works: A corpus-driven study of collocational resonance
- 11. Fixed expressions, extenders and metonymy in the speech of people with Alzheimer's disease
Part III. Phraseology across languages and cultures
- 12. Cross-linguistic phraseological studies: An overview
- 13. Figurative phraseology and culture
- 14. Critical observations on the culture-boundness of phraseology
- 15. Phraseology in a European framework: A cross-linguistic and cross-cultural research project on widespread idioms
- 16. Free and bound prepositions in a contrastive perspective. The case of with and avec
- 17. Contrastive idiom analysis: The case of Japanese and English idioms of anger
- 18. Automatic extraction of translation equivalents of phrasal and light verbs in English and Russian
Part IV. Phraseology in lexicography and natural language processing
- 19. Dictionaries and collocation
- 20. Computational phraseology: An overview
- 21. A computational lexicography approach to phraseologisms
- 22. Extracting specialized collocations using lexical functions
- 23. Combined statistical and grammatical criteria for the retrieval of phraseological units in an electronic corpus
Envoi
- The phrase, the whole phrase and nothing but the phrase
- Author index
- Subject index