John Benjamins Publishing Company

Chapter

Collocations and statistical analysis of n-grams

Multiword expressions in newspaper text

Abstract

Multiword expressions (MWEs) are sequences of words that co-occur so often that they are perceived as a single linguistic unit. Since MWEs pervade natural language, their identification is pertinent for a range of tasks within lexicography, terminology and language technology. We apply various statistical association measures (AMs) to word sequences from the Norwegian Newspaper Corpus (NNC) in order to rank two- and three-word sequences (bigrams and trigrams) by their tendency to co-occur. The results show that some statistical measures favour relatively frequent MWEs (e.g. i motsetning til ‘as opposed to’), whereas other measures favour relatively low-frequency units, which typically comprise loan words (de facto), technical terms (notarius publicus) and phrasal anglicisms (practical jokes; cf. G. Andersen, this volume). On this basis we evaluate the relevance of each of these measures for lexicography, terminology and language technology purposes.
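
To make the contrast between measures concrete, the following is a minimal sketch of bigram ranking with two common association measures. The abstract does not say which AMs the chapter uses, so the choice of pointwise mutual information (PMI) and the log-likelihood ratio (G2) is an assumption made here for illustration only. The function name `association_scores` and the toy token list are hypothetical and are not drawn from the NNC.

```python
import math
from collections import Counter

def association_scores(tokens):
    """Score every adjacent word pair by frequency, PMI and log-likelihood (G2).

    Illustrative only: the measures are assumed, not taken from the chapter.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    first, second = Counter(), Counter()
    for (w1, w2), freq in bigrams.items():
        first[w1] += freq    # how often w1 fills the first slot
        second[w2] += freq   # how often w2 fills the second slot
    n = sum(bigrams.values())  # total number of bigram tokens

    scores = {}
    for (w1, w2), o11 in bigrams.items():
        r1, c1 = first[w1], second[w2]
        # 2x2 contingency table: observed and expected cell counts
        observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
        expected = [r1 * c1 / n, r1 * (n - c1) / n,
                    (n - r1) * c1 / n, (n - r1) * (n - c1) / n]
        pmi = math.log2(o11 / expected[0])
        g2 = 2 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)
        scores[(w1, w2)] = {"freq": o11, "pmi": pmi, "g2": g2}
    return scores

# Toy data (hypothetical): a frequent phrase, a rare loan phrase, and filler.
tokens = (["i", "motsetning", "til", "oss"] * 30
          + ["de", "facto", "i", "dag"] * 2
          + ["til", "oss", "i", "dag"] * 20)
scores = association_scores(tokens)

top_pmi = max(scores, key=lambda pair: scores[pair]["pmi"])
top_g2 = max(scores, key=lambda pair: scores[pair]["g2"])
print("highest PMI:", top_pmi, scores[top_pmi])
print("highest G2: ", top_g2, scores[top_g2])
```

On this toy input, PMI puts the rare pair (de, facto) first, because both words occur only with each other, while G2 ranks frequent pairs such as (motsetning, til) well above it. This mirrors the general tendency described in the abstract, but it says nothing about the chapter's actual results.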

Chapters in this book

Prelim pages i
Table of contents v
Building a large corpus based on newspapers from the web 1
Part I. Exploiting the web as a corpus – Methods and tools
Corpuscle – a new corpus management platform for annotated corpora 31
OBT+stat 51
Exploring corpora through syntactic annotation 67
Collocations and statistical analysis of n-grams 79
Automatic topic classification of a large newspaper corpus 111
A data-driven approach to anglicism identification in Norwegian 131
Part II. Corpus-based case studies
A corpus-based study of the adaptation of English import words in Norwegian 157
Norm clusters in written Norwegian 193
Lexical neography in modern Norwegian 221
Ash compound frenzy 241
Financial jargon in a general newspaper corpus 257
Metonymic extension and vagueness 285
Spatial metaphors in present-day Norwegian newspaper language 307
Doing historical linguistics using contemporary data 331
Name index 351
Subject index 353

Exploring Newspaper Language

This chapter is in the book Exploring Newspaper Language

https://doi.org/10.1075/scl.49.05lys
