Quantitative analysis of bibliographic corpora: Statistical features, semantic profiles, word spectra

Adam Pawłowski; Krzysztof Topolski; Elżbieta Herden

Published by John Benjamins Publishing Company

This chapter is in the book Language and Text

Abstract

This chapter analyzes a bibliographic corpus built from the Polish national bibliography for the period 1801–2019. The research allowed us to identify and compare quantitative characteristics of the bibliographic corpus and of a reference corpus of the general language. The two corpora were shown to differ significantly, in particular in the shares of individual parts of speech and in the frequency distributions of lexemes. The statistical distributions of word spectra were also studied. The best fits were obtained for the generalized inverse Gauss-Poisson (GIGP) and Zipf-Mandelbrot distributions, and the parameters of both distributions also differed between the bibliographic and the reference corpus. Apart from quantitative linguistics, the most promising directions for future research on bibliographic corpora are semantic analysis and text mining.
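The abstract reports fits of the Zipf-Mandelbrot distribution to word spectra, but it does not describe the estimation procedure. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' method. The word spectrum is taken to be V(m), the number of word types occurring exactly m times. The Zipf-Mandelbrot model is assumed in its right-truncated form P(x) = (x + b)^(-a) / Σ_{j=1..n} (j + b)^(-a), with n set to the largest observed frequency. The parameters a and b are estimated by a crude maximum-likelihood grid search. All function names are hypothetical, and the GIGP model is not covered.

```python
from collections import Counter
import math


def frequency_spectrum(tokens):
    """Map each frequency m to V(m), the number of word types occurring exactly m times."""
    type_freqs = Counter(tokens)  # frequency of each word type
    return dict(sorted(Counter(type_freqs.values()).items()))


def zm_norm(a, b, n):
    """Normalizing constant of the right-truncated Zipf-Mandelbrot distribution on 1..n."""
    return sum((j + b) ** -a for j in range(1, n + 1))


def zipf_mandelbrot_pmf(x, a, b, n):
    """P(X = x) = (x + b)^(-a) / sum_{j=1..n} (j + b)^(-a), for x = 1..n and b > -1."""
    return (x + b) ** -a / zm_norm(a, b, n)


def log_likelihood(spectrum, a, b, n):
    """Log-likelihood of the observed spectrum: each of the V(m) types contributes log P(m)."""
    log_norm = math.log(zm_norm(a, b, n))
    return sum(v * (-a * math.log(m + b) - log_norm) for m, v in spectrum.items())


def fit_zipf_mandelbrot(spectrum, a_grid, b_grid):
    """Crude maximum-likelihood fit by grid search (an assumed method, chosen for simplicity).

    Assumption: the truncation point n is the largest observed frequency.
    """
    n = max(spectrum)
    return max(
        ((a, b) for a in a_grid for b in b_grid),
        key=lambda p: log_likelihood(spectrum, p[0], p[1], n),
    )


if __name__ == "__main__":
    tokens = "the cat sat on the mat and the dog sat too".split()
    spectrum = frequency_spectrum(tokens)
    print(spectrum)  # {1: 6, 2: 1, 3: 1}
    a_grid = [0.5 + 0.1 * k for k in range(40)]  # a from 0.5 to 4.4
    b_grid = [0.1 * k for k in range(30)]        # b from 0.0 to 2.9
    a, b = fit_zipf_mandelbrot(spectrum, a_grid, b_grid)
    print(f"a = {a:.2f}, b = {b:.2f}")
```

For corpora of realistic size, a numerical optimizer would normally replace the grid search, and the choice of the truncation point n affects the estimates.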

https://doi.org/10.1075/cilt.356.16paw
