Quantitative Linguistics [QL]
Edited by: Reinhard Köhler, George Mikros and Arjuna Tuzzi
Founding Editor: Gabriel Altmann
The series Quantitative Linguistics publishes books on all aspects of quantitative methods and models in linguistics, text analysis and related research fields. Specifically, the scope of the series covers the whole spectrum of theoretical and empirical research, ultimately striving for an exact mathematical formulation and empirical testing of hypotheses: observation and description of linguistic data, application of methods and models, discussion of methodological and epistemological issues, modelling of language and text phenomena.
Topics
This book explores syntactic variation across various styles and genres in contemporary written Czech using quantitative methods. The research is based on the large, balanced corpus SYN2020, which is part of the Czech National Corpus. The aim is to demonstrate the potential of corpus-based quantitative analysis as a complement to the qualitative methods that have traditionally dominated research in stylistics. Although the empirical focus is on Czech, the methodologies presented are intended to serve as a model for quantitative syntactic stylistics in other languages as well.
Have you ever wondered how the principles behind Shannon's groundbreaking Information Theory can be interwoven with the intricate fabric of linguistic communication? This book takes you on a fascinating journey, offering insights into how humans process and comprehend language. By applying Information Theory to the realm of natural language semantics, it unravels the connection between regularities in linguistic messages and the cognitive intricacies of language processing. Highlighting the intersections of information theory with linguistics, philosophy, cognitive psychology, and computer science, this book serves as an inspiration for anyone seeking to understand the predictive capabilities of Information Theory in modeling human communication. It elaborates on the seminal works from giants in the field like Dretske, Hale, and Zipf, exploring concepts like surprisal theory and the principle of least effort.
With its empirical approach, this book not only discusses the theoretical aspects but also ventures into the application of Shannon's Information Theory in real-world language scenarios, strengthened by advanced statistical methods and machine learning. It touches upon challenging areas such as the distinction between mathematical and semantic information, the concept of information in linguistic utterances, and the intricate play between truth, context, and meaning.
Whether you are a linguist, a cognitive psychologist, a philosopher, or simply an enthusiast eager to dive deep into the world where language meets information, this book promises a thought-provoking journey.
Quantitative linguistic research reveals fascinating patterns in contemporary and historical linguistic data. The book offers insights from a broad range of languages, including Japanese, Slovene and Catalan. The reader will be convinced that statistical empirical analysis – and increasingly also machine learning and big data – should be an essential part of any serious linguistic enquiry.
This volume contains the most important theoretical and methodological works of Gabriel Altmann (1931-2019). He is the founder of a specific school of quantitative linguistics, which focuses on the statistical analysis and interrelationship of linguistic features and characteristics. His approach concentrates on the construction of a general theory of linguistics. The theory is based on the relevance of linguistic laws (Zipf's, Menzerath's and Piotrowski's) and concepts of language as a self-regulating system. In contrast to approaches where quantitative methods are used as standard methodological tools, Altmann favours a "holistic" and epistemological view of problems of quantification of linguistic and textual phenomena.
Dependency analysis is increasingly used in computational linguistics and cognitive science. Surprisingly, compared with studies based on phrase structures, quantitative methods and dependency structure are rarely integrated in research. This is the first book that collects original contributions which quantitatively analyze dependency structures across different languages and text genres.
The edited volume Motifs in Language and Text is the first collection of original research in the area of the quantitative analysis of motifs. It brings together contributions that give theoretical insight into linguistic motifs across different languages, text genres, and structural levels (lexical, syntactic, semantic, etc.), and that also present tentative efforts toward practical applications of linguistic motifs.
Quantitative Linguistics is a rapidly developing discipline covering more and more areas of linguistic and textological research. The book represents an overview of the state of the art in Quantitative Linguistics, its scope and reach. Some of the topics: linguistic laws, frequency analyses, synergetic models of language, networks, part-of-speech systems, authorship attribution, polyfunctionality and polysemy, and opinion target identification.
The edited volume Sequences in Language and Text is the first collection of original research in the area of the quantitative analysis of sequentially organized linguistic data.
Linguistic sequences are extremely useful textual structures in almost all areas of Language Technology. Character and word n-grams are by far the most successful features in text classification tasks such as authorship identification, text categorization, genre classification, and sentiment analysis. Furthermore, character linguistic sequences are the basis for linguistic modeling and subsequent applications such as speech recognition and language identification.
In addition to the above language-technology-oriented research, the present volume aims to give insight into the theoretical value of linguistic sequences. Sequences in texts can be produced by a number of different factors, either external to the linguistic system or arising from its own grammatical structure. This volume hosts contributions that analyze linguistic sequences using quantitative methods under the synergetic theoretical framework that can explain their role in the linguistic system.
The present volume presents objective methods to detect and analyse various forms of repetitions. Repetition of textual elements is more than a superficial phenomenon. It may even be considered constitutive for units and relations in a text: on a primary level when no other way exists to establish a unit – as in a musical composition (a motif can be recognised as such only after at least one repetition) – and on a secondary, artistic level, where repetition is a consequence of the transfer of the equivalence principle from the paradigmatic axis to the syntagmatic one, as shown by R. Jakobson.
The analysis of repetitive elements and structures in texts with objective mathematical means can serve several practical and theoretical purposes, among them:
Characterisation of texts by means of parameters (measures, indicators) as taken from established mathematical statistics or specifically constructed ones in individual cases.
Comparison of texts on the basis of their quantitative characteristics and classification of the texts by the results.
Research into the laws of text, which control the mechanisms connected to text creation. As a remote aim, the construction of a theory of text consisting of a system of text laws. The ultimate goal of every quantitative text analysis is the construction of a text theory. The book illustrates this with examples of such laws and corresponding empirical tests.
The book presents methods for the objective analysis of poetic language. Common objects of literary studies such as rhythm, semantic explications, interpretation and personal impressions are avoided. Only those properties of poetic texts that can be quantified are taken into account. The major chapters contain the analysis of phonic phenomena (frequency, euphony, assonance, alliteration, aggregation, rhyme) and word properties (aspects of frequency, length, richness, word classes, sequences of word properties, characterisations). The synergetic control cycle is the result of the study of mutual links between properties. For all methods, statistical tests (evaluation, comparison), theoretical derivations (models), and examples are presented.
The book is dedicated to the work of the famous Romanian poet Mihai Eminescu whose complete work was analysed, which made detailed illustrations of the method possible. The methods can be used mutatis mutandis for any language and text. It is the first comprehensive quantitative analysis of a poetic work.
The standard scientific methodology in linguistics is empirical testing of falsifiable hypotheses. As such the process of hypothesis generation is central, and involves formulation of a research question about a domain of interest and statement of a hypothesis relative to it. In corpus linguistics the domain is text, and generation involves abstraction of data from text, data analysis, and formulation of a hypothesis based on inference from the results. Traditionally this process has been paper-based, but the advent of electronic text has increasingly rendered it obsolete both because the size of digital corpora is now at or beyond the limit of what can efficiently be used in the traditional way, and because the complexity of data abstracted from them can be impenetrable to understanding. Linguists are increasingly turning to mathematical and statistical computational methods for help, and cluster analysis is such a method. It is used across the sciences for hypothesis generation by identification of structure in data which are too large or complex, or both, to be interpretable by direct inspection. This book aims to show how cluster analysis can be used for hypothesis generation in corpus linguistics, thereby contributing to a quantitative empirical methodology for the discipline.
This is the first book which brings together the fields of theoretical and empirical studies in syntax on the one hand and the methodology of quantitative linguistics on the other hand. The author provides the theoretical background for this enterprise on the basis of the philosophy of science and of linguistic considerations, including a discussion of Chomsky’s stance against the application of statistical methods to syntactic phenomena. He gives a short introduction to the aims and methods of the quantitative approach to linguistics in general and to syntax in particular. The following chapters inform the reader about the measurement of syntactic properties, possibilities to acquire empirical data from syntactically annotated text corpora, and the most common mathematical models and methods for the analysis of syntactic and syntagmatic material. Then, a number of prominent approaches and hypotheses about interrelations between properties of syntactic constructions are presented and evaluated on material from various languages and text types. Finally, the theory of synergetic linguistics and its application to syntax is introduced, including the integration of such famous hypotheses as Yngve’s depth hypothesis and Hawkins’s "Early immediate constituent" principle.
The book concludes with a number of perspectives with respect to follow-up studies and extensions to the presented models with interfaces to neighbouring disciplines.
The present book identifies and collects entirely new aspects of word frequency. First, eminent characteristics (such as the h-point, first used in scientometrics, the k-, m-, and n-points) are introduced – it can be shown that the geometry of word frequency is fundamentally based on them. Furthermore, various indicators of text properties are proposed for the first time, such as thematic concentration, autosemantic text compactness, autosemantic density, etc. In detail, the autosemantic structure of a given text is evaluated by means of a graph representation and its properties (according to a problem from network research). Special emphasis is given to the part-of-speech differentiation, which plays a significant role in stylistics.
On the basis of a general theory, which has been developed especially for linguistic research, problems of the frequency structure of texts with respect to word occurrence are investigated and discussed in detail. Methodologically, specific reference is made to synergetic linguistics, including some exemplary analyses, showing that there are points of contact with this field. A separate chapter is dedicated to within-sentence word position; this issue considers grammar as well as language genesis; another chapter is dedicated to the type-token ratio, discussing all established methods and their relevance for word frequency analysis.
All methods presented in the book are statistically tested; to this end, some new tests have been developed. All procedures and calculations are conducted for 20 languages, ranging from languages of Polynesia, Indonesia, India, and Europe to a North American Indian language. The broad distribution of the data and texts from all genres allows generalizations with respect to language typology.
This volume presents 12 papers on a new approach to the analysis of writing systems. For the first time, quantitative methods are introduced into this area of research in a systematic way. The individual contributions give an overview of quantitative properties of symbols and of writing systems, introduce methods of analysis, study individual writing systems as used for different languages, set up an explanatory model of phenomena connected to script development/evolution, and give a perspective on a general theory of writing systems.
The collection contains more than 60 original papers and reflects current research topics in linguistics and text analysis. Most of the papers present recent results of empirical quantitative investigations; others focus on methodological issues, whereas some are of a more theoretical, systems-theoretical/semiotic character. Finally, a number of contributions form typical integrative deductive-inductive studies. The volume is a valuable source of information about the current state of the art in quantitative linguistic research, presented by renowned representatives of the field.