Abstract
The present paper is a pilot study that introduces a concept from computational biology, the closest string problem (CSP), into linguistics. In quantitative studies, texts can be represented as sequences, and a CSP can be solved for any given set of sequences. The solution is called a consensus string (CS), a sequence that is "as close as possible to all the given strings". In this article we solve several CSPs arising from the study of a Hungarian poem and illustrate possible interpretations.
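To make the notion of a consensus string concrete, the following is a minimal sketch, not the method used in the article. It assumes that all strings have equal length, that closeness is measured by Hamming distance, and that "as close as possible" means minimizing the largest distance to any input string. The abstract fixes none of these details. The function names `hamming` and `closest_string` are illustrative, and the exhaustive search is only practical for very short sequences.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings):
    """Exhaustive search for a consensus string (assumed definition:
    minimize the maximum Hamming distance to the inputs).

    Returns (consensus, radius), where radius is that maximum distance.
    The cost grows as len(alphabet) ** length, so this is a toy only.
    """
    length = len(strings[0])
    # Assumption: candidates use only symbols that occur in the input.
    alphabet = sorted(set("".join(strings)))
    best, best_radius = None, length + 1
    for candidate in product(alphabet, repeat=length):
        radius = max(hamming(candidate, s) for s in strings)
        if radius < best_radius:
            best, best_radius = "".join(candidate), radius
    return best, best_radius


if __name__ == "__main__":
    print(closest_string(["abab", "abba", "aabb"]))
```

For the three toy strings above, the search returns `('abbb', 1)`: the consensus differs from each input in exactly one position, although it is not itself one of the inputs.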
©2016 by De Gruyter Mouton
Articles in the same Issue
- Frontmatter
- The Multiplanar Nature of Frequency
- The increase of English-German hybrid compounds
- On Consensus Strings in Text Analysis
- Classification of Serbian texts based on lexical characteristics and multivariate statistical analysis
- „ich schreib das mal hier rein ähm“ Modality-taking – Schreibhinweise in professionellen mündlichen Interaktionssituationen
- Bibliography
- Bibliography – Piotrowski’s law
- Book Reviews
- McLelland, Nicola: German Through English Eyes. A History of Language Teaching and Learning in Britain
- Kluck, Nora: Der Wert der Vagheit
- Ludwig M. Eichinger: Sprachwissenschaft im Fokus. Positionsbestimmungen und Perspektiven. IDS Jahrbuch 2014
- Jakobs, Eva-Maria and Daniel Perrin: Handbook of Writing and Text Production
- Prädikative Kopula+Infinitiv-Formen und ihre Funktionen im Deutschen. Die Kopula unter Bühlerscher Desambigierung