Mathematical modeling of the frequencies of letters for their occurrence in corpora, words (types) and in the initial positions of words of corpora

  • Hemlata Pande
Published/Copyright: July 27, 2020

Abstract

This paper attempts to determine a mathematical model for the frequencies of occurrence of letters in three settings: in the corpora as a whole, in the word types of the corpora, and in the initial positions of the words of the corpora, where both word tokens and word types are taken into account. The study uses corpora written in American English, built by selecting material from ‘The Open American National Corpus (OANC)’.
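
The four kinds of letter count that the abstract distinguishes can be illustrated with a minimal Python sketch. The tokenization rule (maximal runs of ASCII letters, lowercased), the function names `letter_counts` and `ranked`, and the sample sentence are assumptions made for illustration only; they are not the paper's procedure and the sample is not OANC data.

```python
from collections import Counter
import re


def letter_counts(text):
    """Return four letter-frequency Counters for a plain-text sample."""
    # Assumption: a word is a maximal run of ASCII letters, lowercased.
    tokens = re.findall(r"[a-z]+", text.lower())
    types = set(tokens)  # distinct words (types)

    in_corpus = Counter("".join(tokens))            # letters in running text
    in_types = Counter("".join(types))              # each distinct word counted once
    initial_tokens = Counter(w[0] for w in tokens)  # first letters of all tokens
    initial_types = Counter(w[0] for w in types)    # first letters of distinct words
    return in_corpus, in_types, initial_tokens, initial_types


def ranked(counter):
    """Frequencies sorted in descending order, i.e. a rank-frequency sequence."""
    return sorted(counter.values(), reverse=True)


# Hypothetical sample sentence, used only to exercise the functions.
sample = "the cat and the dog and the bird"
corpus, types, init_tok, init_typ = letter_counts(sample)
print(ranked(init_tok))
```

A rank-frequency sequence such as the one returned by `ranked` is the usual input when a candidate distribution is fitted to letter frequencies; which model fits best is a question for the paper itself and is not addressed by this sketch.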


Corresponding author: Hemlata Pande, Department of Mathematics, Govt. P. G. College Bageshwar, Bageshwar, Uttarakhand, India

Published Online: 2020-07-27

© 2020 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 22.1.2026 from https://www.degruyterbrill.com/document/doi/10.1515/glot-2020-2010/html?lang=en