Abstract
Object case inflection in Koalib (Niger-Congo) follows complex patterns that involve phoneme position, syllable structure, and tonal pattern. Few attempts, whether qualitative or quantitative, have been made to identify the rules governing the object case paradigms in Koalib. In the current study, information on phonemes, tones, and syllables is automatically extracted from a Koalib sample of 2,677 lexemes. The data are then fed to decision-tree-based classifiers to predict the object case paradigms and to extract the interactions between the variables. The results improve on the predictive accuracy of existing studies and identify the case paradigms predicted by linguistic hypotheses. The computational classifiers also find new case paradigms, which are explained from a linguistic perspective. Our work demonstrates that combining theoretical linguistic knowledge with machine learning techniques can serve as a methodological approach for linguistic analyses.
Acknowledgements
The authors are thankful for the comments of the editors and the reviewers, which helped to significantly improve the content of the paper. They are also grateful to Siddig Ali Karmal Koko for sharing with them his knowledge of and expertise on his mother tongue, Koalib.
Research funding: The first author is thankful for the support of the following grants: (i) PICS franco-soudanais Les langues du Soudan: à la croisée des aires et types linguistiques [The languages of the Sudan: a typological and areal crossroad]; (ii) PHC-Napata Kin terms and anthroponyms in the Nuba Mountain languages; (iii) Labex EFL, Strand 3, Workpackage RT1 – Language genealogy (Niger-Congo, Austronesian): Reconstruction, internal classification and grammatical description in the world’s two biggest phyla: Niger-Congo and Austronesian (ANR-10-LABX-0083). This last grant contributes to the IdEx Université de Paris – ANR-18-IDEX-0001. The second author is also thankful for the support of grants from the Université de Lyon (ANR-10-LABX-0081, NSCO ED 476), the IDEXLYON Fellowship (2018–2021, 16-IDEX-0005), and the French National Research Agency (ANR-11-IDEX-0007, ANR-20-CE27-0021).
© 2021 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Transitivity on a continuum: the transitivity index as a predictor of Spanish causatives
- Modelling incipient probabilistic grammar change in real time: the grammaticalisation of possessive pronouns in European Spanish locative adverbial constructions
- The theme-recipient alternation in Chinese: tracking syntactic variation across seven centuries
- Inferring case paradigms in Koalib with computational classifiers
- The distribution of /w/ and /ʍ/ in Scottish Standard English
- Towards a dynamic behavioral profile of the Mandarin Chinese temperature term re: a diachronic semasiological approach