Preliminaries to Finnish Word Prediction
P. Väyrynen
Abstract
Commercial word prediction software has thus far been available mainly for uninflected languages such as English. In the present study, we investigate word prediction in highly inflected languages, using Finnish as an example. Despite the high degree of case inflection in Finnish, about one third of the word tokens in a Finnish text appear in their uninflected base form. As a result, simple prediction techniques such as word completion, originally developed for English, can be used to investigate the characteristics of word prediction in inflected languages. Our preliminary results show that roughly 45% of characters can be saved in Finnish word prediction overall, across uninflected and inflected tokens. The most interesting result of our prediction experiments, however, is the distribution of character savings across the most common cases and their cumulative effect on the total percentage of character savings achievable in Finnish word prediction. The major conclusion of the study is that word prediction in a highly inflected language such as Finnish is feasible, provided that (i) the case form of a word can be predicted correctly from its context of use, at least in some cases, and (ii) the cognitive load that the resulting prediction system places on the user is not too high when prediction of the case form fails.
© 2014 Akademie Verlag GmbH, Markgrafenstr. 12-14, 10969 Berlin.
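The character-savings measure mentioned in the abstract can be illustrated with a minimal sketch of frequency-based word completion. This is not the study's implementation: the function names (`predict`, `keystrokes_for`, `character_savings`), the toy lexicon, the choice to offer a single suggestion, and the convention that accepting a suggestion costs one keystroke are all assumptions made for illustration. Spaces and punctuation are ignored.

```python
from collections import Counter


def predict(lexicon, prefix, n=1):
    """Return up to n lexicon words starting with prefix, most frequent first.

    Ties in frequency are broken alphabetically (an arbitrary assumption).
    """
    candidates = [w for w in lexicon if w.startswith(prefix)]
    candidates.sort(key=lambda w: (-lexicon[w], w))
    return candidates[:n]


def keystrokes_for(word, lexicon, n=1):
    """Keystrokes needed to enter word with completion enabled.

    The user types letters one by one; as soon as the word shows up among
    the n suggestions, one extra keystroke selects it. If it never shows up
    early enough to help, the whole word is typed.
    """
    for k in range(1, len(word)):
        if word in predict(lexicon, word[:k], n):
            return k + 1  # k typed letters plus one selection keystroke
    return len(word)


def character_savings(tokens, lexicon, n=1):
    """Fraction of characters saved over typing every token in full."""
    typed = sum(keystrokes_for(w, lexicon, n) for w in tokens)
    total = sum(len(w) for w in tokens)
    return 1 - typed / total


# Toy lexicon (hypothetical counts): base forms "talo" (house) and "auto"
# (car), and their inessive case forms "talossa" and "autossa".
lexicon = Counter({"talo": 3, "talossa": 2, "auto": 1, "autossa": 1})
text = ["talo", "talossa", "autossa"]

savings = character_savings(text, lexicon)
print(f"{savings:.1%}")  # prints 22.2%
```

In this toy run the base form `talo` is completed after a single letter, whereas the inflected forms are only suggested once the typed prefix rules out the base form, so almost nothing is saved on them. That is the effect the abstract points to: savings on inflected tokens depend on whether the case form can be predicted.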
Articles in the same Issue
- Titelei
- Gesetzmäßigkeiten Der Lautdauer
- Menzerath–Altmann Law for Syntactic Structures in Ukrainian
- Using Finite-state Automata for Text Lexicons Building
- Word Length and Word Frequency in Slovak
- Types of Interaction between Meter and Language in Relation to the Spread of the Syllabo-tonic in European Verse from the End of the 16th Century to the Mid 18th Century
- Diskretes Modell für die Polysemie: Neue empirische Evidenz
- Prolegomena to the History of Corpus and Quantitative Linguistics. Greek Antiquity.
- On Word Length: The Influence of a „Boundary“ Condition on the Modelling
- Preliminaries to Finnish Word Prediction
- The Combinatorics of Word Order in Flexible Parts-of-speech Systems
- Gabriel Altmann – bridge between linguistics and mathematics
- REPORTS
- Dictionary of Karel Čapek (ed. F. Čermák)
- The Concept of Stopwords in Persian Chemistry Articles: A Discussion in Automatic Indexing
- Measuring and Modeling the Complexity of Polysynthetic Language Learning: A Non-Extensive Neural Network Approach
- A Morphological Analyser of Slovak
- Tanglish (Tamil - English Mix) – The Language of Youngsters in Tamilnadu
- The Influence of the Reform in Dutch Verse at the Beginning of the 17th Century on the Subsequent Future of European Versification (A Typology of the Development of Syllabo-tonicism in Dutch, German and Russian Versification)
- Sequences of Linguistic Quantities Report on a New Unit of Investigation
- WORD FREQUENCY STUDIES
- S Curve Analysis with Multiple Logistic Regression for Language Change
- Addresses of Members of Editorial Board
- Instructions for Authors