Preliminaries to Finnish Word Prediction

  • P. Väyrynen, K. Noponen and T. Seppänen
Published/Copyright: July 15, 2014

Abstract

Commercial word prediction software has so far been available mainly for uninflected languages such as English. In the present study, we investigate word prediction in highly inflected languages, using Finnish as an example. Despite the high degree of case inflection in Finnish, about one third of the word tokens in a Finnish text appear in their uninflected base form. As a result, simple prediction techniques such as word completion, originally developed for English, can be used to investigate the characteristics of word prediction in inflected languages. Our preliminary results show that roughly 45% of characters can be saved in Finnish word prediction overall, across uninflected and inflected tokens. The most interesting result of our prediction experiments, however, is the distribution of character savings across the most common cases and their cumulative effect on the total percentage of character savings achievable in Finnish word prediction. The major conclusion of the study is that word prediction in a highly inflected language such as Finnish is feasible, provided that two conditions hold: the case form of a word in a given context can be predicted correctly, at least in some cases, and the cognitive load on the user is not too high when prediction of the case form fails.
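To make the notion of character savings concrete, the following is a minimal sketch of frequency-based word completion. It is not the authors' implementation: the function names, the one-keystroke selection cost, the single-suggestion list (`k=1`), and the toy lexicon with its frequencies are all assumptions chosen for illustration.

```python
from collections import Counter


def top_completions(prefix, freqs, k=3):
    """Return up to k lexicon words starting with prefix, most frequent first."""
    matches = [w for w in freqs if w.startswith(prefix)]
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    matches.sort(key=lambda w: (-freqs[w], w))
    return matches[:k]


def keystrokes_needed(word, freqs, k=3):
    """Keystrokes to enter word: letters typed plus one keystroke to select."""
    for typed in range(1, len(word)):
        if word in top_completions(word[:typed], freqs, k):
            return typed + 1  # letters typed so far + one selection keystroke
    return len(word)  # never offered early enough: type the whole word


def character_savings(tokens, freqs, k=3):
    """Fraction of characters saved over a token sequence."""
    total = sum(len(t) for t in tokens)
    used = sum(keystrokes_needed(t, freqs, k) for t in tokens)
    return 1 - used / total


# Hypothetical toy lexicon: base form "talo" (house) and two case forms.
toy_freqs = Counter({"talo": 5, "tie": 4, "talon": 3, "talossa": 2})

print(keystrokes_needed("talo", toy_freqs, k=1))     # 2
print(keystrokes_needed("talossa", toy_freqs, k=1))  # 6
print(round(character_savings(["talo", "talossa"], toy_freqs, k=1), 3))  # 0.273
```

In this toy setup the base form "talo" is offered after a single letter, while the inessive form "talossa" is offered only once the typed prefix rules out the more frequent forms. A plain completion scheme of this kind therefore saves little on inflected tokens unless the case form can be predicted from context.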

Published Online: 2014-7-15
Published in Print: 2008-7-1

© 2014 Akademie Verlag GmbH, Markgrafenstr. 12-14, 10969 Berlin.

Downloaded on 16.10.2025 from https://www.degruyterbrill.com/document/doi/10.1515/glot-2008-0009/html