
A method for the comparison of general sequences via type-token ratio

  • Vladimír Matlach, Diego Gabriel Krivochen and Jiří Milička
View more publications by John Benjamins Publishing Company
Language and Text
This chapter is in the book Language and Text

Abstract

This article proposes a new method for analyzing and comparing general linear sequences that requires minimal prior knowledge about the sequences. Sequence analysis is a broad problem studied in fields ranging from sociology and computer security to linguistics and biology. The method presented here applies the simplest tools of quantitative linguistics in order to keep the method transparent and its results easy to interpret. The results form a vector describing each sequence, which allows sequences to be clustered, used as input to machine learning, and visualized simply by line charts or by multidimensional methods such as MDS or t-SNE. For completeness, artifacts and several formal models are derived to describe the method's behavior in both common and extreme cases.
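To make the idea of a type-token ratio (TTR) vector concrete, the following is a minimal sketch and not the chapter's actual procedure, which the abstract does not specify. It assumes that a sequence is described by its mean TTR over sliding windows of several sizes; the function names `ttr` and `ttr_profile` and the choice of window sizes are hypothetical and used only for illustration.

```python
from typing import Hashable, List, Sequence


def ttr(seq: Sequence[Hashable]) -> float:
    """Type-token ratio: number of distinct symbols divided by sequence length."""
    if len(seq) == 0:
        return 0.0
    return len(set(seq)) / len(seq)


def ttr_profile(seq: Sequence[Hashable],
                window_sizes: Sequence[int] = (2, 4, 8)) -> List[float]:
    """Mean TTR over all sliding windows of each size (an assumed design,
    not taken from the chapter). Returns one value per window size."""
    profile = []
    for w in window_sizes:
        if w > len(seq):
            # Window longer than the sequence: fall back to the whole-sequence TTR.
            profile.append(ttr(seq))
            continue
        windows = [seq[i:i + w] for i in range(len(seq) - w + 1)]
        profile.append(sum(ttr(win) for win in windows) / len(windows))
    return profile


if __name__ == "__main__":
    # Works on any sequence of hashable symbols: characters, words, event codes.
    print(ttr_profile("abababab", window_sizes=(2, 4)))
    print(ttr_profile(["login", "read", "read", "logout"], window_sizes=(2,)))
```

Because each sequence is reduced to a fixed-length list of numbers, profiles of different sequences can be compared directly, plotted as line charts, or passed to a clustering or dimensionality-reduction routine.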

Downloaded on 29.12.2025 from https://www.degruyterbrill.com/document/doi/10.1075/cilt.356.03mat/html