Skip to main content
Presented to you through Paradigm Publishing Services

John Benjamins Publishing Company

Chapter
Licensed
Unlicensed Requires Authentication

A method for the comparison of general sequences via type-token ratio

  • , and

Abstract

This article proposes a new method for analyzing and comparing general linear sequences with the minimum prior knowledge on the sequences needed. Sequence analysis is a broad problem studied by various fields from sociology and computer security to linguistics or biology. The method presented here applies the simplest quantitative linguistic tools in order to achieve methods transparency and easily interpretable results. The results form a vector describing the sequence and allow their clustering, machine learning and simple visualizations by line charts or multidimensional methods as MDS or tSNE. For completeness, artifacts and several formal models are derived to describe methods behavior in both common and extreme cases.

Abstract

This article proposes a new method for analyzing and comparing general linear sequences with the minimum prior knowledge on the sequences needed. Sequence analysis is a broad problem studied by various fields from sociology and computer security to linguistics or biology. The method presented here applies the simplest quantitative linguistic tools in order to achieve methods transparency and easily interpretable results. The results form a vector describing the sequence and allow their clustering, machine learning and simple visualizations by line charts or multidimensional methods as MDS or tSNE. For completeness, artifacts and several formal models are derived to describe methods behavior in both common and extreme cases.

Downloaded on 27.4.2026 from https://www.degruyterbrill.com/document/doi/10.1075/cilt.356.03mat/html?lang=en
Scroll to top button