Wikidition: Automatic lexiconization and linkification of text corpora

Alexander Mehler, Rüdiger Gleim, Tim vor der Brück, Wahed Hemati, Tolga Uslu and Steffen Eger

Published/Copyright: April 9, 2016

Abstract

We introduce a new text technology, called Wikidition, which automatically generates large-scale editions of corpora of natural language texts. Wikidition combines a wide range of text mining tools for automatically linking lexical, sentential and textual units. This includes the extraction of corpus-specific lexica down to the level of syntactic words and their grammatical categories. To this end, we introduce a novel measure of text reuse and exemplify Wikidition by means of the capitularies, that is, a corpus of Medieval Latin texts.

About the authors

Alexander Mehler

Alexander Mehler is professor of Computational Humanities at Goethe University and head of the Text Technology Lab. He is a member of the executive committee of the Center for the Digital Foundation of Research in the Humanities, Social, and Educational Sciences (CEDIFOR). His research interests include computational models of linguistic networks.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Rüdiger Gleim

Rüdiger Gleim is a research assistant at Goethe University. He worked in the Special Research Center Alignment in Communication at Bielefeld University. His research interests include semantic databases and text mining.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Tim vor der Brück

Dr. Tim vor der Brück studied Computer Science at Saarland University. He is currently a research associate at the Lucerne University of Applied Sciences and Arts. His research interests include text mining and multimodal computing.

Hochschule Luzern, Technikumstr. 21, 6048 Horw

Wahed Hemati

Wahed Hemati is a project member of CEDIFOR at Goethe University and works on machine reading and text mining.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Tolga Uslu

Tolga Uslu is a project member of CEDIFOR at Goethe University and works on imaging methods for text mining.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Steffen Eger

Dr. Steffen Eger is a member of the CompHistSem project at Goethe University. His research interests include mathematical methods of computational linguistics and social network analysis.

Goethe-Universität Frankfurt, Robert-Mayer-Straße 10, D-60325 Frankfurt am Main

Acknowledgement

This work has been funded by the German Federal Ministry of Education via the projects CompHistSem (www.comphistsem.org) and CEDIFOR (www.cedifor.de).

Received: 2015-9-19
Accepted: 2016-1-11
Published Online: 2016-4-9
Published in Print: 2016-3-1

©2016 Walter de Gruyter Berlin/Boston

Downloaded on 15.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/itit-2015-0035/html