OCR for Incunabula: A Printshop-Specific Approach of the Universitätsbibliothek Würzburg

Felix Kirchner; Marco Dittrich; Phillip Beckenbauer; Maximilian Nöth

doi:10.1515/abitech-2016-0036


Felix Kirchner, Universitätsbibliothek Würzburg, Am Hubland, 97074 Würzburg, orcid.org/0000-0002-2653-6554
Marco Dittrich, Universitätsbibliothek Würzburg, Am Hubland, 97074 Würzburg, orcid.org/0000-0002-8681-6443
Phillip Beckenbauer, Universitätsbibliothek Würzburg, Am Hubland, 97074 Würzburg, orcid.org/0000-0003-4379-1669
Maximilian Nöth, Universitätsbibliothek Würzburg, Am Hubland, 97074 Würzburg, orcid.org/0000-0002-5048-7238

Published/Copyright: September 12, 2016


From the journal ABI Technik, Volume 36, Issue 3

Summary:

One aim of the BMBF-funded KALLIMACHOS project at the University of Würzburg is to obtain the textual basis for digital editions via OCR. The corpus consists of German, French, and Latin incunabula. This article shows how the problems of OCR on incunabula can be tackled with methods and programs that already exist today. To this end, the Universitätsbibliothek Würzburg tested a procedure that already achieves character accuracies of up to 95 percent and word accuracies of up to 73 percent on selected works from a single printshop.

Abstract:

Within the Kallimachos project at the University of Würzburg, the textual basis for digital editions of incunabula is obtained via OCR. The corpus consists of German, French, and Latin works. This article shows how the problems of OCR on incunabula can be tackled with existing methods and programs. The method centers on a type-specific OCR training that can be reused across different early printed works from the same printshop. With this method we achieved character accuracies of up to 95 percent and word accuracies of up to 73 percent.

Keywords: OCR; Tesseract; Incunabula
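
The character and word accuracies quoted in the abstract can be illustrated with a minimal sketch. It assumes a common definition, one minus the edit distance divided by the length of the ground truth, computed once over characters and once over whitespace-separated words. The abstract does not state how the authors computed their figures, so the function names and the sample strings below are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def accuracy(ref: Sequence, hyp: Sequence) -> float:
    """Assumed definition: 1 - edit_distance / len(reference), floored at 0."""
    if not ref:
        raise ValueError("reference must not be empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    return accuracy(ground_truth, ocr_output)


def word_accuracy(ground_truth: str, ocr_output: str) -> float:
    # Words are split on whitespace; a single wrong character spoils the word.
    return accuracy(ground_truth.split(), ocr_output.split())


if __name__ == "__main__":
    gt = "in principio erat verbum"    # hypothetical ground-truth line
    ocr = "in prineipio erat uerbum"   # hypothetical OCR output with two errors
    print(f"character accuracy: {character_accuracy(gt, ocr):.3f}")
    print(f"word accuracy:      {word_accuracy(gt, ocr):.3f}")
```

In this sample, two wrong characters out of 24 leave the character accuracy above 90 percent, yet they fall in two of four words, so the word accuracy drops to 50 percent. The same effect explains why a word accuracy is typically well below the corresponding character accuracy.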