Harmonizing language data: Standards for linguistic resources

Book, Open Access

Edited by: Piotr Bański, Ulrich Heid, and Laura Herzberg
Funded by: VolkswagenStiftung

Language: English

Published/Copyright: 2025

This book is part of the series

Volume 4 | Digital Linguistics

About this book

Standards function as safeguards to ensure that data remains interpretable, uniformly queryable, and archivable over time – a critical challenge for digital humanists working with complex linguistic resources. This book provides an overview of essential standards for ensuring the sustainability of data in the Digital Humanities (DH). It addresses the selection of data encoding formats, methods of annotating primary data, and approaches to making resources findable and accessible. The focus is on various forms of linguistic data, such as texts, lexicons, or parallel arrangements (e.g., translations or transcribed recordings). The work explains the role of annotations and metadata in structuring and contextualizing data and examines the influence of diverse data formats, shaped by local academic or industrial practices. In contrast to neural language models, which often yield impressive but opaque results, DH projects aim for transparency, reproducibility, and sustainability. Achieving these goals requires interoperability – the seamless interaction between data and tools. The book demonstrates how clear guidelines and best practices help ensure the long-term usability of data. It offers digital humanists practical approaches and well-founded standards to sustainably archive and efficiently utilize their data, making it an indispensable resource for the field.

About the authors / editors

Ulrich Heid, Univ. of Hildesheim; Piotr Bański and Laura Herzberg, Leibniz Institute for the German Language, Mannheim, Germany.

Frontmatter
I

Acknowledgments

Contents
VII

Towards an optimum degree of order in the field of language resources
1
Piotr Bański, Ulrich Heid, and Laura Herzberg

Character encoding and its importance for text resources
17
Christian Wartena

International standards for the identification and the description of languages and their varieties
35
Laurent Romary

Part-of-speech tagging and related annotation
61
Nikola Ljubešić and Tomaž Erjavec

Named entity recognition and entity linking
89
Pia Schwarz

Annotated audiovisual language data: data quality and data maturity
115
Vera Ferreira, Hanna Hedeland, and Kelsey Neely

From spoken language data to TEI-based ISO standard
145
Antonina Werthmann

Dealing with multiple annotations
169
Piotr Bański and Nils Diewald

Standards and practices for long-term digital archiving
201
Ines Pisetta and Thorsten Trippel

Conversion into the archival format I5
229
Harald Lüngen and Ines Pisetta

Metadata for research data
251
Thorsten Trippel

Linguistic linked (open) data
281
Anas Fahad Khan

Data exploitation: corpus queries
303
Stephanie Evert, Timm Weber, Steffen Bothe, Philipp Heinrich, and Alexander Piperski

Querying spoken language data
339
Elena Frick and Thomas Schmidt

Accessing linguistic content in distributed research environments
377
Erik Körner and Thomas Eckart

Taxonomy of legal and ethical metadata for language resources
401
Paweł Kamocki

The life of an ISO standard
427
Annette Preissner and Ulrich Heid

Index

Author index

Publication information

eBook published on: December 15, 2025
eBook ISBN: 9783112208212
Hardcover published on: December 15, 2025
Hardcover ISBN: 9783119148023

Pages and images/illustrations in the book

Front matter: 8
Main content: 462
Illustrations: 80
Tables: 28

DOI: https://doi.org/10.1515/9783112208212

Keywords for this book

Data sustainability; interoperability; metadata and annotations; data standards

Audience(s) for this book

Researchers

Creative Commons

BY 4.0

Safety and product resources

Manufacturer information:
Walter de Gruyter GmbH
Genthiner Straße 13
10785 Berlin
productsafety@degruyterbrill.com
