Harmonizing language data
Edited by:
Piotr Bański, Ulrich Heid, and Laura Herzberg
Funded by:
VolkswagenStiftung
About this book
Standards function as safeguards to ensure that data remains interpretable, uniformly queryable, and archivable over time – a critical challenge for digital humanists working with complex linguistic resources. This book provides an overview of essential standards for ensuring the sustainability of data in the Digital Humanities (DH). It addresses the selection of data encoding formats, methods of annotating primary data, and approaches to making resources findable and accessible. The focus is on various forms of linguistic data, such as texts, lexicons, or parallel arrangements (e.g., translations or transcribed recordings). The work explains the role of annotations and metadata in structuring and contextualizing data and examines the influence of diverse data formats, shaped by local academic or industrial practices. In contrast to neural language models, which often yield impressive but opaque results, DH projects aim for transparency, reproducibility, and sustainability. Achieving these goals requires interoperability – the seamless interaction between data and tools. The book demonstrates how clear guidelines and best practices help ensure the long-term usability of data. It offers digital humanists practical approaches and well-founded standards to sustainably archive and efficiently utilize their data, making it an indispensable resource for the field.
About the authors / editors
Ulrich Heid, University of Hildesheim; Piotr Bański and Laura Herzberg, Leibniz Institute for the German Language, Mannheim, Germany.
Frontmatter
Acknowledgments
Contents
Towards an optimum degree of order in the field of language resources
Character encoding and its importance for text resources
International standards for the identification and the description of languages and their varieties
Part-of-speech tagging and related annotation
Named entity recognition and entity linking
Annotated audiovisual language data: data quality and data maturity
From spoken language data to TEI-based ISO standard
Dealing with multiple annotations
Standards and practices for long-term digital archiving
Conversion into the archival format I5
Metadata for research data
Linguistic linked (open) data
Data exploitation: corpus queries
Querying spoken language data
Accessing linguistic content in distributed research environments
Taxonomy of legal and ethical metadata for language resources
The life of an ISO standard
Index
Author index
Manufacturer information:
Walter de Gruyter GmbH
Genthiner Straße 13
10785 Berlin
productsafety@degruyterbrill.com