When Data Meets the Past: Data Collection, Sharing, and Reuse in Ancient World Studies

Andrea Farina; Paola Marongiu; Mathilde Bru; Daniele Borkowski

doi:10.1515/opis-2025-0014

Article Open Access

Published/Copyright: March 28, 2025

Published by De Gruyter Brill in Open Information Science, Volume 9, Issue 1

Abstract

This article explores the challenges and opportunities of adopting data-driven approaches in Ancient World (AW) studies, focusing on the complexities of data collection, curation, and analysis in the field. We address issues such as defining data for AW studies, as well as data fragmentation, standardization, and interoperability. We propose solutions to enhance data accessibility, collaboration, and reuse, demonstrating that adopting standardized formats and adhering to FAIR principles can improve data sharing and enable large-scale, interdisciplinary research. Importantly, we highlight how qualitative and quantitative approaches can coexist, enriching the field. We also review different past and ongoing initiatives supporting data-driven methodologies in AW studies and advocate for their continued expansion. Lastly, we discuss the rise of data papers as a transformative tool for bridging traditional scholarship and digital methodologies, emphasizing the importance of data sets and their potential for reuse in advancing the field.

Keywords: data-driven methodologies; Ancient World studies; FAIR principles; data collection and reuse; data paper

1 Introduction

The study of the Ancient World (AW) has traditionally been rooted in qualitative methodologies, with scholars relying mainly on close reading and interpretation of their sources. However, the rise of computational methods and digital humanities has introduced a complementary, data-driven approach to the study of the AW and to the Humanities more generally, transforming the way in which ancient texts, artifacts, and cultural phenomena can be analyzed and interpreted. This shift has encouraged researchers focused on the AW to view their work and objects of study in a new light. It has also prompted the academic community to broaden the notion of data to include the products of human culture and experience, such as texts, artifacts, documentary evidence, and similar items.

A defining feature of this data-driven turn is the focus on the creation and reuse of data sets. Data sets are structured collections of information that can be analyzed systematically, often using computational tools. For textual data, this might involve annotated corpora or databases containing linguistic information, whereas for material culture, data sets may include detailed metadata describing, for instance, archaeological artifacts, epigraphic records, or numismatic collections. These resources allow scholars to uncover patterns, trends, and connections that would be difficult or impossible to discern through traditional methods alone. Conceiving the data set as the primary outcome of a research process, one that can be analyzed to address specific research questions, represents a methodological shift for the AW community. It requires scholars to make explicit decisions about how to organize, categorize, and encode the information they gather during their research. This evolving perspective on quantitative methods and data sets in the Humanities has manifested in multiple ways. One notable development is the growing prominence of data papers and, consequently, the rise of data journals dedicated to the Humanities and Social Sciences (Ma, 2024; Marongiu, Pedrazzini, Ribary, & McGillivray, 2025; McGillivray et al., 2022a; Wigdorowitz et al., 2024). In addition, large databases, digital libraries, and archives have been created to preserve diverse forms of cultural heritage, including textual, archaeological, and iconographic materials. Platforms such as Europeana.eu^[1] provide access to a large part of Europe’s digital cultural heritage. Other resources, such as the British Library’s Digital Collections,^[2] the Digital Public Library of America,^[3] and Gallica (Bibliothèque Nationale de France),^[4] offer extensive archives of historical documents, manuscripts, and multimedia materials from various periods and regions.
Another important aspect of this transformation is the creation of research infrastructures that facilitate the storage and sharing of humanities data in full compliance with the FAIR principles (Wilkinson et al., 2016). For instance, CLARIN supports linguistic data (Branco et al., 2023), and DARIAH supports the Arts and Humanities more broadly (Blanke, Bryant, Hedges, Aschenbrenner, & Priddy, 2011; Digital Research Infrastructure for the Arts and Humanities, 2024; Tasovac et al., 2023). This shift has also affected the field of AW studies. Historical linguistics, in particular, was an early beneficiary, largely due to the pioneering work of Father Roberto Busa, whose Index Thomisticus (Busa, 1974–1980) laid the foundations of computational linguistics (see Section 2.1). As a result, numerous initiatives in the field now focus on the application of quantitative and computational methods to the study of ancient languages. One example is the workshop on Language Technologies for Historical and Ancient Languages (LT4HALA), now in its third edition (Sprugnoli & Passarotti, 2024). Beyond linguistics, data-driven approaches have rapidly flourished across AW disciplines. Notable examples include the Linked Pasts symposium,^[5] now in its tenth edition, as well as other workshops^[6] designed for early-career researchers seeking to integrate quantitative methods into their studies of the AW.

In this article, we show how the growing tendency to adopt data-driven approaches in AW studies has significantly expanded research horizons in the field. We also aim to emphasize the benefits of including the creation and sharing of a data set in the lifecycle of a project, especially for enhancing collaboration, interdisciplinarity, and data reuse among scholars in the field. The article is structured as follows. Section 2 provides an overview of data-driven approaches in AW studies, offering examples of relevant online databases in different disciplines (e.g., linguistics, papyrology, and archaeology) and stressing the collaborative and interdisciplinary character of data-driven approaches to AW studies. Section 3 addresses the difficult task of defining and collecting data for this field. We offer a possible definition of “data” in AW studies, alongside examples from previous studies. We also address the challenges related to data collection and standardization for AW studies, insisting on the importance of data set creation and on the possibility for qualitative and quantitative approaches to coexist in the field. Section 4 discusses the limitations of a data-driven approach in AW studies and proposes solutions based on selected case studies; more importantly, it shows the advantages offered by this methodology. It also advocates for broader participation in data-driven approaches to AW studies, providing examples of different initiatives to illustrate their potential. In Section 5, we present the publication of a data paper as a best practice for enhancing the impact and reuse potential of a data set produced by a research project. Section 6 elaborates on the future perspectives of data-driven approaches for AW studies. Finally, Section 7 presents the main conclusions of this work.

2 Digital Humanities (DH) and Data-Driven Approaches to AW Studies: An Overview

Since the twentieth century, the advent of DH has transformed the field of AW studies (Brunner, 1993; Crane, 2004; McDonough, 1959; Romano, 2011), expanding the ways researchers can access, collect, analyze, and interpret ancient data. Computational methods and digital databases (see Revellio, 2015, for a survey of existing resources for Latin) now play a central role, offering AW scholars tools for systematic analysis and enabling more accessible and collaborative research environments. This section examines the digital landscape in AW studies, highlighting key databases, tools, and recent trends that are reshaping the discipline and driving new approaches to AW data. It further considers the impact of these innovations on accessibility and collaboration, as well as the unique challenges involved in adapting digital methods to ancient sources.

2.1 The Digital Turn for AW Studies

The DH have evolved rapidly, integrating technology with traditional humanities research methods. In AW studies, this integration has opened up new dimensions for research, allowing scholars to manage and study vast amounts of ancient material in ways that were previously unimaginable. Since the digitization of the works of Thomas Aquinas in the Index Thomisticus (Busa, 1974–1980), many other projects, such as the Perseus Digital Library (Crane, 1987; Crane et al., 2006), have digitized large corpora of ancient texts, making them freely available to scholars worldwide. These resources have enabled researchers to apply new digital tools to ancient texts, conducting analyses that reveal patterns across languages, cultures, and historical periods. One core objective of DH in AW studies is to facilitate access to primary sources and secondary literature, making previously isolated or difficult-to-obtain materials openly available. Databases containing digitized texts, inscriptions, archaeological data, and other similar sources of information on the AW (for our definition of “data” in AW studies, see Section 3.1) now serve as essential resources for AW specialists. They provide a foundation for a range of computational analyses, from linguistic studies to geographic information system (GIS) mapping of ancient sites. Additionally, the integration of standardized and interoperable annotation systems, metadata standards, and data formats has strengthened the field’s capacity to interpret and share research findings on an unprecedented scale. The DH in AW studies also encourage a data-driven mindset that promotes collaboration and reproducibility. By enabling scholars to organize and publish data systematically, digital resources allow for interdisciplinary research and for shared data sets that others in the field can reuse and expand.
This collective approach to data management offers classicists the opportunity to contribute to and draw from a shared pool of knowledge, which in turn fosters new lines of inquiry and a more interconnected research community.

2.2 Databases and Digital Tools as Foundations for Data-Driven Research on the AW: Some Examples

Digital databases have become indispensable resources in the field of AW studies, providing structured information essential to data-driven analysis. These databases are not merely repositories of ancient texts and artifacts; they also allow AW scholars to collect and study data that may reveal new insights into language, culture, and material history. Each type of database exemplifies the diverse range of structured data on which digital research in AW studies depends. Here, we comment on a few of these databases, stressing their potential across disciplines.

One of the best-known resources for classicists, the Perseus Digital Library, hosts an extensive collection of Ancient Greek and Latin texts, alongside translations and linguistic tools. It allows researchers to search for specific words and cross-reference sources, facilitating data-driven linguistic analysis. The structured data provided by Perseus and its integration with other resources, such as the Morpheus morphological analyzer (Crane, 1991), support studies in different disciplines, ranging from philology to historical linguistics and cultural history, enabling classicists to trace linguistic trends and textual relationships quantitatively across time periods and authors. Similarly, the Thesaurus Linguae Graecae (Pantelia, 2000) and the Thesaurus Linguae Latinae (Baraz, 2007; Krebs, 2009) digitize vast corpora of ancient texts, enabling systematic analysis on a larger scale. New efforts towards a quantitative representation of Latin and Ancient Greek have been made, with the creation of the LatinISE corpus for the former (McGillivray & Kilgarriff, 2013) and the Opera Graeca Adnotata for the latter (Celano, 2024). These resources allow classicists to explore not only individual texts but also patterns and variations across languages, authors, and genres, making them particularly valuable for comparative studies and long-term linguistic analyses.

Beyond textual databases, specialized platforms such as Papyri.info (Sosin, 2010) or the Europeana network of Ancient Greek and Latin Epigraphy (Amato et al., 2014; Mannocci, Casarosa, Manghi, & Zoppi, 2014; Orlandi, 2016) extend data-driven approaches into the realm of documentary evidence. Papyri.info focuses on ancient papyri, providing transcription tools and searchable data to facilitate papyrological studies. By organizing documents from various regions and time periods, both platforms enable researchers to conduct large-scale analyses of documentary texts, revealing, for instance, shifts in administrative practices, social relationships, and economic trends across the ancient world. Other databases, such as the Portable Antiquities Scheme (Hobbs, 2003; PAS, 2012), run by the British Museum and Amgueddfa Cymru (Museum Wales), enable data-driven approaches to archaeological studies. The Portable Antiquities Scheme provides detailed information on artifacts, including their descriptions, imagery, and find locations. Such structured data are essential for analyzing distribution patterns of material artifacts, supporting research in archaeology, economic history, and cultural exchange.
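As a minimal sketch of what structured find records make possible, the Python snippet below tallies finds of one object type by period and region. The records and the field names "object_type", "period", and "region" are hypothetical, invented for illustration; they do not reproduce the Portable Antiquities Scheme’s actual schema or data.

```python
from collections import Counter, defaultdict

# Hypothetical find records: field names and values are invented for
# illustration and do not reproduce the Portable Antiquities Scheme's schema.
finds = [
    {"object_type": "coin",   "period": "Roman",    "region": "Region A"},
    {"object_type": "brooch", "period": "Roman",    "region": "Region A"},
    {"object_type": "coin",   "period": "Roman",    "region": "Region B"},
    {"object_type": "coin",   "period": "Iron Age", "region": "Region B"},
    {"object_type": "coin",   "period": "Roman",    "region": "Region A"},
]


def distribution(records, object_type):
    """Count finds of one object type per (period, region) pair."""
    return Counter(
        (r["period"], r["region"])
        for r in records
        if r["object_type"] == object_type
    )


def by_period(counts):
    """Regroup (period, region) counts as {period: {region: count}}."""
    table = defaultdict(dict)
    for (period, region), n in counts.items():
        table[period][region] = n
    return dict(table)


coins = distribution(finds, "coin")
print(by_period(coins))
# {'Roman': {'Region A': 2, 'Region B': 1}, 'Iron Age': {'Region B': 1}}
```

The point is less the counting itself than the precondition for it: once every find is recorded with the same fields, the same few lines work for any object type or grouping.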

Digital methods in AW studies, such as natural language processing (NLP) and GIS mapping, rely on large, structured data sets that allow for innovative analyses. NLP tools such as the Classical Language Toolkit (Johnson et al., 2021) enable researchers to apply morphological tagging, syntactic parsing, and frequency analysis to ancient texts. By processing classical languages computationally, NLP opens new avenues for large-scale linguistic studies, providing insights into vocabulary distribution, syntactic structures, and semantic patterns across different corpora. These tools can draw on databases of digitized texts (see above), which can be used to train models that systematically identify and analyze patterns in language use over time and across genres. GIS technologies have integrated spatial analysis into historical studies. Projects such as ORBIS: The Stanford Geospatial Network Model of the Roman World (Scheidel, Meeks, & Weiland, 2012) allow researchers to visualize ancient travel routes, trade networks, and geographic boundaries, revealing how spatial relationships influenced ancient societies. GIS projects rely on databases that organize spatial data (e.g., place names, coordinates, historical maps), allowing scholars to explore the intersection of geography and human activity in the ancient world. By combining spatial data with historical and textual information, GIS mapping enables more comprehensive studies of regional interactions, political boundaries, and economic flows.
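To make frequency analysis more concrete, the following minimal sketch counts word forms in two short Latin passages (the openings of Caesar’s De Bello Gallico and Cicero’s first Catilinarian) and normalizes counts per 1,000 tokens, so that texts of different lengths can be compared. It is plain Python with a deliberately naive tokenizer and no lemmatization; it does not use or imitate the Classical Language Toolkit, which adds proper tokenization, lemmatization, and tagging for classical languages.

```python
import re
from collections import Counter

# Toy corpus: opening of Caesar, De Bello Gallico 1.1, and of Cicero, In Catilinam 1.1.
corpus = {
    "Caesar": "Gallia est omnis divisa in partes tres, quarum unam incolunt "
              "Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, "
              "nostra Galli appellantur.",
    "Cicero": "Quo usque tandem abutere, Catilina, patientia nostra? Quam diu "
              "etiam furor iste tuus nos eludet? Quem ad finem sese effrenata "
              "iactabit audacia?",
}


def tokenize(text):
    """Lowercase and keep runs of letters only. Naive: no lemmatization,
    so different inflected forms of one lemma are counted separately."""
    return re.findall(r"[^\W\d_]+", text.lower())


def per_thousand(tokens):
    """Frequency of each word form per 1,000 tokens."""
    counts = Counter(tokens)
    return {form: n * 1000 / len(tokens) for form, n in counts.items()}


tokens_by_author = {author: tokenize(text) for author, text in corpus.items()}
all_tokens = [tok for toks in tokens_by_author.values() for tok in toks]

print(Counter(all_tokens).most_common(1))  # [('nostra', 2)]
for author, toks in tokens_by_author.items():
    # token count and relative frequency of 'nostra' in each passage
    print(author, len(toks), round(per_thousand(toks)["nostra"], 1))
```

Without lemmatization, forms such as nostra and nostrae would be counted as different words, which is one reason dedicated NLP tools matter for richly inflected languages.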

Through the structured and searchable data sets provided by these and other similar platforms, databases and digital tools enable AW scholars to take a data-driven approach to ancient materials. These resources facilitate large-scale analyses uncovering unique patterns and supporting interdisciplinary collaborations that draw on methodologies from linguistics, archaeology, history, and data science. However, integrating these digital resources into AW studies is not without challenges, as issues of standardization, accessibility, and technical expertise continue to shape the field (Sections 2.3 and 3.2).

2.3 Fostering Collaboration, Accessibility, and Interdisciplinarity

As outlined in Section 2.2, one of the major outcomes of digital methods in AW studies is the potential to foster interdisciplinary collaboration and broad accessibility. The vast, interconnected data within databases about the AW enables scholars to draw on shared resources and expand each other’s research horizons. For instance, resources such as those described in Section 2.2 provide scholars with immediate access to digitized texts, inscriptions, and archaeological data that others have already collected, cataloged, and structured. This ready availability of data encourages researchers to leverage previous work to explore new questions and patterns, rather than starting from scratch. Such collective access enhances the potential for shared discoveries, making it possible for AW scholars to derive new insights from other people’s contributions, fostering a collaborative environment that builds on a common foundation of AW data. The open-access nature of many digital resources has further democratized AW studies, ensuring that such data is (as much as possible) Findable, Accessible, Interoperable, and Reusable according to the FAIR principles (Wilkinson et al., 2016). By adhering to FAIR standards, digital AW studies projects ensure that data can be easily located, accessed, and utilized in new research contexts, supporting sustainability and promoting a wide-ranging impact within and beyond the academic community.

As digital archives continue to grow, AW specialists can explore new possibilities for large-scale studies, enabling them, for instance, to trace linguistic, cultural, or societal changes over centuries and across vast data sets. This capacity to analyze long-term trends across a large body of data represents a significant advancement in AW studies, underscoring the transformative potential of DH. Despite these advancements, however, data collection remains challenging for many AW researchers, and a substantial number still hesitate to rely on digital approaches as primary research methods. It should be noted that computational methods are not strictly necessary for data collection in AW studies. Many scholars gather extensive data manually on their computers, often without recognizing that they are in fact creating data sets. This can lead to a few notable issues. First, they may structure their data in ways that hinder reusability, for example by using Word tables instead of interoperable formats (Section 4.1). Second, they might not check whether similar data have already been compiled in related studies, thus missing the chance to align with and possibly expand existing resources (Section 4.1). Finally, they tend to value the intellectual work of interpreting their data while overlooking the work behind data collection, and so keep their data private rather than sharing them with the wider scholarly community (Section 5).
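A minimal sketch of the alternative to a Word table: the same observations kept as a tab-separated file, one row per observation and one column per variable, written and read back with Python’s standard csv module. The column names and the three rows (verb forms from the opening of Caesar’s De Bello Gallico) are illustrative assumptions, not a prescribed schema.

```python
import csv
import io

# Illustrative columns: one variable per column, one observation per row.
FIELDS = ["source", "passage", "form", "lemma", "voice"]

rows = [
    {"source": "Caesar, BG", "passage": "1.1", "form": "divisa",
     "lemma": "divido", "voice": "passive"},
    {"source": "Caesar, BG", "passage": "1.1", "form": "incolunt",
     "lemma": "incolo", "voice": "active"},
    {"source": "Caesar, BG", "passage": "1.1", "form": "appellantur",
     "lemma": "appello", "voice": "passive"},
]


def to_tsv(records, fields):
    """Serialize records as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def from_tsv(text):
    """Parse tab-separated text back into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


tsv = to_tsv(rows, FIELDS)
print(tsv)

# Once read back, the data can be filtered or counted directly.
passives = [r["form"] for r in from_tsv(tsv) if r["voice"] == "passive"]
print(passives)  # ['divisa', 'appellantur']
```

To write to disk instead of a string, open a file with open("observations.tsv", "w", newline="", encoding="utf-8") and pass it to csv.DictWriter in place of the StringIO buffer; the file name is hypothetical. Any spreadsheet program or statistics package can open the result.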

Beyond these technical and methodological challenges, a fundamental debate persists over what exactly constitutes “data” in the context of AW studies. The lack of consensus on this issue raises important questions about how information from ancient texts, artifacts, and historical records should be defined, structured, and analyzed in digital form. Understanding what counts as “data” in AW studies is critical to advancing shared methodologies of data collection and sharing. This complex question will be explored in Section 3.

3 The Data Dilemma: Complexities of Creating Data for AW Studies

Providing an exhaustive definition of data for AW studies is a challenging task. Defining data for disciplines related to the AW involves recognizing their alignment with Humanities research practices while acknowledging the distinctive challenges posed by the field. These include data scarcity, fragmentation, interpretability, and the need for standardized methods to encode and analyze diverse data types. Addressing these challenges is vital for AW studies to fully benefit from advancements in digital and computational methods. This section defines data in the Humanities, emphasizing the specificities of AW studies, explores challenges in data collection, curation, standardization, and sharing, and offers best practices for adopting a data-driven approach in the field.

3.1 What is “Data” in the Humanities and in AW Studies?

To define data in the field of AW studies, it is essential first to establish a general understanding of the term “data” and its application within the Humanities, comparing it to its use in Science, Technology, Engineering, and Mathematics (STEM) and Health disciplines. What counts as data in STEM and in the Humanities depends largely on the differing aims and methods of these research areas. By nature, STEM disciplines are based on numerical values, measurable experimental results, structured data sets, and formulas, which lend themselves more readily to quantitative methods (Valiela, 2001). The main purpose across STEM disciplines is to test hypotheses, find patterns, develop models, predict outcomes, and provide empirical validation for research questions (Develaki, 2020; NSTA, 2011). The Humanities, on the other hand, aim to interpret, contextualize, and understand human experiences or cultural products through comparative and/or analytical approaches (Given, 2008). Nevertheless, data-driven approaches in the Humanities are experiencing significant growth (Section 2.1). They are always used in combination with interpretive methods to analyze complex and often ambiguous research objects. Such complexity and ambiguity arise from several factors intrinsic to the nature of humanistic enquiry. Human experiences, texts, and artifacts are always embedded in unique cultural, historical, and social contexts that cannot be fully understood in isolation. These contexts introduce layers of meaning that are often interdependent and require careful interpretation; think, for instance, of how the meaning of a single word varies depending on its linguistic, social, historical, and even political context. Moreover, many Humanities research objects are inherently open to multiple interpretations (Section 4.1).
This ambiguity increases when dealing with incomplete or fragmentary sources, such as damaged manuscripts or archaeological finds, where researchers must fill gaps by relying on their own assumptions and interpretations. The Humanities include a wide array of fields, such as History, Literature, Art, Linguistics, Anthropology, and Film Studies, along with specialized subfields defined by time, geography, or language. This heterogeneity is reflected in the great variety of research data used and produced by each of them. Data sources in Humanities research include texts, images, artifacts, oral records, and video recordings, among others. The presence of so many different types of data, often neither comparable nor measurable with similar metrics or systems, makes finding a good definition of data for the Humanities an extremely challenging task (McGillivray et al., 2022a, p. 5). In the literature, there is no consensus on how data is defined within Humanities research. Apart from a few studies (e.g., Kalinin & Skvortsov, 2023), research on how humanists conceptualize their research data is extremely limited (Poljak Bilić & Posavec, 2024, p. 3). Surveys investigating the attitudes of Humanities scholars towards the concept of data in their fields have revealed a notable reluctance even to define their evidence as data per se (Allen & Hartland, 2018). For this reason, DARIAH-DE^[7] proposes an umbrella definition of data for the Humanities as “all sources, materials, and results that have been collected, recorded, or evaluated in the context of research and answers to research questions in the field of humanities and culture, as well as computer-processed data for permanent storage, citation, and further processing”.^[8] The challenge of identifying and defining Humanities data also applies to AW disciplines since, as subfields of the Humanities, they share its interpretive and analytical approach to data.
However, the field of AW studies presents unique characteristics and challenges due to its interdisciplinary nature and the heterogeneity of its data sources. AW studies encompass a wide range of disciplines, including, for instance, History, Archaeology, Linguistics, Literature, Philology, Art History, and Museum Studies, each with its own subfields. Correspondingly, the types of data vary widely and include textual materials (e.g., manuscripts, papyri, inscriptions, translations, annotated linguistic data, word lists), physical artifacts (e.g., tablets, coins, artworks, archaeological objects such as vases or miniatures), and conceptual entities (e.g., dates, events, geographical locations). Multiple types of data often coexist within a single source. For instance, a cuneiform tablet may provide archaeological data (e.g., its provenance, chronological context, and physical features) and linguistic/literary data (e.g., its textual content, linguistic elements, and historical themes). This multiplicity requires diverse methodologies, theoretical frameworks, and interpretive standards tailored to the specific nature of each data set.
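The coexistence of data types in a single source can be reflected directly in how records are structured. The following sketch, using Python dataclasses, keeps the archaeological and textual layers of a tablet as separate components of one record, so that a single query can combine criteria from both. All class names, fields, identifiers, and values are hypothetical and do not come from any existing database.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ArchaeologicalLayer:
    """Object-level data: material, find spot, date (all hypothetical fields)."""
    material: str
    provenance: Optional[str] = None              # None = find spot unknown
    date_range: Optional[Tuple[int, int]] = None  # years; negative = BCE


@dataclass
class TextualLayer:
    """Text-level data carried by the same physical object."""
    language: str
    script: str
    transliteration: str
    genre: Optional[str] = None


@dataclass
class SourceRecord:
    """One physical source with one archaeological and any number of textual layers."""
    identifier: str
    archaeology: ArchaeologicalLayer
    texts: List[TextualLayer] = field(default_factory=list)


def query(records, language=None, provenance_known=None):
    """Filter records using criteria from both layers at once."""
    hits = []
    for rec in records:
        has_provenance = rec.archaeology.provenance is not None
        if provenance_known is not None and has_provenance != provenance_known:
            continue
        if language is not None and not any(t.language == language for t in rec.texts):
            continue
        hits.append(rec)
    return hits


# Two invented tablets; identifiers, dates, and find spots are placeholders.
records = [
    SourceRecord(
        identifier="tablet-001",
        archaeology=ArchaeologicalLayer("clay", "excavated, site A", (-1900, -1700)),
        texts=[TextualLayer("Akkadian", "cuneiform", "(omitted)", "letter")],
    ),
    SourceRecord(
        identifier="tablet-002",
        archaeology=ArchaeologicalLayer("clay"),  # unprovenanced object
        texts=[TextualLayer("Sumerian", "cuneiform", "(omitted)", "administrative")],
    ),
]

print([r.identifier for r in query(records, provenance_known=True)])  # ['tablet-001']
print([r.identifier for r in query(records, language="Sumerian")])    # ['tablet-002']
```

Keeping the layers separate, rather than merging them into one flat description, lets archaeologists and philologists each extend their own part of the record without redefining the other.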

3.2 Collecting, Curating, and Analyzing Data in AW Studies: A Hard Task

Given the particular nature of data in AW studies (Section 3.1), a number of challenges arise with respect to its collection, curation, and analysis. A significant challenge in AW studies is the scarcity and fragmentation of data (Oldman, Doerr, de Jong, Norton, & Wikman, 2014). Unlike other Humanities disciplines, where data sets can be expanded through methods such as web crawling (e.g., the TenTen corpora; Jakubíček, Kilgarriff, Kovář, Rychlý, & Suchomel, 2013) or interviews with speakers, AW studies must rely on fixed-size corpora. For example, linguists working on modern languages can create extensive corpora containing millions of tokens or generate new data through interviews with native speakers. In contrast, researchers studying historical languages are limited to the finite textual evidence preserved from antiquity. New discoveries are rare, and existing corpora are often small, significantly restricting the potential for large-scale quantitative analyses compared to the resources available for modern languages. Additionally, many texts are still unavailable in digital form and exist only as physical editions. Even when digitization via OCR is feasible, the process demands significant time and resources, especially for non-Latin scripts, and may face legal and copyright barriers if critical editions are held by publishers that do not permit open access (Tóth-Czifra, 2019). Similarly, archaeologists rely on the finite evidence preserved from excavations. New discoveries are infrequent and often localized, while existing data are frequently fragmented or incomplete. Moreover, much of the archaeological record is inaccessible, either because it remains unexcavated, because it was poorly or inaccurately documented in earlier reports, or because it is restricted by legal or institutional barriers.
Even the digitization of archaeological material requires substantial investment and faces issues such as inconsistent recording standards or loss of context due to incomplete documentation (Kintigh, 2006). These limitations have cascading effects. The lack of standardized encoding for linguistic and textual data in ancient languages, coupled with insufficient data sets, hinders the development of computational tools such as lemmatizers, morphological analyzers, and syntactic parsers for historical languages. Efforts to expand data sets often face challenges related to interoperability and adherence to FAIR principles. The inability to ensure open and standardized formats for digital resources further restricts collaborative research and the creation of derivative resources, such as digital editions of ancient texts encoded in TEI-XML.^[9] Moreover, data in AW studies often demands the preservation not only of the digital or physical object (e.g., papyri, artifacts) but also of the scholarly interpretation embedded in its study. Digital editions and databases must therefore encompass both the raw data and the scholarly work that contextualizes and analyzes these materials (Almas & Beaulieu, 2013). For instance, Papyri.info (Section 2.2) offers both digitized images of papyri and their philological reconstruction, encoding textual problems such as lacunae in TEI-XML. In general, traditional approaches within AW studies focus primarily on the interpretive analysis of data, with the ultimate aim of producing articles in prestigious journals that bring valuable academic recognition. These publications, while essential, tend to foreground the scholar’s intellectual and interpretive work while overlooking the processes underlying the production of the data resources themselves. Yet these resources are the foundation that enables both individual scholarship and the broader advancement of the field.
As a result, AW scholars sometimes lack awareness not only of what data is but also of what constitutes a data set and of the potential impact it may have on the wider academic community. This potential for reuse, i.e., the capacity of a data set developed for a specific project to become a resource for possibly entirely different studies, is often underestimated (McGillivray et al., 2022b). There is no definitive minimum size for a collection of data to be considered a data set. However, while a data set need not be vast in scope, it should have enough depth and coverage to describe a specific phenomenon representatively. This is essential for the data set to be impactful and useful to the community. For instance, a data set with only a handful of entries would generally not be considered robust, unless the research object inherently involves only a small amount of data. Consequently, producing data sets that are both specific to their original research questions and broadly reusable is key to fostering collaborative and scalable scholarship in AW studies.
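The TEI-XML encoding of lacunae mentioned earlier in this section can be illustrated with a small fragment. The sketch assumes EpiDoc-style TEI markup, in which a gap element with reason="lost" marks lost text that the editor does not restore and a supplied element with reason="lost" marks an editorial restoration; the two-line Latin fragment is invented for illustration. The code only reads the encoding with Python’s standard xml.etree.ElementTree and does not validate it against the TEI schema.

```python
import xml.etree.ElementTree as ET

# TEI namespace, as it appears in ElementTree's qualified tag names.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented fragment: one restoration, then a lacuna of unknown extent.
FRAGMENT = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">'
    '<lb n="1"/>imp<supplied reason="lost">eratori</supplied> Caesari '
    '<gap reason="lost" extent="unknown" unit="character"/>'
    '<lb n="2"/>divi filio</ab>'
)


def lacunae(elem):
    """Attributes of every <gap> element (unrestored losses)."""
    return [dict(g.attrib) for g in elem.iter(TEI + "gap")]


def restorations(elem):
    """Text of every <supplied reason="lost"> element (editorial restorations)."""
    return [s.text for s in elem.iter(TEI + "supplied") if s.get("reason") == "lost"]


def leiden_text(elem):
    """Render a rough Leiden-style reading text: [restored] text, [...] for
    gaps, a newline per <lb/>. Only direct children are handled here."""
    parts = [elem.text or ""]
    for child in elem:
        name = child.tag.replace(TEI, "")
        if name == "supplied":
            parts.append(f"[{child.text or ''}]")
        elif name == "gap":
            parts.append("[...]")
        elif name == "lb":
            parts.append("\n")
        parts.append(child.tail or "")
    return "".join(parts).strip()


root = ET.fromstring(FRAGMENT)
print(restorations(root))  # ['eratori']
print(lacunae(root))
print(leiden_text(root))
```

Because the restoration and the loss are explicit elements rather than typographic brackets, the editor’s interpretation is preserved as data: it can be counted, filtered, or rendered differently without touching the underlying edition.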

3.3 Data Collection and Data Standardization

The first step an AW researcher faces when conducting data-driven analyses is data collection. For more traditional specialists, who may not typically rely on computational methods, learning to collect data systematically across different areas of AW studies is essential. This process involves developing a structured and intentional approach to data handling, which not only benefits their immediate research but also enhances the reusability and impact of their data sets within the broader academic community. Firstly, it is essential to approach data collection with the understanding that any observed phenomenon in a textual passage, tablet, or artifact can be transformed into a valuable data set, regardless of the methodology employed. Understanding this potential is key to adopting a more data-driven mindset. A straightforward way to adopt this approach is to shift from unstructured documents or personal notes to data formats that are interoperable and open for reuse by other researchers, such as tab-separated formats, e.g., CoNLL-U files for syntactic annotation (Buchholz & Marsi, 2006),^[10] or XML-compliant formats for encoding philological or semantic annotation (TEI Consortium, 2024). Such formats make it easier to retrieve data and patterns systematically with programming languages and with tools for visualization and distant reading. More recently, research on data structures has been moving towards the creation of Linked Open Data infrastructures that encode information in RDF^[11] format. This approach aims to reduce data fragmentation, so that different resources, originally scattered across various project websites and resource infrastructures, can be linked and “talk” to each other, providing researchers with a large amount of information in a single place. Latin is a particularly notable case, as it is arguably the best-resourced historical language, benefiting from a range of linguistic resources and NLP tools.
In this context, the ERC project “LiLa: Linking Latin” (2018–2023) has been building a knowledge base of linguistic resources using the Linked Open Data model (Passarotti et al., 2020). Training AW scholars in new advancements in the field would help integrate them into the broader discussion on open data-sharing practices and resource infrastructures. It would also inspire others to follow the example set by LiLa. Consistency is another critical factor in the collection and creation of new data sets. Often, especially in the past, each research project defined its own labels for encoding the phenomena under study. AW researchers should instead strive to use controlled, standard vocabularies to encode the phenomena for which a data set is created. This ensures clarity and uniformity, making the data more accessible for both personal use and external collaboration. A significant obstacle is that many data sets remain inaccessible to the public, as they are often closely guarded by the researchers who created them. This lack of widely shared open-access practices makes it difficult to obtain information that could otherwise help establish shared standards for encoding similar, or even identical, phenomena across related fields. The same information could also highlight best practices and provide examples of how to structure data effectively. In relation to open data, another fundamental step is for the community to become aware of the FAIR principles for data collection and sharing (see Section 2.3). Considering FAIR principles when collecting data makes data sharing easier and increases the potential impact of both the data set and its associated research.
This approach involves learning about data standards in one’s field, such as TEI-XML for philological and linguistic data, as well as familiarizing oneself with data-sharing platforms like DARIAH for the Arts and Humanities (Section 3.1) or CLARIN for linguistic data. CLARIN not only serves as a repository for humanities data but also offers training resources for scholars, covering the use of tools within its infrastructure and the implementation of FAIR principles throughout a research project’s lifecycle.^[12]
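A tab-separated format such as CoNLL-U is much easier to query than free-form notes. The following minimal Python sketch (standard library only) illustrates this by parsing a short CoNLL-U fragment and retrieving every noun together with its case. CoNLL-U stores one token per line in ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). Comment lines begin with '#', and blank lines separate sentences. The Latin sentence and its annotation are an invented toy example, not taken from any of the treebanks cited in this article. The reader also skips, rather than validates, some parts of the format.

```python
# Minimal CoNLL-U reader (a sketch, not a full validator of the format).
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]

def parse_feats(feats):
    """Turn a FEATS value such as 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts keyed by FIELDS."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():                  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):              # sentence-level comments such as sent_id or text
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        if "-" in cols[0] or "." in cols[0]:  # skip multiword-token ranges and empty nodes
            continue
        token = dict(zip(FIELDS, cols))
        token["feats"] = parse_feats(token["feats"])
        current.append(token)
    if current:                               # the input may not end with a blank line
        sentences.append(current)
    return sentences

# Invented toy annotation of "puella rosam amat" ('the girl loves the rose').
ROWS = [
    ["1", "puella", "puella", "NOUN", "_", "Case=Nom|Gender=Fem|Number=Sing", "3", "nsubj", "_", "_"],
    ["2", "rosam", "rosa", "NOUN", "_", "Case=Acc|Gender=Fem|Number=Sing", "3", "obj", "_", "_"],
    ["3", "amat", "amo", "VERB", "_",
     "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act", "0", "root", "_", "_"],
]
SAMPLE = "# text = puella rosam amat\n" + "\n".join("\t".join(r) for r in ROWS) + "\n\n"

sentences = parse_conllu(SAMPLE)
# Query: every noun together with its grammatical case.
nouns = [(t["form"], t["feats"].get("Case")) for s in sentences for t in s if t["upos"] == "NOUN"]
print(nouns)  # [('puella', 'Nom'), ('rosam', 'Acc')]
```

For real corpora, established CoNLL-U parsers and validators are a safer choice than a hand-written reader like this one. The point here is only that a structured file turns a question about the data into a one-line query.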

Understanding the importance of metadata is another crucial aspect of adopting a data-driven and FAIR-oriented approach to the study of the AW. A clear and comprehensive description of data sets ensures that they remain intelligible to researchers who are unfamiliar with the original study for which they were created, facilitating reuse and interdisciplinary collaboration (Section 5). One of the major challenges in managing data within AW disciplines lies in the inherent heterogeneity of these fields (Section 3.1). The diversity of disciplines and the variety of their data types make it difficult to devise universal standards. As in the Humanities more generally (Section 3.1), this difficulty is further compounded by the broad temporal range covered by AW studies, often spanning millennia. Latin and Greek, for example, have traditions of more than two millennia. Each period reflects not only significant linguistic changes but also transformations in art, culture, and history – consider the profound influence of Christianity on the evolution of Latin (McGillivray et al., 2022b). This heterogeneity is reflected in the nature of the data, necessitating an interdisciplinary approach to the collection, metadata creation, processing, and curation of data sets. Additional challenges include the scarcity of data (Section 3.2), its proliferation without adherence to common standards, and the fragmentation that characterizes many AW disciplines. In historical linguistics, particularly for Latin, syntactically annotated corpora (treebanks) are an essential resource for researchers. There are currently six Latin treebanks: the Latin Dependency Treebank (Bamman & Crane, 2006), PROIEL (Haug & Jøhndal, 2008), the Index Thomisticus treebank (Passarotti, 2011), UDante (Cecchini, Sprugnoli, Moretti, & Passarotti, 2020), the Late Latin Charter Treebank (Korkiakangas, 2021), and UD_Latin-CIRCSE^[13] (still ongoing).
Initially, each Latin treebank (with the exception of the more recent UDante and UD_Latin-CIRCSE) was annotated according to its own standard. The lack of common annotation guidelines severely limited the interoperability of these resources, a problem that affects treebanks of both ancient and modern languages. Over the last eight years, standardization efforts have been tied to the Universal Dependencies project (Nivre et al., 2016),^[14] which aims to establish a cross-linguistically valid framework for treebank annotation. Significant progress has been made in harmonizing these resources for Latin. However, discrepancies remain in the annotation of specific syntactic phenomena, and some texts are included in more than one treebank (Gamba & Zeman, 2023). Such inconsistencies can hinder interoperability and limit the potential for large-scale comparative studies. The field of Assyriology presents additional challenges of its own, such as the coexistence of multiple languages (e.g., Sumerian, Akkadian, Hurrian, Babylonian, Assyrian, and Hittite) and the intricate relationship between written and spoken forms. Scribes, for example, often wrote in ways that diverged from everyday spoken language, adding a layer of complexity to the interpretation of the data. One of the most significant obstacles is the identification of cuneiform signs. Variations in the interpretation of a single sign across studies make it difficult to standardize and compare data sets (Section 4.1), complicating the broader goal of establishing shared standards. In conclusion, developing a mindset oriented towards collaboration and reuse can significantly benefit the field. Researchers should consider how their data sets might be useful to scholars in other, related areas.
By adopting practices that promote transparency, consistency, and accessibility, traditional classicists can transform their data collection methods into a more structured, impactful process, bridging the gap between traditional and computational approaches in AW studies. The following section will discuss the limitations and advantages of data-driven approaches to AW studies, providing solutions to the former through examples of previous works.
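A small conversion step makes the harmonization problem discussed above concrete. The sketch below assumes a hypothetical in-house tag set of dotted abbreviations such as 'nom.sg.f'. It converts these tags to the Universal Dependencies FEATS notation, in which Feature=Value pairs are separated by '|' and sorted alphabetically by feature name. The tag set and the mapping table are illustrative only. As noted above, harmonizing real Latin treebanks also involves syntactic decisions, not just morphological ones.

```python
# Hypothetical in-house abbreviations mapped to UD feature names and values.
TAG_MAP = {
    "nom": ("Case", "Nom"), "gen": ("Case", "Gen"), "dat": ("Case", "Dat"),
    "acc": ("Case", "Acc"), "abl": ("Case", "Abl"), "voc": ("Case", "Voc"),
    "sg": ("Number", "Sing"), "pl": ("Number", "Plur"),
    "m": ("Gender", "Masc"), "f": ("Gender", "Fem"), "n": ("Gender", "Neut"),
}

def to_ud_feats(tag):
    """Convert a dotted in-house tag such as 'nom.sg.f' into a UD FEATS string."""
    if not tag:
        return "_"  # UD writes '_' for an empty FEATS column
    feats = {}
    for part in tag.lower().split("."):
        if part not in TAG_MAP:
            raise KeyError(f"no UD equivalent defined for {part!r}")
        name, value = TAG_MAP[part]
        if feats.get(name, value) != value:
            raise ValueError(f"conflicting values for {name} in {tag!r}")
        feats[name] = value
    # Feature=Value pairs joined by '|', sorted by feature name (case-insensitive).
    return "|".join(f"{k}={v}" for k, v in sorted(feats.items(), key=lambda kv: kv[0].lower()))

print(to_ud_feats("nom.sg.f"))  # Case=Nom|Gender=Fem|Number=Sing
print(to_ud_feats("acc.pl.n"))  # Case=Acc|Gender=Neut|Number=Plur
```

The function raises an error for unmapped or conflicting tags instead of silently dropping them. This keeps gaps in the mapping visible to the annotator, which matters when a legacy data set is being aligned with a shared standard.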

4 Balancing Challenges and Opportunities for Data-Driven Approaches to AW Studies

As we have shown so far, taking a data-driven approach in AW studies is not always easy or straightforward, owing to the complexity of the subject and its inherently humanistic character. Nonetheless, the benefits of this approach seem to outweigh its limitations. Quantitative analyses derived from a data-driven approach can coexist with qualitative interpretations. They can also enhance interdisciplinarity and collaboration, thus advancing the field.

4.1 Overcoming Barriers: Addressing Common Data-Driven Hesitations in AW Studies

One of the main issues usually raised by AW researchers when discussing data-driven approaches is the reduction of complex information. For example, when structured data sets are created from literary or documentary texts, the overall meaning of the text, its context, and its themes might be lost if the quantitative is prioritized over the qualitative. Over-reliance on quantitative approaches risks favoring data over interpretation. It can also flatten the complexity and ambiguity that rich qualitative analysis brings to light. To address this, some practical solutions have been implemented in specific research contexts. An example from historical linguistics is the annotation of modal passages in a corpus of Latin texts. This work was carried out within the project “WoPoss: A world of possibilities. Modal pathways over an extra-long period of time: the diachrony of modality in the Latin language” (Dell’Oro, 2019–2025). In WoPoss, modality is defined as the expression of possibility, necessity, probability, and volition. Annotating modality is a challenging task. The type of modality expressed in a passage depends heavily on its linguistic and extralinguistic context, and the passage is sometimes ambiguous. Moreover, the WoPoss annotation schema is very fine-grained, including 25 labels for types and subtypes of modality (Dell’Oro, 2023). A passage may be ambiguous between two modal readings, or annotators may not agree on how to read it. In such cases, the WoPoss annotation guidelines allow for double annotation. This solution preserves the complexity of the linguistic data rather than forcing unambiguous labels onto a complex linguistic phenomenon – in this case, the semantic ambiguity of modal expressions in context.
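One way to preserve double annotation in a flat, tabular data set is to let each passage hold a list of readings rather than a single label. The sketch below assumes a hypothetical CSV export in which alternative readings share one cell, separated by '|'. The column names, the separator, and the generic labels (dynamic, deontic, epistemic) are illustrative. They do not reproduce the 25-label WoPoss schema.

```python
import csv
import io
from collections import Counter

# Hypothetical annotation export: one row per passage; alternative readings joined by '|'.
RAW = """passage_id,readings
p1,deontic
p2,dynamic|epistemic
p3,epistemic
"""

def load_readings(text):
    """Map each passage ID to the list of every reading the annotators recorded."""
    return {row["passage_id"]: row["readings"].split("|")
            for row in csv.DictReader(io.StringIO(text))}

readings = load_readings(RAW)

# Ambiguous passages stay visible instead of being forced into a single label.
ambiguous = sorted(pid for pid, labels in readings.items() if len(labels) > 1)
print(ambiguous)  # ['p2']

# Frequency counts can include every recorded reading of a doubly annotated passage.
label_counts = Counter(label for labels in readings.values() for label in labels)
print(label_counts["epistemic"])  # 2
```

Whether a doubly annotated passage should count once per reading, as here, or be weighted differently is an analytical choice. The data structure leaves that choice open to later users.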
Adaptations such as these ensure that qualitative research remains possible, even with data-driven approaches. Furthermore, the problems associated with simplifying complex information vary across the different fields that make up AW studies. In Assyriology, for instance, cuneiform signs can be read in different ways. Scholars choose which readings fit best in the context of the manuscript in question through a process called “normalization” (Finkel & Taylor, 2015). By the first millennium BCE, the learned scribes of Assyria and Babylon had developed highly sophisticated literary compositions full of deliberate double meanings (Worthington, 2020). This makes modern “normalization” of these manuscripts extremely difficult. Choosing a single consistent reading of a cuneiform sign can oversimplify the nuanced meanings embedded in Mesopotamian linguistic and cultural contexts. For example, double consonants in a word can alter its meaning, such as changing a verb from active to passive, but cuneiform spelling does not always indicate this distinction. Additionally, the adaptation of cuneiform from Sumerian to Akkadian introduced further ambiguities, as the same signs represented different sounds, with or without emphasis. This can lead to multiple interpretations, as in manuscript K2821, where the scribe alternates between the spellings for šahāṭu ‘attack’ and šahātu ‘rinse’. Such complexities require subjective judgment by Assyriologists but are easily overlooked in standardized, data-driven approaches. Another key challenge that AW researchers face in embracing data-driven approaches is the occasional lack of technical expertise in areas such as data collection and systematic data organization. For instance, some AW scholars may not be familiar with basic data management tools like Excel.
Instead of filling in spreadsheets to collect their data, they may rely on tables created in Word, which makes their data sets extremely difficult to query. Unlike Excel spreadsheets, Word tables do not allow for filtering, so data cannot be systematically organized, retrieved, or analyzed. Moreover, such word-processor documents are not easily shared or integrated with other resources. By contrast, a well-structured data set stored in a format such as comma-separated values (CSV) allows for easy sharing, cross-referencing, and analysis by others in the field. Creating a CSV file is not as difficult as it might seem to AW researchers who are not used to digital methods. Spreadsheets created in Excel (.xlsx) or Google Sheets can be exported directly as CSV files. Thus, an AW specialist could easily create a CSV file in which each column holds a specific piece of information (e.g., title, author, location, date, sentence) and each row represents an entry (e.g., an inscription, a word, an artifact). The resulting data set could then be shared and reused by other scholars in the field (see Sections 3.2 and 5). Examples of different data sets compiled for AW studies can be found in the articles published by Burns, Farina, Marongiu, and Rodda (2023–2024). Usefully, once the data set is in CSV format, it can also be imported into various software tools for further, finer-grained analysis, enhancing its potential. This approach allows classicists to work with their data in a flexible, scalable, and accessible manner. It enables them to contribute to digital classics projects without needing extensive technical expertise. However, although CSV is a beginner-friendly format, it does not fully meet interoperability standards, which limits its long-term integration with other data sets and digital resources.
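The following Python sketch (standard library only) illustrates the spreadsheet-to-CSV workflow described above. It writes a small data set with one column per field and one row per entry, then reads it back and filters it, a query that a Word table cannot support. The file location, the rows, and the convention of writing BCE years as negative numbers are placeholders chosen for the example.

```python
import csv
import tempfile
from pathlib import Path

FIELDNAMES = ["title", "author", "location", "date", "sentence"]

# Placeholder entries: each row stands for one item (an inscription, a word, an artifact).
rows = [
    {"title": "Entry A", "author": "unknown", "location": "Site 1", "date": "-50", "sentence": "text of entry A"},
    {"title": "Entry B", "author": "unknown", "location": "Site 2", "date": "120", "sentence": "text of entry B"},
    {"title": "Entry C", "author": "unknown", "location": "Site 1", "date": "210", "sentence": "text of entry C"},
]

path = Path(tempfile.mkdtemp()) / "dataset.csv"

# Write a header row with the column names, then one line per entry.
# UTF-8 keeps Greek letters, diacritics, and transliteration symbols intact.
with path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)

# Read the file back: every row becomes a dict keyed by column name.
with path.open(newline="", encoding="utf-8") as f:
    data = list(csv.DictReader(f))

# Filter: entries from Site 1 dated CE (negative years for BCE is this sketch's assumption).
site1_ce = [r["title"] for r in data if r["location"] == "Site 1" and int(r["date"]) > 0]
print(site1_ce)  # ['Entry C']
```

The same file opens unchanged in a spreadsheet program, so the researcher who created it and a colleague who queries it programmatically work from one shared source.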
More robust and interoperable formats, such as XML – particularly TEI for textual data – offer greater flexibility and sustainability for data structuring and exchange. TEI-XML is widely used in digital humanities projects (Cummings, 2018), ensuring consistency and compatibility with other scholarly resources. To help researchers adopt such formats, workshops and training sessions introduce traditional AW researchers^[15] and humanists more generally^[16] to implementing TEI-XML in their workflows, alongside simpler formats like CSV. Such initiatives help these scholars develop digital skills gradually, easing the transition from basic to more advanced data management. A third challenge is the lack of interoperability between existing data sets. Digital repositories in the AW fields often use incompatible formats or structures, making it difficult for researchers to combine data from multiple sources. For example, a researcher might have a data set on inscriptions from a particular region that does not align with a larger collection of archaeological data from a different source. If these data sets do not share common standards or metadata (e.g., location, date, type of artifact), merging them becomes extremely difficult (Section 3.3). Working towards standardization and adopting shared labels and frameworks is therefore an immensely valuable endeavor. By aligning data sets with standardized metadata fields or classifications, researchers can enhance the usability of their data across different projects and disciplines. Standardization requires effort and adaptation. In return, it unlocks the potential for larger-scale, collaborative analyses that would be impossible, or far more time-consuming, with isolated or non-integrated data sets.
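The merging problem can be sketched in a few lines. In the example below, two hypothetical data sets, one of inscriptions and one of archaeological finds, record the same places under different spellings. A join on the raw labels therefore finds no matches. Mapping every local label to a shared identifier makes the join possible. The records, labels, and place IDs are invented. In practice, the shared identifiers would come from an established gazetteer or controlled vocabulary.

```python
# Hypothetical records from two independent projects that spell the same places differently.
inscriptions = [
    {"id": "I-1", "place": "Site Alpha", "date": "100"},
    {"id": "I-2", "place": "site beta", "date": "150"},
]
finds = [
    {"find": "F-7", "findspot": "Alpha (site)", "type": "amphora"},
    {"find": "F-9", "findspot": "Beta", "type": "coin"},
]

# A naive join on the raw labels finds nothing, although both data sets cover the same places.
naive = [(i["id"], f["find"]) for i in inscriptions for f in finds if i["place"] == f["findspot"]]
print(naive)  # []

# Shared vocabulary: every local spelling points to one (invented) place identifier.
PLACE_IDS = {
    "site alpha": "place:001", "alpha (site)": "place:001",
    "site beta": "place:002", "beta": "place:002",
}

def place_id(label):
    """Normalize a local label and look it up; None means it still needs mapping."""
    return PLACE_IDS.get(label.strip().lower())

def merge(inscriptions, finds):
    """Pair each inscription with the finds recorded at the same place ID."""
    finds_by_place = {}
    for f in finds:
        finds_by_place.setdefault(place_id(f["findspot"]), []).append(f)
    merged = []
    for ins in inscriptions:
        pid = place_id(ins["place"])
        if pid is None:
            continue  # unmapped label: add it to the vocabulary rather than guessing
        for f in finds_by_place.get(pid, []):
            merged.append({"place_id": pid, "inscription": ins["id"],
                           "find": f["find"], "type": f["type"]})
    return merged

for row in merge(inscriptions, finds):
    print(row)
```

The effort lies in building and maintaining the mapping table. Once two projects agree on shared identifiers, the join itself is trivial.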
One example is Farina (2023a), in which the morphological features of the annotated word tokens follow the Universal Dependencies notation (see Section 3.3). Before creating their own data sets, AW scholars should look for existing data sets in the same or similar fields (e.g., searching for data sets on modern languages before annotating historical languages). This helps avoid duplication and promotes integration with established resources. In this way, new data align with broader disciplinary efforts rather than remaining isolated. Another challenge for traditional researchers in AW studies may be a lack of familiarity with secure data archiving standards. The urgency of data security was highlighted by the data breach at the British Library carried out by the hacker group Rhysida, one of the most severe data security incidents in the United Kingdom (British Library, 2024). As a result, approximately 600 GB of copyrighted material was illegally leaked, and the British Library suffered significant financial losses in restoring its security systems. Incidents of this scale are unlikely to affect AW researchers directly. Meanwhile, the academic community has increasingly adopted a framework of data preservation and sharing fully aligned with the FAIR principles. In most cases, aside from specific circumstances involving copyright restrictions or ethical concerns, the primary goal is to ensure that data sets remain accessible and reusable for scholarly research. To achieve this, established research infrastructures such as CLARIN offer step-by-step guidance for depositing data sets, ensuring compliance with data and metadata standards. These platforms also require researchers to select an appropriate license carefully. Where access restrictions are needed, the platforms manage access control on the researchers' behalf, granting entry only to authorized users.
This approach relieves researchers of the burden of acquiring complex technical expertise in data security. It still requires them, however, to develop an informed understanding of best practices in open science and of the licensing frameworks applicable to research data. Finally, the integration of digital AW studies into AW curricula is still in its infancy, although it is growing as research increasingly adopts data-driven approaches. Many Classics departments have yet to incorporate training on digital tools, data management, and interoperability into their programs, leaving students and researchers unaware of new approaches. Participation in data-driven AW studies is therefore limited, for two reasons. There is a general lack of awareness of the usefulness of sharing data. There is also a misconception that publishing data sets requires advanced technical expertise or should only be done by those specializing in digital methods. This perspective overlooks the fact that data sharing need not be one’s main research focus but can be a natural product of a broader project. For instance, data sets compiled during textual or archaeological analyses can be cleaned, structured, and published alongside traditional research outputs (Farina, Marongiu, & Rodda, 2024), greatly enhancing their accessibility and reusability.

4.2 The Benefits of Adopting a Data-Driven Approach in AW Studies

Building and using data sets in AW studies offers numerous advantages, transforming the ways in which scholars approach their research. First, data sets enable more systematic analyses, allowing AW researchers to uncover patterns, trends, and connections that might remain invisible through traditional qualitative methods. This systematic approach fosters a deeper and more comprehensive understanding of AW-related phenomena. Digitizing AW data is particularly relevant in this context, as it makes it easier for researchers to access, share, and use materials. Moreover, digitization helps mitigate issues of data sparsity and redundancy, ensuring that resources are not only preserved but also effectively used by a wider community. The creation and sharing of data sets play a pivotal role in fostering interdisciplinary research. They give AW scholars opportunities to collaborate across disciplines and bring fresh perspectives and methodologies to the study of the AW. When data are structured in accessible formats and made openly available, scholars from various fields can engage with AW data, facilitating the exchange of ideas and methods. Interdisciplinary collaboration is increasingly encouraged by universities, funders, and research groups. It benefits AW scholars by introducing new tools and techniques and by showing how their work might intersect with other domains. Importantly, AW scholars are not expected to shoulder the entire burden of acquiring technical skills in data science and DH. Many successful projects in AW studies have received substantial funding precisely because they incorporate interdisciplinary teams, bringing together domain experts and specialists in computer science and DH.
Several projects exemplify how collaboration between AW scholars and technical specialists can effectively manage the complexities of creating, sustaining, and analyzing data sets or integrating them with AI. These include the ERC-funded project LiLa (Passarotti et al., 2020), the SNSF-funded project WoPoss (Dell’Oro, 2019–2025), and the project Ithaca, funded by the EU Horizon 2020 (Assael et al., 2022). Such partnerships allow AW scholars to focus on their subject expertise while benefiting from the technical proficiency of DH and computer science professionals. As a result, data sets are not only created but also maintained and enhanced with the latest methodologies. For example, partnerships with DH or computer science specialists can yield advanced computational tools, such as taggers and parsers for ancient languages, which are essential for scaling up research and analyzing vast corpora of ancient texts. This is particularly critical in areas like papyrology and Assyriology, where enormous quantities of texts remain unstudied (Cobanoglu et al., 2024). Furthermore, making data sets reusable and reproducible enhances the reliability and replicability of findings. It also supports the development of resources that amplify the reach and impact of AW scholarship. Publishing data sets also offers significant benefits to AW specialists, especially those with limited experience in this area. Data set publication typically follows a structured layout, providing a clear framework for organizing and presenting data in line with best practices. The process also fosters professional growth by exposing scholars to constructive feedback from peer reviewers, a practice traditionally reserved for research articles. Applying traditional peer review to data sets strengthens the resource itself and enhances its usability and reusability, while also promoting collaboration and networking. A critical step toward fostering interdisciplinary collaboration is sharing one’s data.
Publishing a data set in open-access repositories such as Zenodo or Figshare ensures it is freely available and subject to scrutiny by experts, enriching its value to the academic community. For even greater impact, scholars are encouraged to publish an accompanying data paper in conjunction with an open repository (Section 5).

4.3 Encouraging a Broader Participation in Data-Driven Approaches to AW Studies

Some of the limitations discussed in Section 4.1 can be offset, to a large degree, by encouraging broader participation in data-driven AW studies. This can be done by presenting examples of data sharing and showing the benefits of this approach through organized events. One such initiative was the workshop “Data Driven Classics: Exploring the Power of Shared Datasets,” organized by Andrea Farina and George Oliver (King’s College London) in July 2024. It brought together PhD students and early career researchers interested in learning more about creating and sharing data sets. Such events introduce participants to practical approaches to data set creation. They also provide hands-on training in data conceptualization and collection, interoperability standards, and data publishing. Communication among scholars is essential to the process of data sharing. Other events, such as the “Data in Historical Linguistics Seminar Series,” launched by Andrea Farina and Mathilde Bru (King’s College London) in 2023–2024, have fostered research collaborations and partnerships. They have also spread awareness of the kinds of data being collected within historical linguistics. Promoting greater participation requires consistent and systematic efforts to engage scholars at various stages of their careers, from students to senior researchers. Initiatives like DARIAH-EU, which offers resources and training for DH, are pivotal in equipping AW specialists with the tools and methodologies necessary for successful data set creation and publication. Encouraging institutions to incorporate such training into their curricula could ensure that emerging scholars are better prepared to engage with data-driven research from the outset of their careers.
Academic working groups such as that of Pleiades for ancient geographical data (NYU’s Institute for the Study of the Ancient World; cf. Elliott, 2021) are vital for advancing shared standards in data set creation and interoperability. Participating in collaborative efforts – such as contributing to repositories or establishing discipline-specific metadata standards – can empower scholars to make their data sets more widely usable and integrated with larger research projects. As these collaborations grow, they create a feedback loop in which best practices evolve and become embedded within the discipline, ultimately broadening the impact of data-driven approaches in AW studies. In addition to academic events and working groups, data journals are increasingly being considered as a means of promoting data-driven approaches in the field. For example, the Journal of Open Humanities Data’s special collection “Representing the Ancient World through Data” (Burns et al., 2023–2024) attracted considerable interest within AW studies (Farina et al., 2024). The data sets it features range from Latin literature and archaeological data from Roman rural landscapes to historical linguistics and cuneiform studies. For many contributors, this was their first time publishing a data set and an accompanying data paper. Bru (2023a, b), for instance, presented a data set on the average length of Greek words from the 5th century BCE to the 2nd century CE, demonstrating an increase in average word length over time. In line with the journal guidelines, the data set was published in the Harvard Dataverse repository. As a direct result of publishing her data set, the author collaborated with other linguists and received invaluable feedback from peer reviewers. The reviewers notably suggested presenting the data visually through boxplots and performing further statistical tests to strengthen the validity of her conclusion. The call for papers for this collection made it clear what constituted a data set and gave helpful instructions to authors publishing one for the first time.
By providing detailed submission guidelines and peer review processes, these journals also play an educational role, ensuring that contributors develop essential skills in organizing, documenting, and structuring their data.
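The following minimal Python sketch shows what a measure such as average word length involves computationally. It is not necessarily the procedure used by Bru (2023a, b). It assumes that a word is a whitespace-separated token and ignores punctuation. It then counts letters after decomposing accented Greek characters with Unicode normalization (NFD), so that accents and breathings are not counted. The two short samples are placeholders, not a corpus.

```python
import unicodedata
from statistics import mean

def letter_count(word):
    """Count letters only: decompose accented characters (NFD), skip combining marks and punctuation."""
    decomposed = unicodedata.normalize("NFD", word)
    return sum(1 for ch in decomposed if unicodedata.category(ch).startswith("L"))

def word_lengths(text):
    """Letter counts for each whitespace-separated word, dropping punctuation-only tokens."""
    return [n for n in (letter_count(w) for w in text.split()) if n > 0]

def mean_word_length(text):
    """Average number of letters per word in a text."""
    return mean(word_lengths(text))

# Placeholder samples keyed by period; a real study would use whole corpora per period.
samples = {"period 1": "ὁ λόγος", "period 2": "ἐν ἀρχῇ ἦν ὁ λόγος"}
for period, text in samples.items():
    print(period, round(mean_word_length(text), 2))
```

The per-period lists returned by `word_lengths`, rather than the single averages, are what a boxplot or a follow-up statistical test would take as input.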

5 Data Papers for AW Studies: A Bridge Between Tradition and Innovation

So far, we have explored the role of data in AW studies, its collection, and the benefits it offers. However, many AW scholars are uncertain about what to do with the data sets they have gathered. Since data sets are a vital component of research, one potential solution to this challenge is to publish a data paper. This would highlight the importance of the data itself. It would also benefit the broader academic community by enhancing the visibility and accessibility of valuable data sets.

As demonstrated by McGillivray et al. (2022a), data papers are highly effective in enhancing a data set’s scope and reusability, which has made them increasingly popular in AW studies. Data papers have been a well-established academic genre in STEM fields for over 25 years, particularly in areas like the life and medical sciences (Schöpfel, Farace, Prost, & Zane, 2019). They are now gaining traction in the Humanities and AW studies (García-García, López-Borrull, & Peset, 2015; Walters, 2020). The purpose of a data paper is to describe data sets compiled during research and enhance their discoverability, usability, and reproducibility. These papers provide a dedicated platform to highlight a data set’s importance and its potential applications. They offer a bridge for AW specialists unfamiliar with DH to share their data easily. Peer-reviewed data papers also invite valuable expert commentary, helping to fine-tune both the data set and its proposed applications. This approach has also proven impactful in fields like Assyriology, where data sets are essential for processing and sharing large volumes of information, making research more efficient and accessible. Sharing these resources, through both data sets and data papers, preserves knowledge. It also fosters collaboration, interdisciplinary engagement, and new interpretations, ultimately advancing AW studies. Publishing a data paper in journals such as the Journal of Open Humanities Data requires authors to be transparent about their data collection methods. Authors must also deposit the data set in an open-access repository, such as Zenodo or Figshare, before the data paper itself is published (Sections 4.2 and 4.3). A data paper is part of a virtuous cycle, interlinking the data set it describes, the associated code, and the research results derived from it (Farina et al., 2024; McGillivray et al., 2022a).
The open repository’s data set description references the data paper, which, in turn, cites the data set. Research articles then cite both the data set and the data paper. This boosts data visibility and ensures that the data can be accessed, scrutinized, and referenced by other scholars. Publishing a data paper also generates metadata, such as the article’s title, authors, keywords, and ORCID identifiers. This metadata makes data sets more discoverable through academic search engines like eDiscovery or Google Scholar, increasing their visibility among scholars in classical studies and related fields. Additionally, publishing in a peer-reviewed journal gives scholars academic credit for their data collection efforts. It recognizes data sets as legitimate scholarly contributions and addresses gaps in traditional publication practices (Ruediger & MacDougall, 2024, p. 3). In AW studies and other fields, traditional research articles have typically received more academic recognition than data papers. The former discuss hypotheses and intellectual inquiry, while the latter focus on describing data (Rowley & Hartley, 2008; Schöpfel et al., 2019). However, data papers should be recognized as valuable contributions that align with FAIR practices and provide insights into the challenges of data set creation in any field (Schöpfel et al., 2019). This is particularly important in AW studies, where data collection, standardization, and sharing are complex, as discussed in Sections 3 and 4. Detailing the data collection process and sharing the adopted solutions with the community significantly enriches the field’s knowledge base. As discussed in this section, differences in academic recognition should therefore not deter the publication of data papers, which play an essential role in advancing the field.
A data paper also requires authors to consider how their data sets might be reused by the academic community in different fields, fostering collaboration and encouraging further research. For instance, Farina (2023b) describes a data set collecting occurrences of Ancient Greek and Latin words connected to the semantic field of the sea, analyzed across different linguistic parameters. Its potential for reuse, however, spans different disciplines: synchronic and diachronic linguistics, literary-geographical and anthropological studies, and cultural history. Journals publishing data papers contribute to the long-term preservation of data sets, alongside the repositories where the data are deposited, safeguarding them for future researchers. During the publication process, data sets undergo peer review, which helps ensure their quality, accuracy, and relevance. This scrutiny enhances trust among scholars, who can rely with greater confidence on the work and methodology of their colleagues. It also establishes a standard for data sharing within AW studies.
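The cross-referencing described in this section is usually recorded in machine-readable metadata. The following sketch builds a minimal record loosely modeled on the DataCite metadata schema. In that schema, the relatedIdentifiers property can express a link such as IsDocumentedBy, pointing from a data set to its data paper. The DOIs, title, creator, and ORCID are placeholders. The record is not a complete or validated DataCite submission. In practice, repositories such as Zenodo collect this information through their own deposit forms.

```python
import json

# Placeholders only: a real deposit receives its DOI from the repository.
DATASET_DOI = "10.0000/example-dataset"
PAPER_DOI = "10.0000/example-data-paper"

def dataset_record(dataset_doi, title, creators, keywords, paper_doi):
    """Build a minimal, DataCite-style metadata record that points to the data paper."""
    return {
        "doi": dataset_doi,
        "titles": [{"title": title}],
        "creators": [
            {"name": name,
             "nameIdentifiers": [{"nameIdentifier": orcid, "nameIdentifierScheme": "ORCID"}]}
            for name, orcid in creators
        ],
        "subjects": [{"subject": kw} for kw in keywords],
        "relatedIdentifiers": [
            # Data set -> data paper; the paper, in turn, cites the data set's DOI.
            {"relatedIdentifier": paper_doi, "relatedIdentifierType": "DOI",
             "relationType": "IsDocumentedBy"},
        ],
    }

record = dataset_record(
    dataset_doi=DATASET_DOI,
    title="Example AW data set",
    creators=[("Surname, Given", "https://orcid.org/0000-0000-0000-0000")],  # placeholder ORCID
    keywords=["Ancient World", "data paper"],
    paper_doi=PAPER_DOI,
)
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Because the link is stored as structured metadata rather than only as a sentence in the paper, search services that harvest repository records can follow it automatically.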

6 Looking Ahead: The Future of Data-Driven Approaches to AW Studies

As we have explored so far, the creation and dissemination of data sets have become central to the evolution of AW studies. The increasing availability of digitized resources, alongside new technologies and interdisciplinary collaborations, is expected to reshape research methodologies and enhance the accessibility of AW studies. This final section considers future directions in how data sets are created, shared, and used in AW studies.

The continued digitization of ancient texts and artifacts will provide scholars with unprecedented access to diverse data sets. Ancient texts are increasingly being digitized and made available in interoperable formats, allowing researchers to conduct large-scale corpus analyses. Assyriology provides a compelling example in this respect. Platforms like the Electronic Babylonian Library (eBL, 2023) are revolutionizing the study of Babylonian texts by making scholarly compositions on science, mathematics, and astronomy widely accessible. Such platforms also integrate artificial intelligence tools to assist in deciphering cuneiform texts (for the integration of technology/AI and ancient texts, see, e.g., Assael et al., 2022; Brusuelas, 2021; Chapman et al., 2021a; Chapman, Parker, Parsons, & Seales, 2021b; Parker et al., 2019; Sommerschield, 2020; Vesuvius Challenge, 2024), helping scholars explore the multiple meanings of signs and words. Beyond textual data, data sets encompassing 3D scans of artifacts, coins, or tablets, as well as digital editions of manuscripts, are emerging as invaluable resources (e.g., Palladino & Bodard, 2023), enabling detailed studies of material culture and further fostering interdisciplinary approaches. These resources not only broaden the scope of research but also make data sets more accessible to AW scholars.

While linguistic research has probably been at the forefront of data-driven approaches to AW studies, especially in Classics, other subfields are beginning to embrace these methods. For instance, historical data on trade routes, migration patterns, and economic systems could be integrated into geospatial models to visualize and analyze ancient networks (e.g., Holleran, 2021). This integration of data sets into more traditional research frameworks will likely dissolve the division between qualitative and quantitative methodologies in AW studies. Scholars in areas such as linguistics, history, and archaeology are increasingly adopting computational tools for data analytics while maintaining the interpretive richness that characterizes AW studies. Structured linguistic data sets, for example, can complement traditional close readings, creating a more holistic approach to ancient texts and contexts. Such integration underscores the dual role of data sets as both research tools and shared resources that can be reused across disciplines.
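
Analyzing an ancient network typically reduces to graph queries such as finding the cheapest route between two places. The sketch below runs Dijkstra’s shortest-path algorithm on a toy network; the places are real, but the `NETWORK` connections and travel costs are invented for illustration and are not taken from ORBIS, Holleran (2021), or any other data set.

```python
import heapq

# Hypothetical travel costs (in days) between places: invented for illustration,
# not drawn from any published geospatial model.
NETWORK = {
    "Roma": {"Ostia": 1, "Capua": 5},
    "Ostia": {"Roma": 1, "Carthago": 6},
    "Capua": {"Roma": 5, "Brundisium": 8},
    "Brundisium": {"Capua": 8, "Dyrrachium": 2},
    "Carthago": {"Ostia": 6},
    "Dyrrachium": {"Brundisium": 2},
}


def cheapest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path) for the cheapest route."""
    queue = [(0, start, [start])]  # priority queue ordered by accumulated cost
    settled = set()                # places whose cheapest cost is already final
    while queue:
        cost, place, path = heapq.heappop(queue)
        if place == goal:
            return cost, path
        if place in settled:
            continue
        settled.add(place)
        for neighbour, step in graph[place].items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + step, neighbour, path + [neighbour]))
    return float("inf"), []  # goal unreachable from start


print(cheapest_route(NETWORK, "Ostia", "Dyrrachium"))
# (16, ['Ostia', 'Roma', 'Capua', 'Brundisium', 'Dyrrachium'])
```

In a real model the edge costs would come from documented sources (itineraries, sailing times, terrain), and the choice of cost (time, distance, or price) shapes the interpretation as much as the algorithm does.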

Data sets will also play a pivotal role in democratizing access to AW studies. By making curated data sets freely available and easy to use, scholars can engage wider audiences and enhance educational practices. For instance, data sets containing interactive 3D models of artifacts may be used in high schools and universities, offering students hands-on experiences with primary sources. Intuitive tools for exploring data sets could also foster greater public interest, enabling non-specialists to engage with the ancient world in meaningful ways.

7 Conclusion

In this article, we have explored the challenges and opportunities presented by adopting data-driven approaches in AW studies, and we have presented best practices for data collection, sharing, and reuse in the field. As explained in Section 1, the gradual yet successful integration of quantitative and computational methods into the Humanities has also influenced AW studies, leading to a growing number of journals, special collections, and events dedicated to this scholarly community. The success of the Journal of Open Humanities Data’s special collection “Representing the Ancient World through Data” demonstrates the community’s increasing interest in this methodological approach to the discipline (Farina et al., 2024).

Our starting point was to highlight the complexities involved in defining, collecting, curating, and analyzing data within AW studies. The diverse and heterogeneous nature of AW data poses significant challenges in terms of standardization, fragmentation, and accessibility. The scarcity and fragmentation of data, together with the difficulty of ensuring interoperability between different data sets, complicate efforts to conduct large-scale analyses and create reusable resources. Additionally, the absence of widely shared standards for data encoding and the reluctance to define research materials as “data” in AW studies and, more widely, in the Humanities might hinder progress in adopting data-driven approaches in the field. Nonetheless, we have pointed to valuable solutions to overcome these challenges. AW scholars can benefit from embracing standardized formats, such as TEI-XML and RDF, and promoting adherence to FAIR principles for data collection and sharing. These practices enhance data accessibility, facilitate collaboration, and foster the reuse of data sets, thus contributing to the broader advancement of AW studies. Interdisciplinary collaboration and the adoption of common data standards can also help harmonize data sets, making them more interoperable and suitable for large-scale research. Employing a more structured approach to data collection and curation may transform the work of AW specialists into a more impactful resource for the academic community. It may also ensure that their findings are not only preserved, but also accessible for future research across disciplines.
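
Standardized formats such as TEI-XML and simpler tabular formats such as CSV can work together: a TEI edition can be flattened into rows for quantitative analysis without discarding the encoded source. The following is a minimal sketch using only Python’s standard library. It assumes a hypothetical TEI fragment in which each word is wrapped in a `<w>` element carrying `@lemma` and `@pos` attributes; real projects use richer markup, and `tei_words_to_csv` is an illustrative helper, not part of any TEI tool.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The TEI P5 namespace, bound to the prefix used in the search path below.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI-XML fragment: each word is a <w> with @lemma and @pos.
TEI_SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <w lemma="mare" pos="NOUN">mare</w>
    <w lemma="magnus" pos="ADJ">magnum</w>
  </p></body></text>
</TEI>"""


def tei_words_to_csv(tei_string: str) -> str:
    """Flatten every <w> element into a CSV row of form, lemma, and pos."""
    root = ET.fromstring(tei_string)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["form", "lemma", "pos"])
    for w in root.iterfind(".//tei:w", TEI_NS):
        writer.writerow([w.text, w.get("lemma"), w.get("pos")])
    return buffer.getvalue()


print(tei_words_to_csv(TEI_SAMPLE), end="")
# form,lemma,pos
# mare,mare,NOUN
# magnum,magnus,ADJ
```

The TEI file remains the authoritative record; the CSV is a derived, lossy view suited to counting and statistical analysis.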

We then discussed the key difficulties scholars face, such as the risk of oversimplification when reducing complex phenomena to structured data sets, the loss of nuanced interpretations, and the lack of technical knowledge among AW specialists, which usually results in poor interoperability between data sets. At the same time, we have demonstrated the significant benefits of data-driven approaches, emphasizing how they enable systematic analyses, facilitate interdisciplinary collaboration, and expand access to research materials. Solutions such as the double annotation of ambiguous data help preserve complexity while maintaining systematic rigor. Structured data sets in interoperable formats, such as CSV files, allow for greater accessibility, integration, and reusability across disciplines. Furthermore, training initiatives, workshops, and academic collaborations have proven instrumental in encouraging broader participation and equipping AW scholars with the skills needed for data-driven research.
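
Double annotation can be kept machine-readable inside a plain CSV file. The sketch below assumes a hypothetical convention in which an ambiguous occurrence carries two labels in the `sense` column, separated by `|`; the rows are invented for illustration and do not come from any published data set. Each reading is counted, and the number of double-annotated occurrences is reported separately so that analyses can be run with or without them.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV: ambiguous occurrences carry two labels separated by "|"
# (an assumed convention, not a standard).
SAMPLE = """id,form,sense
1,pontus,sea
2,pontus,sea|Black Sea
3,aequor,sea|plain
4,mare,sea
"""


def count_senses(csv_text: str):
    """Count every annotated label and the number of double-annotated rows."""
    labels = Counter()
    ambiguous = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        senses = row["sense"].split("|")
        if len(senses) > 1:
            ambiguous += 1
        labels.update(senses)  # each reading of an ambiguous item is counted
    return labels, ambiguous


labels, ambiguous = count_senses(SAMPLE)
print(labels["sea"], ambiguous)  # 4 2
```

Counting both readings preserves the ambiguity instead of forcing a single choice, at the cost of label totals exceeding the number of occurrences; reporting `ambiguous` alongside the counts keeps that trade-off visible.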

The rise of data papers in AW studies represents a crucial step in bridging the gap between traditional scholarship and the growing adoption of digital methodologies in the field. Historically, AW scholars have focused on the close, interpretive analysis of primary sources, with data sets often overlooked or treated as secondary to the findings. However, recognizing the importance of the data sets themselves, rather than just the conclusions drawn from them, has become increasingly essential. Data papers provide a platform for scholars to publish and share their data sets, ensuring their discoverability and reproducibility. They also highlight the reuse potential of these data sets, promoting their wider accessibility and enabling further research and collaboration across disciplines. By emphasizing reuse potential, data papers encourage scholars to consider how their data sets might contribute to broader academic conversations, supporting the development of new hypotheses, methodologies, and comparative studies. This not only enhances the impact of the original research but also fosters a more collaborative and open scholarly environment.

Nevertheless, substantial work remains to be done in standardizing data formats and improving interoperability across the field. Overcoming these obstacles will require a concerted effort from scholars, institutions, and funding bodies to promote open data practices and to ensure that the data sets generated within AW studies are treated as valuable, stand-alone scholarly contributions. Existing initiatives such as Papyri.info (Sosin, 2010), which employs TEI-XML standards for the collaborative editing of papyrological resources; the Pelagios Network (Kahn et al., 2021; Simon, Barker, & Isaksen, 2012; Simon, Isaksen, Barker, & de Soto Cañamares, 2016; Vitale et al., 2021), which leverages Linked Open Data for the spatial analysis of the AW; and Open Context (Kansa & Kansa, 2007, 2022; Kansa, Kansa, & Arbuckle, 2014), which facilitates the structured sharing of archaeological data, offer valuable models of best practice in data sharing and reuse for the AW. Following such examples, AW studies may expand the scope and scale of data-driven projects, for instance by developing more comprehensive cross-disciplinary databases, fostering global collaborations to harmonize diverse data sets, and integrating emerging technologies such as machine learning and artificial intelligence to enhance data analysis and interpretation. Investing in training programs and infrastructures that support open science principles will be crucial for equipping AW scholars with the technical skills necessary to navigate this evolving landscape. Future work should also prioritize the creation of flexible frameworks that accommodate the complexity and diversity of AW data, ensuring that nuanced interpretations are preserved while promoting interoperability. In this way, AW studies can embrace the full potential of data-driven research, ensuring that the discipline evolves and remains relevant in an increasingly digital academic landscape.

Ultimately, by fostering a culture of openness, collaboration, and reuse, AW studies can continue to grow, with data sets playing a central role in shaping the future of the discipline.

Funding information: The authors state no funding involved.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission. AF conceptualized and supervised the writing of this article. He wrote Sections 1, 2, 6, and 7 of the original draft, and reviewed and contributed to Sections 3–5. PM wrote Section 3 of the original draft and contributed to Sections 4 and 5. MB wrote Section 4 of the original draft. DB wrote Section 5 of the original draft and contributed to Section 4.
Conflict of interest: The authors state no conflict of interest.

References

Allen, R., & Hartland, D. (2018). FAIR in practice – Jisc report on the findable accessible interoperable and reusable data principles. Geneva, Switzerland: Zenodo.

Almas, B., & Beaulieu, M. C. (2013). Developing a new integrated editing platform for source documents in classics. Literary and Linguistic Computing, 28(4), 493–503. doi: 10.1093/llc/fqt046.

Amato, G., Casarosa, V., Martineau, P., Orlandi, S., Santucci, R., & Giberti, L. M. C. (2014). EAGLE - Europeana Network of Ancient Greek and Latin Epigraphy, A Digital Bridge to the Ancient World. In P. Ronzino & F. Niccolucci (Eds.), Proceedings of the Workshop on Horizon2020 and Creative Europe vs Digital Heritage: A European Projects Crossover (pp. 25–32).

Assael, Y., Sommerschield, T., Shillingford, B., Bordbar, M., Pavlopoulos, J., Chatzipanagiotou, M., … de Freitas, N. (2022). Restoring and attributing ancient texts using deep neural networks. Nature, 603(7900), 280–283. doi: 10.1038/s41586-022-04448-z.

Bamman, D., & Crane, G. (2006). The design and use of a Latin dependency treebank. Proceedings of the Fifth Workshop on Treebanks and Linguistic Theories (TLT2006) (pp. 67–78).

Baraz, Y. (2007). Revelations of lexicography: The daily learning at the thesaurus. Transactions of the American Philological Association, 137(2), 497–501. doi: 10.1353/apa.2008.0001.

Blanke, T., Bryant, M., Hedges, M., Aschenbrenner, A., & Priddy, M. (2011). Preparing DARIAH. In IEEE 7th International Conference on E-Science (e-Science), 2011 (pp. 158–165). doi: 10.1109/eScience.2011.30.

Branco, A., Eskevich, M., Frontini, F., Hajič, J., Hinrichs, E., de Jong, F., … Zinn, C. (2023). The CLARIN infrastructure as an interoperable language technology platform for SSH and beyond. Language Resources and Evaluation, 1–32. doi: 10.1007/s10579-023-09658-z.

British Library. (2024). Learning lessons from the cyber-attack. British Library cyber incident review. https://www.bl.uk/home/british-library-cyber-incident-review-8-march-2024.pdf/.

Bru, M. (2023a). Word Lengths in Classical and Post-classical Greek. Dataset. doi: 10.7910/DVN/HKP1VU.

Bru, M. (2023b). Word Lengths in Classical and Post-classical Greek. Journal of Open Humanities Data, 9(19), 1–6. doi: 10.5334/johd.121.

Brunner, T. F. (1993). Classics and the computer: The history. In J. Solomon (Ed.), Accessing antiquity: The computerization of classical databases (pp. 10–33). Tucson: University of Arizona Press.

Brusuelas, J. H. (2021). Scholarly editing and AI: Machine predicted text and Herculaneum papyri. Magazén, 2(1), 45–70. doi: 10.30687/mag/2724-3923/2021/03/002.

Buchholz, S., & Marsi, E. (2006). CoNLL-X shared task on multilingual dependency parsing. Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X) (pp. 149–164). doi: 10.3115/1596276.1596305.

Burns, P., Farina, A., Marongiu, P., & Rodda, M. A. (Eds.). (2023–2024). Representing the Ancient World through Data. Journal of Open Humanities Data, special collection, 9–10.

Busa, R. (1974–1980). Index Thomisticus. Stuttgart-Bad Cannstatt: Frommann-Holzboog.

Cecchini, F., Sprugnoli, R., Moretti, G., & Passarotti, M. (2020). Udante: First steps towards the universal dependencies treebank of Dante’s Latin works. Proceedings of the Seventh Italian Conference on Computational Linguistics (pp. 99–105). Accademia University Press. doi: 10.4000/books.aaccademia.8653.

Celano, G. G. A. (2024). Opera Graeca Adnotata: Building a 34M + Token Multilayer Corpus for Ancient Greek. arXiv. doi: 10.48550/ARXIV.2404.00739.

Chapman, C. Y., Parker, C. S., Bertelsman, A., Gessel, K., Hatch, H., Seevers, K., … Seales, W. B. (2021a). The Digital Compilation and Restoration of Herculaneum Fragment P.Herc.118. Manuscript Studies: A Journal of the Schoenberg Institute for Manuscript Studies, 6(1), 1–32. doi: 10.1353/mns.2021.0000.

Chapman, C. Y., Parker, C. S., Parsons, S., & Seales, W. B. (2021b). Using METS to express digital provenance for complex digital objects. Metadata and Semantic Research, 1355, 143–154. doi: 10.1007/978-3-030-71903-6_15.

Cobanoglu, Y., Sáenz, L., Khait, I., & Jiménez, E. (2024). Sign detection for cuneiform tablets. it – Information Technology, 66(1), 28–38. doi: 10.1515/itit-2024-0028.

Crane, G. (1987). From the old to the new: Integrating hypertext into traditional scholarship. In HYPERTEXT ’87: Proceedings of the ACM Conference on Hypertext (pp. 51–57). New York, NY, USA: ACM Press. doi: 10.1145/317426.317432.

Crane, G. (1991). Generating and parsing classical Greek. Literary and Linguistic Computing, 6(4), 243–245. doi: 10.1093/llc/6.4.243.

Crane, G. (2004). Classics and the computer: An end of the history. In S. Schreibman, R. Siemens, & J. Unsworth (Eds.), A companion to digital humanities. Oxford: Blackwell. doi: 10.1002/9780470999875.ch4.

Crane, G., Bamman, D., Cerrato, L., Jones, A., Mimno, D., Packel, A., … Weaver, G. (2006). Beyond digital incunabula: Modeling the next generation of digital libraries. Proceedings of the 10th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2006), Alicante (Spain) (pp. 341–352). doi: 10.1007/11863878_30.

Cummings, G. (2018). A world of difference: Myths and misconceptions about the TEI. Digital Scholarship in the Humanities, 34(1), 58–79. doi: 10.1093/llc/fqy071.

Dell’Oro, F. (2019–2025). WoPoss. A corpus to analyse the evolution of modality in the diachrony of the Latin language. Swiss National Science Foundation. https://woposs.unine.ch/.

Dell’Oro, F. (2023). WoPoss guidelines for the annotation of modality (Version 4). Zenodo. doi: 10.5281/zenodo.10427053.

Develaki, M. (2020). Comparing crosscutting practices in STEM disciplines: Modeling and reasoning in mathematics, science, and engineering. Science & Education, 29, 949–979. doi: 10.1007/s11191-020-00147-1.

Digital Research Infrastructure for the Arts and Humanities. (2024). DARIAH-EU Annual Report 2023. Zenodo. doi: 10.5281/zenodo.14007767.

eBL. (2023). On the Launch of the Electronic Babylonian Library Platform. Workshop at the Ludwig-Maximilians Universität München, 3 February 2023.

Elliott, T. (2021). The Pleiadic Gaze: Looking at Archaeology from the Perspective of a Digital Gazetteer. Classical Archaeology in the Digital Age – The AIAC Presidential Panel. doi: 10.11588/PROPYLAEUM.708.C10612.

Farina, A. (2023a). 25 + SEA words morpho-semantically annotated in Ancient Greek and Latin. King’s College London. Dataset. doi: 10.18742/23968773.v1.

Farina, A. (2023b). Lost at Sea: A Dataset of 25 + SEA words morpho-semantically annotated in Ancient Greek and Latin. Journal of Open Humanities Data, 9(24), 1–7. doi: 10.5334/johd.139.

Farina, A., Marongiu, P., & Rodda, M. A. (2024). Editorial: Representing the ancient world through data. Journal of Open Humanities Data, 10(57), 1–6. doi: 10.5334/johd.245.

Finkel, I. L., & Taylor, J. (2015). Cuneiform. London: The British Museum.

Gamba, F., & Zeman, D. (2023). Universalising Latin Universal Dependencies: A harmonisation of Latin treebanks in UD. Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023) (pp. 7–16).

García-García, A., López-Borrull, A., & Peset, F. (2015). Data journals: Eclosión de nuevas revistas especializadas en datos. El profesional de la información, 24(6), 845–854. doi: 10.3145/epi.2015.nov.17.

Given, L. M. (2008). Humanities, qualitative research in. In The SAGE Encyclopedia of Qualitative Research Methods (pp. 402–407). Los Angeles: SAGE Publications, Inc. doi: 10.4135/9781412963909.

Haug, D. T., & Jøhndal, M. (2008). Creating a parallel treebank of the old Indo-European Bible translations. In Proceedings of the second workshop on language technology for cultural heritage data (LaTeCH 2008) (pp. 27–34).

Hobbs, R. (2003). Power of public: The Portable Antiquities Scheme and regional museums in England and Wales. Proceedings of the 8th Meeting of the International Committee of Money and Banking Museums (ICOMON), Barcelona, 2001 (pp. 116–125).

Holleran, C. (2021). Mapping migration in Roman Iberia. https://mappingromanmigration.exeter.ac.uk/index.html.

Jakubíček, M., Kilgarriff, A., Kovář, V., Rychlý, P., & Suchomel, V. (2013). The TenTen corpus family. In 7th International Corpus Linguistics Conference CL (pp. 125–127). doi: 10.1007/s40607-014-0009-9.

Johnson, K. P., Burns, P. J., Stewart, J., Cook, T., Besnier, C., & Mattingly, W. (2021). The Classical Language Toolkit: An NLP framework for pre-modern languages. Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations (pp. 20–29). doi: 10.18653/v1/2021.acl-demo.3.

Kahn, R., Isaksen, L., Barker, E., Simon, R., de Soto, P., & Vitale, V. (2021). Pelagios – Connecting Histories of Place. Part II: From Community to Association. International Journal of Humanities and Arts Computing, 15(1–2), 85–100. doi: 10.3366/ijhac.2021.0263.

Kalinin, N. A., & Skvortsov, N. A. (2023). Difficulties of FAIR Principles Implementation in Cross-Domain Research Infrastructures. Lobachevskii Journal of Mathematics, 44, 147–156. doi: 10.1134/S199508022301016X.

Kansa, E. C., & Kansa, S. W. (2007). Open Context: Collaborative data publication to bridge field research and museum collections. In J. Trant & D. Bearman (Eds.), International Cultural Heritage Informatics Meeting (ICHIM07): Proceedings. Toronto: Archives & Museum Informatics.

Kansa, E. C., & Kansa, S. W. (2022). Promoting data quality and reuse in archaeology through collaborative identifier practices. Proceedings of the National Academy of Sciences (PNAS), 119(43), 1–9. doi: 10.1073/pnas.2109313118.

Kansa, E. C., Kansa, S. W., & Arbuckle, B. (2014). Publishing and pushing: Mixing models for communicating research data in archaeology. International Journal of Digital Curation, 9(1), 57–70. doi: 10.2218/ijdc.v9i1.301.

Kintigh, K. (2006). The promise and challenge of archaeological data integration. American Antiquity, 71(3), 567–578. doi: 10.2307/40035365.

Korkiakangas, T. (2021). Late Latin Charter Treebank: Contents and annotation. Corpora, 16(2), 191–203. doi: 10.3366/cor.2021.0217.

Krebs, C. (2009). You say ‘putator’. The first word on the first day of a Latin lexicographer. The Times Literary Supplement, 6, 14–15.

Ma, R. (2024). Toward an open humanities data. Current states, challenges, and cases. In X. Wang, M. L. Zeng, J. Gao, & K. Zhao (Eds.), Intelligent Computing for Cultural Heritage. Global Achievements and China’s Innovations (pp. 3–24). London: Routledge. doi: 10.4324/9781032707211-2.

Mannocci, A., Casarosa, V., Manghi, P., & Zoppi, F. (2014). The Europeana network of ancient Greek and Latin epigraphy data infrastructure. In S. Closs, R. Studer, E. Garoufallou, & M. A. Sicilia (Eds.), Metadata and semantics research (Vol. 478, pp. 286–300). Communications in Computer and Information Science. Cham: Springer International Publishing. doi: 10.1007/978-3-319-13674-5_27.

Marongiu, P., Pedrazzini, N., Ribary, M., & McGillivray, B. (2025). Le Journal of Open Humanities Data: Enjeux et défis dans la publication de data papers pour les sciences humaines. In C. Kosmopoulos & J. Schöpfel (Eds.), Publier, Partager, Réutiliser les Données de la Recherche: Les Data Papers et Leurs enjeux. Villeneuve-d’Ascq, France: Presses Universitaires du Septentrion.

McDonough, J. (1959). Computers and classics. The Classical World, 53(2), 44–50. doi: 10.2307/4344244.

McGillivray, B., & Kilgarriff, A. (2013). Tools for historical corpus research, and a corpus of Latin. In P. Bennett, M. Durrell, S. Scheible, & R. J. Whitt (Eds.), New Methods in Historical Corpus Linguistics. Tübingen: Narr.

McGillivray, B., Kondakova, D., Burman, A., Dell’Oro, F., Bermúdez Sabel, H., Marongiu, P., & Márquez Cruz, M. (2022b). A new corpus annotation framework for Latin diachronic lexical semantics. Journal of Latin Linguistics, 21(1), 47–105. doi: 10.1515/joll-2022-2007.

McGillivray, B., Marongiu, P., Pedrazzini, N., Ribary, M., Wigdorowitz, M., & Zordan, E. (2022a). Deep impact: A study on the impact of data papers and datasets in the humanities and social sciences. Publications, 10(39), 1–40. doi: 10.3390/publications10040039.

Nivre, J., De Marneffe, M. C., Ginter, F., Goldberg, Y., Hajic, J., Manning, C. D., … Zeman, D. (2016). Universal dependencies v1: A multilingual treebank collection. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16) (pp. 1659–1666).

NSTA. (2011). Beginning a STEM Research Project. National Science Teaching Association. https://www.cusd80.com/cms/lib/AZ01001175/Centricity/Domain/9860/Chapter%201%20Introduction.pdf.

Oldman, D., Doerr, M., de Jong, G., Norton, B., & Wikman, T. (2014). Realizing lessons of the last 20 years: A manifesto for data provisioning & aggregation services for the digital humanities (a position paper). D-lib magazine, 20(7/8). doi: 10.1045/july2014-oldman.

Orlandi, S. (2016). Ancient inscriptions between citizens and scholars: The double soul of the EAGLE project. In G. Bodard & M. Romanello (Eds.), Digital classics outside the echo-chamber: Teaching, knowledge exchange & public engagement (pp. 205–221). London: Ubiquity Press. doi: 10.5334/bat.l.

Palladino, C., & Bodard, G. (Eds.). (2023). Can’t Touch This. Digital Approaches to Materiality in Cultural Heritage. London: Ubiquity Press. doi: 10.5334/bcv.

Pantelia, M. (2000). ‘Noûs, into chaos’: The creation of the Thesaurus of the Greek Language. International Journal of Lexicography, 13(1), 1–11. doi: 10.1093/ijl/13.1.1.

Parker, C. S., Parsons, S., Bandy, J., Chapman, C., Coppens, F., & Seales, W. B. (2019). From invisibility to readability: Recovering the Ink of Herculaneum. PLOS ONE, 14(5), 1–17. doi: 10.1371/journal.pone.0215775.

PAS (Portable Antiquities Scheme). (2012). Treasure and Portable Antiquities. Statistical Release. UK Government – Department for Culture, Media, and Sport. https://assets.publishing.service.gov.uk/media/5a79e58b40f0b670a80263af/Statistics_Release_Treasure_and_Portable_antiques_2010-2011.pdf.

Passarotti, M. C. (2011). Language resources. The state of the art of Latin and the Index Thomisticus treebank project. ALIENTO. Échanges sapientiels en Méditerranée (pp. 301–320).

Passarotti, M., Mambrini, F., Franzini, G., Cecchini, F. M., Litta, E., Moretti, G., … Sprugnoli, R. (2020). Interlinking through Lemmas. The Lexical Collection of the LiLa Knowledge Base of Linguistic Resources for Latin. In M. Passarotti (Ed.), Current Approaches in Latin Lemmatization, Studi e Saggi Linguistici (Vol. LVIII (1), pp. 177–212). doi: 10.4454/ssl.v58i1.277.

Poljak Bilić, L., & Posavec, K. (2024). FAIRness of research data in the European humanities landscape. Publications, 12(6). doi: 10.3390/publications12010006.

Revellio, M. (2015). Classics and the Digital Age. Advantages and limitations of digital text analysis in classical philology. LitLingLab Pamphlet, 2, 1–8. https://kops.uni-konstanz.de/server/api/core/bitstreams/620defc4-effd-4224-bfb6-782e20748e01/content.

Romano, A. J. (2011). Classics and digital humanities. Expositions, 5(2), 142–146.

Rowley, J., & Hartley, R. (2008). Organizing knowledge. An introduction to managing access to information. Aldershot: Ashgate Publishing.

Ruediger, D., & MacDougall, R. (2024). Are the Humanities Ready for Data Sharing? Ithaka S+R. 6 March 2023. doi: 10.18665/sr.318526.

Scheidel, W., Meeks, E., & Weiland, J. (2012). ORBIS: The Stanford geospatial network model of the Roman World. Princeton/Stanford Working Papers in Classics. https://orbis.stanford.edu/orbis2012/ORBIS_v1paper_20120501.pdf.

Schöpfel, J., Farace, D. J., Prost, H., & Zane, A. (2019). Data papers as a new form of knowledge organization in the field of research data. 12ème Colloque international d’ISKO-France: Données et mégadonnées ouvertes en SHS: De nouveaux enjeux pour l’état et l’organisation des connaissances?. Montpellier, France: ISKO France. https://shs.hal.science/halshs-02284548v1.

Simon, R., Barker, E., & Isaksen, L. (2012). Exploring Pelagios: A visual browser for geo-tagged datasets. International Workshop on Supporting Users’ Exploration of Digital Libraries, Paphos (Cyprus) (pp. 1–6).

Simon, R., Isaksen, L., Barker, E., & de Soto Cañamares, P. (2016). The Pleiades gazetteer and the Pelagios project. In M. L. Berman, R. Mostern, & H. Southall (Eds.), Placing names: Enriching and integrating gazetteers (pp. 97–109). Bloomington/Indianapolis: Indiana University Press. doi: 10.2307/j.ctt2005zq7.

Sommerschield, T. (2020). Restoring ancient texts using Machine Learning: A case-study on Greek and Latin epigraphy. Papers of the British School at Rome, 88, 387–388. doi: 10.1017/S0068246220000240.

Sosin, J. D. (2010). Digital papyrology: A new platform for collaborative control of DDbDP, HGV, and APIS Data. In 26th International Congress of Papyrology, Geneva, August 11.

Sprugnoli, R., & Passarotti, M. (2024). Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024. Turin: ELRA and ICCL.

Tasovac, T., Romary, L., Tóth-Czifra, E., Ackermann, R. C., Alves, D., Chambers, … Viola, L. (2023). The Role of Research Infrastructures in the Research Assessment Reform: A DARIAH Position Paper. HAL Open Science. https://hal.science/hal-04136772v1.

TEI Consortium (Eds.). (2024). Guidelines for electronic text encoding and interchange. http://www.tei-c.org/P5/.

Tóth-Czifra, E. (2019). The risk of losing thick description: Data management challenges Arts and Humanities face in the evolving FAIR data ecosystem. In J. Edmond (Ed.), Digital Technology and the Practices of Humanities Research. Cambridge: Open Book Publishers. doi: 10.11647/OBP.0192.10.

Valiela, I. (2001). Doing science: Design, analysis, and communication of scientific research. New York: Oxford University Press. doi: 10.1093/oso/9780195079623.001.0001.

Vesuvius Challenge. (2024). AI reads ancient scroll buried by Vesuvius eruption. https://phys.org/news/2024-02-ai-ancient-scroll-vesuvius-eruption.html.

Vitale, V., de Soto, P., Simon, R., Barker, E., Isaksen, L., & Kahn, R. (2021). Pelagios – Connecting histories of place. Part I: Methods and tools. International Journal of Humanities and Arts Computing, 15(1–2), 5–32. doi: 10.3366/ijhac.2021.0260.

Walters, W. H. (2020). Data journals: Incentivizing data access and documentation within the scholarly communication system. Insights: The UKSG Journal, 33(1), 1–20. doi: 10.1629/uksg.510.

Wigdorowitz, M., Ribary, M., Farina, A., Lima, E., Borkowski, D., Marongiu, P., … McGillivray, B. (2024). It takes a village! Editorship, advocacy, and research in running an open access data journal. Publications, 12(3), 24. doi: 10.3390/publications12030024.

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., … Mons, B. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. doi: 10.1038/sdata.2016.18.

Worthington, M. (2020). Ea’s duplicity in the Gilgamesh flood story. London: Routledge. doi: 10.4324/9780429424274.

Received: 2024-12-11

Revised: 2025-02-04

Accepted: 2025-03-04

Published Online: 2025-03-28

This work is licensed under the Creative Commons Attribution 4.0 International License.


https://doi.org/10.1515/opis-2025-0014
