Exploring Large-Scale Digital Archives – Opportunities and Limits to Use Unsupervised Machine Learning for the Extraction of Semantics
Seth van Hooland
Abstract
The current excitement about machine learning has spurred enthusiasm among collection holders and historians alike for relying on algorithms to reduce the manual labor required for the management and appraisal of large volumes of unstructured archival content. The Digital Humanities and commercial archival software promote out-of-the-box tools for auto-classification, but is adopting machine learning as straightforward as both the popular press and the Digital Humanities literature currently present it? This chapter brings a sense of pragmatism to the debate by giving an overview of both the possibilities and the limits of machine learning for extracting semantics from large collections of digitized textual archives. Two methods have gained substantial popularity: Topic Modeling (TM) and Word Embeddings (WE). This chapter introduces these unsupervised machine learning methods to the community of historians, based on an experimental case study of the digitized archival holdings of the European Commission (EC).
Chapters in this book
- Frontmatter I
- Contents V
- Introduction 1
Part 1: Historiography
- The Historiographical Foundations of Digital Public History 17
- Crowdsourcing and User Generated Content: The Raison d’Être of Digital Public History 35
- Sharing Authority in Online Collaborative Public History Practices 49
- Shifting the Balance of Power: Oral History and Public History in the Digital Era 61
- Digital Public Archaeology 77
- Identities – a historical look at online memory and identity issues 87
- Digital Environmental Humanities 97
- Combining Values of Museums and Digital Culture in Digital Public History 107
- Open Access: an opportunity to redesign scholarly communication in history 121
- Past and Present in Digital Public History 131
- Digital Hermeneutics: The Reflexive Turn in Digital Public History? 139
Part 2: Contexts
- Archivists as Peers in Digital Public History 149
- History Museums: Enhancing Audience Engagement through Digital Technologies 165
- Interactive Museum & Exhibitions in Digital Public History Projects and Practices: An Overview and the Unusual Case of M9 Museum 175
- Digital Public History in Libraries 185
- Publishing Public History in the Digital Age 199
- “Learning Public History by doing Public History” 211
- Spaces: What’s at Stake in Their Digital Public Histories? 223
- Digital Public History in the United States 235
- Technology and Historic Preservation: Documentation and Storytelling 243
- Social Media: Snapshots in Public History 259
Part 3: Best Practices
- Curation: Toward a New Ethic of Digital Public History 277
- Data Visualization for History 291
- Mapping and Maps in Digital and Public History 301
- Gaming and Digital Public History 309
- Individuals in the Crowd: Privacy, Online Participatory Curation, and the Public Historian as Private Citizen 317
- Building Communities, Reconciling Histories: Can We Make a More Honest History? 327
- Cybermemorials: Remembrance and Places of Memory in the Digital Age 337
- Living History: Performing the Past 349
- Activist Digital Public History 359
- Digital Public History: Family History and Genealogy 369
- Digital Personal Memories: The Archiving of the Self and Public History 377
- Planning with the Public: How to Co-develop Digital Public History Projects? 385
- As Seen through Smartphones: An Evolution of Historic Information Embedment 395
Part 4: Technology, Media, Data and Metadata
- What does it Meme? Public History in the Internet Memes Era 405
- Historical GIS 419
- Content Management 431
- Linked Open Data & Metadata 439
- Big Data and Public History 447
- Modeling Data Complexity in Public History and Cultural Heritage 459
- History and Video Games 475
- Historians as Digital Storytellers: The Digital Shift in Narrative Practices for Public Historians 485
- The Audiovisual Dimension & the Digital Turn in Public History Practices 495
- Digital Public History and Photography 505
- Exploring Large-Scale Digital Archives – Opportunities and Limits to Use Unsupervised Machine Learning for the Extraction of Semantics 517
- Infographics and Public History 531
- List of Contributors 545