John Benjamins Publishing Company

Chapter

Seek&Hide

Anonymising a French SMS corpus using natural language processing techniques

Abstract

This article presents Seek&Hide, a text message processing tool developed for the sud4science LR (http://www.sud4science.org/) project. It performs the anonymisation (de-identification) of a corpus. To date, it has been used to anonymise the sud4science LR corpus of French text messages collected during the project. Anonymisation proceeds in two phases. In the first phase, the system automatically processes over 70% of the corpus. In the second phase, the rest of the corpus is processed with the help of an expert annotator, via a web interface specifically designed to simplify the task.

Downloaded on 14.4.2026 from https://www.degruyterbrill.com/document/doi/10.1075/bct.61.03acc/html