Book genre and author’s gender recognition based on titles

The example of the bibliographic corpus of microtexts
  • Adam Pawłowski, Elżbieta Herden and Tomasz Walkowiak
A chapter from the book Language and Text, published by John Benjamins Publishing Company

Abstract

This chapter applies automatic taxonomy methods to a corpus of microtexts consisting of book titles. We test two hypotheses. The first is that a book’s genre (writing species) can be recognized automatically from its title alone. The second is that the author’s gender can likewise be recognized from the title. FastText and word2vec methods were applied. The analyses give a positive (and rather astonishing) result: with properly chosen n-grams, more than 70% of titles could be assigned the correct writing species, while the accuracy of author gender recognition was almost 80%. Both values significantly exceed the level of random recognition. The research was conducted on a corpus of titles derived from the Polish national bibliography.
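
To make the role of n-grams concrete, the following is a minimal sketch of how a title might be turned into character n-gram features before classification. It assumes character n-grams with word-boundary marks `<` and `>`, in the style of the fastText subword model; the abstract does not say whether the chapter uses character or word n-grams, nor which lengths. The function names `char_ngrams` and `title_features` are hypothetical and are not taken from the chapter.

```python
def char_ngrams(word, n_min=3, n_max=3):
    """Return the character n-grams of `word` for every n in [n_min, n_max].

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    yield n-grams distinct from word-internal ones (an assumption
    borrowed from the fastText subword model).
    """
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def title_features(title, n_min=3, n_max=3):
    """Lowercase a title, split it on whitespace, and pool the n-grams of all words."""
    feats = []
    for word in title.lower().split():
        feats.extend(char_ngrams(word, n_min, n_max))
    return feats


if __name__ == "__main__":
    # Illustrative title only; it is not claimed to be part of the chapter's corpus.
    print(title_features("Pan Tadeusz"))
```

A classifier would then be trained on such feature lists paired with genre or gender labels. Which n-gram lengths count as "properly chosen" is an empirical question that this sketch does not answer.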

Downloaded on 8.9.2025 from https://www.degruyterbrill.com/document/doi/10.1075/cilt.356.15paw/html