Book genre and author’s gender recognition based on titles

The example of the bibliographic corpus of microtexts
Adam Pawłowski, Elżbieta Herden and Tomasz Walkowiak
Publisher: John Benjamins Publishing Company
This chapter is in the book Language and Text

Abstract

This chapter applies automatic taxonomy methods to a corpus of microtexts consisting of book titles. We test two hypotheses. The first is that a book’s genre (writing species) can be recognized automatically from its title alone. The second is that the author’s gender can also be recognized from the book’s title. FastText and word2vec methods were applied. The analyses give a positive (and rather astonishing) result: with properly chosen n-grams, more than 70% of titles could be correctly assigned a genre, while the accuracy of author gender recognition was almost 80%. Both values significantly exceed chance level. The research was conducted on a corpus of titles derived from the Polish national bibliography.

Downloaded on 7.9.2025 from https://www.degruyterbrill.com/document/doi/10.1075/cilt.356.15paw/html