Article Open Access

The evolutionary dynamics of grammatical gender in Torricelli languages

Jose A. Jódar-Sánchez and Marc Allassonnière-Tang
Published/Copyright: September 27, 2024

Abstract

Grammatical gender in New Guinea is an often neglected area in typological research, even though it is extremely diverse. For example, some languages in New Guinea have grammatical gender systems with two sex-based categories, more than four gender-indexing targets, and no gender marking on nouns, while others have systems with many more categories, which are only marginally sex-based. This paper infers the processes of development and change of grammatical gender in Torricelli languages from two perspectives. First, it synthesizes the data available in the existing literature and hypothesizes the evolutionary pathway of gender systems in Torricelli languages. Nineteen Torricelli languages are selected as a representative sample of the 55 Torricelli languages listed in Glottolog, within the limits of the available documentation. These languages are then coded for six presence-absence features relating to gender marking on verbs, adjectives, nouns, numerals, pronouns, and demonstratives. Second, it conducts an analysis with phylogenetic comparative methods to provide a quantitative assessment of the evolutionary possibilities for gender systems in Torricelli languages. The preliminary results show that gender is likely marked at the root of Torricelli languages, with pronouns and verbs at the core of the system. This is in agreement with trends reflecting the evolution of gender systems in languages across the world.

1 Introduction

Languages can rely on various strategies to categorize the nouns of their lexicon (Kemmerer 2017; Seifart 2010). One of the most common strategies is grammatical gender (Corbett 1991), in which each noun of the lexicon is assigned to a specific category, which can relate to sex, humanness, animacy, plants, fruits, and liquids, among others (Corbett 2013). For example, in Swahili, nouns are associated with more than ten categories, while in Catalan nouns are associated with only two categories, either masculine or feminine. One of the most common formal criteria for defining grammatical gender is grammatical agreement. Grammatical gender systems may have marking on the nouns themselves, as in Swahili, where nouns carry class prefixes, e.g., m-toto (class.1-child) ‘child’. Crucially, however, grammatical gender systems also trigger agreement on words associated with the noun and its referent. Grammatical gender can be marked on adjectives, verbs, demonstratives, and numerals, among others; these are called the targets of gender. Taking Catalan as an example, the masculine and feminine categories are reflected by agreement on, among others, demonstratives and adjectives, e.g., aquest cotxe groc (dem.prox.m.sg car yellow.m.sg) ‘this yellow car’ and aquest-a bicicleta grog-a (dem.prox-f.sg bike yellow-f.sg) ‘this yellow bike’.

It is generally agreed that languages may develop a grammatical gender system through different stages of grammaticalization (Aikhenvald 2016; Grinevald 2002), which is relevant for studies of the diachronic change of linguistic complexity (Wälchli et al. 2020). At the beginning, the classification of referents is based on lexical nouns, which can then develop into classificatory morphemes, which in turn can further grammaticalize and become agreement markers (Grinevald and Seifart 2004). However, little is known about the micro-steps of this grammaticalization process. For example, is grammatical gender marking more likely to develop first on nouns, or on pronouns or verbs? And how does it then spread to other morphosyntactic domains such as demonstratives and adjectives? Most diachronic quantitative studies have focused on the evolution of agreement marking in large language families such as Indo-European (Allassonnière-Tang and Dunn 2020; Carling and Cathcart 2021) and Atlantic (Rochant et al. 2022). The current study aims to broaden the scope of research on the evolution of grammatical gender marking by conducting phylogenetic analyses of gender marking across multiple morphosyntactic domains in Torricelli languages.

This paper contributes theoretically to the discussion about the development and evolution of grammatical gender systems. Our analysis of Torricelli languages suggests that their gender systems developed around marking on pronouns and verbs, with gender developing first on pronouns and then spreading to verbs. This corroborates claims that grammatical gender first develops in the noun phrase (Audring 2016) and then extends to other parts of the sentence. More generally, the Torricelli data support universal trends in the evolution of gender. This is at odds with the claim in Dunn et al. (2011) that the evolutionary trends of some grammatical features are family-specific rather than universal.

The Torricelli language family is interesting in and of itself. Grammatical gender in New Guinea is extremely diverse. For example, in New Guinea, it is common to find languages with two-gendered sex-based systems with semantic assignment, more than four gender-indexing targets, and no gender marking on nouns (Svärd 2019). As another example, the Arapesh languages, among which we have data for Abu’ Arapesh, Bukiyip, Bumbita Arapesh, and Mufian, have grammatical gender systems with an extensive number of categories, which are only marginally sex-based. For instance, the Mufian system has 17 categories, of which only 2 are sex-based: class 8 for women and class 16 for men (Alungum et al. 1978). Furthermore, the co-occurrence of several types of systems is also common. For example, Mian has both a gender and a classifier system (Corbett et al. 2017). The area is thus of high typological interest. However, according to Foley (2018: 297), “in general Torricelli languages remain poorly documented; they comprise perhaps the least documented largish language family in the world”. It is, therefore, important to look at how gender is deployed in these languages and how this contributes to what we know about gender across languages. Moreover, most of the languages in this family diverge from neighboring languages in various respects, including word order, the complexity of verb morphology, and grammatical gender marking.

There are various studies that look at language contact as an explanatory force in the transfer and acquisition of gender marking in languages of New Guinea, both Papuan and Austronesian ones (Schapper 2010; Terrill 2002; van den Berg 2015). In general, Austronesian languages are characterized as not having grammatical gender. Given this, it is interesting to set up a clear picture of what grammatical gender looks like and how it may have evolved in the Torricelli family so that future studies can correlate this with the evolution of other features in those languages and in neighboring ones. For instance, does the loss of grammatical gender in some Torricelli languages correlate with the loss of other morphological features? Can the loss or evolution of grammatical gender in some Torricelli languages be explained through contact with neighboring languages of the Lower Sepik-Ramu, Sepik, and Austronesian families?

The article is structured as follows. After this introduction, in Section 2, we lay out the data and research questions of the study. In Section 3, we present our analysis of the data based on a phylogenetic comparative analysis. Finally, in Section 4, we summarize the results of the analysis and suggest some directions for future research.

2 Data and research questions

The data used in this study comprises 19 of the 55 Torricelli languages listed in Glottolog (Hammarström et al. 2021), i.e., about 34.5 % of the total number of languages.[1] These 19 languages are Abu’ Arapesh, Au, Aro, Bragat, Bukiyip, Bumbita Arapesh, Eitiep, Heyo, Kombio, Minidien, Mol, Molmo One, Mufian, Northern One, Olo, Srenge, Urat, Walman, and Yeri (Figure 1). All available sources were consulted for data points. The sample includes languages from the Arapeshan and Kombioic (including Urat) branches, as well as from the Paleic, Wapeic, West Paleic, and West Wapeic branches.

Figure 1: 
A geographic overview (Kahle and Wickham 2013) of the 19 Torricelli languages included in the sample.

There are three sources of data. The first is grammars and dictionaries published mostly by SIL missionaries, but also by university linguists such as Drinfeld (2024), Pehrson et al. (2016), and Wilson (2017). Most of these are grammar sketches and lexicons; the only comprehensive sources are the Kombio dictionary (Farr 2018) and the Aro and Yeri grammars (Drinfeld 2024 and Wilson 2017, respectively). The second source is the field notes of various colleagues, including Matthew Dryer and Lea Brown’s notes, Andrey Drinfeld’s notes on Aro, and Thomas Diaz’s notes on Heyo. The third source is the first author’s field notes on Bragat, Eitiep, Minidien, Mol, and Srenge. It should be borne in mind that some of the data in this article is tentative. While it is easier to code data in a binary way (see below) than to specify the morphemes used to code gender or the types of referents each of those morphemes can refer to, in some cases doubts arose as to whether the little data available evinced the presence or absence of grammatical gender, or whether it was insufficient to determine either. For instance, the data on Elkei recently made available (Elgh and Persson 2024), based on a single elicitation session of a few hours, is so scarce and tentative that it provides no information on gender in different word classes.

For each of these 19 languages, six data points were obtained, namely whether they index grammatical gender in pronouns, verbs, demonstratives, adjectives, numerals, and nouns, as shown in Table 1. Each of these was coded in a binary way, either ‘yes’ if the word class in question codes gender in that language, or ‘no’ if it does not.[2] Note that, when we say that verbs bear gender marking, in the majority of cases we are referring to the pronominal affixes attached to them.

Table 1:

Gender-indexing targets in Torricelli languages.

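| Language | ISO 639-3 | Pronouns | Verbs | Demonstratives | Adjectives | Numerals | Nouns |
|---|---|---|---|---|---|---|---|
| Abu’ Arapesh | aah | Yes | Yes | Yes | Yes | Yes | Yes |
| Au | avt | Yes | Yes | No | Yes | Yes | No |
| Aro | tei | No | No | No | No | No | No |
| Bragat | aof | Yes | Yes | No | No | No | No |
| Bukiyip | ape | Yes | Yes | Yes | Yes | Yes | Yes |
| Bumbita Arapesh | aon | Yes | Yes | Yes | Yes | Yes | Yes |
| Eitiep | eit | Yes | No | No | No | No | No |
| Heyo | auk | Yes | Yes | Yes | Yes | Yes | Yes |
| Kombio | xbi | No | No | No | No | No | No |
| Minidien | wii | Yes | Yes | No | No | No | No |
| Mol | alx | Yes | Yes | No | Yes | Yes | No |
| Molmo One | aun | No | No | No | No | No | No |
| Mufian | aoj | Yes | Yes | Yes | Yes | Yes | Yes |
| Northern One | onr | No | No | No | No | No | No |
| Olo | ong | Yes | Yes | Yes | No | Yes | No |
| Srenge | lsr | Yes | Yes | No | Yes | No | Yes |
| Urat | urt | Yes | Yes | No | No | No | No |
| Walman | van | Yes | Yes | Yes | Yes | Yes | Yes |
| Yeri | yev | Yes | Yes | Yes | Yes | Yes | Yes |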
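For readers who want to reuse the coding, Table 1 can be represented as a presence/absence matrix. The sketch below is a minimal Python illustration, not the authors’ pipeline (their analyses were run in R); the names `WORD_CLASSES`, `TABLE_1`, and `counts_per_word_class` are hypothetical.

```python
"""Binary coding of Table 1 (gender-indexing targets in 19 Torricelli languages).

A minimal sketch; the data layout and names are illustrative only.
"""

WORD_CLASSES = ("pronouns", "verbs", "demonstratives", "adjectives", "numerals", "nouns")

# 1 = 'Yes' (the word class indexes gender), 0 = 'No', in WORD_CLASSES order.
TABLE_1 = {
    "Abu’ Arapesh":    (1, 1, 1, 1, 1, 1),
    "Au":              (1, 1, 0, 1, 1, 0),
    "Aro":             (0, 0, 0, 0, 0, 0),
    "Bragat":          (1, 1, 0, 0, 0, 0),
    "Bukiyip":         (1, 1, 1, 1, 1, 1),
    "Bumbita Arapesh": (1, 1, 1, 1, 1, 1),
    "Eitiep":          (1, 0, 0, 0, 0, 0),
    "Heyo":            (1, 1, 1, 1, 1, 1),
    "Kombio":          (0, 0, 0, 0, 0, 0),
    "Minidien":        (1, 1, 0, 0, 0, 0),
    "Mol":             (1, 1, 0, 1, 1, 0),
    "Molmo One":       (0, 0, 0, 0, 0, 0),
    "Mufian":          (1, 1, 1, 1, 1, 1),
    "Northern One":    (0, 0, 0, 0, 0, 0),
    "Olo":             (1, 1, 1, 0, 1, 0),
    "Srenge":          (1, 1, 0, 1, 0, 1),
    "Urat":            (1, 1, 0, 0, 0, 0),
    "Walman":          (1, 1, 1, 1, 1, 1),
    "Yeri":            (1, 1, 1, 1, 1, 1),
}


def counts_per_word_class(table):
    """Number of languages coded 'Yes' for each word class."""
    return {wc: sum(row[i] for row in table.values()) for i, wc in enumerate(WORD_CLASSES)}


if __name__ == "__main__":
    for wc, n in counts_per_word_class(TABLE_1).items():
        print(f"{wc:15s} {n:2d}/{len(TABLE_1)}")
```

Tallied this way, 15 of the 19 languages index gender on pronouns and 14 on verbs, against 8 each for demonstratives and nouns, and every language coded ‘yes’ for verbs is also coded ‘yes’ for pronouns.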

Au numerals inflect for gender, as shown in (1), so Au is coded as ‘yes’ for the word class ‘numeral’.

(1)
Au (Wapeic, Torricelli)
(a)
wiketer-es
two-du.m
(b)
wiketer-em
two-du.n
(c)
wiketer-i
two-du.f
(Scorza 1985: 232)

In the Urat sentence in (2), subjects are indexed in pronouns and verb pronominal prefixes, both of which code gender. Urat would be coded as ‘yes’ for the word classes ‘pronoun’ and ‘verb’.

(2)
Urat (Kombioic, Torricelli)
Kin yukur n-ainge tup ti w-ende pakai.
3sg.m neg 3sg.m-write book 3sg.f 3sg.f-do neg
‘He didn’t write the book, she did’
(Barnes 1989: 61)

Some complications arose during the data coding process. In general, all of them are related to the fact that, for some languages, the data is provisional because it is based on extremely limited fieldwork, in some cases one or two days at most. A first complication was that, where data comes from field notes, much of the coding relied on only a few examples. For instance, the Bragat data contains only five sentences with demonstratives and the Eitiep data contains only six numerals in isolation. In these two cases (and a few others), it was not straightforward to decide how to code these data points. Even for languages for which there is documentation, what is available may not be enough to determine whether the language has gender and where it is coded. For instance, the dictionary manuscript of Yahang (Filer n.d.), a language closely related to Heyo, suggests that verbs in this language code gender in pronominal affixes, at least in the third person singular. Nothing else, however, can be retrieved from it, so Yahang is not included in our study.

A second complication was that, in some cases, it was hard to tell whether gender marking is a full-fledged feature of the language or whether the language has marginal gender.[3] For nouns in some of the languages, only those referring to humans or kin are marked for gender. For instance, Mol has only one noun in our data that is marked for gender, namely the one for ‘baby, newborn child’, which is pamiə for females and bamiən for males. In Yil, only a few nouns, mostly denoting humans, are marked for gender. In Bragat, some nouns like unpa ‘man’ and urpa ‘woman’ are marked for gender, but this seems to be very rare. Do these languages have marginal gender, or is this a product of the elicitation? Similarly, some languages mark gender on numerals, but possibly only on a few of them, which makes it a phenomenon limited to a few items within that word class. For instance, in Walman, the two words for ‘one’, ngo and alpa, are adjective-like in that they inflect for gender. However, they are the only numerals that do so, given that only words in the singular inflect for gender in that language and all other numerals are plural.[4] In some languages such as Au, only the first two numerals inflect for gender; in Olo, the first four do. To determine whether gender is marginal in, for example, Walman, we turn to other word classes, which show that grammatical gender is a full-fledged category of the language (Dryer 2019).

This study addresses two research questions: whether Proto-Torricelli had grammatical gender at its root and, if so, which word classes were more likely than others to have developed it first. Besides these two questions, we also briefly address whether the answers agree with what is known about the evolution of gender systems in the languages of the world. The next section presents the analysis of the data that attempts to answer these research questions.

3 Analysis

We use phylogenetic comparative methods to (a) infer the presence or absence of gender marking on different variables at the root of Torricelli languages and (b) infer the transitions between the states of the variables included in our data set. Variables here refer to word classes. Such methods are appropriate because they allow us to address the non-independence of languages that share an evolutionary history (Galton’s problem; Mace and Holden 2005). For instance, Macklin-Cordes and Round (2022) explain that phylogenetic methods can contribute to the study of language typology because, even when based on partially erroneous data, they give better-than-chance results. First, a tree sample is used as a basis to infer the evolutionary processes undergone by the target features of the languages. In our study, we use a world tree of languages (Bouckaert et al. 2022), from which we extract the subtree for the Torricelli languages included in our data set. This results in a sample of 902 trees that represent the possible evolutionary histories of the Torricelli languages in the data set. This sample can be summarized with a maximum clade credibility tree: every tree in the sample is scored by the product of the posterior probabilities of its clades (i.e., the proportion of trees in the sample that contain each clade), and the tree with the highest score is considered the most representative of the sample and can thus be used as an overview of it. The maximum clade credibility tree extracted from the tree sample of Torricelli languages is shown in Figure 2, along with the variables included in our data set. We also verified that this tree sample matches the clades found in the Glottolog tree, to ensure that our analysis follows the state-of-the-art classification of Torricelli languages, which is widely accepted by experts.
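The maximum clade credibility summary can be illustrated with a toy sketch. It assumes, for simplicity, that each tree is reduced to its set of clades (frozensets of tip labels) and ignores branch lengths; the function names and the three-tree sample are hypothetical and are not drawn from the Bouckaert et al. (2022) trees. In practice this summary is produced by dedicated phylogenetic software.

```python
"""Maximum clade credibility (MCC) selection over a sample of trees (toy sketch)."""

import math
from collections import Counter


def clade_frequencies(trees):
    """Proportion of trees in the sample that contain each clade."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


def mcc_index(trees):
    """Index of the tree with the highest product of clade frequencies.

    Log-frequencies are summed instead of multiplying probabilities, to avoid
    underflow when trees have many clades; ties go to the first such tree.
    """
    freq = clade_frequencies(trees)
    scores = [sum(math.log(freq[clade]) for clade in tree) for tree in trees]
    return max(range(len(trees)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Hypothetical three-tree sample over tips A-D, each tree given as its set
    # of non-trivial clades.
    t1 = frozenset({frozenset("AB"), frozenset("CD")})
    t2 = frozenset({frozenset("AB"), frozenset("ABC")})
    sample = [t1, t2, t1]
    print(mcc_index(sample))  # 0: clades AB (3/3) and CD (2/3) beat AB and ABC (1/3)
```

Working in log space is a design choice for numerical stability: with hundreds of trees and many clades, a raw product of frequencies can underflow to zero.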

Figure 2: 
The maximum clade credibility tree of Torricelli languages based on the world tree from Bouckaert et al. (2022); the heatmap visualization displays the variables included in our data set.

Based on the sample of trees, three methods are used to infer the probability of gender marking for each variable at the root of Torricelli languages. Using all three allows us to assess the robustness of our results: ideally, their outputs should converge. First, we use ancestral character estimation, which estimates ancestral states by maximum likelihood under a model with equal rates of transition (Pagel 1994). Second, we use stochastic character mapping to sample character histories from their posterior probability distribution (Huelsenbeck et al. 2003). With this method, we simulate instances of character evolution along the phylogeny; by doing so a large number of times, we can estimate the probability that a given node is in a given state. Third, we use a reversible-jump hyperprior (RJHP), which estimates not only the probability of gender marking at the root but also the transition rates between the states of each variable (Green 1995; Gowri-Shankar and Rattray 2007). The RJHP is based on a continuous-time Markov chain process, which allows changes between the states of a variable to be reversed: for example, the algorithm scores the probability that gender marking is lost, acquired, re-acquired, and lost again. The RJHP method thus allows us to infer the correlated evolution of the features included in our data set. For example, for gender marking on verbs and gender marking on adjectives, we can infer which word class is more likely to acquire (or lose) gender marking first and which state is the most stable. The analyses are conducted in R (R Core Team 2021) with the following packages: ape (Paradis and Schliep 2019), diagram (Soetaert 2020), GGally (Schloerke et al. 2021), ggpubr (Kassambara 2020), phangorn (Schliep 2011), phytools (Revell 2012), and tidyverse (Wickham et al. 2019).
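As a minimal sketch of the first method, the code below computes root-state probabilities for one binary variable under an equal-rates model, using Felsenstein’s pruning algorithm on a single fixed tree. It assumes known branch lengths, a fixed rate `q`, and a flat prior on the root state; in a real analysis the rate is estimated from the data (for instance with `ace` in ape) and the calculation is repeated over the tree sample. The tree encoding and function names are hypothetical.

```python
"""Root-state probabilities for a binary trait under an equal-rates Markov model (sketch)."""

import math


def p_matrix(q, t):
    """Transition probabilities P(t) = exp(Qt) for two states with symmetric rate q."""
    same = 0.5 + 0.5 * math.exp(-2.0 * q * t)
    diff = 1.0 - same
    return ((same, diff), (diff, same))


def partial_likelihoods(node, q):
    """Likelihoods of the tip data below `node`, conditional on its state (0, 1).

    A tip is ("tip", state); an internal node is ("node", [(child, branch_length), ...]).
    """
    kind, payload = node
    if kind == "tip":
        return (1.0, 0.0) if payload == 0 else (0.0, 1.0)
    result = [1.0, 1.0]
    for child, length in payload:
        child_lik = partial_likelihoods(child, q)
        p = p_matrix(q, length)
        for s in (0, 1):
            # Sum over the unknown state at the child end of the branch.
            result[s] *= p[s][0] * child_lik[0] + p[s][1] * child_lik[1]
    return tuple(result)


def root_probabilities(tree, q):
    """Normalised root-state probabilities, assuming a flat prior on the root."""
    l0, l1 = partial_likelihoods(tree, q)
    return l0 / (l0 + l1), l1 / (l0 + l1)


if __name__ == "__main__":
    # Hypothetical tree ((A:1, B:1):1, C:2) with gender marking present in A and B only.
    tree = ("node", [
        (("node", [(("tip", 1), 1.0), (("tip", 1), 1.0)]), 1.0),
        (("tip", 0), 2.0),
    ])
    p_absent, p_present = root_probabilities(tree, q=0.3)
    print(f"P(root = present) = {p_present:.3f}")
```

The closed form in `p_matrix` is the matrix exponential of the symmetric two-state rate matrix, which keeps the sketch free of linear-algebra dependencies.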

First, Figure 3 shows the probability of gender marking for different domains at the root of Torricelli languages. The three methods (ancestral character estimation, stochastic character mapping, and the reversible-jump hyperprior) converge in showing that verbs and pronouns were likely to have gender marking. The reconstruction is less certain for adjectives, demonstratives, and numerals. Finally, nouns are less likely to have had gender marking. These results generally match the literature, as discussed below.

Figure 3: 
The probability of gender marking at the root of Torricelli languages based on our data set. The three methods considered are ancestral character estimation (ace), stochastic character mapping (simmap), and the reversible-jump hyperprior (RJHP).

Second, we consider the pairwise correlated evolution between the variables of the data set. Each pair of variables can have four states: 00, 01, 10, and 11. For example, if the pair considered is gender marking on the verb and gender marking on the noun, 00 indicates that gender marking is found neither on the verb nor on the noun; 11 represents the case where gender is marked on both the verb and the noun; 10 refers to the situation where gender is marked on the verb but not on the noun; and 01 means that gender is not marked on the verb but is marked on the noun. For each pair of variables, two models are built: one model assumes that the variables evolve independently, and the other includes dependence between them. For each pair of models, we calculated Bayes factors (Burnham and Anderson 2004) from the marginal likelihoods of both models, which we obtained using a stepping stone sampler (Xie et al. 2011) with 100 stones and 1,000 iterations per stone. For each model, 1,000,000 iterations are run. The first half (500,000) is discarded as burn-in and the chain is sampled every 1,000 iterations, which results in (1,000,000 − 500,000)/1,000 = 500 sampled iterations per pair of variables. The Bayes factor is computed as 2 × (log marginal likelihood of the dependent model − log marginal likelihood of the independent model). We interpret Bayes factors above 2 as positive evidence, above 5 as strong evidence, and above 10 as very strong evidence in support of the dependent model (Raftery 1996). In the current study, we kept the pairwise interactions with a Bayes factor higher than 2, which are shown in Figure 4.
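The pairwise coding and the Bayes factor arithmetic described above can be sketched as follows. The evidence thresholds are the ones stated in this section; the function names and the log marginal likelihood values in the usage lines are hypothetical, and the stepping-stone estimation of the marginal likelihoods is not reproduced.

```python
"""Pairwise state coding and Bayes-factor bookkeeping (sketch)."""


def combined_state(x, y):
    """Joint state of two binary variables, written as in the text ('00', '01', '10', '11')."""
    if x not in (0, 1) or y not in (0, 1):
        raise ValueError("variables must be coded 0 (no) or 1 (yes)")
    return f"{x}{y}"


def bayes_factor(log_ml_dependent, log_ml_independent):
    """2 x (log marginal likelihood of dependent model - that of independent model)."""
    return 2.0 * (log_ml_dependent - log_ml_independent)


def evidence_for_dependence(bf):
    """Verbal label for a Bayes factor, using the thresholds stated in the text."""
    if bf > 10:
        return "very strong"
    if bf > 5:
        return "strong"
    if bf > 2:
        return "positive"
    return "not retained"  # pairs at or below 2 are not kept for Figure 4


def retained_iterations(n_iterations=1_000_000, burn_in=500_000, sample_every=1_000):
    """Number of sampled iterations kept after burn-in and thinning."""
    return (n_iterations - burn_in) // sample_every


if __name__ == "__main__":
    print(combined_state(1, 0))             # 10
    print(retained_iterations())            # 500
    bf = bayes_factor(-20.5, -22.0)         # hypothetical log marginal likelihoods
    print(bf, evidence_for_dependence(bf))  # 3.0 positive
```

For instance, for the verb and noun pair, Olo (gender on verbs but not on nouns in Table 1) is in state `combined_state(1, 0)`, i.e., 10.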

Figure 4: 
The transition rates between pairs of variables from the data set. Only pairs with a Bayes factor above 2 when comparing the independent and the dependent models are displayed. For the sake of visualization, the transition rates are multiplied by ten when plotted with the arrows. The thickness of the arrows indicates a stronger or weaker transition rate between two states. Arrows with a very small rate are not shown. The size of each state (00, 11, 01, 10) represents its probability at the root of Torricelli languages.

The probabilities at the root match the output obtained when considering each variable individually. For example, gender marking on verbs is reconstructed at the root when considering the interaction between verbs and demonstratives, pronouns, nouns, adjectives, and numerals. Nouns are not likely to mark gender, even when considering their pairwise interaction with adjectives, pronouns, and verbs. The transition rates can be read as follows. Taking the interaction between pronouns and verbs as an example, the transition rate from 01 to 00 (5) is larger than that from 01 to 11 (3), which means that if gender is marked on the verb but not on the pronoun, gender marking is more likely to be lost on the verb than to be acquired on the pronoun. The same pattern, loss rather than spread, is found when gender is marked on the pronoun but not on the verb (10). Furthermore, the transition rate from 00 to 01 is similar to the transition rate from 00 to 10, which means that when gender is marked neither on the pronoun nor on the verb, gender marking is equally likely to develop on the verb or on the pronoun.

We acknowledge that this pairwise visualization might not be easy to interpret. We therefore provide an overview of these pairwise interactions in Figure 5. For each pair of variables, we count in how many of the 500 iterations the transition rate from 00 to 01 is higher than the transition rate from 00 to 10, or vice versa. For example, for gender marking on adjectives and on verbs, the transition rate from 00 to 01 (no gender marking → gender marking on verbs but not on adjectives) is higher than the transition rate from 00 to 10 (no gender marking → gender marking on adjectives but not on verbs) in 326 of the 500 iterations, which gives a ratio of 0.652 (326/500) for cases where gender is more likely to be marked on verbs first rather than on adjectives. The ratio for gender marking on adjectives first is 0.06 (30/500), while in 0.288 (144/500) of the iterations the transition rates from 00 to 10 (gender marking on adjectives but not on verbs) and from 00 to 01 (gender marking on verbs but not on adjectives) are equal. These ratios are then used to plot the arrows in Figure 5. Solid arrows represent a proportion higher than 0.5. Dashed arrows indicate a proportion between 0.33 and 0.5. A proportion lower than 0.33 is shown with a dotted line. The detailed numbers for each transition are provided in the Supplementary Materials.
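The counting procedure behind Figure 5 can be sketched as follows. Each posterior iteration is assumed to be reduced to a pair (rate 00→01, rate 00→10); the synthetic sample merely reproduces the counts reported for adjectives (first variable) and verbs (second variable) and is not the actual posterior. Treating 0.33 as the inclusive lower bound of the dashed band is our assumption, since the text does not say how boundary values are handled. Function and constant names are hypothetical.

```python
"""Which variable acquires gender marking first, summarised across iterations (sketch)."""

from fractions import Fraction

SOLID_ABOVE = Fraction(1, 2)      # proportions > 0.5 are drawn solid
DASHED_FROM = Fraction(33, 100)   # assumption: 0.33 <= p <= 0.5 is dashed


def first_acquisition_shares(samples):
    """Shares of iterations where 00->01 is faster, 00->10 is faster, or they tie.

    Each sample is a (rate_00_to_01, rate_00_to_10) pair.
    """
    n = len(samples)
    second_first = sum(r01 > r10 for r01, r10 in samples)
    first_first = sum(r10 > r01 for r01, r10 in samples)
    ties = n - second_first - first_first
    return Fraction(second_first, n), Fraction(first_first, n), Fraction(ties, n)


def arrow_style(share):
    """Line style for an arrow in a Figure 5-style overview."""
    if share > SOLID_ABOVE:
        return "solid"
    if share >= DASHED_FROM:
        return "dashed"
    return "dotted"


if __name__ == "__main__":
    # Synthetic sample: variable 1 = adjectives, variable 2 = verbs.
    sample = [(2.0, 1.0)] * 326 + [(1.0, 2.0)] * 30 + [(1.0, 1.0)] * 144
    verbs_first, adjectives_first, ties = first_acquisition_shares(sample)
    print(float(verbs_first), float(adjectives_first), float(ties))  # 0.652 0.06 0.288
    print(arrow_style(verbs_first))  # solid
```

Exact fractions are used so that shares lying on a band boundary are classified without floating-point surprises.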

Figure 5: 
An overview of the transition rates extracted from the RJHP analysis. The arrows indicate the proportion of trees in the total sample in which the probability of acquiring a variable is higher. For example, an arrow pointing from pronouns to nouns means that pronouns are more likely to have gender marking before nouns. Solid arrows represent a proportion higher than 0.5. Dashed arrows indicate a proportion between 0.5 and 0.33. A proportion lower than 0.33 is shown with a dotted line. Pairs of variables that did not have evidence for the dependent model are not linked by arrows. Note that no dotted lines are represented in this figure.

One main observation is that verbs and pronouns seem to be at the core of gender marking in Torricelli languages. First, solid arrows go from gender marking on pronouns to gender marking on nouns, demonstratives, verbs, and numerals. In parallel, solid arrows also go from gender marking on verbs to gender marking on demonstratives, numerals, nouns, and adjectives. This suggests that gender marking is more likely to emerge on pronouns first (since no solid arrow points from verbs to pronouns) and then spread to other domains. That said, gender marking on verbs has a spreading effect as strong as that of pronouns. Finally, gender marking on numerals is more likely to be found before gender marking on demonstratives, since an arrow points from numerals to demonstratives. Overall, these patterns agree with tendencies observed in the development of gender across languages (Audring 2016). For instance, Givón (1976) hypothesizes that, in Swahili, gender-marked pronouns which co-occur with nouns in the noun phrase may evolve into gender-marked verbal morphology. This is reminiscent of the Torricelli family, where pronouns may have given rise to pronominal affixes on verbs.

One question which arises from the analysis above is whether our results match a reconstruction of the gender system of Proto-Torricelli obtained through the comparative method. There are various issues to consider here. The first is the lack of reconstructed forms for Proto-Torricelli. Dryer (Forthcoming), the first major comparative study of Torricelli languages and still a work in progress, does not attempt to reconstruct forms in the proto-language. The second is why this reconstruction has not yet been undertaken. The main reason is the state of documentation of Torricelli languages, which is very poor (Foley 2018), with fewer than a fifth of Torricelli languages having grammars or grammar sketches available. Partly as a consequence, the available data is too sparse to allow any solid comparison or reliable reconstruction. A second reason is that the available data has, in some cases, been collected by researchers with no knowledge of other Torricelli languages (the exceptions being the large-scale survey conducted by Don Laycock half a century ago and the data collected by Matthew Dryer over the last few decades), which makes the data even less reliable. The final issue is whether any aspect of the gender system can be reconstructed with confidence. It is likely that Proto-Torricelli verbs had n- as the 3rd person singular masculine pronominal prefix and w- as the 3rd person singular feminine pronominal prefix. In fact, Laycock (1975: 768) claims that this w- is one of the main diagnostics for identifying languages of the Torricelli family. Since gender in Proto-Torricelli is hypothesized to have spread from pronouns and verbs to other word classes (see Section 3 above), it is not surprising that verbs, one of these two word classes, are where some of the morphemes expressing gender are the most stable and their reconstruction is the safest.

4 Concluding discussion

In the preceding sections, we have laid out an analysis of grammatical gender marking in various word classes of over a third of Torricelli languages which provides answers to our research questions (see Section 2). First, the analysis shows that gender was very likely a feature of Proto-Torricelli. Second, the analysis also shows that gender marking likely occurred in pronouns and verbs, with nouns less likely to have it. This is in agreement with crosslinguistic tendencies observed in the literature (Audring 2016). The evidence for adjectives, demonstratives, and numerals is less conclusive in this respect. Pronouns and especially verbs in Proto-Torricelli are shown to be central in the spread and the development of grammatical gender in the language. Pronouns are likely to have had gender marking before any other word class, followed by the pronominal affixes on verbs. This is not surprising, given that most Torricelli languages have pronouns and verb pronominal affixes coding event participants in terms of number and, sometimes, gender (with a few exceptions such as the genderless language Aro [Drinfeld 2024]). This is also a general tendency in languages spoken in the area (Svärd 2019). In addition, we have presented evidence showing that pronouns were likely to have had gender marking before adjectives and nouns, and numerals before demonstratives. The first of these correlations is expected, as 3rd person singular pronouns in most Torricelli languages distinguish between at least masculine and feminine gender while not all adjectives and nouns do so. In fact, only the masculine and feminine pronominal affixes on verbs can be reconstructed with some confidence in Proto-Torricelli. Overall, in this language family, gender in pronouns is more elaborate than in adjectives or nouns. We have no explanation yet, however, for the correlation between numerals and demonstratives.

From the point of view of linguistic theory, our results provide evidence for the claim that the locus of development of gender systems is the noun domain, and specifically pronouns, as in the Indo-European family and the Bantu subfamily. It may well be that the link between independent pronouns and verb pronominal affixes in terms of agreement indexing favored the transmission of gender features from the former to the latter. In this respect, our results also provide evidence for the universal trends believed to be at work in the evolution of gender systems in languages across the world.

Many questions, however, remain unanswered. First, given that Proto-Torricelli probably had gender, it remains to be investigated how and why some Torricelli languages lost gender marking. The chances of answering this question are nonetheless slim, given the scant and sometimes nonexistent documentation of these languages. Second, it is still unclear how gender evolved from Proto-Torricelli to the Torricelli languages spoken today. Among other things, we wonder what the sources of the different gender markers are. For instance, given their formal similarity, are the pronominal affixes on verbs in some of these languages derived from pronouns? Finally, a broader question is to what extent the tendencies observed in the Torricelli data generalize to neighboring language families such as Sepik, Ramu-Lower Sepik, and Skou. Replicating this study for those families could provide further answers to the puzzle of the evolutionary dynamics of grammatical gender in the languages of New Guinea.


Corresponding author: Jose A. Jódar-Sánchez, Departament de Filologia Catalana i Lingüística General, Universitat de Barcelona, Barcelona, Spain; and Department of Linguistics, University at Buffalo, Buffalo, New York, 14260, USA, E-mail:

Funding source: French National Research Agency

Award Identifier / Grant number: ANR-20-CE27-0021

Acknowledgments

We are grateful for the feedback provided by the audience at the 14th International Conference of the Association for Linguistic Typology, held in Austin, Texas, in 2022, and for the comments of Matthew Dryer, William A. Foley, Jeff Good, and Jayden Macklin-Cordes on earlier drafts of this article. The research in this article was funded by the United States National Science Foundation project ‘Typological and historical discovery through language documentation of Walman, Srenge, Aro, and Eho, four critically endangered languages’ (1500751) and by the French Agence Nationale de la Recherche project ‘The role of linguistic and non-linguistic factors in the evolution of nominal classification systems’ (ANR-20-CE27-0021).

  1. Research funding: This work was supported by the French National Research Agency (ANR-20-CE27-0021).

  2. Data availability: The data and the code used for the analyses are available at the following repository: https://osf.io/gafs4/.

Abbreviations

3: 3rd person
dem: demonstrative
du: dual
f: feminine
m: masculine
n: neuter
neg: negation
prox: proximate
sg: singular

Supplementary Material

The supplementary materials for the article can be found at https://osf.io/gafs4/.

References

Aikhenvald, Alexandra Y. 2016. How gender shapes the world. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198723752.001.0001.

Allassonnière-Tang, Marc & Michael Dunn. 2020. The evolutionary trends of grammatical gender in Indo-Aryan languages. Language Dynamics & Change 11(2). 211–240. https://doi.org/10.1163/22105832-bja10011.

Alungum, John, Robert J. Conrad & Joshua Lukas. 1978. Some Muhiang grammatical notes. In Richard Loving (ed.), Miscellaneous papers on Dobu and Arapesh, 89–130. Ukarumpa: SIL.

Audring, Jenny. 2016. Gender. Oxford Research Encyclopedias: Linguistics. https://doi.org/10.1093/acrefore/9780199384655.013.43.

Barnes, Barney. 1989. Urat grammar essentials. Ukarumpa: SIL.

Bouckaert, Remco, David Redding, Oliver Sheehan, Thanos Kyritsis, Russell Gray, Kate E. Jones & Quentin Atkinson. 2022. Global language diversification is linked to socio-ecology and threat status. https://doi.org/10.31235/osf.io/f8tr6.

Burnham, Kenneth P. & David R. Anderson. 2004. Model selection and multimodel inference. New York, NY: Springer. https://doi.org/10.1007/b97636.

Carling, Gerd & Chundra Cathcart. 2021. Evolutionary dynamics of Indo-European alignment patterns. Diachronica 38(3). 358–412. https://doi.org/10.1075/dia.19043.car.

Corbett, Greville G. 1991. Gender. Cambridge: Cambridge University Press.

Corbett, Greville G. 2013. Sex-based and non-sex-based gender systems. In Matthew S. Dryer & Martin Haspelmath (eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology.

Corbett, Greville G., Sebastian Fedden & Raphael Finkel. 2017. Single versus concurrent systems: Nominal classification in Mian. Linguistic Typology 21(2). 209–260. https://doi.org/10.1515/lingty-2017-0006.

Drinfeld, Andrey. 2024. A grammar of Aro, a Torricelli language of Papua New Guinea. Buffalo: University at Buffalo (Doctoral dissertation).

Dryer, Matthew S. 2019. Gender in Walman. In Francesca Di Garbo, Bruno Olsson & Bernhard Wälchli (eds.), Grammatical gender and linguistic complexity. Volume I: General issues and specific studies, 171–196. Berlin: Language Science Press.

Dryer, Matthew S. Forthcoming. Towards a genealogical classification of Torricelli languages. Manuscript.

Dunn, Michael, Simon J. Greenhill, Stephen Levinson & Russell D. Gray. 2011. Evolved structure of language shows lineage-specific trends in word-order universals. Nature 473. 79–82. https://doi.org/10.1038/nature09923.

Elgh, Erik & Rasmus Persson. 2024. Field notes on Elkei. NUSA 76. 29–49. https://doi.org/10.15026/0002000313.

Farr, Joan. 2018. Kombio dictionary: Kombio-English-Tok Pisin. Manuscript.

Filer, Colin. n.d. Yahang-English dictionary. Manuscript.

Foley, William A. 2018. The languages of the Sepik-Ramu basin and environs. In Bill Palmer (ed.), The languages and linguistics of the New Guinea area: A comprehensive guide, 197–432. Berlin/Boston: Mouton de Gruyter. https://doi.org/10.1515/9783110295252-003.

Givón, Tom. 1976. Topic, pronoun, and grammatical agreement. In Charles N. Li (ed.), Subject and topic, 149–188. New York: Academic Press.

Gowri-Shankar, Vivek & Magnus Rattray. 2007. A reversible jump method for Bayesian phylogenetic inference with a nonhomogeneous substitution model. Molecular Biology & Evolution 24(6). 1286–1299. https://doi.org/10.1093/molbev/msm046.

Green, Peter J. 1995. Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82. 711–732. https://doi.org/10.2307/2337340.

Grinevald, Colette. 2002. Making sense of nominal classification systems: Noun classifiers and the grammaticalization variable. In Ilse Wischer & Gabriele Diewald (eds.), New reflections on grammaticalization, 259–275. Amsterdam: John Benjamins. https://doi.org/10.1075/tsl.49.17gri.

Grinevald, Colette & Frank Seifart. 2004. Noun classes in African and Amazonian languages: Towards a comparison. Linguistic Typology 8(2). 243–285. https://doi.org/10.1515/lity.2004.007.

Hammarström, Harald, Robert Forkel, Martin Haspelmath & Sebastian Bank. 2021. Glottolog 4.5. Leipzig: Max Planck Institute for Evolutionary Anthropology.

Huelsenbeck, John P., Rasmus Nielsen & Jonathan P. Bollback. 2003. Stochastic mapping of morphological characters. Systematic Biology 52(2). 131. https://doi.org/10.1080/10635150390192780.

Kahle, David & Hadley Wickham. 2013. ggmap: Spatial visualization with ggplot2. The R Journal 5(1). 144–161. https://doi.org/10.32614/rj-2013-014.

Kassambara, Alboukadel. 2020. ggpubr: ‘ggplot2’ based publication ready plots. R package version 0.4.0. Available at: https://CRAN.R-project.org/package=ggpubr.

Kemmerer, David. 2017. Categories of object concepts across languages and brains: The relevance of nominal classification systems to cognitive neuroscience. Language, Cognition & Neuroscience 32(4). 401–424. https://doi.org/10.1080/23273798.2016.1198819.

Laycock, Don C. 1975. The Torricelly phylum. In Stephen A. Wurm (ed.), New Guinea area languages and language study. Volume 1: The Papuan languages and the New Guinea linguistic scene, 767–780. Canberra: Pacific Linguistics/The Australian National University.

Mace, Ruth & Clare J. Holden. 2005. A phylogenetic approach to cultural evolution. Trends in Ecology & Evolution 20(3). 116–121. https://doi.org/10.1016/j.tree.2004.12.002.

Macklin-Cordes, Jayden L. & Erich R. Round. 2022. Challenges of sampling and how phylogenetic comparative methods help: With a case study of the Pama-Nyungan laminal contrast. Linguistic Typology 26(3). 533–572. https://doi.org/10.1515/lingty-2021-0025.

Pagel, Mark. 1994. Detecting correlated evolution on phylogenies: A general method for the comparative analysis of discrete characters. Proceedings of the Royal Society of London, Series B: Biological Sciences 255(1342). 37–45. https://doi.org/10.1098/rspb.1994.0006.

Paradis, Emmanuel & Klaus Peter Schliep. 2019. ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35. 526–528. https://doi.org/10.1093/bioinformatics/bty633.

Pehrson, Benjamin, Musi Gibson & Malan Joel. 2016. Tentative grammatical description for the Onnele Wolwale [onr] language spoken in Sandaun Province, Papua New Guinea. Manuscript.

R Core Team. 2021. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: https://www.R-project.org/.

Raftery, Adrian E. 1996. Approximate Bayes factors and accounting for model uncertainty in generalised linear models. Biometrika 83(2). 251–266. https://doi.org/10.1093/biomet/83.2.251.

Revell, Liam J. 2012. phytools: An R package for phylogenetic comparative biology (and other things). Methods in Ecology & Evolution 3. 217–223. https://doi.org/10.1111/j.2041-210X.2011.00169.x.

Rochant, Neige, Marc Allassonnière-Tang & Chundra Cathcart. 2022. The evolutionary trends of noun class systems in Atlantic languages. In Proceedings of the Joint Conference on Language Evolution (JCoLE), 624–631. Nijmegen: Max Planck Institute for Psycholinguistics.

Schapper, Antoinette. 2010. Neuter gender in Eastern Indonesia. Oceanic Linguistics 49(2). 407–435. https://doi.org/10.1353/ol.2010.a411420.

Schliep, Klaus Peter. 2011. phangorn: Phylogenetic analysis in R. Bioinformatics 27(4). 592–593. https://doi.org/10.1093/bioinformatics/btq706.

Schloerke, Barret, Di Cook, Joseph Larmarange, Francois Briatte, Moritz Marbach, Edwin Thoen, Amos Elberg & Jason Crowley. 2021. GGally: Extension to ‘ggplot2’. R package version 2.1.2. Available at: https://CRAN.R-project.org/package=GGally.

Scorza, David. 1985. A sketch of Au morphology and syntax. Papers in New Guinea Linguistics 22. 215–273.

Seifart, Frank. 2010. Nominal classification. Language & Linguistics Compass 4(8). 719–736. https://doi.org/10.1111/j.1749-818x.2010.00194.x.

Soetaert, Karline. 2020. diagram: Functions for visualising simple graphs (networks), plotting flow diagrams. R package version 1.6.5. Available at: https://CRAN.R-project.org/package=diagram.

Stolz, Thomas. 2012. Survival in a niche: On gender-copy in Chamorro (and sundry languages). In Martine Vanhove, Thomas Stolz, Aina Urdze & Hitomi Otsuka (eds.), Morphologies in contact, 93–140. Berlin: Akademie Verlag. https://doi.org/10.1524/9783050057699.91.

Svärd, Erik. 2019. Gender in New Guinea. In Francesca Di Garbo, Bruno Olsson & Bernhard Wälchli (eds.), Grammatical gender and linguistic complexity. Volume I: General issues and specific studies, 225–276. Berlin: Language Science Press.

Terrill, Angela. 2002. Systems of nominal classification in East Papuan languages. Oceanic Linguistics 41(1). 63–88. https://doi.org/10.2307/3623328.

Van den Berg, René. 2015. The loss of clusivity and the rise of gender in West Oceanic pronominals. Language & Linguistics in Melanesia 33(1). 10–47.

Wälchli, Bernhard, Bruno Olsson & Francesca Di Garbo. 2020. Grammatical gender and linguistic complexity, Volume 1: General issues and specific studies. Berlin: Language Science Press.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, Alex Hayes, Lionel Henry, Jim Hester, Max Kuhn, Thomas Lin Pedersen, Evan Miller, Stephan Milton Bache, Kirill Müller, Jeroen Ooms, David Robinson, Dana Paige Seidel, Vitalie Spinu, Kohske Takahashi, Davis Vaughan, Claus Wilke, Kara Woo & Hiroaki Yutani. 2019. Welcome to the tidyverse. Journal of Open Source Software 4(43). 1686. https://doi.org/10.21105/joss.01686.

Wilson, Jennifer. 2017. A grammar of Yeri, a Torricelli language of Papua New Guinea. Buffalo: University at Buffalo (Doctoral dissertation).

Xie, Wangang, Paul O. Lewis, Yu Fan, Lynn Kuo & Ming-Hui Chen. 2011. Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology 60(2). 150–160. https://doi.org/10.1093/sysbio/syq085.

Published Online: 2024-09-27
Published in Print: 2024-09-25

© 2024 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 22.11.2025 from https://www.degruyterbrill.com/document/doi/10.1515/stuf-2024-2010/html