Abstract
This study explores the relationship between prosody and quantifier scope interpretation in Japanese-speaking children and adults. The theoretical literature on Japanese has discussed how prosody, a universal quantifier, and negation interact, yet the empirical evidence remains inconclusive. Employing a Picture-Selection task, we investigate how Thematic Topic (TT) and Contrastive Topic (CT) prosody affect interpretation. The results reveal that both children and adults exhibit sensitivity to prosodic distinctions, with a weaker preference for the surface scope interpretation with the CT prosody. The results also demonstrate the highly ambiguous nature of the TT prosody. The discrepancy in response times suggests potential processing disparities between age groups and highlights the complex interplay of prosody and quantifier scope.
1 Introduction
This study investigates Japanese-speaking children’s and adults’ interpretation of sentences which contain a universal quantifier in the subject position and negation. In languages like English, sentences with multiple quantifiers (e.g., Some student read every book in the library.) or a quantifier and a logical operator (e.g., All the men didn’t go.) typically exhibit scope ambiguity. Research has long observed and discussed how prosody affects the interpretation and has argued that these ambiguous sentences become unambiguous when uttered with a specific prosody (Constant 2012, 2014; Jackendoff 1972; Jespersen 1933; Ladd 1980; Ward and Hirschberg 1985, among others). Similar observations have been made in other languages such as German (Büring 1997, 2003; Krifka 1998; Sauerland and Bott 2002), Hungarian (É. Kiss and Gyuris 2003; Jackson 2008), Greek (Baltazani 2002), Korean (Lee 2006), and Japanese (Hara 2006; Kuno 1973; Nakanishi 2008; Oshima 2008; Tomioka 2010a, 2010b). Previous studies have mainly focused on theoretical and pragmatic explanations for why certain interpretations arise during specific prosodic events. However, there has been much less experimental evidence presented regarding whether people actually make such interpretations. This paper will focus on an ambiguous construction in Japanese which involves a -wa-marked (Topic-marked) quantifier subject and negation. The experiments reported in this paper will investigate whether adult and child native speakers of Japanese are sensitive to this type of prosodic manipulation in the ambiguous sentences. Additionally, by considering the reaction time from hearing a sentence to selecting an interpretation, this paper aims to contribute to the understanding of the processing of the semantic and prosodic link in Japanese, particularly in the context of contrastive topic marking and quantifier scope.
1.1 Contrastive topic
Although the main focus of this paper concerns the prosodic effect on quantifier scope in Japanese, let us briefly review the discussion in the English literature. Jackendoff (1972) proposes that the sentence in (1a), uttered with a pitch accent on the quantifier and a fall at the end, corresponds to the “All > Not” interpretation, which means that none of the men went, and that the sentence in (1b), uttered with a pitch accent on the quantifier and a rise at the end, corresponds to the “Not > All” interpretation, where not all of the men went.
(1) a. ALL the men didn’t go.L%[1] (All > Not; None of them went)
    b. ALL the men didn’t go…L-H% (Not > All; Not all of them went)
[1] L and H stand for low and high tones, respectively. Boundary tones are indicated using the symbol %, e.g., low boundary tone as L%.
The prosodic contour for (1b) has been variously referred to in the literature as ‘B-accent’, ‘Fall-Rise’, ‘Rise-Fall-Rise (RFR)’, etc. Büring’s (1997, 2003) work, along with Constant’s (2012, 2014) analysis, forms the basis for examining how the specific prosody on Contrastive Topic results in the interpretation we have observed. According to their analysis, this specific prosody is employed to mark the Contrastive Topic (CT) of the sentence. CT is used when there are other alternatives in the context that are not resolved, as in (2).
(2) A: Did your friends like the movie?
    B: JOHN liked it…L-H%
    C: MOST of my friends liked it…L-H%
    D: #ALL my friends liked it…L-H%
    (Constant 2012: (23), (33a))
In (2), the conversations between A and B, and between A and C, are felicitous. When the interlocutor utters John or most of their friends with a pitch accent, a set of alternatives is generated, such as {John, Mary, Bill, Sue …} or {all, most, just some, none}. After asserting John liked it, the questioner can further ask whether Mary, Bill, or Sue liked it or not. Namely, some of the alternative propositions remain unresolved after the assertion. The CT prosody is felicitously licensed in such a case. On the other hand, the conversation between A and D is infelicitous with the CT prosody. The set of alternatives generated would be {all, most, just some, none}; once the statement in D is asserted, all of the other alternatives are resolved. There is no point in employing the CT prosody when no alternative proposition remains unresolved. That is, for the CT to be felicitously licensed in a sentence, a condition must be satisfied: at least one proposition in the alternative set must remain unresolved. This general requirement on CT applies to quantifier scope relations. Take the sentence in (1), and compare how the alternatives fare under the two interpretations, “All > Not” and “Not > All”, to see that only the “Not > All” interpretation is felicitous with the CT prosody.
(3) ALL the men didn’t go.
If we assert the “All > Not” interpretation, all the other alternatives are resolved, since we already know that most/some of the men did not go is true as the proposition is entailed by the assertion. On the other hand, if we assert the “Not > All” interpretation, we still have other alternative propositions remaining unresolved. That is why only the “Not > All” interpretation is felicitous with the CT prosody. The mechanism is summarized in (4).
(4) Mechanism of how CT contributes to a specific interpretation:
    a. The CT activates a set of relevant alternatives at the term that is contrasted.
    b. Proposition alternatives are generated.
    c. Compare the sets of alternatives for each of the interpretations.
    d. Identify which interpretation satisfies the CT condition of unresolved alternatives.
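The steps in (4) can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not part of the analyses cited above: a context with a hypothetical four men, worlds identified only by how many men went, the alternative set {all, most, some, none} with "some" read as "at least one", and an alternative counted as resolved when the assertion settles its truth value either way. All function and variable names are illustrative only.

```python
from typing import Callable, Dict, List

N = 4                  # hypothetical number of men in the context (assumption)
WORLDS = range(N + 1)  # a world is identified by how many of the men went

Prop = Callable[[int], bool]  # a proposition maps a world to a truth value

# Simplified quantifier meanings over a count of individuals (assumption).
QUANT: Dict[str, Callable[[int], bool]] = {
    "all": lambda c: c == N,
    "most": lambda c: c > N / 2,
    "some": lambda c: c >= 1,
    "none": lambda c: c == 0,
}

def q_over_not(q: str) -> Prop:
    """'Q > Not': Q of the men are such that they did not go."""
    return lambda went: QUANT[q](N - went)

def not_over_q(q: str) -> Prop:
    """'Not > Q': it is not the case that Q of the men went."""
    return lambda went: not QUANT[q](went)

def unresolved(assertion: Prop, alternatives: Dict[str, Prop]) -> List[str]:
    """Alternatives whose truth value is still open once the assertion is accepted."""
    live = [w for w in WORLDS if assertion(w)]
    return [name for name, alt in alternatives.items()
            if len({alt(w) for w in live}) == 2]

def ct_felicitous(assertion: Prop, alternatives: Dict[str, Prop]) -> bool:
    """CT condition: at least one alternative must remain unresolved."""
    return bool(unresolved(assertion, alternatives))

# Steps (4a-b): alternatives generated at the quantifier, per scope reading.
all_not_alts = {q: q_over_not(q) for q in QUANT}
not_all_alts = {q: not_over_q(q) for q in QUANT}

# Steps (4c-d): compare the readings of "ALL the men didn't go".
print(ct_felicitous(q_over_not("all"), all_not_alts))  # False
print(ct_felicitous(not_over_q("all"), not_all_alts))  # True
```

Under these assumptions, asserting the “All > Not” reading leaves exactly one live world (nobody went), so every alternative is settled, whereas the “Not > All” reading leaves several worlds open and therefore satisfies the CT condition.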
As we have seen in this section, the effects of CT itself and its prosodic impacts are not unique to scope relations. Rather, there exists a general effect of CT in pragmatics, and applying this to scope relations naturally explains the effects observed in scope interactions.
1.2 Thematic and contrastive topic -wa
The analysis extends to the case of Japanese, which marks the CT by using -wa, a Topic marker. I will not go into details of the analysis of the nature of CT in Japanese (see Hara 2006; Nakanishi 2008; Oshima 2008; Tomioka 2010a, 2010b), but what seems to be agreed on in the literature is that (i) a Japanese CT is accompanied by -wa, and the sentence would be infelicitous if another case particle (e.g., -ga, a nominative case marker) is used, and (ii) the prosodic characteristics of a Japanese CT are basically identical to those of a focus. For a prototypical focus in Japanese, a prominent pitch accent is given, and the pitch of the rest of the sentence is reduced or lowered. Furthermore, the high boundary tone at the end of the CT is not (necessarily) observed in Japanese.
The Japanese topic marker -wa has two uses: Thematic Topic (TT) -wa and Contrastive Topic (CT) -wa. The thematic -wa is characterized by having an intact pitch accent in the rest of the sentence (typically the predicate), as depicted in the following diagram by Nakanishi (2008). Since -wa is a topic marker, it can be attached to elements other than the subject of a sentence (e.g., the object).
Just as with English CT, the function of Japanese CT is to generate a set of alternatives and indicate that there is at least one alternative that is not resolved after the assertion. The CT prosody itself can be applied to sentences without scope interactions. For example, to a question as in (2A), one can answer with “[JOHN-wa] CT kiniitta (‘John liked it.’).” After hearing this utterance, the interlocutor can ask about other contextually relevant alternatives, such as whether Mary or Bill liked it or not. The function of CT remains the same even when elements other than the subject are marked with -wa and read with the CT prosody.
The following conversation between A and B illustrates a typical example of the CT prosody and its interpretation. As pointed out in Tomioka (2010a, 2010b), there is variation as to where the pitch peak is placed. Many of Tomioka’s informants preferred to place the pitch peak on the particle rather than on the focused phrase (Tomioka 2010a: fn 3). One can also place pitch prominence on both the focused phrase and the particle, though the utterance would sound very colloquial and exaggerated.
(6) A: Did all your friends come to the party?
    B: MINNA-wa/Minna-WA | [ko-nakat-ta]reduced pitch. |
       everyone-top | come-neg-past |
       ‘Everyone didn’t come. (Not>All)’
    C: MINNA-wa | [ko-nakat-ta]non-reduced pitch. |
       everyone-top | come-neg-past |
       ‘Everyone didn’t come.’
    D: Minna-ga | ko-nakat-ta. |
       everyone-nom | come-neg-past |
       ‘Everyone didn’t come.’
The responses by C (with the TT prosody) and D (with a nominative case marker) are generally assumed to indicate the unambiguous “All > Not.”[2]
As we have seen in this section, the phenomenon of prosody influencing interpretation is observed in multiple languages. The mechanism involves intricate semantic and pragmatic calculations. In the following section, I will introduce some prior studies that have investigated whether children are sensitive to these prosodic manipulations.
2 Previous studies in first language acquisition
The first language acquisition literature on the scopal relation between a universal quantifier (in the subject position) and negation in English-speaking children has produced two main findings. One is that children naturally have a strong bias towards the “All > Not” interpretation. The other is that children can nevertheless access the inverse, “Not > All” interpretation when the context is manipulated to alleviate the processing load of interpreting negated sentences, or under semantic priming (Musolino 1998; Musolino and Lidz 2006; Viau et al. 2010, among many others).[3] Prosodic manipulation, however, had not been systematically investigated until recently, except for one study by Iannucci and Dodd (1980). Turning to an experimental study of adults’ interpretations of ambiguous sentences with different types of prosody, Syrett et al. (2014) report that English-speaking adults successfully differentiated the interpretations according to prosodic features. Iannucci and Dodd (1980) conducted a picture-selection experiment (between an “All > Not” picture and a “Not > All” picture) with children in kindergarten (K) and Grades 2, 4, and 7, and with adults, using sentences with Rise-Fall-Rise (RFR) and Falling prosody. The rates of choosing the “Not > All” picture under the RFR prosody by K, Grade 2, 4 and 7 were 38 %, 47 %, 46 %, and 58 % respectively, contrasting with adults’ 96 %. The rates of choosing the “Not > All” picture under the Falling prosody, on the other hand, were 27 %, 19 %, 17 %, and 18 % for the respective child groups, whereas the rate for adults was 18 %. This study is highly suggestive in that (i) children who are older than kindergarteners seem to respond differently to the two prosodic contours (RFR vs. Falling), and (ii) children seem to behave differently from adults under the RFR condition.
Sugawara et al. (2018) conducted an elaborate picture-selection task to revisit these issues. The experiment aimed to investigate the effect of prosody in potentially ambiguous English sentences with 24 adult and 32 child participants who were native speakers of English. The design involves two phases. The first phase introduces the situation (e.g., “Hey look, a bunch of apples here in this tree.”) and the second phase shows four pictures, illustrating the situations of (i) “All > Not” (None of the items VP-ed), (ii) “Not > All” (Some but not all of the items VP-ed), (iii) “All did” (All of the items VP-ed), and (iv) an “irrelevant” event (see Figure 1). The position of the pictures was counterbalanced. The participants were asked to point to the picture that matched what the character said.

Figure 1: Experimental design (Sugawara et al. 2018).
The results from adult participants show that they chose the “Not > All” pictures 23 % of the time under the Falling prosody condition and 72 % under the RFR prosody condition. Similarly, the children (aged from 4;4 to 6;10, M = 5;2) chose the “Not > All” pictures 30 % of the time under the Falling prosody condition and 70 % under the RFR prosody condition.[4] The study showed that naïve speakers, both adults and children, significantly preferred the “Not > All” interpretation in the sentences uttered with the RFR prosody.
In this experimental design, it is worth pointing out that the first phase of each trial was identical regardless of the condition. This means that the participants were not biased towards either of the interpretations from the start, unlike in a truth value judgment task, where one of the interpretations (situations) is presented to the participants. In the picture-selection design, they were asked to determine which interpretation was more suitable only after hearing the stimulus sentence. Therefore, the significant differences between the RFR prosody condition and the Falling condition can be attributed solely to the prosodic manipulation. Furthermore, since the non-chosen interpretation was presented as an option, the results can be safely said to indicate a preference for the chosen interpretation over the alternative.
Yatsushiro et al. (2019) conducted an experiment following Sugawara et al.’s (2018) design to test German sentences with the neutral prosody and the German CT prosody. They recruited 42 German-speaking children (age range 3;6 to 6;11, M = 5;2) and 20 adults. They also found that both adults and children preferred the “Not > All” interpretation under the CT prosody.
Now, let us turn to Japanese CT prosody. Hattori et al. (2006) conducted an experiment with 22 four- to five-year-old Japanese-speaking children (M = 5;5) to investigate the relationship between prosody and scope interpretation. They used the Truth Value Judgment Task (TVJT) for their experimental setup. In a story, three animals appear for a race on Sports Day, but two end up not running the race and just one animal runs the race. At the end of the story, the puppet says the following target sentence:
(7) Minna-wa | hasira-nakat-ta-yo. | (Hattori et al. 2006: 10) |
    Everyone-top | run-neg-past-excl |
    ‘Everyone didn’t run.’
The children’s task was to judge whether the puppet was right or wrong relative to the story. There were four “Not > All” stories; for two of them the target sentences were uttered with the TT prosody, while for the other two the sentences were uttered with the CT prosody. Since the stories support the “Not > All” interpretation, the predictions are that the sentences with the TT prosody are rejected while the sentences with the CT prosody are accepted. The results show that children rejected the sentences with the TT prosody 86.4 % of the time, and they correctly accepted the sentences with the CT prosody 65.9 % of the time. The authors also noted that there were five children who consistently assigned the surface “All > Not” interpretation. The overall results indicate that most of the children do access the “Not > All” interpretation with the CT prosody at the age of five.
This study is nicely designed but is not free from limitations. First, they did not test an adult control group. They mentioned that Nakanishi (2008) tested the TT and CT sentences with adults to confirm the prosodic effect. However, there were only four informants, and it seems that only one pair of sentences was tested (Minna-wa ne-nakat-ta. ‘Everyone did not sleep’ with TT and CT prosody). Indeed, Nakanishi’s informants answered that the TT prosody corresponds to the “All > Not” interpretation and the CT prosody to the “Not > All” interpretation. It is, however, worth noting that Japanese has several ways to combine a quantifier with the subject NP, as shown in the following examples. It is possible that different structures yield different results in adults, and thus it is important to set up a baseline by conducting an adult control experiment with a larger sample size.
(8) a. Gakusee | minna-wa | hasira-nakat-ta. |
       student | everyone-top | run-neg-past |
    b. Gakusee-wa | minna | hasira-nakat-ta. |
       student-top | everyone | run-neg-past |
    c. Zen’in-no | gakusee-wa | hasira-nakat-ta. |
       everyone-gen | student-top | run-neg-past |
       a-c: ‘Everyone didn’t run.’
    d. Zenbu-wa | hiraka-nakat-ta. |
       everything-top | open-neg-past |
       d: ‘Everything didn’t open.’
Secondly, in Hattori et al.’s experiment, all four stories supported the “Not > All” situation. This leads to two potential issues: (i) the results might have overestimated children’s competence, and (ii) we cannot know whether children assign an ambiguous interpretation to the CT prosody (i.e., allowing both “Not > All” and “All > Not” interpretations) or have a preference for the “Not > All” interpretation. Having only the “Not > All” stories means that one just has to answer “yes” to be regarded as correct under the CT prosody. There was no trial in which children had to reject the “All > Not” situation to be regarded as correct under the CT prosody.[5] Considering that children tend to have what is called a “yes-bias,” where children who are uncertain are biased to answer “yes,” it is possible that the correct “yes” responses came both from those who consciously preferred the “Not > All” interpretation with the CT prosody and from those who were not sure. It is worth investigating whether children prefer the “Not > All” interpretation over the “All > Not” interpretation when a sentence is presented with the CT prosody.
The current study aims to follow up on Hattori et al.’s (2006) study, with the picture-selection method involving adult participants as well as children to address the raised issues.[6] Specifically, we conducted three studies. The first one is a preparatory survey which aims at gauging possible, likely interpretations in sentences with varied word orders and determining the sentence structure to be implemented in the experimental stimuli. The second one is an experiment testing adults’ interpretations of ambiguous sentences with TT and CT prosody, and the third one is an experiment testing children’s interpretations of them with TT and CT prosody.
3 Experiments
3.1 Preparatory survey
As pointed out in Section 2, there are different ways to express quantified NPs in Japanese. To investigate how different word orders can affect interpretation, and to determine which word orders are (potentially) most influenced by prosodic manipulation, a simple survey was conducted. Eighty-seven university students from Mie University were recruited to evaluate the interpretations of sentences (9a-d), while an additional 131 students from the same university assessed the sentences (9e, f).
(9) a. Kasa-ga | zenbu | kawaka-nakat-ta-yo. | (NP-ga all neg) |
       umbrella-nom | all | dry-neg-past-excl |
    b. Kasa | zenbu-ga | kawaka-nakat-ta-yo. | (NP all-ga neg) |
       umbrella | all-nom | dry-neg-past-excl |
    c. Kasa-wa | zenbu | kawaka-nakat-ta-yo. | (NP-wa all neg) |
       umbrella-top | all | dry-neg-past-excl |
    d. Kasa | zenbu-wa | kawaka-nakat-ta-yo. | (NP all-wa neg) |
       umbrella | all-top | dry-neg-past-excl |
    e. Subete-no | kasa-ga | kawaka-nakat-ta-yo. | (all-gen NP-ga neg) |
       all-gen | umbrella-nom | dry-neg-past-excl |
    f. Subete-no | kasa-wa | kawaka-nakat-ta-yo. | (all-gen NP-wa neg) |
       all-gen | umbrella-top | dry-neg-past-excl |
    ‘All of the umbrellas didn’t dry.’[7]
[7] In subsequent experiments, we utilized stimuli similar to those used in previous studies (Sugawara et al. 2018, Yatsushiro et al. 2019), where subjects are inanimate and verbs are primarily unaccusative. Therefore, in this preparatory survey, I employed an example similar to those used in prior research. Due to the inanimate nature of the subjects, the word minna (‘everyone’), which was used in Hattori et al. (2006), was not suitable. Instead, the terms like zenbu (‘all’) and subete (‘everything’) were used.
The participants were handed the questionnaire with a list of pictures (Figure 2) for each of the sentences. They were asked to read the sentences and circle the interpretation(s) that they thought were possible as an illustration of the sentence. They were explicitly instructed to circle multiple pictures if they thought multiple situations were possible. They were also encouraged to imagine employing different prosody to read those sentences when they think about possible interpretations. The survey took at most a few minutes and was conducted as a part of a class activity. The participants had little background in linguistics, and no relevant literature was introduced in the class prior to the survey. The participants were not compensated, and non-participants did not face any disadvantages in the class.

Figure 2: List of pictures for the survey.
The results are shown in Figure 3. The numbers within the graph represent percentages. The dark bar represents the responses where only the “All > Not” picture was chosen, the middle part represents the responses where both the “All > Not” and “Not > All” pictures were chosen, and the light bar represents the responses where only the “Not > All” pictures were chosen.[8]

Figure 3: Responses to the survey.
Several observations are worth noting. Firstly, even for word orders like (9a) and (9b), traditionally considered unambiguous and rigid, there were a certain number of responses indicating ambiguity, and we even saw a few responses allowing only the “Not > All” interpretation. Let us now turn to the interaction between the term zenbu/subete ‘all’ and -wa (9c, d, f). It appears that when ‘all’ precedes -wa the sentence tends to favor the “Not > All” interpretation, rather than the other way around. The survey also clarified which word orders contribute to rigidity or flexibility in interpretation. Sentences like (9d) and (9e) were rated as virtually unambiguous: (9d) favors the “Not > All” interpretation whereas (9e) favors the “All > Not” interpretation.[9] Also, the word order in (9f) yielded the most “ambiguous” responses among the six conditions.
Based on these results, we decided to use the structure and the word order as in (9f) in the stimulus sentences for the following experiments, in order to investigate the extent of the effects brought about by prosodic manipulation.[10],[11]
3.2 Experiment – adults
3.2.1 Method and design
The experiment was designed following Sugawara et al.’s (2018) picture-selection task. There were two phases as depicted in Figure 4, and a pre-recorded dialogue between a girl and a boy was played. Although the experiments in Sugawara et al. (2018) and Yatsushiro et al. (2019) were conducted using PowerPoint slides and manual advancement of the scenes, the current experiment was constructed using PsychoPy[12] on a Windows laptop computer to measure the response time. A setup of five buttons, about 10 cm in diameter, was connected to the computer via USB. This included four buttons corresponding to the four keys for choosing pictures, and one button corresponding to the space key to advance to the next scene. The participants were instructed to press the button corresponding to the space key to play the audio and advance the scenes. When presented with four choices, they were directed to press the one of the four buttons that they thought appropriate for what the audio indicated. Their choices and response times were recorded. Participants first practiced with four trials using sentences unrelated to the main experiment, before moving on to the main trials. They participated in the experiment in a quiet room while wearing headphones. The main trial consisted of eight target sentences (4 TT and 4 CT sentences) and twelve filler questions, presented in a pseudo-random order (see Appendix for the list of target sentences). In the preliminary explanation, the participants were instructed to respond intuitively without overthinking in cases where multiple pictures might be considered viable. They were also told that inconsistency in choices within a session is completely acceptable. Including the preliminary explanation, the experiment took about 10–15 min. See Appendix for the list of target and filler items as well as an example of a lead-in question.[13]

Figure 4: Experimental procedure.
There were three levels of prosodic type (TT vs. CT-A (-wa focus) vs. CT-B (phrase focus)). Each item list contained 4 TT items and either 4 CT-A items or 4 CT-B items, as well as practice and filler items. That is, each participant heard both TT items and CT items within a session, but CT-A versus CT-B was a between-subjects factor. We prepared two types of CT prosody since we were aware of the variation in CT prosody discussed in connection with (6B). The pitch contours of the prosodic types are depicted in Figures 5, 6, and 7. In all of the figures, the depicted pitch range on the vertical axis is between 100 Hz and 500 Hz.

Figure 5: Pitch contour for thematic topic prosody.

Figure 6: Pitch contour for contrastive topic (-wa focus) prosody.

Figure 7: Pitch contour for contrastive topic (phrase focus) prosody.
The TT prosody clearly has two peaks in a sentence, one on the quantifier súbete ‘all’ and the other on the predicate. The CT-A (-wa focus) prosody has a clear peak on the topic marker -wa, and the predicate has a lowered pitch contour overall. The CT-B (phrase focus) prosody has prominence on the quantifier, and the predicate has a lowered pitch contour.[14]
Our hypotheses are as follows. We will see different behaviors under the TT and CT conditions. More specifically, as the previous literature suggests, with the TT prosody, we will see predominantly the “All > Not” responses whereas with the CT prosody we will see predominantly the “Not > All” responses. Although we have not made a specific prediction regarding the response time, it is possible that people take more time to respond to the CT items since interpreting the CT prosody requires intricate semantic and pragmatic computation.
3.2.2 Results
We recruited 34 adult participants (17 in the TT and CT-A condition and 17 in the TT and CT-B condition) with little or no linguistics background. The participants were mostly undergraduate students at Mie University. Of 272 relevant data points (8 items * 34 subjects), 8 data points were excluded from the analysis since they were responses that chose a positive “All VP-ed” picture. That is, 264 data points were analyzed as binomial outcomes. Figures 8 and 9 show the rates of choosing “All > Not” pictures by condition. The error bars indicate 95 % confidence intervals in the following analyses.

Figure 8: Percentage of “All > Not” on the CT-A (-wa focus) condition by adults.

Figure 9: Percentage of “All > Not” on the CT-B (phrase focus) condition.
The rates of choosing the “All > Not” pictures for the TT and CT-A conditions were 36.5 % and 4.4 %, and those for the TT and CT-B conditions were 37.9 % and 3.0 %, respectively. A Generalized Linear Mixed Model analysis revealed main effects of prosodic type in both sub-experiments (p < 0.01 for both conditions).[15] Our hypotheses were only partially confirmed by the results. With the CT prosody, we indeed saw predominantly “Not > All” choices. With the TT prosody, however, the responses indicated ambiguity rather than the predominant “All > Not” interpretation that the previous literature had assumed. In fact, more than 60 % of the choices were for the “Not > All” interpretation.
Let us now turn to the results for the response time. The response time was recorded from the onset of the target sentence until the button press. Therefore, the response time for the analysis was calculated by subtracting the duration of the sound file from the recorded time. One datum that resulted in a negative value was excluded from the analysis (n = 1). All the other data used in the choice analysis (263 data points) were included in the RT analysis, where the raw RTs were converted to logged RTs. The data from the two sub-experiments were merged in the RT analysis. Figure 10 summarizes the results.
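The RT preprocessing described above can be sketched as follows. This is only an illustration under stated assumptions: times are in milliseconds, the logarithm is the natural logarithm (the base is not specified in the text), and a value of exactly zero is dropped together with negative values because its logarithm is undefined. The function name and the example values are hypothetical.

```python
import math
from typing import Optional

def analysis_rt(recorded_ms: float, audio_ms: float) -> Optional[float]:
    """Return the logged RT measured from sentence offset, or None if excluded.

    recorded_ms: time from sentence onset to button press
    audio_ms:    duration of the sound file for that item
    """
    adjusted = recorded_ms - audio_ms  # RT relative to the end of the audio
    if adjusted <= 0:
        # The text excludes negative values; zero is excluded here as well
        # (assumption) because log(0) is undefined.
        return None
    return math.log(adjusted)  # natural log (assumption)

# Hypothetical trials: (recorded time, audio duration), both in ms.
trials = [(5000.0, 2000.0), (1500.0, 2000.0)]
logged = [analysis_rt(r, a) for r, a in trials]
kept = [x for x in logged if x is not None]
print(len(kept))  # 1
```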

Figure 10: Logged RT by the prosody.
The average raw RT was 3640 ms in the TT condition, 2141 ms in the CT-A (-wa focus) condition, and 2784 ms in the CT-B (phrase focus) condition. The analysis of the logged RTs, using a Linear Mixed Model, showed a significant difference between the TT (intercept) and CT-A (t value = −4.335) and between the TT and CT-B (t value = −2.563).[16]
Contrary to our predictions, which were based more on intuition than on theoretical grounding, our RT results showed that reaction times were significantly faster in the CT conditions than in the TT condition.
The following figure shows the logged RTs broken down by the participants’ choice and the conditions. Note that there were not many “All > Not” choices on the CT-A (-wa focus) and CT-B (phrase focus) conditions (Figure 11).

Figure 11: Logged RTs by the choice.
Under the TT condition, participants took a similar amount of time to choose a picture, regardless of the picture chosen. However, under the CT conditions, when participants selected the “Not > All” picture, as prompted by the prosody, their reaction time was quicker. In contrast, choosing the “All > Not” picture under the CT conditions seemed to involve more deliberation, as indicated by longer reaction times.
3.2.3 Discussion
The results from the adult experiment indicate that, even with the TT prosody, interpretations can be ambiguous. More than 60 % of the responses under the TT prosody were for the “Not > All” interpretation. This trend is at odds with what the previous literature has argued and is worth reporting. For sentences with the same structure but uttered with the CT prosody, participants almost exclusively accessed the “Not > All” interpretation. This is in line with the theoretical analysis that has been made in the literature. Additionally, shorter reaction times were noted under the CT prosody conditions. These findings suggest that, for sentences of the type tested in our experiment with the TT prosody, listeners access both the “All > Not” and “Not > All” interpretations, leading to longer reaction times due to hesitation. The lack of clear differences in picture choices and in reaction times implies no strong preference. CT prosody appears to streamline choices by narrowing down multiple LFs, facilitating quicker decision-making, unlike TT prosody, where conscious selection from multiple LFs is required. This suggests that the processing load involved in complex semantic and pragmatic computations under the CT prosody might be cancelled out by the effect of narrowing down the choices.
3.3 Experiment – children
3.3.1 Method and design
The experiment was designed following the format of the adult experiment, with two key differences: (i) the number of filler items was reduced from 12 to 6, and (ii) for the eight target items, the prosody type was treated as a between-subjects variable. That is, they heard 8 target items of the same prosodic type as well as practice and filler items. The reason for changing the design from within-subject to between-subjects was our concern that four items per condition per child might not be sufficient. This is because children might make more errors, such as choosing a positive picture for a negated sentence. Like the adult experiment, this experiment was conducted using PsychoPy on a computer, connected to five easily-pressable buttons about 10 cm in diameter via USB (four for picture selection and one for advancing scenes). The experimenter was present next to the participant, providing instructions and advancing the scenes during the experiment. Participants were instructed to choose the picture they thought matched the girl’s speech by pressing a button. Before the practice items, additional exercises ensured understanding that the four buttons corresponded to the four on-screen images. The experimenter pressed the button to advance scenes (equivalent to the space key), while participants were asked to only press the buttons during picture selection. The buttons (instead of asking to press keys on the keyboard) were used for ease, but issues like children pointing at the screen before pressing, confirming their choice with the experimenter, or not pressing the button correctly (e.g., too softly or too hard) led to challenges in accurately measuring reaction times in children’s experiments. Therefore, the results for reaction times will be reported but not considered very reliable. Future studies using touchscreens or eye-tracking methods are suggested. The experiment took about 10–15 min.
3.3.2 Participants
We recruited 45 child participants (15 in the TT condition, 15 in the CT-A (-wa focus) condition, and 15 in the CT-B (phrase focus) condition). The six filler items in the experiment did not contain any negation, making them incompatible with the “All > Not” interpretation. Trials in which participants erroneously chose the “All > Not” picture or an irrelevant one for filler items were considered incorrect. The accuracy rate for the filler items was calculated for each participant. One participant (in the CT-B condition) with a correct response rate below 60 % was excluded from the subsequent analysis. Thus, 44 children (age range 4;0 to 6;10, M = 5;10) were included in the analysis. There were 8 four-year-olds, 10 five-year-olds, and 26 six-year-olds, and the mean age was 6;1 for the TT condition, 5;9 for the CT-A condition, and 5;7 for the CT-B condition. Among them, 21 participants were recruited before the COVID-19 pandemic in the Mie area, and the remaining 23 were recruited during and after the pandemic in the Tokyo area.[17]
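The exclusion rule described above can be illustrated with a short Python sketch. It is not the procedure actually used for the study: the participant IDs and filler scores are invented for illustration, and the names `filler_accuracy` and `keep_participant` are hypothetical. Only the number of fillers (six) and the 60 % threshold come from the text.

```python
from fractions import Fraction

N_FILLERS = 6                   # number of filler items in the children's experiment
THRESHOLD = Fraction(60, 100)   # participants below this accuracy are excluded

# Hypothetical data: participant ID -> number of correct filler trials.
filler_correct = {"child_01": 6, "child_02": 5, "child_03": 3, "child_04": 4}


def filler_accuracy(n_correct, n_items=N_FILLERS):
    """Proportion of filler trials answered correctly, as an exact fraction."""
    return Fraction(n_correct, n_items)


def keep_participant(n_correct):
    """Keep a participant unless filler accuracy is below the threshold."""
    return filler_accuracy(n_correct) >= THRESHOLD


included = [pid for pid, n in filler_correct.items() if keep_participant(n)]
excluded = [pid for pid in filler_correct if pid not in included]

print("included:", included)
print("excluded:", excluded)
```

With six fillers, a child with four correct responses (about 67 %) stays in the sample, whereas a child with three correct responses (50 %) falls below the threshold.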
3.3.3 Results
Of 352 relevant data points for the target items (8 items × 44 participants), no response chose the irrelevant picture. However, a number of erroneous responses chose the “All VP-ed” pictures.[18] Figure 12 and Table 1 summarize the responses by condition. The figure shows the proportions of the response types, and the table provides the raw counts.

Figure 12: Breakdown of answers by condition (Children).
Table 1: Raw count of the responses by condition (Children).
Response | TT prosody | CT-A prosody | CT-B prosody |
---|---|---|---|
“All > Not” picture | 61 | 16 | 15 |
“Not > All” pictures | 59 | 85 | 74 |
“All VP-ed” pictures (error) | 0 | 19 | 23 |
A Chi-square test of independence was conducted to examine the distribution of responses across the three conditions. The results revealed a significant difference in responses across the three groups (p < 0.01). Further pairwise Chi-square tests between TT and CT-A, TT and CT-B, and CT-A and CT-B (with the significance level set at α = 0.0167, i.e., 0.05/3, based on the Bonferroni method) showed significant differences between TT and CT-A (p < 0.01) and between TT and CT-B (p < 0.01), but not between CT-A and CT-B.
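The comparisons described above can be approximated from the raw counts in Table 1. The following Python sketch is not the analysis script used for the study. It assumes that all three response types (including the “All VP-ed” errors) enter each test and that no continuity correction is applied, neither of which is stated in the text. The function names (`chi_square`, `chi2_sf_even`, `compare`) are illustrative, and the p-value is computed with the closed-form upper tail of the chi-square distribution for even degrees of freedom rather than with a statistics library.

```python
import math

# Raw counts from Table 1 (rows: response type; columns: TT, CT-A, CT-B).
COUNTS = {
    "All > Not": (61, 16, 15),
    "Not > All": (59, 85, 74),
    "All VP-ed (error)": (0, 19, 23),
}
CONDITIONS = ("TT", "CT-A", "CT-B")
ALPHA_PAIRWISE = 0.05 / 3  # Bonferroni-adjusted level for three pairwise tests


def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf_even(stat, dof):
    """Upper-tail p-value of the chi-square distribution (even dof only)."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form requires a positive, even dof")
    half = stat / 2.0
    terms = (half ** k / math.factorial(k) for k in range(dof // 2))
    return math.exp(-half) * sum(terms)


def compare(*conditions):
    """Test independence of response type and the selected conditions."""
    idx = [CONDITIONS.index(c) for c in conditions]
    table = [[row[i] for i in idx] for row in COUNTS.values()]
    stat, dof = chi_square(table)
    return stat, dof, chi2_sf_even(stat, dof)


stat, dof, p = compare("TT", "CT-A", "CT-B")
print(f"overall: chi2({dof}) = {stat:.2f}, p < 0.01: {p < 0.01}")

for pair in [("TT", "CT-A"), ("TT", "CT-B"), ("CT-A", "CT-B")]:
    stat, dof, p = compare(*pair)
    verdict = "significant" if p < ALPHA_PAIRWISE else "not significant"
    print(f"{pair[0]} vs {pair[1]}: chi2({dof}) = {stat:.2f}, {verdict}")
```

Under these assumptions, the overall test and both comparisons involving TT fall below their thresholds, whereas the CT-A versus CT-B comparison does not, which matches the pattern reported here.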
Figure 13 shows the average logged RT by the condition. Figure 14, in turn, summarizes the logged RT broken down by the prosody and choice. In Figure 14, the choice of “All VP-ed” picture is indicated as Pos, standing for the positive (not negative) picture. Since there was no error in the TT condition, the “TT-Pos” is not depicted in the figure. As noted earlier, the RT results are not considered to be very reliable, but they are suggestive in that we can observe some trends to be discussed in the discussion section.

Figure 13: Logged RTs by the prosody (Children).

Figure 14: Logged RTs by the choice (Children).
The average raw RT was 4274 ms in the TT condition, 5038 ms in the CT-A (-wa focus) condition, and 5699 ms in the CT-B (phrase focus) condition. The analysis of the logged RTs, using a linear mixed model, showed no significant differences between conditions.
3.3.4 Discussion
The children’s experiment yielded several findings. First, sentences with the TT prosody were ambiguous for children, a result in line with the adult findings. The picture-selection method made this ambiguity visible: responses under the TT prosody were almost evenly split between the “All > Not” and “Not > All” interpretations (61 vs. 59), and the lack of reaction time differences may suggest that the children made their choices without hesitation.
Secondly, there were notable differences across conditions in the rate of choosing the erroneous “All VP-ed” picture. The picture representing the “All VP-ed” situation should not be chosen if participants notice the negation in the stimulus sentence. In Japanese, the negation appears towards the end of the sentence, so participants might overlook it if they are preoccupied with semantic computations triggered by the CT prosody at the beginning of the sentence. The error was absent in the TT condition but occurred in roughly 15–20 % of responses under the CT conditions (19 of 120 for CT-A and 23 of 112 for CT-B). This suggests additional cognitive load in processing sentences with the CT prosody. It is possible that children are following the mechanism illustrated in (4), which would increase their processing load.
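The error rates mentioned above follow directly from the raw counts in Table 1. The short Python sketch below shows the arithmetic; the names `TABLE_1` and `error_rate` are illustrative and not taken from the study’s materials.

```python
# Raw counts from Table 1, per condition:
# ("All > Not" picture, "Not > All" pictures, "All VP-ed" error pictures).
TABLE_1 = {
    "TT": (61, 59, 0),
    "CT-A": (16, 85, 19),
    "CT-B": (15, 74, 23),
}


def error_rate(condition):
    """Share of target responses that chose the erroneous 'All VP-ed' picture."""
    all_not, not_all, errors = TABLE_1[condition]
    return errors / (all_not + not_all + errors)


rates = {cond: round(100 * error_rate(cond), 1) for cond in TABLE_1}
print(rates)  # {'TT': 0.0, 'CT-A': 15.8, 'CT-B': 20.5}
```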
Thirdly, no significant difference was found between CT-A and CT-B prosody, indicating children processed both as CT prosody. While some literature reports informants’ preference for CT-A (-wa focus) prosody, theoretically, prosody like CT-B (phrase focus) has been more extensively analyzed. It is speculated that children might receive more input in the CT-A form, but this experiment demonstrated that both CT-A and CT-B were processed similarly as CT prosody.
Lastly and importantly, it has been shown that children exhibited different interpretations for the TT and CT prosody, with CT prosody excluding the “All > Not” interpretation. As observed in the first point, since the TT prosody results in ambiguous interpretations, the effect of using the CT prosody can be seen as reducing the “All > Not” interpretations.[19]
4 General discussion
The overall results suggest that both adults and children interpreted the sentences differently under the TT and CT prosody, highlighting CT’s role in reducing the “All > Not” interpretation. While previous literature (both theoretical and experimental) has noted different behaviors under the TT and CT prosody, the current experiments revealed a substantial ambiguity perceived by speakers when hearing the stimuli with the TT prosody. In languages like English and German, which inherently have flexible scope, starting with the two LFs (“All > Not” and “Not > All”) for the semantic computations under the CT prosody is straightforward. However, in Japanese, if the sentences read with the TT prosody are interpreted solely as “All > Not” and that provides the starting LF for the computations under the CT, it raises questions about how the “Not > All” LF is accessed when the computation starts. If, on the other hand, sentences like our stimuli are ambiguous when read with the TT prosody, then two LFs can be assumed as candidates for semantic interpretation, as in English and German.
While the discussion on the scope rigidity of Japanese is beyond the scope of this paper, the current experiment points to the possibility that speakers allow flexibility to a great extent even in structures traditionally considered rigid. I hope the current study opens up the investigation of such topics. Possible factors affecting scope flexibility/rigidity include word choice (e.g., minna ‘everyone’ vs. NP zenin ‘every NP’), animacy, verb types (e.g., transitive vs. intransitive), types of scope relations (e.g., involving multiple quantifiers vs. a quantifier and negation), word order (as discussed in the preparatory survey), and many others. Future experiments on these factors are anticipated.
Turning to our results, the difference in RT analyses between adults and children indicates that adults are quicker in computation with the CT prosody. Viewing only the adult results might lead one to question whether the complex computations proposed in prior studies are indeed necessary to achieve the effects of the CT prosody. There might also be speculation that the CT prosody is the default, with something special occurring in interpreting the TT prosody. However, these doubts are likely resolved when the children’s results are taken into account. The children’s results suggest that the TT prosody involves fewer errors and thus less processing load, indicating that the semantic computations of the CT prosody add processing load, leading to more errors. In addition to the practical reasons for using the TT prosody to convey neutral meanings, its ease of comprehension for children suggests that the TT prosody is indeed the default. Children process sentences with the TT prosody with no issues, and they can make use of the effects of the CT prosody when they process sentences with the CT prosody, but doing so appears to take time.
Another possible account of the difference in RT analyses between adults and children comes from a limitation of the study.[20] We used one type of lead-in question for the target items, namely, “Did all…?”-type questions. As illustrated in (6), the answer with the “All > Not” reading is theoretically not ideal for a “Did all…?”-type question.[21] Consequently, there might have been infelicity in the question-answer string with the TT prosody, if the meaning linked to the TT prosody is primarily “All > Not.” It is possible that adults were sensitive to the infelicity and thus showed longer RTs with the TT prosody than with the CT prosody. It is also possible that children were not sensitive to it and thus did not show differences between TT and CT. This issue will be addressed by conducting a follow-up study with fully congruent question-answer pairs for both TT and CT.
Since the semantic and pragmatic computations for the CT in Japanese align with the proposals in languages like English and German, similar outcomes are anticipated for sentence processing involving CT in other languages. It remains to be seen whether other languages will also exhibit faster reaction times for CT, compared to their corresponding neutral prosody. As previously observed in Section 1.1, the CT prosody is not limited to sentences involving flexible scope relations. Rather, it is prevalent in everyday conversations that generate a set of alternatives. The adult-like rates and/or the speed to resolve the scope interpretations under the CT prosody may correlate with the general ability to correctly interpret conversations that include contrast. The application in this direction poses a question worth exploring.
5 Conclusions
This paper presented new data showing that both adults and children are sensitive to prosodic manipulation in sentences with a universal quantifier and negation. More specifically, both adults and children showed a reduced prevalence of the “All > Not” interpretation under the CT prosody. Although prior research has acknowledged differences between the TT and CT prosody, this study highlighted the substantial ambiguity that speakers experience with the TT prosody in the presented stimuli. These results emerged from experiments using a picture-selection task, in which participants inferred meanings from the sentences they heard. The difference in reaction time trends points to possible differences in cognitive processing between adults and children. For adults, the CT prosody facilitates processing by eliminating one of the possible LFs. For children, however, the computations involved in the CT prosody result in a higher cognitive load and more time-consuming sentence processing. Children can use prosody to interpret scope relations much as adults do, but how effectively they can use this knowledge appears to develop gradually. The interaction of this ability with other general cognitive abilities is a promising area for future developmental research.
Acknowledgments
I am grateful to the participants in the study, especially the children, their parents and their teachers at daycares and preschools in the Mie and Tokyo areas, Japan. I would also like to thank my research assistants at Waseda University for their enormous help with conducting the experiments. The preliminary results of the experiments were presented as a poster at Generative Approaches to Language Acquisition North America 9 (GALANA 2021). I thank the audience there for their helpful and critical comments and suggestions. I would also like to extend my gratitude to the two anonymous reviewers for their insightful comments, which greatly helped in improving this paper. This research is partially supported by the JSPS KAKENHI Grants (#19K13221, 21K18370), the JSPS Core-to-Core Program (#JPJSCCAJ 231702005), and Waseda University Special Research Projects (2019C-547, 2020C-632, 2021C-181). Their support is gratefully acknowledged.
Appendix: An example of the lead-in question for target items, all the target items for the adult and child experiments, and the filler items for the children’s experiment are listed below.
Lead-in question (example corresponding to the one to prompt the target item in (2a) of Appendix) | ||
Syatu-wa zenbu kawai-ta | kana? | |
shirts-top all dry-past | Q | |
‘Did all the shirts dry?’ |
Target items |
Subete-no | syatu-wa | kawaka-nakat-ta-yo. |
all-gen | shirts-top | dry-neg-past-excl |
‘All of the shirts didn’t dry.’ |
Subete-no | kasa-wa | kawaka-nakat-ta-yo. |
all-gen | umbrella-top | dry-neg-past-excl |
‘All of the umbrellas didn’t dry.’ |
Subete-no | tako-wa | toba-nakat-ta-yo. |
all-gen | kites-top | fly-neg-past-excl |
‘All of the kites didn’t fly.’ |
Subete-no | mado-wa | simara-nakat-ta-yo. |
all-gen | window-top | close-neg-past-excl |
‘All of the windows didn’t get closed.’ |
Subete-no | remon-wa | oti-nakat-ta-yo. |
all-gen | lemon-top | fall-neg-past-excl |
‘All of the lemons didn’t fall.’ |
Subete-no | hune-wa | sizuma-nakat-ta-yo. |
all-gen | ship-top | sink-neg-past-excl |
‘All of the ships didn’t sink.’ |
Subete-no | jyuusu-wa | kobore-nakat-ta-yo. |
all-gen | juice-top | spill-neg-past-excl |
‘All of the juice didn’t get spilled.’ |
Subete-no | keeki-wa | ure-nakat-ta-yo. |
all-gen | cake-top | sell-neg-past-excl |
‘All of the cake didn’t get sold.’ |
Filler items for the children’s experiment |
Sukunakutomo | hitotu-no doa-ga | hirai-ta-yo. |
at:least | one-gen door-nom | open-past-excl |
‘At least one door opened.’ |
Sukunakutomo | hitotu-no booto-ga | ukan-da-yo. |
at:least | one-gen boat-nom | float-past-excl |
‘At least one boat floated.’ |
Sukunakutomo | hitotu-no | kutusita-ga | kawai-ta-yo. |
at:least | one-gen | socks-nom | dry-past-excl |
‘At least one (pair of) socks dried.’ |
Ooku-no | mikan-ga | oti-ta-yo. |
many-gen | orange-nom | fall-past-excl |
‘Many oranges fell.’ |
Ooku-no | ranpu-ga | tui-ta-yo. |
many-gen | lamp-nom | light-past-excl |
‘Many lamps lit up.’ |
Ooku-no | hikooki-ga | ton-da-yo. |
many-gen | planes-nom | fly-past-excl |
‘Many planes flew.’ |
References
Baltazani, Mary. 2002. The prosodic structure of quantificational sentences in Greek. In Mary Andronis, Erin Debenport, Anne Pycha & Keiko Yoshimura (eds.), Proceedings of Chicago linguistic society (CLS) 38, 63–78. Chicago, IL: Chicago Linguistic Society.
Büring, Daniel. 1997. The great scope inversion conspiracy. Linguistics and Philosophy 20. 175–194. https://doi.org/10.1023/a:1005397026866.
Büring, Daniel. 2003. On D-trees, beans, and B-accents. Linguistics and Philosophy 26. 511–545. https://doi.org/10.1023/A:1025887707652.
Chierchia, Gennaro. 2004. Scalar implicatures, polarity phenomena, and the syntax/pragmatics interface. In Adriana Belletti (ed.), Structures and beyond, 39–103. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780195171976.003.0003.
Constant, Noah. 2012. English rise-fall-rise: A study in the semantics and pragmatics of intonation. Linguistics and Philosophy 35. 407–442. https://doi.org/10.1007/s10988-012-9121-1.
Constant, Noah. 2014. Contrastive topics: Meanings and realizations. PhD dissertation. Amherst: University of Massachusetts.
Crain, Stephen & Rosalind Thornton. 1998. Investigations in universal grammar: A guide to research on the acquisition of syntax and semantics. Cambridge, MA: MIT Press.
É. Kiss, Katalin & Beáta Gyuris. 2003. Apparent scope inversion under the rise fall contour. Acta Linguistica Hungarica 50(3–4). 371–404. https://doi.org/10.1556/ALing.50.2003.3-4.3.
Hara, Yurie. 2006. Grammar of knowledge representation: Japanese discourse items at interfaces. PhD dissertation. Newark, DE: University of Delaware.
Hattori, Noriko, Seiki Ayano, Dylan Herrick, David Stringer & Koji Sugisaki. 2006. Topics in child Japanese. In Proceedings of the 7th Tokyo Conference on Psycholinguistics, 103–120. Tokyo: Hituzi Syobo.
Iannucci, David & David Dodd. 1980. The development of some aspects of quantifier negation in children. In Papers and Reports on child language development (PRCLD) #19, 88–94. Stanford, CA: Stanford University Department of Linguistics.
Jackendoff, Ray. 1972. Semantic interpretation in generative grammar. Cambridge, MA: MIT Press.
Jackson, Scott. 2008. The prosody-scope relation in Hungarian. In Christopher Piñón & Szilárd Szentgyörgyi (eds.), Papers from the Veszprém conference approaches to Hungarian 10, 85–102. Budapest: Akadémiai Kiadó.
Jespersen, Otto. 1933. Essentials of English grammar. (Eighth printing, 1994). Tuscaloosa, AL: The University of Alabama Press.
Krifka, Manfred. 1998. Scope inversion under the rise-fall contour in German. Linguistic Inquiry 29. 75–112. https://doi.org/10.1162/002438998553662.
Kuno, Susumu. 1973. The structure of the Japanese language. Cambridge, MA: MIT Press.
Ladd, D. Robert. 1980. The structure of intonational meaning. Bloomington, IN: Indiana University Press.
Lee, Chungmin. 2006. Contrastive (predicate) topic, intonation, and scalar meanings. In Chungmin Lee, Matthew Gordon & Daniel Büring (eds.), Topic and focus: Cross-linguistic perspectives on meaning and intonation, 151–175. Dordrecht: Springer. https://doi.org/10.1007/978-1-4020-4796-1_9.
Musolino, Julien. 1998. Universal grammar and the acquisition of semantic knowledge: An experimental investigation into the acquisition of quantifier-negation interaction in English. College Park, MD: University of Maryland PhD dissertation.
Musolino, Julien & Jeffrey Lidz. 2006. Why children aren’t universally successful with quantification. Linguistics 44(4). 817–852. https://doi.org/10.1515/LING.2006.026.
Nakanishi, Kimiko. 2008. Prosody and scope interpretations of the topic marker wa in Japanese. In Chungmin Lee, Matthew Gordon & Daniel Büring (eds.), Topic and focus: Cross-linguistic perspectives on meaning and intonation, 177–193. Dordrecht: Springer. https://doi.org/10.1007/978-1-4020-4796-1_10.
Oshima, David Y. 2008. Morphological vs. phonological contrastive topic marking. In Rodney L. Edwards, Patrick J. Midtlyng, Colin L. Sprague & Kjersti G. Stensrud (eds.), Proceedings of Chicago linguistic society (CLS) 41, 371–384. Chicago, IL: Chicago Linguistic Society.
Sauerland, Uli & Oliver Bott. 2002. Prosody and scope in German inverse linking constructions. In Bernard Bel & Isabelle Marlien (eds.), Proceedings of the 1st international conference on speech prosody, 623–628. Aix-en-Provence: Laboratoire Parole et Langage. https://doi.org/10.21437/SpeechProsody.2002-141.
Sugawara, Ayaka, Martin Hackl, Irina Onoprienko & Ken Wexler. 2018. Children know the prosody/semantic link: Experimental evidence from Rise-Fall-Rise and scope. In Katalin É. Kiss & Tamás Zétényi (eds.), Linguistic and cognitive aspects of quantification, 31–55. Cham: Springer. https://doi.org/10.1007/978-3-319-91566-1_3.
Sugawara, Ayaka & Ken Wexler. 2014. Children do not accept unambiguous inverse-scope readings: Experimental evidence from prosody and scrambling in Japanese. In Shigeto Kawahara & Mika Igarashi (eds.), Proceedings of formal approaches to Japanese linguistics 7 (FAJL7), 215–226. Cambridge, MA: MIT Working Papers in Linguistics.
Syrett, Kristen & Julien Musolino. 2013. Collectivity, distributivity, and the interpretation of plural numerical expressions in child and adult language. Language Acquisition 20(4). 259–291. https://doi.org/10.1080/10489223.2013.828060.
Syrett, Kristen, Georgia Simon & Kristen Nisula. 2014. Prosodic disambiguation of scopally ambiguous quantificational sentences in a discourse context. Journal of Linguistics 50. 453–493. https://doi.org/10.1017/s0022226714000012.
Tomioka, Satoshi. 2010a. Contrastive topics operate on speech acts. In Malte Zimmermann & Caroline Féry (eds.), Information structure: Theoretical, typological, and experimental perspectives, 115–138. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199570959.003.0006.
Tomioka, Satoshi. 2010b. A scope theory of contrastive topics. Iberia 2(1). 113–130.
Viau, Joshua, Jeffrey Lidz & Julien Musolino. 2010. Priming of abstract logical representations in 4-year-olds. Language Acquisition 17(1–2). 26–50. https://doi.org/10.1080/10489221003620946.
Ward, Gregory & Julia Hirschberg. 1985. Implicating uncertainty: The pragmatics of fall-rise intonation. Language 61. 747–776. https://doi.org/10.2307/414489.
Yatsushiro, Kazuko, Ayaka Sugawara & Uli Sauerland. 2019. Quantifier scope and intonation in German. In Proceedings of the 43rd annual Boston university conference on language development, 730–743. Somerville, MA: Cascadilla Press.
© 2025 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.