Abstract
Despite the increased research on multiword expressions (MWEs) in the past two decades, there is a lack of theory that can explain the often-contradictory findings of those studies. Therefore, the aim of this paper was to propose two models: (a) one explaining differences in MWE density produced by speakers and writers at different proficiency levels and (b) another explaining differences in the difficulty of acquisition of different MWE groups. Taken together, they should theoretically predict differences in phrase density of MWEs produced by language users of different proficiency levels, with any L1-L2 combination, for various MWE groups. This paper validated the models for use with test takers of the KIT Speaking Test (L1 Japanese, L2 English). The data used came from a 222,777-word corpus containing 574 examinees’ responses across four proficiency groups. Findings show that the proposed models predicted the phrase density patterns very well for 12 different MWE groups. These models not only contribute to theory, but also have the practical application of helping identify the MWE groups that raters should focus on to differentiate between test takers, including where lower rates of use should not be incorrectly attributed to a lower score.
1 Introduction
In the past two decades or so, researchers have increasingly recognized the importance of multiword expressions (MWEs) for fluent and appropriate communication (see Siyanova-Chanturia and Pellicer-Sanchez 2018 for a review). MWEs are important and pervasive in both written and spoken English (e.g., Erman and Warren 2000; Sinclair 1991), and the use of MWEs in speech aids processing for both the speaker and the listener (see below for a review). It is also well-documented that learners have trouble with MWEs (e.g., Martinez and Murphy 2011), especially those that are semantically opaque. Therefore, the lack of theory on the acquisition and production of MWEs is problematic.
2 Literature Review & Models Proposed
2.1 Multiword Expressions
In this paper, patterns in vocabulary use are referred to as MWEs, as this is, along with multiword units, the established term in usage-based research, including corpus research (see Wray 2002 for a list and review of other similar/related terms). In this paper, we have operationalized MWEs following the usage-based approach (e.g., Tyler et al. 2018), which posits that all of language is processed similarly in a single system of language, even longer and more complex phrases. In other words, language users follow conventionalized language, so when learners, even highly proficient ones, produce grammatically acceptable utterances that native speakers do not produce, they are marked as foreign and perceived to be less fluent (e.g., Bybee 1998; Shin and Nation 2008).
There are two main reasons why this paper aligns with the usage-based approach. First, frequency is important in MWE acquisition, as learners often overuse highly frequent MWEs while underusing less frequent ones that strongly cohere (e.g., Durrant and Schmitt 2009; Granger and Bestgen 2014). From the usage-based perspective, frequency is the feature of input that has the largest impact on language acquisition (Tyler et al. 2018). Second, this paper operationalizes MWEs as including word combinations that are not necessarily processed as a single unit (vs. MWEs like kind of, of course), as long as the meaning makes sense, and they occur more frequently together than they would be expected to by chance. These combinations, such as modal verb-verb collocations (e.g., can explain), are acceptable in the usage-based approach but may be considered compositional and not strictly MWEs in other views.
The term MWE group is used in this paper to refer to MWEs that differ in terms of the part of speech (PoS) of their constituent words, with the two broad categories of lexical MWEs made up of two lexical words (e.g., verb-noun collocations, adjective-noun collocations) and grammatical MWEs made up of one lexical and one function word (e.g., modal verb-verb collocations, noun-preposition collocations).
2.1.1 Language Processing and Language Tests
The main reason why MWEs are so important is that they aid language processing, which, in turn, means that they are related to multiple aspects of language, not just vocabulary. MWEs are thought to be processed as single chunks, which reduces memory load (e.g., Martinez and Schmitt 2012), and therefore more knowledge of MWEs correlates with faster processing speed for both L1 and L2 language users (e.g., Wray 2002). Use of MWEs not only affects general language processing speed (e.g., Pawley and Syder 1983) and the encoding and decoding of texts (e.g., Poulsen 2005), but also affects fluency in reading (e.g., Wray 2002) and in speaking (e.g., Xu 2018).
MWE use is especially crucial in speaking (and, to a lesser extent, writing), where the use of unexpected word combinations may throw off listeners (and readers). This is because when a listener hears (or a reader reads) the first word of an MWE, they use their vocabulary (and grammar) knowledge to start narrowing down the range of expected upcoming words. For example, when hearing the word rancid, the expected category of nouns that it modifies comprises certain foods (e.g., butter, oil, milk), so it would be unexpected if the speaker produced paint instead. This is related to the fluency and cohesion/coherence aspects of language often used in assessing speaking tasks; not only has MWE use been found to increase speaking fluency (e.g., Xu 2018), but it also covers some cohesive features (Hyland and Tse 2004).
MWE use can be linked to other aspects of language assessed in speaking tasks, with the most obvious being the vocabulary aspect, often referred to as lexical resource or vocabulary/lexical range and accuracy, where MWE use is related to precise and idiomatic vocabulary use and depth of vocabulary knowledge (e.g., Qian 2002; Read 1998). MWEs are also related to grammatical range and accuracy, as correct use of MWEs requires knowledge of the relationship between words, including the typical grammatical patterns or categories that go with them (i.e., colligation, Xiao and McEnery 2006; Scrivener 1994). MWE use can also increase overall comprehensibility and have a positive effect on multiple suprasegmental pronunciation features like pausing at logical junctions and speech rate (Isaacs et al. 2015). Note that these pronunciation features can also be considered relevant to fluency and coherence/cohesion; this is in line with Seedhouse et al.’s (2014) observation that there is a possible halo effect of accurate and appropriate use of MWEs.
2.2 Models Proposed
There is currently no theoretical basis for explaining differences in MWE density produced by speakers and writers at different proficiency levels, nor is there an existing model that explains differences in the difficulty or acquisition order of different MWE groups. Contradictory findings have been pointed out by researchers, who have not yet been able to explain these differences. Therefore, we have proposed two related models by synthesizing findings from the literature on MWEs: the model of phrase density describes the phrase density produced by speakers and writers across language proficiency levels, while the model of MWE group difficulty uses the part of speech of constituent words to predict the relative difficulty of acquisition of MWE groups.
2.2.1 Model of Phrase Density
The proposed model of phrase density assumes that level of MWE group acquisition tends to be positively correlated with language proficiency. Note that MWE group acquisition refers to the acquisition of a particular type of MWE (e.g., verb-noun collocations) in general, rather than of specific/individual MWEs, which have their own properties that strongly affect their ease of acquisition (e.g., the more frequent and more concrete play game is usually acquired before the less frequent and abstract engender interest). In this paper, the model therefore reflects the phrase density of particular phrase groups overall, rather than that of individual MWEs, though acquisition of individual MWEs would likely follow a similar phrase density pattern.
It was hypothesized that differences in phrase density produced by speakers and writers would be roughly in an upside-down ‘U’ shape with increasing level of MWE group acquisition, tapering off steadily, as shown in Figure 1. This is because when proficiency is too low, speakers do not have enough word or MWE knowledge to produce many MWEs. As proficiency increases, there is a corresponding increase in MWE acquisition and therefore MWE use, though this is not straightforward. While some of the less proficient learners may avoid using MWEs altogether (Koya 2003), it is more common that (intermediate level) learners overuse highly frequent MWEs but underuse less frequent ones that strongly cohere when compared to an L1 baseline (e.g., Durrant and Schmitt 2009; Granger and Bestgen 2014); this tends to disappear for highly proficient learners, who start to use less common MWEs that strongly cohere (Granger and Bestgen 2014).
Figure 1: Hypothesized phrase density pattern by level of MWE acquisition in general. Note: figure is not to scale.
Figure 2 shows how the hypothesized model can be applied in predicting patterns of phrase density use across proficiency levels in any particular speaking or writing test (or potentially even task[s]). The dashed lines show the range of amount of MWE group acquisition for a hypothetical test-taker group, with the left line representing the lowest level of acquisition, corresponding to lower language proficiency, and the right line representing the highest level of acquisition, corresponding to higher language proficiency. In other words, it is assumed that the lowest proficiency group of target test takers would fall on the left of the range and the highest proficiency group would fall on the right.
Figure 2: Hypothesized phrase density pattern across language proficiency for phrase groups of varying difficulty levels for a hypothetical test-taker group.
For a phrase group that is difficult to acquire for the target test-taker group (left-most model), phrase density increases as acquisition/proficiency levels increase; while the middle model shows an increase followed by a decrease, phrase difficulty can vary and produce slightly different patterns. For instance, a medium-difficult phrase group may largely show an increase across acquisition/proficiency levels but show smaller increases at the higher acquisition/proficiency levels (imagine the dashed lines falling from the left of the peak to the middle of the peak), or a medium-difficult phrase group may show very little variation across proficiency levels (imagine the dashed lines being much closer together and centred around the peak).
Note that if the test were to include extremely low proficiency levels, the left dashed line would sit at or very close to the left-most part of the model, no matter the difficulty level of the phrase group, as test takers with extremely low proficiency levels would not be able to produce many instances, if any, of any phrase group. On the other hand, if the test were to only include extremely highly proficient speakers, it is expected that there would be little change across proficiency levels due to high levels of MWE group acquisition (i.e., the dashed lines would fall along the rightmost part of the model), though it is highly unlikely that there would be such a test, as it is arguably not very useful to try to differentiate between those at the highest proficiency levels.
Figure 3 shows how this model can be applied to the mixed and contradictory findings from three recent longitudinal corpus-based studies on MWE production. Garner and Crossley (2018) looked at bigram frequency in spoken production (i.e., all bigrams considered together, with no specific MWE group[s] considered) of L2 learners of English from various L1 backgrounds, either enrolled in undergraduate and graduate-level courses at a US university or attending an intensive English program at the same university. They found an increase in bigram frequency over time, with this increase being larger for less proficient speakers. The dashed lines on the left model in Figure 3 thus cover a range where there is a greater increase in phrase density as proficiency increases at the lower proficiency/acquisition end, and less increase at the higher proficiency/acquisition end.
Figure 3: Estimated MWE acquisition ranges of participants matching observed phrase density patterns from three recent longitudinal corpus-based studies of MWE production.
Siyanova-Chanturia (2015) and Siyanova-Chanturia and Spina (2020) were methodologically comparable with similar test-taker groups, so their findings were placed on a single model (Figure 3, right model). Siyanova-Chanturia (2015) explored noun-adjective collocations in beginner level L1 Chinese learners of Italian and found that written phrase density was comparable across three time periods (across 21 weeks), though the average strength of association of the collocations produced increased. Siyanova-Chanturia and Spina (2020) similarly explored noun-adjective collocations in L1 Chinese learners of Italian, with participants at three different proficiency levels (CEFR levels A1, A2, and B1). They found that phrase density decreased across time (six months), that phrase density was higher for A1 learners than for A2 and B1 learners, and that the phrase density decrease was stronger for A1 than for the other two proficiency groups. The dashed lines on the right model in Figure 3 therefore show a narrower range of test-taker proficiency levels for Siyanova-Chanturia’s (2015) study, centred around the peak of the model, reflecting the finding that phrase density did not significantly change. The lines for Siyanova-Chanturia and Spina’s (2020) study are wider apart, showing that it covered a wider range of proficiency/acquisition levels, and somewhat reflect the decrease in phrase density over time, with less decrease at the higher proficiency levels. Note that the hypothesized model of phrase density is currently not to scale, and further improvements can be made to tweak its shape to better fit further research findings.
2.2.2 Model of MWE Group Difficulty
Turning to the issue of determining the relative positions of each phrase group on the hypothesized model, there are several factors to consider. This paper focused on five variables related to the part of speech (PoS) of the words making up the MWEs:
Word class: lexical/open/form class (nouns, verbs, adjectives, adverbs) versus function/closed/structure class (pronouns, modal verbs, determiners, conjunctions, prepositions, interjections). Function words were expected to be more difficult because they carry less meaning, noting that light verbs (vs. lexical verbs) can be argued to be closed class. It has been found that language learners acquire vocabulary more quickly if a word is more concrete (vs. abstract), imageable (i.e., ease of constructing a mental image), and meaningful (i.e., level of association with other words) (e.g., Salsbury, Crossley, and McNamara 2011), all of which apply more to words from a lexical word class.
Simplicity of use:
Position: either having a set position (e.g., adjectives modify nouns) or having multiple possible positions (e.g., adverbs modify verbs, adjectives, other adverbs, whole sentences), such that having a set position is easier, with those that have a set position but are used optionally (e.g., the conjunction that) being in the middle. For example, the multiple positions for adverbs in English have been pointed out as a reason for their difficulty (e.g., Rutledge and Fitton 2015).
Functions: either one function or multiple functions (e.g., modal verbs do not only indicate ability, necessity, possibility, and permission, but are also related to the pragmatic principle of politeness and serve as the principal means of expressing hedging in English academic discourse), such that having one function is easier. For instance, Bensaid (2015) noted that although English learners can master the declarative uses of modals, they find the pragmatic uses of modals very difficult, especially since these uses are underrepresented in English language textbooks.
Frequency (e.g., nouns are more frequently used than adverbs), especially as they appear in the corpus in question, where higher frequency reflects more ease. While there are several frequency effects (token vs type, absolute vs relative, e.g., Ambridge et al. 2015), the consensus is that with all else being equal, the more often a learner hears/reads a word, the more likely they are to learn it (e.g., Read 2000).
L1 comparability: same or similar use in the L1 versus quite different use or does not exist in the L1, such that more similarity between the L1 and the target language leads to easier acquisition. L1-L2 congruency (i.e., word-for-word overlap in L1-L2 form-meaning connection) of specific MWEs has been found to be a good predictor of MWE knowledge (e.g., Nguyen and Webb 2017) and non-congruency has been found to be an obstacle to learning (e.g., reviewed in Boers and Webb 2018); a similar effect of L1-L2 differences can be expected for parts of speech.
In other words, the model consisted of two functions, with larger numbers corresponding to easier acquisition: (1) Predicted word ease (for any given PoS) = word class + position + functions + frequency (in the corpus of test performances) + L1 comparability, and (2) Predicted MWE group ease = predicted ease of word 1 × predicted ease of word 2. The two functions were kept as simple as possible; as there was no clear reason to weigh any of the variables differently, they were given equal weighting, with values ranging from 0 to 1. Note that the values for each variable should be mostly self-explanatory, and lower ratings were given only in clearly more difficult cases. This means that the maximum predicted word ease for a constituent word is 5, and the maximum MWE group ease is 5 × 5 = 25.
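To make the two functions concrete, the following minimal Python sketch (an illustration, not the authors’ actual tooling) encodes them; the example ratings mirror the verb and noun values assigned later in Table 2.

```python
# Sketch of the two ease functions. Each of the five variables is rated
# 0-1 (see Table 2), so predicted word ease tops out at 5 and predicted
# MWE group ease at 5 x 5 = 25.
from dataclasses import dataclass

@dataclass
class PosRatings:
    word_class: float        # 0 = function word, 1 = lexical word
    position: float          # 0 = multiple positions/optional, 1 = set position
    functions: float         # 0 = multiple functions, 1 = one function
    frequency: float         # 0 = <3 %, 0.5 = 3-10 %, 1 = >10 % of corpus
    l1_comparability: float  # 0 = different/absent in the L1, 1 = same/similar

def predicted_word_ease(r: PosRatings) -> float:
    """Function (1): the sum of the five equally weighted variables."""
    return (r.word_class + r.position + r.functions
            + r.frequency + r.l1_comparability)

def predicted_mwe_group_ease(w1: PosRatings, w2: PosRatings) -> float:
    """Function (2): the product of the two constituent word ease scores."""
    return predicted_word_ease(w1) * predicted_word_ease(w2)

# Verb-noun collocations: verbs and nouns both receive 5 (Table 2), so 25
verb = PosRatings(1, 1, 1, 1, 1)
noun = PosRatings(1, 1, 1, 1, 1)
print(predicted_mwe_group_ease(verb, noun))  # 25.0
```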
2.3 This Study
This paper reports on part of a corpus-based project to evaluate and improve the KIT Speaking Test, a localized computer-based semi-direct test of spoken English developed and administered at the Kyoto Institute of Technology in Japan (Hato et al. 2016; 2018). This test format was adopted instead of an interview test for the fair testing of all students (around 600 annually), due to logistical constraints. It has been used as an achievement test in a compulsory course that develops students’ English communication and production skills for first-year undergraduates.
The aims of this study were twofold: (a) to check whether the hypothesized model of phrase density is supported by the KIT Speaking Test corpus (KISTEC, n.d.) for Japanese learners of English, and (b) to create a model predicting the phrase density pattern of test takers across different proficiency levels using the PoS of MWE constituent words. The research question was therefore: Can the proposed models accurately predict the phrase density pattern across proficiency levels of the KIT Speaking Test?
3 Validation Methods
3.1 KIT Speaking Test
3.1.1 Impetus for the Test
In Japan, English communicative competence is generally lacking, particularly speaking skills. According to the English Proficiency Survey to Improve English Education conducted by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) in 2017, 33.6 % of Japanese high school seniors were at CEFR A2 level or above in listening, 33.5 % in reading, and 19.7 % in writing, but only 12.9 % in speaking. Therefore, one of the two compulsory English courses for first-year undergraduates at the Kyoto Institute of Technology focuses on communication and output.
For language tests to be at their most effective, they should reflect local needs and student populations, and be connected to the language curriculum (Bernhardt, Rivera, and Kamil 2004); hence locally developed tests are likely to be more effective than existing products (Dimova, Yan, and Ginther 2020; O’Sullivan 2019). Therefore, the decision was made to develop an original university-wide speaking test to assess the communication course’s efficacy, rather than use existing commercialized tests such as TOEFL, IELTS, or Versant, which were determined to be expensive, to assess language skills too advanced for Japanese students, and to not necessarily match the university’s curriculum and educational content. The aim was also to motivate students towards studying by providing an end goal: doing well on the KIT Speaking Test.
3.1.2 Speaking Construct
One major reason why existing commercialized tests were inappropriate for the KIT Speaking Test is that their speaking constructs were all oriented towards native speaker norms, while English education at the Kyoto Institute of Technology incorporates the concept of English as a lingua franca (ELF; Jenkins 2007; Seidlhofer 2013). Instead of adhering to these norms, various skills and knowledge are taught so that students can creatively use their linguistic resources to achieve tasks and convey opinions as an ELF user.
Consequently, the KIT Speaking Test focuses on ELF communication, with the scoring focusing on task achievement (80 %) and task delivery (i.e., fluency, 20 %) rather than on the proximity of pronunciation and grammar to native speakers. This emphasis on ELF and heavy weighting on fluency means that MWE use is especially pertinent to the KIT Speaking Test; as reviewed above, accurate and appropriate MWE use is linked to faster processing speed and increased fluency. Note that the ELF-focus of the KIT Speaking Test, with the assumption that interlocutors familiar with ELF would be more accepting of less conventional MWEs, meant that a very small number of unconventional MWEs were identified and accepted in this study, such as blue light instead of green light when it came to traffic lights.
3.1.3 KIT Speaking Test Corpus
The KIT Speaking Test corpus (KISTEC, n.d.) was created because analysis of learner corpora can be of great assistance in building on and improving language teaching and assessment methods (Granger 1998), and there are currently very few spoken English learner corpora with L1 Japanese learners. Although the corpus is still in need of minor corrections, it is now freely available at https://kitstcorpus.jp/. The KISTEC complements the NICT Japanese Learners of English (NICT JLE) Corpus (Izumi, Uchimoto, and Isahara 2004) and the International Corpus Network of Asian Learners of English (ICNALE) (Ishikawa 2023).
Corpus construction was supervised by one of the co-authors and his colleagues. The audio responses were automatically transcribed by Microsoft’s Video Indexer, a speech-to-text system, manually corrected, and given 16 different tags to represent their speech features, such as fillers, repetitions, and self-corrections. A work manual was created, and the transcription process was monitored to maximize consistency. Metadata such as examinee attributes, test scores, and the test questions that elicited responses are provided.
The KISTEC contained data from 574 examinees, comprising a total of 290,454 words (M = 506.02 words per test taker). Approximately 97 % of the examinees were Japanese and 3 % were international students, including students from China, Korea, and Malaysia. A total of 72 % of the examinees stated that they had studied English only in Japan, as they had no experience of living abroad.
A full score on the KIT Speaking Test is 100; test takers recorded a mean score of 48.22 (SD = 10.45, Min = 21, Max = 90). Their mean Test of English for International Communication (TOEIC) listening and reading score was 563.54 (SD = 133.15, Min = 195, Max = 985). The correlation coefficients between the TOEIC and KIT Speaking Test were 0.59 for correlation with the total score, 0.56 with the listening score, and 0.54 with the reading score.
The KIT Speaking Test scores were well distributed, with most learners considered basic users of English, around the A1 and A2 levels of the Common European Framework of Reference for Languages (CEFR). Therefore, while the four proficiency levels referred to in this paper are labelled low (score ≤ 39), intermediate (score 40–49), high-intermediate (score 50–59) and advanced (score ≥ 60), these labels are relative.
3.2 Extraction and Selection of MWEs
MWEs were extracted from the KISTEC as lemmatized bigrams using AntConc (Version 4.2.0). First, the corpus data were manually corrected. Utterances tagged as filler, repetition, self-correction, Japanese, and non-verbal sound were removed, along with their tags, as they could affect the collocation analysis results. Conversely, for time-out, cut-off, proper nouns, discriminatory terms, not confident in listening, completely inaudible, and laughing while speaking, the utterances were retained but the tags were deleted. The bigrams were extracted using AntConc’s n-Gram function, with n-gram size (number of words) set to 2, open slots (number of slots in the n-gram that can take multiple values) to 0, minimum n-gram frequency to 1, and minimum n-gram range (number of files) to 1. Table 1 presents the number of words analysed and the number of bigrams extracted.
Table 1: Number of words and bigrams in the KIT Speaking Test corpus, by proficiency group.
| Proficiency | Number of words | Number of bigrams |
|---|---|---|
| Low | 27,632 | 11,992 |
| Intermediate | 81,415 | 25,924 |
| High-intermediate | 74,189 | 23,980 |
| Advanced | 39,541 | 15,645 |
| All groups | 222,777 | 51,952 |
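The extraction itself was done in AntConc, but purely as an illustration of the n-gram step (n-gram size 2, no open slots), a rough Python equivalent over hypothetical cleaned and lemmatized transcripts might look like this:

```python
# Rough Python equivalent of the n-gram extraction step (n-gram size 2,
# open slots 0, minimum frequency 1, minimum range 1), assuming each
# transcript has already been cleaned of tagged fillers/repetitions and
# lemmatized. The transcripts below are hypothetical.
from collections import Counter

def extract_bigrams(transcripts: list[list[str]]) -> Counter:
    """Count adjacent lemma pairs across all transcripts."""
    counts: Counter = Counter()
    for lemmas in transcripts:
        counts.update(zip(lemmas, lemmas[1:]))
    return counts

transcripts = [["i", "can", "explain", "the", "reason"],
               ["we", "can", "explain", "it"]]
print(extract_bigrams(transcripts).most_common(1))
# [(('can', 'explain'), 2)]
```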
Bigrams were retained if they appeared at least five times in the full corpus and their mutual information (MI) score was at least three. MI is a traditional and commonly used measure of the strength of association between words, though it is strongly affected by the frequency of the individual words (i.e., low-frequency words tend to lead to high MI scores, Gablasova et al. 2017). While the similar measure logDice has recently become the preferred measure for large corpora, as it is less affected by low-frequency words and is a standardized measure that aids comparisons between corpora of different sizes (Gablasova et al. 2017), it is more difficult to calculate. MI was used in this study for several reasons: (a) it is easy to calculate; (b) MI was not used further for predictive purposes (where the low-frequency word issue could become a problem); (c) the KISTEC is a small corpus; and (d) no comparisons were made between different corpora (which would necessitate a standardized measure). Both researchers went through all the bigrams to categorize them by MWE group, which also served to identify accidental errors in categorization. Only bigrams with stand-alone meanings were included (vs. incomplete MWEs such as *rise apartment as an incomplete form of high-rise apartment). The MWE groups included in this paper are listed below.
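For reference, collocational MI is commonly computed as the base-2 logarithm of the ratio of a bigram’s observed frequency to its expected frequency under independence. A minimal sketch of the retention rule, with hypothetical counts:

```python
# Sketch of the retention rule: bigram frequency >= 5 and MI >= 3, where
# MI is log2 of observed over expected co-occurrence frequency. The counts
# below are hypothetical, not taken from the KISTEC.
import math

def mutual_information(bigram_freq: int, freq1: int, freq2: int,
                       corpus_size: int) -> float:
    expected = freq1 * freq2 / corpus_size
    return math.log2(bigram_freq / expected)

def keep_bigram(bigram_freq: int, freq1: int, freq2: int, corpus_size: int,
                min_freq: int = 5, min_mi: float = 3.0) -> bool:
    return (bigram_freq >= min_freq and
            mutual_information(bigram_freq, freq1, freq2, corpus_size) >= min_mi)

# e.g., a bigram occurring 20 times whose parts occur 4,000 and 100 times
print(keep_bigram(20, 4_000, 100, 222_777))  # True (MI ~ 3.48)
```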
3.3 Procedure
This paper reports the findings in two parts, with the first applying the proposed models of phrase density and MWE group difficulty to the KIT Speaking Test, and the second testing whether the findings hold for further MWE groups. Before that, predicted word ease was calculated for the parts of speech of constituent words, based on KISTEC frequencies and the L1 Japanese-L2 English combination.
3.3.1 Predicted Word Ease
Table 2 shows how the five variables were summed to calculate the predicted ease score for the PoS of constituent words of the MWE groups included in this paper. To calculate predicted verb-noun collocation ease for the KIT Speaking Test, for instance, the predicted ease score of verbs (5) is multiplied by the predicted ease score of nouns (5) to give 25.
Table 2: Part of speech predicted word ease values.
| Part of speech | Word class | Position | Functions | KISTEC frequency | L1 comparability | Predicted ease |
|---|---|---|---|---|---|---|
| Verb | 1 | 1 | 1 | 1 | 1 | 5 |
| Light verb | 0 | 1 | 1 | 0.5 | 1 | 3.5 |
| Noun | 1 | 1 | 1 | 1 | 1 | 5 |
| Adjective | 1 | 1 | 1 | 0.5 | 1 | 4.5 |
| Modal verb | 0 | 1 | 0 | 0 | 0 | 1 |
| Preposition | 0 | 0 | 1 | 0.5 | 1 | 2.5 |
| Adverb | 1 | 0 | 1 | 0.5 | 1 | 3.5 |
| that conjunction | 0 | 0.5 | 1 | 0 | 0 | 1.5 |
Note: Word class: 0 = function, 1 = lexical. Position: 0 = multiple possible positions or optional use, 1 = set position. Functions: 0 = multiple functions, 1 = one function. Frequency: 0 = <3 % of corpus, 0.5 = 3–10 % of corpus, 1 = >10 % of corpus. L1 comparability: 0 = different use or does not exist in L1, 1 = same/similar use in L1.
Remember that lower ratings (denoting increased difficulty) for functions and L1 comparability were given only in clearly more difficult cases, while ratings are straightforward for word class, position, and KISTEC frequency. This is why, for L1 comparability, it was determined that English prepositions were not too different from Japanese postpositions (which have a wider range of uses than in English, such as the subject marker が), as they are similar in that they link (pro)nouns to other words. On the other hand, the optional that conjunction was given a rating of 0, as there is no similar word in Japanese. For instance, the equivalent of the man (that) Tom saw is トムが見た男 (see Example 1). Also note that relative clauses come before the noun (phrase) in question in Japanese, rather than after, as is the case in English.
| Example 1 | トム | が | 見た | 男 |
|---|---|---|---|---|
| | Tomu | ga | mita | otoko |
| | Tom | subject marker | see (past tense) | man |
Similarly, only modal verbs were considered different enough in Japanese to warrant an L1 comparability rating of 0. While some modal verbs are used similarly to indicate ability, necessity, possibility, and permission, these meanings are usually encoded in the verb itself rather than in an auxiliary verb, such as the equivalent of can (in the sense of ability to do something), where 話す (dictionary form to speak) becomes 話せる when can is encoded in the verb:
| Example 2 | 話す | vs. | 話せる |
|---|---|---|---|
| | hanas+u (dictionary form) | | hanas+eru (can) |
| | to speak | | speak-can |
At other times, very different constructions are used. For example, one way of constructing the equivalent of must is a double negative conditional, such as must speak 話さなければならない (see Example 3), where the negative form of 話す (hanas+u dictionary form) is 話さない (hanas+anai negation) and the negative form of なる (nar+u dictionary form; to be(come)/get) is ならない (nar+anai negation), resulting in the literal meaning ‘if not speak, not be(come)’. Of course, this construction is learned as/considered a single chunk, +なければならない (+nakerebanaranai), rather than reconstructed every time it is used. The other ways to construct must are also more complex than a simple encoding of the verb.
| Example 3 | 話さなければならない is constructed from | | |
|---|---|---|---|
| | 話さない | + ければ | + ならない |
| | hanasanai | + kereba | + naranai |
| | speak-negative | + conditional (if true, then the following will happen) | + be(come)-negative |
3.3.2 Part 1: Creation of Prediction Model
Part 1 of the study focused on the prediction model for the KIT Speaking Test, which entailed a comparison of the predicted MWE group ease scores with the observed phrase density patterns for a set of MWE groups, to determine cut scores for different patterns. Table 3 lists the MWE groups used in Part 1, with examples of each.
Table 3: MWE groups included in Part 1, with examples.
| MWE group | Shortened name | Examples |
|---|---|---|
| Modal verb-verb | ModVV | can explain, should go |
| Adjective-preposition | AdjPrep | different from, good at |
| Noun-preposition | NPrep | ability to, reason for |
| Phrasal verb/verb-preposition | PhV/VPrep | look after, communicate with |
| Adverb-adjective | AdvAdj | very dangerous, more slowly |
| Verb-adverb | VAdv | travel abroad, do well |
| Compound word | CompW | amusement park, promotional video |
| Adjective-noun | AdjN | beautiful scenery, traditional temples |
Note that the MWE groups selected here were the most frequent ones that also usually appear as bigrams, to ensure a more accurate representation of MWE use patterns. For instance, noun-preposition collocations (used in Part 1) rarely have words in between the node word and the collocate, while verb-noun collocations (used in Part 2) often have at least one word in between (e.g., for solve + problem, ‘solve the/some/her problem[s]’). Four were grammatical MWEs (ModVV, AdjPrep, NPrep, PhV/VPrep) and four were lexical (AdvAdj, VAdv, CompW, AdjN), to cover a range of difficulty levels, as grammatical ones tend to be more difficult.
3.3.3 Part 2: Application of Prediction Model
The second part of the study served as a check of the cut scores set in Part 1, to see if the predicted phrase density patterns reflected observed patterns for a further set of MWE groups (see Table 4). Similarly, they were balanced in terms of having two grammatical MWEs (Vthat, PrepN) and two lexical MWEs (VAdj, VN).
Table 4: MWE groups included in Part 2, with examples.
| MWE group | Shortened name | Examples |
|---|---|---|
| Verb-that | Vthat | dream that, understand that |
| Preposition-noun | PrepN | at home, in trouble |
| Verb-adjective | VAdj | feel relaxed, get angry |
| Verb-noun | VN | eat lunch, face problem |
4 Results Part 1: Creation of Prediction Model
Part 1 of this study involved checking to see if the predicted MWE group relative difficulty levels matched the phrase density patterns across proficiencies observed in the KIT Speaking Test corpus. Then cut scores were set for the predicted MWE group ease scores for each phrase density pattern.
4.1 Predictions
Table 5 shows the predicted MWE group ease scores for each MWE group, and the word ease values for the constituent words used. Modal verb-verb collocations were predicted to be the most difficult, and adjective-noun collocations and compound words to be the easiest.
Table 5: Predicted MWE group ease scores for the Part 1 MWE groups.

| MWE group | Node word PoS | Node word ease | Collocate PoS | Collocate ease | MWE group ease |
|---|---|---|---|---|---|
| ModVV | V | 5 | Vm | 1 | 5 |
| AdjPrep | Adj | 4.5 | Prep | 2.5 | 11.25 |
| NPrep | N | 5 | Prep | 2.5 | 12.5 |
| PhV/VPrep | V | 5 | Prep | 2.5 | 12.5 |
| AdvAdj | Adj | 4.5 | Adv | 3.5 | 15.75 |
| VAdv | V | 5 | Adv | 3.5 | 17.5 |
| CompW | N | 5 | N/Adjᵃ | 4.5 | 22.5 |
| AdjN | N | 5 | Adj | 4.5 | 22.5 |
ᵃThe lower word ease value was used where multiple PoS were possible.
4.2 Findings
Table 6 shows the number of times MWEs from each MWE group were produced, by proficiency level. As the sub-corpora of each proficiency level differed in size, chi-square tests of proportions were run for each MWE group to check for significant differences in phrase density across all proficiency groups overall (see Table 7). Significant differences were found for modal verb-verb, adjective-preposition, adverb-adjective, verb-adverb, and adjective-noun collocations, though the effect size was sizeable (and still considered small) only for modal verb-verb collocations. No significant differences were found for noun-preposition collocations, phrasal verb/verb-preposition collocations, and compound words.
Table 6: Number of instances of each Part 1 MWE group produced, by proficiency level.

| MWE group | Low | Intermediate | High-intermediate | Advanced | Total |
|---|---|---|---|---|---|
| ModVV | 406 | 1,472 | 1,454 | 948 | 4,280 |
| AdjPrep | 138 | 550 | 566 | 274 | 1,528 |
| NPrep | 124 | 502 | 418 | 232 | 1,276 |
| PhV/VPrep | 650 | 1,944 | 1,658 | 924 | 5,176 |
| AdvAdj | 352 | 1,120 | 1,148 | 422 | 3,042 |
| VAdv | 108 | 470 | 320 | 176 | 1,074 |
| CompW | 402 | 1,408 | 1,218 | 568 | 3,596 |
| AdjN | 670 | 1,752 | 1,540 | 738 | 4,700 |
Table 7: Chi-square tests of phrase density for Part 1 MWE groups.
| MWE group | Chi-square value | p-value | Cramér’s V |
|---|---|---|---|
| ModVV | 41.15 | <0.001 | 0.06 |
| AdjPrep | 8.19 | 0.04 | 0.03 |
| NPrep | 3.33 | 0.34 | 0.02 |
| PhV/VPrep | 5.58 | 0.13 | 0.02 |
| AdvAdj | 19.31 | <0.001 | 0.04 |
| VAdv | 10.60 | 0.01 | 0.03 |
| CompW | 4.93 | 0.18 | 0.02 |
| AdjN | 20.39 | <0.001 | 0.04 |
Note: df = 3, so Cramér’s V is small at 0.06, medium at 0.17, and large at 0.29.
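As an illustration of the overall tests, the following sketch runs a chi-square test of proportions for modal verb-verb collocations, treating each sub-corpus as MWE tokens versus remaining words (counts from Tables 1 and 6). Because the exact counting scheme behind Table 7 is not fully specified here, the sketch need not reproduce those statistics precisely.

```python
# Sketch of the overall chi-square test of proportions for one MWE group
# (ModVV), comparing MWE tokens against the remaining words in each
# proficiency sub-corpus.
import math
from scipy.stats import chi2_contingency

words = [27_632, 81_415, 74_189, 39_541]  # sub-corpus sizes (Table 1)
modvv = [406, 1_472, 1_454, 948]          # ModVV instances (Table 6)

# 2 x 4 contingency table: MWE tokens vs. all other words, per group
table = [modvv, [w - m for w, m in zip(words, modvv)]]
chi2, p, df, _ = chi2_contingency(table)

# Cramér's V = sqrt(chi2 / (N * (min(rows, cols) - 1)))
n = sum(words)
v = math.sqrt(chi2 / (n * (min(len(table), len(table[0])) - 1)))
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.3g}, V = {v:.3f}")
```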
Nonetheless, pairwise comparisons were still run (via chi-square tests of proportions) for each proficiency group combination for all MWE groups, as a rough expectation of phrase density patterns had been set a priori. Figure 4 provides a visual representation of the phrase density pattern across proficiency levels for each MWE group, by displaying significant differences in phrase density between the four test-taker proficiency groups. These patterns fit well with the predicted MWE group ease scores and the hypothesized model of phrase density.

Significant differences in phrase density (proportion of all words produced) between proficiency levels for each Part 1 MWE group. Note. Overlapping boxes indicate non-significance.
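The pairwise comparisons behind Figure 4 can be sketched similarly; the example below loops over every pair of proficiency groups for modal verb-verb collocations. In practice, any alpha correction for multiple comparisons (not specified here) would also be applied.

```python
# Sketch of the pairwise chi-square tests of proportions behind Figure 4,
# again for ModVV. Each pair of proficiency groups is compared on a 2 x 2
# table of MWE tokens vs. remaining words (note that chi2_contingency
# applies Yates' correction on 2 x 2 tables by default).
from itertools import combinations
from scipy.stats import chi2_contingency

groups = ["low", "intermediate", "high-intermediate", "advanced"]
words = dict(zip(groups, [27_632, 81_415, 74_189, 39_541]))  # Table 1
modvv = dict(zip(groups, [406, 1_472, 1_454, 948]))          # Table 6

for g1, g2 in combinations(groups, 2):
    table = [[modvv[g1], words[g1] - modvv[g1]],
             [modvv[g2], words[g2] - modvv[g2]]]
    chi2, p, _, _ = chi2_contingency(table)
    print(f"{g1} vs. {g2}: chi2 = {chi2:.2f}, p = {p:.3g}")
```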
Phrase density of modal verb-verb collocations showed an increasing pattern right across the four proficiency groups, which matches the position of the dashes on the left-most model in Figure 2, indicating that this was a difficult MWE group for the test takers. This was followed by adjective-preposition collocations, which showed a mostly increasing pattern that dipped at the highest proficiency level, indicating that this group was slightly less difficult than modal verb-verb collocations. The next three MWE groups (noun-preposition collocations, phrasal verbs/verb-preposition collocations, adverb-adjective collocations) showed phrase density patterns that were mostly flattish or somewhat like an upside-down ‘U’, which roughly match the position of the dashes on the middle model in Figure 2, denoting medium difficulty of acquisition. The last three MWE groups (verb-adverb collocations, compound words, adjective-noun collocations) showed phrase density patterns that were moving towards a decrease in phrase density at the higher proficiency groups (i.e., moving toward the right-most model in Figure 2), meaning that they are the easier MWE groups to learn.
4.3 Cut Scores for Pattern Prediction
It is important to remember that predicted MWE group ease scores are solely used to determine relative difficulty levels, meaning that they are not intrinsically linked to any particular phrase density pattern. Therefore, cut scores need to be set for each individual test, for their particular target test-taker group. In the case of the KIT Speaking Test, the cut scores were set based on the four density patterns identified in Figure 4; they are set out in Table 8 and presented visually in Figure 5 in relation to the model of phrase density.
Table 8: Cut scores for predicted MWE group ease scores for each phrase density pattern.
| Range of MWE group ease scores | Phrase density pattern |
|---|---|
| <6 | Increasing only |
| 6 to <12 | (Mostly) increasing |
| 12 to <16 | Upside-down ‘U’ or flattish |
| ≥16 | (Mostly) decreasing |

Phrase density pattern across speaker proficiency for phrase groups of varying MWE group ease score ranges for the KIT speaking test.
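Applying these cut scores is then a simple lookup; below is a minimal sketch, with some Part 1 ease scores from Table 5 as a usage example.

```python
# Sketch of applying the Table 8 cut scores: map a predicted MWE group
# ease score (0-25) to its expected phrase density pattern for the KIT
# Speaking Test.
def predicted_pattern(ease: float) -> str:
    if ease < 6:
        return "increasing only"
    if ease < 12:
        return "(mostly) increasing"
    if ease < 16:
        return "upside-down 'U' or flattish"
    return "(mostly) decreasing"

# Usage with Part 1 ease scores from Table 5
for group, ease in [("ModVV", 5), ("AdjPrep", 11.25),
                    ("NPrep", 12.5), ("CompW", 22.5)]:
    print(f"{group}: {predicted_pattern(ease)}")
```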
5 Results Part 2: Application of Prediction Model
In Part 2, the cut scores set in Part 1 were checked to see if the expected phrase density patterns across proficiency levels would accurately predict observed patterns.
5.1 Predictions
Table 9 shows the predicted MWE group ease scores for the Part 2 MWE groups. These ease scores were compared to the cut scores set in Part 1 (see Table 8) to determine expected phrase density patterns. Verb-that phrases were expected to be the most difficult, with phrase density showing a somewhat increasing pattern across all proficiency groups. Phrase density of preposition-noun and verb-adjective collocations was expected to either remain flattish or show a somewhat upside-down ‘U’ shape. Verb-noun collocations were anticipated to be the easiest, with phrase density thus tending to decrease across proficiency levels.
Table 9: Predicted MWE group ease scores for the Part 2 MWE groups.

| MWE group | Node word PoS | Node word ease | Collocate PoS | Collocate ease | MWE group ease |
|---|---|---|---|---|---|
| Vthat | V | 5 | that | 1.5 | 7.5 |
| PrepN | N | 5 | Prep | 2.5 | 12.5 |
| VAdj | light/lexical Vᵃ | 3.5 | Adj | 4.5 | 15.75 |
| VN | V | 5 | N | 5 | 25 |
ᵃThe lower word ease value was used where multiple PoS were possible.
5.2 Findings
Table 10 shows the number of times MWEs from each MWE group were produced, by proficiency level. Chi-square tests of proportions (see Table 11) showed that preposition-noun and verb-noun collocations significantly differed in phrase density across proficiency groups overall, while verb-that phrases and verb-adjective collocations did not. Note that the MWE groups selected for Part 2 were, overall, less frequent than those selected for Part 1, hence the small effect sizes found; differences in phrase density were also less likely to be significant.
Table 10: Number of instances of each Part 2 MWE group produced, by proficiency level.

| MWE group | Low | Intermediate | High-intermediate | Advanced | Total |
|---|---|---|---|---|---|
| Vthat | 61 | 208 | 187 | 127 | 583 |
| PrepN | 22 | 112 | 113 | 60 | 307 |
| VAdj | 18 | 61 | 74 | 37 | 190 |
| VN | 348 | 888 | 808 | 410 | 2,454 |
Table 11: Chi-square tests of phrase density for Part 2 MWE groups.
| MWE group | Chi-square value | p-value | Cramér’s V |
|---|---|---|---|
| Vthat | 7.48 | 0.06 | 0.01 |
| PrepN | 9.93 | 0.02 | 0.01 |
| VAdj | 4.68 | 0.20 | <0.01 |
| VN | 8.43 | 0.04 | 0.01 |
Note: df = 3, so Cramér’s V is small at 0.06, medium at 0.17, and large at 0.29.
Pairwise comparisons were run (via chi-square tests of proportions) for the four MWE groups, to check whether significant differences in phrase density between the four proficiency groups would reveal the expected phrase density patterns (see Figure 6). Overall, the phrase density pattern predictions reflected observed patterns well, with verb-that phrases showing a roughly increasing pattern across proficiency levels, preposition-noun and verb-adjective collocations following a flattish trend, and verb-noun collocations showing a somewhat decreasing pattern. This indicates that the model predicting MWE group ease and the model of phrase density worked well together to make reasonably accurate predictions of phrase density patterns for the KIT Speaking Test. Therefore, the answer to the research question (Can the proposed models accurately predict the phrase density pattern across proficiency levels of the KIT Speaking Test?) is yes.

Significant differences in phrase density (proportion of all words produced) between proficiency levels for each Part 2 MWE group. Note. Overlapping boxes indicate non-significance.
6 Discussion and Implications
As there is currently no model that explains or attempts to explain differences in MWE use across speaker proficiency levels, this paper has proposed a model of phrase density that accounts for the non-linear acquisition of MWEs. It then goes a step further to propose a model predicting the relative difficulty of acquisition of different MWE groups. Taken together, the models explain differences in phrase density patterns for MWE groups whose constituent words differ in their part of speech.
In theory, the model of phrase density is widely applicable across contexts. This means that it can be applied to second language acquisition research on MWEs, to explain changes in MWE use over time and across proficiency levels. It could also account for differences in research findings where there are differences in the target language, target MWE group(s), test-taker L1, and/or test-taker proficiency levels, as exemplified in Figure 3 earlier for three recent longitudinal corpus-based studies with seemingly contradictory findings. Further research is required to further refine the shape of the model (which is currently not to scale), and to check for any differences between spoken and written production of MWEs and whether the model is applicable to L1 learners. It is important to remember that when the phrase density of learners falls toward the right-hand side of the model (i.e., high proficiency level), this does not automatically equal full acquisition of individual MWEs in the target language, only acquisition of the general pattern and use of the MWE group in question. Language users can still learn and produce more words and more MWEs that are less frequent but strongly cohere.
The model predicting relative difficulty of MWE groups can be adapted to any spoken test for any target language, for their target test-taker groups (accounting for test-taker L1) and target proficiency ranges, to more accurately pinpoint where test takers fall on the model of phrase density. We explored the MWE production of the Japanese L1 test takers of the KIT Speaking Test (target language English) in this paper. In Part 1 of the study, the hypothesized models were tested on eight MWE groups and cut scores set for MWE group ease scores for the KIT Speaking Test. Part 2 of the study used those cut scores to quite accurately predict phrase density patterns for four more MWE groups. Future research is needed to further refine the models for use in other contexts (e.g., other target languages, other test-taker L1s, other target proficiency ranges) and to explore the possibility of extending the model of MWE group difficulty beyond the current two-word phrases to longer MWEs, which may mean changes to the five variables or to the equations presented in this paper.
Note that task-related factors may potentially affect the phrase density produced and may need to be taken into account for certain tests. The nature of the speaking tasks was not considered in this paper, as the KIT Speaking Test covers multiple tasks eliciting different language functions, and it is a semi-direct computer-based test, so it was assumed that test takers did not need to alter their typical MWE use for their interlocutor. If one were to focus on specific speaking tasks, then the type of MWE groups likely to be elicited would need to be considered (e.g., requests would likely elicit more modal verb-verb collocations than would narratives). It is expected that task effects would not make much of a difference in terms of the pattern of MWE use across proficiency levels, as the difficulty of the MWE groups is not anticipated to be affected by task, though the clarity of the pattern may be affected, such that differences in phrase density may be larger for MWE groups that are often elicited and much smaller for MWE groups that tend not to be elicited.

Similarly, the interlocutor(s) or target reader(s) may influence phrase density, especially when they are expected to have very low proficiency and speakers/writers accommodate them by simplifying the language through (a) only producing words essential to the core meaning of the message, leading to a reduction in the number of MWEs composed of less necessary words (e.g., significantly fewer adjective-noun collocations vs. less/no difference in compound word use, as modifiers such as adjectives are often not strictly necessary); and (b) focusing on semantic transparency, leading to relatively more grammatically compositional word combinations (e.g., avoidance of semantically opaque phrasal verbs vs. modal verb-verb collocations).
A further note for clarity: while the model of phrase density explains relative differences in phrase density across proficiency/acquisition levels for the same MWE group, the model of MWE group difficulty does not similarly predict differences in phrase density across MWE groups at the same proficiency level. This is because higher MWE group difficulty does not necessarily lead to lower phrase density, for two main reasons. First, assuming all else is equal, relative difficulty may or may not lead to lower phrase density, as can be seen in Figure 7, where the three models from Figure 2 have been superimposed on top of each other to enable easy comparison of phrase density between MWE groups of three difficulty levels for the same acquisition level range of a hypothetical group of test takers. In the figure, while speakers/writers are expected to produce fewer MWEs from the difficult phrase group than from the medium-difficulty and easy phrase groups, this does not hold at the higher proficiency/acquisition end of the range, where the fewest MWEs are expected to come from the easy phrase group.

Comparisons of phrase density of a hypothetical test-taker group for different MWE group difficulty levels, assuming all else is Equal.
Second, and most importantly, it is incorrect to assume that all other variables are equal. This assumption can be seen in Figure 7 in that the shape of the model of phrase density is the same for the three MWE groups. What is more likely is that the height of the ‘peak’ differs across MWE groups, as does the speed of acquisition (i.e., faster acquisition means a more ‘squished’ model while slower acquisition means a ‘stretched’ out model), and that speed of acquisition may not be constant. This is because many variables can affect phrase density, including the effect of test task mentioned above. Notably, the five variables used to calculate MWE group difficulty have varying effects on the height of the peak of phrase density. For instance, although modal verb-verb collocations were predicted to be the most difficult, phrase density of modal verb-verb collocations was higher than that of verb-adverb collocations for all four proficiency groups. One reason for this is that despite modal verbs being less frequent than adverbs (2.0 % vs 5.7 %, and therefore with frequency ratings of 0 and 0.5, respectively), all modal verbs (when used correctly) modify verbs (position rating of 1) and therefore almost always appeared as part of modal verb-verb collocations, while adverbs were not always used to modify verbs (position rating of 0), leading to comparatively fewer verb-adverb collocations.
To conclude, the models proposed and tested in this paper not only contribute to theory on the acquisition of MWEs (amount of spoken production and relative difficulty of acquisition), but also have practical applications in the teaching and learning of MWEs and in the rating of MWE production. For the teaching and learning of MWEs, the models can help educators identify which MWE groups are likely to be difficult for their learners, by learner L1, at roughly each proficiency level. For instance, an educator may wish to devote more class time to certain MWE groups that their learners find difficult, especially if the learners have L1s that are very different from the target language. Where there is a heterogeneous group of students, language teachers can still target certain MWE groups when providing individual feedback, with better knowledge of the effect of learner L1s.
When applied to testing and assessment, the models can help identify MWE groups that raters should focus on when rating speaking performances and which MWE groups to pay less attention to, noting that these are likely to be different for each test. In the case of the KIT Speaking Test, findings showed that the MWE group showing the clearest differences across proficiency groups (and with the largest effect size) is the modal verb-verb collocation, where there was increasing phrase density as proficiency increased. On the other hand, findings showed that less overall use of adjective-noun collocations in fact corresponded to higher proficiency levels, so low(er) usage rates should not be incorrectly attributed to a lower proficiency level.
The next step in this project is to examine all MWEs in the KIT Speaking Test (i.e., extract word combinations where the constituent words are not necessarily adjacent to each other) rather than bigrams, to check that the models proposed in this paper still hold, and to further explore individual differences to determine the extent to which these general phrase density patterns are applicable to individuals, and thus the extent to which the findings are useful for raters. These general phrase density patterns should therefore be used by raters as guides rather than hard rules.
Research funding: This work was supported by Grants-in-Aid for Scientific Research, Japan Society for the Promotion of Science, under grants 19K00849 and 22K00736 (http://dx.doi.org/10.13039/501100001691).
References
Ambridge, B., E. Kidd, C. F. Rowland, and A. L. Theakston. 2015. “The Ubiquity of Frequency Effects in First Language Acquisition.” Journal of Child Language 42 (2): 239–73. https://doi.org/10.1017/s030500091400049x.Search in Google Scholar
Bensaid, M. 2015. “Arab ESL Learners and Modals.” Arab World English Journal 6 (4): 90–7. https://doi.org/10.2139/ssrn.2843929.Search in Google Scholar
Bernhardt, E. B., R. J. Rivera, and M. L. Kamil. 2004. “The Practicality and Efficiency of Web-Based Placement Testing for College-Level Language Programs.” Foreign Language Annals 37 (3): 356–65. https://doi.org/10.1111/j.1944-9720.2004.tb02694.x.Search in Google Scholar
Boers, F., and S. Webb. 2018. “Teaching and Learning Collocation in Adult Second and Foreign Language Learning.” Language Teaching 51 (1): 77–89. https://doi.org/10.1017/S0261444817000301.Search in Google Scholar
Bybee, J. 1998. “The Emergent Lexicon.” In Papers from the Thirty-Fourth Regional Meeting of the Chicago Linguistic Society, edited by M. C. Gruber, D. Higgins, K. S. Olson, and T. Wysocki, 421–35. Chicago: Chicago Linguistics Society.Search in Google Scholar
Dimova, S., X. Yan, and A. Ginther. 2020. Local Language Testing: Design, Implementation, and Development. London: Routledge.Search in Google Scholar
Durrant, P., and N. Schmitt. 2009. “To what Extent Do Native and Non-native Writers Make Use of Collocations?” International Review of Applied Linguistics in Language Teaching 47 (2): 157–77. https://doi.org/10.1515/iral.2009.007.Search in Google Scholar
Erman, B., and B. Warren. 2000. “The Idiom Principle and the Open Choice Principle.” Text-Interdisciplinary Journal for the Study of Discourse 20 (1): 29–62. https://doi.org/10.1515/text.1.2000.20.1.29.Search in Google Scholar
Gablasova, D., V. Brezina, and T. McEnery. 2017. “Collocations in Corpus‐based Language Learning Research: Identifying, Comparing, and Interpreting the Evidence.” Language Learning 67 (S1): 155–79, https://doi.org/10.1111/lang.12225.Search in Google Scholar
Garner, J., and S. Crossley. 2018. “A Latent Curve Model Approach to Studying L2 N‐gram Development.” The Modern Language Journal 102 (3): 494–511. https://doi.org/10.1111/MODL.12494.Search in Google Scholar
Granger, S., ed. 1998. Learner English on Computer. London: Routledge.Search in Google Scholar
Granger, S., and Y. Bestgen. 2014. “The Use of Collocations by Intermediate vs. Advanced Non-native Writers: A Bigram-Based Study.” International Review of Applied Linguistics in Language Teaching 52 (3): 229–52. https://doi.org/10.1515/iral-2014-0011.Search in Google Scholar
Hato, Y., K. Kanzawa, H. Mitsunaga, and S. Healy. 2018. “Developing a Computer-Based Speaking Test of English as a Lingua Franca: Preliminary Results and Remaining Challenges.” In Waseda Working Papers in ELF, Vol. 7, 87–99.Search in Google Scholar
Hato, Y., K. Kanzawa, Y. Tsubota, H. Mitsunaga, and N. Underhill. 2016. “Developing Rating Scales for a CBT Speaking Test of English as a Lingua Franca.” ETAS Journal 34 (1): 32–4.Search in Google Scholar
Hyland, K., and P. Tse. 2004. “Metadiscourse in Academic Writing: A Reappraisal.” Applied Linguistics 25 (2): 156–77. https://doi.org/10.1093/applin/25.2.156.Search in Google Scholar
Isaacs, T., P. Trofimovich, G. Yu, and B. M. Chereau. 2015. “Examining the Linguistic Aspects of Speech that Most Efficiently Discriminate between Upper Levels of the Revised IELTS Pronunciation Scale.” IELTS Research Reports Online Series 48, https://ielts.org/researchers/our-research/research-reports/examining-the-linguistic-aspects-of-speech-that-most-efficiently-discriminate-between-upper-levels-of-the-revised-ielts-pronunciation-scale (accessed August 21, 2024).Search in Google Scholar
Ishikawa, S. I. 2023. The ICNALE Guide: An Introduction to a Learner Corpus Study on Asian Learners’ L2 English. London: Routledge.Search in Google Scholar
Izumi, E., K. Uchimoto, and H. Isahara. 2004. Nihonjin 1200 Nin No Eigo Supikingu Kopasu [Corpus of English Speeches by 12,000 Japanese]. Tokyo: ALC.Search in Google Scholar
Jenkins, J. 2007. English as a Lingua Franca: Attitude and Identity. Oxford: Oxford University Press.
Koya, T. 2003. “A Study of Collocation in English and Japanese Noun-Verb Combinations.” Intercultural Communication Studies 12 (1): 125–41.
Martinez, R., and V. A. Murphy. 2011. “Effect of Frequency and Idiomaticity on Second Language Reading Comprehension.” TESOL Quarterly 45 (2): 267–90. https://doi.org/10.5054/tq.2011.247708.
Martinez, R., and N. Schmitt. 2012. “A Phrasal Expressions List.” Applied Linguistics 33 (3): 299–320. https://doi.org/10.1093/applin/ams010.
Nguyen, T. M. H., and S. Webb. 2017. “Examining Second Language Receptive Knowledge of Collocation and Factors that Affect Learning.” Language Teaching Research 21 (3): 298–320. https://doi.org/10.1177/1362168816639619.
O’Sullivan, B. 2019. “Localisation.” In English Language Proficiency Testing in Asia: A New Paradigm Bridging Global and Local Contexts, edited by L. I. Su, C. J. Weir, and J. R. Wu, xiii–xxviii. New York: Routledge.
Pawley, A., and F. H. Syder. 1983. “Two Puzzles for Linguistic Theory: Nativelike Selection and Nativelike Fluency.” In Language and Communication, edited by J. C. Richards and R. W. Schmidt, 191–226. London: Longman.
Poulsen, S. 2005. Collocations as a Language Resource: A Functional and Cognitive Study in English Phraseology (Unpublished doctoral dissertation). Odense: University of Southern Denmark.
Qian, D. D. 2002. “Investigating the Relationship between Vocabulary Knowledge and Academic Reading Performance: An Assessment Perspective.” Language Learning 52 (3): 513–36. https://doi.org/10.1111/1467-9922.00193.
Read, J. 1998. “Validating a Test to Measure Depth of Vocabulary Knowledge.” In Validation in Language Assessment, edited by A. Kunnan, 41–60. Mahwah, NJ: Lawrence Erlbaum.
Read, J. 2000. Assessing Vocabulary. Cambridge: Cambridge University Press.
Rutledge, J., and Z. Fitton. 2015. “Teaching ESL Students Adverb Position to Develop Rhetorical Emphasis.” Linguistic Portfolios 4 (1): 12.
Salsbury, T., S. A. Crossley, and D. S. McNamara. 2011. “Psycholinguistic Word Information in Second Language Oral Discourse.” Second Language Research 27 (3): 343–60. https://doi.org/10.1177/0267658310395851.
Scrivener, J. 1994. Learning Teaching. Oxford: Heinemann.
Seedhouse, P., A. Harris, R. Naeb, and E. Üstünel. 2014. “The Relationship between Speaking Features and Band Descriptors: A Mixed Methods Study.” IELTS Research Reports Online Series 2014 (2).
Seidlhofer, B. 2013. Understanding English as a Lingua Franca. Oxford: Oxford University Press.
Shin, D., and P. Nation. 2008. “Beyond Single Words: The Most Frequent Collocations in Spoken English.” ELT Journal 62 (4): 339–48. https://doi.org/10.1093/elt/ccm091.
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Siyanova-Chanturia, A. 2015. “Collocation in Beginner Learner Writing: A Longitudinal Study.” System 53: 148–60. https://doi.org/10.1016/j.system.2015.07.003.
Siyanova-Chanturia, A., and A. Pellicer-Sanchez, eds. 2018. Understanding Formulaic Language: A Second Language Acquisition Perspective. New York: Routledge.
Siyanova‐Chanturia, A., and S. Spina. 2020. “Multi‐word Expressions in Second Language Writing: A Large‐scale Longitudinal Learner Corpus Study.” Language Learning 70 (2): 420–63. https://doi.org/10.1111/lang.12383.
The KIT Speaking Test Corpus (KISTEC). n.d. KISTEC. https://kitstcorpus.jp/ (accessed December 18, 2023).
The Ministry of Education, Culture, Sports, Science and Technology (MEXT). n.d. The English Proficiency Survey to Improve English Education. [in Japanese]. https://www.mext.go.jp/a_menu/kokusai/gaikokugo/__icsFiles/afieldfile/2018/04/06/1403470_03_1.pdf (accessed August 21, 2024).
Tyler, A. E., L. Ortega, M. Uno, and H. I. Park, eds. 2018. Usage-inspired L2 Instruction: Researched Pedagogy. Amsterdam: John Benjamins Publishing Company.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.
Xiao, R., and T. McEnery. 2006. “Collocation, Semantic Prosody, and Near Synonymy: A Cross-Linguistic Perspective.” Applied Linguistics 27 (1): 103–29. https://doi.org/10.1093/applin/ami045.
Xu, J. 2018. “Measuring ‘Spoken Collocational Competence’ in Communicative Speaking Assessment.” Language Assessment Quarterly 15 (3): 255–72. https://doi.org/10.1080/15434303.2018.1482900.
© 2025 the author(s), published by De Gruyter on behalf of Shanghai International Studies University
This work is licensed under the Creative Commons Attribution 4.0 International License.