Abstract
Lexical bundles are highly frequent and functionally significant in written academic discourse. Many studies have explored lexical bundles through a disciplinary lens, but their findings are not typically incorporated into published L2 teaching-learning materials. As a result, there are a number of challenges facing teachers who want to include a focus on disciplinary lexical bundles in their academic writing instruction at tertiary level. This paper describes a method for deriving a functional list of disciplinary lexical bundles from a corpus of academic writing, and then discusses the results quantitatively and qualitatively. The findings suggest that a small number of bundles (n=47) occur across genres, levels and institutions in the academic writing of the Arts and Humanities disciplines. Finally, a series of Computer-Assisted Language Learning (CALL) resources are presented, including a tool for finding the bundles in academic texts. The functional list and resources will be of interest to those involved in the teaching-learning of academic writing in the Arts and Humanities disciplines.
1 Introduction
With the development of corpus linguistics, it has become increasingly clear that formulaic language, that is, the use of conventionalised combinations of two or more words, is pervasive in natural language use. Empirical corpus studies have demonstrated that up to 80% of language could be formulaic in nature (see Altenberg 1998; Biber et al. 1999; Erman and Warren 2000; Howarth 1998). The ubiquity of formulaic language, though, does not dovetail with traditional theories of language, in which language is decoded and composed as single words which are stitched together according to the syntactic rules of grammar (e.g., Chomsky 1965). This disjuncture between empirical corpus findings and traditional linguistic theory paved the way for new and radical theories of language. These new theories posit that language is composed of and decoded as multi-word lexical units (rather than as single words), and that grammar is the outcome of this complex lexical structure (rather than vice versa) (see Sinclair 1991; Hoey 2005). These theories reverse the traditional role of lexis and grammar, compelling linguists to elevate the importance of lexis to being at least as important as grammar.
Despite the emergence (and widespread acceptance) of these new theories, language learning was still for a long time construed as the word-by-word generation of sentences with reference to grammatical rules (Wray 2013; Meunier 2012). In fact, it was not all that long ago that practically no published L2 learning materials capitalised on the formulaic nature of language (Wood 2002). This has started to change, though, and it is now well attested that L2 learners need to be exposed to formulaic language in reading and writing for implicit learning to take place, but also that formulaic language needs to be practised extensively to facilitate explicit learning (see Cortes 2006; Jones and Haywood 2004; Ackerley 2017). To that end, a great deal of phraseology research has been dedicated to advancing the teaching-learning of formulaic language in L2 classrooms, with particular attention paid to the formulaic language of academic writing.
In its broadest sense, phraseology research covers a whole range of different word combinations, from collocations to idioms, and plays an important part in language teaching (Autelli 2021). Within this subfield of linguistic research, corpora of academic writing have been used to look at various types of formulaic language, and lists of high frequency phraseological items have been developed for pedagogical purposes. Ackermann and Chen (2013) created the Academic Collocation List containing 2,500 frequent and pedagogically useful two-word combinations that occur in writing across the academy. Simpson-Vlach and Ellis (2010) developed an Academic Formula List, which includes 207 statistically and psychologically significant three- and four-word phrases that are used in speech and writing across a broad range of academic disciplines. Miller (2020) derived a list of 545 frequent idioms from mixed disciplinary corpora of academic speech and writing. Byrd and Coxhead (2010) presented a list of 21 high frequency four-word lexical bundles occurring across academic writing. The last of these types, lexical bundles, is the focus of the present study.
Lexical bundles are formulaic sequences of three or more words that occur in a given register above a certain frequency threshold (Biber et al. 1999). Lexical bundles are pervasive in academic prose, with three- and four-word bundles, e.g., in order to, one of the, part of the, in the case of, on the other hand, covering around 20% of spoken and written academic discourse (Biber et al. 1999). These high-frequency lexical bundles also serve important discourse functions in academic prose. A number of studies have classified bundles using taxonomies that correspond to the metafunctions of systemic functional linguistics (Biber, Conrad and Cortes 2004; Hyland 2008; Byrd and Coxhead 2010; Durrant 2017). With research consistently confirming that lexical bundles are highly frequent and readily interpretable in functional terms, each study highlights that lexical bundles are a fundamentally important part of the student writers’ communicative repertoire.
It follows, then, that a major strand of lexical bundle studies has examined them through a disciplinary lens. Corpus research in general tends to show that “specific disciplines […] have different ideas about what is worth communicating and how this should be done” (Hyland 2016: 22), and lexical bundle research is no exception. Studies investigating lexical bundles from a disciplinary perspective have demonstrated that bundle usage is reflective of the distinct social and rhetorical practices of different academic communities, particularly along the hard/soft division (see Cortes 2004, 2006; Hyland 2008; Durrant 2017). This research highlights the need for a disciplinary approach to the teaching-learning of lexical bundles in academic writing. That being said, comprehensive lists of disciplinary lexical bundles that have been generated for research purposes and published in journal articles (e.g., Durrant 2017) are not widely incorporated into published classroom materials. Many materials dealing with the concerns of academic writing are, after all, designed to be as relevant and saleable to as many students as possible (Hyland 2016), meaning research that distinguishes the discoursal practices of one discipline from others can be an inconvenience to publishers. As a result, research-oriented disciplinary lists of lexical bundles (and other formulaic language, for that matter) are often not transferred to writing instruction. As Conrad (2011: 55) writes, “many teachers […] do not have the time to figure out how to incorporate new information from corpus studies into their teaching”. The aim of this paper, therefore, is not only to develop a disciplinary list of lexical bundles organised by their discourse functions, but also to offer some ready-made computer-based materials to facilitate the teaching-learning of the bundles to student writers in the Arts and Humanities (AH) disciplines.
The AH disciplines have been chosen because they are more discursive than their counterparts in the hard knowledge fields. In the ‘soft’ AH disciplines, knowledge is viewed as mediated and contested, meaning there is a certain degree of internal discord (North 2005). By contrast, the ‘hard’ disciplines are characterised by internal unity and a set of shared assumptions (North 2005). These differences are, of course, epistemological, but they have a direct impact on a variety of discourse features. For example, in the soft knowledge fields, writers have to use more hedges because knowledge is contested (Hyland 2005), and more citation to establish context and persuade their readers (Hyland 1999). These differences have practical implications for AH writers throughout their academic careers. The IELTS entry requirements of British universities, for example, are typically higher for students in the AH disciplines than for those in the hard sciences. AH writers are also afforded a maximum of 80,000 words for PhD theses in comparison with 50,000 in the hard sciences. For reasons such as these, the present study focuses specifically on developing a list of lexical bundles and associated CALL resources for students in the AH disciplines.
CALL, or Computer Assisted Language Learning, refers to the application of a computer, or, more broadly, technology, to language teaching-learning. In the present study, the computer-based resources will be developed according to the principles of Data Driven Learning (DDL), a type of CALL. DDL requires students to analyse corpus data and make generalisations (Johns 1991), i.e. it is an inductive approach whereby learners are also language researchers. DDL has been found to improve students’ abilities to draw inferences, create a heightened awareness of language patterns, and facilitate the expansion of vocabulary knowledge (O’Sullivan 2007; Gilquin and Granger 2010). While some advocates of DDL encourage students to look at first-hand corpus data using corpus management software (e.g., Lee and Swales 2006; Charles 2012, 2014), others have stated that the use of professional corpus management software in the classroom can be overwhelming for both teachers and students alike (Warren 2016; Hafner and Candlin 2007). As a result, a number of user-friendly DDL tools have been successfully created and used to aid L2 writing instruction, such as Compleat Lexical Tutor (Horst, Cobb and Nicolae 2005), SKELL (Hirata and Hirata 2018) and Collocaid (Frankenberg-Garcia et al. 2019). Each of these corpus-informed CALL resources provides an excellent example of how a data-driven approach can be taken to writing instruction. However, each of the aforementioned resources comes with a caveat in that they are generic, rather than disciplinary, in nature. In contrast, the current paper will introduce disciplinary computer-based resources for the teaching-learning of lexical bundles specifically in the Arts and Humanities disciplines.
2 Corpus and methods
2.1 Data collection and corpus compilation
The corpus was structured according to the disciplinary map of the University of Warwick’s Faculty of Arts, from where texts were collected. PhD theses were chosen as the target texts for the corpus. Convincing arguments have been made elsewhere in favour of using proficient student writing, especially PhD theses, as the model of good writing practice promoted in L2 academic writing instruction (see Adel 2006; Krishnamurthy and Kosem 2007). Theses were collected from the University of Warwick’s online research repository. Ninety theses (at c. 80,000 words per thesis) were collected, that is, 10 theses from each of the nine departments of the Faculty of Arts. Theses completed within the previous fifteen years (2006–2020 inclusive) were targeted for collection to mitigate the effects of diachronic change, which has been found to affect lexical bundle usage in research writing (Hyland and Jiang 2018). The theses were cleaned, i.e. everything before the introduction and everything after the conclusion was removed (e.g., abstract, contents, bibliography, appendices, etc.), and the Sketch Engine was used to compile the c. 8.4 million-word corpus (Table 1).
Table 1: The corpus, subcorpora and word counts
Department name | Subcorpus name | No. theses | Date range | No. words |
---|---|---|---|---|
Classics and Ancient History | ANCHIST | 10 | 2009–2017 | ∼ 940,346 |
English and Comparative Literary Studies | ENGCOMP | 10 | 2014–2016 | ∼ 1,047,412 |
Film and Television Studies | FILMTV | 10 | 2013–2017 | ∼ 936,806 |
History | HIST | 10 | 2015–2018 | ∼ 925,942 |
History of Art | ARTHIST | 10 | 2009–2017 | ∼ 985,436 |
Modern Languages and Culture | MODLANG | 10 | 2013–2016 | ∼ 933,851 |
Study of the Renaissance | RENAIS | 10 | 2001–2017[1] | ∼ 989,698 |
Theatre and Performance Studies | THEATPER | 10 | 2012–2016 | ∼ 712,691 |
Translation and Comparative Cultural Studies | TRANSCULT | 10 | 2006–2008 | ∼ 892,726 |
Total | – | 90 | 2001–2018 | 8,364,916 |
2.2 Quantitative analysis of the AH corpus to derive a list of lexical bundles
The n-gram function of the Sketch Engine was used to derive a list of four-word lexical bundles occurring with a minimum frequency of 10 per million words (PMW) in each of the nine subcorpora. To arrive at this definition of lexical bundles, a number of decisions have been made regarding length, frequency and range. In terms of length, most studies focus exclusively on four-word bundles because they perform a wider range of functions than three- and five-word bundles and provide a manageable data set (e.g., Hyland 2008; Cortes 2004). In terms of frequency, thresholds are “decided by the researcher based on what seems reasonable given the volume of data” (Byrd and Coxhead 2010: 32). In the present study, 10 occurrences PMW, as used in the seminal description of lexical bundles (Biber et al. 1999), seemed reasonable given the volume of data. In terms of range, many studies only require a bundle to recur across five different texts (out of hundreds) (e.g., Cortes 2004; Biber, Conrad and Cortes 2004), but the present study set a more conservative range threshold. Bundles must occur 10 times PMW in each of the nine subcorpora, i.e. each bundle must have a minimum frequency of 10 PMW in each and every individual subcorpus. This range threshold should guard against subject-specific and authorial idiosyncrasies, ensuring that the lexical items on the final list are useful to students across the AH disciplines. Applying these quantitative criteria yielded a list of 47 four-word bundles (Table 2).
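The selection criteria described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the Sketch Engine procedure used in the study: the function names, the input format (a dictionary of token lists) and the toy subcorpora are all assumptions made for the example.

```python
from collections import Counter

def four_grams(tokens):
    """Return all contiguous four-word sequences in a list of tokens."""
    return [" ".join(tokens[i:i + 4]) for i in range(len(tokens) - 3)]

def bundles_in_every_subcorpus(subcorpora, min_pmw=10.0):
    """Keep four-grams whose frequency per million words (PMW)
    reaches min_pmw in each and every subcorpus.

    subcorpora: dict mapping a subcorpus name to its list of tokens
    (a hypothetical input format, assumed for this sketch).
    """
    qualifying = None
    for tokens in subcorpora.values():
        counts = Counter(four_grams(tokens))
        above = {
            gram for gram, n in counts.items()
            if n / len(tokens) * 1_000_000 >= min_pmw
        }
        # Intersect so that only bundles meeting the threshold everywhere survive.
        qualifying = above if qualifying is None else qualifying & above
    return qualifying or set()

# Toy data (invented): two tiny "subcorpora" of ten tokens each.
toy = {
    "A": "at the end of the day it is clear that".split(),
    "B": "it is clear that at the end of the war".split(),
}
shared = bundles_in_every_subcorpus(toy, min_pmw=10.0)
print(sorted(shared))
```

In corpora this small every four-gram clears the PMW threshold, so the toy run only demonstrates the range criterion: a sequence survives only if it appears in both subcorpora.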
Table 2: Bundles with a minimum frequency of 10 PMW in each of the nine subcorpora
the end of the | in relation to the | to the fact that | the part of the |
at the end of | at the beginning of | the history of the | as one of the |
at the same time | in the form of | in the same way | in the face of |
on the other hand | as a result of | in terms of the | towards the end of |
in the case of | the way in which | it is important to | the development of the |
as well as the | the rest of the | the case of the | as part of a |
in the context of | on the part of | it is possible to | the relationship between the |
one of the most | the nature of the | on the basis of | can be seen as |
the fact that the | the role of the | at the time of | be seen as a |
the ways in which | the use of the | in the history of | at the centre of |
the beginning of the | the extent to which | it is clear that | a member of the |
on the one hand | the context of the | by the fact that |
2.3 Qualitative analysis of the functions of bundles in the AH corpus
The 47 bundles were classified qualitatively according to a functional taxonomy. Because PhD theses are oriented towards the research world (Swales 2004), Hyland’s functional taxonomy that “specifically reflect[s] the concerns of research writing” (Hyland 2008: 13) was chosen as the point of departure for the functional analysis (Figure 1). The Sketch Engine’s random sample function was used to generate samples of 100 concordances for each of the 47 lexical bundles, which were then examined through a functional lens (i.e. a total of 4,700 concordances, approximately one third of the total concordances, were examined). It has been argued elsewhere that random samples of 100 concordances are sufficient for the purposes of categorising lexical items in a taxonomy (Groom 2007: 101, cited in Whiteside 2016). Bundles were categorised according to their primary function in the random samples of 100 (i.e. the function they served in the majority of occurrences).[2] This process was qualitative, carried out by the researcher alone, and should not be considered absolute. As Durrant (2017: 185) points out in relation to his own categorisation, “this is not intended to be a definitive enumeration […] but rather a means of interpreting the present list”. Two modifications were made to Hyland’s taxonomy (Figure 1) for the purposes of the present study:
The names of the three main categories were simplified to (1) presentation of content, (2) organisation of the text, (3) expression of attitudes by the writer. This aims to facilitate use of the list in classroom settings with less-proficient language learners (Byrd and Coxhead 2010: 42).
The description sub-category was divided into physical description and abstract description. While some researchers use the sub-category intangible framing attributes to describe bundles indicating an abstract property of something (e.g., Biber et al. 2004; Durrant 2017), it was felt that abstract description was again more teaching-learning-friendly.

Figure 1: Hyland’s (2008) functional taxonomy, modified
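The rule used in Section 2.3 to assign each bundle a primary function (the function served in the majority of the sampled concordances) can be expressed as a short sketch. The labels and the sample below are invented for illustration; in the study itself the labelling was a manual, qualitative judgement by the researcher.

```python
from collections import Counter

def primary_function(labels):
    """Return the label given to more than half of the sampled
    concordance lines, or None if no label has a majority."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

# Hypothetical sample of 100 manually labelled concordance lines.
sample = ["framing signal"] * 70 + ["location"] * 30
print(primary_function(sample))
```

Returning None when no function reaches a majority is a design choice of this sketch; the source does not say how such cases were handled.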
3 Results and discussion
3.1 The frequencies of AH lexical bundles
The full frequency data for the 47 bundles can be found in Appendix A. The most frequent bundle is the end of the (FPMW 137.1), with the top seven bundles all occurring over 60 times per million words (see Table 3). After this, the frequencies per million words diminish at a relatively stable rate until the lowest frequency bundle, a member of the (FPMW 18.9), which occurs close to 20 times PMW. In fact, 43 out of 47 bundles occur with a minimum frequency of 20 PMW in the AH corpus. Overall, the 47 bundles occur 16,159 times for a total of 64,636 words in the 8.4 million-word corpus, making up 0.8% of all the words in the corpus. A list of just 47 lexical items which occur across AH disciplines and cover close to 1% of all the words in a corpus of AH writing could be a powerful resource for L2 writing instruction in the AH.
Table 3: Raw and normalised frequencies (PMW) of the top seven lexical bundles
Order (freq.) | Bundle | AH Corpus Frequency | AH Corpus FPMW |
---|---|---|---|
1 | the end of the | 1,147 | 137.1 |
2 | at the end of | 864 | 103.3 |
3 | at the same time | 835 | 99.8 |
4 | on the other hand | 718 | 85.8 |
5 | in the case of | 627 | 75.0 |
6 | as well as the | 571 | 68.3 |
7 | in the context of | 524 | 62.6 |
… | … | … | … |
47 | a member of the | 158 | 18.9 |
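The normalised frequencies and the coverage figure reported above follow from simple arithmetic, sketched here using the corpus size from Table 1 and the raw counts given in the text. The function names are illustrative only.

```python
CORPUS_WORDS = 8_364_916  # total word count of the AH corpus (Table 1)

def per_million(raw_freq, corpus_words):
    """Normalise a raw frequency to occurrences per million words."""
    return raw_freq / corpus_words * 1_000_000

def coverage(total_occurrences, bundle_length, corpus_words):
    """Proportion of corpus words accounted for by the bundles,
    counting each occurrence as bundle_length words."""
    return total_occurrences * bundle_length / corpus_words

top = per_million(1_147, CORPUS_WORDS)    # "the end of the"
share = coverage(16_159, 4, CORPUS_WORDS)  # all 47 four-word bundles
print(f"{top:.1f} PMW, {share:.1%} coverage")
```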
The total coverage of the 47 lexical bundles was also tested in a different corpus of similar texts to ascertain whether the methodology identified bundles specific to the AH disciplines and, if so, whether the bundles are relevant to the wider AH community (i.e. beyond PhD writers at the University of Warwick). The British Academic Written English (BAWE) corpus was used to represent a different collection of similar texts. BAWE is a 6.5-million-word collection of 3,000 student-written assignments of various genres at undergraduate and Master’s level across a range of institutions (Nesi 2008). The list of 47 bundles was searched in each of the four preloaded disciplinary subcorpora of the BAWE corpus (Arts and Humanities [AH], Social Sciences [SS], Physical Sciences [PS] and Life Sciences [LS]). As can be seen in Table 4, the frequency per million words of the 47 bundles is the highest by a substantial margin in the AH subcorpus at 2,177.7. They occur around 10% more frequently in the AH subcorpus than in the SS subcorpus, and almost 50% more frequently than they do in the PS and LS subcorpora. Furthermore, the 47 bundles occur in the BAWE AH subcorpus 4,080 times for a total of 16,320 words, making up 0.9% of all the words in that subcorpus. That is to say, the bundles provide a slightly higher coverage of the BAWE AH subcorpus than they do of the AH corpus from which they derive. These results augur well for the list. They confirm that the 47 bundles are specific to the AH disciplines. They also suggest that they could be pedagogically relevant across AH genres, levels and institutions (as represented by the BAWE corpus).
Table 4: Frequencies of the 47 lexical bundles in the disciplinary subcorpora of BAWE
BAWE Subcorpus | Number of words in subcorpus | Raw frequency of AH lexical bundles | FPMW |
---|---|---|---|
Arts and Humanities (AH) | 1,873,541 | 4,080 | 2,177.7 |
Social Sciences (SS) | 2,239,457 | 4,421 | 1,974.1 |
Physical Sciences (PS) | 1,376,045 | 2,034 | 1,478.1 |
Life Sciences (LS) | 1,479,046 | 2,177 | 1,471.9 |
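The cross-disciplinary comparison can be reproduced from the word counts and raw frequencies in Table 4. The sketch below recomputes the frequency per million words for each subcorpus and expresses the AH figure relative to the others; the variable names are illustrative.

```python
# (words in subcorpus, raw frequency of the 47 bundles), from Table 4
bawe = {
    "AH": (1_873_541, 4_080),
    "SS": (2_239_457, 4_421),
    "PS": (1_376_045, 2_034),
    "LS": (1_479_046, 2_177),
}

# Frequency per million words for each subcorpus.
fpmw = {name: raw / words * 1_000_000 for name, (words, raw) in bawe.items()}

# How much more frequent the bundles are in AH than in each other subcorpus.
ah_advantage = {
    name: fpmw["AH"] / value - 1
    for name, value in fpmw.items()
    if name != "AH"
}

for name, diff in ah_advantage.items():
    print(f"AH vs {name}: {diff:+.1%}")
```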
3.2 The functions of AH lexical bundles
3.2.1 Bundles to help writers present and discuss content
The 47 lexical bundles each served a clear primary function in the random samples of 100 extracted from the corpus. As can be seen in Table 5, the most frequent bundles in the corpus are the 20 bundles serving the primary function of presenting and discussing content (with a combined raw frequency of 7,796). These ideational bundles may be so frequent because PhD theses are a pedagogic genre in which the writer is expected to present and discuss ideas in a manner which demonstrates knowledge. After all, “the reception of the text will determine whether the writer is to be judged worthy of the award of a doctorate” (Thompson 2009: 51).
Table 5: Bundles with the primary function of presenting and discussing content
Category / sub-type | Bundles | No. of bundles | Raw freq. |
---|---|---|---|
Bundles to help writers present and discuss content | | 20 | 7,796 |
Location | the beginning of the, the end of the, at the time of, at the beginning of, at the same time, at the end of, towards the end of | | |
Procedure | the development of the, the role of the, the ways in which, the way in which, the use of the | | |
Quantification | the extent to which, a member of the, one of the most, the rest of the, as one of the, as part of the | | |
Abstract description | the history of the, the nature of the | | |
There are seven location bundles, each of which is used almost exclusively for locating events temporally (1). At the same time is used temporally, but it can be used to indicate either that two events occurred simultaneously (2) or that two processes are performed simultaneously (3). Durrant (2017: 185) too found that in the soft knowledge fields, location bundles are used predominantly in a temporal context. This reflects the nature of the soft AH disciplines, which deal with abstract constructs like time (in contrast with the hard sciences which deal with more physical constructs).
(1) However, at the time of his arrival in Moscow Maximus did not know Church Slavonic and initially had to work in cooperation with his Russian associates. (RENAIS)
(2) In order to prove this notion, Pico refers to an example of two people born on the same day, at the same time and under the same position of celestial spheres whose lives cannot be identical due to the individual causes. (RENAIS)
(3) The electrician begins to fiddle with some of the switches, at the same time turning one of the wheels very slowly. (THEATPER)
While most of the location bundles in the AH corpus are temporal, there are also examples of location bundles referring to external textual locations (4), reflecting the intertextual nature of AH research writing.
(4) Mrs. Carruthers states in the beginning of the film that she hates India but that she is willing to devote her life to Carruthers and therefore live where he is needed. (FILMTV)
The next largest sub-type is quantification bundles, of which there are six. Three of these bundles were used in an abstract way, i.e. not allowing (or requiring) precise measurement or counting (5, 6, 7). Quantification of this type reflects the abstract subject matter of the soft sciences, and is less likely to be found in the hard sciences in which there is greater emphasis on the accuracy and replicability of procedures and results. The other three quantification bundles are more precise, referring to one countable element of an entity, but still in most cases these refer to an intangible entity, e.g., needlework (8) or Japan’s economy (9), further reinforcing the focus on abstract constructs that typifies the soft knowledge fields.
(5) We defined this as the extent to which the performers were ‘immersed in the general gravity’ of the archive. (THEATPER)
(6) In Taiwan, Suzuki spent the rest of the war as a sub-lieutenant at an isolated weather observation outpost. (FILMTV)
(7) The imagery of the novel contains symbols such as the honeycomb and bees, which could be understood both as part of the Jewish-Christian cultural baggage and in the context of a pagan system of thought.
(8) However, perhaps a consideration of one of the most popular female pastimes of the period – needlework – can shed some light on this curious correlation. (ARTHIST)
(9) My texts suggest that there is a disjuncture between Japan’s possession of economic capital as one of the world’s largest and most advanced economies, and its perceived lack of the cultural capital that it assumes the British have. (ENGCOMP)
Next are the five procedural bundles for indicating how or why something is done or what something is for (10, 11). A particularly interesting example of a procedural bundle is the use of the. Durrant (2017) found many bundles containing use in the hard disciplines, and these typically referred to the use of interchangeable objects as instruments. However, in the AH corpus, there is only one bundle containing use (the use of the) and in the overwhelming majority of instances (88/100) it is used to refer to an abstract concept (12), with more than half of those instances (45/88) referring to an abstract linguistic concept, such as metaphor, phrase, term, word, verb, adjective, participle, plural, prefix, slash or language, among others (13). Unlike objects with a physical or concrete existence (e.g., potentiometer or magnet), these are abstract concepts used to discuss features of language. This again demonstrates the focus on abstract constructs in AH discourse.
(10) I believe more and more that the role of the cinema is to destroy myths, to be pessimistic. (FILMTV)
(11) This chapter studies the different types of micro-/macrocosm models that are found in texts throughout this period and the ways in which they were put forward. (ANCHIST)
(12) This alignment with Mike’s subjectivity is then intensified by the use of the dream device. (FILMTV)
(13) The use of the term to characterise free-born persons as delicate, as opposed to slaves who are inured to hardship, is found in poetry. (ANCHIST)
The final ideational sub-type is abstract description, which includes two bundles (14, 15). Note that there are no bundles serving the primary function of describing the physical properties of something. The fact that there are two bundles serving the primary function of abstract description and no bundles serving the primary function of physical description again elucidates the focus on abstract constructs in the soft AH disciplines.
(14) The nature of the inscription as well as its physical form differs, thus, greatly from the inscriptions to Asclepius Zimidrenus in Thrace. (ANCHIST)
(15) I place the fresco in the early years of the sixteenth century estimating that these Sibyls are late in the history of the genre on a basis of textual analysis. (RENAIS)
3.2.2 Bundles to help writers organise their text
There are 21 bundles serving the primary function of helping writers to organise their text (Table 6). The large number of textual bundles perhaps reflects the more discursive nature of the soft knowledge fields. Academic writing in the AH disciplines is, after all, a kind of rhetorical performance (North 2005) which requires clear and well organised arguments.
Table 6: Bundles with the primary function of organising the text
Category / sub-type | Bundles | No. of bundles | Raw freq. |
---|---|---|---|
Bundles to help writers organise their text | | 21 | 7,209 |
Transition signals | in the same way, the relationship between the, on the other hand, on the one hand, as well as the | | |
Resultative signals | as a result of | | |
Framing signals | in relation to the, in the face of, the fact that the, in the history of, by the fact that, to the fact that, on the basis of, the context of the, the part of the, in the case of, in the context of, on the part of, in terms of the, the case of the, in the form of | | |
As shown in Table 6, the largest sub-type of bundles to help writers organise their text is framing signals, of which there are 15. These are used to provide a specific instance of something (16), specify the context in which a statement applies (17), or set conditions used to interpret or explain preceding/forthcoming text (18). The high number of framing signals might be explained by the epistemology of the soft AH disciplines. Knowledge in these fields is mediated and contested, meaning writers have to work harder to persuade their reader (North 2005). In other words, they have to pay greater attention to contextualising their research and drawing careful connections, and they can do this using framing signals.
(16) Epigraphic evidence exists for the Basileia during the fourth century BC in the form of two inscriptions from Boiotia. (ANCHIST)
(17) In the context of live performance, Rebecca Schneider has most prominently argued for an understanding of performance as an act of remaining and a means of re-appearance and re-participation. (THEATPER)
(18) Evidently, in auto/biographical writing myths of self merge with myths of historical and cultural beginnings in relation to the matrix of peoples and histories that constitute the whole of the community or nation. (TRANSCULT)
The five transition signals are the second largest sub-type of text-organising bundles. They are used to establish additive (19), contrastive (20), or comparative (21) links between elements.
(19) As well as the uncertainty about the exact dating of Astley’s ownership, there is some debate over the chronology and succession, whether or not it concerns the exact same premises, and, where precisely the shop was located. (RENAIS)
(20) Disease comes about on the one hand through an excess of heat or cold; on the other hand through surfeit or lack of nutriment; its location is the blood, marrow, or brain. (ANCHIST)
(21) In the same way that contemporary pornography depends upon mass production, the commercial exchange of erotica on vases was driven by the market. (ANCHIST)
There is only one bundle, as a result of, in the resultative signals sub-category. This is not particularly surprising since both Hyland (2008) and Durrant (2017) found that cause and effect bundles were not typical of writing in the soft sciences. In the AH corpus, as a result of typically marks the effect/s caused by a social, political and/or historical event (22).
(22) These leanings were strengthened by a mission to ‘rechristianize the poorer classes’ (possibly as a result of the divorce of Church and state in 1905). (MODLANG)
Referring back to the functional taxonomy (Figure 1), there are no topic bundles, nor are there any bundles serving the primary function of structuring signal. The topic bundles were likely filtered out by the conservative range threshold, but the absence of structuring signals is somewhat unexpected in a corpus comprising whole PhD theses.[3] In these extended texts it is “all the more important for the writer to continue to orient the reader throughout the thesis as to how the current subject matter relates to the overall thesis” (Bunton 1999: 41), and one way to do this is using structuring signals. Perhaps this function is served in the corpus by three-word bundles that do not fit the criteria of this study, e.g., as discussed above, as noted earlier, as already mentioned.
3.2.3 Bundles to help writers express their attitudes
The smallest category is bundles to help writers express their attitudes (Table 7). Although there are only six of these interpersonal bundles, they will be of particular importance to AH writers who have to explicitly evaluate opposing arguments while actively getting behind one of them.
Table 7: Bundles with the primary function of expressing the attitudes of the writer
Category / sub-type | Bundles | No. of bundles | Raw freq. |
---|---|---|---|
Bundles to help writers express their attitudes | | 6 | 1,154 |
Stance features | can be seen as, be seen as a, at the centre of, it is possible to, it is clear that | | |
Engagement features | it is important to | | |
Five of the interpersonal bundles in the AH corpus are stance features, which have been found to be a distinctive feature of the soft knowledge fields (Hyland 2008; Durrant 2017). The five stance features in the AH corpus perform similar but subtly different functions: conveying a degree of certainty, possibility or importance. Three convey a degree of certainty and can hedge (23), boost (24) or do either, depending on the preceding modal verb (25, 26). In 86 out of 100 instances, be seen as a is preceded by a modal verb within a span of -3 (e.g., can, cannot, could, would, should, may, might, must). Then, there is one stance feature which indicates a degree of possibility (27) and another that indicates the importance of something (28). At the centre of is a particularly interesting stance feature because in Durrant’s (2017) Science and Technology corpus its primary function was as a physical location marker, which clearly exemplifies the physical/abstract contrast between the hard and soft disciplines.
For many, this can be seen as a clear advantage of the genre… (ENGCOMP)
It is clear that this particular inscription was deliberately constructed as an effective medium through which to display the character and achievements of the individual concerned. (ANCHIST)
The fusion of micro dialogue with visual satire, arguably, could be seen as a step towards the satirical prints of William Hogarth, George Townshend and the political cartoons that would develop in the eighteenth century. (HIST)
The image cannot be viewed as independent, but must be seen as a component of the vase in its entirety. (ANCHIST)
However, it is possible to see how the war may have increased the risk of IPV developing by providing other conducive psychological factors. (HIST)
These depictions place the loutrophoros at the centre of bridal preparations and more significantly, at the centre of socially determined feminine beauty. (ANCHIST)
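The collocational check reported above for be seen as a (a modal verb within a span of -3) can be illustrated with a short script. The following is a minimal sketch under stated assumptions and not the procedure used in the study: the simple regular-expression tokeniser and the function names tokenize and modal_precedes are hypothetical, and the modal list is the one given in the text.

```python
import re

# Modal verbs listed in the text; tokeniser and function names are illustrative assumptions.
MODALS = {"can", "cannot", "could", "would", "should", "may", "might", "must"}
BUNDLE = ["be", "seen", "as", "a"]

def tokenize(text):
    """Lower-case the text and split it into simple alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def modal_precedes(text, bundle=BUNDLE, span=3):
    """For each occurrence of the bundle, report whether a modal verb
    occurs within `span` tokens to its left."""
    tokens = tokenize(text)
    n = len(bundle)
    results = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == bundle:
            left_context = tokens[max(0, i - span):i]
            results.append(any(tok in MODALS for tok in left_context))
    return results

example = ("The image cannot be viewed as independent, "
           "but must be seen as a component of the vase in its entirety.")
print(modal_precedes(example))  # one occurrence, preceded by "must"
```

Applied to a random sample of concordance lines, the proportion of True values would give a figure comparable to the 86 out of 100 reported here, although the exact count depends on how the text is tokenised.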
Just one of the six bundles for the expression of writer attitudes, it is important to, is an engagement feature. It explicitly marks the presence of the reader and “acknowledges the dialogic dimension of research writing, intervening to direct the reader to some action or understanding” (Hyland 2008: 19). This is typically done using a mental verb (29, 30, 31). It is perhaps surprising that only one engagement bundle was identified, since in the AH disciplines, where knowledge is contested, it is all the more important for the writer to engage with the reader. It is possible that there is some intradisciplinary variation in engagement bundles within the AH disciplines, with it is important to being the only bundle used universally and therefore the only one meeting the range threshold.
Although the Acropolis was the religious focal point of the city, it is important to appreciate that every part of the city had a religious association. (ANCHIST)
At this point, it is important to consider what an ordinary traveller (not a scientist) says about his experience regarding Balinese words or the local language - Malay. (TRANSCULT)
It is important to note that Marx saw class not simply as a description, but as a tool by which he could analyse the development of the capitalist system. (THEATPER)
4 Pedagogical implications and applications
With reference to the quantitative and qualitative findings discussed above, this section presents a series of ready-to-use computer-based resources to ensure that this research can be directly applied to the classroom. The first of these resources is the Bundle Finder (O’Flynn 2019) (Figure 2), a powerful but user-friendly and free-to-use tool developed using H5P. This tool can search any text for all 47 lexical bundles and return their discourse functions, enabling both teachers and learners to find examples of the bundles in context. Teachers can use the Bundle Finder to develop classroom activities and materials and set learning goals, while students can use it for independent learning activities. The Bundle Finder could, for example, be used for rhetorical consciousness-raising activities, in which students identify where a writer, be it themselves or a professional, has chosen to use a certain bundle (e.g., a stance feature) and then provide possible explanations for this rhetorical choice (Hyland 2016: 24). Formulaic language items are often presented as static and decontextualised lists, but it is hoped that by enabling students to see the bundles functioning in the texts they read and write, they will consider the bundles a valid use of their learning effort. This is important because one of the pedagogical challenges of working with lexical bundles is a lack of face validity for some students (Byrd and Coxhead 2010). It is, however, important that both teachers and learners know how the list of bundles has been developed (Byrd and Coxhead 2010), which is why the Bundle Finder also includes information, in layman’s terms, regarding the corpus from which the list was derived and the criteria by which the bundles were selected for the list.

Arts and Humanities Bundle Finder (URL: https://h5p.org/node/438388)
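The core behaviour described for the Bundle Finder, searching a text for the listed bundles and returning their discourse functions, can be approximated in a few lines of Python. The following is a minimal sketch and not the H5P implementation itself: the dictionary holds only an illustrative subset of the 47 bundles with their functional labels from the list, and the names BUNDLE_FUNCTIONS and find_bundles are hypothetical.

```python
import re

# Illustrative subset of the 47 bundles with functional labels from the list.
BUNDLE_FUNCTIONS = {
    "at the end of": "Location",
    "the ways in which": "Procedure",
    "on the other hand": "Transition signal",
    "as a result of": "Resultative signal",
    "in the context of": "Framing signal",
    "it is clear that": "Stance feature",
    "it is important to": "Engagement feature",
}

def find_bundles(text):
    """Return (offset, bundle, function) for every bundle occurrence.

    Offsets refer to the normalised text (lower-cased, whitespace collapsed).
    """
    # Collapse whitespace so that line breaks inside a bundle do not hide a match.
    normalised = re.sub(r"\s+", " ", text.lower())
    hits = []
    for bundle, function in BUNDLE_FUNCTIONS.items():
        # Word boundaries prevent matches that start or end inside a longer word.
        pattern = r"\b" + re.escape(bundle) + r"\b"
        for match in re.finditer(pattern, normalised):
            hits.append((match.start(), bundle, function))
    return sorted(hits)

sample = "It is important to note that, as a result of\nthe war, prices rose."
for offset, bundle, function in find_bundles(sample):
    print(offset, bundle, function)
```

Sorting the hits by offset returns the bundles in the order in which they appear in the text, which is convenient for highlighting them in context for consciousness-raising activities.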
The second set of materials exemplifies how the findings of this study could be realised in corpus-based classroom materials. The sample materials presented below (Figure 3) were developed in H5P using the BAWE AH subcorpus and aim to demonstrate how some of the bundles could be used in DDL classroom activities. These sample materials require students to analyse the concordances not only with respect to the bundle itself, but also with respect to the dense academic language to the left and right, in order to draw inferences about function and use in context. These materials, inspired by Johns’ (2000) Kibbitzers, do not require students to analyse a corpus directly but rather to analyse a selection of corpus data presented to them by the teacher. Materials such as these will be well suited to the majority of student writers who do not have prior knowledge of corpus management software, and could be used to introduce students to key DDL concepts, e.g., concordances. The use of corpora and concordances in the classroom by students has been increasingly investigated since it was pioneered by Johns (1991) as Data-Driven Learning (DDL). Charles (2012) found students in the Arts and Humanities disciplines to be particularly receptive to DDL, and several researchers have used corpus-based instruction to improve students’ use of formulaic language in their academic writing (see Thurstun and Candlin 1998; Wu, Witten and Franken 2010; Charles 2012; Ackerley 2017). In fact, Byrd and Coxhead (2010) specifically recommend the use of corpora in the classroom to teach lexical bundles.

DDL sample materials (URL: https://h5p.org/node/727097)
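For teachers who wish to prepare similar concordance handouts from their own texts, a keyword-in-context (KWIC) display can be produced with a short script. The following is a minimal sketch, assuming plain-text input; the function name kwic and the default context width of 40 characters are illustrative choices and are not taken from the sample materials.

```python
import re

def kwic(text, bundle, width=40):
    """Return one keyword-in-context line per occurrence of `bundle`."""
    # Flatten line breaks so that each concordance line stays on a single row.
    flat = re.sub(r"\s+", " ", text).strip()
    pattern = re.compile(r"\b" + re.escape(bundle) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the bundle lines up in one column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

text = "It is clear that this matters. Yet it is clear that doubts remain."
for line in kwic(text, "it is clear that", width=20):
    print(line)
```

Aligning the bundle in a central column makes it easier for students to scan the words immediately to the left and right, which is where patterns such as preceding modal verbs or following mental verbs become visible.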
The third resource is a set of Quizlet flashcards, which can be used in a variety of games, exercises and tests (Figure 4). Quizlet and other game-based apps have been found to have a range of advantages for vocabulary learning, including a positive effect on acquisition, motivation, and autonomy (see Chien 2015; Anjaniputra and Salsabila 2018; Chen, Liu and Huang 2019; Bueno-Alastuey and Nemeth 2020). The Quizlet sets can be introduced by teachers on short pre-sessional courses and then used independently by students on a long-term basis. This is important because it can be particularly challenging for teachers to help learners improve their use of bundles during short courses of instruction (Byrd and Coxhead 2010). In fact, improving the active use of lexical bundles by learners is likely to be a much longer-term project (Cortes 2006). It is, therefore, important that learners continue their independent study of lexical bundles beyond their short courses of writing instruction and into their full courses of study in AH departments, which is something Quizlet can facilitate.

Quizlet sets (URL: https://bit.ly/3NWEwkj)
The final resource for the teaching-learning of the lexical bundles is the list itself. A printer-friendly version of the full list organised by discourse functions is included in Appendix B. Although this list is static and presents the bundles as decontextualised items, it can be a useful resource if used appropriately. The functional list will be useful as a quick reference guide for teachers to keep a record of the bundles that have been subject to attention in classroom-based teaching-learning activities, or for students to monitor their independent learning activities. Keeping accurate records is important because the bundles will need to be revisited regularly (Byrd and Coxhead 2010). It is also important that the explicit teaching of the bundles on the list is mixed with ample opportunities for implicit learning (Hulstijn 2001), which means “ensur[ing] that students are reading academic prose so that they encounter academic vocabulary” (Byrd and Coxhead 2010: 56). Empirical studies suggest that through implicit learning and explicit instruction, L2 learners can start to develop better strategies for producing lexical bundles more frequently and appropriately in their academic writing (see Cortes 2006; Jones and Haywood 2004; Ackerley 2017).
5 Limitations
It is important to acknowledge some potential limitations of the study. One potential limitation is that the corpus includes only one genre of text produced by university students at one level and one institution. Although it would have been desirable to compile a corpus of texts produced by university students at various levels across institutions, empirical evidence from the BAWE corpus presented in this paper suggests that the list has scope beyond just the genre, level and institution from which it was derived. Another potential limitation is that the qualitative analysis was based on random samples of 100 rather than on every instance of the bundles in the corpus. As such, the functional categorisation, like others, should be treated with some level of caution and seen as a guide to facilitate teaching-learning, rather than as a definitive enumeration.
6 Conclusion
This paper has developed, discussed and presented a functional list of lexical bundles and associated resources for the teaching-learning of lexical bundles in the academic writing of the AH disciplines. At every stage of the development of the list and resources, methodological decisions were guided by pedagogical considerations. The final product is a short and powerful list of lexical bundles. The 47 bundles alone cover up to an impressive 0.9% of academic writing across genres, levels and institutions in the AH disciplines. The ready-made computer-based resources facilitate the immediate incorporation of the list into academic writing programs. It is hoped that by adding further evidence to demonstrate that bundle usage is reflective of the distinct social and rhetorical practices of different academic communities, this paper will spur more researchers and teachers to develop disciplinary lists of lexical bundles (or, for that matter, other types of formulaic language), and to incorporate their findings into their curriculums. Further studies could investigate the effects of incorporating a list of disciplinary lexical bundles into the curriculum on the accurate and appropriate production of lexical bundles in student writing.
Acknowledgements
I would like to thank Dr. Sue Wharton for her continued support and guidance throughout this research project (and various other research projects).
Appendix A
Arts and Humanities lexical bundles by frequency
Bundle | ANCHIST Freq | ANCHIST PMW | ARTSHIST Freq | ARTSHIST PMW | ENGCOMP Freq | ENGCOMP PMW | FILMTV Freq | FILMTV PMW | HIST Freq | HIST PMW | MODLANG Freq | MODLANG PMW | RENAIS Freq | RENAIS PMW | THEAT Freq | THEAT PMW | TRANSCULT Freq | TRANSCULT PMW | AH CORPUS (total) Freq | AH CORPUS (total) PMW |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
the end of the | 175 | 186.1 | 143 | 136.5 | 127 | 135.6 | 116 | 125.3 | 124 | 125.8 | 132 | 141.4 | 159 | 160.7 | 64 | 89.8 | 107 | 119.9 | 1147 | 137.1 |
at the end of | 121 | 128.7 | 95 | 90.7 | 88 | 93.9 | 91 | 98.3 | 65 | 66.0 | 126 | 134.9 | 146 | 147.5 | 57 | 80.0 | 75 | 84.0 | 864 | 103.3 |
at the same time | 94 | 100.0 | 79 | 75.4 | 117 | 124.9 | 83 | 89.6 | 65 | 66.0 | 128 | 137.1 | 90 | 90.9 | 41 | 57.5 | 138 | 154.6 | 835 | 99.8 |
on the other hand | 121 | 128.7 | 81 | 77.3 | 50 | 53.4 | 57 | 61.6 | 29 | 29.4 | 111 | 118.9 | 125 | 126.3 | 35 | 49.1 | 109 | 122.1 | 718 | 85.8 |
in the case of | 98 | 104.2 | 36 | 34.4 | 70 | 74.7 | 48 | 51.8 | 79 | 80.2 | 84 | 90.0 | 63 | 63.7 | 48 | 67.4 | 101 | 113.1 | 627 | 75.0 |
as well as the | 92 | 97.8 | 81 | 77.3 | 50 | 53.4 | 55 | 59.4 | 59 | 59.9 | 63 | 67.5 | 67 | 67.7 | 44 | 61.7 | 60 | 67.2 | 571 | 68.3 |
in the context of | 77 | 81.9 | 76 | 72.6 | 54 | 57.6 | 108 | 116.6 | 39 | 39.6 | 33 | 35.3 | 51 | 51.5 | 46 | 64.5 | 40 | 44.8 | 524 | 62.6 |
one of the most | 48 | 51.0 | 63 | 60.1 | 34 | 36.3 | 37 | 40.0 | 31 | 31.5 | 68 | 72.8 | 79 | 79.8 | 72 | 101.0 | 54 | 60.5 | 486 | 58.1 |
the fact that the | 75 | 79.8 | 52 | 49.6 | 51 | 54.4 | 51 | 55.1 | 51 | 51.8 | 39 | 41.8 | 50 | 50.5 | 26 | 36.5 | 75 | 84.0 | 470 | 56.2 |
the ways in which | 28 | 29.8 | 42 | 40.1 | 57 | 60.8 | 102 | 110.2 | 35 | 35.5 | 48 | 51.4 | 46 | 46.5 | 55 | 77.2 | 32 | 35.8 | 445 | 53.2 |
the beginning of the | 70 | 74.4 | 55 | 52.5 | 38 | 40.6 | 32 | 34.6 | 27 | 27.4 | 70 | 75.0 | 82 | 82.9 | 25 | 35.1 | 38 | 42.6 | 437 | 52.2 |
on the one hand | 41 | 43.6 | 55 | 52.5 | 43 | 45.9 | 41 | 44.3 | 30 | 30.4 | 83 | 88.9 | 65 | 65.7 | 26 | 36.5 | 44 | 49.3 | 428 | 51.2 |
in relation to the | 43 | 45.7 | 49 | 46.8 | 46 | 49.1 | 92 | 99.4 | 34 | 34.5 | 37 | 39.6 | 23 | 23.2 | 59 | 82.8 | 26 | 29.1 | 409 | 48.9 |
at the beginning of | 61 | 64.9 | 46 | 43.9 | 28 | 29.9 | 34 | 36.7 | 22 | 22.3 | 79 | 84.6 | 84 | 84.9 | 24 | 33.7 | 29 | 32.5 | 407 | 48.7 |
in the form of | 46 | 48.9 | 46 | 43.9 | 49 | 52.3 | 41 | 44.3 | 59 | 59.9 | 39 | 41.8 | 28 | 28.3 | 37 | 51.9 | 42 | 47.0 | 387 | 46.3 |
as a result of | 112 | 119.1 | 13 | 12.4 | 31 | 33.1 | 18 | 19.4 | 53 | 53.8 | 35 | 37.5 | 20 | 20.2 | 46 | 64.5 | 42 | 47.0 | 370 | 44.2 |
the way in which | 33 | 35.1 | 37 | 35.3 | 57 | 60.8 | 34 | 36.7 | 59 | 59.9 | 48 | 51.4 | 23 | 23.2 | 33 | 46.3 | 19 | 21.3 | 343 | 41.0 |
the rest of the | 71 | 75.5 | 42 | 40.1 | 34 | 36.3 | 47 | 50.8 | 15 | 15.2 | 25 | 26.8 | 60 | 60.6 | 17 | 23.9 | 30 | 33.6 | 341 | 40.8 |
on the part of | 40 | 42.5 | 26 | 24.8 | 21 | 22.4 | 31 | 33.5 | 62 | 62.9 | 65 | 69.6 | 21 | 21.2 | 21 | 29.5 | 23 | 25.8 | 310 | 37.1 |
the nature of the | 144 | 153.1 | 25 | 23.9 | 11 | 11.7 | 13 | 14.0 | 26 | 26.4 | 13 | 13.9 | 27 | 27.3 | 23 | 32.3 | 28 | 31.4 | 310 | 37.1 |
the role of the | 53 | 56.4 | 18 | 17.2 | 30 | 32.0 | 30 | 32.4 | 74 | 75.1 | 24 | 25.7 | 17 | 17.2 | 12 | 16.8 | 25 | 28.0 | 283 | 33.8 |
the use of the | 65 | 69.1 | 28 | 26.7 | 14 | 14.9 | 47 | 50.8 | 22 | 22.3 | 24 | 25.7 | 20 | 20.2 | 17 | 23.9 | 44 | 49.3 | 281 | 33.6 |
the extent to which | 17 | 18.1 | 25 | 23.9 | 34 | 36.3 | 23 | 24.8 | 61 | 61.9 | 43 | 46.0 | 15 | 15.2 | 19 | 26.7 | 25 | 28.0 | 262 | 31.3 |
the context of the | 52 | 55.3 | 35 | 33.4 | 21 | 22.4 | 47 | 50.8 | 17 | 17.3 | 16 | 17.1 | 31 | 31.3 | 16 | 22.5 | 18 | 20.2 | 253 | 30.2 |
to the fact that | 59 | 62.7 | 23 | 22.0 | 27 | 28.8 | 27 | 29.2 | 42 | 42.6 | 18 | 19.3 | 14 | 14.1 | 15 | 21.0 | 27 | 30.2 | 252 | 30.1 |
the history of the | 28 | 29.8 | 23 | 22.0 | 21 | 22.4 | 22 | 23.8 | 62 | 62.9 | 14 | 15.0 | 34 | 34.4 | 8 | 11.2 | 37 | 41.4 | 249 | 29.8 |
in the same way | 46 | 48.9 | 17 | 16.2 | 44 | 47.0 | 20 | 21.6 | 17 | 17.3 | 36 | 38.6 | 14 | 14.1 | 18 | 25.3 | 24 | 26.9 | 236 | 28.2 |
in terms of the | 17 | 18.1 | 24 | 22.9 | 21 | 22.4 | 31 | 33.5 | 24 | 24.4 | 52 | 55.7 | 18 | 18.2 | 32 | 44.9 | 16 | 17.9 | 235 | 28.1 |
it is important to | 28 | 29.8 | 13 | 12.4 | 32 | 34.2 | 40 | 43.2 | 30 | 30.4 | 26 | 27.8 | 20 | 20.2 | 25 | 35.1 | 20 | 22.4 | 234 | 28.0 |
the case of the | 26 | 27.6 | 19 | 18.1 | 26 | 27.8 | 17 | 18.4 | 52 | 52.8 | 25 | 26.8 | 31 | 31.3 | 9 | 12.6 | 28 | 31.4 | 233 | 27.9 |
it is possible to | 33 | 35.1 | 20 | 19.1 | 19 | 20.3 | 17 | 18.4 | 27 | 27.4 | 28 | 30.0 | 24 | 24.2 | 31 | 43.5 | 23 | 25.8 | 222 | 26.5 |
on the basis of | 38 | 40.4 | 23 | 22.0 | 13 | 13.9 | 16 | 17.3 | 21 | 21.3 | 29 | 31.1 | 27 | 27.3 | 19 | 26.7 | 28 | 31.4 | 214 | 25.6 |
at the time of | 35 | 37.2 | 18 | 17.2 | 18 | 19.2 | 22 | 23.8 | 15 | 15.2 | 31 | 33.2 | 45 | 45.5 | 19 | 26.7 | 10 | 11.2 | 213 | 25.5 |
in the history of | 11 | 11.7 | 34 | 32.5 | 15 | 16.0 | 20 | 21.6 | 51 | 51.8 | 10 | 10.7 | 35 | 35.4 | 10 | 14.0 | 27 | 30.2 | 213 | 25.5 |
it is clear that | 28 | 29.8 | 18 | 17.2 | 29 | 31.0 | 14 | 15.1 | 33 | 33.5 | 32 | 34.3 | 22 | 22.2 | 15 | 21.0 | 19 | 21.3 | 210 | 25.1 |
by the fact that | 20 | 21.3 | 30 | 28.6 | 18 | 19.2 | 25 | 27.0 | 20 | 20.3 | 25 | 26.8 | 21 | 21.2 | 23 | 32.3 | 21 | 23.5 | 203 | 24.3 |
the part of the | 19 | 20.2 | 21 | 20.0 | 13 | 13.9 | 19 | 20.5 | 33 | 33.5 | 36 | 38.6 | 22 | 22.2 | 20 | 28.1 | 16 | 17.9 | 199 | 23.8 |
as one of the | 22 | 23.4 | 22 | 21.0 | 18 | 19.2 | 20 | 21.6 | 20 | 20.3 | 19 | 20.3 | 27 | 27.3 | 22 | 30.9 | 26 | 29.1 | 196 | 23.4 |
in the face of | 13 | 13.8 | 15 | 14.3 | 27 | 28.8 | 23 | 24.8 | 24 | 24.4 | 46 | 49.3 | 15 | 15.2 | 8 | 11.2 | 15 | 16.8 | 186 | 22.2 |
towards the end of | 34 | 36.2 | 33 | 31.5 | 20 | 21.3 | 18 | 19.4 | 20 | 20.3 | 19 | 20.3 | 21 | 21.2 | 11 | 15.4 | 9 | 10.1 | 185 | 22.1 |
the development of the | 19 | 20.2 | 33 | 31.5 | 10 | 10.7 | 13 | 14.0 | 29 | 29.4 | 19 | 20.3 | 25 | 25.3 | 20 | 28.1 | 13 | 14.6 | 181 | 21.6 |
as part of a | 20 | 21.3 | 19 | 18.1 | 11 | 11.7 | 21 | 22.7 | 34 | 34.5 | 16 | 17.1 | 14 | 14.1 | 28 | 39.3 | 10 | 11.2 | 173 | 20.7 |
the relationship between the | 17 | 18.1 | 16 | 15.3 | 21 | 22.4 | 16 | 17.3 | 40 | 40.6 | 17 | 18.2 | 14 | 14.1 | 10 | 14.0 | 20 | 22.4 | 171 | 20.4 |
can be seen as | 12 | 12.8 | 12 | 11.5 | 13 | 13.9 | 10 | 10.8 | 10 | 10.1 | 22 | 23.6 | 13 | 13.1 | 41 | 57.5 | 33 | 37.0 | 166 | 19.8 |
be seen as a | 20 | 21.3 | 14 | 13.4 | 16 | 17.1 | 14 | 15.1 | 15 | 15.2 | 17 | 18.2 | 10 | 10.1 | 36 | 50.5 | 21 | 23.5 | 163 | 19.5 |
at the centre of | 15 | 16.0 | 18 | 17.2 | 12 | 12.8 | 19 | 20.5 | 15 | 15.2 | 30 | 32.1 | 25 | 25.3 | 15 | 21.0 | 10 | 11.2 | 159 | 19.0 |
a member of the | 12 | 12.8 | 30 | 28.6 | 12 | 12.8 | 13 | 14.0 | 14 | 14.2 | 10 | 10.7 | 28 | 28.3 | 13 | 18.2 | 26 | 29.1 | 158 | 18.9 |
Total | 2349 | 2498.0 | 1743 | 1664.1 | 1611 | 1719.7 | 1735 | 1873.8 | 1786 | 1812.4 | 2010 | 2152.4 | 1911 | 1930.9 | 1311 | 1839.5 | 1703 | 1907.6 | 16159 | 1931.8 |
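The PMW columns in the table above report frequencies normalised per million words, which allows sub-corpora of different sizes to be compared. The following is a minimal sketch of that normalisation, assuming the standard formula (raw frequency divided by corpus size, multiplied by one million). The sub-corpus sizes are not given in this appendix, so the second function, whose name is hypothetical, simply backs out the approximate size implied by a Freq/PMW pair.

```python
def per_million(raw_freq, corpus_size):
    """Normalise a raw frequency to occurrences per million words (PMW)."""
    return raw_freq / corpus_size * 1_000_000

def implied_corpus_size(raw_freq, pmw):
    """Approximate corpus size implied by a (Freq, PMW) pair from the table."""
    return raw_freq / pmw * 1_000_000

# Values taken from the 'Total' row for the whole AH corpus.
total_freq, total_pmw = 16159, 1931.8
size = implied_corpus_size(total_freq, total_pmw)
print(f"Implied AH corpus size: about {size / 1_000_000:.1f} million words")
# → Implied AH corpus size: about 8.4 million words

# Round trip for one cell: 'the end of the' in ANCHIST (Freq 175, PMW 186.1).
anchist_size = implied_corpus_size(175, 186.1)
print(round(per_million(175, anchist_size), 1))
# → 186.1
```

Because the published PMW values are rounded to one decimal place, sizes recovered in this way are approximate and may differ slightly from row to row.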
Appendix B
AH lexical bundles by function (printable)
Function | Description | Bundles | Notes/examples |
---|---|---|---|
Bundles to help writers to present and discuss content | | | |
Location | Indicating time/place | the beginning of the; the end of the; at the time of; at the beginning of; at the same time; at the end of; towards the end of | |
Procedure | Indicating how or why something is done (or what something is for) | the development of the; the role of the; the ways in which; the way in which; the use of the | |
Quantification | Indicating the quantity or extent of something | the extent to which; a member of the; one of the most; the rest of the; as part of the | |
Abstract description | Indicating an abstract property of something | the history of the; the nature of the | |
Bundles to help writers organise their text | | | |
Transition signals | Establish additive, comparative, or contrastive links between elements | in the same way; the relationship between the; on the other hand; on the one hand; as well as the | |
Resultative signals | Mark inferential or causative relations between elements | as a result of | |
Framing signals | Situate statements by specifying a context or limiting conditions | in relation to the; in the face of; the fact that the; in the history of; by the fact that; to the fact that; on the basis of; the context of the; the part of the; in the case of; in the context of; on the part of; in terms of the; the case of the; in the form of | |
Bundles to help writers to express attitudes | | | |
Stance features | Convey a degree of importance, certainty or possibility | can be seen as; be seen as a; at the centre of; it is possible to; it is clear that | |
Engagement features | Address the reader directly | it is important to | |
References
Ackerley, Katherine. 2017. Effects of corpus-based instruction on phraseology in learner English. Language Learning and Technology 21(3). 195–216.
Autelli, Erica. 2021. The origins of the term “phraseology”. Yearbook of Phraseology 12. 7–32. https://doi.org/10.1515/phras-2021-0003
Ackermann, Kirsten & Yu-Hua Chen. 2013. Developing the Academic Collocation List (ACL) – A corpus-driven and expert-judged approach. Journal of English for Academic Purposes 12. 235–247. https://doi.org/10.1016/j.jeap.2013.08.002
Ädel, Annelie. 2006. The Use of Metadiscourse in Argumentative Texts by Advanced Learners and Native Speakers of English. Amsterdam: John Benjamins.
Altenberg, Bengt. 1998. On the phraseology of spoken English: The evidence of recurrent word-combinations. In Anthony P. Cowie (ed.), Phraseology: Theory, analysis, and applications, 101–122. Oxford: Clarendon Press. https://doi.org/10.1093/oso/9780198294252.003.0005
Anjaniputra, Agung Ginanjar & Vina Aini Salsabila. 2018. The merits of Quizlet for vocabulary learning at tertiary level. Indonesian EFL Journal 4(2). 1–11. https://doi.org/10.25134/ieflj.v4i2.1370
Biber, Douglas. 1988. Variation across Speech and Writing. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511621024
Biber, Douglas. 1989. A Typology of English Texts. Linguistics 27(1). 3–43. https://doi.org/10.1515/ling.1989.27.1.3
Biber, Douglas, Susan Conrad & Viviana Cortes. 2004. If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics 25. 371–405. https://doi.org/10.1093/applin/25.3.371
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. Harlow: Longman.
Bueno-Alastuey, M Camino & Katalin Nemeth. 2020. Quizlet and podcasts: effects on vocabulary acquisition. Computer Assisted Language Learning. 1–30. https://doi.org/10.1080/09588221.2020.1802601
Bunton, David. 1999. The use of higher level metatext in PhD theses. English for Specific Purposes 18. 41–56. https://doi.org/10.1016/S0889-4906(98)00022-2
Byrd, Pat & Averil Coxhead. 2010. On the other hand: Lexical bundles in academic writing and in the teaching of EAP. University of Sydney papers in TESOL 5. 31–64.
Charles, Maggie. 2012. ‘Proper vocabulary and juicy collocations’: EAP students evaluate do-it-yourself corpus-building. English for Specific Purposes 31(2). 93–102. https://doi.org/10.1016/j.esp.2011.12.003
Charles, Maggie. 2014. Getting the corpus habit: EAP students’ long-term use of personal corpora. English for Specific Purposes 35. 30–40. https://doi.org/10.1016/j.esp.2013.11.004
Chen, Yu-Hua & Paul Baker. 2010. Lexical bundles in L1 and L2 academic writing. Language learning and technology 14(2). 30–49.
Chen, Chih-Ming, Huimei Liu & Hong-Bin Huang. 2019. Effects of a mobile game-based English vocabulary learning app on learners’ perceptions and learning performance: A case study of Taiwanese EFL learners. ReCALL 31(2). 170–188. https://doi.org/10.1017/S0958344018000228
Chien, Chin-Wen. 2015. Analysis the Effectiveness of Three Online Vocabulary Flashcard Websites on L2 Learners’ Level of Lexical Knowledge. English Language Teaching 8(5). 111–121. https://doi.org/10.5539/elt.v8n5p111
Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Massachusetts, USA: MIT Press. https://doi.org/10.21236/AD0616323
Conrad, Susan. 2011. Variation in corpora and its pedagogical implications. In Vander Viana, Sonia Zyngier & Geoff Barnbrook (eds.), Perspectives on corpus linguistics, 47–62. Amsterdam: John Benjamins. https://doi.org/10.1075/scl.48.04con
Cortes, Viviana. 2004. Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes 23. 397–423. https://doi.org/10.1016/j.esp.2003.12.001
Cortes, Viviana. 2006. Teaching lexical bundles in the disciplines: An example from a writing intensive history class. Linguistics and Education 17. 391–406. https://doi.org/10.1016/j.linged.2007.02.001
Coxhead, Averil. 2000. A new academic word list. TESOL Quarterly 34(2). 213–238. https://doi.org/10.2307/3587951
Durrant, Philip. 2009. Investigating the viability of a collocation list for students of English for academic purposes. English for Specific Purposes 28(3). 157–169. https://doi.org/10.1016/j.esp.2009.02.002
Durrant, Philip. 2017. Lexical bundles and disciplinary variation in university students’ writing: Mapping the territories. Applied Linguistics 38(2). 165–193. https://doi.org/10.1093/applin/amv011
Frankenberg-Garcia, Ana, Robert Lew, Jonathan Roberts, Geraint Rees & Nirwan Sharma. 2019. Developing a writing assistant to help EAP writers with collocations in real time. ReCALL 31(1). 23–39. https://doi.org/10.1017/S0958344018000150
Gilquin, Gaëtanelle & Sylviane Granger. 2010. How can data-driven learning be used in language teaching? In Anne O’Keeffe & Michael McCarthy (eds.), The Routledge Handbook of Corpus Linguistics, 359–370. London: Routledge. https://doi.org/10.4324/9780203856949-26
Hafner, Christoph & Christopher Candlin. 2007. Corpus tools as an affordance to learning in professional legal education. Journal of English for Academic Purposes 6(4). 303–318. https://doi.org/10.1016/j.jeap.2007.09.005
Hirata, Yoko & Yoshihiro Hirata. 2018. Students’ Evaluation of SkELL: The ‘Sketch Engine for Language Learning’. Paper presented at the International Conference on Blended Learning, Osaka, July 31–2 August. https://doi.org/10.1007/978-3-319-94505-7_30
Hoey, Michael. 2005. Lexical priming: A new theory of words and language. London: Routledge.
Horst, Marlise, Tom Cobb & Ioana Nicolae. 2005. Expanding academic vocabulary with an interactive on-line database. Language learning and technology 9(2). 90–110.
Hulstijn, Jan H. 2001. Intentional and incidental second language vocabulary learning: A reappraisal of elaboration, rehearsal, and automaticity. In Peter Robinson (ed.), Cognition and Second Language Instruction, 258–286. Cambridge: CUP. https://doi.org/10.1017/CBO9781139524780.011
Hyland, Ken & Kevin Jiang. 2018. Academic lexical bundles: how are they changing? International Journal of Corpus Linguistics 23(4). 383–407. https://doi.org/10.1075/ijcl.17080.hyl
Hyland, Ken & Philip Shaw. 2016. The Routledge handbook of English for Academic Purposes. Abingdon: Routledge.
Hyland, Ken. 1999. Academic attribution: citation and the construction of disciplinary knowledge. Applied Linguistics 20(3). 341–367. https://doi.org/10.1093/applin/20.3.341
Hyland, Ken. 2005. Stance and engagement: a model of interaction in academic discourse. Discourse studies 7(2). 173–191. https://doi.org/10.1177/1461445605050365
Hyland, Ken. 2008. As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes 27(1). 4–21. https://doi.org/10.1016/j.esp.2007.06.001
Hyland, Ken. 2016. General and specific EAP. In Ken Hyland & Philip Shaw (eds.), The Routledge handbook of English for Academic Purposes, 17–43. Abingdon: Routledge. https://doi.org/10.4324/9781315657455
Johns, Tim & Philip King. 1991. Classroom Concordancing. Birmingham: University of Birmingham.
Krishnamurthy, Ramesh & Iztok Kosem. 2007. Issues in creating a corpus for EAP pedagogy and research. Journal of English for Academic Purposes 6. 356–373. https://doi.org/10.1016/j.jeap.2007.09.003
Lee, David & John Swales. 2006. A Corpus-Based EAP Course for NNS Doctoral Students: Moving from Available Specialized Corpora to Self-Compiled Corpora. International journal of corpus linguistics 11(2). 256–257.
Miller, Julia. 2020. The bottom line: Are idioms used in English academic speech and writing? Journal of English for Academic Purposes 43. 1–14. https://doi.org/10.1016/j.jeap.2019.100810
Nesi, Hilary. 2008. BAWE: an introduction to a new resource. In Ana Frankenberg-Garcia, Tawfiq Rkibi, Maria Braga da Cruz, Ricardo Carvalho, Cristina Direito & Diogo Santos-Rosa (eds.), Proceedings of the Eighth Teaching and Language Corpora Conference, 239–46. Lisbon, Portugal: ISLA.
North, Sarah. 2005. Different values, different skills? A comparison of essay writing by students from arts and science backgrounds. Studies in Higher Education 30(5). 517–533. https://doi.org/10.1080/03075070500249153
O’Flynn, James. 2019. Arts and Humanities Bundle Finder. https://h5p.org/node/438388 (accessed 15 June 2022).
O’Sullivan, Íde. 2007. Enhancing a process-oriented approach to literacy and language learning: The role of corpus consultation literacy. ReCALL 19(3). 269–286. https://doi.org/10.1017/S095834400700033X
Sinclair, John. 1991. Corpus, concordance, collocation. Oxford: Oxford University Press.
Thompson, Paul. 2009. Literature reviews in applied PhD theses: Evidence and Problems. In Ken Hyland & Giuliana Diani (eds.), Academic Evaluation: Review genres in university settings, 50–67. London: Palgrave Macmillan. https://doi.org/10.1057/9780230244290_4
Thurstun, Jennifer & Christopher Candlin. 1998. Concordancing and the teaching of the vocabulary of academic English. English for Specific Purposes 17(3). 267–280. https://doi.org/10.1016/S0889-4906(97)00013-6
Warren, Martin. 2016. Introduction to data driven learning. In Fiona Farr & Liam Murray (eds.), The Routledge Handbook of Language Learning and Technology, 337–347. London: Routledge. https://doi.org/10.4324/9781315657899
Whiteside, Karin. 2016. A corpus-driven investigation into the semantic patterning of grammatical keywords in undergraduate History and PIR (Politics and International Relations) essays. Coventry: University of Warwick PhD thesis.
Wood, David. 2002. Formulaic language in acquisition and production: Implications for teaching. TESL Canada Journal/Revue TESL du Canada 20. 1–15. https://doi.org/10.18806/tesl.v20i1.935
Wray, Alison. 2013. Formulaic language. Language Teaching 46. 316–334. https://doi.org/10.1017/S0261444813000013
Wu, Shaoqun, Ian Witten & Margaret Franken. 2010. Utilizing lexical data from a Web-derived corpus to expand productive collocation knowledge. ReCALL 22(1). 83–102. https://doi.org/10.1017/S0958344009990218
© 2023 James O’Flynn, published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.