Lexical bundles in the academic writing of the Arts and Humanities: from corpus to CALL

James O’Flynn

doi:10.1515/phras-2022-0006

Published/Copyright: 7 December 2022

From the journal Yearbook of Phraseology, Volume 13, Issue 1

Abstract

Lexical bundles are highly frequent and functionally significant in written academic discourse. Many studies have explored lexical bundles through a disciplinary lens, but their findings are not typically incorporated into published L2 teaching-learning materials. As a result, there are a number of challenges facing teachers who want to include a focus on disciplinary lexical bundles in their academic writing instruction at tertiary level. This paper describes a method for deriving a functional list of disciplinary lexical bundles from a corpus of academic writing, and then discusses the results quantitatively and qualitatively. The findings suggest that a small number of bundles (n=47) occur across genres, levels and institutions in the academic writing of the Arts and Humanities disciplines. Finally, a series of Computer-Assisted Language Learning (CALL) resources are presented, including a tool for finding the bundles in academic texts. The functional list and resources will be of interest to those involved in the teaching-learning of academic writing in the Arts and Humanities disciplines.

Keywords: Lexical bundles; academic writing; corpus linguistics; word lists; data-driven learning

1 Introduction

With the development of corpus linguistics, it has become increasingly clear that formulaic language, that is, the use of conventionalised combinations of two or more words, is pervasive in natural language use. Empirical corpus studies have demonstrated that up to 80% of language could be formulaic in nature (see Altenberg 1998; Biber et al. 1999; Erman and Warren 2000; Howarth 1998). The ubiquity of formulaic language, though, does not dovetail with traditional theories of language, in which language is decoded and composed as single words which are stitched together according to the syntactic rules of grammar (e.g., Chomsky 1965). This disjuncture between empirical corpus findings and traditional linguistic theory paved the way for new and radical theories of language. These new theories posit that language is composed of and decoded as multi-word lexical units (rather than as single words), and that grammar is the outcome of this complex lexical structure (rather than vice versa) (see Sinclair 1991; Hoey 2005). These theories reverse the traditional role of lexis and grammar, compelling linguists to elevate the importance of lexis to being at least as important as grammar.

Despite the emergence (and widespread acceptance) of these new theories, language learning was still for a long time construed as the word-by-word generation of sentences with reference to grammatical rules (Wray 2013; Meunier 2012). In fact, it was not all that long ago that practically no published L2 learning materials capitalised on the formulaic nature of language (Wood 2002). This has started to change, though, and it is now well attested that L2 learners need to be exposed to formulaic language in reading and writing for implicit learning to take place, but also that formulaic language needs to be practised extensively to facilitate explicit learning (see Cortes 2006; Jones and Haywood 2004; Ackerley 2017). To that end, a great deal of phraseology research has been dedicated to advancing the teaching-learning of formulaic language in L2 classrooms, with particular attention paid to the formulaic language of academic writing.

In its broadest sense, phraseology research covers a whole range of different word combinations, from collocations to idioms, and plays an important part in language teaching (Autelli 2021). Within this subfield of linguistic research, corpora of academic writing have been used to look at various types of formulaic language, and lists of high frequency phraseological items have been developed for pedagogical purposes. Ackermann and Chen (2013) created the Academic Collocation List containing 2,500 frequent and pedagogically useful two-word combinations that occur in writing across the academy. Simpson-Vlach and Ellis (2010) developed an Academic Formula List, which includes 207 statistically and psychologically significant three- and four-word phrases that are used in speech and writing across a broad range of academic disciplines. Miller (2020) derived a list of 545 frequent idioms from mixed disciplinary corpora of academic speech and writing. Byrd and Coxhead (2010) presented a list of 21 high frequency four-word lexical bundles occurring across academic writing. The latter, lexical bundles, is the focus of the present study.

Lexical bundles are formulaic sequences of three or more words that occur in a given register above a certain frequency threshold (Biber et al. 1999). Lexical bundles are pervasive in academic prose, with three- and four-word bundles, e.g., in order to, one of the, part of the, in the case of, on the other hand, covering around 20% of spoken and written academic discourse (Biber et al. 1999). These high-frequency lexical bundles also serve important discourse functions in academic prose. A number of studies have classified bundles using taxonomies that correspond to the metafunctions of systemic functional linguistics (Biber, Conrad and Cortes 2004; Hyland 2008; Byrd and Coxhead 2010; Durrant 2017). With research consistently confirming that lexical bundles are highly frequent and readily interpretable in functional terms, each study highlights that lexical bundles are a fundamentally important part of the student writers’ communicative repertoire.

It follows, then, that a major strand of lexical bundle studies has been examining them through a disciplinary lens. Corpus research in general tends to show that “specific disciplines […] have different ideas about what is worth communicating and how this should be done” (Hyland 2016: 22), and lexical bundle research is no exception. Studies investigating lexical bundles from a disciplinary perspective have demonstrated that bundle usage is reflective of the distinct social and rhetorical practices of different academic communities, particularly along the hard/soft division (see Cortes 2004, 2006; Hyland 2008; Durrant 2017). This research highlights the need for a disciplinary approach to the teaching-learning of lexical bundles in academic writing. That being said, comprehensive lists of disciplinary lexical bundles that have been generated for research purposes and published in journal articles (e.g., Durrant 2017) are not widely incorporated into published classroom materials. Many materials dealing with the concerns of academic writing are, after all, designed to be as relevant and saleable to as many students as possible (Hyland 2016), meaning research that distinguishes the discoursal practices of one discipline from others can be an inconvenience to publishers. As a result, research-oriented disciplinary lists of lexical bundles (and other formulaic language, for that matter) are often not transferred to writing instruction. As Conrad (2011: 55) writes, “many teachers […] do not have the time to figure out how to incorporate new information from corpus studies into their teaching”. The aim of this paper, therefore, is not only to develop a disciplinary list of lexical bundles organised by their discourse functions, but also to offer some ready-made computer-based materials to facilitate the teaching-learning of the bundles to student writers in the Arts and Humanities (AH) disciplines.

The AH disciplines have been chosen because they are more discursive than their counterparts in the hard knowledge fields. In the ‘soft’ AH disciplines, knowledge is viewed as mediated and contested, meaning there is a certain degree of internal discord (North 2005). By contrast, the ‘hard’ disciplines are characterised by internal unity and a set of shared assumptions (North 2005). These differences are, of course, epistemological, but they have a direct impact on a variety of discourse features. For example, in the soft knowledge fields, writers have to use more hedges because knowledge is contested (Hyland 2005), and more citation to establish context and persuade their readers (Hyland 1999). These differences have practical implications for AH writers throughout their academic careers. The IELTS entry requirements of British universities, for example, are typically higher for students in the AH disciplines than for those in the hard sciences. AH writers are also afforded a maximum of 80,000 words for PhD theses in comparison with 50,000 in the hard sciences. For reasons such as these, the present study focuses specifically on developing a list of lexical bundles and associated CALL resources for students in the AH disciplines.

CALL, or Computer Assisted Language Learning, refers to the application of a computer, or, more broadly, technology, to language teaching-learning. In the present study, the computer-based resources will be developed according to the principles of Data Driven Learning (DDL), a type of CALL. DDL requires students to analyse corpus data and make generalisations (Johns 1991), i.e. it is an inductive approach whereby learners are also language researchers. DDL has been found to improve students’ abilities to draw inferences, create a heightened awareness of language patterns, and facilitate the expansion of vocabulary knowledge (O’Sullivan 2007; Gilquin and Granger 2010). While some advocates of DDL encourage students to look at first-hand corpus data using corpus management software (e.g., Lee and Swales 2006; Charles 2012, 2014), others have stated that the use of professional corpus management software in the classroom can be overwhelming for both teachers and students alike (Warren 2016; Hafner and Candlin 2007). As a result, a number of user-friendly DDL tools have been successfully created and used to aid L2 writing instruction, such as Compleat Lexical Tutor (Horst, Cobb and Nicolae 2005), SKELL (Hirata and Hirata 2018) and Collocaid (Frankenberg-Garcia et al. 2019). Each of these corpus-informed CALL resources provides an excellent example of how a data-driven approach can be taken to writing instruction. However, each of the aforementioned resources comes with a caveat in that they are generic, rather than disciplinary, in nature. In contrast, the current paper will introduce disciplinary computer-based resources for the teaching-learning of lexical bundles specifically in the Arts and Humanities disciplines.

2 Corpus and methods

2.1 Data collection and corpus compilation

The corpus was structured according to the disciplinary map of the University of Warwick’s Faculty of Arts, from where texts were collected. PhD theses were chosen as the target texts for the corpus. Convincing arguments have been made elsewhere in favour of using proficient student writing, especially PhD theses, as the model of good writing practice promoted in L2 academic writing instruction (see Adel 2006; Krishnamurthy and Kosem 2007). Theses were collected from the University of Warwick’s online research repository. Ninety theses (at c. 80,000 words per thesis) were collected, that is, 10 theses from each of the nine departments of the Faculty of Arts. Theses completed within the previous fifteen years (2006–2020 inclusive) were targeted for collection to guard against diachronic change, which has been found to affect lexical bundle usage in research writing (Hyland and Jiang 2018). The theses were cleaned, i.e. everything before the introduction and everything after the conclusion was removed (e.g., abstract, contents, bibliography, appendices, etc.), and the Sketch Engine was used to compile the c. 8.4 million-word corpus (Table 1).

Tab. 1:

The corpus, subcorpora and word counts

Department name	Subcorpus name	No. theses	Date range	No. words
Classics and Ancient History	ANCHIST	10	2009–2017	∼ 940,346
English and Comparative Literary Studies	ENGCOMP	10	2014–2016	∼ 1,047,412
Film and Television Studies	FILMTV	10	2013–2017	∼ 936,806
History	HIST	10	2015–2018	∼ 925,942
History of Art	ARTHIST	10	2009–2017	∼ 985,436
Modern Languages and Culture	MODLANG	10	2013–2016	∼ 933,851
Study of the Renaissance	RENAIS	10	2001–2017^[1]	∼ 989,698
Theatre and Performance Studies	THEATPER	10	2012–2016	∼ 712,691
Translation and Comparative Cultural Studies	TRANSCULT	10	2006–2008	∼ 892,726

Total	–	90	2001–2018	8,364,916

2.2 Quantitative analysis of the AH corpus to derive a list of lexical bundles

The n-gram function of the Sketch Engine was used to derive a list of four-word lexical bundles occurring with a minimum frequency of 10 per million words (PMW) in each of the nine subcorpora. To arrive at this definition of lexical bundles, a number of decisions have been made regarding length, frequency and range. In terms of length, most studies focus exclusively on four-word bundles because they perform a wider range of functions than three- and five-word bundles and provide a manageable data set (e.g., Hyland 2008; Cortes 2004). In terms of frequency, thresholds are “decided by the researcher based on what seems reasonable given the volume of data” (Byrd and Coxhead 2010: 32). In the present study, 10 occurrences PMW, as used in the seminal description of lexical bundles (Biber et al. 1999), seemed reasonable given the volume of data. In terms of range, many studies only require a bundle to recur across five different texts (out of hundreds) (e.g., Cortes 2004; Biber, Conrad and Cortes 2004), but the present study set a more conservative range threshold. Bundles must occur 10 times PMW in each of the nine subcorpora, i.e. each bundle must have a minimum frequency of 10 PMW in each and every individual subcorpus. This range threshold should guard against subject-specific and authorial idiosyncrasies, ensuring that the lexical items on the final list are useful to students across the AH disciplines. Applying these quantitative criteria yielded a list of 47 four-word bundles (Table 2).

Tab. 2:

Bundles with a minimum frequency of 10 PMW in each of the nine subcorpora

the end of the	in relation to the	to the fact that	the part of the
at the end of	at the beginning of	the history of the	as one of the
at the same time	in the form of	in the same way	in the face of
on the other hand	as a result of	in terms of the	towards the end of
in the case of	the way in which	it is important to	the development of the
as well as the	the rest of the	the case of the	as part of a
in the context of	on the part of	it is possible to	the relationship between the
one of the most	the nature of the	on the basis of	can be seen as
the fact that the	the role of the	at the time of	be seen as a
the ways in which	the use of the	in the history of	at the centre of
the beginning of the	the extent to which	it is clear that	a member of the
on the one hand	the context of the	by the fact that
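
The selection criteria described in this section (four-word sequences reaching 10 PMW in each and every subcorpus) can be illustrated with a short Python sketch. It is an illustration only: the study itself used the n-gram function of the Sketch Engine, and the tokeniser, the function names and the way texts are passed in are assumptions introduced here. In particular, this sketch lets n-grams run across sentence boundaries, which may differ from the behaviour of the original tool.

```python
import re
from collections import Counter


def four_grams(text):
    """Count four-word sequences in a text and return (counts, token total).

    Uses a naive lowercase word tokeniser (an assumption, not the
    tokenisation applied by the Sketch Engine).
    """
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())
    grams = Counter(" ".join(tokens[i:i + 4]) for i in range(len(tokens) - 3))
    return grams, len(tokens)


def shared_bundles(subcorpora, min_pmw=10.0):
    """Return the four-grams that reach min_pmw in every subcorpus.

    subcorpora: dict mapping a (hypothetical) subcorpus name to its text.
    """
    surviving = None
    for text in subcorpora.values():
        counts, n_tokens = four_grams(text)
        if n_tokens == 0:
            return set()
        qualifying = {
            gram for gram, freq in counts.items()
            if freq / n_tokens * 1_000_000 >= min_pmw
        }
        # Range criterion: keep only bundles that qualify in all subcorpora so far.
        surviving = qualifying if surviving is None else surviving & qualifying
    return surviving or set()
```

Intersecting the qualifying sets subcorpus by subcorpus mirrors the conservative range threshold: a sequence that is very frequent in eight departments but falls below 10 PMW in the ninth is discarded. With very small test texts a single occurrence already exceeds 10 PMW, so a much higher threshold is needed when trying the sketch on toy data.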

2.3 Qualitative analysis of the functions of bundles in the AH corpus

The 47 bundles were classified qualitatively according to a functional taxonomy. Because PhD theses are oriented towards the research world (Swales 2004), Hyland’s functional taxonomy that “specifically reflect[s] the concerns of research writing” (Hyland 2008: 13) was chosen as the point of departure for the functional analysis (Figure 1). The Sketch Engine’s random sample function was used to generate samples of 100 concordances for each of the 47 lexical bundles, which were then examined through a functional lens (i.e. a total of 4,700 concordances, approximately one third of the total concordances, were examined). It has been argued elsewhere that random samples of 100 concordances are sufficient for the purposes of categorising lexical items in a taxonomy (Groom 2007: 101, cited in Whiteside 2016). Bundles were categorised according to their primary function in the random samples of 100 (i.e. the function they served in the majority of occurrences).^[2] This process was qualitative, carried out by the researcher alone, and should not be considered absolute. As Durrant (2017: 185) points out in relation to his own categorisation, “this is not intended to be a definitive enumeration […] but rather a means of interpreting the present list”. Two modifications were made to Hyland’s taxonomy (Figure 1) for the purposes of the present study:

The names of the three main categories were simplified to (1) presentation of content, (2) organisation of the text, (3) expression of attitudes by the writer. This aims to facilitate use of the list in classroom settings with less-proficient language learners (Byrd and Coxhead 2010: 42).
The description sub-category was divided into physical description and abstract description. While some researchers use the sub-category intangible framing attributes to describe bundles indicating an abstract property of something (e.g., Biber et al. 2004; Durrant 2017), it was felt that abstract description was again more teaching-learning-friendly.

Fig. 1:

Hyland’s (2008) functional taxonomy modified
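
The sampling and categorisation procedure described above can be sketched as follows. The functional labelling itself was carried out manually by the researcher; the code only shows how a reproducible random sample of 100 concordance lines might be drawn and how a primary function could be read off a set of manual labels. The function names, the fixed seed and the label strings are hypothetical.

```python
import random
from collections import Counter


def sample_concordances(lines, k=100, seed=0):
    """Draw up to k concordance lines at random, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(lines, min(k, len(lines)))


def primary_function(labels):
    """Return the most frequent manual label and its share of the sample."""
    label, freq = Counter(labels).most_common(1)[0]
    return label, freq / len(labels)


# Hypothetical manual labels for one bundle's sample of 100 concordances.
labels = ["framing signal"] * 60 + ["location"] * 40
print(primary_function(labels))
```

Reporting the share alongside the label makes it visible when a bundle's primary function rests on a narrow majority. In the case of a tie, `most_common` returns the label encountered first, so ties would need to be resolved by hand.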

3 Results and discussion

3.1 The frequencies of AH lexical bundles

The full frequency data for the 47 bundles can be found in Appendix A. The most frequent bundle is the end of the (FPMW 137.1), with the top seven bundles all occurring over 60 times per million words (see Table 3). After this, the frequencies per million words diminish at a relatively stable rate until the lowest frequency bundle, a member of the (FPMW 18.9), which occurs close to 20 times PMW. In fact, 43 out of 47 bundles occur with a minimum frequency of 20 PMW in the AH corpus. Overall, the 47 bundles occur 16,159 times for a total of 64,636 words in the 8.4 million-word corpus, making up 0.8% of all the words in the corpus. A list of just 47 lexical items which occur across AH disciplines and cover close to 1% of all the words in a corpus of AH writing could be a powerful resource for L2 writing instruction in the AH.

Tab. 3:

Raw and normalised frequencies (PMW) of the top seven lexical bundles

Order (freq.)	Bundle	AH Corpus Frequency	AH Corpus FPMW
1	the end of the	1,147	137.1
2	at the end of	864	103.3
3	at the same time	835	99.8
4	on the other hand	718	85.8
5	in the case of	627	75.0
6	as well as the	571	68.3
7	in the context of	524	62.6
…	…	…	…
47	a member of the	158	18.9
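
The normalised frequencies in Table 3 and the coverage figure reported above follow from simple arithmetic, reproduced here as a small Python sketch. The corpus size is the total from Table 1; the function names are introduced for illustration. Like the calculation in the text, the coverage function counts four words per occurrence and does not deduplicate overlapping bundles (e.g., at the end of and the end of the).

```python
CORPUS_WORDS = 8_364_916  # total word count of the AH corpus (Table 1)


def per_million(raw_freq, corpus_words=CORPUS_WORDS):
    """Normalise a raw frequency to occurrences per million words (FPMW)."""
    return raw_freq / corpus_words * 1_000_000


def coverage(total_occurrences, bundle_length=4, corpus_words=CORPUS_WORDS):
    """Proportion of corpus words accounted for by the bundle occurrences."""
    return total_occurrences * bundle_length / corpus_words


print(round(per_million(1_147), 1))       # the end of the: 137.1
print(round(coverage(16_159) * 100, 1))   # all 47 bundles: 0.8 (per cent)
```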

The total coverage of the 47 lexical bundles was also tested in a different corpus of similar texts to ascertain whether the methodology identified bundles specific to the AH disciplines and, if so, whether the bundles are relevant to the wider AH community (i.e. beyond PhD writers at the University of Warwick). The British Academic Written English (BAWE) corpus was used to represent a different collection of similar texts. BAWE is a 6.5-million-word collection of 3,000 student written assignments of various genres at undergraduate and Master’s level across a range of institutions (Nesi 2008). The list of 47 bundles was searched in each of the four preloaded disciplinary subcorpora of the BAWE corpus (Arts and Humanities [AH], Social Sciences [SS], Physical Sciences [PS] and Life Sciences [LS]). As can be seen in Table 4, the frequency per million words of the 47 bundles is the highest by a substantial margin in the AH subcorpus at 2,177.7. They occur around 10% more frequently in the AH than the SS, and more than 25% more frequently in the AH than they do in the PS and LS. Furthermore, the 47 bundles occur in the BAWE AH subcorpus 4,080 times for a total of 16,320 words, making up 0.9% of all the words in that subcorpus. That is to say, the bundles provide a slightly higher coverage of the BAWE AH subcorpus than they do of the AH corpus from which they derive. These results augur well for the list. They confirm that the 47 bundles are specific to the AH disciplines. They also suggest that they could be pedagogically relevant across AH genres, levels and institutions (as represented by the BAWE corpus).

Tab. 4:

Frequencies of the 47 lexical bundles in the disciplinary subcorpora of BAWE

BAWE Subcorpus	Number of words in subcorpus	Raw frequency of AH lexical bundles	FPMW
Arts and Humanities (AH)	1,873,541	4,080	2,177.7
Social Sciences (SS)	2,239,457	4,421	1,974.1
Physical Sciences (PS)	1,376,045	2,034	1,478.1
Life Sciences (LS)	1,479,046	2,177	1,471.9
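
The cross-disciplinary comparison can be recomputed directly from the word counts and raw frequencies in Table 4. The sketch below is illustrative; the dictionary layout and function names are assumptions, while the figures are those given in the table.

```python
# (words in subcorpus, raw frequency of the 47 AH bundles), from Table 4
BAWE = {
    "AH": (1_873_541, 4_080),
    "SS": (2_239_457, 4_421),
    "PS": (1_376_045, 2_034),
    "LS": (1_479_046, 2_177),
}


def fpmw(words, raw):
    """Frequency per million words."""
    return raw / words * 1_000_000


rates = {name: fpmw(words, raw) for name, (words, raw) in BAWE.items()}


def percent_higher(a, b):
    """How much more frequent (in per cent) the bundles are in a than in b."""
    return (rates[a] / rates[b] - 1) * 100


for other in ("SS", "PS", "LS"):
    print(other, round(percent_higher("AH", other), 1))
```

Run on the Table 4 figures, the gap between AH and SS comes out at roughly 10%, while the gaps between AH and the two science subcorpora are comfortably above the 25% mentioned in the text.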

3.2 The functions of AH lexical bundles

3.2.1 Bundles to help writers present and discuss content

The 47 lexical bundles each served a clear primary function in the random samples of 100 extracted from the corpus. As can be seen in Table 5, the most frequent bundles in the corpus are the 20 bundles serving the primary function of presenting and discussing content (with a raw frequency of 7,796). These ideational bundles may be used so often because PhD theses are a pedagogic genre in which the writer is expected to present and discuss ideas in a manner which demonstrates knowledge. After all, “the reception of the text will determine whether the writer is to be judged worthy of the award of a doctorate” (Thompson 2009: 51).

Tab. 5:

Bundles with the primary function of presenting and discussing content

	No. of bundles	Raw freq.
Bundles to help writers present and discuss content	20	7,796
Location	the beginning of the, the end of the, at the time of, at the beginning of, at the same time, at the end of, towards the end of
Procedure	the development of the, the role of the, the ways in which, the way in which, the use of the
Quantification	the extent to which, a member of the, one of the most, the rest of the, as one of the, as part of the
Abstract description	the history of the, the nature of the

There are seven location bundles, each of which is used almost exclusively for locating events temporally (1). At the same time is used temporally, but it can be used to indicate either that two events occurred simultaneously (2) or that two processes are performed simultaneously (3). Durrant (2017: 185) too found that in the soft knowledge fields, location bundles are used predominantly in a temporal context. This reflects the nature of the soft AH disciplines, which deal with abstract constructs like time (in contrast with the hard sciences which deal with more physical constructs).

(1) However, at the time of his arrival in Moscow Maximus did not know Church Slavonic and initially had to work in cooperation with his Russian associates. (RENAIS)
(2) In order to prove this notion, Pico refers to an example of two people born on the same day, at the same time and under the same position of celestial spheres whose lives cannot be identical due to the individual causes. (RENAIS)
(3) The electrician begins to fiddle with some of the switches, at the same time turning one of the wheels very slowly. (THEATPER)

While most of the location bundles in the AH corpus are temporal, there are also examples of location bundles referring to external textual locations (4), reflecting the intertextual nature of AH research writing.

(4) Mrs. Carruthers states in the beginning of the film that she hates India but that she is willing to devote her life to Carruthers and therefore live where he is needed. (FILMTV)

The next largest sub-type is quantification bundles, of which there are six. Three of these bundles were used in an abstract way, i.e. not allowing (or requiring) precise measurement or counting (5, 6, 7). Quantification of this type reflects the abstract subject matter of the soft sciences, and is less likely to be found in the hard sciences in which there is greater emphasis on the accuracy and replicability of procedures and results. The other three quantification bundles are more precise, referring to one countable element of an entity, but still in most cases these refer to an intangible entity, e.g., needlework (8) or Japan’s economy (9), further reinforcing the focus on abstract constructs that typifies the soft knowledge fields.

(5) We defined this as the extent to which the performers were ‘immersed in the general gravity’ of the archive. (THEATPER)
(6) In Taiwan, Suzuki spent the rest of the war as a sub-lieutenant at an isolated weather observation outpost. (FILMTV)
(7) The imagery of the novel contains symbols such as the honeycomb and bees, which could be understood both as part of the Jewish-Christian cultural baggage and in the context of a pagan system of thought.
(8) However, perhaps a consideration of one of the most popular female pastimes of the period – needlework – can shed some light on this curious correlation. (ARTHIST)
(9) My texts suggest that there is a disjuncture between Japan’s possession of economic capital as one of the world’s largest and most advanced economies, and its perceived lack of the cultural capital that it assumes the British have. (ENGCOMP)

Next are the five procedural bundles for indicating how or why something is done or what something is for (10, 11). A particularly interesting example of a procedural bundle is the use of the. Durrant (2017) found many bundles containing use in the hard disciplines, and these typically referred to the use of interchangeable objects as instruments. However, in the AH corpus, there is only one bundle containing use (the use of the) and in the overwhelming majority of instances (88/100) it is used to refer to an abstract concept (12), with more than half of those instances (45/88) referring to an abstract linguistic concept, such as metaphor, phrase, term, word, verb, adjective, participle, plural, prefix, slash or language among others (13). Unlike objects with a physical or concrete existence (e.g., potentiometer or magnet), these are abstract concepts used to discuss features of language. This again demonstrates the focus on abstract constructs in AH discourse.

(10) I believe more and more that the role of the cinema is to destroy myths, to be pessimistic. (FILMTV)
(11) This chapter studies the different types of micro-/macrocosm models that are found in texts throughout this period and the ways in which they were put forward. (ANCHIST)
(12) This alignment with Mike’s subjectivity is then intensified by the use of the dream device. (FILMTV)
(13) The use of the term to characterise free-born persons as delicate, as opposed to slaves who are inured to hardship, is found in poetry. (ANCHIST)

The final ideational sub-type is abstract description, which includes two bundles (14, 15). Note that there are no bundles serving the primary function of describing the physical properties of something. The fact that there are two bundles serving the primary function of abstract description and no bundles serving the primary function of physical description again elucidates the focus on abstract constructs in the soft AH disciplines.

(14) The nature of the inscription as well as its physical form differs, thus, greatly from the inscriptions to Asclepius Zimidrenus in Thrace. (ANCHIST)
(15) I place the fresco in the early years of the sixteenth century estimating that these Sibyls are late in the history of the genre on a basis of textual analysis. (RENAIS)

3.2.2 Bundles to help writers organise their text

There are 21 bundles serving the primary function of helping writers to organise their text (Table 6). The large number of textual bundles perhaps reflects the more discursive nature of the soft knowledge fields. Academic writing in the AH disciplines is, after all, a kind of rhetorical performance (North 2005) which requires clear and well organised arguments.

Tab. 6:

Bundles with the primary function of organising the text

	No. of bundles	Raw freq.
Bundles to help writers organise their text	21	7,209
Transition signals	in the same way, the relationship between the, on the other hand, on the one hand, as well as the
Resultative signals	as a result of
Framing signals	in relation to the, in the face of, the fact that the, in the history of, by the fact that, to the fact that, on the basis of, the context of the, the part of the, in the case of, in the context of, on the part of, in terms of the, the case of the, in the form of

As shown in Table 6, the largest sub-type of bundles to help writers organise their text is framing signals, of which there are 15. These are used to provide a specific instance of something (16), specify the context in which a statement applies (17), or set conditions used to interpret or explain preceding/forthcoming text (18). The high number of framing signals might be explained by the epistemology of the soft AH disciplines. Knowledge in these fields is mediated and contested, meaning writers have to work harder to persuade their reader (North 2005). In other words, they have to pay greater attention to contextualising their research and drawing careful connections, and they can do this using framing signals.

(16) Epigraphic evidence exists for the Basileia during the fourth century BC in the form of two inscriptions from Boiotia. (ANCHIST)
(17) In the context of live performance, Rebecca Schneider has most prominently argued for an understanding of performance as an act of remaining and a means of re-appearance and re-participation. (THEATPER)
(18) Evidently, in auto/biographical writing myths of self merge with myths of historical and cultural beginnings in relation to the matrix of peoples and histories that constitute the whole of the community or nation. (TRANSCULT)

The five transition signals are the second largest sub-type of text-organising bundles. They are used to establish additive (19), contrastive (20), or comparative (21) links between elements.

(19) As well as the uncertainty about the exact dating of Astley’s ownership, there is some debate over the chronology and succession, whether or not it concerns the exact same premises, and, where precisely the shop was located. (RENAIS)
(20) Disease comes about on the one hand through an excess of heat or cold; on the other hand through surfeit or lack of nutriment; its location is the blood, marrow, or brain. (ANCHIST)
(21) In the same way that contemporary pornography depends upon mass production, the commercial exchange of erotica on vases was driven by the market. (ANCHIST)

There is only one bundle, as a result of, in the resultative signals sub-category. This is not particularly surprising since both Hyland (2008) and Durrant (2017) found that cause and effect bundles were not typical of writing in the soft sciences. In the AH corpus, as a result of typically marks the effect/s caused by a social, political and/or historical event (22).

(22) These leanings were strengthened by a mission to ‘rechristianize the poorer classes’ (possibly as a result of the divorce of Church and state in 1905). (MODLANG)

Referring back to the functional taxonomy (Figure 1), there are no topic bundles, nor are there any bundles serving the primary function of structuring signal. The topic bundles were likely filtered out by the conservative range threshold, but the absence of structuring signals is somewhat unexpected in a corpus comprising whole PhD theses.^[3] In these extended texts it is “all the more important for the writer to continue to orient the reader throughout the thesis as to how the current subject matter relates to the overall thesis” (Bunton 1999: 41), and one way to do this is using structuring signals. Perhaps this function is served in the corpus by 3-word bundles that do not fit the criteria of this study, e.g., as discussed above, as noted earlier, as already mentioned, inter alia.

3.2.3 Bundles to help writers express their attitudes

The smallest category is bundles to help writers express their attitudes (Table 7). Although there are only six of these interpersonal bundles, they will be of particular importance to AH writers who have to explicitly evaluate opposing arguments while actively getting behind one of them.

Tab. 7:

Bundles with the primary function of expressing the attitudes of the writer

	No. of bundles	Raw freq.
Bundles to help writers to express attitudes	6	1,154
Stance features	can be seen as, be seen as a, at the centre of, it is possible to, it is clear that
Engagement features	it is important to

Five of the interpersonal bundles in the AH corpus are stance features, which have been found to be a distinctive feature of the soft knowledge fields (Hyland 2008; Durrant 2017). The five stance features in the AH corpus perform similar but subtly different functions: conveying a degree of certainty, possibility or importance. Three convey a degree of certainty, and these can either hedge (23), boost (24) or do both, depending on the preceding modal verb (25, 26). In 86 out of 100 instances, be seen as a is preceded by a modal verb within the three words to its left, i.e., a span of -3 (e.g., can, cannot, could, would, should, may, might, must). Then, there is one stance feature which indicates a degree of possibility (27) and another that indicates the importance of something (28). At the centre of is a particularly interesting stance feature because in Durrant’s (2017) Science and Technology corpus its primary function was as a physical location marker, which clearly exemplifies the physical/abstract contrast between the hard/soft disciplines.

(23) For many, this can be seen as a clear advantage of the genre… (ENGCOMP)
(24) It is clear that this particular inscription was deliberately constructed as an effective medium through which to display the character and achievements of the individual concerned. (ANCHIST)
(25) The fusion of micro dialogue with visual satire, arguably, could be seen as a step towards the satirical prints of William Hogarth, George Townshend and the political cartoons that would develop in the eighteenth century. (HIST)
(26) The image cannot be viewed as independent, but must be seen as a component of the vase in its entirety. (ANCHIST)
(27) However, it is possible to see how the war may have increased the risk of IPV developing by providing other conducive psychological factors. (HIST)
(28) These depictions place the loutrophoros at the centre of bridal preparations and more significantly, at the centre of socially determined feminine beauty. (ANCHIST)
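The modal-verb check described above (a modal within a span of -3 of be seen as a) can be illustrated with a minimal Python sketch. This is not the procedure used in the study: the tokeniser, the function names and the modal list (taken from the examples given in the text, which may not be exhaustive) are all assumptions made for illustration.

```python
import re

# Modal verbs given as examples in the text; the study's full list may differ.
MODALS = {"can", "cannot", "could", "would", "should", "may", "might", "must"}
BUNDLE = ("be", "seen", "as", "a")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def modal_before_bundle(text, span=3):
    """Return one (has_modal, left_context) pair per occurrence of BUNDLE.

    left_context holds up to `span` tokens immediately before the bundle.
    """
    tokens = tokenize(text)
    n = len(BUNDLE)
    results = []
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == BUNDLE:
            left = tokens[max(0, i - span):i]
            results.append((any(t in MODALS for t in left), left))
    return results


sentence = ("The image cannot be viewed as independent, "
            "but must be seen as a component of the vase in its entirety.")
for has_modal, left in modal_before_bundle(sentence):
    print(has_modal, left)
```

Applied to a random sample of concordance lines, the share of occurrences for which the first value is true would give a figure comparable to the 86 out of 100 reported here, subject to the tokenisation assumptions above.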

Just one of the six bundles for the expression of writer attitudes, it is important to, is an engagement feature. It explicitly marks the presence of the reader and “acknowledges the dialogic dimension of research writing, intervening to direct the reader to some action or understanding” (Hyland 2008: 19). This is typically done using a mental verb (29, 30, 31). It is perhaps surprising that only one engagement bundle was identified, since in the AH disciplines, where knowledge is contested, it is all the more important for the writer to engage with the reader. It is possible that there is some intradisciplinary variation in engagement bundles within the AH disciplines, with it is important to being the only bundle used universally and therefore the only one to meet the range threshold.

(29) Although the Acropolis was the religious focal point of the city, it is important to appreciate that every part of the city had a religious association. (ANCHIST)
(30) At this point, it is important to consider what an ordinary traveller (not a scientist) says about his experience regarding Balinese words or the local language - Malay. (TRANSCULT)
(31) It is important to note that Marx saw class not simply as a description, but as a tool by which he could analyse the development of the capitalist system. (THEATPER)

4 Pedagogical implications and applications

With reference to the quantitative and qualitative findings discussed above, this section presents a series of ready-to-use computer-based resources to ensure that this research can be directly applied to the classroom. The first of these resources is the Bundle Finder (O’Flynn 2019) (Figure 2), a powerful but user-friendly and free-to-use tool developed using H5P. This tool can search any text for all 47 lexical bundles and return their discourse functions, enabling both teachers and learners to find examples of the bundles in context. Teachers can use the Bundle Finder to develop classroom activities and materials and set learning goals, while students can use it for independent learning activities. The Bundle Finder could, for example, be used for rhetorical consciousness-raising activities, in which students identify where a writer, be it themselves or a professional, has chosen to use a certain bundle (e.g., a stance feature) and then provide possible explanations for this rhetorical choice (Hyland 2016: 24). In many cases lists of formulaic language items are presented as static and decontextualised lists, but it is hoped that by enabling students to see the bundles functioning in the texts they read and write, they will consider the bundles a valid use of their learning effort. This is important because one of the pedagogical challenges of working with lexical bundles is a lack of face validity for some students (Byrd and Coxhead 2010). It is, however, important that both teachers and learners know how the list of bundles has been developed (Byrd and Coxhead 2010), which is why the Bundle Finder also includes information, in layman’s terms, regarding the corpus from which the list was derived and the criteria by which the bundles were selected for the list.

Fig. 2:

Arts and Humanities Bundle Finder (URL: https://h5p.org/node/438388)
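To make the idea behind such a tool concrete, the following Python sketch searches a text for listed bundles and returns the discourse function of each hit. It is an illustration only and does not reproduce the H5P implementation of the Bundle Finder: the function name, the matching strategy and the six-bundle subset are assumptions, while the function labels follow the categories in Appendix B.

```python
import re

# A small subset of the 47 bundles, labelled with the functions from Appendix B.
BUNDLE_FUNCTIONS = {
    "on the other hand": "Transition signal",
    "as a result of": "Resultative signal",
    "in the context of": "Framing signal",
    "it is clear that": "Stance feature",
    "it is important to": "Engagement feature",
    "at the end of": "Location",
}


def find_bundles(text):
    """Return (start offset, bundle, function) tuples in order of appearance."""
    hits = []
    for bundle, function in BUNDLE_FUNCTIONS.items():
        # Allow any whitespace between the words and match whole words only.
        pattern = r"\b" + r"\s+".join(map(re.escape, bundle.split())) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), bundle, function))
    return sorted(hits)


sample = ("It is important to note that, as a result of the war, "
          "prices rose. On the other hand, wages fell.")
for offset, bundle, function in find_bundles(sample):
    print(offset, bundle, function)
```

Because each bundle is matched independently, overlapping bundles (e.g., at the end of and the end of the) would each be reported if both were included in the dictionary.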

The second set of materials exemplifies how the findings of this study could be realised in corpus-based classroom materials. The sample materials presented below (Figure 3) were developed in H5P using the BAWE AH subcorpus and aim to demonstrate how some of the bundles could be used in DDL classroom activities. These sample materials require students to analyse the concordances not only with respect to the bundle itself, but also with respect to the dense academic language to its left and right, in order to draw inferences about function and use in context. These materials, inspired by Johns’ (2000) Kibbitzers, do not require students to analyse a corpus directly but rather to analyse a selection of corpus data presented to them by the teacher. Materials such as these will be well suited to the majority of student writers who do not have prior knowledge of corpus management software, and could be used to introduce students to key DDL concepts, e.g., concordances. The use of corpora and concordances in the classroom by students has been increasingly investigated since it was pioneered by Johns (1991) as Data-Driven Learning (DDL). Charles (2012) found students in the Arts and Humanities disciplines to be particularly receptive to DDL, and several researchers have used corpus-based instruction to improve students’ use of formulaic language in their academic writing (see Thurstun and Candlin 1998; Wu, Witten and Franken 2010; Charles 2012; Ackerley 2017). In fact, Byrd and Coxhead (2010) specifically recommend the use of corpora in the classroom to teach lexical bundles.

Fig. 3:

DDL sample materials (URL: https://h5p.org/node/727097)
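For teachers who wish to prepare similar concordance handouts from their own texts, a key-word-in-context (KWIC) display can be approximated in a few lines of Python. The sketch below is not how the H5P sample materials were produced; the function name, the context width and the bracket notation are assumptions chosen for illustration.

```python
import re


def concordance(texts, bundle, width=40):
    """Return KWIC lines: right-aligned left context, [bundle], right context."""
    pattern = re.compile(r"\b" + re.escape(bundle) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        flat = " ".join(text.split())  # collapse line breaks and repeated spaces
        for match in pattern.finditer(flat):
            left = flat[max(0, match.start() - width):match.start()]
            right = flat[match.end():match.end() + width]
            # Pad the left context so that the bundle lines up in every row.
            lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


extracts = [
    "It is clear that this particular inscription was deliberately constructed.",
    "Overall, it is clear that the exchange of vases was driven by the market.",
]
for line in concordance(extracts, "it is clear that", width=30):
    print(line)
```

Aligning the bundle in a single column lets students scan the words immediately to its left and right, which is the kind of analysis the sample materials ask for.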

The third resource is a set of Quizlet flashcards, which can be used in a variety of games, exercises and tests (Figure 4). Quizlet and other game-based apps have been found to have a range of advantages for vocabulary learning, including a positive effect on acquisition, motivation, and autonomy (see Chien 2015; Anjaniputra and Salsabila 2018; Chen, Liu and Huang 2019; Bueno-Alastuey and Nemeth 2020). The Quizlet sets can be introduced by teachers on short pre-sessional courses and then used independently by students on a long-term basis. This is important because it can be particularly challenging for teachers to help learners improve their use of bundles during short courses of instruction (Byrd and Coxhead 2010). In fact, improving the active use of lexical bundles by learners is likely to be a much longer-term project (Cortes 2006). It is, therefore, important that learners continue their independent study of lexical bundles beyond their short courses of writing instruction and into their full courses of study in AH departments, which is something Quizlet can facilitate.

Fig. 4:

Quizlet sets (URL: https://bit.ly/3NWEwkj)

The final resource for the teaching-learning of the lexical bundles is the list itself. A printer-friendly version of the full list organised by discourse functions is included in Appendix B. Although this list is static and presents the bundles as decontextualised items, it can be a useful resource if used appropriately. The functional list will be useful as a quick reference guide for teachers to keep a record of the bundles that have been subject to attention in classroom-based teaching-learning activities, or for students to monitor their independent learning activities. Keeping accurate records is important because the bundles will need to be revisited regularly (Byrd and Coxhead 2010). It is also important that the explicit teaching of the bundles on the list is mixed with ample opportunities for implicit learning (Hulstijn 2001), which means “ensur[ing] that students are reading academic prose so that they encounter academic vocabulary” (Byrd and Coxhead 2010: 56). Empirical studies suggest that through implicit learning and explicit instruction, L2 learners can start to develop better strategies for producing lexical bundles more frequently and appropriately in their academic writing (see Cortes 2006; Jones and Haywood 2004; Ackerley 2017).

5 Limitations

It is important to acknowledge some potential limitations of the study. One potential limitation is that the corpus includes only one genre of text produced by university students at one level and one institution. Although it would have been desirable to compile a corpus of texts produced by university students at various levels across institutions, empirical evidence from the BAWE corpus presented in this paper suggests that the list has scope beyond just the genre, level and institution from which it was derived. Another potential limitation is that the qualitative analysis was based on random samples of 100 rather than on every instance of the bundles in the corpus. As such, the functional categorisation, like others, should be treated with some level of caution and seen as a guide to facilitate teaching-learning, rather than as a definitive enumeration.

6 Conclusion

This paper has developed, discussed and presented a functional list of lexical bundles and associated resources for the teaching-learning of lexical bundles in the academic writing of the AH disciplines. At every stage of the development of the list and resources, methodological decisions were guided by pedagogical considerations. The final product is a short and powerful list of lexical bundles. The 47 bundles alone cover up to an impressive 0.9% of academic writing across genres, levels and institutions in the AH disciplines. The ready-made computer-based resources facilitate the immediate incorporation of the list into academic writing programs. It is hoped that by adding further evidence to demonstrate that bundle usage is reflective of the distinct social and rhetorical practices of different academic communities, this paper will spur more researchers and teachers to develop disciplinary lists of lexical bundles (or, for that matter, other types of formulaic language), and to incorporate their findings into their curriculums. Further studies could investigate the effects of incorporating a list of disciplinary lexical bundles into the curriculum on the accurate and appropriate production of lexical bundles in student writing.

Acknowledgements

I would like to thank Dr. Sue Wharton for her continued support and guidance throughout this research project (and various other research projects).

Appendix A

Arts and Humanities lexical bundles by frequency

Bundle	ANCHIST		ARTSHIST		ENGCOMP		FILMTV		HIST		MODLANG		RENAIS		THEAT		TRANSCULT		AH CORPUS (total)
Bundle	Freq	PMW	Freq	PMW	Freq	PMW	Freq	PMW	Freq	PMW	Freq	PMW	Freq	PMW	Freq	PMW	Freq	PMW	Freq	PMW
the end of the	175	186.1	143	136.5	127	135.6	116	125.3	124	125.8	132	141.4	159	160.7	64	89.8	107	119.9	1147	137.1
at the end of	121	128.7	95	90.7	88	93.9	91	98.3	65	66.0	126	134.9	146	147.5	57	80.0	75	84.0	864	103.3
at the same time	94	100.0	79	75.4	117	124.9	83	89.6	65	66.0	128	137.1	90	90.9	41	57.5	138	154.6	835	99.8
on the other hand	121	128.7	81	77.3	50	53.4	57	61.6	29	29.4	111	118.9	125	126.3	35	49.1	109	122.1	718	85.8
in the case of	98	104.2	36	34.4	70	74.7	48	51.8	79	80.2	84	90.0	63	63.7	48	67.4	101	113.1	627	75.0
as well as the	92	97.8	81	77.3	50	53.4	55	59.4	59	59.9	63	67.5	67	67.7	44	61.7	60	67.2	571	68.3
in the context of	77	81.9	76	72.6	54	57.6	108	116.6	39	39.6	33	35.3	51	51.5	46	64.5	40	44.8	524	62.6
one of the most	48	51.0	63	60.1	34	36.3	37	40.0	31	31.5	68	72.8	79	79.8	72	101.0	54	60.5	486	58.1
the fact that the	75	79.8	52	49.6	51	54.4	51	55.1	51	51.8	39	41.8	50	50.5	26	36.5	75	84.0	470	56.2
the ways in which	28	29.8	42	40.1	57	60.8	102	110.2	35	35.5	48	51.4	46	46.5	55	77.2	32	35.8	445	53.2
the beginning of the	70	74.4	55	52.5	38	40.6	32	34.6	27	27.4	70	75.0	82	82.9	25	35.1	38	42.6	437	52.2
on the one hand	41	43.6	55	52.5	43	45.9	41	44.3	30	30.4	83	88.9	65	65.7	26	36.5	44	49.3	428	51.2
in relation to the	43	45.7	49	46.8	46	49.1	92	99.4	34	34.5	37	39.6	23	23.2	59	82.8	26	29.1	409	48.9
at the beginning of	61	64.9	46	43.9	28	29.9	34	36.7	22	22.3	79	84.6	84	84.9	24	33.7	29	32.5	407	48.7
in the form of	46	48.9	46	43.9	49	52.3	41	44.3	59	59.9	39	41.8	28	28.3	37	51.9	42	47.0	387	46.3
as a result of	112	119.1	13	12.4	31	33.1	18	19.4	53	53.8	35	37.5	20	20.2	46	64.5	42	47.0	370	44.2
the way in which	33	35.1	37	35.3	57	60.8	34	36.7	59	59.9	48	51.4	23	23.2	33	46.3	19	21.3	343	41.0
the rest of the	71	75.5	42	40.1	34	36.3	47	50.8	15	15.2	25	26.8	60	60.6	17	23.9	30	33.6	341	40.8
on the part of	40	42.5	26	24.8	21	22.4	31	33.5	62	62.9	65	69.6	21	21.2	21	29.5	23	25.8	310	37.1
the nature of the	144	153.1	25	23.9	11	11.7	13	14.0	26	26.4	13	13.9	27	27.3	23	32.3	28	31.4	310	37.1
the role of the	53	56.4	18	17.2	30	32.0	30	32.4	74	75.1	24	25.7	17	17.2	12	16.8	25	28.0	283	33.8
the use of the	65	69.1	28	26.7	14	14.9	47	50.8	22	22.3	24	25.7	20	20.2	17	23.9	44	49.3	281	33.6
the extent to which	17	18.1	25	23.9	34	36.3	23	24.8	61	61.9	43	46.0	15	15.2	19	26.7	25	28.0	262	31.3
the context of the	52	55.3	35	33.4	21	22.4	47	50.8	17	17.3	16	17.1	31	31.3	16	22.5	18	20.2	253	30.2
to the fact that	59	62.7	23	22.0	27	28.8	27	29.2	42	42.6	18	19.3	14	14.1	15	21.0	27	30.2	252	30.1
the history of the	28	29.8	23	22.0	21	22.4	22	23.8	62	62.9	14	15.0	34	34.4	8	11.2	37	41.4	249	29.8
in the same way	46	48.9	17	16.2	44	47.0	20	21.6	17	17.3	36	38.6	14	14.1	18	25.3	24	26.9	236	28.2
in terms of the	17	18.1	24	22.9	21	22.4	31	33.5	24	24.4	52	55.7	18	18.2	32	44.9	16	17.9	235	28.1
it is important to	28	29.8	13	12.4	32	34.2	40	43.2	30	30.4	26	27.8	20	20.2	25	35.1	20	22.4	234	28.0
the case of the	26	27.6	19	18.1	26	27.8	17	18.4	52	52.8	25	26.8	31	31.3	9	12.6	28	31.4	233	27.9
it is possible to	33	35.1	20	19.1	19	20.3	17	18.4	27	27.4	28	30.0	24	24.2	31	43.5	23	25.8	222	26.5
on the basis of	38	40.4	23	22.0	13	13.9	16	17.3	21	21.3	29	31.1	27	27.3	19	26.7	28	31.4	214	25.6
at the time of	35	37.2	18	17.2	18	19.2	22	23.8	15	15.2	31	33.2	45	45.5	19	26.7	10	11.2	213	25.5
in the history of	11	11.7	34	32.5	15	16.0	20	21.6	51	51.8	10	10.7	35	35.4	10	14.0	27	30.2	213	25.5
it is clear that	28	29.8	18	17.2	29	31.0	14	15.1	33	33.5	32	34.3	22	22.2	15	21.0	19	21.3	210	25.1
by the fact that	20	21.3	30	28.6	18	19.2	25	27.0	20	20.3	25	26.8	21	21.2	23	32.3	21	23.5	203	24.3
the part of the	19	20.2	21	20.0	13	13.9	19	20.5	33	33.5	36	38.6	22	22.2	20	28.1	16	17.9	199	23.8
as one of the	22	23.4	22	21.0	18	19.2	20	21.6	20	20.3	19	20.3	27	27.3	22	30.9	26	29.1	196	23.4
in the face of	13	13.8	15	14.3	27	28.8	23	24.8	24	24.4	46	49.3	15	15.2	8	11.2	15	16.8	186	22.2
towards the end of	34	36.2	33	31.5	20	21.3	18	19.4	20	20.3	19	20.3	21	21.2	11	15.4	9	10.1	185	22.1
the development of the	19	20.2	33	31.5	10	10.7	13	14.0	29	29.4	19	20.3	25	25.3	20	28.1	13	14.6	181	21.6
as part of a	20	21.3	19	18.1	11	11.7	21	22.7	34	34.5	16	17.1	14	14.1	28	39.3	10	11.2	173	20.7
the relationship between the	17	18.1	16	15.3	21	22.4	16	17.3	40	40.6	17	18.2	14	14.1	10	14.0	20	22.4	171	20.4
can be seen as	12	12.8	12	11.5	13	13.9	10	10.8	10	10.1	22	23.6	13	13.1	41	57.5	33	37.0	166	19.8
be seen as a	20	21.3	14	13.4	16	17.1	14	15.1	15	15.2	17	18.2	10	10.1	36	50.5	21	23.5	163	19.5
at the centre of	15	16.0	18	17.2	12	12.8	19	20.5	15	15.2	30	32.1	25	25.3	15	21.0	10	11.2	159	19.0
a member of the	12	12.8	30	28.6	12	12.8	13	14.0	14	14.2	10	10.7	28	28.3	13	18.2	26	29.1	158	18.9

Total	2349	2498.0	1743	1664.1	1611	1719.7	1735	1873.8	1786	1812.4	2010	2152.4	1911	1930.9	1311	1839.5	1703	1907.6	16159	1931.8
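The PMW columns report frequencies normalised per million words, which allows sub-corpora of different sizes to be compared. As a minimal sketch, the normalisation and a rough coverage estimate can be computed as follows. The corpus size in the usage example is hypothetical, the function names are assumptions, and the coverage helper simply multiplies by bundle length, so it counts overlapping bundles twice and is not the calculation used in this paper.

```python
def per_million(raw_freq, corpus_tokens):
    """Normalise a raw frequency to occurrences per million words (PMW)."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_freq * 1_000_000 / corpus_tokens


def coverage_percent(total_pmw, bundle_length=4):
    """Rough share of running words covered by bundles, as a percentage.

    Each occurrence is assumed to cover `bundle_length` words; overlapping
    bundles are counted twice, so the result overstates true coverage.
    """
    return total_pmw * bundle_length / 1_000_000 * 100


# Hypothetical sub-corpus: 150 occurrences in 2,000,000 words.
print(per_million(150, 2_000_000))
# Total PMW for the AH corpus, taken from the Total row above.
print(round(coverage_percent(1931.8), 2))
```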

Appendix B

AH lexical bundles by function (printable)

Bundles to help writers to present and discuss content
Function	Description	Bundles	Notes/examples
Location	Indicating time/place	the beginning of the, the end of the, at the time of, at the beginning of, at the same time, at the end of, towards the end of
Procedure	Indicating how or why something is done (or what something is for)	the development of the, the role of the, the ways in which, the way in which, the use of the
Quantification	Indicating the quantity or extent of something	the extent to which, a member of the, one of the most, the rest of the, as part of the
Abstract description	Indicating an abstract property of something	the history of the, the nature of the
Bundles to help writers organise their text
Transition signals	Establish additive, comparative, or contrastive links between elements	in the same way, the relationship between the, on the other hand, on the one hand, as well as the
Resultative signals	Mark inferential or causative relations between elements	as a result of
Framing signals	Situate statements by specifying a context or limiting conditions	in relation to the, in the face of, the fact that the, in the history of, by the fact that, to the fact that, on the basis of, the context of the, the part of the, in the case of, in the context of, on the part of, in terms of the, the case of the, in the form of
Bundles to help writers to express attitudes
Stance features	Convey a degree of importance, certainty or possibility	can be seen as, be seen as a, at the centre of, it is possible to, it is clear that
Engagement features	Address the reader directly	it is important to

References

Ackerley, Katherine. 2017. Effects of corpus-based instruction on phraseology in learner English. Language Learning and Technology 21(3). 195–216.

Autelli, Erica. 2021. The origins of the term “phraseology”. Yearbook of Phraseology 12. 7–32. https://doi.org/10.1515/phras-2021-0003

Ackermann, Kirsten & Yu-Hua Chen. 2013. Developing the Academic Collocation List (ACL) – A corpus-driven and expert-judged approach. Journal of English for Academic Purposes 12. 235–247. https://doi.org/10.1016/j.jeap.2013.08.002

Ädel, Annelie. 2006. The Use of Metadiscourse in Argumentative Texts by Advanced Learners and Native Speakers of English. Amsterdam: John Benjamins.

Altenberg, Bengt. 1998. On the phraseology of spoken English: The evidence of recurrent word-combinations. In Anthony P. Cowie (ed.), Phraseology: Theory, analysis, and applications, 101–122. Oxford: Clarendon Press. https://doi.org/10.1093/oso/9780198294252.003.0005

Anjaniputra, Agung Ginanjar & Vina Aini Salsabila. 2018. The merits of Quizlet for vocabulary learning at tertiary level. Indonesian EFL Journal 4(2). 1–11. https://doi.org/10.25134/ieflj.v4i2.1370

Biber, Douglas. 1988. Variation across Speech and Writing. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511621024

Biber, Douglas. 1989. A Typology of English Texts. Linguistics 27(1). 3–43. https://doi.org/10.1515/ling.1989.27.1.3

Biber, Douglas, Susan Conrad & Viviana Cortes. 2004. If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics 25. 371–405. https://doi.org/10.1093/applin/25.3.371

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad & Edward Finegan. 1999. Longman grammar of spoken and written English. Harlow: Longman.

Bueno-Alastuey, M Camino & Katalin Nemeth. 2020. Quizlet and podcasts: effects on vocabulary acquisition. Computer Assisted Language Learning. 1–30. https://doi.org/10.1080/09588221.2020.1802601

Bunton, David. 1999. The use of higher level metatext in PhD theses. English for Specific Purposes 18. 41–56. https://doi.org/10.1016/S0889-4906(98)00022-2

Byrd, Pat & Averil Coxhead. 2010. On the other hand: Lexical bundles in academic writing and in the teaching of EAP. University of Sydney papers in TESOL 5. 31–64.

Charles, Maggie. 2012. ‘Proper vocabulary and juicy collocations’: EAP students evaluate do-it-yourself corpus-building. English for Specific Purposes 31(2). 93–102. https://doi.org/10.1016/j.esp.2011.12.003

Charles, Maggie. 2014. Getting the corpus habit: EAP students’ long-term use of personal corpora. English for Specific Purposes 35. 30–40. https://doi.org/10.1016/j.esp.2013.11.004

Chen, Yu-Hua & Paul Baker. 2010. Lexical bundles in L1 and L2 academic writing. Language learning and technology 14(2). 30–49.

Chen, Chih-Ming, Huimei Liu & Hong-Bin Huang. 2019. Effects of a mobile game-based English vocabulary learning app on learners’ perceptions and learning performance: A case study of Taiwanese EFL learners. ReCALL 31(2). 170–188. https://doi.org/10.1017/S0958344018000228

Chien, Chin-Wen. 2015. Analysis the Effectiveness of Three Online Vocabulary Flashcard Websites on L2 Learners’ Level of Lexical Knowledge. English Language Teaching 8(5). 111–121. https://doi.org/10.5539/elt.v8n5p111

Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Massachusetts, USA: MIT Press. https://doi.org/10.21236/AD0616323

Conrad, Susan. 2011. Variation in corpora and its pedagogical implications. In Vander Viana, Sonia Zyngier & Geoff Barnbrook (eds.), Perspectives on corpus linguistics, 47–62. Amsterdam: John Benjamins. https://doi.org/10.1075/scl.48.04con

Cortes, Viviana. 2004. Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes 23. 397–423. https://doi.org/10.1016/j.esp.2003.12.001

Cortes, Viviana. 2006. Teaching lexical bundles in the disciplines: An example from a writing intensive history class. Linguistics and Education 17. 391–406. https://doi.org/10.1016/j.linged.2007.02.001

Coxhead, Averil. 2000. A new academic word list. TESOL Quarterly 34(2). 213–238. https://doi.org/10.2307/3587951

Durrant, Philip. 2009. Investigating the viability of a collocation list for students of English for academic purposes. English for Specific Purposes 28(3). 157–169. https://doi.org/10.1016/j.esp.2009.02.002

Durrant, Philip. 2017. Lexical bundles and disciplinary variation in university students’ writing: Mapping the territories. Applied Linguistics 38(2). 165–193. https://doi.org/10.1093/applin/amv011

Frankenberg-Garcia, Ana, Robert Lew, Jonathan Roberts, Geraint Rees & Nirwan Sharma. 2019. Developing a writing assistant to help EAP writers with collocations in real time. ReCALL 31(1). 23–39. https://doi.org/10.1017/S0958344018000150

Gilquin, Gaëtanelle & Sylviane Granger. 2010. How can data-driven learning be used in language teaching? In Anne O’Keeffe & Michael McCarthy (eds.), The Routledge Handbook of Corpus Linguistics, 359–370. London: Routledge. https://doi.org/10.4324/9780203856949-26

Hafner, Christoph & Christopher Candlin. 2007. Corpus tools as an affordance to learning in professional legal education. Journal of English for Academic Purposes 6(4). 303–318. https://doi.org/10.1016/j.jeap.2007.09.005

Hirata, Yoko & Yoshihiro Hirata. 2018. Students’ Evaluation of SkELL: The ‘Sketch Engine for Language Learning’. Paper presented at the International Conference on Blended Learning, Osaka, July 31–2 August. https://doi.org/10.1007/978-3-319-94505-7_30

Hoey, Michael. 2005. Lexical priming: A new theory of words and language. London: Routledge.

Horst, Marlise, Tom Cobb & Ioana Nicolae. 2005. Expanding academic vocabulary with an interactive on-line database. Language learning and technology 9(2). 90–110.

Hulstijn, Jan H. 2001. Intentional and incidental second language vocabulary learning: A reappraisal of elaboration, rehearsal, and automaticity. In Peter Robinson (ed.), Cognition and Second Language Instruction, 258–286. Cambridge: CUP. https://doi.org/10.1017/CBO9781139524780.011

Hyland, Ken & Kevin Jiang. 2018. Academic lexical bundles: how are they changing? International Journal of Corpus Linguistics 23(4). 383–407. https://doi.org/10.1075/ijcl.17080.hyl

Hyland, Ken & Philip Shaw. 2016. The Routledge handbook of English for Academic Purposes. Abingdon: Routledge.

Hyland, Ken. 1999. Academic attribution: citation and the construction of disciplinary knowledge. Applied Linguistics 20(3). 341–367. https://doi.org/10.1093/applin/20.3.341

Hyland, Ken. 2005. Stance and engagement: a model of interaction in academic discourse. Discourse studies 7(2). 173–191. https://doi.org/10.1177/1461445605050365

Hyland, Ken. 2008. As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes 27(1). 4–21. https://doi.org/10.1016/j.esp.2007.06.001

Hyland, Ken. 2016. General and specific EAP. In Ken Hyland & Philip Shaw (eds.), The Routledge handbook of English for Academic Purposes, 17–43. Abingdon: Routledge. https://doi.org/10.4324/9781315657455

Johns, Tim & Philip King. 1991. Classroom Concordancing. Birmingham: University of Birmingham.

Krishnamurthy, Ramesh & Iztok Kosem. 2007. Issues in creating a corpus for EAP pedagogy and research. Journal of English for Academic Purposes 6. 356–373. https://doi.org/10.1016/j.jeap.2007.09.003

Lee, David & John Swales. 2006. A Corpus-Based EAP Course for NNS Doctoral Students: Moving from Available Specialized Corpora to Self-Compiled Corpora. International journal of corpus linguistics 11(2). 256–257.

Miller, Julia. 2020. The bottom line: Are idioms used in English academic speech and writing? Journal of English for Academic Purposes 43. 1–14. https://doi.org/10.1016/j.jeap.2019.100810

Nesi, Hilary. 2008. BAWE: an introduction to a new resource. In Ana Frankenberg-Garcia, Tawfiq Rkibi, Maria Braga da Cruz, Ricardo Carvalho, Cristina Direito & Diogo Santos-Rosa (eds.), Proceedings of the Eighth Teaching and Language Corpora Conference, 239–46. Lisbon, Portugal: ISLA.

North, Sarah. 2005. Different values, different skills? A comparison of essay writing by students from arts and science backgrounds. Studies in Higher Education 30(5). 517–533. https://doi.org/10.1080/03075070500249153

O’Flynn, James. 2019. Arts and Humanities Bundle Finder. https://h5p.org/node/438388 (accessed 15 June 2022).

O’Sullivan, Íde. 2007. Enhancing a process-oriented approach to literacy and language learning: The role of corpus consultation literacy. ReCALL 19(3). 269–286. https://doi.org/10.1017/S095834400700033X

Sinclair, John. 1991. Corpus, concordance, collocation. Oxford: Oxford University Press.

Thompson, Paul. 2009. Literature reviews in applied PhD theses: Evidence and Problems. In Ken Hyland & Giuliana Diani (eds.), Academic Evaluation: Review genres in university settings, 50–67. London: Palgrave Macmillan. https://doi.org/10.1057/9780230244290_4

Thurstun, Jennifer & Christopher Candlin. 1998. Concordancing and the teaching of the vocabulary of academic English. English for Specific Purposes 17(3). 267–280. https://doi.org/10.1016/S0889-4906(97)00013-6

Warren, Martin. 2016. Introduction to data driven learning. In Fiona Farr & Liam Murray (eds.), The Routledge Handbook of Language Learning and Technology, 337–347. London: Routledge. https://doi.org/10.4324/9781315657899

Whiteside, Karin. 2016. A corpus-driven investigation into the semantic patterning of grammatical keywords in undergraduate History and PIR (Politics and International Relations) essays. Coventry: University of Warwick PhD thesis.

Wood, David. 2002. Formulaic language in acquisition and production: Implications for teaching. TESL Canada Journal/Revue TESL du Canada 20. 1–15. https://doi.org/10.18806/tesl.v20i1.935

Wray, Alison. 2013. Formulaic language. Language Teaching 46. 316–334. https://doi.org/10.1017/S0261444813000013

Wu, Shaoqun, Ian Witten & Margaret Franken. 2010. Utilizing lexical data from a Web-derived corpus to expand productive collocation knowledge. ReCALL 22(1). 83–102. https://doi.org/10.1017/S0958344009990218

Published Online: 2022-12-07

Published in Print: 2022-12-16

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

https://doi.org/10.1515/phras-2022-0006
