Opening open science to all: demystifying reproducibility and transparency practices in linguistic research
Joseph V. Casillas, Gabriela Constantin-Dureci, Iván Andreu Rascón, Jiawei Shao, Stephanie A. Rodríguez, Adrija Gadamsetty, Alexandria Minetti, Krishita Laungani, John Thatcher, Rhode-Taina Gardere, Katherine Taveras, Isabelle Chang, Nicole Rodríguez, Kyle Parrish, Meritxell Feliu-Ribas and Robert Esposito
Abstract
In recent years, numerous fields of research have seen a push for increased reproducibility and transparency. As a result, specific transparency practices have emerged, such as open access publishing, preregistration, sharing data, analyses, and code, performing study replications, and declaring positionality and conflicts of interest. While many agree that open science practices represent a positive step forward in improving scientific rigor, these practices, by and large, have not been adopted in the field of linguistics (Bochynska et al. 2023. Reproducible research practices and transparency across linguistics. Glossa Psycholinguistics 2(1). 1–36). Few, if any, researchers have had explicit instruction on the practices of open science as part of their professional training. Nonetheless, today’s speech researcher is expected to be up to date on the current protocols of open science in order to incorporate the methodological practices aimed at improving reproducibility/replicability. The present work intends to help make open science practices understandable and accessible to researchers in linguistics from all backgrounds and at every stage, from students/early career researchers to senior researchers and advisors. We outline eight specific open science practices that linguists can adopt to make their research more open, transparent, inclusive, and accessible to a wider audience.
1 Introduction: what is open science?
In recent years, numerous fields of research have seen a push for increased reproducibility and transparency practices. These practices, collectively, have been referred to as open science. Parsons et al. (2022) refer to open science as an umbrella term “[…] reflecting the idea that scientific knowledge of all kinds, where appropriate, should be openly accessible, transparent, rigorous, reproducible, replicable, accumulative, and inclusive, all which are considered fundamental features of the scientific endeavor” (2022: 11). As a result, specific transparency practices have emerged, such as open access publishing, preregistration, sharing data, analyses, and code, performing study replications, and declaring positionality and conflicts of interest. Though it may come as a surprise to some, these open, transparent research practices have not been the norm in empirical and quantitative sciences, despite painstaking efforts being made in recent years (e.g., Berez-Kroeker et al. 2018, 2022, among others).
To properly contextualize the need for open science, one must first consider the so-called reproducibility (replication) crisis. In the early 2010s, a team of researchers in psychology embarked on a large-scale replication project to scrutinize what many considered to be the field’s major findings. Specifically, they attempted to replicate 100 influential studies (Open Science Collaboration 2015). The endeavor produced astounding results – of note, that approximately 53 % of the major findings did not replicate – and inspired similar large-scale replication projects in other fields, yielding similar results.[1] This series of events represents what is now referred to as the replication (or reproducibility) crisis (see also FORRT 2021). Unsurprisingly, the results generated an uproar in the psychological sciences. The alarming findings garnered media attention (e.g., Oliver 2016) and have led to periods of introspection and self-reflection in many adjacent fields, among them linguistics (e.g., Berez-Kroeker et al. 2018; Bochynska et al. 2023).
Researchers have pointed to questionable research practices (QRPs), such as p-hacking – knowingly manipulating an analysis until a significant p-value is obtained (see Head et al. 2015) – and HARKing – hypothesizing after the results are known (see Murphy and Aguinis 2019) – along with small sample sizes, poor theory, lack of transparency, misguided incentive structure in academia, etc., as factors that ultimately led to the replication crisis, though it is likely that many factors are/were simultaneously at play. For instance, the aforementioned QRPs may be an unfortunate consequence of misaligned incentive structures in academia, where publication is the universal currency. The pervasive pressure to publish likely leads many researchers to focus on quantity over quality. Couple this with the difficulty of publishing negative or null results, and the result is a research landscape in which many fields suffer from publication bias with little or no incentive to prioritize time-consuming open science practices. Taking this into account, it is not hard to understand why some researchers may turn to QRPs. While it is difficult to quantify how prevalent QRPs are in a given field, in a survey of applied linguists, Isbell et al. (2022) found that 94 % reported having engaged in one or more, and 17 % admitted to having committed some form of fraud.
In the aftermath of the aforementioned crisis, there has been a push for increased transparency and reproducible methodology to help mitigate the effects of QRPs. The clearest example of this is the Transparency and Openness Promotion Guidelines (TOP), author guidelines for journals that aim to help evaluate adherence to open science principles (see Nosek et al. 2015, as well as https://www.cos.io/initiatives/top-guidelines). The resulting methodological framework and associated techniques have reshaped research methods in psychology, and, slowly but surely, are making their way into related fields. While many agree that open science practices represent a positive step forward in improving scientific rigor, these practices, by and large, have not been adopted in the field of linguistics (Bochynska et al. 2023). One reason for the slow adoption in linguistics may be related to the fact that engaging in open science is no trivial feat. On the contrary, it often requires learning new skills, thoughtful planning, as well as an openness and willingness to share materials, code, and data. Many researchers need to implement new techniques with limited pedagogical resources and embrace alternative methods of disseminating their research, all of which can constitute a steep learning curve. That being said, what engaging in open science ultimately entails is sure to be field-specific and vary accordingly. In some disciplines, for instance, it may only involve a few of the practices we outline in the present work without the need for innovative methodologies. Nonetheless, given how new open science practices are, it is reasonable to assume that current senior researchers were not trained in these innovative methodologies. As a consequence, many early career researchers (ECR) find themselves at a crossroads in which they are forced to learn open science on their own, often without institutional support. 
Ironically, there is also a growing expectation that ECRs implement these novel tools in order to be successful in their programs, on the job market, or to advance in their careers.
The present work intends to both highlight and contribute to a line of research focused on making open science practices understandable and accessible to researchers in linguistics from all backgrounds and at every stage, from students/ECRs to senior researchers and advisors. We identify three areas in which linguists can engage in open science: stance, workflow, and dissemination (see Figure 1). The first area, stance, refers to practices that focus on the researcher’s position or attitude towards openness and transparency. The second area, workflow, deals with methods and techniques researchers can implement to make their research projects more open and transparent. Finally, dissemination refers to novel ways in which researchers can help ensure that their research products are accessible and free from QRPs. While our coverage of these areas cannot be exhaustive, we highlight eight open science practices within these areas: positionality statements and declarations of conflict of interest, open data and materials,[2] literate programming, reproducible code/projects, shareable computational environments, preregistration, registered reports, and pre-prints. We provide practical examples and detailed descriptions of the aforementioned practices with the goal of helping the interested linguist commence their journey of engaging in open science practices in their own research. Importantly, the present work should be considered a complement to the extant work promoting open science practices in the speech sciences.

Figure 1: Some open science practices amenable to research in linguistics as they pertain to one’s stance, workflow, and the dissemination of research products.
2 Stance
2.1 Positionality statements
A positionality statement is a reflective piece of writing that acknowledges a researcher’s stance/position toward a research topic, framework, and even participants. Similar to a statement of conflict of interest, a positionality statement can influence how results are interpreted (Rowe 2014). One’s positionality differs from a statement of conflict of interest in that it can also influence how research is undertaken and can encompass the researcher’s social, cultural, and personal identity, as well as their biases and assumptions (Holmes 2020). Among others, relevant personal characteristics that may be included in a positionality statement are gender and racial identity, age, sexual orientation, immigration status, and ideological stances (Berger 2015). These traits may indirectly impact research endeavors, since participants may be more willing to engage in a study if they perceive the researcher as sympathetic (De Tona 2006), or may even offer different responses based on the researcher’s perceived identity (Berger 2015). While positionality statements, due to their reflexive nature, may encompass larger pieces of writing, they can also take the form of short paragraphs that illustrate a few personal characteristics deemed relevant for the particular research endeavor. For instance, “Gabriela is a white immigrant cis-gender woman from Romania whose research focuses on how non-native speakers are ideologically framed as linguistically deficient in comparison to native speakers who are characterized by their linguistic authority and expertise”. When submitting a study for publication, the positionality statement can be included in additional materials if the word limit is a concern.
Though positionality statements have been adopted in some disciplines of the humanities and social sciences as a means of recognizing the various ways in which researchers’ backgrounds and identities may intersect with their research endeavors, they are a relatively recent addition to the field of linguistics, appearing primarily in subfields, such as applied linguistics, linguistic anthropology, and linguistic ethnography (Bucholtz et al. 2023). Savolainen et al. (2023) draw connections between positionality statements and relatively more common statements of conflict of interest, arguing that, while researchers are required to disclose any and all financial gains associated with a research project, “positionality statements grant authors the freedom to decide which parts of their biography they choose to share and how they choose to frame it” (Savolainen et al. 2023: 1334). While statements of conflict of interest are notably underused in linguistic research (see Bochynska et al. 2023),[3] positionality statements are likely even less common. Nonetheless, they are considered by some to be increasingly crucial components of the research process, as they increase the transparency of research practices (Steltenpohl et al. 2022) and contextualize the environment in which studies take place, or, in other words, they “[define] the boundaries within which research was produced” (Jafar 2018: 1). Traditionally, positionality statements have been more prevalent in qualitative research. Our stance is that, when appropriate, they should be considered equally important in quantitative research as well. Aside from contributing to ongoing efforts to promote transparency and openness in research practices, recognizing and addressing one’s positionality can, in some instances, support a study’s quantitative validity by helping to reduce perceptions of bias (see Jafar 2018, for a discussion in the field of medicine).
Support and advocacy for the inclusion of positionality statements in research publications are increasing (Bucholtz et al. 2023; Jafar 2018; Steltenpohl et al. 2022). Bucholtz et al. (2023) note that considering a researcher’s positionality may be especially important in linguistic research on certain language communities, such as indigenous communities, “[…] which relies on racially minoritized communities as sources of data yet lack adequate (if any) representation of those communities among faculty researchers” (Bucholtz et al. 2023: 2). Nonetheless, others contest this practice. For example, some investigators point to the universalism of research, that is, the belief that scholarly endeavors should be assessed on their inherent merits, regardless of the status or personal identity of the person making the contribution (Savolainen et al. 2023). Moreover, in contrast to Bucholtz et al. (2023), the self-identification associated with a positionality statement may also place some individuals, particularly women and individuals from marginalized groups, in a vulnerable position (Massoud 2022). Specifically, in the field of law and society, Massoud (2022) posits that the pressure to state one’s positionality can lead to increased anxiety, as well as cause readers to question the researcher’s neutrality, and, ultimately, shift the focus away from the contributions of the research.
How can one marry the aforementioned benefits of including one’s positionality with the legitimate counterpoints related to marginalized individuals? We believe researchers should only consider the option of including their positionality if they feel comfortable doing so. Roberts et al. (2020), for instance, argue against mandating one’s positionality. Some journals have started to encourage authors to include positionality statements with their submissions (e.g., the Journal of Social and Personal Relationships) as a means to show their commitment to Diversity, Equity, Inclusivity, and Belonging (DEIB) initiatives. No journals, to the best of our knowledge, require positionality statements.
In sum, we believe positionality statements can be productive in linguistic research, as they promote critical self-reflection, increase transparency, can potentially help address diversity and inclusion concerns, and may increase the validity of findings in quantitative research. By reflecting on who it is that does the research, linguistics can become a more diverse, inclusive, and transparent field. That being said, it is important to consider the impact and potential burden of disclosing positionality on marginalized researchers, particularly in collaborative research settings. In the end, regarding one’s positionality, what and how to share are fundamental considerations that cannot be overlooked by investigators, journals, publishing houses, and consumers of academic research. It is our stance that researchers should reflect on their positionality before starting a project, and, if and when it makes sense, consider including a positionality statement. For additional information and examples of positionality statements in linguistic research, the interested reader is directed to Bochynska et al. (2023), Weissler et al. (2023), and https://fosil-project.github.io/posts/positionality-statements/.
2.2 Open research data and materials
Recent efforts have pushed for researchers to make their materials (data, code, instruments, etc.) open to the public. Open data, specifically, refers to data collected for research that is freely and easily available to anybody interested in accessing it for any purpose (Open Knowledge 2023). In academic research, statements such as “data available upon request” are commonplace (see Hardwicke and Ioannidis 2018). In spite of such assurances, we now know they do not typically result in adequate sharing of research materials (Hardwicke and Ioannidis 2018; Spellman et al. 2017; Wicherts et al. 2006). Researchers are increasingly encouraged to make linguistic data open and accessible via servers. An illustrative example is the IRIS database (https://www.iris-database.org), a language sciences digital repository that is freely accessible and permits the up- and downloading of research instruments and materials. Additional efforts include open science badges – visual symbols offered by some journals (e.g., Language Learning, Language and Speech) on published articles. These badges are awarded to researchers for adhering to certain open science principles, such as sharing code, data, or preregistering a study. In arguably more extreme cases, other journals have made data sharing a requirement for publication (e.g., Applied Psycholinguistics). Nonetheless, open sharing of research materials is still the exception rather than the norm in linguistics (Bochynska et al. 2023). In this section, we provide more detail regarding the benefits of ‘openness’ and consider the specific challenges researchers face in the field of linguistics. Our primary focus is on data, but we also underscore the importance of making all research materials open.
The underlying motivation for open data is relatively straightforward, particularly in the wake of the reproducibility crisis. Though researchers may understandably hesitate to share their data, we believe understanding the benefits of open data can help alleviate many concerns. Researchers may also harbor anxieties about sharing data that are unrelated to technical considerations (Stieglitz et al. 2020). Stieglitz et al. (2020), in a survey of 995 researchers from 13 universities in Germany across various fields, found that there were anxieties about competitive pressures, such as losing the opportunity to publish again from the same data set before another researcher does. In this case, anxieties can be quelled with the knowledge that the data can be made available after all research inquiries by the original researchers have been completed. Making linguistic data freely available improves the credibility of our findings among other researchers and the general public, and may help develop more accurate generalizations and theories (see Berez-Kroeker et al. 2022). Prohibiting or impeding access to data collected for publicly funded research is, in many cases, unethical and can be a detriment to inclusivity. Open data is fundamental for cumulative science in numerous ways. It affords third parties the opportunity to scrutinize original findings, which promotes reproducibility and reduces errors, such as those related to statistical analyses and reporting of outcomes (e.g., Roettger 2021b). Furthermore, it allows for published data to be reanalyzed in novel ways and utilized in meta-analyses. Revisiting old data sets using innovative techniques can support or contradict past narrative conclusions. For instance, using meta-analytic techniques, Casillas (2021) reexamined extant research regarding ‘compromise categories’ in early bilinguals.
This line of research posits that bilingual individuals produce speech sounds intermediate to those produced by monolingual speakers of either language. By systematically reevaluating prior data and incorporating new acoustic analyses of coronal stops from early Spanish-English bilinguals, Casillas (2021) suggested that the cumulative evidence for ‘compromise’ stop categories was negligible. In lieu of intermediate phonetic categories, the study proposed early bilinguals can exhibit performance mismatches resulting from dynamic interlingual interactions. This reanalysis contradicted earlier assumptions about bilingual phonology and provided in-depth scrutiny of statistical power and evidence accumulation in bilingualism research. In short, open data is a cornerstone of scientific research in the 21st century that enables wider access to research information, which, in turn, facilitates validation, motivates replication, promotes reproducibility, and makes possible future scientific progress.
Open materials are particularly important for the field of linguistics, for all of the aforementioned reasons, and also because some linguists have described the state of the field, as far as English-language publications are concerned, as being Western, Educated, Industrialized, Rich, and Democratic (WEIRD, see Bochynska et al. 2023; Casillas et al. 2025; Faytak et al. 2024). That is to say, the majority of linguistic research appears to be concentrated on specific languages, mainly Indo-European, in overrepresented communities, by privileged scholars. Making materials in linguistic research accessible to all researchers can promote participation in and with underrepresented communities. Furthermore, it can increase the study of diverse and underreported languages by affording more researchers the opportunity to interact and learn from data that would otherwise not be available to them, which, in turn, can foster a more inclusive and comprehensive understanding of the global linguistic landscape.
Having stated all the above, it is necessary to recognize that linguistics faces a unique set of challenges with regard to data, as there are a multitude of subfields, each of which potentially works with a variety of data formats. Due to such diversity, one must determine which aspects of open science are relevant to their data. For example, a neurolinguistic study investigating event related potentials (ERPs) could share raw data for transparency, as well as preprocessed data with the code used to transform the raw data and a corresponding description for facilitation of reanalysis. In another field, the creation of a corpus will benefit from open access and the use of standardized file formats; the analysis of a corpus will benefit from sharing the search queries, the analysis code, and a description of the analysis code. At the heart of these challenges are ethical concerns that must be considered with care. First and foremost, the privacy and consent of participants must be safeguarded. Linguistic data often include personal information, which can be especially difficult to anonymize. While on the surface written and behavioral data may not appear to pose as many issues as audio and video recordings, which constitute a large portion of linguistic research materials, it is imperative that one consider the sources from which all types of data are derived. As expressed by Holton et al. (2022), if we haphazardly take language to represent trivial data points and lose focus on the individual embedded within a community, as well as the values of said community, we are doomed to “dehumanize and decontextualize” it (Holton et al. 2022: 50). This is particularly true when working with minority languages and/or marginalized communities. In cases such as these, the researcher must be held accountable, not only for the anonymization of participant information, but also for respecting and upholding the specific goals and restrictions put forth by the community. 
This includes, but is not limited to, the use, access, and storage of all collected data. In sum, careful consideration of the priorities of the researcher and the researched, which often do not align, is paramount (for more detailed views, see Adetula et al. 2022; Holton et al. 2022; Hudley et al. 2020; Leonard 2021; Mufwene 2020; Singh et al. 2023; Tsikewa 2021, among others). In addition, generative artificial intelligence technologies, such as Large Language Models, are burgeoning. These technologies will certainly pose currently unknown challenges in the near future and may necessitate additional steps to secure the protection of sensitive data against misuse, particularly regarding adherence to the original agreement of informed consent, and, importantly, in upholding the conditions of use put forth by the stakeholders in marginalized communities.
While these challenges are substantial, we believe acceptable solutions exist in many, if not all, cases. When primary data, such as audio or video files, cannot be shared, derived data in the form of tabular files can take its place. For instance, if institutional policies prohibit the sharing of audio files, a comma-separated or tab-separated file (csv, tsv) containing the variables of interest (e.g., formant values, response times, etc.) can be made public instead. Tabular data files can be anonymized easily using arbitrary identification codes. Online data collection platforms, such as Prolific (https://www.prolific.com), typically remove identifying information by default and provide participant-specific identification numbers. In more uncommon cases in which institutional policies do not permit the sharing of derived data sets, synthetic data containing the same statistical properties can be generated and shared freely (see Quintana 2020).[4] To quote the Directorate-General for Research & Innovation of the European Commission (2016: 4), we believe the field can follow the principle that data should be “as open as possible, as closed as necessary”.
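To make this kind of derivation concrete, the following is a minimal sketch in Python (the participant names, variables, and values are invented for illustration): the primary recordings stay private, while a tabular file keyed by arbitrary identification codes is prepared for sharing.

```python
import csv
import io

# Hypothetical per-trial measurements derived from (private) recordings.
# All names and values here are invented for illustration.
raw_rows = [
    {"participant": "Ana García", "vowel": "a", "f1_hz": 742, "rt_ms": 512},
    {"participant": "Ana García", "vowel": "i", "f1_hz": 318, "rt_ms": 498},
    {"participant": "Ben Miller", "vowel": "a", "f1_hz": 701, "rt_ms": 530},
]

# Map each unique participant to an arbitrary identification code
# so the shareable file contains no identifying information.
codes = {}
for row in raw_rows:
    codes.setdefault(row["participant"], f"p{len(codes) + 1:02d}")

# Write the derived, anonymized tabular data (csv) for sharing.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "vowel", "f1_hz", "rt_ms"])
writer.writeheader()
for row in raw_rows:
    writer.writerow({"id": codes[row["participant"]],
                     "vowel": row["vowel"],
                     "f1_hz": row["f1_hz"],
                     "rt_ms": row["rt_ms"]})

print(out.getvalue())
```

The code-to-name mapping itself would be stored separately, under the same restrictions as the primary data, or destroyed once it is no longer needed.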
Another substantial hurdle that cannot be overlooked revolves around the fact that researchers must learn to use new technologies to participate in open, transparent research. Making materials open and accessible is not as simple as merely uploading a data file. Ideally, researchers should include relevant information to contextualize the data set at the project level (i.e., a project-summary document), the data level (i.e., a README file explaining the data set), and the variable level (i.e., a data dictionary) (Lewis 2024). The inclusion of resources at these three levels is the optimal way for authors to provide the necessary context for an independent researcher to access and utilize their materials. Unfortunately, most publicly available materials do not adhere to this standard. For this reason, we direct the interested reader to templates provided in Lewis (2024) for documentation at the project level (https://osf.io/q6g8d, https://osf.io/d3pum), data level (https://osf.io/tk4cb), and variable level (https://osf.io/ynqcu). In addition, the reader is referred to the project-, data-, and variable-level documentation of the present project, all of which are freely available on the Open Science Framework: https://osf.io/bsu2q/.
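As a small sketch of variable-level documentation, the snippet below builds a data dictionary for a derived data set, with one entry per column (the data, variable names, and descriptions are invented for illustration; in practice the descriptions come from the researcher, not the code):

```python
import csv
import io

# A derived data set (invented for illustration).
data_csv = "id,vowel,f1_hz,rt_ms\np01,a,742,512\np01,i,318,498\n"

# Variable-level descriptions supplied by the researcher.
descriptions = {
    "id": "Arbitrary participant identification code",
    "vowel": "Target vowel elicited in the carrier phrase",
    "f1_hz": "First formant at vowel midpoint, in Hz",
    "rt_ms": "Response time, in milliseconds",
}

# Build the data dictionary: one entry per variable in the data set,
# pairing each column with its description and an example value.
reader = csv.DictReader(io.StringIO(data_csv))
rows = list(reader)
dictionary = [
    {"variable": name,
     "description": descriptions.get(name, "TODO: describe"),
     "example": rows[0][name]}
    for name in reader.fieldnames
]

for entry in dictionary:
    print(f"{entry['variable']}: {entry['description']} "
          f"(e.g., {entry['example']})")
```

The resulting entries can be saved alongside the data file (e.g., as a csv or in the README), giving independent researchers the variable-level context described above.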
Once the research materials have been prepared for sharing, the researcher must decide where to share them. Platforms such as Google Drive, Dropbox, etc. are not recommended because they are linked to personal accounts that may change or become unavailable over time. Free repositories designed for the purpose of sharing research materials, such as the Open Science Framework (Foster and Deardorff 2017), GitHub, etc., are preferable and can be accessed simply by sharing a link. These repositories represent stable, long-term solutions with ample storage capacity. The materials can be downloaded directly, free of any kind of payment or exchange of personal information (such as an email address) by the user. For relevant examples, we direct the interested reader to https://osf.io/zx9ky/, https://osf.io/3bmcp/, or https://github.com/RAP-group/empathy_intonation_perc. Table 1 summarizes some of the options used by researchers and describes which features are available on each platform.
Table 1: Common data-sharing platforms and their respective features.
| Platform | Long-term support | Version control | DOI assignment | Anonymous sharing | Key features |
|---|---|---|---|---|---|
| Open Science Framework (OSF) | + | + | + | + | Project management and collaboration |
| GitHub | + | + | Integrates with Zenodo | + (public repositories) | Project management and collaboration, ideal for coding |
| GitLab | + | + | Integrates with Zenodo | Limited | Project management and collaboration, ideal for coding |
| Bitbucket | + | + | Integrates with Zenodo | Limited | Project management and collaboration, ideal for coding |
| Zenodo | + | + (via GitHub integration) | + | + | Supports range of file types |
| Figshare | + | Limited | + | + | Sharing datasets and figures |
| Box | – | – | – | Limited | Basic file storage and sharing |
| Google Drive | – | – | – | Limited | Basic file storage and sharing |
To summarize, open materials are important because they facilitate transparency, rigor, reproducibility, replication, accumulation of knowledge, and, importantly, they make participating in the scientific endeavor more inclusive. According to some accounts, linguistics, in general, does not engage in open science practices, including sharing research materials (see Bochynska et al. 2023), though others characterize its participation in different terms. For instance, as stated in Berez-Kroeker et al. (2018: 9) “Practitioners in different subfields ‘do transparency’ differently, and these practices could serve as models for an eventual amalgamated standard”. While linguistics does face legitimate, field-specific challenges related to non-WEIRD communities, ultimately, the benefits of open materials outweigh these challenges. Researchers should commit to sharing what is reasonable and ethically responsible, all the while keeping at the forefront the priorities of the individuals from whom the materials are derived, especially regarding data from marginalized communities.
3 Workflow
Given the consequences of the reproducibility crisis in other fields, reproducibility must be a crucial aspect of any scientific study. Researchers must be able to provide a clear and transparent account of their findings, including the methods used to obtain them. Reproducibility can help to ensure that research results are valid, reliable, and can be used by others to build on existing knowledge. In this section, we explore the importance of reproducibility, what we know about it in the field of linguistics, and how researchers can make their code and projects more reproducible.
In general, reproducibility helps to increase the credibility of research findings and allows other researchers to verify and build on existing work. A lack of reproducibility can lead to findings that cannot be replicated, resulting in wasted resources, and, conceivably, downstream impacts on public health and policy decisions that are often grounded in funded research. For these reasons, among others, transparency in research methods is essential to ensure reproducibility, which includes not only the data collection and analysis methods, but also the code used to conduct the analysis. In linguistics there is increasing awareness of the importance of reproducibility and how a lack thereof could potentially impede advancements in linguistic theory and theories of language acquisition, in addition to having implications for education and language policy decisions based on research findings. As a consequence, many investigators are showing heightened interest in safeguarding the reproducibility of their research.
3.1 Literate programming
For quantitative research, there are several steps that researchers can take to make their code and projects more reproducible. One approach is to create reports that document the research process by including descriptions of the data, the methods used to analyze the data, and the results. This documentation can then be made publicly available and used by third parties to retrace the steps to reproduce the research findings. While better than nothing at all, a more complete approach includes the analysis code in the same document in which the very manuscript is written. This integration of analysis code and prose into a single, dynamic document is known as literate programming (Knuth 1984, 1992). Under the hood, a series of macros and functions are used to tangle the code and prose of the document into a separate file, usually a Word document or a PDF, which can then be submitted for publication. Literate programming reduces the likelihood of copy-and-paste errors that often occur when passing the results of a statistical analysis from the analysis software to the word processing program. If the analysis changes in any way – e.g., more data is included, a different analytic strategy is applied, etc. – the document is retangled to update the output file. Currently there are several implementations of literate programming for research purposes, the most common of which are RMarkdown files (.Rmd), Quarto markdown files (.qmd), and Jupyter notebooks (.ipynb). In the case of the former two, RMarkdown and Quarto, the R package knitr (Xie 2015, 2023) tangles (also “knits” or “renders”) the output file. Jupyter notebooks require a front-end web page and a back-end kernel. The present manuscript was generated using literate programming via Quarto and is available for download on the Open Science Framework: https://osf.io/bsu2q/. Additionally, a brief tutorial in R is available at https://fosil-project.github.io/posts/literate-programming/.
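To make this concrete, a minimal Quarto document might look as follows (the file contents, values, and variable names are invented for this sketch; Quarto runs R chunks via knitr and Python chunks, as shown here, via Jupyter, and the inline-code syntax assumes a recent version of Quarto). Rendering the file re-executes the code and injects the current results into the output, so no numbers are pasted in by hand:

````markdown
---
title: "Vowel duration report"
format: html
---

```{python}
#| echo: false
# Invented example data: vowel durations in milliseconds
durations = [142.0, 155.5, 138.2, 161.3]
mean_ms = sum(durations) / len(durations)
```

We analyzed `{python} len(durations)` tokens; mean vowel duration
was `{python} f"{mean_ms:.1f}"` ms.
````

If the underlying data change, re-rendering the document updates both the inline values and any tables or figures automatically.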
3.2 Reproducible code/projects
While the implementation of literate programming into a research workflow is ideal, the gold standard is to use literate, dynamic documents in conjunction with reproducible projects. These projects include all of the data, code, and documentation necessary to reproduce the research findings, not only in a single report, but rather in many reports and/or presentations, simultaneously. This approach makes it easier for others to reproduce research findings and build on previous work because it obviates the complications involved with user-specific file paths and differing operating systems. Ideally, if the project works on one user’s computer, it should work on any computer running the same software. In this sense, any researcher could theoretically download an entire project and reproduce the analyses and reports at the click of a button. A popular choice for reproducible projects is the open-source RStudio IDE, developed by Posit (the company formerly known as RStudio), which utilizes .Rproj files called RStudio projects. Posit has recently released a new integrated development environment (IDE) called Positron that also works at the project level, but has the added benefit of being relatively language neutral. That is to say, one can use R, Python, Julia, Stan, and a number of other programming languages within a single IDE. More information and examples of completed reproducible projects are available to the interested reader here: https://osf.io/un45x/, https://osf.io/cp9bs/, and https://fosil-project.github.io/posts/reproducible-code-projects/. Additionally, the project files of the present work, including data, code, and markdown files, are publicly available on the OSF: https://osf.io/bsu2q/.
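The core idea of avoiding user-specific file paths can be sketched in Python (the `data_file` helper, the `data/` folder, and the file name are hypothetical); project-oriented workflows in R accomplish the same thing with .Rproj files and relative paths:

```python
from pathlib import Path

def data_file(project_root: Path, name: str) -> Path:
    """Return the path to a file inside the project's data/ folder,
    built from the project root rather than a hard-coded,
    user-specific location such as C:/Users/jo/Desktop/study/data.csv."""
    return (project_root / "data" / name).resolve()

# The project root is wherever the project folder was downloaded to;
# here the current working directory stands in for it.
root = Path.cwd()
print(data_file(root, "vowels.csv"))  # works unchanged on any operating system
```

Because every path is derived from the project root, the project can be moved to a different machine or operating system without editing a single path.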
3.3 Shareable computational environments
Exciting new technology that facilitates open science is coming out at a rapid pace. This is excellent news for anybody interested in learning these new tools, but it also creates other issues, particularly with regard to outdated software. There is no way to completely future-proof code or projects; researchers must continually strive to maintain the reproducibility of their work. This may imply updating code and documentation as needed, and, where feasible, testing projects on different operating systems to ensure that they can be run in different environments. Dependency and workflow management tools like renv (Ushey and Wickham 2023) and targets (Landau 2021) can be helpful in future-proofing projects and ensuring reproducibility. These tools help to manage the dependencies that are necessary to run code by recording the specific versions of the software originally used by the researchers. Computational reproducibility tools like Binder, Code Ocean, and Nix can also be used to create instances of virtual environments in which projects can be reproduced online. Thus, these platforms allow researchers to share their code and data in ways that can be easily reproduced by anybody with an internet connection. As an example, the present project is also available online in a stable Code Ocean container that captures the original computational environment: https://codeocean.com/capsule/2385779/tree. The interested reader is encouraged to re-run our code and re-render our files to further their understanding of how computational reproducibility platforms work in conjunction with literate programming.
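As a rough sketch of what dependency managers record, the snippet below (the `snapshot` function is hypothetical) captures installed package versions in the spirit of a lockfile; renv does this, and much more, automatically for R projects:

```python
from importlib import metadata

def snapshot(packages):
    """Record the installed version of each package, lockfile-style,
    so that collaborators can restore the same environment later."""
    lines = []
    for pkg in packages:
        try:
            lines.append(f"{pkg}=={metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {pkg}: not installed")
    return lines

# Hypothetical project dependencies
for line in snapshot(["numpy", "pandas"]):
    print(line)
```

Sharing the resulting version list alongside the code lets a third party recreate the computational environment rather than guessing which software versions produced the published results.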
In summary, reproducibility is a crucial aspect of scientific research. It helps to ensure that research findings are valid, reliable, and can be used by others to build on existing knowledge. In linguistics there is increasing awareness of the importance of reproducibility, and many researchers are taking steps to improve the transparency of their research. Instances of shareable computational environments can make research projects available to anybody with an internet connection, independent of operating systems and software preferences. By creating dynamic reports using literate programming and integrating them into their projects in conjunction with dependency management tools, linguists can make their work more reproducible and accessible.
4 Dissemination
In this section, we consider three open science innovations that are making a profound impact on how academic research is conducted, evaluated, and, ultimately, disseminated to the public. These innovations – preregistrations, registered reports, and pre-prints – were designed with the goal of reducing QRPs and publication bias.
4.1 Preregistration
A preregistration is a time-stamped document that provides comprehensive detail about a study, including, but not limited to, research questions, hypotheses, methodologies, and analytic strategies (Mellor and Nosek 2018). Preregistrations are written prior to data collection and do not undergo peer review. The depth of detail within a preregistration spans a spectrum: in the simplest case, a preregistration can comprise merely a hypothesis or a brief description of the methods; at the other extreme, a detailed preregistration can include code, power analyses, participant exclusion criteria, and beyond. In this section, we provide information regarding the various components of a preregistration, centering on their advantageous impact on linguistic research. Specifically, we focus on who might want to consider preregistrations, why they might want to do so, what content they can include, and how they can complete a preregistration for a linguistics research project.
Linguistic research is multifaceted and spans diverse areas such as phonetics, phonology, syntax, morphology, sociolinguistics, natural language processing, and conversation/discourse analysis, to name just a few. These areas range from purely theoretical to quantitative and experimental, with many falling somewhere in between. Importantly, as highlighted by Roettger (2021a), researchers are human, and humans have evolved to filter the world in irrational ways, which can lead to QRPs and other problems that may affect the replicability of published research. Preregistration has emerged as a powerful instrument that allows linguists to bolster the trustworthiness and credibility of their inquiries by establishing a systematic, predefined methodology. We believe the practice of preregistration extends its benefits to researchers at all levels, including students and ECRs, senior academics, and professionals alike.
Researchers face vital decisions while engaging in research, with inherent flexibility involved in the process of designing and carrying out projects, as well as in the analysis of the data and interpretation of the results (Simmons et al. 2011). This type of flexibility, termed “researcher degrees of freedom”, can have serious downstream consequences in quantitative research, particularly in linguistics. For instance, Coretta et al. (2023) provided the same speech-production data set to different research teams and asked them to answer the same research question. They found substantial variability in both the acoustic analyses and the analytic strategies, neither of which could be explained by analysts’ prior beliefs, expertise, or the perceived quality of their analyses. Crucially, these decisions, both acoustic and analytic, impacted the teams’ answers to the research question. To provide a simple example, a researcher studying lexical stress could concentrate on distinct acoustic cues typically associated with stress, i.e., pitch, duration, and intensity. Beyond selecting acoustic cues to measure, she must also select a domain for these measurements, such as the mid-point of stressed/unstressed syllables or an average value over the entirety of the syllable. Choices such as these, i.e., the researcher degrees of freedom, can wield significant influence on subsequent outcomes. Preregistration serves the purpose of meticulously documenting these choices a priori, thus acting as a deterrent against QRPs, like HARKing or p-hacking (Wicherts et al. 2016). This is because the researcher establishes what decisions will be made, such as measurement choices and analytic strategies, before data collection commences. A benefit of including a high level of specificity in the preregistration is that it forces researchers to consider facets of their study that might usually be deferred to a later stage, e.g., specific statistical tests. This proactive approach demands more initial time investment from the researcher, but it also increases the likelihood of uncovering crucial flaws in the study design.
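The consequences of such measurement choices can be made concrete with a small sketch (the pitch values are hypothetical): two equally defensible analysis decisions produce different numbers from the very same syllable.

```python
# Hypothetical F0 (pitch) samples in Hz, taken at equal intervals
# across a single stressed syllable
f0_track = [180.0, 195.0, 210.0, 205.0, 185.0]

# Decision 1: measure pitch at the syllable midpoint
midpoint_f0 = f0_track[len(f0_track) // 2]

# Decision 2: average pitch over the entire syllable
mean_f0 = sum(f0_track) / len(f0_track)

# The two choices yield different values, which can lead to
# different statistical outcomes downstream
print(midpoint_f0, mean_f0)  # 210.0 vs. 195.0
```

Preregistering which of the two measurements will be used removes the temptation to choose, post hoc, whichever value yields the more favorable result.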
The scope of preregistration extends to any facet of research deemed worthy of time-stamped documentation before the study begins. The essential components often include research questions/hypotheses, the methodological framework, and analytic approaches. The specific elements that comprise a preregistration can be considerably diverse, as they depend on the specific domain within linguistics and the nuanced nature of the study in question. Consider, for instance, a psycholinguist conducting a self-paced reading study. In this context, the focus of the preregistration might include the formulation of hypotheses, as well as a complete description of the experimental paradigm. Additionally, the researcher may include a characterization of participant demographics, recruitment strategies, sample size considerations, independent variable manipulations, data transformations, and analytic strategies to test hypotheses. Importantly, not all of the aforementioned components are equally prioritized in all preregistrations. In sum, one can preregister any aspect of their research that they deem worthy of documenting a priori.
It is important to acknowledge that in many cases incorporating the entirety of these components into a preregistration represents a formidable challenge, as it front-loads large portions of work that often take place after a study has begun, e.g., determining sample size, statistical models, etc. In such instances, researchers are encouraged to commence with the elements they perceive as most valuable to their study. Many concerns about preregistration revolve around the potential burden of ‘extra work’. In reality, preregistration is intended to streamline the workflow, fostering efficiency in both the short term and the long term, as it provides the researcher with complete control over the level of detail she chooses to include. The depth of preregistration directly correlates with the effort invested: the more comprehensive the preregistration, the greater the initial workload and the lesser the effort required in subsequent stages.
The Open Science Framework allows researchers to preregister a study.[5] Since its inception, the number of preregistrations has grown each year (see Figure 2, left panel), and the cumulative number of registrations totaled 167,413 at the time of writing this text (see Figure 2, right panel). We provide useful guides and examples of preregistrations at the following links: https://osf.io/nprgz, https://osf.io/qvjzy, and https://fosil-project.github.io/posts/preregistration/.

Figure 2: Preregistrations on the Open Science Framework. The left panel plots preregistrations as a function of year. The right panel plots cumulative preregistrations since 2011. Data scraped from https://osf.io/search on April 10, 2025.
4.2 Registered reports
The reproducibility crisis has drawn attention to the shortcomings of the traditional model of publishing scientific research. In the current model, researchers generate hypotheses, design studies, collect and analyze data, interpret results, and submit their findings for publication. However, this model has been criticized for lending itself to QRPs, such as p-hacking and HARKing, which can result in publication bias.
To address these issues, researchers have attempted various reforms, such as meta-analysis and preregistration. Meta-analysis is an analytic technique that combines the results of multiple studies to increase statistical power. Preregistration, as we have seen, involves publicly registering a study’s design and methods before collecting data in order to mitigate QRPs. Registered reports (RRs) represent a new publication model that conceptually combines preregistration with peer review (Nosek and Lakens 2014). Preregistration is often confused with RRs, but they differ in that preregistration is a separate step that occurs before the traditional publishing pipeline, whereas a RR is integrated into the publishing process. In this model, researchers submit a detailed proposal of their study, including their hypotheses, methods, and analyses, for review before data collection. If the proposal is accepted, the study is guaranteed publication, regardless of the results. This incentivizes rigorous methodology and reduces QRPs, as researchers cannot manipulate their analyses to obtain significant results. Figure 3 provides a side-by-side comparison of the standard publishing model and RRs.

Figure 3: A comparison flow chart of the standard publication model and registered reports.
RRs were first introduced in 2013 by the Center for Open Science (COS), and have since been adopted by many journals across various fields, including psychology, neuroscience, and medicine. In 2019, there were approximately 156 journals offering RRs. This number has jumped to 318 at the time of writing this manuscript, an increase of 104 %. Of those 318, only 14 are journals related to language or linguistics. Table 2 lists the language/linguistics journals along with information regarding relevant restrictions.
Table 2: Journals related to “language” or “linguistics” that include registered reports as a possible article submission type. The data were retrieved from the Open Science Framework on April 10, 2025. The complete list of journals is freely available at https://www.cos.io/initiatives/registered-reports. Empty cells indicate missing/unavailable data and TBA implies that a pending decision is ‘to be announced’.
| Journal | Permanence | Permits replication studies | Permits meta-analytic studies | Permits use of existing data | Requires data deposition |
|---|---|---|---|---|---|
| Bilingualism: Language and Cognition | Indefinite | ✓ | ✓ | ||
| Biolinguistics | Indefinite | ✓ | ✓ | ||
| Cognitive Linguistics | Indefinite | ✓ | ✓ | ||
| Glossa Psycholinguistics | Indefinite | TBA | TBA | TBA | TBA |
| Journal of Child Language | Indefinite | ✓ | |||
| Journal of Memory and Language | Special issue | ✓ | |||
| Journal of Speech, Language, and Hearing Research | |||||
| Language and Cognition | |||||
| Language and Speech | Indefinite | ✓ | ✓ | ✓ | |
| Language Learning | Indefinite | ✓ | ✓ | ✓ | ✓ |
| Linguistics | Indefinite | ✓ | ✓ | ||
| Neurobiology of Language | Indefinite | TBA | TBA | TBA | TBA |
| Second Language Research | |||||
The majority of the listed journals plan to offer RRs as a possible submission option indefinitely (n = 9). Likewise, nine of these 14 journals permit RRs as an option for replication studies. Only one journal (Language Learning) specifically states that it will consider RRs that plan to conduct meta-analyses, and two of the journals consider RRs as an option for studies that propose analyzing data sets that already exist. Finally, for six of the 14 journals, a public data deposition is a requirement for RRs.[6]
RRs cannot solve all the problems with the current model, but they can help reduce QRPs and increase transparency in scientific research. RRs are gaining popularity, but some fields, such as linguistics, have been slow to adopt them. RRs may particularly benefit ECRs, who can use them to increase their chances of publication and build a reputation for rigor. However, more senior researchers may be resistant to change and may need to be convinced of the benefits of RRs for the field as a whole. In sum, registered reports represent a promising new model for publishing scientific research that can help reduce QRPs and increase transparency. As more journals adopt RRs, the scientific community can move towards a more rigorous and trustworthy publishing model.
4.3 Pre-prints
A pre-print is a version of a research article that has not yet undergone peer review but is made publicly available online through a pre-print server. The general process consists of an initial screening, after which the manuscript is posted on the pre-print server within a few days of submission, bypassing peer review and making the research findings freely accessible online (Puebla et al. 2021). Pre-prints allow researchers to share their findings with the scientific community and receive feedback before their work is published in a traditional academic journal. This process can speed up the dissemination of knowledge and facilitate collaboration between researchers.
One of the primary benefits of pre-prints is that they allow researchers to share their findings quickly and easily. This can be especially important in fields where research moves quickly, such as biology or computer science. Pre-prints also allow researchers to receive feedback on their work from their peers, which can help to improve the quality of their research. The provision of commentary and reviews of pre-prints yields benefits, not only to the authors, but also to reviewers, journals, publishers, and the readership. This inclusive process allows more researchers and reviewers to participate in discussing the research findings and can reduce the need for repeated rounds of re-review or extensive revisions. Recognizing these benefits, more major publishers have either launched pre-print platforms or entered partnerships over the past five to seven years, allowing pre-prints to be incorporated into the workflow (Puebla et al. 2021). By making research findings available to the public before peer review, pre-prints not only improve the accuracy and reliability of research findings, but also encourage collaborative efforts to identify potential errors, refine methodologies, and accelerate knowledge dissemination.
Another benefit of pre-prints is that they can help to reduce publication bias, a widespread challenge in traditional publishing. Publication bias occurs when positive results are more likely to be published than negative results (Matosin et al. 2014). This can skew the scientific literature and lead to a misunderstanding of the state of the research. Pre-prints address this obstacle by openly sharing all research findings, regardless of outcome, creating a fairer and more accurate representation of the current scientific landscape of that field.
Pre-prints have become increasingly popular in recent years, particularly in fields such as biology, physics, and computer science. The adoption of pre-prints has been slower in some fields, such as the social sciences and humanities, but this is changing as more researchers become aware of the benefits of open science, and new national and regional platforms by open science advocates continue to emerge (Gawne and Styles 2022). Figure 4 illustrates the growth of pre-prints on the Open Science Framework since 2016.

Figure 4: Pre-prints on the Open Science Framework. The left panel plots pre-prints as a function of year. The right panel plots cumulative pre-prints since 2016. Data scraped from https://osf.io/search on April 10, 2025.
Despite the clear benefits, some researchers remain hesitant to use pre-prints. One concern is that publishing a pre-print may harm their chances of being published in a traditional academic journal. However, this concern is becoming less relevant as more journals accept pre-prints as a legitimate form of publication. According to Liu and De Cat (2021), who surveyed researchers about barriers to sharing pre-prints, additional barriers include peer review, journal policy, lack of knowledge of the process, confidentiality issues, data types, the perceived utility of sharing pre-prints, time constraints, and issues in pre-print management. Fortunately, researchers interested in making a pre-print publicly available will find the process to be quite simple. One must first select a pre-print server that aligns with their line of research (see Table 3). Next, the pre-print is likely to undergo a short screening, confirming author background, basic research content, and compliance with the ethical standards of the pre-print platform. Once the pre-print passes the screening process, the content is made available online in open access format.
Table 3: Available pre-print servers related to language and/or linguistics.
| Server | Discipline(s) | Year created | URL |
|---|---|---|---|
| arXiv | Multidisciplinary (includes computational linguistics, NLP) | 1991 | https://arxiv.org/ |
| Cogprints | Multidisciplinary (includes cognitive sciences and linguistics) | 1995 | https://web-archive.southampton.ac.uk/cogprints.org/ |
| SciELO Preprints | Research pertinent to Latin America, Spain, Portugal and South Africa | 1998 | https://preprints.scielo.org/index.php/scielo/preprints |
| HAL (Hyper Articles en Ligne) | Multidisciplinary (includes language-specific French linguistics) | 2001 | https://hal.science/ |
| ACL Anthology | Computational linguistics and NLP | 2004 | https://aclanthology.org/ |
| LingBuzz | General linguistics | 2006 | https://ling.auf.net/lingbuzz |
| Open Science Framework | Multidisciplinary (includes linguistics) | 2011 | https://osf.io/preprints |
| CLARIN (Common Language Resources and Technology Infrastructure) | Language-based research | 2012 | https://www.clarin.eu/ |
| PsyArXiv | Psychology, cognitive sciences, psycholinguistics, linguistics | 2016 | https://osf.io/preprints/psyarxiv |
| SocArXiv | Social sciences (includes sociolinguistics) | 2016 | https://osf.io/preprints/socarxiv |
| EdArXiv | Education research (includes applied linguistics) | 2018 | https://osf.io/preprints/edarxiv |
The growing visibility of pre-prints, and their acceptance as valid research outputs by diverse stakeholders, including researchers, funders, and national institutions, has fueled collaborative research efforts and strengthened support for their presence in a variety of research disciplines. Pre-prints play an important role in advancing the tenets of open science by promoting transparency, reproducibility, and collaboration. While some researchers may still be hesitant to use this dissemination paradigm, the benefits of open science are becoming increasingly clear. By embracing pre-prints, linguists can accelerate the dissemination of knowledge, improve the quality of research, and ensure that their findings are available to the widest possible audience.
5 Concluding remarks
The early 2010s saw the reproducibility crisis take hold of the psychological sciences. As a consequence, there has been a push for increased transparency and reproducible methodology to help mitigate the effects of questionable research practices. The resulting methodological framework and associated techniques, now referred to as open science, have reshaped research methods in psychology and have slowly but surely made their way into adjacent fields, such as linguistics. While open science provides novel techniques and integrates state-of-the-art innovations, it also comes with challenges, particularly with regard to the steep learning curve researchers face when learning these new methods. We advocate for the “buffet” approach, in which select open science practices are integrated into the researcher’s workflow slowly over time (e.g., Bergmann 2018). We have provided descriptions and relevant examples of these practices to accompany the many guides already available for learning open science (e.g., Crüwell et al. 2018; Lewis 2020, https://FOSIL-project.github.io, https://book.fosteropenscience.eu/, among many others).
Crucially, the purpose of this article is to help foster open science in linguistics. Important considerations often overlooked in the wake of the open science movement deal with (1) how linguists actually learn open science practices and (2) how senior researchers can train the next generation of linguists. Few, if any, researchers have had explicit instruction on the practices of open science as part of their professional training. Nonetheless, today’s speech researcher is expected to be up to date on the current protocols of open science in order to incorporate the methodological practices aimed at improving reproducibility/replicability. What does this mean for the field? We believe that researchers – linguists specifically – have to adapt and learn the new methods of open science. Additionally, we must, as a field, concentrate our efforts on training current students/ECRs in open, transparent research practices, and linguistic journals must adapt to new models of publishing. In the present work we have outlined nine specific open science practices, classified into three areas – stance, workflow, and dissemination – that researchers in linguistics can adopt to make their research more open, transparent, inclusive, and accessible to a wider audience.
Acknowledgements
We are grateful for the comments and suggestions provided by three anonymous reviewers. This work was much improved because of their diligent insights. All errors are ours alone.
-
Author contributions: The authors made the following contributions: JVC: conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, supervision, validation, visualization, writing – original draft, writing – review & editing; GCD: conceptualization, writing – original draft, writing – review & editing; IRA: conceptualization, data curation, project administration, writing – original draft, writing – review & editing; JS: conceptualization, project administration, writing – original draft, writing – review & editing; SR: writing – original draft, writing – review & editing; AG: writing – original draft; AM: writing – original draft; KL: writing – original draft; JT: writing – original draft; RTG: writing – original draft; KT: writing – original draft; IC: writing – original draft; NR: conceptualization, writing – original draft, writing – review & editing; KP: conceptualization, project administration, writing – original draft, writing – review & editing; MFR: writing – review & editing; RE: conceptualization, writing – review & editing.
References
Adetula, Adeyemi, Patrick S. Forscher, Dana Basnight-Brown, Soufian Azouaghe & Hans Ijzerman. 2022. Psychology should generalize from – not just to – Africa. Nature Reviews Psychology 1(7). 370–371. https://doi.org/10.1038/s44159-022-00070-y.
Berez-Kroeker, Andrea L., Lauren Gawne, Susan Smythe Kung, Barbara F. Kelly, Tyler Heston, Gary Holton, Peter Pulsifer, David I. Beaver, Shobhana Chelliah, Stanley Dubinsky, Richard P. Meier, Nick Thieberger, Keren Rice & Anthony C. Woodbury. 2018. Reproducible research in linguistics: A position statement on data citation and attribution in our field. Linguistics 56(1). 1–18. https://doi.org/10.1515/ling-2017-0032.
Berez-Kroeker, Andrea L., Bradley McDonnell, Eve Koller & Lauren B. Collister (eds.). 2022. The open handbook of linguistic data management. Cambridge, MA: The MIT Press. https://doi.org/10.7551/mitpress/12200.001.0001.
Berger, Roni. 2015. Now I see it, now I don’t: Researcher’s position and reflexivity in qualitative research. Qualitative Research 15(2). 219–234. https://doi.org/10.1177/14687941124684.
Bergmann, Christina. 2018. How to integrate open science into language acquisition research? In The 43rd annual Boston University Conference on Language Development (BUCLD 43). Boston, USA.
Bochynska, Agata, Liam Keeble, Caitlin Halfacre, Joseph V. Casillas, Irys-Amélie Champagne, Kaidi Chen, Melanie Röthlisberger, Erin M. Buchanan & Timo B. Roettger. 2023. Reproducible research practices and transparency across linguistics. Glossa Psycholinguistics 2(1). 1–36. https://doi.org/10.5070/G6011239.
Bucholtz, Mary, Eric W. Campbell, Teresa Cevallos, Veronica Cruz, Alexia Z. Fawcett, Bethany Guerrero, Katie Lydon, Inî G. Mendoza, Simon L. Peters & Griselda Reyes Basurto. 2023. Researcher positionality in linguistics: Lessons from undergraduate experiences in community-centered collaborative research. Language and Linguistics Compass 17(4). 1–15. https://doi.org/10.1111/lnc3.12495.
Camerer, Colin F., Anna Dreber, Eskil Forsell, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, Johan Almenberg, Adam Altmejd, Taizan Chan, Emma Heikensten, Felix Holzmeister, Taisuke Imai, Siri Isaksson, Gideon Nave, Thomas Pfeiffer, Michael Razen & Hang Wu. 2016. Evaluating replicability of laboratory experiments in economics. Science 351(6280). 1433–1436. https://doi.org/10.1126/science.aaf0918.
Camerer, Colin F., Anna Dreber, Felix Holzmeister, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, Gideon Nave, Brian A. Nosek, Thomas Pfeiffer, Adam Altmejd, Nick Buttrick, Taizan Chan, Yiling Chen, Eskil Forsell, Anup Gampa, Emma Heikensten, Lily Hummer, Taisuke Imai, Siri Isaksson, Dylan Manfredi, Julia Rose, Eric-Jan Wagenmakers & Hang Wu. 2018. Evaluating the replicability of social science experiments in nature and science between 2010 and 2015. Nature Human Behaviour 2(9). 637–644. https://doi.org/10.1038/s41562-018-0399-z.
Casillas, Joseph V. 2021. Interlingual interactions elicit performance mismatches not “compromise” categories in early bilinguals: Evidence from meta-analysis and coronal stops. Languages 6(1). 9. https://doi.org/10.3390/languages6010009.
Casillas, J. V., C. Nagle, M. Baese-Berk & M. Amengual. 2025. Sound communities: A quantitative proposal for studying bilingual speech. https://doi.org/10.31234/osf.io/m67tx_v2.
Center for Open Science. 2024. Registered reports. https://www.cos.io/initiatives/registered-reports (accessed 12 December 2024).
Coretta, Stefano, Joseph V. Casillas, Simon Roessig, Michael Franke, Byron Ahn, Ali H. Al-Hoorie, Jalal Al-Tamimi, Najd E. Alotaibi, Mohammed K. AlShakhori, Ruth M. Altmiller, Pablo Arantes, Angeliki Athanasopoulou, Melissa M. Baese-Berk, George Bailey, Cheman Baira A. Sangma, Eleonora J. Beier, Gabriela M. Benavides, Nicole Benker, Emelia P. Benson Meyer, Nina R. Benway, Grant M. Berry, Liwen Bing, Christina Bjorndahl, Mariška Bolyanatz, Aaron Braver, Violet A. Brown, Alicia M. Brown, Alejna Brugos, Erin M. Buchanan, Tanna Butlin, Andrés Buxó-Lugo, Coline Caillol, Francesco Cangemi, Christopher Carignan, Sita Carraturo, Tiphaine Caudrelier, Eleanor Chodroff, Michelle Cohn, Johanna Cronenberg, Olivier Crouzet, Erica L. Dagar, Charlotte Dawson, Carissa A. Diantoro, Marie Dokovova, Shiloh Drake, Fengting Du, Margaux Dubuis, Florent Duême, Matthew Durward, Ander Egurtzegi, Mahmoud M. Elsherif, Janina Esser, Emmanuel Ferragne, Fernanda Ferreira, Lauren K. Fink, Sara Finley, Kurtis Foster, Paul Foulkes, Rosa Franzke, Gabriel Frazer-McKee, Robert Fromont, Christina García, Jason Geller, Camille L. Grasso, Pia Greca, Martine Grice, Magdalena S. Grose-Hodge, Amelia J. Gully, Caitlin Halfacre, Ivy Hauser, Jen Hay, Robert Haywood, Sam Hellmuth, Allison I. Hilger, Nicole Holliday, Damar Hoogland, Yaqian Huang, Vincent Hughes, Ane Icardo Isasa, Zlatomira G. Ilchovska, Hae-Sung Jeon, Jacq Jones, Mágat N. Junges, Stephanie Kaefer, Constantijn Kaland, Matthew C. Kelley, Niamh E. Kelly, Thomas Kettig, Ghada Khattab, Ruud Koolen, Emiel Krahmer, Dorota Krajewska, Andreas Krug, Abhilasha A. Kumar, Anna Lander, Tomas O. Lentz, Wanyin Li, Yanyu Li, Maria Lialiou, Ronaldo M. Lima Jr, Justin J. H. Lo, Julio Cesar Lopez Otero, Bradley Mackay, Bethany MacLeod, Mel Mallard, Carol-Ann Mary McConnellogue, George Moroz, Mridhula Murali, Ladislas Nalborczyk, Filip Nenadić, Jessica Nieder, Dušan Nikolić, Francisco G. S. Nogueira, Heather M. Offerman, Elisa Passoni, Maud Pélissier, Scott J. Perry, Alexandra M. Pfiffner, Michael Proctor, Ryan Rhodes, Nicole Rodríguez, Elizabeth Roepke, Jan P. Röer, Lucia Sbacco, Rebecca Scarborough, Felix Schaeffler, Erik Schleef, Dominic Schmitz, Alexander Shiryaev, Márton Sóskuthy, Malin Spaniol, Joseph A. Stanley, Alyssa Strickler, Alessandro Tavano, Fabian Tomaschek, Benjamin V. Tucker, Rory Turnbull, Kingsley O. Ugwuanyi, Iñigo Urrestarazu-Porta, Ruben van de Vijver, Kristin J. Van Engen, Emiel van Miltenburg, Bruce Xiao Wang, Natasha Warner, Simon Wehrle, Hans Westerbeek, Seth Wiener, Stephen Winters, Sidney G.-J. Wong, Anna Wood, Jane Wottawa, Chenzi Xu, Germán Zárate-Sández, Georgia Zellou, Cong Zhang, Jian Zhu & Timo B. Roettger. 2023. Multidimensional signals and analytic flexibility: Estimating degrees of freedom in human-speech analyses. Advances in Methods and Practices in Psychological Science 6(3). 1–29. https://doi.org/10.1177/25152459231162567.
Cristea, Ioana-Alina & John P. A. Ioannidis. 2018. Improving disclosure of financial conflicts of interest for research on psychosocial interventions. JAMA Psychiatry 75(6). 541–542. https://doi.org/10.1001/jamapsychiatry.2018.0382.
Crüwell, Sophia, Johnny van Doorn, Alexander Etz, Matthew C. Makel, Hannah Moshontz, Jesse C. Niebaum, Amy Orben, Sam Parsons & Michael Schulte-Mecklenbeck. 2018. 7 easy steps to open science: An annotated reading list. Zeitschrift für Psychologie 227(4). 237–248. https://doi.org/10.1027/2151-2604/a000387.
De Tona, C. 2006. But what is interesting is the story of why and how migration happened. Forum for Qualitative Social Research 7(3). 1–12. https://doi.org/10.17169/fqs-7.3.143.
Errington, Timothy M., Maya Mathur, Courtney K. Soderberg, Alexandria Denis, Nicole Perfito, Elizabeth Iorns & Brian A. Nosek. 2021. Investigating the replicability of preclinical cancer biology. eLife 10. e71601. https://doi.org/10.7554/eLife.71601.
European Commission, Directorate-General for Research & Innovation. 2016. H2020 programme: Guidelines on FAIR data management in Horizon 2020, version 3.0. Luxembourg: European Commission, Directorate-General for Research & Innovation.
Faytak, Matthew, Šárka Kadavá, Chenzi Xu, Onur Özsoy, Pius W. Akumbu, Amanda Cardoso, Mark Amengual, Amalia Arvaniti, Malte Belz, Dorotea Bevivino, Joseph V. Casillas, Tiphaine Caudrelier, Aleksandra Ćwiek, Maria Cairney, Indranil Dutta, Ander Egurtzegi, Hadley Forst, Paul Foulkes, Rowena Garcia, Martine Grice, Adriana Hanulikova, Sam Hellmuth, Kamil Kaźmierski, Li Xiang, Janne Lorenzen, Miki Mori, Jennifer Nycz, Reenu Punnoose, Leticia Quesada Vázquez, Teja Rebernik, Brygida Sawicka-Stępińska, Zed Sevcikova Sehyr, Jane Setter, Malin Spaniol, Inigo Urrestarazu-Porta, Alexandra Vella, Cong Zhang, Marzena Zygis, Erin Michelle Buchanan & Timo B. Roettger. 2024. Big team science for language science: Opportunities and challenges. In Open science framework. osf.io/3pkj6.
FORRT. 2021. Reproducibility crisis (a.k.a. Replicability or replication crisis). https://forrt.org/glossary/english/reproducibility_crisis/.
Foster, Erin D. & Ariel Deardorff. 2017. Open science framework (OSF). Journal of the Medical Library Association: JMLA 105(2). 203. https://doi.org/10.5195/jmla.2017.88.
Gawne, Lauren & Suzy Styles. 2022. Situating linguistics in the social science data movement. In Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller & Lauren B. Collister (eds.), The open handbook of linguistic data management, 9–25. Cambridge, MA: The MIT Press.
Hardwicke, Tom E. & John P. A. Ioannidis. 2018. Populating the data ark: An attempt to retrieve, preserve, and liberate data from the most highly-cited psychology and psychiatry articles. PLoS One 13(8). 1–12. https://doi.org/10.1371/journal.pone.0201856.
Hardwicke, Tom E., Robert T. Thibault, Jessica E. Kosie, Joshua D. Wallach, Mallory C. Kidwell & John P. A. Ioannidis. 2022. Estimating the prevalence of transparency and reproducibility-related research practices in psychology (2014–2017). Perspectives on Psychological Science 17(1). 239–251. https://doi.org/10.1177/1745691620979806.
Hardwicke, Tom E., Joshua D. Wallach, Mallory C. Kidwell, Theiss Bendixen, Sophia Crüwell & John P. A. Ioannidis. 2020. An empirical assessment of transparency and reproducibility-related research practices in the social sciences (2014–2017). Royal Society Open Science 7(2). 1–10. https://doi.org/10.1098/rsos.190806.
Head, Megan L., Luke Holman, Rob Lanfear, Andrew T. Kahn & Michael D. Jennions. 2015. The extent and consequences of p-hacking in science. PLoS Biology 13(3). 1–15. https://doi.org/10.1371/journal.pbio.1002106.
Holmes, Andrew Gary Darwin. 2020. Researcher positionality – a consideration of its influence and place in qualitative research – a new researcher guide. Shanlax International Journal of Education 8(4). 1–10. https://doi.org/10.34293/education.v8i4.3232.
Holton, Gary, Wesley Y. Leonard & Peter L. Pulsifer. 2022. Indigenous peoples, ethics, and linguistic data. In Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller & Lauren B. Collister (eds.), The open handbook of linguistic data management, 51–60. Cambridge, MA: The MIT Press.
Hudley, Anne H. Charity, Christine Mallinson & Mary Bucholtz. 2020. Toward racial justice in linguistics: Interdisciplinary insights into theorizing race in the discipline and diversifying the profession. Language 96(4). e200–e235. https://doi.org/10.1353/lan.2020.0074.
Isbell, Daniel R., Dan Brown, Meishan Chen, Deirdre J. Derrick, Romy Ghanem, María Nelly Gutiérrez Arvizu, Erin Schnur, Meixiu Zhang & Luke Plonsky. 2022. Misconduct and questionable research practices: The ethics of quantitative data handling and reporting in applied linguistics. The Modern Language Journal 106(1). 172–195. https://doi.org/10.1111/modl.12760.
Jafar, Anisa J. N. 2018. What is positionality and should it be expressed in quantitative studies? Emergency Medicine Journal 35(5). 323–324. https://doi.org/10.1136/emermed-2017-207158.
Knuth, Donald E. 1984. Literate programming. The Computer Journal 27(2). 97–111. https://doi.org/10.1093/comjnl/27.2.97.
Knuth, Donald E. 1992. Literate programming. Stanford, CA: Center for the Study of Language and Information.
Landau, William Michael. 2021. The targets R package: A dynamic make-like function-oriented pipeline toolkit for reproducibility and high-performance computing. Journal of Open Source Software 6(57). 2959. https://doi.org/10.21105/joss.02959.
Leonard, Wesley Y. 2021. Centering indigenous ways of knowing in collaborative language work. In Lisa Crowshow, Inge Genee, Mahaliah Peddle, Joslin Smith & Conor Snoek (eds.), Sustaining indigenous languages: Connecting communities, teachers, and scholars, 21–34. Athabasca, AB: Athabasca University Press.
Lewis, Crystal. 2024. Data management in large-scale education research. New York: CRC Press. https://doi.org/10.1201/9781032622835.
Lewis, Neil A. 2020. Open communication science: A primer on why and some recommendations for how. Communication Methods and Measures 14(2). 71–82. https://doi.org/10.1080/19312458.2019.1685660.
Liu, Meng & Cecile De Cat. 2021. Open science in applied linguistics: A preliminary survey. In Luke Plonsky (ed.), Open science in applied linguistics, 1–28. Amsterdam: John Benjamins. https://doi.org/10.31219/osf.io/kuf26.
Massoud, Mark Fathi. 2022. The price of positionality: Assessing the benefits and burdens of self-identification in research methods. Journal of Law and Society 49. S64–S86. https://doi.org/10.1111/jols.12372.
Matosin, Natalie, Elisabeth Frank, Martin Engel, Jeremy S. Lum & Kelly A. Newell. 2014. Negativity towards negative results: A discussion of the disconnect between scientific worth and scientific culture. Disease Models & Mechanisms 7(2). 171–173. https://doi.org/10.1242/dmm.015123.
Mellor, David T. & Brian A. Nosek. 2018. Easy preregistration will benefit any research. Nature Human Behaviour 2(28). 98. https://doi.org/10.1038/s41562-018-0294-7.
Mufwene, Salikoko S. 2020. Decolonial linguistics as paradigm shift. In Ana Deumert, Anne Storch & Nick Shephard (eds.), Colonial and decolonial linguistics: Knowledges and epistemes, 289–300. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780198793205.003.0018.
Murphy, Kevin R. & Herman Aguinis. 2019. HARKing: How badly can cherry-picking and question trolling produce bias in published results? Journal of Business and Psychology 34. 1–17. https://doi.org/10.1007/s10869-017-9524-7.
Nosek, Brian A., George Alter, George C. Banks, Denny Borsboom, Sara D. Bowman, Steven Breckler, Stuart Buck, Christopher Chambers, Gilbert Chin, Garret Christensen, Monica Contestabile, Allan Dafoe, Eric Eich, Jeremy Freese, Rachel Glennerster, Daniel Goroff, Donald P. Green, Brad Hesse, Macartan Humphreys, John Ishiyama, Dean Karlan, Alan Kraut, Arthur Lupia, Patricia Mabry, Temina Madon, Neil Malhotra, Evan Mayo-Wilson, Marcia McNutt, Edward Miguel, Elizabeth Levy Paluck, Uri Simonsohn, Courtney Soderberg, Barbara A. Spellman, Joanne Tornow, James Turitto, Gary VandenBos, Simine Vazire, E. J. Wagenmakers, Rick Wilson & Tal Yarkoni. 2015. Promoting an open research culture. Science 348(6242). 1422–1425. https://doi.org/10.1126/science.aab2374.
Nosek, Brian A. & Daniël Lakens. 2014. Registered reports: A method to increase the credibility of published results. Social Psychology 45(3). 137–141. https://doi.org/10.1027/1864-9335/a000192.
Oliver, John. 2016. Scientific studies: Last week tonight with John Oliver. https://youtu.be/0Rnq1NpHdmw?si=6tIMWkEbOY47rhaE.
Open Knowledge. 2023. The open definition. https://opendefinition.org.
Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science 349(6251). aac4716. https://doi.org/10.1126/science.aac4716.
Parsons, Sam, Flávio Azevedo, Mahmoud M. Elsherif, Samuel Guay, Owen N. Shahim, Gisela H. Govaart, Emma Norris, Aoife O’Mahony, Adam J. Parker, Ana Todorovic, Charlotte R. Pennington, Elias Garcia-Pelegrin, Aleksandra Lazić, Olly Robertson, Sara L. Middleton, Beatrice Valentini, Joanne McCuaig, Bradley J. Baker, Elizabeth Collins, Adrien A. Fillon, Tina B. Lonsdorf, Michele C. Lim, Norbert Vanek, Marton Kovacs, Timo B. Roettger, Sonia Rishi, Jacob F. Miranda, Matt Jaquiery, Suzanne L. K. Stewart, Valeria Agostini, Andrew J. Stewart, Kamil Izydorczak, Sarah Ashcroft-Jones, Helena Hartmann, Madeleine Ingham, Yuki Yamada, Martin R. Vasilev, Filip Dechterenko, Nihan Albayrak-Aydemir, Yu-Fang Yang, Annalise A. LaPlume, Julia K. Wolska, Emma L. Henderson, Mirela Zaneva, Benjamin G. Farrar, Ross Mounce, Tamara Kalandadze, Wanyin Li, Qinyu Xiao, Robert M. Ross, Siu Kit Yeung, Meng Liu, Micah L. Vandegrift, Zoltan Kekecs, Marta K. Topor, Myriam A. Baum, Emily A. Williams, Asma A. Assaneea, Amélie Bret, Aidan G. Cashin, Nick Ballou, Tsvetomira Dumbalska, Bettina M. J. Kern, Claire R. Melia, Beatrix Arendt, Gerald H. Vineyard, Jade S. Pickering, Thomas R. Evans, Catherine Laverty, Eliza A. Woodward, David Moreau, Dominique G. Roche, Eike M. Rinke, Graham Reid, Eduardo Garcia-Garzon, Steven Verheyen, Halil E. Kocalar, Ashley R. Blake, Jamie P. Cockcroft, Leticia Micheli, Brice Beffara Bret, Zoe M. Flack, Barnabas Szaszi, Markus Weinmann, Oscar Lecuona, Birgit Schmidt, William X. Ngiam, Ana Barbosa Mendes, Francis Shannon, Brett J. Gall, Mariella Paul, Connor T. Keating, Magdalena Grose-Hodge, James E. Bartlett, Bethan J. Iley, Lisa Spitzer, Madeleine Pownall, Christopher J. Graham, Tobias Wingen, Jenny Terry, Catia Margarida F. Oliveira, Ryan A. Millager, Kerry J. Fox, Alaa AlDoh, Alexander Hart, Olmo R. van den Akker, Gilad Feldman, Dominik A. Kiersz, Christina Pomareda, Kai Krautter, Ali H. Al-Hoorie & Balazs Aczel. 2022. 
A community-sourced glossary of open scholarship terms. Nature Human Behaviour 6(3). 312–318. https://doi.org/10.1038/s41562-021-01269-4.
Puebla, Iratxe, Jessica Polka & Oya Y. Rieger. 2021. Preprints: Their evolving role in science communication. https://doi.org/10.31222/osf.io/ezfsk.
Quintana, Daniel S. 2020. A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation. eLife 9. 1–12. https://doi.org/10.7554/eLife.53275.
Roberts, Steven O., Carmelle Bareket-Shavit, Forrest A. Dollins, Peter D. Goldie & Elizabeth Mortenson. 2020. Racial inequality in psychological research: Trends of the past and recommendations for the future. Perspectives on Psychological Science 15(6). 1295–1309. https://doi.org/10.1177/1745691620927.
Roettger, Timo B. 2021a. Preregistration in experimental linguistics: Applications, challenges, and limitations. Linguistics 59(5). 1227–1249. https://doi.org/10.1515/ling-2019-0048.
Roettger, Timo B. 2021b. Toward transparent and reproducible speech sciences. In Séminaires de recherches en phonétique et phonologie. Paris: CNRS.
Rowe, Wendy E. 2014. Positionality. In David Coghlan & Mary Brydon-Miller (eds.), The Sage encyclopedia of action research, 627–628. Los Angeles: Sage.
Savolainen, Jukka, Patrick J. Casey, Justin P. McBrayer & Patricia Nayna Schwerdtle. 2023. Positionality and its problems: Questioning the value of reflexivity statements in research. Perspectives on Psychological Science 18. 1331–1338. https://doi.org/10.1177/17456916221144988.
Simmons, Joseph P., Leif D. Nelson & Uri Simonsohn. 2011. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22(11). 1359–1366. https://doi.org/10.1177/0956797611417632.
Singh, Leher, Melanie Killen & Judith G. Smetana. 2023. Global science requires greater equity, diversity, and cultural precision. APS Observer 36.
Spellman, B., E. A. Gilbert & K. S. Corker. 2017. Open science: What, why, and how. https://doi.org/10.31234/osf.io/ak6jr.
Steltenpohl, Crystal, Sa-kiera Hudson & Kat Klement. 2022. How to begin writing a positionality statement. https://vimeo.com/675236573/741e24aab7.
Stieglitz, Stefan, Konstantin Wilms, Milad Mirbabaie, Lennart Hofeditz, Bela Brenger, Ania López & Stephanie Rehwald. 2020. When are researchers willing to share their data? Impacts of values and uncertainty on open data in academia. PLoS One 15(7). 1–20. https://doi.org/10.1371/journal.pone.0234172.
Tsikewa, Adrienne. 2021. Reimagining the current praxis of field linguistics training: Decolonial considerations. Language 97(4). e293–e319. https://doi.org/10.1353/lan.2021.0072.
Ushey, Kevin & Hadley Wickham. 2023. Renv: Project environments. https://CRAN.R-project.org/package=renv.
Weissler, Rachel Elizabeth, Shiloh Drake, Ksenia Kampf, Carissa Anna Diantoro, Kurtis Foster, Audrey Kirkpatrick, Isabel Preligera, Orion Wesson, Anna Wood & Melissa Michaud Baese-Berk. 2023. Speech perception and production lab: Positionality statements. https://www.speechperceptionproductionlab.com/positionalitystatments.
Wicherts, Jelte M., Denny Borsboom, Judith Kats & Dylan Molenaar. 2006. The poor availability of psychological research data for reanalysis. American Psychologist 61(7). 726. https://doi.org/10.1037/0003-066X.61.7.726.
Wicherts, Jelte M., Coosje L. S. Veldkamp, Hilde E. M. Augusteijn, Marjan Bakker, Robbie C. M. van Aert & Marcel A. L. M. van Assen. 2016. Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology 7. 1–12. https://doi.org/10.3389/fpsyg.2016.01832.
Xie, Yihui. 2015. Dynamic documents with R and knitr, 2nd edn. Boca Raton, FL & New York: Chapman and Hall/CRC. https://doi.org/10.1201/b15166. https://yihui.org/knitr/.
Xie, Yihui. 2023. Knitr: A general-purpose package for dynamic report generation in R. https://yihui.org/knitr/.
© 2025 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.
Articles in the same Issue
- Frontmatter
- Replies
- On low topics in Najdi Arabic: a rejoinder to Alshamari and Jarrah (2022)
- Low topics are not IP-external: a reply to Alzayid (2025)
- Research Articles
- Internal location in Mandarin Chinese
- Why modelling space is hard: no evidence for a serial founder effect in Polynesian phoneme inventories
- Opening open science to all: demystifying reproducibility and transparency practices in linguistic research
- Mora augmentation in Lowland East Cushitic: implications for typology and studies of metrification
- Grammar in time: pragmatic contingency and non-restrictive which