A corpus-based quantitative analysis of German legal language: linguistic features and implications for accessibility

  • Yuan Li and Qi Liu
Published/Copyright: October 20, 2025

Abstract

Legal language has been studied extensively because of its far-reaching impact on society. Its complexity has received particular attention, given the implications for legal practice and societal access to justice. Relevant studies seek to identify the factors that make legal language difficult and to explore potential methods of simplification. However, despite extensive qualitative research, a comprehensive quantitative analysis of the linguistic features that distinguish legal language from everyday language is still lacking. The present study addresses this gap by quantitatively examining the key features of German legal language, using journalistic language as a benchmark for everyday discourse, and thereby offers a corpus-based perspective on the lexical, syntactic, and typological features of legal texts. Using data from the HDT-UD and DGT-UD treebanks, the study analyzes 18 quantitative indicators across three linguistic domains: lexicon, syntax, and word-order typology. The results indicate that legal language is characterized by a limited vocabulary, a higher frequency of long words and dependent clauses, increased syntactic and structural complexity, and a predominance of SV and OV word order patterns. By comparing legal and journalistic registers in detail, this study advances the understanding of legal language through objective, empirical analysis. The findings have practical implications for legal communication, suggesting that greater attention to lexical simplification and syntactic clarity could improve accessibility and comprehension.
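As a rough illustration of what treebank-based indicators of this kind can look like, the Python sketch below computes four simple measures from text in CoNLL-U format, the format used by Universal Dependencies treebanks. The abstract does not list the 18 indicators, so the measures chosen here (type-token ratio, share of long words, share of SV among subject relations, share of OV among object relations) are assumptions, as are the function names, the seven-character threshold for a "long" word, and the two invented sample sentences. It is not the authors' procedure.

```python
from collections import Counter

# Two invented sentences in CoNLL-U layout (10 columns), written with spaces
# for readability and converted to tabs below. They are NOT from HDT-UD or DGT-UD.
_ROWS = """
1 Der der DET _ _ 2 det _ _
2 Antragsteller Antragsteller NOUN _ _ 6 nsubj _ _
3 hat haben AUX _ _ 6 aux _ _
4 den der DET _ _ 5 det _ _
5 Antrag Antrag NOUN _ _ 6 obj _ _
6 gestellt stellen VERB _ _ 0 root _ _
7 . . PUNCT _ _ 6 punct _ _

1 Sie sie PRON _ _ 2 nsubj _ _
2 stellt stellen VERB _ _ 0 root _ _
3 den der DET _ _ 4 det _ _
4 Antrag Antrag NOUN _ _ 2 obj _ _
5 . . PUNCT _ _ 2 punct _ _
"""
SAMPLE = "\n".join("\t".join(line.split()) for line in _ROWS.strip().splitlines())


def parse_conllu(text):
    """Yield each sentence as a list of (id, form, upos, head, deprel) tuples."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges ("1-2") and empty nodes ("1.1")
        sentence.append((int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def _ratio(part, whole):
    return part / whole if whole else 0.0


def indicators(text, long_word_len=7):
    """Compute four illustrative indicators (hypothetical selection)."""
    forms = []
    order = Counter()
    for sent in parse_conllu(text):
        for tok_id, form, upos, head, deprel in sent:
            if upos != "PUNCT":
                forms.append(form.lower())
            base = deprel.split(":")[0]  # drop subtypes such as "nsubj:pass"
            # The head of nsubj/obj is treated as the predicate; a dependent
            # that precedes its head counts as SV or OV, otherwise VS or VO.
            if base == "nsubj":
                order["SV" if tok_id < head else "VS"] += 1
            elif base == "obj":
                order["OV" if tok_id < head else "VO"] += 1
    long_words = sum(1 for f in forms if len(f) >= long_word_len)
    return {
        "ttr": _ratio(len(set(forms)), len(forms)),
        "long_word_share": _ratio(long_words, len(forms)),
        "sv_share": _ratio(order["SV"], order["SV"] + order["VS"]),
        "ov_share": _ratio(order["OV"], order["OV"] + order["VO"]),
    }


if __name__ == "__main__":
    for name, value in indicators(SAMPLE).items():
        print(f"{name}: {value:.2f}")
```

Running the same function over a legal sample and a journalistic sample would give one pair of values per indicator to compare. Note that raw type-token ratio depends on text length, so samples of equal size would be needed for a fair comparison.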


Corresponding author: Yuan Li, School of International Studies, Zhejiang University, Hangzhou, China

Received: 2025-01-02
Accepted: 2025-08-15
Published Online: 2025-10-20

© 2025 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 23.10.2025 from https://www.degruyterbrill.com/document/doi/10.1515/ijld-2025-2019/html