Abstract
This paper provides an overview of the discrete cosine transform (DCT) as a method for smoothing vowel formant tracks, as well as a procedure to take any speaker normalization method that has been defined for formant point measurements and define an equivalent method to be applied directly to DCT coefficients. This procedure is followed for three established normalization methods, and the difference between DCT normalization and formant point normalization is found to be marginal.
Appendix A: The scipy implementation of the DCT
The scipy documentation for the DCT describes three ways the DCT can be “normalized” (norm = “backward”, “ortho”, or “forward”) and two settings for whether it is “orthogonalized” (orthogonalize = True or False). All of these options alter the terms to the left of the sum in the DCT formula. Let’s define and simplify these components.
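I will use S to indicate the sum function, which is defined as
$$S_k = \sum_{n=0}^{N-1} x_n \cos\left(\frac{\pi k (2n+1)}{2N}\right)$$
where x_n is the nth of the N samples in the formant track and k is the index of the DCT coefficient.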
This term is unaltered by any of the options scipy offers. Any variant of the DCT can therefore be written as
$$y_k = o_k \, c \, S_k$$
where o is the orthogonalization term and c is the normalization constant.
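The orthogonalization term is the easiest to define:
$$o_k = \begin{cases} \frac{1}{\sqrt{2}} & \text{if } k = 0 \text{ and orthogonalize = True} \\ 1 & \text{otherwise} \end{cases}$$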
The scipy documentation provides the mathematical definition of the “backward” normalization constant only, but the “forward” normalization constant can be inferred from its output:
$$c_{\text{backward}} = 2, \qquad c_{\text{forward}} = \frac{1}{N}$$
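To make the decomposition concrete, the following minimal sketch (not the paper's code) rebuilds each scipy variant from the sum term S, the orthogonalization term o, and the normalization constant c, and compares the result with scipy.fft.dct. It assumes a SciPy version in which scipy.fft.dct accepts the orthogonalize argument. The helper names dct_sum and dct_by_formula and the track values are hypothetical. The forward constant 1/N is the inferred value, and the “ortho” constant √(2/N) is derived here from scipy's documented ortho scaling factor rather than taken from this paper.

```python
import numpy as np
from scipy.fft import dct


def dct_sum(x, k):
    """Sum term S_k of the type-II DCT for a 1-D track x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    return np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))


def dct_by_formula(x, norm="backward", orthogonalize=False):
    """Rebuild a type-II DCT as y_k = o_k * c * S_k (hypothetical helper)."""
    N = len(x)
    # Normalization constant c. "backward" and "forward" follow the text;
    # "ortho" is scipy's documented k > 0 factor sqrt(1/(2N)) times the 2 of "backward".
    c = {"backward": 2.0, "ortho": np.sqrt(2.0 / N), "forward": 1.0 / N}[norm]
    y = np.array([c * dct_sum(x, k) for k in range(N)])
    if orthogonalize:
        y[0] /= np.sqrt(2.0)  # o_0 = 1/sqrt(2); o_k = 1 for every k > 0
    return y


# Hypothetical F1 track in Hz (illustrative values, not from the paper).
track = np.array([520.0, 560.0, 610.0, 640.0, 650.0, 630.0, 590.0])

for norm in ("backward", "ortho", "forward"):
    for orth in (False, True):
        ours = dct_by_formula(track, norm=norm, orthogonalize=orth)
        ref = dct(track, norm=norm, orthogonalize=orth)
        print(f"norm={norm!r}, orthogonalize={orth}: match={np.allclose(ours, ref)}")
```

If the decomposition holds, every combination reports a match; a mismatch would indicate that the corresponding constant is wrong.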
As a demonstration, we can define a Python function for the sum term alone (Listing 2) and then apply it to the formant track in Figure 16:

Figure 16: Demonstration formant track.

Listing 2: Definition of the DCT sum function.
We can compute the sum term for the 0th and 1st DCT coefficients and then examine how each normalization changes it, as in Listing 3.

Listing 3: Sum terms of the DCT.
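A minimal NumPy sketch of a sum function in the spirit of Listings 2 and 3 follows. It is not the paper's code: the function name dct_sum and the seven-point track are hypothetical, and the track is not the one in Figure 16.

```python
import numpy as np


def dct_sum(x, k):
    """Sum term S_k of the type-II DCT: sum over n of x_n * cos(pi * k * (2n + 1) / (2N))."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = np.arange(N)
    return np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))


# Hypothetical F1 track in Hz (illustrative values, not the Figure 16 track).
track = np.array([520.0, 560.0, 610.0, 640.0, 650.0, 630.0, 590.0])

s_0 = dct_sum(track, 0)  # cos(0) = 1, so this is simply the sum of the track
s_1 = dct_sum(track, 1)  # weights early samples positively, late samples negatively
print(s_0, s_1)
```

Because cos(0) = 1, s_0 is just the sum of the track values; s_1 contrasts the first half of the track with the second half.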
At this point, we can also get the 0th and 1st DCT coefficients from the scipy implementation (Listing 4):

Listing 4: Application of the scipy DCT.
The normalizing constant for norm = “backward” is documented to be 2, so multiplying s_0 and s_1 by 2 should yield the 0th and 1st coefficients in dct_backward (Listing 5):

Listing 5: Backward DCT normalization.
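The backward check can be sketched as follows, again with a hypothetical track rather than the Figure 16 data; the sum terms are computed inline so that the block runs on its own. This is an illustration, not the code of Listings 4 and 5.

```python
import numpy as np
from scipy.fft import dct

# Hypothetical F1 track in Hz (illustrative values, not the Figure 16 track).
track = np.array([520.0, 560.0, 610.0, 640.0, 650.0, 630.0, 590.0])
N = len(track)
n = np.arange(N)

# Sum terms S_0 and S_1.
s_0, s_1 = (np.sum(track * np.cos(np.pi * k * (2 * n + 1) / (2 * N))) for k in (0, 1))

# "backward" is scipy's default normalization; orthogonalize defaults to False here.
dct_backward = dct(track, norm="backward")

# With c = 2 and o_k = 1, scaling the sum terms by 2 should reproduce the coefficients.
print(np.allclose([2 * s_0, 2 * s_1], dct_backward[:2]))
```

If the documented constant of 2 applies, this prints True.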
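If the normalizing constant for norm = “forward” is 1/N, then multiplying s_0 and s_1 by 1/N should likewise yield the 0th and 1st forward-normalized coefficients (Listing 6):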

Listing 6: Forward DCT normalization.
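A matching sketch for the forward check follows. It assumes the inferred constant c = 1/N and scipy's default of orthogonalize = False for norm = “forward”; the track values are hypothetical.

```python
import numpy as np
from scipy.fft import dct

# Hypothetical F1 track in Hz (illustrative values, not the Figure 16 track).
track = np.array([520.0, 560.0, 610.0, 640.0, 650.0, 630.0, 590.0])
N = len(track)
n = np.arange(N)

# Sum terms S_0 and S_1.
s_0, s_1 = (np.sum(track * np.cos(np.pi * k * (2 * n + 1) / (2 * N))) for k in (0, 1))

dct_forward = dct(track, norm="forward")

# Candidate forward constant c = 1/N.
print(np.allclose([s_0 / N, s_1 / N], dct_forward[:2]))
# Since S_0 is the sum of the track, the 0th forward coefficient is then its mean.
print(np.isclose(dct_forward[0], track.mean()))
```

If c = 1/N is right, both comparisons print True. One consequence is that, without orthogonalization, the 0th forward-normalized coefficient is simply the mean of the track.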
Admittedly, it would be preferable to cite the forward normalization constant directly from the scipy documentation, but the documentation does not provide it.
Appendix B: The DCT basis
While the formula in Equation (1) can be used to calculate the DCT coefficients, the DCT basis functions in Figure 2 are calculated differently. If B is a matrix of the basis functions, the kth basis function is its kth column. To get B, we apply the DCT with backward normalization to an identity matrix I (that is, a matrix with 1s along the diagonal and 0s elsewhere). The orthogonalization term o is included in Equation (45).
$$B = \mathrm{DCT}_{\text{backward}}(I), \qquad B_{n,k} = 2\, o_k \cos\left(\frac{\pi k (2n+1)}{2N}\right) \tag{45}$$
This can be quickly implemented using the scipy DCT implementation like so (Listing 7):

Listing 7: Getting the DCT basis functions.
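One way to write this, as a sketch rather than the code of Listing 7, is to apply scipy.fft.dct to np.eye(N) with norm = “backward” and orthogonalize = True and compare the result with the closed form 2 o_k cos(πk(2n+1)/(2N)). N = 20 is an arbitrary, hypothetical choice. Because scipy.fft.dct transforms along the last axis by default, row n of the result corresponds to time point n and column k to the kth basis function.

```python
import numpy as np
from scipy.fft import dct

N = 20  # number of time points (hypothetical)
I = np.eye(N)

# Backward-normalized, orthogonalized DCT of the identity matrix.
# Row n is time point n; column k is the kth basis function.
B = dct(I, norm="backward", orthogonalize=True)

# Closed form for comparison: B[n, k] = 2 * o_k * cos(pi * k * (2n + 1) / (2N)).
n = np.arange(N)[:, None]
k = np.arange(N)[None, :]
o = np.where(k == 0, 1 / np.sqrt(2), 1.0)
B_formula = 2 * o * np.cos(np.pi * k * (2 * n + 1) / (2 * N))

print(np.allclose(B, B_formula))
```

With orthogonalization, the 0th basis function is constant at √2 rather than 2.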
Appendix C: The choice of orthogonalization
The choice of “orthogonalizing” the DCT coefficients, that is, dividing the 0th coefficient by √2, has a practical motivation. Formant tracking sometimes returns missing (NA) values for some, but not all, time points along a formant track. With missing values, the DCT cannot be applied directly. However, the DCT coefficients can be approximated by linear regression, using the DCT basis as the “predictors” (Listing 8):

Listing 8: Comparison of direct versus regression-based DCT.
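A sketch of the regression comparison follows. It assumes, as hypotheses not stated in this appendix, that the direct coefficients use norm = “forward” with orthogonalize = True, that the basis is built as in Appendix B, and that five coefficients are kept; the track values and the positions of the missing points are made up. np.linalg.lstsq fits the track on the first five basis columns.

```python
import numpy as np
from scipy.fft import dct

# Hypothetical F1 track in Hz (illustrative values only).
track = np.array([520.0, 560.0, 610.0, 640.0, 650.0, 630.0, 590.0, 560.0, 540.0, 530.0])
N = len(track)
n_coefs = 5  # hypothetical number of coefficients to keep

# DCT basis as in Appendix B: backward DCT of the identity, orthogonalized.
B = dct(np.eye(N), norm="backward", orthogonalize=True)

# Direct DCT (assumed settings: norm="forward", orthogonalize=True).
direct = dct(track, norm="forward", orthogonalize=True)[:n_coefs]

# Regression: least-squares fit of the track on the first n_coefs basis columns.
regression, *_ = np.linalg.lstsq(B[:, :n_coefs], track, rcond=None)
print(np.allclose(direct, regression))

# With missing values, drop the NaN time points and fit on the remaining rows.
track_na = track.copy()
track_na[[2, 7]] = np.nan
ok = ~np.isnan(track_na)
approx, *_ = np.linalg.lstsq(B[ok, :n_coefs], track_na[ok], rcond=None)
print(approx)
```

Because the columns of the DCT basis are mutually orthogonal, a least-squares fit on a subset of them recovers the same coefficients as the direct DCT when no values are missing. With missing points, the regression coefficients are only an approximation of what the complete track would give.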
Orthogonalizing the 0th coefficient was the only option that resulted in the same coefficients for both the regression-based and the direct DCT within the scipy implementation. Without orthogonalization, the 0th coefficient differs between the regression-based DCT and the direct DCT (Listing 9):

Listing 9: Comparison of direct versus regression-based non-orthogonalized DCT.
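The non-orthogonalized case can be sketched the same way, under the same hypothetical settings (norm = “forward” for the direct DCT, a backward-normalized basis, five coefficients, made-up track values) but with orthogonalize = False for both the basis and the direct DCT.

```python
import numpy as np
from scipy.fft import dct

# Hypothetical F1 track in Hz (illustrative values only).
track = np.array([520.0, 560.0, 610.0, 640.0, 650.0, 630.0, 590.0, 560.0, 540.0, 530.0])
N = len(track)
n_coefs = 5  # hypothetical number of coefficients to keep

# Basis and direct DCT, both without orthogonalization (assumed settings).
B = dct(np.eye(N), norm="backward", orthogonalize=False)
direct = dct(track, norm="forward", orthogonalize=False)[:n_coefs]
regression, *_ = np.linalg.lstsq(B[:, :n_coefs], track, rcond=None)

print(np.isclose(direct[0], regression[0]))     # 0th coefficients differ
print(np.allclose(direct[1:], regression[1:]))  # higher coefficients agree
print(direct[0] / regression[0])                # ratio between the 0th coefficients
```

Under these assumptions the two 0th coefficients differ by a factor of 2: the k = 0 basis column is constant at 2, so the regression returns half the track mean, whereas the direct forward coefficient is the track mean itself. The higher coefficients agree.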
Because fasttrackpy, the tool used to compute the DCT coefficients in this paper, orthogonalizes the DCT coefficients, the orthogonalized DCT was also used here.
References
Adank, Patti, Roel Smits & Roeland van Hout. 2004. A comparison of vowel normalization procedures for language variation research. The Journal of the Acoustical Society of America 116(5). 3099–3107. https://doi.org/10.1121/1.1795335.
Barreda, Santiago. 2021a. Fast track: Fast (nearly) automatic formant-tracking using Praat. Linguistics Vanguard 7(1). 20200051. https://doi.org/10.1515/lingvan-2020-0051.
Barreda, Santiago. 2021b. Perceptual validation of vowel normalization methods for variationist research. Language Variation and Change 33(1). 27–53. https://doi.org/10.1017/S0954394521000016.
Cox, Felicity & Sallyanne Palethorpe. 2019. Vowel variation in a standard context across four major Australian cities. In Sasha Calhoun, Paola Escudero, Marija Tabain & Paul Warren (eds.), Proceedings of the 19th international congress of phonetic sciences, 577–581. Melbourne: Australasian Speech Science and Technology Association Inc & International Phonetic Association. Available at: https://assta.org/proceedings/ICPhS2019/papers/ICPhS_626.pdf.
Docherty, Gerard, Simón Gonzalez & Nathaniel Mitchell. 2015. Static vs dynamic perspectives on the realization of vowel nucleii in West Australian English. In The Scottish Consortium for ICPhS 2015 (ed.), Proceedings of the 18th international congress of phonetic sciences. Glasgow: University of Glasgow. Available at: https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0956.pdf.
Fox, Robert Allen & Ewa Jacewicz. 2009. Cross-dialectal variation in formant dynamics of American English vowels. The Journal of the Acoustical Society of America 126(5). 2603–2618. https://doi.org/10.1121/1.3212921.
Fruehwald, Josef. 2025. new-fave, version 1.1.1 [python package]. Available at: https://pypi.org/project/new-fave/.
Fruehwald, Josef & Santiago Barreda. 2024. fasttrackpy, version 0.5.3 [python package]. Available at: https://pypi.org/project/fasttrackpy/.
Gubian, Michele, Francisco Torreira & Lou Boves. 2015. Using functional data analysis for investigating multidimensional dynamic phonetic contrasts. Journal of Phonetics 49. 16–40. https://doi.org/10.1016/j.wocn.2014.10.001.
Guzik, Karita M. & Jonathan Harrington. 2007. The quantification of place of articulation assimilation in electropalatographic data using the similarity index (SI). Advances in Speech Language Pathology 9(1). 109–119. https://doi.org/10.1080/07268600601094294.
Hillenbrand, James M., Michael J. Clark & Terrance M. Nearey. 2001. Effects of consonant environment on vowel formant patterns. The Journal of the Acoustical Society of America 109(2). 748–763. https://doi.org/10.1121/1.1337959.
Jannedy, Stefanie & Melanie Weirich. 2017. Spectral moments vs discrete cosine transformation coefficients: Evaluation of acoustic measures distinguishing two merging German fricatives. The Journal of the Acoustical Society of America 142(1). 395–405. https://doi.org/10.1121/1.4991347.
Jochim, Markus, Raphael Winkelmann, Klaus Jaensch, Steve Cassidy & Jonathan Harrington. 2024. emuR: Main package of the EMU speech database management system, version 2.5.0 [R package]. Available at: https://cran.r-project.org/web/packages/emuR/.
Johnson, Keith. 2020. The ΔF method of vocal tract length normalization for vowels. Laboratory Phonology: Journal of the Association for Laboratory Phonology 11(1). 10. https://doi.org/10.5334/labphon.196.
Labov, William. 2001. Principles of linguistic change, vol. 2, Social factors (Language in Society). Oxford: Blackwell.
Labov, William & Ingrid Rosenfelder. 2011. The Philadelphia neighborhood corpus. Philadelphia: University of Pennsylvania. Available at: http://fave.ling.upenn.edu/pnc.html.
Labov, William, Sherry Ash & Charles Boberg. 2006. The atlas of North American English: Phonetics, phonology and sound change. New York: Mouton de Gruyter. https://doi.org/10.1515/9783110167467.
Lobanov, Boris. 1971. Classification of Russian vowels spoken by different listeners. Journal of the Acoustical Society of America 49. 606–608. https://doi.org/10.1121/1.1912396.
Mersmann, Olaf. 2024. Fftw: Fast FFT and DCT based on the FFTW library, version 1.0-9 [R package]. Available at: https://CRAN.R-project.org/package=fftw.
Morrison, Geoffrey Stewart. 2009. Likelihood-ratio forensic voice comparison using parametric representations of the formant trajectories of diphthongs. The Journal of the Acoustical Society of America 125(4). 2387–2397. https://doi.org/10.1121/1.3081384.
Nearey, Terrance M. 1978. Phonetic feature systems for vowels. Edmonton: University of Alberta PhD thesis. Available at: https://sites.ualberta.ca/~tnearey/Nearey1978_compressed.pdf.
Nearey, Terrance M. & Peter F. Assmann. 1986. Modeling the role of inherent spectral change in vowel identification. The Journal of the Acoustical Society of America 80(5). 1297–1308. https://doi.org/10.1121/1.394433.
Ramsay, James & Bernard W. Silverman. 2006. Functional data analysis. New York: Springer. https://doi.org/10.1007/b98888.
Risdal, Megan L. & Mary E. Kohn. 2014. Ethnolectal and generational differences in vowel trajectories: Evidence from African American English and the Southern Vowel System. Penn Working Papers in Linguistics 20(2). 139–148. https://doi.org/20.500.14332/45004.
Rosenfelder, Ingrid, Josef Fruehwald, Keelan Evanini, Scott Seyfarth, Christian Brickhouse, Kyle Gorman, Hillary Prichard & Jiahong Yuan. 2024. FAVE: Forced alignment and vowel extraction, version 2.0.3 [python package]. Available at: https://pypi.org/project/fave/.
Sóskuthy, Márton. 2017. Generalised additive mixed models for dynamic analysis in linguistics: A practical introduction. arXiv. https://doi.org/10.48550/arXiv.1703.05339.
Sóskuthy, Márton. 2021. Evaluating generalised additive mixed modelling strategies for dynamic speech analysis. Journal of Phonetics 84. 101017. https://doi.org/10.1016/j.wocn.2020.101017.
Tanner, James, Morgan Sonderegger & Jane Stuart-Smith. 2022. Multidimensional acoustic variation in vowels across English dialects. In Garrett Nicolai & Eleanor Chodroff (eds.), Proceedings of the 19th SIGMORPHON workshop on computational research in phonetics, phonology, and morphology, 72–82. Seattle, WA: Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.sigmorphon-1.8.
Virtanen, Pauli, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C. J. Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa & Paul van Mulbregt. 2020. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods 17(3). 261–272. https://doi.org/10.1038/s41592-019-0686-2.
Wallace, Gregory K. 1992. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38(1). xviii–xxxiv. https://doi.org/10.1109/30.125072.
Watson, Catherine I. & Jonathan Harrington. 1999. Acoustic evidence for dynamic formant trajectories in Australian English vowels. The Journal of the Acoustical Society of America 106(1). 458–468. https://doi.org/10.1121/1.427069.
Williams, Daniel & Paola Escudero. 2014. A cross-dialectal acoustic comparison of vowels in Northern and Southern British English. The Journal of the Acoustical Society of America 136(5). 2751–2761. https://doi.org/10.1121/1.4896471.
Williams, Daniel, Jan-Willem van Leussen & Paola Escudero. 2015. Beyond North American English: Modelling vowel inherent spectral change in British English and Dutch. In The Scottish Consortium for ICPhS 2015 (ed.), Proceedings of the 18th international congress of phonetic sciences. Glasgow: University of Glasgow. Available at: https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0596.pdf.
Zahorian, Stephen A. & Amir J. Jagharghi. 1991. Speaker normalization of static and dynamic vowel spectral features. The Journal of the Acoustical Society of America 90(1). 67–75. https://doi.org/10.1121/1.402350.
Zahorian, Stephen A. & Amir Jalali Jagharghi. 1993. Spectral-shape features versus formants as acoustic correlates for vowels. The Journal of the Acoustical Society of America 94(4). 1966–1982. https://doi.org/10.1121/1.407520.
© 2025 Walter de Gruyter GmbH, Berlin/Boston