
Yiyang Xiang vowel quality: Comparability across two recording media

  • Robert Marcelo Sevilla
Published/Copyright: April 10, 2023

Abstract

The COVID-19 pandemic has forced researchers to sacrifice some degree of recording quality in order to reach remote or otherwise inaccessible areas, for instance by recording through video conferencing software such as Zoom rather than through traditional in-person microphone or sound booth setups. This raises the question: can Zoom-recorded data be used more or less interchangeably with data collected under standard recording procedures? The present research analyzes vowel acoustics in the Yiyang dialect of Xiang (Sinitic), fieldwork on which has become increasingly difficult during the pandemic, comparing across two recording media: one online (Zoom) and the other in person (sound booth). The study examines two recordings of a retelling of the Pear Stories video (Chafe 1980) by a single speaker of Yiyang Xiang (female, 24, college-educated), one made in the sound booth at the University of Hong Kong and the other over Zoom using a laptop microphone. The acoustic features analyzed are F1, F2, and F3. Preliminary findings suggest that while F1 is fairly comparable across the two recordings, the two higher formants are altered in ways that call into question the comparability of Zoom-recorded and sound booth-recorded vowels. Results improve considerably, however, when formants are measured manually, suggesting that some degree of recoverability is possible.
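
To make the comparison concrete, the sketch below shows how per-token formant measurements might be summarized across the two recording conditions in R (R Core Team 2021). This is a minimal illustration only: the file name and column names are hypothetical, and the article's own analysis scripts are not reproduced here. Measurements of this kind can be exported from Praat, for instance with an automatic extraction script in the style of Stanley (2019).

# Minimal sketch, assuming a CSV of per-token midpoint formant measurements
# with hypothetical columns: vowel, condition ("yy_booth"/"yy_zoom"),
# and F1, F2, F3 in Hz.
vowels <- read.csv("yiyang_formants.csv")

# Mean F1-F3 per vowel and recording condition.
means <- aggregate(cbind(F1, F2, F3) ~ vowel + condition,
                   data = vowels, FUN = mean)

# Booth-to-Zoom differences per vowel, as a quick comparability check.
booth <- subset(means, condition == "yy_booth")
zoom  <- subset(means, condition == "yy_zoom")
diffs <- merge(booth, zoom, by = "vowel", suffixes = c(".booth", ".zoom"))
diffs$dF1 <- diffs$F1.zoom - diffs$F1.booth
diffs$dF2 <- diffs$F2.zoom - diffs$F2.booth
diffs$dF3 <- diffs$F3.zoom - diffs$F3.booth
print(diffs[, c("vowel", "dF1", "dF2", "dF3")])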


Corresponding author: Robert Marcelo Sevilla, Department of Linguistics, The University of Hong Kong, Pokfulam, Hong Kong. E-mail:

Acknowledgments

I would like to thank Viktorija Kostadinova and Matt Gardner at the Getting Data working group (https://gettingdata.humanities.uva.nl/) for their feedback and support in exploring this topic; their drive and motivation in establishing this group in response to the pandemic is truly inspiring. I would also like to thank Dr. Jonathan Havenhill for his comments on methodological approaches; all errors are my own, of course.

Appendix

Note that in these tables: *p < 0.05, **p < 0.01, ***p < 0.001.
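
Each table reports a linear model with Recording Condition as the sole predictor and the sound booth condition (yy_booth) as the reference level, so that the yy_zoom estimate is the Zoom-minus-booth difference in Hz. As a rough illustration of how models of this shape might be fit and reported in R, a sketch for one vowel is given below. It reuses the hypothetical `vowels` data frame from the sketch above and is not the article's own code; whether a plain lm() or some other model was used is not stated in this excerpt.

# Minimal sketch of per-vowel models (F1/F2/F3 ~ Recording Condition),
# reusing the hypothetical `vowels` data frame from the earlier sketch.
vowels$condition <- relevel(factor(vowels$condition), ref = "yy_booth")

a_tokens <- subset(vowels, vowel == "a")  # e.g., the low vowel /a/
for (f in c("F1", "F2", "F3")) {
  m <- lm(reformulate("condition", response = f), data = a_tokens)
  cat("\n==", f, "==\n")
  print(summary(m)$coefficients)  # Estimate, Std. Error, t value, Pr(>|t|)
  cat("Residual df:", df.residual(m), "\n")  # the df column in the tables
}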

Table 2: Linear model for raw low /a/ (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)    882.42   13.95  94    63.26  <2 × 10^−16***
  yy_zoom                −196.95   25.38  94    −7.76  1.02 × 10^−11***
F2
  Intercept (yy_booth)  1,446.78   27.73  94   52.180  <2 × 10^−16***
  yy_zoom                −321.94   50.45  94   −6.382  6.56 × 10^−9***
F3
  Intercept (yy_booth)  2,924.43   57.33  94    51.01  <2 × 10^−16***
  yy_zoom              −1,074.75  104.31  94   −10.30  <2 × 10^−16***

Table 3: Linear model for raw mid front /e/ (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)    513.36   23.91  24   21.469  <2 × 10^−16***
  yy_zoom                 −25.53   40.64  24   −0.628  0.536
F2
  Intercept (yy_booth)   1,991.1   135.5  24   14.696  1.7 × 10^−13***
  yy_zoom                 −270.6   230.3  24   −1.175  0.251
F3
  Intercept (yy_booth)  2,959.98   90.74  24   32.621  <2 × 10^−16***
  yy_zoom                −591.05  154.23  24   −3.832  0.000804***

Table 4: Linear model for raw high front /i/ (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)   356.243   6.056  66   58.827  <2 × 10^−16***
  yy_zoom                 18.678  11.166  66    1.673  0.0991
F2
  Intercept (yy_booth)  2,572.64   86.96  66    29.58  <2 × 10^−16***
  yy_zoom                −644.57  160.35  66    −4.02  0.000152***
F3
  Intercept (yy_booth)  3,289.55   29.61  66  111.103  <2 × 10^−16***
  yy_zoom                −431.71   54.59  66   −7.907  3.88 × 10^−11***

Table 5: Linear model for raw mid central /ə/ (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)    514.18   20.84  22   24.677  <2 × 10^−16***
  yy_zoom                 −57.40   41.67  22   −1.377  0.182
F2
  Intercept (yy_booth)  1,620.35   57.61  22   28.125  <2 × 10^−16***
  yy_zoom                −796.08  115.22  22   −6.909  6.15 × 10^−7***
F3
  Intercept (yy_booth)  3,123.12   95.03  22   32.865  <2 × 10^−16***
  yy_zoom              −1,443.18  190.05  22   −7.593  1.39 × 10^−7***

Table 6: Linear model for raw high back /u/ (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)    388.83   10.24  38   37.983  <2 × 10^−16***
  yy_zoom                  87.09   24.47  38    3.559  0.00102**
F2
  Intercept (yy_booth)    948.16   27.36  38   34.651  <2 × 10^−16***
  yy_zoom                 −38.10   65.41  38   −0.582  0.564
F3
  Intercept (yy_booth)  2,901.98   78.09  38   37.163  <2 × 10^−16***
  yy_zoom                −458.04  186.67  38   −2.454  0.0188*

Table 7: Linear model for raw mid back /o/ (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)    626.70   18.28  52   34.285  <2 × 10^−16***
  yy_zoom                 −95.09   31.66  52   −3.004  0.0041**
F2
  Intercept (yy_booth)     996.3    21.6  52   46.136  <2 × 10^−16***
  yy_zoom                 −162.0    37.4  52   −4.331  6.79 × 10^−5***
F3
  Intercept (yy_booth)  3,134.29   77.12  52    40.64  <2 × 10^−16***
  yy_zoom              −1,699.08  133.57  52   −12.72  <2 × 10^−16***

Table 8: Linear model for low /a/, manually corrected (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)    887.53   11.29  94    78.64  <2 × 10^−16***
  yy_zoom                 −12.94   20.53  94    −0.63  0.53
F2
  Intercept (yy_booth) 1,467.545  23.848  94   61.538  <2 × 10^−16***
  yy_zoom                  9.752  43.390  94    0.225  0.823
F3
  Intercept (yy_booth)  3,075.40   42.91  94   71.673  <2 × 10^−16***
  yy_zoom                 −50.65   78.07  94   −0.649  0.518

Table 9: Linear model for mid front /e/, manually corrected (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)    496.01   18.53  24   26.772  <2 × 10^−16***
  yy_zoom                  21.23   31.49  24    0.674  0.507
F2
  Intercept (yy_booth) 2,213.061  60.210  24   36.756  <2 × 10^−16***
  yy_zoom                 −4.594 102.337  24   −0.045  0.965
F3
  Intercept (yy_booth)  3,126.22   48.55  24   64.395  <2 × 10^−16***
  yy_zoom                −200.58   82.51  24   −2.431  0.0229*

Table 10: Linear model for high front /i/, manually corrected (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)   356.243   5.427  66   65.637  <2 × 10^−16***
  yy_zoom                 49.827  10.008  66    4.979  4.86 × 10^−6***
F2
  Intercept (yy_booth)  2,657.40   20.16  66  131.819  <2 × 10^−16***
  yy_zoom                −108.89   37.17  66   −2.929  0.00466**
F3
  Intercept (yy_booth)  3,348.94   31.47  66  106.428  <2 × 10^−16***
  yy_zoom                −263.06   58.02  66   −4.534  2.5 × 10^−5***

Table 11: Linear model for mid central /ə/, manually corrected (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)    499.11   23.09  22   21.616  2.6 × 10^−16***
  yy_zoom                  85.95   46.18  22    1.861  0.0761
F2
  Intercept (yy_booth)  1,670.06   38.96  22   42.866  <2 × 10^−16***
  yy_zoom                 −25.99   77.92  22   −0.334  0.742
F3
  Intercept (yy_booth)  3,274.00   56.78  22   57.661  <2 × 10^−16***
  yy_zoom                 −83.64  113.56  22   −0.736  0.469

Table 12: Linear model for high back /u/, manually corrected (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)    385.99   10.66  38   36.206  <2 × 10^−16***
  yy_zoom                  87.75   25.48  38    3.443  0.00141**
F2
  Intercept (yy_booth)    945.22   25.95  38   36.424  <2 × 10^−16***
  yy_zoom                  62.34   62.34  38    1.005  0.321
F3
  Intercept (yy_booth)  2,929.19   52.36  38    55.94  <2 × 10^−16***
  yy_zoom                 −41.27  125.17  38    −0.33  0.743

Table 13: Linear model for mid back /o/, manually corrected (F1/F2/F3 ∼ Recording Condition).

                        Estimate      SE  df        t  p
F1
  Intercept (yy_booth)    541.57   14.36  52   37.712  <2 × 10^−16***
  yy_zoom                  81.15   24.87  52    3.262  0.00195**
F2
  Intercept (yy_booth)   976.440  17.009  52   57.406  <2 × 10^−16***
  yy_zoom                  6.449  29.461  52    0.219  0.828
F3
  Intercept (yy_booth)  3,140.19   44.50  52   70.560  <2 × 10^−16***
  yy_zoom                  57.63   77.08  52    0.748  0.458

References

Boersma, Paul & David Weenink. 2021. Praat: Doing phonetics by computer, version 6.0.49 [Computer program]. Available at: http://www.praat.org/.

Bulgin, James, Paul De Decker & Jennifer Nycz. 2010. Reliability of formant measurements from lossy compressed audio. Paper presented at the British Association of Academic Phoneticians Colloquium, University of Westminster, 29–31 March.

Calder, Jeremy & Rebecca Wheeler. 2022. Is Zoom viable for sociophonetic research? A comparison of in-person and online recordings for sibilant analysis. Linguistics Vanguard 8. 20210014. https://doi.org/10.1515/lingvan-2021-0014.

Calder, Jeremy, Rebecca Wheeler, Sarah Adams, Daniel Amarelo, Katherine Arnold-Murray, Justin Bai, Meredith Church, Josh Daniels, Sarah Gomez, Jacob Henry, Yunan Jia, Brienna Johnson-Morris, Kyo Lee, Kit Miller, Derrek Powell, Caitlin Ramsey-Smith, Sydney Rayl, Sara Rosenau & Nadine Salvador. 2022. Is Zoom viable for sociophonetic research? A comparison of in-person and online recordings for vocalic analysis. Linguistics Vanguard 8. 20200148. https://doi.org/10.1515/lingvan-2020-0148.

Chafe, Wallace (ed.). 1980. The pear stories: Cognitive, cultural and linguistic aspects of narrative production. Norwood, NJ: Ablex.

Corretge, Ramon. 2012. Praat vocal toolkit [Computer program]. http://www.praatvocaltoolkit.com/ (accessed 12 February 2021).

Cui, Zhenhua. 1998. Yiyang fangyan yanjiu [A study of the Yiyang dialect]. Changsha: Hunan Education Press.

De Decker, Paul & Jennifer Nycz. 2011. For the record: Which digital media can be used for sociophonetic analysis? University of Pennsylvania Working Papers in Linguistics 17(2). 51–59.

Freeman, Valerie & Paul De Decker. 2021. Remote sociophonetic data collection: Vowels and nasalization over video conferencing apps. Journal of the Acoustical Society of America 149(2). 1211–1223. https://doi.org/10.1121/10.0003529.

Ge, Chunyu, Yixuan Xiong & Peggy Mok. 2021. How reliable are phonetic data collected remotely? Comparison of recording devices and environments on acoustic measurements. Proceedings of Interspeech 2021. 3984–3988. https://doi.org/10.21437/Interspeech.2021-1122.

Mayorga, Pedro, Laurent Besacier, Richard Lamy & J.-F. Serignat. 2003. Audio packet loss over IP and speech recognition. In 2003 IEEE workshop on automatic speech recognition and understanding (IEEE cat. no. 03EX721), 607–612. St. Thomas, VI: IEEE. https://doi.org/10.1109/ASRU.2003.1318509.

Norman, Jerry. 1988. Chinese. Cambridge: Cambridge University Press.

R Core Team. 2021. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Available at: https://www.R-project.org/.

Salomon, David. 2007. A concise introduction to data compression. London: Springer Science & Business Media. https://doi.org/10.1007/978-1-84800-072-8.

Sanker, Chelsea, Sarah Babinski, Roslyn Burns, Marisha Evans, Jeremy Johns, Juhyae Kim, Slater Smith, Natalie Weber & Claire Bowern. 2021. (Don’t) try this at home! The effects of recording devices and software on phonetic analysis. Language 97(4). e360–e382. https://doi.org/10.1353/lan.2021.0079.

Stanley, Joey. 2019. Automatic formant extraction in Praat. https://joeystanley.com/downloads/191002-formant_extraction.html (accessed 21 February 2021).

Thomas, Erik & Tyler Kendall. 2007. NORM: The vowel normalization and plotting suite. Eugene, OR: University of Oregon. http://ncslaap.lib.ncsu.edu/tools/norm/ (accessed 21 February 2021).

Wu, Yunji. 2005. A synchronic and diachronic study of the grammar of the Chinese Xiang dialects. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110927481.

Zhang, Cong, Kathleen Jepson, Georg Lohfink & Amalia Arvaniti. 2021. Comparing acoustic analyses of speech data collected remotely. Journal of the Acoustical Society of America 149(6). 3910–3916. https://doi.org/10.1121/10.0005132.

Received: 2021-12-14
Accepted: 2022-12-13
Published Online: 2023-04-10

© 2023 Walter de Gruyter GmbH, Berlin/Boston
