Abstract
Current events have necessitated sacrificing some degree of recording quality in order to reach inaccessible or faraway areas, for instance by recording through video-conferencing software such as Zoom rather than with a traditional in-person microphone or sound booth setup. This raises the question: can Zoom-recorded data be used more or less interchangeably with data collected through standard recording procedures? Fieldwork on Xiang varieties has become increasingly difficult during the pandemic, and the present research addresses this problem through an analysis of vowel acoustics in the Yiyang dialect of Xiang (Sinitic), compared across two recording media: one online (Zoom) and one in person (sound booth). The study analyzes two retellings of the Pear Stories video by a speaker of Yiyang Xiang (female, 24, college-educated), one recorded in the sound booth at the University of Hong Kong and the other recorded over Zoom using a laptop microphone. The acoustic features analyzed are F1, F2, and F3. Preliminary findings suggest that while F1 is fairly comparable between the two recordings, the two higher formants are altered in ways that call into question the comparability of Zoom-recorded and sound booth-recorded vowels. However, results improve considerably when formants are measured manually, suggesting that Zoom recordings are at least partially recoverable.
Acknowledgments
I would like to thank Viktorija Kostadinova and Matt Gardner at the Getting Data working group (https://gettingdata.humanities.uva.nl/) for their feedback and support in exploring this topic; their drive and motivation in establishing this group in response to the pandemic are truly inspiring. I would also like to thank Dr. Jonathan Havenhill for his comments on methodological approaches; all remaining errors are, of course, my own.
Note that in these tables: *p < 0.05, **p < 0.01, ***p < 0.001.
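The tables below report models of the form F1/F2/F3 ∼ Recording Condition, fit separately for each formant and each vowel. As a minimal sketch of how such models can be fit in R (which the study cites), assuming a hypothetical data frame `vowels` with placeholder columns `vowel`, `condition`, `F1`, `F2`, and `F3` — these names are illustrative, not the study's actual objects:

```r
# Minimal sketch (hypothetical object and column names):
# one linear model per formant per vowel, with the booth
# recording as the reference level.
vowels$condition <- relevel(factor(vowels$condition), ref = "yy_booth")

models <- lapply(split(vowels, vowels$vowel), function(d) {
  list(
    F1 = lm(F1 ~ condition, data = d),
    F2 = lm(F2 ~ condition, data = d),
    F3 = lm(F3 ~ condition, data = d)
  )
})

# Coefficient table for one model, e.g. F1 of /a/:
# Estimate, SE, t, and p as reported below (df = residual df).
summary(models[["a"]]$F1)$coefficients
```

With `yy_booth` as the reference level, the intercept row estimates the booth mean and the `yy_zoom` row estimates the deviation of the Zoom recording from it, matching the layout of the tables.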
Linear model for raw low /a/ (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 882.42 | 13.95 | 94 | 63.26 | <2 × 10⁻¹⁶*** |
| yy_zoom | −196.95 | 25.38 | 94 | −7.76 | 1.02 × 10⁻¹¹*** |
| F2 | | | | | |
| Intercept (yy_booth) | 1,446.78 | 27.73 | 94 | 52.180 | <2 × 10⁻¹⁶*** |
| yy_zoom | −321.94 | 50.45 | 94 | −6.382 | 6.56 × 10⁻⁹*** |
| F3 | | | | | |
| Intercept (yy_booth) | 2,924.43 | 57.33 | 94 | 51.01 | <2 × 10⁻¹⁶*** |
| yy_zoom | −1,074.75 | 104.31 | 94 | −10.30 | <2 × 10⁻¹⁶*** |
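As a quick reading aid for these tables: each t value is the estimate divided by its standard error, so for the yy_zoom row of the F1 model above, t = −196.95/25.38 ≈ −7.76. The large negative Zoom effects on F2 and especially F3 here are what drive the comparability concerns raised in the abstract.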
Linear model for raw mid front /e/ (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 513.36 | 23.91 | 24 | 21.469 | <2 × 10⁻¹⁶*** |
| yy_zoom | −25.53 | 40.64 | 24 | −0.628 | 0.536 |
| F2 | | | | | |
| Intercept (yy_booth) | 1,991.1 | 135.5 | 24 | 14.696 | 1.7 × 10⁻¹³*** |
| yy_zoom | −270.6 | 230.3 | 24 | −1.175 | 0.251 |
| F3 | | | | | |
| Intercept (yy_booth) | 2,959.98 | 90.74 | 24 | 32.621 | <2 × 10⁻¹⁶*** |
| yy_zoom | −591.05 | 154.23 | 24 | −3.832 | 0.000804*** |
Linear model for raw high front /i/ (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 356.243 | 6.056 | 66 | 58.827 | <2 × 10⁻¹⁶*** |
| yy_zoom | 18.678 | 11.166 | 66 | 1.673 | 0.0991 |
| F2 | | | | | |
| Intercept (yy_booth) | 2,572.64 | 86.96 | 66 | 29.58 | <2 × 10⁻¹⁶*** |
| yy_zoom | −644.57 | 160.35 | 66 | −4.02 | 0.000152*** |
| F3 | | | | | |
| Intercept (yy_booth) | 3,289.55 | 29.61 | 66 | 111.103 | <2 × 10⁻¹⁶*** |
| yy_zoom | −431.71 | 54.59 | 66 | −7.907 | 3.88 × 10⁻¹¹*** |
Linear model for raw mid central /ə/ (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 514.18 | 20.84 | 22 | 24.677 | <2 × 10⁻¹⁶*** |
| yy_zoom | −57.40 | 41.67 | 22 | −1.377 | 0.182 |
| F2 | | | | | |
| Intercept (yy_booth) | 1,620.35 | 57.61 | 22 | 28.125 | <2 × 10⁻¹⁶*** |
| yy_zoom | −796.08 | 115.22 | 22 | −6.909 | 6.15 × 10⁻⁷*** |
| F3 | | | | | |
| Intercept (yy_booth) | 3,123.12 | 95.03 | 22 | 32.865 | <2 × 10⁻¹⁶*** |
| yy_zoom | −1,443.18 | 190.05 | 22 | −7.593 | 1.39 × 10⁻⁷*** |
Linear model for raw high back /u/ (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 388.83 | 10.24 | 38 | 37.983 | <2 × 10⁻¹⁶*** |
| yy_zoom | 87.09 | 24.47 | 38 | 3.559 | 0.00102** |
| F2 | | | | | |
| Intercept (yy_booth) | 948.16 | 27.36 | 38 | 34.651 | <2 × 10⁻¹⁶*** |
| yy_zoom | −38.10 | 65.41 | 38 | −0.582 | 0.564 |
| F3 | | | | | |
| Intercept (yy_booth) | 2,901.98 | 78.09 | 38 | 37.163 | <2 × 10⁻¹⁶*** |
| yy_zoom | −458.04 | 186.67 | 38 | −2.454 | 0.0188* |
Linear model for raw mid back /o/ (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 626.70 | 18.28 | 52 | 34.285 | <2 × 10⁻¹⁶*** |
| yy_zoom | −95.09 | 31.66 | 52 | −3.004 | 0.0041** |
| F2 | | | | | |
| Intercept (yy_booth) | 996.3 | 21.6 | 52 | 46.136 | <2 × 10⁻¹⁶*** |
| yy_zoom | −162.0 | 37.4 | 52 | −4.331 | 6.79 × 10⁻⁵*** |
| F3 | | | | | |
| Intercept (yy_booth) | 3,134.29 | 77.12 | 52 | 40.64 | <2 × 10⁻¹⁶*** |
| yy_zoom | −1,699.08 | 133.57 | 52 | −12.72 | <2 × 10⁻¹⁶*** |
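The remaining tables repeat the same models after manual correction of the formant measurements. Purely as an illustrative sketch of how tokens might be flagged for hand-measurement — this is an assumption for illustration, not the study's documented procedure — one could list automatic measurements that deviate sharply from the vowel- and condition-specific median, reusing the same hypothetical `vowels` data frame as above:

```r
# Hypothetical illustration, not the study's documented procedure:
# flag tokens whose automatic F3 lies more than k MADs from the
# median for that vowel and recording condition, then re-measure
# those tokens by hand in Praat.
flag_for_recheck <- function(d, formant = "F3", k = 2) {
  x <- d[[formant]]
  keep <- abs(x - median(x, na.rm = TRUE)) > k * mad(x, na.rm = TRUE)
  d[keep & !is.na(keep), ]
}

to_recheck <- do.call(rbind, lapply(
  split(vowels, interaction(vowels$vowel, vowels$condition)),
  flag_for_recheck
))
```

F3 is a natural target for such checks here, given that the raw models above show Zoom effects on F3 of roughly −450 to −1,700 Hz across vowels.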
Linear model for low /a/, manually corrected (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 887.53 | 11.29 | 94 | 78.64 | <2 × 10⁻¹⁶*** |
| yy_zoom | −12.94 | 20.53 | 94 | −0.63 | 0.53 |
| F2 | | | | | |
| Intercept (yy_booth) | 1,467.545 | 23.848 | 94 | 61.538 | <2 × 10⁻¹⁶*** |
| yy_zoom | 9.752 | 43.390 | 94 | 0.225 | 0.823 |
| F3 | | | | | |
| Intercept (yy_booth) | 3,075.40 | 42.91 | 94 | 71.673 | <2 × 10⁻¹⁶*** |
| yy_zoom | −50.65 | 78.07 | 94 | −0.649 | 0.518 |
Linear model for mid front /e/, manually corrected (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 496.01 | 18.53 | 24 | 26.772 | <2 × 10⁻¹⁶*** |
| yy_zoom | 21.23 | 31.49 | 24 | 0.674 | 0.507 |
| F2 | | | | | |
| Intercept (yy_booth) | 2,213.061 | 60.210 | 24 | 36.756 | <2 × 10⁻¹⁶*** |
| yy_zoom | −4.594 | 102.337 | 24 | −0.045 | 0.965 |
| F3 | | | | | |
| Intercept (yy_booth) | 3,126.22 | 48.55 | 24 | 64.395 | <2 × 10⁻¹⁶*** |
| yy_zoom | −200.58 | 82.51 | 24 | −2.431 | 0.0229* |
Linear model for high front /i/, manually corrected (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 356.243 | 5.427 | 66 | 65.637 | <2 × 10⁻¹⁶*** |
| yy_zoom | 49.827 | 10.008 | 66 | 4.979 | 4.86 × 10⁻⁶*** |
| F2 | | | | | |
| Intercept (yy_booth) | 2,657.40 | 20.16 | 66 | 131.819 | <2 × 10⁻¹⁶*** |
| yy_zoom | −108.89 | 37.17 | 66 | −2.929 | 0.00466** |
| F3 | | | | | |
| Intercept (yy_booth) | 3,348.94 | 31.47 | 66 | 106.428 | <2 × 10⁻¹⁶*** |
| yy_zoom | −263.06 | 58.02 | 66 | −4.534 | 2.5 × 10⁻⁵*** |
Linear model for mid central /ə/, manually corrected (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 499.11 | 23.09 | 22 | 21.616 | 2.6 × 10⁻¹⁶*** |
| yy_zoom | 85.95 | 46.18 | 22 | 1.861 | 0.0761 |
| F2 | | | | | |
| Intercept (yy_booth) | 1,670.06 | 38.96 | 22 | 42.866 | <2 × 10⁻¹⁶*** |
| yy_zoom | −25.99 | 77.92 | 22 | −0.334 | 0.742 |
| F3 | | | | | |
| Intercept (yy_booth) | 3,274.00 | 56.78 | 22 | 57.661 | <2 × 10⁻¹⁶*** |
| yy_zoom | −83.64 | 113.56 | 22 | −0.736 | 0.469 |
Linear model for high back /u/, manually corrected (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 385.99 | 10.66 | 38 | 36.206 | <2 × 10⁻¹⁶*** |
| yy_zoom | 87.75 | 25.48 | 38 | 3.443 | 0.00141** |
| F2 | | | | | |
| Intercept (yy_booth) | 945.22 | 25.95 | 38 | 36.424 | <2 × 10⁻¹⁶*** |
| yy_zoom | 62.34 | 62.34 | 38 | 1.005 | 0.321 |
| F3 | | | | | |
| Intercept (yy_booth) | 2,929.19 | 52.36 | 38 | 55.94 | <2 × 10⁻¹⁶*** |
| yy_zoom | −41.27 | 125.17 | 38 | −0.33 | 0.743 |
Linear model for mid back /o/, manually corrected (F1/F2/F3 ∼ Recording Condition).
| | Estimate | SE | df | t | p |
|---|---|---|---|---|---|
| F1 | | | | | |
| Intercept (yy_booth) | 541.57 | 14.36 | 52 | 37.712 | <2 × 10⁻¹⁶*** |
| yy_zoom | 81.15 | 24.87 | 52 | 3.262 | 0.00195** |
| F2 | | | | | |
| Intercept (yy_booth) | 976.440 | 17.009 | 52 | 57.406 | <2 × 10⁻¹⁶*** |
| yy_zoom | 6.449 | 29.461 | 52 | 0.219 | 0.828 |
| F3 | | | | | |
| Intercept (yy_booth) | 3,140.19 | 44.50 | 52 | 70.560 | <2 × 10⁻¹⁶*** |
| yy_zoom | 57.63 | 77.08 | 52 | 0.748 | 0.458 |
References
Boersma, Paul & David Weenink. 2021. Praat: Doing phonetics by computer, version 6.0.49 [Computer program]. http://www.praat.org/.
Bulgin, James, Paul De Decker & Jennifer Nycz. 2010. Reliability of formant measurements from lossy compressed audio. Paper presented at the British Association of Academic Phoneticians Colloquium, University of Westminster, 29–31 March.
Calder, Jeremy & Rebecca Wheeler. 2022. Is Zoom viable for sociophonetic research? A comparison of in-person and online recordings for sibilant analysis. Linguistics Vanguard 8. 20210014. https://doi.org/10.1515/lingvan-2021-0014.
Calder, Jeremy, Rebecca Wheeler, Sarah Adams, Daniel Amarelo, Katherine Arnold-Murray, Justin Bai, Meredith Church, Josh Daniels, Sarah Gomez, Jacob Henry, Yunan Jia, Brienna Johnson-Morris, Kyo Lee, Kit Miller, Derrek Powell, Caitlin Ramsey-Smith, Sydney Rayl, Sara Rosenau & Nadine Salvador. 2022. Is Zoom viable for sociophonetic research? A comparison of in-person and online recordings for vocalic analysis. Linguistics Vanguard 8. 20200148. https://doi.org/10.1515/lingvan-2020-0148.
Chafe, Wallace (ed.). 1980. The pear stories: Cognitive, cultural and linguistic aspects of narrative production. Norwood, NJ: Ablex.
Corretge, Ramon. 2012. Praat vocal toolkit [Computer program]. http://www.praatvocaltoolkit.com/ (accessed 12 February 2021).
Cui, Zhenhua. 1998. Yiyang fangyan yanjiu [A study of the Yiyang dialect]. Changsha: Hunan Education Press.
De Decker, Paul & Jennifer Nycz. 2011. For the record: Which digital media can be used for sociophonetic analysis? University of Pennsylvania Working Papers in Linguistics 17(2). 51–59.
Freeman, Valerie & Paul De Decker. 2021. Remote sociophonetic data collection: Vowels and nasalization over video conferencing apps. Journal of the Acoustical Society of America 149(2). 1211–1223. https://doi.org/10.1121/10.0003529.
Ge, Chunyu, Yixuan Xiong & Peggy Mok. 2021. How reliable are phonetic data collected remotely? Comparison of recording devices and environments on acoustic measurements. Proceedings of Interspeech 2021. 3984–3988. https://doi.org/10.21437/Interspeech.2021-1122.
Mayorga, Pedro, Laurent Besacier, Richard Lamy & J.-F. Serignat. 2003. Audio packet loss over IP and speech recognition. In 2003 IEEE workshop on automatic speech recognition and understanding (IEEE cat. no. 03EX721), 607–612. St. Thomas, VI: IEEE. https://doi.org/10.1109/ASRU.2003.1318509.
Norman, Jerry. 1988. Chinese. Cambridge: Cambridge University Press.
R Core Team. 2021. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org/.
Salomon, David. 2007. A concise introduction to data compression. London: Springer Science & Business Media. https://doi.org/10.1007/978-1-84800-072-8.
Sanker, Chelsea, Sarah Babinski, Roslyn Burns, Marisha Evans, Jeremy Johns, Juhyae Kim, Slater Smith, Natalie Weber & Claire Bowern. 2021. (Don't) try this at home! The effects of recording devices and software on phonetic analysis. Language 97(4). e360–e382. https://doi.org/10.1353/lan.2021.0079.
Stanley, Joey. 2019. Automatic formant extraction in Praat. https://joeystanley.com/downloads/191002-formant_extraction.html (accessed 21 February 2021).
Thomas, Erik & Tyler Kendall. 2007. NORM: The vowel normalization and plotting suite. Eugene, OR: University of Oregon. http://ncslaap.lib.ncsu.edu/tools/norm/ (accessed 21 February 2021).
Wu, Yunji. 2005. Synchronic and diachronic study of the grammar of the Chinese Xiang dialects. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110927481.
Zhang, Cong, Kathleen Jepson, Georg Lohfink & Amalia Arvaniti. 2021. Comparing acoustic analyses of speech data collected remotely. Journal of the Acoustical Society of America 149(6). 3910–3916. https://doi.org/10.1121/10.0005132.