From image to text to speech: the effects of speech prosody on information sequencing in audio description

  • Maija Hirvonen

    Maija Hirvonen (PhD, 2014) is a tenure track professor (associate level) in German Language, Culture and Translation at Tampere University (Finland), and an external researcher at University of Helsinki (Finland). Her research focuses on accessibility (in particular audio description), multimodal and intermodal translation, multimodal interaction, and shared cognition.

    and Mari Wiklund

    Mari Wiklund (née Lehtinen) (PhD, 2009) works as a university lecturer in the field of French philology at the Department of Languages of the University of Helsinki (Finland). Her current research interests include e.g. prosody (French, Finnish), foreign accent, interaction of persons with autism spectrum disorder, comprehension problems, conversational repairs, nonverbal communication, disfluencies of speech and asymmetric interaction. Her methodological background is mainly in conversation analysis and prosodic analysis.

Published/Copyright: February 4, 2021

Abstract

Given the extensive body of research on audio description (the verbal-vocal description of visual or audiovisual content for visually impaired audiences), it is striking how little attention has been paid thus far to the spoken dimension of audio description and its paralinguistic, prosodic aspects. This article complements previous research into how audio description speech is received by partially sighted audiences by analyzing how it is performed vocally. We study the audio description of pictorial art and examine one aspect of prosody in detail: pitch, and the segmentation of information in relation to it. We analyze this relation in a corpus of audio-described pictorial art in Finnish by combining phonetic measurements of pitch with discourse analysis of the information segmentation. Previous studies have shown that a sentence-initial high pitch acts as a discourse-structuring device in interpreting. Our study shows that the same applies to audio description. In addition, our study suggests a relationship between the size of the pitch rise and the size of the topical transition. That is, when the topical transition is clear, the rise in pitch level between the beginnings of two consecutive spoken sentences is large. Analogously, when the topical transition is small, the change in the sentence-initial pitch level is also rather small.
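As an illustration only, the comparison of sentence-initial pitch levels described above can be sketched in a few lines of Python. The sketch assumes that pitch change is expressed in semitones, a common convention in prosodic analysis that this abstract does not itself specify. The function name `semitone_change` and the F0 values are hypothetical and are not taken from the article's corpus.

```python
import math


def semitone_change(f0_prev_hz: float, f0_next_hz: float) -> float:
    """Pitch interval in semitones between two sentence-initial F0 values."""
    # Standard conversion: 12 semitones per doubling of frequency.
    return 12.0 * math.log2(f0_next_hz / f0_prev_hz)


# Hypothetical sentence-initial F0 values in Hz (illustrative, not corpus data).
sentence_initial_f0 = [180.0, 190.0, 240.0, 200.0]

# Change in pitch level between the beginnings of consecutive sentences.
changes = [
    semitone_change(prev, nxt)
    for prev, nxt in zip(sentence_initial_f0, sentence_initial_f0[1:])
]

for i, st in enumerate(changes, start=1):
    print(f"sentence {i} -> {i + 1}: {st:+.2f} semitones")
```

Under the relationship the abstract suggests, a larger positive value would be expected at a clear topical transition and a value near zero at a minor one. Deciding how large each topical transition is remains a matter of discourse analysis, which this sketch does not attempt.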


Corresponding author: Maija Hirvonen, Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland

Funding source: Academy of Finland

Award Identifier / Grant number: 295104

Funding source: Helsinki Collegium for Advanced Studies

Acknowledgements

The authors thank the Academy of Finland (the research project Multimodal Translation with the Blind, grant number 295104) and the Helsinki Collegium for Advanced Studies for the financial and scientific support of this research. We are also grateful to Ateneum and the Sara Hildén Art Museum for providing us with the valuable research data.

Appendix

Transcription symbols

(0.4)             A pause and its duration (seconds)
(.)               A micropause (less than 0.2 s)
.                 Falling intonation
;                 Slightly falling intonation
,                 Continuing intonation
?                 Rising intonation
¿                 Slightly rising intonation
METsäaukealla     Raised pitch level
TUIjottavat       Stressed syllable
.hhh              Inbreath

Received: 2019-04-18
Accepted: 2021-01-13
Published Online: 2021-02-04
Published in Print: 2021-05-26

© 2021 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 28.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/text-2019-0172/html