Automatic transcription of the Polish newsreel

Danijel Koržinek, Krzysztof Wołk, Łukasz Brocki and Krzysztof Marasek
Published/Copyright: 17 August 2019

Abstract

This paper describes an automatic transcription system for the Polish Newsreel, a collection of mid- to late-20th-century news segments in audio-visual form. The segments are characterized by archaic language and poor audio quality, which makes them a demanding task for speech recognition systems. The acoustic and language models therefore had to be retrained using in-domain corpora. During model adaptation, experiments were carried out to select the optimal adaptation parameters. These experiments showed that adapting the speech recognition system to a narrow, clearly defined domain significantly improves its accuracy. The final word error rate (WER) achieved on this domain was 10.97%.
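The headline result is a word error rate. As a reminder of how that metric is usually defined, WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in the minimum-edit alignment of the hypothesis against a reference of N words. The minimal Python sketch below computes it with a word-level Levenshtein distance. It is not the authors' scoring tool: the function name `word_error_rate` and the plain whitespace tokenization are illustrative assumptions, and both strings are assumed to be already normalized (same casing, no punctuation).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using a word-level Levenshtein distance.

    Hypothetical helper for illustration; assumes both inputs are already
    normalized and tokenizes by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )

    # Total edits divided by the number of reference words
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words
    print(word_error_rate("to jest kronika filmowa", "to jest kronika"))  # 0.25
```

Because the denominator is the reference length only, a hypothesis with many inserted words can push the WER above 100%; a figure such as 10.97% therefore means roughly one erroneous word per nine reference words, not a fraction of the hypothesis.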


Danijel Koržinek, Polish-Japanese Academy of Information Technology, Koszykowa 86, 02-008 Warszawa, Poland

Published Online: 2019-08-17
Published in Print: 2019-06-26

© 2019 Faculty of English, Adam Mickiewicz University, Poznań, Poland

Downloaded on 13 December 2025 from https://www.degruyterbrill.com/document/doi/10.1515/psicl-2019-0008/html