Abstract
This article introduces the problem of recognition and normalisation of temporal expressions in Polish. We describe what temporal information is and present the TimeML specification, adapted to Polish, as a model for describing temporal expressions. We also present the classes of temporal expressions, the guidelines for annotating and normalising them, and our approach to corpus annotation and temporal expression recognition. The key aspect of the work is the description of the features used for recognition and of the method for selecting and creating feature templates for a Conditional Random Fields (CRF) model. Finally, we present the experiments and the conclusions drawn from them.
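The following minimal Python sketch makes the idea of a CRF feature template concrete by expanding templates into per-token features. It assumes a CRF++-style convention in which a template is a tuple of (attribute, offset) pairs whose values are joined into one feature string. The attribute names (`orth`, `base`), the `<PAD>` symbol and the function `expand_templates` are hypothetical illustrations. They are not the feature set or tooling used in this work, and the template selection step is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical template format: a tuple of (attribute, offset) pairs.
# (("orth", -1), ("orth", 0)) joins the previous and current token forms.
Template = Tuple[Tuple[str, int], ...]


def expand_templates(tokens: Sequence[Dict[str, str]],
                     templates: Sequence[Template]) -> List[List[str]]:
    """Instantiate every template at every token position of one sentence."""
    features: List[List[str]] = []
    for i in range(len(tokens)):
        row: List[str] = []
        for template in templates:
            name = "/".join(f"{attr}[{off}]" for attr, off in template)
            values = []
            for attr, off in template:
                j = i + off
                # Positions outside the sentence receive a padding symbol.
                values.append(tokens[j][attr] if 0 <= j < len(tokens) else "<PAD>")
            row.append(f"{name}={'|'.join(values)}")
        features.append(row)
    return features


# Example: "do 15 marca" ("until 15 March"); base forms are lemmas.
sentence = [
    {"orth": "do", "base": "do"},
    {"orth": "15", "base": "15"},
    {"orth": "marca", "base": "marzec"},
]
templates = [
    (("base", 0),),                  # unigram: lemma of the current token
    (("orth", -1), ("orth", 0)),     # bigram: previous + current word form
]

for row in expand_templates(sentence, templates):
    print(row)
```

In a typical sequence-labelling setup, each feature row would be paired with the token's label (for example, an IOB tag marking `15 marca` as a date expression) before CRF training. Keeping templates as data rather than code makes it straightforward to add or drop them during a selection procedure.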
9 Acknowledgements
This work was financed as part of the investment in the CLARIN-PL research infrastructure, funded by the Polish Ministry of Science and Higher Education.
© 2019 Faculty of English, Adam Mickiewicz University, Poznań, Poland