Recognition and normalisation of temporal expressions using conditional random fields and cascade of partial rules

Jan Kocoń; Tomasz Bernaś; Marcin Oleksy

doi:10.1515/psicl-2019-0011

Published/Copyright: August 17, 2019

From the journal Poznan Studies in Contemporary Linguistics Volume 55 Issue 2

Abstract

This article addresses the recognition and normalisation of temporal expressions in Polish. We describe what temporal information is and present the TimeML specification, adapted to Polish, as a model for describing temporal expressions. We present the classes of temporal expressions, the guidelines for their annotation and normalisation, and our approach to corpus annotation and temporal expression recognition. The key aspects of the work are the description of the features used for recognition and the method used for selecting and creating feature templates for the Conditional Random Fields model. We report the experiments and the conclusions drawn from them.

Keywords: Temporal expressions; TimeML; Polish; recognition; information extraction

Jan Kocoń, Wrocław University of Science and Technology, Wybrzeże Stanisława Wyspiańskiego 27, 50-370 Wrocław, Poland

Acknowledgements

Work financed as part of the investment in the CLARIN-PL research infrastructure funded by the Polish Ministry of Science and Higher Education.

Published Online: 2019-08-17

Published in Print: 2019-06-26
