Evaluation of lexicon- and syntax-based negation detection algorithms using clinical text data

  • J. Manimaran and T. Velmurugan
Published/Copyright: December 29, 2017

Abstract

Background:

Clinical Text Analysis and Knowledge Extraction System (cTAKES) is an open-source natural language processing (NLP) system. In recently developed modules of cTAKES, a negation detection (ND) algorithm is used to improve annotation capabilities and to simplify the automatic identification of negative context in large clinical documents. This research analyzes two types of ND algorithms, lexicon-based and syntax-based, using a database made openly available by the National Center for Biomedical Computing. The aim of the analysis is to identify the strengths and weaknesses of these algorithms.

Methods:

Patient medical reports collected from three institutions and included in the 2010 i2b2/VA Clinical NLP Challenge are the input data for this analysis. This database includes patient discharge summaries and progress notes. The patient data is fed into five ND algorithms: NegEx, ConText, pyConTextNLP, DEEPEN and Negation Resolution (NR). NegEx, ConText and pyConTextNLP are lexicon-based, whereas DEEPEN and NR are syntax-based. The results from these five ND algorithms are post-processed and compared with the annotated data. Finally, the performance of each ND algorithm is evaluated by computing standard measures, including F-measure, kappa statistics and ROC, as well as its execution time.
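The comparison step described above can be sketched as a small evaluation harness. This is only an illustration under stated assumptions: the `Record` layout, the `evaluate` function and the `naive_detector` placeholder are hypothetical names introduced here, and none of them reproduces the interface of NegEx, ConText, pyConTextNLP, DEEPEN or NR.

```python
import time
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical record layout: (concept, sentence, gold label),
# where the gold label is True when the annotators marked the concept as negated.
Record = Tuple[str, str, bool]
Detector = Callable[[str, str], bool]


def evaluate(detector: Detector, records: Sequence[Record]) -> Tuple[Dict[str, int], float]:
    """Run one detector over all records; return 2x2 counts and elapsed seconds."""
    # Time only the detector calls, not the bookkeeping that follows.
    start = time.perf_counter()
    predictions = [detector(concept, sentence) for concept, sentence, _ in records]
    elapsed = time.perf_counter() - start

    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for predicted, (_, _, gold) in zip(predictions, records):
        if gold and predicted:
            counts["TP"] += 1  # negated in annotation, negated in output
        elif gold and not predicted:
            counts["FN"] += 1  # negated in annotation, missed by the detector
        elif predicted:
            counts["FP"] += 1  # affirmed in annotation, flagged as negated
        else:
            counts["TN"] += 1  # affirmed in both
    return counts, elapsed


def naive_detector(concept: str, sentence: str) -> bool:
    """Placeholder detector (an assumption for illustration): flags any sentence containing 'no'."""
    return "no" in sentence.lower().split()
```

Any of the five algorithms could be wrapped in a function with the same `(concept, sentence) -> bool` shape and passed to `evaluate`, so that accuracy counts and execution time are collected in the same way for each.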

Results:

Each algorithm was implemented, and its performance was evaluated on the accuracy of its results and its computational time in order to identify a robust and reliable ND algorithm.

Conclusions:

The performance of the chosen ND algorithms is analyzed based on the results produced by this research approach. The time and accuracy of each algorithm are calculated and compared to suggest the best method.

Acknowledgments

We thank S. Shahul Hameed and Gokulalakshmi Elayaperumal from Sree Balaji Medical College and Hospital, Chennai, India, George Gkotsis from IoPPN, King’s College London, and Saeed Mehrabi from the School of Informatics and Computing, Indiana University, Indianapolis, IN, USA, who offered their continuous support in the implementation of this task.

  1. Author contributions: The authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: None declared.

  3. Employment or leadership: None declared.

  4. Honorarium: None declared.

  5. Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

Appendix A

Table A1:

Example of a single annotated i2b2 report.

Sentence (report file): Chest CT scan was negative for pulmonary embolism but positive for consolidation
Assertion annotations (annotated file):
c=“pulmonary embolism” 21:6 21:7||t=“problem”||a=“absent”
c=“consolidation” 21:11 21:11||t=“problem”||a=“present”
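A minimal sketch of how an annotation line like those in Table A1 might be read. It assumes that each `line:token` pair gives the report line number and a 0-based offset into the whitespace-separated tokens of that line, which is consistent with the example (tokens 6 to 7 are "pulmonary embolism", token 11 is "consolidation"). The `Assertion` type and the function names are hypothetical, and the pattern tolerates straight or curly quotes and a separator written as `||` or `| |`.

```python
import re
from typing import List, NamedTuple


class Assertion(NamedTuple):
    concept: str
    line: int
    start_token: int
    end_token: int
    concept_type: str
    status: str


# Quote characters and field separator, written loosely to survive re-typesetting.
_Q = r'["“”]'
_SEP = r'\s*\|\s*\|\s*'
_PATTERN = re.compile(
    rf'c=\s*{_Q}(?P<concept>.+?){_Q}\s+'
    rf'(?P<line>\d+):(?P<start>\d+)\s+\d+:(?P<end>\d+)'
    rf'{_SEP}t=\s*{_Q}(?P<type>.+?){_Q}'
    rf'{_SEP}a=\s*{_Q}(?P<status>.+?){_Q}'
)


def parse_assertion(text: str) -> Assertion:
    """Parse one assertion annotation line into its fields."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError(f"not an assertion annotation: {text!r}")
    return Assertion(
        concept=match["concept"],
        line=int(match["line"]),
        start_token=int(match["start"]),
        end_token=int(match["end"]),
        concept_type=match["type"],
        status=match["status"],
    )


def concept_tokens(sentence: str, assertion: Assertion) -> List[str]:
    """Return the tokens of the sentence covered by the annotation (inclusive span)."""
    tokens = sentence.split()
    return tokens[assertion.start_token : assertion.end_token + 1]
```

For a negation study, an assertion status of "absent" would presumably be treated as the negated class and "present" as the affirmed class; that mapping is an assumption here rather than something the table itself states.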
Table A2:

Sample input data.

Concept | Sentence
Mass | No mass or vegetation is seen on the mitral valve
Pericardial effusion | There is no pericardial effusion
Epileptiform features | No epileptiform features were seen
Infection | CXR, LP, UA and abdominal CT showed no sign of infection
Orthostatic | She was not orthostatic
A headache | He did not complain about a headache
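To illustrate what a lexicon-based detector does with concept and sentence pairs like those in Table A2, here is a deliberately simplified sketch in the spirit of trigger-term matching. It is not the NegEx, ConText or pyConTextNLP implementation: the trigger list, the termination terms and the six-token window are small hypothetical choices made for this example only.

```python
import re
from typing import List

# Tiny illustrative lexicons (assumptions, not the published trigger lists).
PRE_NEGATION_TRIGGERS = ["no", "not", "without", "denies", "negative for"]
TERMINATION_TERMS = {"but", "however", "although"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_negated(concept: str, sentence: str, window: int = 6) -> bool:
    """Return True if a trigger term precedes the concept within the window."""
    tokens = tokenize(sentence)
    target = tokenize(concept)
    n = len(target)

    # Locate the first occurrence of the concept in the sentence.
    for i in range(len(tokens) - n + 1):
        if tokens[i : i + n] == target:
            break
    else:
        return False  # concept not found, so nothing to negate

    # Candidate scope: up to `window` tokens before the concept.
    scope = tokens[max(0, i - window) : i]

    # A termination term closes the scope of any earlier trigger.
    for j in range(len(scope) - 1, -1, -1):
        if scope[j] in TERMINATION_TERMS:
            scope = scope[j + 1 :]
            break

    # Pad with spaces so that multi-word triggers match on token boundaries.
    scope_text = " " + " ".join(scope) + " "
    return any(f" {trigger} " in scope_text for trigger in PRE_NEGATION_TRIGGERS)
```

With these toy lexicons, every concept in Table A2 is reported as negated. A syntax-based method would instead decide the scope from a parse of the sentence rather than from a fixed token window, which is the distinction the study sets out to evaluate.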
Table A3:

Evaluation of an algorithm’s output using a 2×2 table.

Manual annotation | Predicted: true (negated) | Predicted: false (affirmed)
True (negated) | True positive (TP) | False negative (FN)
False (affirmed) | False positive (FP) | True negative (TN)
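The cells of Table A3 feed directly into the standard measures named in the abstract. The sketch below uses the textbook definitions of precision, recall, F-measure, accuracy and Cohen's kappa. The function name `negation_metrics` is hypothetical, and any counts passed to it in an example are made-up illustrations, not results from the study.

```python
from typing import Dict


def negation_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute standard measures from the four cells of a 2x2 table."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("the 2x2 table is empty")

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / total

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the row and column marginals.
    p_observed = accuracy
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total**2
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
        "kappa": kappa,
    }
```

Kappa is worth reporting alongside accuracy because negated concepts are usually the minority class in clinical text, so a detector that rarely predicts negation can still score a high raw accuracy.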
Figure A1: A simple surface-based approach of ConText algorithm.

Received: 2017-7-25
Accepted: 2017-11-27
Published Online: 2017-12-29
Published in Print: 2017-12-20

©2017 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 7.12.2025 from https://www.degruyterbrill.com/document/doi/10.1515/bams-2017-0016/html