Chapter 4. How to compare speed and accuracy of syntactic parsers

  • Gertjan van Noord
Published by John Benjamins Publishing Company
This chapter is in the book Crossroads Semantics

Abstract

The paper introduces a methodological innovation as well as a practical one. First, two scenarios are introduced for comparing accurate but slow parsers with faster but less accurate parsers. Second, a corpus-based technique is described that improves the efficiency of wide-coverage, high-accuracy parsers. By keeping track of the derivation steps that lead to the best parse for a very large collection of sentences, the parser learns which parse steps can be filtered out without significant loss in parsing accuracy, but with a substantial increase in parsing efficiency. Experimental results with the Alpino parser for Dutch indicate that the technique yields much faster parsers that perform at almost the same level of accuracy. An interesting characteristic of our approach is that it is self-learning, in the sense that it uses unannotated corpora.
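
The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the Alpino implementation: the abstract does not say how derivation steps are represented or how the filter threshold is chosen, so the tuple encoding of steps, the function names `learn_step_counts` and `make_filter`, and the `min_count` parameter are all assumptions made for illustration only.

```python
from collections import Counter


def learn_step_counts(best_parse_derivations):
    """Count how often each derivation step occurs in the best parses.

    Each derivation is assumed to be a list of hashable steps
    (here: hypothetical rule tuples) taken from the best parse
    the parser itself chose for one unannotated sentence.
    """
    counts = Counter()
    for derivation in best_parse_derivations:
        counts.update(derivation)
    return counts


def make_filter(counts, min_count=1):
    """Return a predicate that keeps only steps seen at least min_count times."""
    allowed = {step for step, n in counts.items() if n >= min_count}
    return lambda step: step in allowed


# Toy data standing in for a large parsed corpus (assumed, not from the chapter).
derivations = [
    [("np", "det", "n"), ("s", "np", "vp")],
    [("np", "det", "n"), ("vp", "v", "np"), ("s", "np", "vp")],
    [("np", "adj", "n")],
]

counts = learn_step_counts(derivations)
keep = make_filter(counts, min_count=2)

# At parse time, candidate steps that rarely led to a best parse are skipped.
candidate_steps = [("np", "det", "n"), ("np", "adj", "n"), ("vp", "v", "pp")]
surviving = [step for step in candidate_steps if keep(step)]
```

Raising `min_count` in this sketch prunes more steps, which would trade accuracy for speed; that is the kind of trade-off the chapter's two comparison scenarios are meant to evaluate.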

Downloaded on 29.9.2025 from https://www.degruyterbrill.com/document/doi/10.1075/z.210.04van/html