Facts & Figures

Feature comparison

Here's a quick comparison of the functionalities offered by spaCy, SyntaxNet, NLTK and CoreNLP.

Feature | spaCy | SyntaxNet | NLTK | CoreNLP
Easy installation
Python API
Multi-language support
Tokenization
Part-of-speech tagging
Sentence segmentation
Dependency parsing
Entity recognition
Integrated word vectors
Sentiment analysis
Coreference resolution

Benchmarks

Two peer-reviewed papers in 2015 confirm that spaCy offers the fastest syntactic parser in the world and that its accuracy is within 1% of the best available. The few systems that are more accurate are 20× slower or more.

System | Language | Accuracy | Speed (words per second)
spaCy | Cython | 91.8 | 13,963
ClearNLP | Java | 91.7 | 10,271
CoreNLP | Java | 89.6 | 8,602
MATE | Java | 92.5 | 550
Turbo | C++ | 92.4 | 349
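Both claims in the paragraph above can be checked directly against the table. The short sketch below copies the table's figures into a hypothetical `BENCHMARKS` list and, for every system more accurate than spaCy, reports the accuracy gap (in points) and how many times slower it runs. It is only an arithmetic check of the published numbers, not a benchmarking tool.

```python
# Figures copied from the benchmark table above:
# (system, accuracy, speed in words per second)
BENCHMARKS = [
    ("spaCy", 91.8, 13_963),
    ("ClearNLP", 91.7, 10_271),
    ("CoreNLP", 89.6, 8_602),
    ("MATE", 92.5, 550),
    ("Turbo", 92.4, 349),
]


def compare_to(reference, rows=BENCHMARKS):
    """For every system more accurate than `reference`, return
    (name, accuracy gap in points, how many times slower it is)."""
    ref_acc, ref_wps = next(
        (acc, wps) for name, acc, wps in rows if name == reference
    )
    results = []
    for name, acc, wps in rows:
        if acc > ref_acc:
            gap = round(acc - ref_acc, 1)         # accuracy advantage
            slowdown = round(ref_wps / wps, 1)    # speed ratio vs. reference
            results.append((name, gap, slowdown))
    return results


for name, gap, slowdown in compare_to("spaCy"):
    print(f"{name}: +{gap} points, {slowdown}x slower")
```

Only MATE (+0.7 points, about 25x slower) and Turbo (+0.6 points, about 40x slower) beat spaCy on accuracy, which is where the "within 1%" and "20× slower or more" figures come from.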

Parse accuracy

In 2016, Google released its SyntaxNet library, setting a new state of the art for syntactic dependency parsing accuracy; Parsey McParseface is the pretrained English parser released with it. SyntaxNet's parsing algorithm is very similar to spaCy's. The main difference is that SyntaxNet uses a neural network, while spaCy uses a sparse linear model.

System | News | Web | Questions
spaCy | 92.8 | n/a | n/a
Parsey McParseface | 94.15 | 89.08 | 94.77
Martins et al. (2013) | 93.1 | 88.23 | 94.21
Zhang and McDonald (2014) | 93.32 | 88.65 | 93.37
Weiss et al. (2015) | 93.91 | 89.29 | 94.17
Andor et al. (2016) | 94.44 | 90.17 | 95.4
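To make the "sparse linear model" side of the SyntaxNet comparison concrete, here is a minimal sketch, assuming a transition-based parser that picks one action at a time from the current parser state. It is an illustration only, not spaCy's Cython implementation: the `SparseLinearModel` class, the action names and the feature strings are all hypothetical, and a plain perceptron update is used for simplicity. The key property is that only the handful of features active in a given state contribute to each action's score. A neural parser would instead map such features to dense embeddings and pass them through hidden layers.

```python
from collections import defaultdict


class SparseLinearModel:
    """Hypothetical sparse linear classifier: an action's score is the sum
    of the weights of the features active in the current parser state."""

    def __init__(self, classes):
        self.classes = list(classes)
        # feature string -> {class -> weight}; missing entries count as 0.0
        self.weights = defaultdict(lambda: defaultdict(float))

    def scores(self, features):
        totals = {c: 0.0 for c in self.classes}
        for feat in features:
            if feat in self.weights:  # membership test does not create entries
                for cls, w in self.weights[feat].items():
                    totals[cls] += w
        return totals

    def predict(self, features):
        totals = self.scores(features)
        # max() keeps the first class on ties, so class order breaks ties
        return max(self.classes, key=lambda c: totals[c])

    def update(self, features, gold, guess):
        """Perceptron update: reward the gold action, penalize the wrong guess."""
        if gold == guess:
            return
        for feat in features:
            self.weights[feat][gold] += 1.0
            self.weights[feat][guess] -= 1.0


# Usage with hypothetical actions and feature strings
model = SparseLinearModel(["SHIFT", "LEFT-ARC", "RIGHT-ARC"])
state = ["s0.word=saw", "b0.word=her", "s0.tag=VBD"]
print(model.predict(state))  # all weights are zero, so the first class wins: SHIFT
model.update(state, gold="RIGHT-ARC", guess="SHIFT")
print(model.predict(state))  # RIGHT-ARC
```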

Detailed speed comparison

Here we compare the per-document processing time of various spaCy functionalities against other NLP libraries. We show both absolute timings (in ms) and relative performance (normalized to spaCy). Lower is better.

System | Tokenize (ms/doc) | Tag (ms/doc) | Parse (ms/doc) | Tokenize (relative) | Tag (relative) | Parse (relative)
spaCy | 0.2 | 1 | 19 | 1x | 1x | 1x
CoreNLP | 2 | 10 | 49 | 10x | 10x | 2.6x
ZPar | 1 | 8 | 850 | 5x | 8x | 44.7x
NLTK | 4 | 443 | n/a | 20x | 443x | n/a
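The relative columns are simply each absolute timing divided by spaCy's timing for the same step. The sketch below reproduces them from the absolute figures; `TIMINGS_MS` and `relative_to` are illustrative names, and `None` stands in for the table's "n/a".

```python
# Per-document timings in milliseconds, copied from the table above.
# None marks a step with no result in the table ("n/a").
TIMINGS_MS = {
    "spaCy": {"tokenize": 0.2, "tag": 1, "parse": 19},
    "CoreNLP": {"tokenize": 2, "tag": 10, "parse": 49},
    "ZPar": {"tokenize": 1, "tag": 8, "parse": 850},
    "NLTK": {"tokenize": 4, "tag": 443, "parse": None},
}


def relative_to(baseline, timings=TIMINGS_MS):
    """Divide each system's time by the baseline's time for the same step
    (so values above 1 mean slower than the baseline)."""
    base = timings[baseline]
    return {
        system: {
            step: None if ms is None else round(ms / base[step], 1)
            for step, ms in steps.items()
        }
        for system, steps in timings.items()
    }


for system, ratios in relative_to("spaCy").items():
    print(system, ratios)
```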

Named entity comparison

Jiang et al. (2016) present several detailed comparisons of the named entity recognition models provided by spaCy, CoreNLP, NLTK and LingPipe. Here we show their precision, recall and F-measure for person, location and organization entities on Wikipedia.

System | Precision | Recall | F-measure
spaCy | 0.724 | 0.6514 | 0.6858
CoreNLP | 0.7914 | 0.7327 | 0.7609
NLTK | 0.5136 | 0.6532 | 0.575
LingPipe | 0.5412 | 0.5357 | 0.5384
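The F-measure column is derived from the other two: assuming the balanced F-measure (F1), it is the harmonic mean of precision and recall, which reproduces every value in the table. A small sketch, with `f_measure` and `NER_RESULTS` as illustrative names:

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the table above.
NER_RESULTS = {
    "spaCy": (0.724, 0.6514),
    "CoreNLP": (0.7914, 0.7327),
    "NLTK": (0.5136, 0.6532),
    "LingPipe": (0.5412, 0.5357),
}

for system, (p, r) in NER_RESULTS.items():
    print(f"{system}: F = {f_measure(p, r):.4f}")
```

Because the harmonic mean is pulled toward the smaller of the two numbers, spaCy's lower recall (0.6514) weighs more heavily on its F-measure than its precision does.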