Natural Language

in Python

Fastest in the world

spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. Independent research has confirmed that spaCy is the fastest in the world. If your application needs to process entire web dumps, spaCy is the library you want to be using.

Facts & figures

Get things done

spaCy is designed to help you do real work — to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. We like to think of spaCy as the Ruby on Rails of Natural Language Processing.

Get started

Deep learning

spaCy is the best way to prepare text for deep learning. It interoperates seamlessly with TensorFlow, PyTorch, scikit-learn, Gensim and the rest of Python's awesome AI ecosystem. spaCy helps you connect the statistical models trained by these libraries to the rest of your application.

Read more
# Install: pip install spacy && spacy download en
import spacy

# Load English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load('en')

# Process a document, of any size
text = open('war_and_peace.txt').read()
doc = nlp(text)

# Hook in your own deep learning models
similarity_model = load_my_neural_network()
def install_similarity(doc):
    doc.user_hooks['similarity'] = similarity_model

doc1 = nlp(u'the fries were gross')
doc2 = nlp(u'worst fries ever')


  • Non-destructive tokenization
  • Support for 20+ languages
  • 6 statistical models for 5 languages
  • Pre-trained word vectors
  • Easy deep learning integration
  • Part-of-speech tagging
  • Named entity recognition
  • Labelled dependency parsing
  • Syntax-driven sentence segmentation
  • Built in visualizers for syntax and NER
  • Convenient string-to-hash mapping
  • Export to numpy data arrays
  • Efficient binary serialization
  • Easy model packaging and deployment
  • State-of-the-art speed
  • Robust, rigorously evaluated accuracy

New in v2.0
Convolutional neural network models

spaCy v2.0 features new neural models for tagging, parsing and entity recognition. The models have been designed and implemented from scratch specifically for spaCy, to give you an unmatched balance of speed, size and accuracy. A novel bloom embedding strategy with subword features is used to support huge vocabularies in tiny tables. Convolutional layers with residual connections, layer normalization and maxout non-linearity are used, giving much better efficiency than the standard BiLSTM solution. Finally, the parser and NER use an imitation learning objective to deliver accuracy in-line with the latest research systems, even when evaluated from raw text. With these innovations, spaCy v2.0's models are 10× smaller, 20% more accurate, and just as fast as the previous generation.

spaCy is trusted by

Featured on

From the makers of spaCy
Prodigy: Radically efficient machine teaching

Prodigy is an annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. Whether you're working on entity recognition, intent detection or image classification, Prodigy can help you train and evaluate your models faster. Stream in your own examples or real-world data from live APIs, update your model in real-time and chain models together to build more complex systems.


In 2015, independent researchers from Emory University and Yahoo! Labs showed that spaCy offered the fastest syntactic parser in the world and that its accuracy was within 1% of the best available (Choi et al., 2015). spaCy v2.0, released in 2017, is more accurate than any of the systems Choi et al. evaluated.

SystemYearLanguageAccuracySpeed (wps)
spaCy v2.x2017Python / Cython92.6n/a
spaCy v1.x2015Python / Cython91.813,963