Models & Languages
As of v1.7.0, models for spaCy can be installed as Python packages. This means that they're a component of your application, just like any
other module. They're versioned and can be defined as a dependency in your
requirements.txt. Models can be installed from a download URL or a local directory, manually or via pip. Their data can be located anywhere on your file system.
Install a default model, get the code to load it from within spaCy and an example to test it. For more options, see the section on available models below.
English:
spacy download en
import spacy
import en_core_web_sm
nlp = en_core_web_sm.load()   # load via native import
nlp = spacy.load('en')        # or load via shortcut link
doc = nlp(u"This is a sentence.")
print([(w.text, w.pos_) for w in doc])

German:
spacy download de
import spacy
import de_dep_news_sm
nlp = de_dep_news_sm.load()   # load via native import
nlp = spacy.load('de')        # or load via shortcut link
doc = nlp(u"Dies ist ein Satz.")
print([(w.text, w.pos_) for w in doc])

Spanish:
spacy download es
import spacy
import es_core_web_sm
nlp = es_core_web_sm.load()   # load via native import
nlp = spacy.load('es')        # or load via shortcut link
doc = nlp(u"Esto es una frase.")
print([(w.text, w.pos_) for w in doc])

French:
spacy download fr
import spacy
nlp = spacy.load('fr')        # load via shortcut link
doc = nlp(u"C'est une phrase.")
print([(w.text, w.pos_) for w in doc])

Multi-language:
spacy download xx
import spacy
import xx_ent_wiki_sm
nlp = xx_ent_wiki_sm.load()   # load via native import
nlp = spacy.load('xx')        # or load via shortcut link
doc = nlp(u"This is a sentence about Facebook.")
print([(ent.text, ent.label_) for ent in doc.ents])
Model differences are mostly statistical. In general, we do expect larger models to be "better" and more accurate overall. Ultimately, it depends on your use case and requirements, and we recommend starting with the default models (marked with a star below). For a more detailed overview, see the models directory.
|English||Vocabulary, syntax, entities, vectors|
|English||Vocabulary, syntax, entities, vectors|
|Spanish||Vocabulary, syntax, entities, vectors|
Installing and using models
The easiest way to download a model is via spaCy's
download command. It takes care of finding the best-matching model compatible with your spaCy installation.
# out-of-the-box: download best-matching default model
spacy download en
spacy download de
spacy download es
spacy download fr
spacy download xx

# download best-matching version of specific model for your spaCy installation
spacy download en_core_web_sm

# download exact model version (doesn't create shortcut link)
spacy download en_core_web_sm-2.0.0 --direct
The download command will install the model via pip, place the package in your
site-packages directory and create a shortcut link that lets you load the model by a custom name. The shortcut link will be the same as the model name used in
pip install spacy
spacy download en

import spacy
nlp = spacy.load('en')
doc = nlp(u'This is a sentence.')
Installation via pip
To download a model directly using pip, simply point
pip install to the URL or local path of the archive file. To find the direct link to a model, head over to the model releases, right click on the archive link and copy it to your clipboard.
# with external URL
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-1.2.0/en_core_web_md-1.2.0.tar.gz

# with local file
pip install /Users/you/en_core_web_md-1.2.0.tar.gz
By default, this will install the model into your
site-packages directory. You can then use
spacy.load() to load it via its package name, create a shortcut link to assign it a custom name, or import it explicitly as a module. If you need to download models as part of an automated process, we
recommend using pip with a direct link, instead of relying on spaCy's
Manual download and installation
In some cases, you might prefer downloading the data manually, for example to place it into a custom directory. You can download the model via your browser from the latest releases, or configure your own download script using the URL of the archive file. The archive consists of a model directory that contains another directory with the model data.
└── en_core_web_md-1.2.0.tar.gz       # downloaded archive
    ├── meta.json                     # model meta data
    ├── setup.py                      # setup file for pip installation
    └── en_core_web_md                # 📦 model package
        ├── __init__.py               # init for pip installation
        ├── meta.json                 # model meta data
        └── en_core_web_md-1.2.0      # model data
You can place the model package directory anywhere on your local file system. To use it with spaCy, simply assign it a name by creating a shortcut link for the data directory.
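As a rough illustration of the archive layout described above, the following sketch locates the data directory inside an unpacked model package. It relies only on the layout shown in this section (a package directory containing a subdirectory named after the package plus a version suffix). The helper name find_model_data_dir is hypothetical and not part of spaCy.

```python
from pathlib import Path


def find_model_data_dir(package_dir):
    """Return the versioned data directory inside an unpacked model package.

    Hypothetical helper: assumes the layout <package>/<package>-<version>,
    e.g. en_core_web_md/en_core_web_md-1.2.0.
    """
    package_dir = Path(package_dir)
    prefix = package_dir.name + "-"
    # Collect subdirectories whose name is the package name plus a version suffix.
    candidates = sorted(
        child for child in package_dir.iterdir()
        if child.is_dir() and child.name.startswith(prefix)
    )
    if len(candidates) != 1:
        raise FileNotFoundError(
            "Expected exactly one data directory in {}, found {}".format(
                package_dir, len(candidates)
            )
        )
    return candidates[0]
```

The returned path is the kind of location you would point a shortcut link or spacy.load() at.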
Using models with spaCy
To load a model, use
spacy.load() with the model's shortcut link, package name or a path to the data directory:
import spacy
nlp = spacy.load('en')                       # load model with shortcut link "en"
nlp = spacy.load('en_core_web_sm')           # load model package "en_core_web_sm"
nlp = spacy.load('/path/to/en_core_web_sm')  # load package from a directory
doc = nlp(u'This is a sentence.')
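Because spacy.load() accepts three kinds of strings, it can help to check up front which kind a given name is likely to be. The sketch below is only an illustration using the standard library; it is not spaCy's actual resolution logic, and the function name describe_model_name is hypothetical. It cannot detect shortcut links, so anything that is neither an existing path nor an importable package is reported as "shortcut or missing".

```python
import importlib.util
from pathlib import Path


def describe_model_name(name):
    """Guess how a string passed to spacy.load() would have to be resolved.

    Hypothetical helper, not part of spaCy.
    """
    # An existing file system location is treated as a path to model data.
    if Path(name).exists():
        return "path"
    # Otherwise, check whether an importable package of that name is installed.
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for names that are not valid importable module names.
        spec = None
    if spec is not None:
        return "package"
    # Neither a path nor a package: either a shortcut link or not installed.
    return "shortcut or missing"
```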
Using custom shortcut links
While previous versions of spaCy required you to maintain a data directory containing the models for each installation, you can now choose how and where you want to keep your data. For example, you could download all models manually and put them into a local directory. Whenever your spaCy projects need a model, you create a shortcut link to tell spaCy to load it from there. This means you'll never end up with duplicate data.
link command will create a symlink in the
spacy link [package name or path] [shortcut] [--force]
The first argument is the package name (if the model was installed via pip), or a local path to the model package. The second argument is the internal name you want to use for the model. Setting the
--force flag will overwrite any existing links.
# set up shortcut link to load installed package as "en_default"
spacy link en_core_web_md en_default

# set up shortcut link to load local model as "my_amazing_model"
spacy link /Users/you/model my_amazing_model
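Conceptually, a shortcut link is a symlink that maps a short name to a model location. The following sketch shows that idea with the standard library only. It is not how spaCy implements the link command: the function create_shortcut and the links_dir argument are hypothetical, and creating symlinks may need extra privileges on Windows.

```python
from pathlib import Path


def create_shortcut(model_path, links_dir, shortcut, force=False):
    """Create a symlink links_dir/shortcut pointing at model_path.

    Hypothetical illustration of a shortcut link, not spaCy's implementation.
    """
    target = Path(model_path).resolve()
    link_path = Path(links_dir) / shortcut
    if link_path.is_symlink():
        if not force:
            raise FileExistsError("Link '{}' already exists".format(shortcut))
        # With force=True, replace the existing link (mirrors the --force flag).
        link_path.unlink()
    elif link_path.exists():
        # Never delete a real file or directory that happens to use the name.
        raise FileExistsError("'{}' exists and is not a link".format(link_path))
    link_path.symlink_to(target, target_is_directory=True)
    return link_path
```

Keeping the links in one directory and the data elsewhere is what lets several projects share a single copy of a model.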
Importing models as modules
If you've installed a model via spaCy's downloader, or directly via pip, you can also
import it and then call its
load() method with no arguments:
import en_core_web_md
nlp = en_core_web_md.load()
doc = nlp(u'This is a sentence.')
How you choose to load your models ultimately depends on personal preference. However, for larger code bases, we usually recommend native imports, as this will make it easier to integrate models with your
existing build process, continuous integration workflow and testing
framework. It'll also prevent you from ever trying to load a model that is not installed, as your code will raise an
ImportError immediately, instead of failing somewhere down the line when calling
Using your own models
If you've trained your own model, for example for additional languages or custom named entities, you can save its state using the
Language.to_disk() method. To make the model more convenient to deploy, we recommend
wrapping it as a Python package.
spaCy currently provides models for the following languages:
Alpha tokenization support
Work has started on the following languages. You can help by improving the existing language data and extending the tokenization patterns.
As of v2.0, spaCy supports models trained on more than one language. This
is especially useful for named entity recognition. The language ID used for multi-language or language-neutral models is
xx. The language class, a generic subclass containing only the base language data, can be found in
To load your model with the neutral, multi-language class, simply set
"language": "xx" in your model package's meta.json. You can also import the class directly, or call
util.get_lang_class() for lazy-loading.
from spacy.lang.xx import MultiLanguage
nlp = MultiLanguage()

from spacy.util import get_lang_class
MultiLanguage = get_lang_class('xx')  # returns the language class, not an instance
nlp = MultiLanguage()
Using models in production
If your application depends on one or more models, you'll usually want to integrate them into your continuous integration workflow and build process. While spaCy provides a range of useful helpers for downloading, linking and loading models, the underlying functionality is entirely based on native Python packages. This allows your application to handle a model like any other package dependency.
Downloading and requiring model dependencies
The download command is mostly intended as a convenient, interactive wrapper. It performs
compatibility checks and prints detailed error messages and warnings.
However, if you're downloading models as part of an automated build
process, this only adds an unnecessary layer of complexity. If you know
which models your application needs, you should specify them directly.
Because all models are valid Python packages, you can add them to your application's
requirements.txt. If you're running your own internal PyPI installation, you can simply upload the models there. pip's requirements file format supports both package names to download via a PyPI server, as well as direct
spacy>=2.0.0,<3.0.0
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm
Specifying #egg= with the package name tells pip which package to expect from the download URL. This way, the
package won't be re-downloaded and overwritten if it's already
installed, just like when you're downloading a package from PyPI.
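If you maintain several model dependencies, you may want to generate the requirement lines rather than copy URLs by hand. The sketch below assumes that release archives follow the URL pattern visible in the examples in this document (name-version/name-version.tar.gz under the spacy-models releases). The function name model_requirement is hypothetical.

```python
# Base URL taken from the download examples in this document.
RELEASES_URL = "https://github.com/explosion/spacy-models/releases/download"


def model_requirement(name, version):
    """Build a requirements.txt line for a model archive.

    Hypothetical helper: assumes the archive is published as
    <name>-<version>/<name>-<version>.tar.gz under RELEASES_URL.
    """
    archive = "{}-{}".format(name, version)
    # The #egg= fragment tells pip which package the URL provides.
    return "{base}/{archive}/{archive}.tar.gz#egg={name}".format(
        base=RELEASES_URL, archive=archive, name=name
    )


if __name__ == "__main__":
    # Print a small requirements file for spaCy plus one model.
    print("spacy>=2.0.0,<3.0.0")
    print(model_requirement("en_core_web_sm", "2.0.0"))
```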
All models are versioned and specify their spaCy dependency. This ensures
cross-compatibility and lets you specify exact version requirements for
each model. If you've trained your own model, you can use the
package command to generate the required meta data and turn it into a loadable package.
Loading and testing models
Downloading models directly via pip won't call spaCy's
link command, which creates symlinks for model shortcuts. This means that you'll have to run this command separately, or use the native
import syntax to load the models:
import en_core_web_sm
nlp = en_core_web_sm.load()
In general, this approach is recommended for larger code bases, as it's
more "native", and doesn't depend on symlinks or rely on spaCy's loader
to resolve string names to model packages. If a model can't be imported, Python will raise an
ImportError immediately. And if a model is imported but not used, any linter will catch that.
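When the model name comes from configuration rather than a hard-coded import statement, the same fail-fast behavior can be kept by importing the package dynamically. This is a minimal sketch, assuming the model package exposes a load() function as described in this document. The helper name load_model_package is hypothetical.

```python
import importlib


def load_model_package(package_name):
    """Import an installed model package by name and call its load() function.

    Hypothetical helper: fails immediately with a descriptive ImportError
    if the package is not installed.
    """
    try:
        module = importlib.import_module(package_name)
    except ImportError as err:
        raise ImportError(
            "Model package '{}' is not installed. Add it to your "
            "requirements.txt or install it via pip.".format(package_name)
        ) from err
    # Model packages expose load(), which takes no arguments.
    return module.load()
```

Calling the helper once at application startup surfaces a missing model before any text is processed.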
Similarly, it'll give you more flexibility when writing tests that
require loading models. For example, instead of writing your own
try/except logic around spaCy's loader, you can use pytest's
importorskip() method to only run a test if a specific model or model version is installed. Each model package exposes a
__version__ attribute which you can also use to perform your own version compatibility checks
before loading a model.
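A version check based on the __version__ attribute might look like the sketch below. It is only an illustration: the helper names parse_version and is_compatible are hypothetical, and the comparison looks at the numeric major, minor and patch parts only, ignoring pre-release suffixes such as "a4".

```python
import re


def parse_version(version):
    """Turn a version string like '2.0.0' or '2.0.0a4' into a tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits of each component.
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '2.0' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_compatible(model_module, min_version, max_version=None):
    """Check that min_version <= model_module.__version__ < max_version."""
    found = parse_version(model_module.__version__)
    if found < parse_version(min_version):
        return False
    if max_version is not None and found >= parse_version(max_version):
        return False
    return True
```

In a test suite, pytest's importorskip() covers the common case directly: it accepts a module name and an optional minversion argument, and skips the test if the import fails or the module's __version__ is too old.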