Languages

spaCy currently provides models for the following languages and capabilities:

Language | Code | Token | SBD | Lemma | POS | NER | Dep | Vector | Sentiment
English | en
German | de
French | fr
Spanish | es

See available models

Alpha tokenization support

Work has started on the following languages. You can help by improving the existing language data and extending the tokenization patterns.
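To give a feel for what "language data" and "tokenization patterns" mean, here is a deliberately simplified sketch in plain Python. It splits on whitespace, then checks a table of special-case exceptions, then peels off prefix and suffix characters. The names TOKENIZER_EXCEPTIONS, PREFIXES, SUFFIXES and tokenize() are hypothetical stand-ins. spaCy's real tokenizer and the data under lang/<code> are more elaborate (for example, they also handle infixes), so treat this as an illustration of the idea rather than the library's algorithm.

```python
import re

# Hypothetical, heavily simplified "language data" for illustration only.
TOKENIZER_EXCEPTIONS = {
    "don't": ["do", "n't"],  # special case: split a contraction
    "e.g.": ["e.g."],        # special case: keep the abbreviation whole
}
PREFIXES = re.compile("^[(\"']")      # opening bracket or quote at the start
SUFFIXES = re.compile("[)\"'.,!?]$")  # closing punctuation at the end


def tokenize(text):
    """Split on whitespace, then strip prefixes/suffixes until an exception matches."""
    tokens = []
    for chunk in text.split():
        prefixes, suffixes = [], []
        while chunk:
            if chunk in TOKENIZER_EXCEPTIONS:
                break  # exceptions take priority over punctuation rules
            match = PREFIXES.search(chunk)
            if match:
                prefixes.append(match.group())
                chunk = chunk[match.end():]
                continue
            match = SUFFIXES.search(chunk)
            if match:
                suffixes.insert(0, match.group())  # keep suffixes in original order
                chunk = chunk[:match.start()]
                continue
            break
        tokens.extend(prefixes)
        if chunk:
            tokens.extend(TOKENIZER_EXCEPTIONS.get(chunk, [chunk]))
        tokens.extend(suffixes)
    return tokens


print(tokenize("I don't know, e.g. this."))
```

Improving a language's support largely means growing tables like these (more exceptions, better punctuation rules) so that real text is split the way native speakers expect.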

Language | Code | Source
Italian | it | lang/it
Portuguese | pt | lang/pt
Dutch | nl | lang/nl
Swedish | sv | lang/sv
Finnish | fi | lang/fi
Norwegian Bokmål | nb | lang/nb
Danish | da | lang/da
Hungarian | hu | lang/hu
Polish | pl | lang/pl
Bengali | bn | lang/bn
Hebrew | he | lang/he
Chinese | zh | lang/zh
Japanese | ja | lang/ja

Multi-language support

As of v2.0, spaCy supports models trained on more than one language. This is especially useful for named entity recognition. The language ID used for multi-language or language-neutral models is xx. The corresponding language class, a generic subclass containing only the base language data, can be found in lang/xx.

To load your model with the neutral, multi-language class, set "lang": "xx" in your model package's meta.json. You can also import the class directly, or call util.get_lang_class() for lazy-loading.
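A minimal sketch of the meta.json side, using only the standard library: it writes a package meta file whose "lang" field holds the multi-language ID, then reads it back the way a loader would before choosing a language class. spaCy's packaged models store the language ID under "lang" and also carry "name" and "version"; the helpers write_meta() and read_lang_id() and the package name multi_ner_sketch are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_meta(package_dir, lang, name, version):
    """Write a minimal meta.json (hypothetical helper; real packages carry more fields)."""
    meta = {"lang": lang, "name": name, "version": version}
    path = Path(package_dir) / "meta.json"
    path.write_text(json.dumps(meta, indent=2), encoding="utf8")
    return path


def read_lang_id(package_dir):
    """Return the language ID a loader would use to pick the Language subclass."""
    path = Path(package_dir) / "meta.json"
    meta = json.loads(path.read_text(encoding="utf8"))
    return meta["lang"]


with tempfile.TemporaryDirectory() as tmp:
    write_meta(tmp, lang="xx", name="multi_ner_sketch", version="0.0.0")
    lang_id = read_lang_id(tmp)

print(lang_id)
```

Because the language ID lives in the package metadata rather than in code, switching a model to the neutral class is a one-field change.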

Standard import

from spacy.lang.xx import MultiLanguage
nlp = MultiLanguage()

With lazy-loading

from spacy.util import get_lang_class
nlp = get_lang_class('xx')()  # get_lang_class() returns the class; calling it creates the nlp object
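Lazy-loading means a language module is imported only when its class is first requested, rather than when spaCy itself is imported. The sketch below shows the general pattern with importlib and a cache. lazy_get_class() is a hypothetical helper, not spaCy's get_lang_class(), and it is demonstrated on a standard-library class so it runs without spaCy installed.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def lazy_get_class(module_name, attr_name):
    """Import module_name on the first request only, then return one of its attributes.

    lru_cache memoizes the result, so repeated calls return the cached class
    without going through the import machinery again.
    """
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Demonstrated on a standard-library class so the sketch runs anywhere.
ordered_dict_cls = lazy_get_class("collections", "OrderedDict")
print(ordered_dict_cls.__name__)
```

With spaCy installed, lazy_get_class("spacy.lang.xx", "MultiLanguage") would return the same class as the standard import, but only pay the import cost when it is actually called.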