
Top-level Functions

spacy.load

Load a model via its shortcut link, the name of an installed model package, a unicode path or a Path-like object. spaCy will try resolving the load argument in this order. If a model is loaded from a shortcut link or package name, spaCy will assume it's a Python package, import it and call the model's own load() method. If a model is loaded from a path, spaCy will assume it's a data directory, read the language and pipeline settings off the meta.json and initialise the Language class. The data will be loaded in via Language.from_disk().

Name | Type | Description
name | unicode or Path | Model to load, i.e. shortcut link, package name or path.
disable | list | Names of pipeline components to disable.
returns | Language | A Language object with the loaded model.
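
The resolution order described above can be sketched in plain Python. This is only an illustration of the lookup order, not spaCy's implementation: the function name resolve_load_target and the shortcut_links and installed_packages arguments are hypothetical stand-ins for whatever spaCy consults internally.

```python
from pathlib import Path


def resolve_load_target(name, shortcut_links, installed_packages):
    """Return (kind, target) for a load argument, trying sources in order.

    shortcut_links (dict) and installed_packages (set) are hypothetical
    stand-ins used for illustration only.
    """
    name_str = str(name)
    # 1. Shortcut link: a custom name that points at a model.
    if name_str in shortcut_links:
        return ("link", shortcut_links[name_str])
    # 2. Name of an installed model package.
    if name_str in installed_packages:
        return ("package", name_str)
    # 3. Path to a model data directory.
    path = Path(name_str)
    if path.exists():
        return ("path", path)
    raise IOError("Can't find model %r" % name_str)


links = {"en": "en_core_web_sm"}
packages = {"en_core_web_sm"}
print(resolve_load_target("en", links, packages))  # ('link', 'en_core_web_sm')
```

Because the link lookup comes first, a shortcut link named like a package would take precedence in this sketch.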

Essentially, spacy.load() is a convenience wrapper that reads the language ID and pipeline components from a model's meta.json, initialises the Language class, loads in the model data and returns it.

Abstract example

cls = util.get_lang_class(lang)  # get language for ID, e.g. 'en'
nlp = cls()                      # initialise the language
for name in pipeline:
    component = nlp.create_pipe(name)  # create each pipeline component
    nlp.add_pipe(component)            # add component to pipeline
nlp.from_disk(model_data_path)   # load in model data

spacy.blank

Create a blank model of a given language class. This function is the twin of spacy.load().

Name | Type | Description
name | unicode | ISO code of the language class to load.
disable | list | Names of pipeline components to disable.
returns | Language | An empty Language object of the appropriate subclass.

spacy.info

The same as the info command. Pretty-print information about your installation, models and local setup from within spaCy. To get the model meta data as a dictionary instead, you can use the meta attribute on your nlp object with a loaded model, e.g. nlp.meta.

Name | Type | Description
model | unicode | A model, i.e. shortcut link, package name or path (optional).
markdown | bool | Print information as Markdown.

spacy.explain

Get a description for a given POS tag, dependency label or entity type. For a list of available terms, see glossary.py.

Name | Type | Description
term | unicode | Term to explain.
returns | unicode | The explanation, or None if not found in the glossary.
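
The lookup behaviour can be pictured as a dictionary lookup that falls back to None. The sketch below is not spaCy's code: the two glossary entries are an illustrative excerpt, and their wording is an assumption that may differ from the real glossary.py.

```python
# Illustrative excerpt only; the real glossary.py holds many more entries
# and the exact wording here is an assumption.
GLOSSARY = {
    "NN": "noun, singular or mass",
    "NORP": "Nationalities or religious or political groups",
}


def explain(term):
    """Return the description for a term, or None if it is not listed."""
    return GLOSSARY.get(term)


print(explain("NORP"))
print(explain("NOT_A_LABEL"))  # None
```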

displaCy

As of v2.0, spaCy comes with a built-in visualization suite. For more info and examples, see the usage guide on visualizing spaCy.

displacy.serve

Serve a dependency parse tree or named entity visualization to view it in your browser. Will run a simple web server.

Name | Type | Description | Default
docs | list or Doc | Document(s) to visualize. |
style | unicode | Visualization style, 'dep' or 'ent'. | 'dep'
page | bool | Render markup as full HTML page. | True
minify | bool | Minify HTML markup. | False
options | dict | Visualizer-specific options, e.g. colors. | {}
manual | bool | Don't parse Doc and instead, expect a dict or list of dicts. See the usage guide on visualizing spaCy for formats and examples. | False
port | int | Port to serve visualization. | 5000

displacy.render

Render a dependency parse tree or named entity visualization.

Name | Type | Description | Default
docs | list or Doc | Document(s) to visualize. |
style | unicode | Visualization style, 'dep' or 'ent'. | 'dep'
page | bool | Render markup as full HTML page. | False
minify | bool | Minify HTML markup. | False
jupyter | bool | Explicitly enable "Jupyter mode" to return markup ready to be rendered in a notebook. | detected automatically
options | dict | Visualizer-specific options, e.g. colors. | {}
manual | bool | Don't parse Doc and instead, expect a dict or list of dicts. See the usage guide on visualizing spaCy for formats and examples. | False
returns | unicode | Rendered HTML markup. |
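
When manual is set, the visualizer expects plain dicts instead of a Doc. The sketch below builds one dict per style. The key names ("text", "ents", "title", "words", "arcs", "dir") follow the format as we understand it from the visualizers usage guide; treat them as assumptions and check the guide before relying on them. The displacy call itself is left as a comment so the snippet runs without spaCy installed.

```python
# Entity style: raw text plus character offsets for each entity.
text = "But Google is starting from behind."
start = text.find("Google")
ent_data = {
    "text": text,
    "ents": [{"start": start, "end": start + len("Google"), "label": "ORG"}],
    "title": None,
}

# Dependency style: words with tags, and arcs between word indices.
dep_data = {
    "words": [
        {"text": "This", "tag": "DT"},
        {"text": "is", "tag": "VBZ"},
        {"text": "a", "tag": "DT"},
        {"text": "sentence", "tag": "NN"},
    ],
    "arcs": [
        {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
        {"start": 2, "end": 3, "label": "det", "dir": "left"},
        {"start": 1, "end": 3, "label": "attr", "dir": "right"},
    ],
}

# With spaCy installed, the dicts would be passed in like this:
# from spacy import displacy
# html = displacy.render(ent_data, style="ent", manual=True)
```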

Visualizer options

The options argument lets you specify additional settings for each visualizer. If a setting is not present in the options, the default value will be used.

Dependency Visualizer options

Name | Type | Description | Default
collapse_punct | bool | Attach punctuation to tokens. Can make the parse more readable, as it prevents long arcs to attach punctuation. | True
compact | bool | "Compact mode" with square arrows that takes up less space. | False
color | unicode | Text color (HEX, RGB or color names). | '#000000'
bg | unicode | Background color (HEX, RGB or color names). | '#ffffff'
font | unicode | Font name or font family for all text. | 'Arial'
offset_x | int | Spacing on left side of the SVG in px. | 50
arrow_stroke | int | Width of arrow path in px. | 2
arrow_width | int | Width of arrow head in px. | 10 / 8 (compact)
arrow_spacing | int | Spacing between arrows in px to avoid overlaps. | 20 / 12 (compact)
word_spacing | int | Vertical spacing between words and arcs in px. | 45
distance | int | Distance between words in px. | 175 / 85 (compact)
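
The fallback to defaults, including the values that depend on compact mode, can be sketched as a dictionary merge. The helper name resolve_dep_options is hypothetical and this is not displaCy's own code. The default values are the ones listed in the table above.

```python
def resolve_dep_options(options=None):
    """Merge user options over the documented dependency defaults."""
    options = dict(options or {})
    compact = options.get("compact", False)
    settings = {
        "collapse_punct": True,
        "compact": False,
        "color": "#000000",
        "bg": "#ffffff",
        "font": "Arial",
        "offset_x": 50,
        "arrow_stroke": 2,
        # These three defaults are smaller in compact mode.
        "arrow_width": 8 if compact else 10,
        "arrow_spacing": 12 if compact else 20,
        "word_spacing": 45,
        "distance": 85 if compact else 175,
    }
    # Anything the user supplied wins over the default.
    settings.update(options)
    return settings


print(resolve_dep_options({"compact": True, "font": "Source Sans Pro"})["distance"])  # 85
```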

Named Entity Visualizer options

Name | Type | Description | Default
ents | list | Entity types to highlight (None for all types). | None
colors | dict | Color overrides. Entity types in uppercase should be mapped to color names or values. | {}

By default, displaCy comes with colours for all entity types supported by spaCy. If you're using custom entity types, you can use the colors setting to add your own colours for them.
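
A minimal sketch of how colour overrides for custom entity types might be combined with a default palette. The helper merge_entity_colors and the DEFAULT_COLORS palette are hypothetical placeholders, not displaCy's internals or its actual default colours.

```python
# Placeholder palette for illustration; not necessarily displaCy's defaults.
DEFAULT_COLORS = {"ORG": "#7aecec", "PERSON": "#aa9cfc"}


def merge_entity_colors(default_colors, options=None):
    """Return the default palette with user overrides applied."""
    options = options or {}
    colors = dict(default_colors)
    # Entity types are expected in uppercase, so normalise the user's keys.
    for label, color in options.get("colors", {}).items():
        colors[label.upper()] = color
    return colors


options = {"ents": ["ORG", "MY_TYPE"], "colors": {"MY_TYPE": "teal"}}
print(merge_entity_colors(DEFAULT_COLORS, options))
```

The same options dict would then be passed as the options argument of displacy.render or displacy.serve.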

Utility functions

spaCy comes with a small collection of utility functions located in spacy/util.py. Because utility functions are mostly intended for internal use within spaCy, their behaviour may change with future releases. The functions documented on this page should be safe to use and we'll try to ensure backwards compatibility. However, we recommend having additional tests in place if your application depends on any of spaCy's utilities.

util.get_data_path

Get path to the data directory where spaCy looks for models. Defaults to spacy/data.

Name | Type | Description
require_exists | bool | Only return path if it exists, otherwise return None.
returns | Path / None | Data path or None.

util.set_data_path

Set custom path to the data directory where spaCy looks for models.

Name | Type | Description
path | unicode or Path | Path to new data directory.
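
The pairing of get_data_path and set_data_path can be pictured as a module-level path with a getter and a setter. The sketch below is a stand-alone approximation under that assumption, not the code in spacy/util.py. The initial value and the default of require_exists are assumptions.

```python
from pathlib import Path

# Stand-in for the default location; the real default is relative to the
# installed spacy package.
_data_path = Path("spacy") / "data"


def get_data_path(require_exists=True):
    """Return the data path, or None if it must exist but does not."""
    if not require_exists:
        return _data_path
    return _data_path if _data_path.exists() else None


def set_data_path(path):
    """Point later lookups at a custom data directory."""
    global _data_path
    _data_path = Path(path)


set_data_path(".")
print(get_data_path())  # .
```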

util.get_lang_class

Import and load a Language class. Allows lazy-loading language data and importing languages using the two-letter language code.

Name | Type | Description
lang | unicode | Two-letter language code, e.g. 'en'.
returns | Language | Language class.

util.load_model

Load a model from a shortcut link, package or data path. If called with a shortcut link or package name, spaCy will assume the model is a Python package and import and call its load() method. If called with a path, spaCy will assume it's a data directory, read the language and pipeline settings from the meta.json and initialise a Language class. The model data will then be loaded in via Language.from_disk().

Name | Type | Description
name | unicode | Package name, shortcut link or model path.
**overrides | - | Specific overrides, like pipeline components to disable.
returns | Language | Language class with the loaded model.

util.load_model_from_path

Load a model from a data directory path. Creates the Language class and pipeline based on the directory's meta.json and then calls from_disk() with the path. This function also makes it easy to test a new model that you haven't packaged yet.

Name | Type | Description
model_path | unicode | Path to model data directory.
meta | dict | Model meta data. If False, spaCy will try to load the meta from a meta.json in the same directory.
**overrides | - | Specific overrides, like pipeline components to disable.
returns | Language | Language class with the loaded model.

util.load_model_from_init_py

A helper function to use in the load() method of a model package's __init__.py.

Name | Type | Description
init_file | unicode | Path to model's __init__.py, i.e. __file__.
**overrides | - | Specific overrides, like pipeline components to disable.
returns | Language | Language class with the loaded model.

util.get_model_meta

Get a model's meta.json from a directory path and validate its contents.

Name | Type | Description
path | unicode or Path | Path to model directory.
returns | dict | The model's meta data.
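
Reading and validating a meta.json could look roughly like the sketch below. The function name read_model_meta is hypothetical, and the list of required settings is an assumption made for illustration. The source only states that the contents are validated.

```python
import json
from pathlib import Path

# Assumed set of required keys; adjust to whatever the validation demands.
REQUIRED_SETTINGS = ("lang", "name", "version")


def read_model_meta(model_dir):
    """Load meta.json from a model directory and check required settings."""
    model_path = Path(model_dir)
    if not model_path.exists():
        raise IOError("Could not find model directory: %s" % model_path)
    meta_path = model_path / "meta.json"
    if not meta_path.is_file():
        raise IOError("Could not read meta.json from %s" % meta_path)
    with meta_path.open(encoding="utf8") as f:
        meta = json.load(f)
    for setting in REQUIRED_SETTINGS:
        # Missing and empty values are both treated as invalid.
        if not meta.get(setting):
            raise ValueError("No valid '%s' setting found in meta.json" % setting)
    return meta
```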

util.is_package

Check if string maps to a package installed via pip. Mainly used to validate model packages.

Name | Type | Description
name | unicode | Name of package.
returns | bool | True if installed package, False if not.

util.get_package_path

Get path to an installed package. Mainly used to resolve the location of model packages. Currently imports the package to find its path.

Name | Type | Description
package_name | unicode | Name of installed package.
returns | Path | Path to model package directory.

util.is_in_jupyter

Check if user is running spaCy from a Jupyter notebook by detecting the IPython kernel. Mainly used for the displacy visualizer.

Name | Type | Description
returns | bool | True if in Jupyter, False if not.

util.update_exc

Update, validate and overwrite tokenizer exceptions. Used to combine global exceptions with custom, language-specific exceptions. Will raise an error if key doesn't match ORTH values.

Name | Type | Description
base_exceptions | dict | Base tokenizer exceptions.
*addition_dicts | dicts | Exception dictionaries to add to the base exceptions, in order.
returns | dict | Combined tokenizer exceptions.
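
The combine-and-validate behaviour can be sketched as follows. This is an approximation, not the library's code: the function name combine_exceptions is hypothetical, and the plain string key "ORTH" stands in for the ORTH attribute ID that spaCy uses in real exception dicts.

```python
def combine_exceptions(base_exceptions, *addition_dicts):
    """Merge exception dicts in order, checking keys against ORTH values."""
    combined = dict(base_exceptions)
    for additions in addition_dicts:
        for orth, token_attrs in additions.items():
            # The key must equal the concatenation of the tokens' ORTH values.
            described = "".join(attrs["ORTH"] for attrs in token_attrs)
            if orth != described:
                raise ValueError(
                    "Key %r doesn't match ORTH values %r" % (orth, described)
                )
        # Later dicts overwrite earlier entries with the same key.
        combined.update(additions)
    return combined


base = {"don't": [{"ORTH": "do"}, {"ORTH": "n't"}]}
custom = {"can't": [{"ORTH": "ca"}, {"ORTH": "n't"}]}
print(sorted(combine_exceptions(base, custom)))  # ["can't", "don't"]
```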

util.prints

Print a formatted, text-wrapped message with optional title. If a text argument is a Path, it's converted to a string. Should only be used for interactive components like the command-line interface.

Name | Type | Description
*texts | unicode | Texts to print. Each argument is rendered as paragraph.
**kwargs | - | title is rendered as coloured headline. exits performs system exit after printing, using the value of the argument as the exit code, e.g. exits=1.
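
A rough stand-alone sketch of this behaviour, using the standard library's textwrap. The names format_message and print_message, the width of 80 and the paragraph separator are assumptions. Colouring of the title is omitted.

```python
import sys
import textwrap
from pathlib import Path


def format_message(*texts, title=None, width=80):
    """Wrap each text as its own paragraph, with an optional title first."""
    paragraphs = [title] if title else []
    for text in texts:
        # str() also converts Path arguments to plain strings.
        paragraphs.append(textwrap.fill(str(text), width=width))
    return "\n\n".join(paragraphs)


def print_message(*texts, title=None, exits=None):
    """Print the formatted message and optionally exit with a status code."""
    print(format_message(*texts, title=title))
    if exits is not None:
        sys.exit(exits)


print_message("Model data found at:", Path("models") / "en", title="Info")
```

Passing exits=1 to print_message would end the process with exit code 1 after printing, which is why a helper like this only suits interactive tools.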

Compatibility functions

All Python code is written in an intersection of Python 2 and Python 3. This is easy in Cython, but somewhat ugly in Python. Logic that deals with Python or platform compatibility lives only in spacy.compat. To distinguish them from the builtin functions, replacement functions are suffixed with an underscore, e.g. unicode_. For specific checks, spaCy uses the six and ftfy packages.

Name | Python 2 | Python 3
compat.bytes_ | str | bytes
compat.unicode_ | unicode | str
compat.basestring_ | basestring | str
compat.input_ | raw_input | input
compat.json_dumps | ujson.dumps with .decode('utf8') | ujson.dumps
compat.path2str | str(path) with .decode('utf8') | str(path)
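
The mapping in the table can be reproduced with a version check at import time. The sketch below mirrors the table using only the standard library. It is an illustration of the pattern, not the contents of spacy.compat, and it leaves out json_dumps to avoid depending on ujson.

```python
import sys

is_python2 = sys.version_info[0] == 2

if is_python2:
    # These names only exist on Python 2, so this branch never runs on 3.
    bytes_ = str
    unicode_ = unicode  # noqa: F821
    basestring_ = basestring  # noqa: F821
    input_ = raw_input  # noqa: F821
else:
    bytes_ = bytes
    unicode_ = str
    basestring_ = str
    input_ = input


def path2str(path):
    """Convert a path to text on both Python versions."""
    text = str(path)
    return text.decode("utf8") if is_python2 else text


print(isinstance(u"caf\xe9", unicode_))  # True
```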

compat.is_config

Check if a specific configuration of Python version and operating system matches the user's setup. Mostly used to display targeted error messages.

Name | Type | Description
python2 | bool | spaCy is executed with Python 2.x.
python3 | bool | spaCy is executed with Python 3.x.
windows | bool | spaCy is executed on Windows.
linux | bool | spaCy is executed on Linux.
osx | bool | spaCy is executed on OS X or macOS.
returns | bool | Whether the specified configuration matches the user's platform.
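
A minimal sketch of such a check, assuming that arguments left as None are ignored and every other argument must equal the detected value. The platform detection via sys.platform is our assumption and not necessarily how spacy.compat does it.

```python
import sys

is_python2 = sys.version_info[0] == 2
is_python3 = sys.version_info[0] == 3
is_windows = sys.platform.startswith("win")
is_linux = sys.platform.startswith("linux")
is_osx = sys.platform == "darwin"


def is_config(python2=None, python3=None, windows=None, linux=None, osx=None):
    """Return True if every specified setting matches the current setup."""
    checks = [
        (python2, is_python2),
        (python3, is_python3),
        (windows, is_windows),
        (linux, is_linux),
        (osx, is_osx),
    ]
    # Settings left as None are not part of the query and always pass.
    return all(wanted is None or wanted == actual for wanted, actual in checks)


if is_config(python2=True, windows=True):
    print("Running Python 2 on Windows")
```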

Command line

As of v1.7.0, spaCy comes with new command line helpers to download and link models and show useful debugging information. For a list of available commands, type spacy --help.

Download

Download models for spaCy. The downloader finds the best-matching compatible version, uses pip to download the model as a package and automatically creates a shortcut link to load the model by name. Direct downloads don't perform any compatibility checks and require the model name to be specified with its version (e.g., en_core_web_sm-1.2.0).

spacy download [model] [--direct]
Argument | Type | Description
model | positional | Model name or shortcut (en, de, vectors).
--direct, -d | flag | Force direct download of exact model version.
--help, -h | flag | Show help message and available arguments.

Link

Create a shortcut link for a model, either a Python package or a local directory. This will let you load models from any location using a custom name via spacy.load().

spacy link [origin] [link_name] [--force]
Argument | Type | Description
origin | positional | Model name if package, or path to local directory.
link_name | positional | Name of the shortcut link to create.
--force, -f | flag | Force overwriting of existing link.
--help, -h | flag | Show help message and available arguments.

Info

Print information about your spaCy installation, models and local setup, and generate Markdown-formatted markup to copy-paste into GitHub issues.

spacy info [--markdown]
spacy info [model] [--markdown]
Argument | Type | Description
model | positional | A model, i.e. shortcut link, package name or path (optional).
--markdown, -md | flag | Print information as Markdown.
--help, -h | flag | Show help message and available arguments.

Validate

Find all models installed in the current environment (both packages and shortcut links) and check whether they are compatible with the currently installed version of spaCy. Should be run after upgrading spaCy via pip install -U spacy to ensure that all installed models can be used with the new version. The command is also useful to detect out-of-sync model links resulting from links created in different virtual environments. Prints a list of models, the installed versions, the latest compatible version (if out of date) and the commands for updating.

spacy validate

Convert

Convert files into spaCy's JSON format for use with the train command and other experiment management functions. The right converter is chosen based on the file extension of the input file. Currently only supports .conllu.

spacy convert [input_file] [output_dir] [--n-sents] [--morphology]
Argument | Type | Description
input_file | positional | Input file.
output_dir | positional | Output directory for converted JSON file.
--n-sents, -n | option | Number of sentences per document.
--morphology, -m | option | Enable appending morphology to tags.
--help, -h | flag | Show help message and available arguments.

Train

Train a model. Expects data in spaCy's JSON format. On each epoch, a model will be saved out to the directory. Accuracy scores and model details will be added to a meta.json to allow packaging the model using the package command.

spacy train [lang] [output_dir] [train_data] [dev_data] [--n-iter] [--n-sents] [--use-gpu] [--meta-path] [--vectors] [--no-tagger] [--no-parser] [--no-entities] [--gold-preproc]
Argument | Type | Description
lang | positional | Model language.
output_dir | positional | Directory to store model in.
train_data | positional | Location of JSON-formatted training data.
dev_data | positional | Location of JSON-formatted dev data (optional).
--n-iter, -n | option | Number of iterations (default: 20).
--n-sents, -ns | option | Number of sentences (default: 0).
--use-gpu, -g | option | Use GPU.
--vectors, -v | option | Model to load vectors from.
--meta-path, -m | option | Optional path to model meta.json. All relevant properties like lang, pipeline and spacy_version will be overwritten.
--version, -V | option | Model version. Will be written out to the model's meta.json after training.
--no-tagger, -T | flag | Don't train tagger.
--no-parser, -P | flag | Don't train parser.
--no-entities, -N | flag | Don't train NER.
--gold-preproc, -G | flag | Use gold preprocessing.
--help, -h | flag | Show help message and available arguments.

Environment variables for hyperparameters

spaCy lets you set hyperparameters for training via environment variables. This is useful, because it keeps the command simple and allows you to create an alias for your custom train command while still being able to easily tweak the hyperparameters. For example:

parser_hidden_depth=2 parser_maxout_pieces=1 train-parser
Name | Description | Default
dropout_from | Initial dropout rate. | 0.2
dropout_to | Final dropout rate. | 0.2
dropout_decay | Rate of dropout change. | 0.0
batch_from | Initial batch size. | 1
batch_to | Final batch size. | 64
batch_compound | Rate of batch size acceleration. | 1.001
token_vector_width | Width of embedding tables and convolutional layers. | 128
embed_size | Number of rows in embedding tables. | 7500
parser_maxout_pieces | Number of pieces in the parser's and NER's first maxout layer. | 2
parser_hidden_depth | Number of hidden layers in the parser and NER. | 1
hidden_width | Size of the parser's and NER's hidden layers. | 128
history_feats | Number of previous action ID features for parser and NER. | 128
history_width | Number of embedding dimensions for each action ID. | 128
learn_rate | Learning rate. | 0.001
optimizer_B1 | Momentum for the Adam solver. | 0.9
optimizer_B2 | Adagrad-momentum for the Adam solver. | 0.999
optimizer_eps | Epsilon value for the Adam solver. | 1e-08
L2_penalty | L2 regularisation penalty. | 1e-06
grad_norm_clip | Gradient L2 norm constraint. | 1.0
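
To make the mechanism concrete, here is a small sketch of reading a hyperparameter from the environment with a typed default, and of a batch size that grows from batch_from to batch_to by a factor of batch_compound per step. Both helpers, read_hyperparam and compounding, are hypothetical illustrations. The compounding schedule is our reading of "rate of batch size acceleration" and may not match spaCy's training loop exactly.

```python
import itertools
import os


def read_hyperparam(name, default):
    """Read a hyperparameter from the environment, falling back to default.

    The value is cast to the type of the default, which works for the int
    and float defaults in the table above.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return type(default)(raw)


def compounding(start, stop, compound):
    """Yield start, start * compound, ... without going past stop."""
    value = float(start)
    while True:
        yield min(value, stop) if compound >= 1 else max(value, stop)
        value *= compound


batch_sizes = compounding(
    read_hyperparam("batch_from", 1),
    read_hyperparam("batch_to", 64),
    read_hyperparam("batch_compound", 1.001),
)
print(list(itertools.islice(batch_sizes, 3)))
```

With the defaults, the batch size starts at 1 and creeps upwards by 0.1% per step until it reaches 64.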

Evaluate

Evaluate a model's accuracy and speed on JSON-formatted annotated data. Will print the results and optionally export displaCy visualizations of a sample set of parses to .html files. Visualizations for the dependency parse and NER will be exported as separate files if the respective component is present in the model's pipeline.

spacy evaluate [model] [data_path] [--displacy-path] [--displacy-limit] [--gpu-id] [--gold-preproc]
Argument | Type | Description
model | positional | Model to evaluate. Can be a package or shortcut link name, or a path to a model data directory.
data_path | positional | Location of JSON-formatted evaluation data.
--displacy-path, -dp | option | Directory to output rendered parses as HTML. If not set, no visualizations will be generated.
--displacy-limit, -dl | option | Number of parses to generate per file. Defaults to 25. Keep in mind that a significantly higher number might cause the .html files to render slowly.
--gpu-id, -g | option | GPU to use, if any. Defaults to -1 for CPU.
--gold-preproc, -G | flag | Use gold preprocessing.

Package

Generate a model Python package from an existing model data directory. All data files are copied over. If the path to a meta.json is supplied, or a meta.json is found in the input directory, this file is used. Otherwise, the data can be entered directly from the command line. The required file templates are downloaded from GitHub to make sure you're always using the latest versions. This means you need to be connected to the internet to use this command.

spacy package [input_dir] [output_dir] [--meta-path] [--create-meta] [--force]
Argument | Type | Description
input_dir | positional | Path to directory containing model data.
output_dir | positional | Directory to create package folder in.
--meta-path, -m | option | Path to meta.json file (optional).
--create-meta, -c | flag | Create a meta.json file on the command line, even if one already exists in the directory.
--force, -f | flag | Force overwriting of existing folder in output directory.
--help, -h | flag | Show help message and available arguments.