
Code Examples
Full code examples you can modify and run.

Custom pipeline components and attribute extensions

This example shows the implementation of a pipeline component that sets entity annotations based on a list of single- or multi-word company names, merges each entity into one token, and sets custom attributes on the Doc, Span and Token.
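The idea can be sketched in a few lines. This is a minimal illustration, not the original script: it assumes the spaCy v3-style API (`Language.component`, `nlp.add_pipe` with a string name), and the company list, the component name `company_tagger` and the attribute names `is_company` and `has_company` are all hypothetical.

```python
import spacy
from spacy.language import Language
from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc, Span, Token
from spacy.util import filter_spans

# Illustrative company list (hypothetical, not the one used in the original script).
COMPANIES = ["Google", "Netflix", "Acme Widgets"]

nlp = spacy.blank("en")  # tokenizer only, so no model download is needed
matcher = PhraseMatcher(nlp.vocab)
matcher.add("ORG", [nlp.make_doc(name) for name in COMPANIES])


def has_company(tokens):
    """Getter shared by Doc and Span: True if any token is flagged."""
    return any(token._.is_company for token in tokens)


# force=True lets the snippet be re-run in the same session.
Token.set_extension("is_company", default=False, force=True)
Doc.set_extension("has_company", getter=has_company, force=True)
Span.set_extension("has_company", getter=has_company, force=True)


@Language.component("company_tagger")
def company_tagger(doc):
    # Turn every phrase match into an ORG span.
    spans = [Span(doc, start, end, label="ORG") for _, start, end in matcher(doc)]
    doc.ents = filter_spans(spans)  # drop overlapping matches, keep the longest
    # Flag the tokens before merging; the merged token keeps the flag
    # because it starts at the same character offset as the first token.
    for ent in doc.ents:
        for token in ent:
            token._.is_company = True
    # Merge each entity into a single token.
    with doc.retokenize() as retokenizer:
        for ent in doc.ents:
            retokenizer.merge(ent, attrs={"ent_type": ent.label})
    return doc


nlp.add_pipe("company_tagger")
doc = nlp("Acme Widgets competes with Google in some markets.")
token_texts = [token.text for token in doc]
entities = [(ent.text, ent.label_) for ent in doc.ents]
```

After the component runs, `token_texts` starts with the merged token `"Acme Widgets"`, and `doc._.has_company` reports whether any company was found. Older spaCy releases register components differently, so treat the decorator and the `add_pipe` call as version-dependent.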

Custom pipeline components and attribute extensions via a REST API

This example shows the implementation of a pipeline component that fetches country metadata via the REST Countries API, sets entity annotations for countries, merges each entity into one token, and sets custom attributes on the Doc, Span and Token – for example, the capital, latitude/longitude coordinates and the country flag.

spacy/examples/pipeline/custom_component_countries_api.py

Custom method extensions

A collection of snippets showing examples of extensions adding custom methods to the Doc, Token and Span.
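As a rough illustration of what such snippets look like, the sketch below registers one method extension on each of the three classes. The helper names (`doc_word_count`, `token_is_long`, `span_to_upper`) and the extension names are hypothetical and chosen only for this example.

```python
import spacy
from spacy.tokens import Doc, Span, Token


def doc_word_count(doc):
    """Count tokens that are neither punctuation nor whitespace."""
    return sum(1 for token in doc if not token.is_punct and not token.is_space)


def token_is_long(token, min_len=6):
    """Extra arguments passed at call time arrive after the token itself."""
    return len(token.text) >= min_len


def span_to_upper(span):
    """Return the span's text in upper case."""
    return span.text.upper()


# method= registers a callable; the object it is called on is passed as the
# first argument. force=True allows re-running the snippet in one session.
Doc.set_extension("word_count", method=doc_word_count, force=True)
Token.set_extension("is_long", method=token_is_long, force=True)
Span.set_extension("to_upper", method=span_to_upper, force=True)

nlp = spacy.blank("en")  # tokenizer only, no trained model required
doc = nlp("spaCy makes extensions simple.")

word_count = doc._.word_count()
long_flag = doc[2]._.is_long()
shouted = doc[0:2]._.to_upper()
```

Method extensions are called through the `._` namespace like ordinary methods, so `doc[0]._.is_long(min_len=10)` passes `min_len` straight through to the helper.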

Using spaCy's rule-based matcher

This example shows how to use spaCy's rule-based Matcher to find and label entities across documents.
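A minimal sketch of the approach, assuming the spaCy v3-style `Matcher.add(key, patterns)` signature (earlier releases take the patterns as separate arguments). The `LANDMARK` label, the pattern and the helper `label_entities` are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span
from spacy.util import filter_spans

nlp = spacy.blank("en")  # tokenizer only, no trained model required
matcher = Matcher(nlp.vocab)

# Hypothetical pattern: three tokens, matched case-insensitively.
pattern = [{"LOWER": "golden"}, {"LOWER": "gate"}, {"LOWER": "bridge"}]
matcher.add("LANDMARK", [pattern])


def label_entities(doc):
    """Convert matcher hits into entity spans on the doc."""
    spans = []
    for match_id, start, end in matcher(doc):
        label = doc.vocab.strings[match_id]  # hash -> "LANDMARK"
        spans.append(Span(doc, start, end, label=label))
    # filter_spans removes overlaps, which doc.ents does not allow.
    doc.ents = filter_spans(list(doc.ents) + spans)
    return doc


texts = [
    "We crossed the Golden Gate Bridge at dawn.",
    "The golden gate bridge was foggy.",
    "Nothing to label here.",
]
results = [
    [(ent.text, ent.label_) for ent in label_entities(doc).ents]
    for doc in nlp.pipe(texts)
]
```

Each entry in `results` lists the labelled entities of one document; the last text produces an empty list because nothing matches.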

Using spaCy's phrase matcher

This example shows how to use the new PhraseMatcher to efficiently find entities from a large terminology list.
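The core of the technique can be sketched as follows, assuming the spaCy v3-style `PhraseMatcher.add(key, docs)` signature and the `attr` argument for case-insensitive matching. The three-term list stands in for a large terminology list and is purely illustrative.

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # tokenizer only, no trained model required

# Stand-in for a large terminology list (hypothetical terms).
terms = ["machine learning", "natural language processing", "spaCy"]

# attr="LOWER" compares lower-cased token text, so matching ignores case.
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
# Patterns are Doc objects; the tokenizer alone is enough to build them,
# which keeps pattern creation cheap even for very long lists.
matcher.add("TERM", list(nlp.tokenizer.pipe(terms)))

doc = nlp("Natural language processing builds on machine learning.")
found = sorted((start, end, doc[start:end].text) for _, start, end in matcher(doc))
```

`found` holds one `(start, end, text)` tuple per match, sorted by token position.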

Training an additional entity type

This script shows how to add a new entity type to an existing pre-trained NER model. To keep the example short and simple, only four sentences are provided as examples. In practice, you'll need many more; a few hundred would be a good start.

Training an NER system from scratch

This example is written to be self-contained and reasonably transparent. To achieve that, it duplicates some of spaCy's internal functionality.
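For comparison, here is a much shorter sketch that trains a blank NER component through spaCy's high-level API instead of re-implementing the internals. It assumes the spaCy v3 training API (`Example.from_dict`, `nlp.initialize`, `nlp.update`); the `ANIMAL` label and the handful of sentences are illustrative only and far too few for a usable model.

```python
import random

import spacy
from spacy.training import Example

# Tiny illustrative dataset: (text, {"entities": [(start_char, end_char, label)]}).
TRAIN_DATA = [
    ("Horses are too tall and they pretend to care about your feelings",
     {"entities": [(0, 6, "ANIMAL")]}),
    ("Do they bite?", {"entities": []}),
    ("horses pretend to care about your feelings",
     {"entities": [(0, 6, "ANIMAL")]}),
    ("they pretend to care about your feelings, those horses",
     {"entities": [(48, 54, "ANIMAL")]}),
    ("horses?", {"entities": [(0, 6, "ANIMAL")]}),
]

random.seed(0)
nlp = spacy.blank("en")    # start from an empty English pipeline
ner = nlp.add_pipe("ner")  # untrained entity recognizer
ner.add_label("ANIMAL")

examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]
optimizer = nlp.initialize(lambda: examples)  # allocate and initialize weights

for epoch in range(20):
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

doc = nlp("Do horses bite?")
predicted = [(ent.text, ent.label_) for ent in doc.ents]
```

With so little data, the contents of `predicted` will vary from run to run; the point is only to show the shape of the training loop.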

Training spaCy's text classifier

This example shows how to use and train spaCy's new TextCategorizer pipeline component on IMDB movie reviews.
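The training loop can be sketched as below, assuming the spaCy v3 API (`nlp.add_pipe("textcat")`, `Example.from_dict`, `nlp.initialize`). The four hand-written reviews are a hypothetical stand-in for the IMDB data, which is not loaded here.

```python
import random

import spacy
from spacy.training import Example

# Hypothetical mini-dataset; each text gets exactly one positive class.
TRAIN_DATA = [
    ("A wonderful, moving film.", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("I loved every minute of it.", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("Dull, slow and far too long.", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
    ("A complete waste of time.", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]

random.seed(0)
nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")  # mutually exclusive classes
for label in ("POSITIVE", "NEGATIVE"):
    textcat.add_label(label)

examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]
optimizer = nlp.initialize(lambda: examples)

for epoch in range(10):
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

# doc.cats maps each label to a score.
scores = nlp("An enjoyable film.").cats
```

`scores` contains one value per label. With only four training texts the values carry no real signal, so a realistic run would substitute the full review corpus and hold out an evaluation split.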

Text classification with Keras

In this example, we're using spaCy to pre-process text for use with a Keras text classification model.

A decomposable attention model for Natural Language Inference

This example contains an implementation of the entailment prediction model described by Parikh et al. (2016). The model is notable for its competitive performance with very few parameters, and was implemented using Keras and spaCy.