
EntityRecognizer
Annotate named entities on documents.

EntityRecognizer.Model

Initialise a model for the pipe. The model should implement the thinc.neural.Model API. Wrappers are available for most major machine learning libraries.

Name | Type | Description
**kwargs | - | Parameters for initialising the model.
returns | object | The initialised model.

EntityRecognizer.__init__

Create a new pipeline instance.

Name | Type | Description
vocab | Vocab | The shared vocabulary.
model | thinc.neural.Model or True | The model powering the pipeline component. If no model is supplied, the model is created when you call begin_training, from_disk or from_bytes.
**cfg | - | Configuration parameters.
returns | EntityRecognizer | The newly constructed object.

EntityRecognizer.__call__

Apply the pipe to one document. The document is modified in place and returned. Both EntityRecognizer.__call__ and EntityRecognizer.pipe should delegate to the EntityRecognizer.predict and EntityRecognizer.set_annotations methods.

Name | Type | Description
doc | Doc | The document to process.
returns | Doc | The processed document.
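To make the delegation concrete, here is a minimal sketch of a component whose __call__ runs predict on a batch of one document and then set_annotations, returning the same object it received. ToyDoc, ToyEntityRecognizer and the "capitalised word = entity" rule are hypothetical stand-ins, not spaCy's Doc or its statistical model.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start char, end char, label)


class ToyDoc:
    """Hypothetical stand-in for a Doc: raw text plus entity spans."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.ents: List[Span] = []


class ToyEntityRecognizer:
    """Hypothetical pipe illustrating the predict/set_annotations split."""

    def predict(self, docs: Iterable[ToyDoc]) -> List[List[Span]]:
        # Compute "scores" without touching the docs: every capitalised
        # word is treated as an entity span.
        batch_scores = []
        for doc in docs:
            spans, offset = [], 0
            for word in doc.text.split(" "):
                if word[:1].isupper():
                    spans.append((offset, offset + len(word), "ENT"))
                offset += len(word) + 1
            batch_scores.append(spans)
        return batch_scores

    def set_annotations(self, docs: Iterable[ToyDoc], scores: List[List[Span]]) -> None:
        # Write the precomputed predictions onto the docs in place.
        for doc, spans in zip(docs, scores):
            doc.ents = spans

    def __call__(self, doc: ToyDoc) -> ToyDoc:
        # Delegate: predict on a batch of one, annotate, return the same object.
        scores = self.predict([doc])
        self.set_annotations([doc], scores)
        return doc


ner = ToyEntityRecognizer()
doc = ToyDoc("we visited Berlin with Ada")
result = ner(doc)
print(result is doc, [doc.text[s:e] for s, e, _ in doc.ents])  # True ['Berlin', 'Ada']
```

Keeping predict free of side effects means the same scores can be inspected, or reused by update, before anything is written to the documents.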

EntityRecognizer.pipe

Apply the pipe to a stream of documents. Both EntityRecognizer.__call__ and EntityRecognizer.pipe should delegate to the EntityRecognizer.predict and EntityRecognizer.set_annotations methods.

Name | Type | Description
stream | iterable | A stream of documents.
batch_size | int | The number of documents to buffer. Defaults to 128.
n_threads | int | The number of worker threads to use. If -1, OpenMP will decide how many to use at run time. Defaults to -1.
yields | Doc | Processed documents, in the same order as the input stream.
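A sketch of how a pipe method can buffer a lazy stream into batches of batch_size, run one predict/set_annotations pass per batch, and still yield documents in their original order. The minibatch and pipe functions here, and the dict-based "documents", are hypothetical; n_threads is left out because this sketch is single-threaded.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]  # hypothetical document: a plain dict


def minibatch(stream: Iterable[Doc], batch_size: int) -> Iterator[List[Doc]]:
    # Pull at most batch_size items at a time from a (possibly lazy) stream.
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def pipe(stream: Iterable[Doc],
         predict: Callable[[List[Doc]], List[Any]],
         set_annotations: Callable[[List[Doc], List[Any]], None],
         batch_size: int = 128) -> Iterator[Doc]:
    # One predict/set_annotations call per batch; docs are yielded in input order.
    for batch in minibatch(stream, batch_size):
        scores = predict(batch)
        set_annotations(batch, scores)
        yield from batch


calls: List[int] = []


def predict(batch: List[Doc]) -> List[int]:
    calls.append(len(batch))  # record batch sizes to show the buffering
    return [len(d["text"]) for d in batch]


def set_annotations(batch: List[Doc], scores: List[int]) -> None:
    for d, s in zip(batch, scores):
        d["score"] = s


docs = ({"text": "x" * i} for i in range(5))  # a generator, consumed lazily
out = list(pipe(docs, predict, set_annotations, batch_size=2))
print(calls)                       # [2, 2, 1]
print([d["score"] for d in out])   # [0, 1, 2, 3, 4]
```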

EntityRecognizer.predict

Apply the pipeline's model to a batch of docs, without modifying them.

Name | Type | Description
docs | iterable | The documents to predict.
returns | - | Scores from the model.

EntityRecognizer.set_annotations

Modify a batch of documents, using pre-computed scores.

Name | Type | Description
docs | iterable | The documents to modify.
scores | - | The scores to set, produced by EntityRecognizer.predict.

EntityRecognizer.update

Learn from a batch of documents and gold-standard information, updating the pipe's model. Delegates to EntityRecognizer.predict and EntityRecognizer.get_loss.

Name | Type | Description
docs | iterable | A batch of documents to learn from.
golds | iterable | The gold-standard data. Must have the same length as docs.
drop | float | The dropout rate.
sgd | callable | The optimizer. Should take two arguments, weights and gradient, and an optional ID.
losses | dict | Optional record of the loss during training. The value keyed by the model's name is updated.
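The control flow of update can be sketched with a hypothetical one-weight model (score = w * x) standing in for a real network: predict, compute the loss and its gradient with get_loss, backpropagate to the weights, hand them to the sgd callable together with an ID, and add the loss to losses under the component's name. The squared-error loss, the "ner" name and the learning rate are assumptions for illustration; drop is accepted but unused here.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class ToyPipe:
    """Hypothetical one-weight model: score = w * x. Not spaCy's implementation."""

    name = "ner"

    def __init__(self) -> None:
        self.weights: List[float] = [0.0]

    def predict(self, xs: Sequence[float]) -> List[float]:
        return [self.weights[0] * x for x in xs]

    def get_loss(self, xs: Sequence[float], golds: Sequence[float],
                 scores: Sequence[float]) -> Tuple[float, List[float]]:
        # Squared error, plus its gradient with respect to each score.
        loss = sum((s - g) ** 2 for s, g in zip(scores, golds))
        d_scores = [2 * (s - g) for s, g in zip(scores, golds)]
        return loss, d_scores

    def update(self, xs: Sequence[float], golds: Sequence[float], drop: float = 0.0,
               sgd: Optional[Callable] = None,
               losses: Optional[Dict[str, float]] = None) -> None:
        if len(xs) != len(golds):
            raise ValueError("golds must have the same length as docs")
        # drop mirrors the documented signature; this toy model ignores it.
        scores = self.predict(xs)
        loss, d_scores = self.get_loss(xs, golds, scores)
        # Backpropagate through score = w * x to get the weight gradient.
        gradient = [sum(d * x for d, x in zip(d_scores, xs))]
        if sgd is not None:
            sgd(self.weights, gradient, key=id(self))
        if losses is not None:
            losses[self.name] = losses.get(self.name, 0.0) + loss


def sgd(weights: List[float], gradient: List[float], key: Optional[int] = None,
        learn_rate: float = 0.1) -> None:
    # Plain gradient descent, updating weights in place; key is unused here.
    for i, g in enumerate(gradient):
        weights[i] -= learn_rate * g


pipe = ToyPipe()
losses: Dict[str, float] = {}
pipe.update([1.0, 2.0], [2.0, 4.0], sgd=sgd, losses=losses)
print(pipe.weights, losses)  # [2.0] {'ner': 20.0}
```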

EntityRecognizer.get_loss

Compute the loss, and the gradient of the loss, for a batch of documents and their predicted scores.

Name | Type | Description
docs | iterable | The batch of documents.
golds | iterable | The gold-standard data. Must have the same length as docs.
scores | - | Scores representing the model's predictions.
returns | tuple | The loss and the gradient, i.e. (loss, gradient).
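As one concrete instance of the (loss, gradient) return shape, the sketch below assumes the scores are per-token softmax probabilities over entity labels and the gold data is one label index per token. It uses summed cross-entropy, whose gradient with respect to the pre-softmax logits is probs - one_hot(gold). This simplified get_loss (taking no docs argument) is hypothetical, and a transition-based entity recognizer may compute its loss differently.

```python
import math
from typing import List, Sequence, Tuple


def get_loss(scores: Sequence[Sequence[float]],
             gold: Sequence[int]) -> Tuple[float, List[List[float]]]:
    """Hypothetical get_loss: summed cross-entropy over tokens.

    scores[i] is a probability distribution over labels for token i;
    gold[i] is the index of the correct label for token i.
    """
    if len(scores) != len(gold):
        raise ValueError("scores and gold must have the same length")
    loss = 0.0
    gradient: List[List[float]] = []
    for probs, label in zip(scores, gold):
        loss -= math.log(probs[label])
        # d(loss)/d(logits) for softmax + cross-entropy: probs - one_hot(label)
        gradient.append([p - (1.0 if j == label else 0.0) for j, p in enumerate(probs)])
    return loss, gradient


loss, grad = get_loss([[0.5, 0.5], [0.25, 0.75]], [0, 1])
print(round(loss, 4), grad)  # 0.9808 [[-0.5, 0.5], [0.25, -0.25]]
```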

EntityRecognizer.begin_training

Initialize the pipe for training, using data examples if available. If no model has been initialized yet, the model is added.

Name | Type | Description
gold_tuples | iterable | Optional gold-standard annotations from which to construct GoldParse objects.
pipeline | list | Optional list of Pipe components that this component is part of.
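The "model=True now, real model later" behaviour described for __init__ and begin_training can be sketched as below. ToyPipe, its dict-based Model, the width setting and the (text, labels) shape of gold_tuples are all hypothetical simplifications; real gold tuples carry richer annotation.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


class ToyPipe:
    """Hypothetical pipe illustrating lazy model creation; not spaCy's code."""

    def __init__(self, vocab: Any, model: Any = True, **cfg: Any) -> None:
        # model=True means "no model yet": it is built later, on demand.
        self.vocab = vocab
        self.model = model
        self.cfg = dict(cfg)

    @classmethod
    def Model(cls, **cfg: Any) -> Dict[str, Any]:
        # Stand-in for a real model: a dict holding its config and labels.
        return {"width": cfg.get("width", 64), "labels": []}

    def begin_training(self, gold_tuples: Iterable[Tuple[str, List[str]]] = (),
                       pipeline: Optional[List[Any]] = None) -> None:
        if self.model is True:
            self.model = self.Model(**self.cfg)
        # Register every entity label seen in the (simplified) gold examples.
        for _text, labels in gold_tuples:
            for label in labels:
                if label not in self.model["labels"]:
                    self.model["labels"].append(label)


ner = ToyPipe(vocab=None, width=32)
print(ner.model)  # True
ner.begin_training([("Ada in Berlin", ["PERSON", "GPE"]), ("Acme Corp", ["ORG"])])
print(ner.model)  # {'width': 32, 'labels': ['PERSON', 'GPE', 'ORG']}
```

Calling begin_training again reuses the existing model rather than replacing it, because the model is only built while it is still the True placeholder.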

EntityRecognizer.use_params

Modify the pipe's model to use the given parameter values for the duration of a with block.

Name | Type | Description
params | - | The parameter values to use in the model. At the end of the context, the original parameters are restored.
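A minimal sketch of the swap-and-restore pattern, using contextlib.contextmanager and a hypothetical ToyModel whose parameters live in a plain dict. A typical use is evaluating with averaged weights, e.g. with ner.use_params(optimizer.averages): in a training loop; the try/finally ensures the originals come back even if the body raises.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class ToyModel:
    """Hypothetical model whose parameters are a name -> value dict."""

    def __init__(self, params: Dict[str, float]) -> None:
        self.params = dict(params)


@contextmanager
def use_params(model: ToyModel, params: Dict[str, float]) -> Iterator[None]:
    # Swap in the given values; restore the originals on exit, even on error.
    backup = dict(model.params)
    model.params.update(params)
    try:
        yield
    finally:
        model.params = backup


model = ToyModel({"w": 1.0, "b": 0.0})
with use_params(model, {"w": 0.5}):
    print(model.params)  # {'w': 0.5, 'b': 0.0}
print(model.params)      # {'w': 1.0, 'b': 0.0}
```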

EntityRecognizer.to_disk

Serialize the pipe to disk.

Name | Type | Description
path | unicode or Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects.

EntityRecognizer.from_disk

Load the pipe from disk. Modifies the object in place and returns it.

Name | Type | Description
path | unicode or Path | A path to a directory. Paths may be either strings or Path-like objects.
returns | EntityRecognizer | The modified EntityRecognizer object.
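The to_disk/from_disk contract (accept a string or Path, create the directory if missing, load in place and return self) can be sketched with pathlib. The ToyPipe class and the cfg.json file name are hypothetical; a real component writes its model weights and other files as well.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


class ToyPipe:
    """Hypothetical component whose only state is a small config dict."""

    def __init__(self) -> None:
        self.cfg: Dict[str, Any] = {"labels": []}

    def to_disk(self, path: Union[str, Path]) -> None:
        path = Path(path)  # accept plain strings as well as Path objects
        path.mkdir(parents=True, exist_ok=True)  # create the directory if needed
        (path / "cfg.json").write_text(json.dumps(self.cfg), encoding="utf8")

    def from_disk(self, path: Union[str, Path]) -> "ToyPipe":
        path = Path(path)
        self.cfg = json.loads((path / "cfg.json").read_text(encoding="utf8"))
        return self  # modified in place, then returned


with tempfile.TemporaryDirectory() as tmp:
    saved = ToyPipe()
    saved.cfg["labels"].append("ORG")
    saved.to_disk(Path(tmp) / "ner")  # "ner" does not exist yet
    loaded = ToyPipe()
    same = loaded.from_disk(os.path.join(tmp, "ner"))  # a string path works too
    print(same is loaded, loaded.cfg)  # True {'labels': ['ORG']}
```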

EntityRecognizer.to_bytes

Serialize the pipe to a bytestring.

Name | Type | Description
**exclude | - | Named attributes to prevent from being serialized.
returns | bytes | The serialized form of the EntityRecognizer object.

EntityRecognizer.from_bytes

Load the pipe from a bytestring. Modifies the object in place and returns it.

Name | Type | Description
bytes_data | bytes | The data to load from.
**exclude | - | Named attributes to prevent from being loaded.
returns | EntityRecognizer | The EntityRecognizer object.
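A sketch of keyword-based exclusion for to_bytes/from_bytes, using JSON as the byte format. The ToyPipe class, its cfg and vocab attributes, and the convention that passing an attribute's name as a keyword (with any value) excludes it are assumptions of this example, not a description of the library's internals.

```python
import json
from typing import Any, Dict


class ToyPipe:
    """Hypothetical component with two serializable attributes."""

    FIELDS = ("cfg", "vocab")

    def __init__(self) -> None:
        self.cfg: Dict[str, Any] = {"labels": []}
        self.vocab: Dict[str, int] = {}

    def to_bytes(self, **exclude: Any) -> bytes:
        # Serialize every field whose name was not passed as a keyword.
        data = {key: getattr(self, key) for key in self.FIELDS if key not in exclude}
        return json.dumps(data).encode("utf8")

    def from_bytes(self, bytes_data: bytes, **exclude: Any) -> "ToyPipe":
        # Load fields in place, skipping any named in **exclude.
        data = json.loads(bytes_data.decode("utf8"))
        for key, value in data.items():
            if key in self.FIELDS and key not in exclude:
                setattr(self, key, value)
        return self


pipe = ToyPipe()
pipe.cfg["labels"] = ["ORG"]
pipe.vocab = {"apple": 1}
data = pipe.to_bytes(vocab=True)  # exclude the vocab from the output
restored = ToyPipe().from_bytes(data)
print(restored.cfg, restored.vocab)  # {'labels': ['ORG']} {}
```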