Hooking a deep learning model into spaCy

In this example, we'll be using Keras, one of the most popular deep learning libraries for Python. Using Keras, we will write a custom sentiment analysis model that predicts whether a document is positive or negative. Then, we will use it to find which entities are commonly associated with positive or negative documents. Here's a quick example of how that can look at runtime.
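A minimal sketch of the entity-aggregation step follows. It assumes a pipeline component (such as the Keras sentiment model) has already written a float score to each document's `doc.sentiment`. It only reads `doc.ents`, `ent.text` and `doc.sentiment`, so any objects with those attributes work. The helper name `count_entity_sentiment` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_entity_sentiment(docs: Iterable) -> Counter:
    """Sum the document-level sentiment score for every entity mention.

    Assumes each doc exposes `.ents` (spans with a `.text` attribute) and a
    float `.sentiment` that an upstream sentiment component has set.
    """
    totals: Counter = Counter()
    for doc in docs:
        for ent in doc.ents:
            # Every mention of the entity inherits its document's score.
            totals[ent.text] += doc.sentiment
    return totals
```

With spaCy, you would pass the output of `nlp.pipe(texts)`. Entities with the largest positive totals (for example via `totals.most_common(10)`) are the ones most associated with positive documents. Summing rather than averaging favours frequently mentioned entities, which is a design choice you may want to revisit.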

For most applications, it's recommended to use pre-trained word embeddings without "fine-tuning". This means you use the same embeddings across different models and don't let training adjust them to your data. The embeddings table is large, and the values provided by the pre-trained vectors are already pretty good, so fine-tuning the table is a waste of your "parameter budget". It's usually better to make your network larger in some other way, e.g. by adding another LSTM layer, using an attention mechanism or using character features.
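A minimal sketch of the "no fine-tuning" setup: copy the pre-trained vectors into a single lookup table once and never update it. The function names `build_embedding_table` and `to_ids` are hypothetical, and reserving an all-zero row 0 for padding and unknown words is an assumption, not something spaCy prescribes. Making the NumPy array read-only only stands in for freezing; in Keras, the equivalent is to initialise an `Embedding` layer from this table and construct it with `trainable=False`, so its weights are excluded from training.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

import numpy as np


def build_embedding_table(
    vectors: Mapping[str, Sequence[float]], dim: int
) -> Tuple[Dict[str, int], np.ndarray]:
    """Copy pre-trained vectors into a read-only lookup table.

    Row 0 is an all-zero row reserved for padding and unknown words
    (an assumption of this sketch).
    """
    index: Dict[str, int] = {}
    table = np.zeros((len(vectors) + 1, dim), dtype="float32")
    # Sort the words so row assignment is deterministic.
    for row, word in enumerate(sorted(vectors), start=1):
        index[word] = row
        table[row] = vectors[word]
    # Stand-in for "no fine-tuning": the table cannot be modified in place.
    table.setflags(write=False)
    return index, table


def to_ids(words: Sequence[str], index: Dict[str, int]) -> List[int]:
    """Map words to table rows; unknown words fall back to row 0."""
    return [index.get(w, 0) for w in words]
```

Because the table is shared and fixed, the same `index` and `table` can back several models, which is exactly the reuse the paragraph above recommends.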

Attribute hooks

Earlier, we saw how to store data in the generic user_data dict. This generalises well, but it's not terribly satisfying: ideally, we want the custom data to drive more "native" behaviours. For instance, consider the .similarity() methods provided by spaCy's Doc, Token and Span objects:

Polymorphic similarity example

span.similarity(doc)
token.similarity(span)
doc1.similarity(doc2)

By default, this just averages the word vectors of each span or document (a token simply uses its own vector) and computes the cosine of the two results. Obviously, spaCy should make it easy for you to plug in your own similarity model, which introduces a tricky design challenge. The current solution is to add three more dicts to the Doc object:

| Name | Description |
| --- | --- |
| user_hooks | Customise behaviour of doc.similarity, doc.vector, doc.has_vector, doc.vector_norm or doc.sents |
| user_token_hooks | Customise behaviour of token.similarity, token.vector, token.has_vector, token.vector_norm or token.conjuncts |
| user_span_hooks | Customise behaviour of span.similarity, span.vector, span.has_vector, span.vector_norm or span.root |
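To make the dispatch concrete, here is a toy model of the pattern, not spaCy's actual source. A hypothetical `ToyDoc` consults its `user_hooks` dict first and only falls back to the default (average the word vectors, take the cosine) when no hook is registered. The real Doc, Span and Token classes handle more cases, such as missing or zero-norm vectors, which are omitted here.

```python
from typing import Callable, Dict, Sequence

import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of two vectors; 0.0 if either has zero norm."""
    denom = float(np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.dot(u, v) / denom) if denom else 0.0


class ToyDoc:
    """Hypothetical stand-in for Doc, modelling only the hook lookup."""

    def __init__(self, word_vectors: Sequence[Sequence[float]]):
        self.word_vectors = np.asarray(word_vectors, dtype="float32")
        self.user_hooks: Dict[str, Callable] = {}

    @property
    def vector(self) -> np.ndarray:
        # A registered "vector" hook replaces the default computation.
        if "vector" in self.user_hooks:
            return self.user_hooks["vector"](self)
        # Default: average of the word vectors.
        return self.word_vectors.mean(axis=0)

    def similarity(self, other: "ToyDoc") -> float:
        # A registered "similarity" hook is called as hook(self, other).
        if "similarity" in self.user_hooks:
            return self.user_hooks["similarity"](self, other)
        # Default: cosine of the two (possibly hooked) vectors.
        return cosine(self.vector, other.vector)
```

Note that a `vector` hook also changes the default similarity, since the fallback reads `.vector`; this is why overriding either hook is enough to change how objects compare.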

To sum up, here's an example of hooking in custom .similarity() methods:

Add custom similarity hooks

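class SimilarityModel:
    """Pipeline component that routes .similarity() calls to a custom model."""

    def __init__(self, model):
        # `model` is assumed to take a pair of vectors and return a
        # sequence whose first item is the similarity score.
        self._model = model

    def __call__(self, doc):
        # Register the same scorer for Doc, Span and Token similarity.
        doc.user_hooks['similarity'] = self.similarity
        doc.user_span_hooks['similarity'] = self.similarity
        doc.user_token_hooks['similarity'] = self.similarity
        # Pipeline components must return the Doc they were given.
        return doc

    def similarity(self, obj1, obj2):
        # obj1 is the object whose .similarity() was called; obj2 is its argument.
        y = self._model([obj1.vector, obj2.vector])
        return float(y[0])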