StringStore

Look up strings by 64-bit hashes. As of v2.0, spaCy uses hash values instead of integer IDs. Because a string's hash is computed from the string itself, the same string always maps to the same ID, even across different StringStore instances.
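A minimal, self-contained sketch of why hashes are more stable than integer IDs. It compares a scheme that hands out sequential integers with a hash-based one. The helper names (`stable_hash`, `sequential_ids`, `hashed_ids`) are hypothetical. BLAKE2b from Python's `hashlib` stands in for the MurmurHash function spaCy actually uses, only because it ships with the standard library.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Hypothetical 64-bit hash: an 8-byte BLAKE2b digest read as an integer.

    Stand-in for spaCy's MurmurHash-based hashing, not the real function.
    """
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def sequential_ids(strings):
    """Integer-ID scheme: each new string gets the next free integer."""
    ids = {}
    for s in strings:
        ids.setdefault(s, len(ids))
    return ids


def hashed_ids(strings):
    """Hash-based scheme: the ID depends only on the string itself."""
    return {s: stable_hash(s) for s in strings}


# Two stores that saw the same strings in a different order.
first = ["apple", "orange"]
second = ["orange", "apple"]

seq_a, seq_b = sequential_ids(first), sequential_ids(second)
hash_a, hash_b = hashed_ids(first), hashed_ids(second)

print(seq_a["apple"], seq_b["apple"])      # 0 1: the integer IDs disagree
print(hash_a["apple"] == hash_b["apple"])  # True: the hashes agree
```

With sequential IDs, the ID depends on insertion history; with hashes, any two stores agree on every string they both contain.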

StringStore.__init__

Create the StringStore.

Name | Type | Description
strings | iterable | A sequence of unicode strings to add to the store.
returns | StringStore | The newly constructed object.
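A minimal sketch of construction from an arbitrary iterable, assuming a store keyed by hash. `ToyStringStore` and `_hash64` are hypothetical names, BLAKE2b stands in for spaCy's MurmurHash, and the sketch does not model the empty-string entry mentioned under `__iter__`.

```python
import hashlib
from typing import Dict, Iterable


def _hash64(text: str) -> int:
    # Hypothetical stand-in for spaCy's hash function (BLAKE2b, not MurmurHash).
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Hypothetical, simplified model of a hash-keyed string store."""

    def __init__(self, strings: Iterable[str] = ()):
        # Keyed by hash, so a repeated string lands on the same entry.
        self._by_hash: Dict[int, str] = {}
        for s in strings:
            self._by_hash[_hash64(s)] = s

    def __len__(self) -> int:
        return len(self._by_hash)


# Any iterable works, including a generator.
words = (w.lower() for w in "Apple orange APPLE".split())
store = ToyStringStore(words)
print(len(store))  # 2: "apple" appears twice but is stored once
```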

StringStore.__len__

Get the number of strings in the store.

Name | Type | Description
returns | int | The number of strings in the store.

StringStore.__getitem__

Retrieve a string from a given hash, or vice versa.

Name | Type | Description
string_or_id | bytes, unicode or uint64 | The string to hash, or the hash to look up.
returns | unicode or int | The hash of the given string, or the string for the given hash.
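A minimal sketch of the two-way lookup, assuming a dict from hash to string. `ToyStringStore` and `_hash64` are hypothetical, and BLAKE2b stands in for MurmurHash. In this sketch, string-to-hash is a pure computation, while hash-to-string only succeeds for strings that were stored.

```python
import hashlib
from typing import Dict, Iterable, Union


def _hash64(text: str) -> int:
    # Hypothetical stand-in for spaCy's hash function (BLAKE2b, not MurmurHash).
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Hypothetical model of lookup in both directions."""

    def __init__(self, strings: Iterable[str] = ()):
        self._by_hash: Dict[int, str] = {_hash64(s): s for s in strings}

    def __getitem__(self, string_or_id: Union[str, bytes, int]) -> Union[int, str]:
        if isinstance(string_or_id, bytes):
            string_or_id = string_or_id.decode("utf8")
        if isinstance(string_or_id, str):
            # String -> hash: computed directly, nothing needs to be stored.
            return _hash64(string_or_id)
        # Hash -> string: only possible if the string was stored earlier.
        return self._by_hash[string_or_id]  # raises KeyError if unknown


store = ToyStringStore(["apple"])
apple_hash = store["apple"]
print(store[apple_hash])  # apple
```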

StringStore.__contains__

Check whether a string is in the store.

Name | Type | Description
string | unicode | The string to check.
returns | bool | Whether the store contains the string.

StringStore.__iter__

Iterate over the strings in the store, in the order they were added. Note that a newly initialised store will always include an empty string '' at position 0.

Name | Type | Description
yields | unicode | A string in the store.
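A minimal sketch of ordered iteration, assuming "in order" means insertion order and seeding the empty string first as described above. `ToyStringStore` and `_hash64` are hypothetical, and BLAKE2b stands in for MurmurHash. Python dicts preserve insertion order (3.7+), which is what gives the ordering here.

```python
import hashlib
from typing import Dict, Iterator


def _hash64(text: str) -> int:
    # Hypothetical stand-in for spaCy's hash function (BLAKE2b, not MurmurHash).
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Hypothetical model: strings kept in insertion order, '' seeded first."""

    def __init__(self) -> None:
        self._by_hash: Dict[int, str] = {}
        self.add("")  # mirror the documented empty string at position 0

    def add(self, string: str) -> int:
        key = _hash64(string)
        # setdefault leaves an existing entry (and its position) untouched.
        self._by_hash.setdefault(key, string)
        return key

    def __iter__(self) -> Iterator[str]:
        return iter(self._by_hash.values())


store = ToyStringStore()
store.add("orange")
store.add("apple")
store.add("orange")  # re-adding does not move the string
print(list(store))  # ['', 'orange', 'apple']
```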

StringStore.add

Add a string to the StringStore.

Name | Type | Description
string | unicode | The string to add.
returns | uint64 | The string's hash value.
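A minimal sketch of `add` together with a membership check, assuming a dict keyed by hash. `ToyStringStore` and `_hash64` are hypothetical, and BLAKE2b stands in for MurmurHash. Because the key is derived from the string, adding the same string twice returns the same hash and leaves a single entry.

```python
import hashlib
from typing import Dict


def _hash64(text: str) -> int:
    # Hypothetical stand-in for spaCy's hash function (BLAKE2b, not MurmurHash).
    digest = hashlib.blake2b(text.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class ToyStringStore:
    """Hypothetical model of add() and membership checks."""

    def __init__(self) -> None:
        self._by_hash: Dict[int, str] = {}

    def add(self, string: str) -> int:
        key = _hash64(string)
        self._by_hash[key] = string  # adding again rewrites the same entry
        return key

    def __contains__(self, string: str) -> bool:
        return _hash64(string) in self._by_hash

    def __len__(self) -> int:
        return len(self._by_hash)


store = ToyStringStore()
first = store.add("apple")
second = store.add("apple")
print(first == second, len(store))        # True 1
print("apple" in store, "pear" in store)  # True False
```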

StringStore.to_disk

Save the current state to a directory.

Name | Type | Description
path | unicode or Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects.

StringStore.from_disk

Load state from a directory. Modifies the object in place and returns it.

Name | Type | Description
path | unicode or Path | A path to a directory. Paths may be either strings or Path-like objects.
returns | StringStore | The modified StringStore object.
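A minimal sketch of a `to_disk`/`from_disk` round trip under stated assumptions: the hypothetical `ToyStringStore` saves only its strings, as JSON, to a hypothetical `strings.json` file inside the target directory. This is not spaCy's on-disk format. Saving only the strings is enough in this sketch because each hash can be recomputed from its string.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, List, Union


class ToyStringStore:
    """Hypothetical model: only the strings are saved, since hashes are derivable."""

    def __init__(self, strings: Iterable[str] = ()):
        self.strings: List[str] = list(dict.fromkeys(strings))  # dedupe, keep order

    def to_disk(self, path: Union[str, Path]) -> None:
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)  # create the directory if missing
        # "strings.json" is a hypothetical file name for this sketch.
        (path / "strings.json").write_text(json.dumps(self.strings), encoding="utf8")

    def from_disk(self, path: Union[str, Path]) -> "ToyStringStore":
        path = Path(path)
        self.strings = json.loads((path / "strings.json").read_text(encoding="utf8"))
        return self  # modified in place and returned


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "store"  # does not exist yet
    ToyStringStore(["apple", "orange"]).to_disk(target)
    loaded = ToyStringStore().from_disk(str(target))  # a plain string path works too
    print(loaded.strings)  # ['apple', 'orange']
```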

StringStore.to_bytes

Serialize the current state to a binary string.

Name | Type | Description
**exclude | - | Named attributes to prevent from being serialized.
returns | bytes | The serialized form of the StringStore object.

StringStore.from_bytes

Load state from a binary string.

Name | Type | Description
bytes_data | bytes | The data to load from.
**exclude | - | Named attributes to prevent from being loaded.
returns | StringStore | The StringStore object.
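A minimal sketch of a `to_bytes`/`from_bytes` round trip, assuming the hypothetical `ToyStringStore` encodes its list of strings as UTF-8 JSON. spaCy's actual byte format may differ. The sketch has a single attribute, so `**exclude` is accepted for signature parity but ignored.

```python
import json
from typing import Iterable, List


class ToyStringStore:
    """Hypothetical model of byte-level serialization."""

    def __init__(self, strings: Iterable[str] = ()):
        self.strings: List[str] = list(dict.fromkeys(strings))  # dedupe, keep order

    def to_bytes(self, **exclude) -> bytes:
        # Only one attribute exists here, so **exclude is accepted but unused.
        return json.dumps(self.strings).encode("utf8")

    def from_bytes(self, bytes_data: bytes, **exclude) -> "ToyStringStore":
        self.strings = json.loads(bytes_data.decode("utf8"))
        return self


data = ToyStringStore(["apple", "orange"]).to_bytes()
restored = ToyStringStore().from_bytes(data)
print(restored.strings)  # ['apple', 'orange']
```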

Utilities

strings.hash_string

Get a 64-bit hash for a given string.

Name | Type | Description
string | unicode | The string to hash.
returns | uint64 | The hash.
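A minimal sketch of a stable 64-bit string hash. `hash_string_sketch` is a hypothetical name, and BLAKE2b from `hashlib` is swapped in for the MurmurHash function spaCy uses, so the values will not match spaCy's.

```python
import hashlib


def hash_string_sketch(string: str) -> int:
    """Hypothetical 64-bit string hash (BLAKE2b, not spaCy's MurmurHash)."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")  # unsigned, so 0 <= h < 2**64


h = hash_string_sketch("apple")
print(0 <= h < 2**64)                    # True: fits in an unsigned 64-bit integer
print(h == hash_string_sketch("apple"))  # True: same input, same hash
```

Python's built-in `hash()` is not a substitute here: for `str` it is salted per process (see `PYTHONHASHSEED`), so its values are not stable across runs, whereas a dedicated hash function like the one above is.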