Vectors
Store, save and load word vectors.

Vector data is kept in the Vectors.data attribute, which should be an instance of numpy.ndarray (for CPU vectors) or cupy.ndarray (for GPU vectors).

Vectors.__init__

Create a new vector store. To keep the vector table empty, pass data_or_width=0. Alternatively, pass a width to create the vector table and add vectors one by one, or pass an array to set the vector values directly on initialisation.

Name | Type | Description
strings | StringStore or list | List of strings, or a StringStore that maps strings to hash values, and vice versa.
data_or_width | numpy.ndarray[ndim=1, dtype='float32'] or int | Vector data or number of dimensions.
returns | Vectors | The newly created object.

Vectors.__getitem__

Get a vector by key. If key is a string, it is hashed to an integer ID using the Vectors.strings table. If the integer key is not found in the table, a KeyError is raised.

Name | Type | Description
key | unicode / int | The key to get the vector for.
returns | numpy.ndarray[ndim=1, dtype='float32'] | The vector for the key.
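
The lookup rule described for Vectors.__getitem__ can be sketched in plain Python. This is a toy model, not spaCy's implementation: the ToyVectors class, the toy_hash helper, and the use of zlib.crc32 as the string hash are all assumptions made for illustration, and rows are plain lists rather than numpy arrays.

```python
import zlib


def toy_hash(text):
    # Hypothetical stand-in for the string-to-ID hashing done via Vectors.strings.
    return zlib.crc32(text.encode("utf8"))


class ToyVectors:
    """Toy model of key resolution; not the real Vectors class."""

    def __init__(self, words, rows):
        # Map each hashed word to the row that holds its vector.
        self.key2row = {toy_hash(word): i for i, word in enumerate(words)}
        self.data = rows

    def __getitem__(self, key):
        # String keys are hashed to integer IDs; integer keys are used as-is.
        if isinstance(key, str):
            key = toy_hash(key)
        # A missing integer key raises KeyError through the dict lookup.
        return self.data[self.key2row[key]]


vectors = ToyVectors(["cat", "dog"], [[0.1, 0.2], [0.3, 0.4]])
by_string = vectors["cat"]
by_hash = vectors[toy_hash("cat")]
```

In this sketch both lookups return the same row, and an unknown key such as "fish" raises KeyError.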

Vectors.__setitem__

Set a vector for the given key. If key is a string, it is hashed to an integer ID using the Vectors.strings table.

Name | Type | Description
key | unicode / int | The key to set the vector for.
vector | numpy.ndarray[ndim=1, dtype='float32'] | The vector to set.

Vectors.__iter__

Yield vectors from the table.

Name | Type | Description
yields | numpy.ndarray[ndim=1, dtype='float32'] | A vector from the table.

Vectors.__len__

Return the number of vectors that have been assigned.

Name | Type | Description
returns | int | The number of vectors in the data.

Vectors.__contains__

Check whether a key has a vector entry in the table. If key is a string, it is hashed to an integer ID using the Vectors.strings table.

Name | Type | Description
key | unicode / int | The key to check.
returns | bool | Whether the key has a vector entry.

Vectors.add

Add a key to the table, optionally setting a vector value as well. If key is a string, it is hashed to an integer ID using the Vectors.strings table.

Name | Type | Description
key | unicode / int | The key to add.
vector | numpy.ndarray[ndim=1, dtype='float32'] | An optional vector to add.
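
Adding a key with or without a vector might look roughly like the following toy model. It is a sketch under stated assumptions, not spaCy's implementation: the ToyTable class is hypothetical, only integer keys are handled to keep it short, and the choice to reserve a zero-filled row for a key added without a vector is an assumption that the text above does not confirm.

```python
class ToyTable:
    """Toy model of adding keys to a vector table; not the real Vectors class."""

    def __init__(self, width):
        self.width = width
        self.key2row = {}  # integer key -> row index
        self.data = []     # one row of floats per key

    def add(self, key, vector=None):
        # Register the key on first sight. Assumption: a zero-filled row is
        # reserved for it until a vector is supplied.
        if key not in self.key2row:
            self.key2row[key] = len(self.data)
            self.data.append([0.0] * self.width)
        # Only overwrite the row when a vector is actually given.
        if vector is not None:
            if len(vector) != self.width:
                raise ValueError("vector width does not match the table")
            self.data[self.key2row[key]] = list(vector)

    def __contains__(self, key):
        return key in self.key2row

    def __len__(self):
        return len(self.key2row)


table = ToyTable(width=3)
table.add(101)                   # key only
table.add(202, [1.0, 2.0, 3.0])  # key with a vector
```

After these two calls, both keys are members of the toy table and its length is 2; only the row for 202 holds non-zero values.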

Vectors.items

Iterate over (string key, vector) pairs, in order.

Name | Type | Description
yields | tuple | (string key, vector) pairs, in order.

Vectors.shape

Get a (rows, dims) tuple giving the number of rows and the number of dimensions in the vector table.

Name | Type | Description
returns | tuple | A (rows, dims) pair.

Vectors.from_glove

Load GloVe vectors from a directory. Assumes the binary format, with the vocabulary in vocab.txt and the vectors in a file named vectors.{size}.[fd].bin, e.g. vectors.128.f.bin for 128d float32 vectors or vectors.300.d.bin for 300d float64 (double) vectors. By default GloVe outputs 64-bit vectors.

Name | Type | Description
path | unicode / Path | The path to load the GloVe vectors from.
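
The file naming convention can be made concrete with a small parser. This is only an illustrative sketch: parse_glove_name, GLOVE_NAME and DTYPES are hypothetical names, not part of the Vectors API, and the code inspects the file name only, never the file contents.

```python
import re

# Pattern for the naming scheme described above: vectors.{size}.[fd].bin
GLOVE_NAME = re.compile(r"vectors\.(\d+)\.([fd])\.bin")

# 'f' marks float32 vectors and 'd' marks float64 (double) vectors.
DTYPES = {"f": "float32", "d": "float64"}


def parse_glove_name(filename):
    """Return (width, dtype name) for a GloVe vectors file name."""
    match = GLOVE_NAME.fullmatch(filename)
    if match is None:
        raise ValueError("not a GloVe vectors file name: %r" % filename)
    return int(match.group(1)), DTYPES[match.group(2)]


print(parse_glove_name("vectors.128.f.bin"))  # (128, 'float32')
print(parse_glove_name("vectors.300.d.bin"))  # (300, 'float64')
```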

Vectors.to_disk

Save the current state to a directory.

Name | Type | Description
path | unicode or Path | A path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects.

Vectors.from_disk

Load state from a directory. Modifies the object in place and returns it.

Name | Type | Description
path | unicode or Path | A path to a directory. Paths may be either strings or Path-like objects.
returns | Vectors | The modified Vectors object.

Vectors.to_bytes

Serialize the current state to a binary string.

Name | Type | Description
**exclude | - | Named attributes to prevent from being serialized.
returns | bytes | The serialized form of the Vectors object.

Vectors.from_bytes

Load state from a binary string.

Name | Type | Description
bytes_data | bytes | The data to load from.
**exclude | - | Named attributes to prevent from being loaded.
returns | Vectors | The Vectors object.

Attributes

Name | Type | Description
data | numpy.ndarray / cupy.ndarray | Stored vectors data. numpy is used for CPU vectors, cupy for GPU vectors.
key2row | dict | Dictionary mapping word hashes to rows in the Vectors.data table.
keys | numpy.ndarray | Array keeping the keys in order, such that keys[vectors.key2row[key]] == key.
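
The relationship between keys and key2row can be checked with a few lines of plain Python. The key values below are made up for illustration, and a list stands in for the numpy array that the keys attribute is documented to be.

```python
# Hypothetical integer keys (word hashes), stored in row order.
keys = [4021, 77, 981]

# Build the key -> row mapping that key2row is described as holding.
key2row = {key: row for row, key in enumerate(keys)}

# The documented invariant: indexing keys by a key's row gives the key back.
for key in key2row:
    assert keys[key2row[key]] == key
```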