Embeddings

Finalfusion Embeddings

class ffp.embeddings.Embeddings(storage: ffp.storage.storage.Storage, vocab: ffp.vocab.vocab.Vocab, norms: Optional[ffp.norms.Norms] = None, metadata: Optional[ffp.metadata.Metadata] = None)[source]

Bases: object

Embeddings class.

Embeddings always contain a Storage and Vocab. Optional chunks are Norms corresponding to the embeddings of the in-vocab tokens and Metadata.

Embeddings can be retrieved through three methods:

  1. Embeddings.embedding() takes an optional default value and returns it if no embedding could be found.

  2. Embeddings.__getitem__() retrieves an embedding for the query but raises a KeyError if no embedding can be retrieved.

  3. Embeddings.embedding_with_norm() requires a Norms chunk and returns an embedding together with the corresponding L2 norm.

Embeddings are composed of four chunk types:

  1. Storage: either NdArray or QuantizedArray (required)

  2. Vocab: one of SimpleVocab, FinalfusionBucketVocab, FastTextVocab or ExplicitVocab (required)

  3. Norms (optional)

  4. Metadata (optional)

Examples

>>> storage = NdArray(np.float32(np.random.rand(2, 10)))
>>> vocab = SimpleVocab(["Some", "words"])
>>> metadata = Metadata({"Some": "value", "numerical": 0})
>>> norms = Norms(np.float32(np.random.rand(2)))
>>> embeddings = Embeddings(storage=storage, vocab=vocab, metadata=metadata, norms=norms)
>>> embeddings.vocab.words
['Some', 'words']
>>> np.allclose(embeddings["Some"], storage[0])
True
>>> try:
...     embeddings["oov"]
... except KeyError:
...     True
True
>>> _, n = embeddings.embedding_with_norm("Some")
>>> np.isclose(n, norms[0])
True
>>> embeddings.metadata
{'Some': 'value', 'numerical': 0}
__init__(storage: ffp.storage.storage.Storage, vocab: ffp.vocab.vocab.Vocab, norms: Optional[ffp.norms.Norms] = None, metadata: Optional[ffp.metadata.Metadata] = None)[source]

Initialize Embeddings.

Initializes Embeddings with the given chunks.

Conditions

The following conditions need to hold if the respective chunks are passed.

  • Chunks need to have the expected type.

  • vocab.idx_bound == storage.shape[0]

  • len(vocab) == len(norms)

  • len(norms) <= storage.shape[0]

Parameters
  • storage (Storage) – Embeddings Storage.

  • vocab (Vocab) – Embeddings Vocabulary.

  • norms (Norms, optional) – Embeddings Norms.

  • metadata (Metadata, optional) – Embeddings Metadata.

Raises

AssertionError – If any of the conditions don’t hold.
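These conditions can be made concrete with plain numpy arrays standing in for the chunks (a sketch with illustrative names and shapes, not ffp types):

```python
import numpy as np

# Toy stand-ins for the chunks. For subword vocabularies, idx_bound may
# exceed len(vocab), so the storage can have more rows than there are words.
storage = np.random.rand(5, 10).astype(np.float32)  # idx_bound x dims
vocab_words = ["Some", "words", "here"]             # len(vocab) == 3
norms = np.random.rand(3).astype(np.float32)        # one norm per in-vocab word

idx_bound = 5  # what vocab.idx_bound would report for this toy vocab

# The conditions Embeddings.__init__ asserts, expressed over these arrays:
assert idx_bound == storage.shape[0]   # vocab.idx_bound == storage.shape[0]
assert len(norms) == len(vocab_words)  # len(vocab) == len(norms)
assert len(norms) <= storage.shape[0]  # norms cover only the in-vocab rows
```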

__getitem__(item: str) → numpy.ndarray[source]

Returns the embedding for the query item.

Parameters

item (str) – The query item.

Returns

embedding – The embedding.

Return type

numpy.ndarray

Raises

KeyError – If no embedding could be retrieved.

embedding(word: str, out: Optional[numpy.ndarray] = None, default: Optional[numpy.ndarray] = None) → Optional[numpy.ndarray][source]

Embedding lookup.

Looks up the embedding for the input word.

If an out array is specified, the embedding is written into the array.

If it is not possible to retrieve an embedding for the input word, the default value is returned. This defaults to None. An embedding cannot be retrieved if the vocabulary cannot provide an index for word.

This method never fails. If you do not provide a default value, check the return value for None. out is left untouched if no embedding can be found and default is None.

Parameters
  • word (str) – The query word.

  • out (numpy.ndarray, optional) – Optional output array to write the embedding into.

  • default (numpy.ndarray, optional) – Optional default value to return if no embedding can be retrieved. Defaults to None.

Returns

embedding – The retrieved embedding or the default value.

Return type

numpy.ndarray, optional

Examples

>>> matrix = np.float32(np.random.rand(2, 10))
>>> storage = NdArray(matrix)
>>> vocab = SimpleVocab(["Some", "words"])
>>> embeddings = Embeddings(storage=storage, vocab=vocab)
>>> np.allclose(embeddings.embedding("Some"), matrix[0])
True
>>> # default value is None
>>> embeddings.embedding("oov") is None
True
>>> # It's possible to specify a default value
>>> default = embeddings.embedding("oov", default=storage[0])
>>> np.allclose(default, storage[0])
True
>>> # Embeddings can be written to an output buffer.
>>> out = np.zeros(10, dtype=np.float32)
>>> out2 = embeddings.embedding("Some", out=out)
>>> out is out2
True
>>> np.allclose(out, matrix[0])
True
embedding_with_norm(word: str, out: Optional[numpy.ndarray] = None, default: Optional[Tuple[numpy.ndarray, float]] = None) → Optional[Tuple[numpy.ndarray, float]][source]

Embedding lookup with norm.

Looks up the embedding for the input word together with its norm.

If an out array is specified, the embedding is written into the array.

If it is not possible to retrieve an embedding for the input word, the default value is returned. This defaults to None. An embedding cannot be retrieved if the vocabulary cannot provide an index for word.

This method raises a TypeError if norms are not set.

Parameters
  • word (str) – The query word.

  • out (numpy.ndarray, optional) – Optional output array to write the embedding into.

  • default (Tuple[numpy.ndarray, float], optional) – Optional default value to return if no embedding can be retrieved. Defaults to None.

Returns

(embedding, norm) – Tuple with the retrieved embedding or the default value at the first index and the norm at the second index.

Return type

Tuple[numpy.ndarray, float], optional
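Finalfusion files typically store in-vocab embeddings l2-normalized, with the original norms kept in the Norms chunk. The relationship between the two values of the returned pair can be sketched with plain numpy (illustrative arrays, not ffp's implementation):

```python
import numpy as np

# Toy "raw" embeddings; the offset avoids zero vectors.
raw = np.random.rand(2, 10).astype(np.float32) + 0.1
norms = np.linalg.norm(raw, axis=1)   # what the Norms chunk holds
stored = raw / norms[:, None]         # unit-length rows, as stored

# embedding_with_norm conceptually returns (stored[idx], norms[idx]);
# scaling the unit embedding by its norm recovers the original vector.
idx = 0
embedding, norm = stored[idx], norms[idx]
assert np.allclose(embedding * norm, raw[idx], atol=1e-5)
```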

property storage

Get the Storage.

Returns

storage – The embeddings storage.

Return type

Storage

property vocab

The Vocab.

Returns

vocab – The vocabulary

Return type

Vocab

property norms

The Norms.

Getter

Returns None or the Norms.

Setter

Set the Norms.

Returns

norms – The Norms or None.

Return type

Norms, optional

Raises
  • AssertionError – If embeddings.storage.shape[0] < len(embeddings.norms) or len(embeddings.norms) != len(embeddings.vocab)

  • TypeError – If norms is neither Norms nor None.

property metadata

The Metadata.

Getter

Returns None or the Metadata.

Setter

Set the Metadata.

Returns

metadata – The Metadata or None.

Return type

Metadata, optional

Raises

TypeError – If metadata is neither Metadata nor None.

bucket_to_explicit() → ffp.embeddings.Embeddings[source]

Convert bucket embeddings to embeddings with explicit lookup.

Multiple embeddings can still map to the same bucket, but all buckets that are not indexed by in-vocabulary n-grams are eliminated. This can substantially reduce the size of the embedding matrix.

A side effect of this method is the conversion from a quantized storage to an array storage.

Returns

embeddings – Embeddings with an ExplicitVocab instead of a hash-based vocabulary.

Return type

Embeddings

Raises

TypeError – If the current vocabulary is not a hash-based vocabulary (FinalfusionBucketVocab or FastTextVocab)
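The bucket elimination can be illustrated with a toy subword scheme. The n-gram extraction and hash function below are deliberately simplified and are not finalfusion's actual hashing; they only show why many buckets go unused:

```python
import zlib

def ngrams(word, lo=3, hi=6):
    # Character n-grams of the angle-bracketed word, fastText-style.
    w = f"<{word}>"
    return [w[i:i + n] for n in range(lo, hi + 1)
            for i in range(len(w) - n + 1)]

n_buckets = 2 ** 4  # tiny on purpose, to force collisions

def bucket(ngram):
    # Illustrative hash only, not finalfusion's scheme.
    return zlib.crc32(ngram.encode()) % n_buckets

vocab = ["apple", "apply"]
used = {b for w in vocab for b in map(bucket, ngrams(w))}

# bucket_to_explicit keeps only buckets reachable from in-vocab n-grams;
# with a realistic bucket count (e.g. 2**21) most matrix rows are never
# referenced and can be dropped.
print(len(used), "of", n_buckets, "buckets used")
```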

chunks() → List[ffp.io.Chunk][source]

Get the Embeddings Chunks as a list.

The Chunks are ordered in the expected serialization order:

  1. Metadata

  2. Vocabulary

  3. Storage

  4. Norms

Returns

chunks – List of embeddings chunks.

Return type

List[Chunk]

write(file: str)[source]

Write the Embeddings to the given file.

Writes the Embeddings to a finalfusion file at the given path.

Parameters

file (str) – Path of the output file.

ffp.embeddings.load_finalfusion(file: Union[str, bytes, int, os.PathLike], mmap: bool = False) → ffp.embeddings.Embeddings[source]

Read embeddings from a file in finalfusion format.

Parameters
  • file (str, bytes, int, PathLike) – Path to a file with embeddings in finalfusion format.

  • mmap (bool) – Whether to memory-map the storage buffer instead of reading it into memory.

Returns

embeddings – The embeddings from the input file.

Return type

Embeddings
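The effect of mmap=True can be sketched with numpy's own memory mapping (a toy flat file, not the finalfusion chunk layout):

```python
import os
import tempfile

import numpy as np

# Write a toy float32 matrix to disk.
path = os.path.join(tempfile.mkdtemp(), "matrix.f32")
np.random.rand(1000, 10).astype(np.float32).tofile(path)

# Memory-map it: rows are paged in on access instead of being read up
# front, which is what mmap=True does for the storage chunk.
mapped = np.memmap(path, dtype=np.float32, mode="r", shape=(1000, 10))
row = np.array(mapped[42])  # copy one row into regular memory
```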

ffp.embeddings.load_word2vec(file: Union[str, bytes, int, os.PathLike]) → ffp.embeddings.Embeddings[source]

Read embeddings in word2vec binary format.

Files are expected to start with a line containing the number of rows and columns in utf-8. Each word is encoded in utf-8 and followed by a single space; after the space, the embedding components follow as little-endian float32 values.

Parameters

file (str, bytes, int, PathLike) – Path to a file with embeddings in word2vec binary format.

Returns

embeddings – The embeddings from the input file.

Return type

Embeddings
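The layout described above can be made concrete with a toy writer and reader (an illustrative sketch using the stdlib and numpy, not ffp's implementation):

```python
import io

import numpy as np

# Write: a "rows cols" header line, then per word the utf-8 token,
# one space, and `cols` little-endian float32 components.
words = ["hello", "world"]
matrix = np.random.rand(2, 4).astype("<f4")

buf = io.BytesIO()
buf.write(f"{len(words)} {matrix.shape[1]}\n".encode("utf-8"))
for word, row in zip(words, matrix):
    buf.write(word.encode("utf-8") + b" " + row.tobytes())

# Read it back: scan each token up to the space, then the raw floats.
buf.seek(0)
rows, cols = map(int, buf.readline().split())
read_words, read_rows = [], []
for _ in range(rows):
    token = bytearray()
    while (ch := buf.read(1)) != b" ":
        token.extend(ch)
    read_words.append(token.decode("utf-8"))
    read_rows.append(np.frombuffer(buf.read(4 * cols), dtype="<f4"))
```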

ffp.embeddings.load_textdims(file: Union[str, bytes, int, os.PathLike]) → ffp.embeddings.Embeddings[source]

Read embeddings in textdims format.

The first line contains whitespace-separated rows and cols; each of the remaining lines contains a word followed by its whitespace-separated vector components.

Parameters

file (str, bytes, int, PathLike) – Path to a file with embeddings in textdims format.

Returns

embeddings – The embeddings from the input file.

Return type

Embeddings
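A minimal parser for this layout (an illustrative sketch over an in-memory string, not ffp's implementation):

```python
import io

import numpy as np

# Toy textdims input: "rows cols" header, then word + components per line.
text = "2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n"

lines = io.StringIO(text)
rows, cols = map(int, lines.readline().split())
words, vectors = [], np.empty((rows, cols), dtype=np.float32)
for i, line in enumerate(lines):
    parts = line.split()
    words.append(parts[0])
    vectors[i] = np.array(parts[1:], dtype=np.float32)
```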

ffp.embeddings.load_text(file: Union[str, bytes, int, os.PathLike]) → ffp.embeddings.Embeddings[source]

Read embeddings in text format.

Parameters

file (str, bytes, int, PathLike) – Path to a file with embeddings in text format.

Returns

embeddings – Embeddings from the input file. The resulting Embeddings will have a SimpleVocab, NdArray and Norms.

Return type

Embeddings

ffp.embeddings.load_fastText(file: Union[str, bytes, int, os.PathLike]) → ffp.embeddings.Embeddings[source]

Read embeddings from a file in fastText format.

Parameters

file (str, bytes, int, PathLike) – Path to a file with embeddings in fastText format.

Returns

embeddings – The embeddings from the input file.

Return type

Embeddings