FinalfusionHashIndexer
-
class ffp.subwords.hash_indexers.FinalfusionHashIndexer(bucket_exp=21, min_n=3, max_n=6)
FinalfusionHashIndexer is a hash-based subword indexer. It hashes n-grams with the FNV-1a algorithm and maps each hash into a fixed-size bucket space.
N-grams can be indexed one at a time through the __call__ method, or all n-grams of a string can be indexed in bulk through the subword_indices method.
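The hashing step described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes 64-bit FNV-1a over the UTF-8 bytes of the n-gram and assumes the bucket space holds 2**bucket_exp buckets selected by keeping the low bits of the hash. The names fnv1a_64 and bucket_index are hypothetical, and the indices it produces are not guaranteed to match those returned by FinalfusionHashIndexer.

```python
# Standard 64-bit FNV-1a constants.
FNV_OFFSET_BASIS_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of a byte string."""
    h = FNV_OFFSET_BASIS_64
    for byte in data:
        h ^= byte                          # FNV-1a: XOR first ...
        h = (h * FNV_PRIME_64) & MASK_64   # ... then multiply, wrapping at 64 bits
    return h


def bucket_index(ngram: str, bucket_exp: int = 21) -> int:
    """Map an n-gram to a bucket (assumption: 2**bucket_exp buckets, low bits kept)."""
    return fnv1a_64(ngram.encode("utf-8")) & ((1 << bucket_exp) - 1)


if __name__ == "__main__":
    print(hex(fnv1a_64(b"a")))   # 0xaf63dc4c8601ec8c
    print(bucket_index("<he"))   # some value in [0, 2**21)
```

Masking with (1 << bucket_exp) - 1 is equivalent to taking the hash modulo 2**bucket_exp, which is why the bucket count is given as an exponent in this sketch.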
-
buckets_exp
- Type
uint64_t
-
idx_bound
Get the exclusive upper bound of the indexer.
This is the number of distinct indices.
- Returns
idx_bound – Exclusive upper bound of the indexer.
- Return type
int
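As a rough worked example of what the bound means in practice, the sketch below assumes the number of distinct indices is 2**buckets_exp, which is an assumption based on the exponent-style constructor argument rather than something stated on this page. The helper names sketch_idx_bound and total_rows are hypothetical.

```python
def sketch_idx_bound(buckets_exp: int) -> int:
    """Exclusive upper bound under the assumption of 2**buckets_exp buckets."""
    return 1 << buckets_exp


def total_rows(vocab_len: int, buckets_exp: int) -> int:
    """Rows needed if subword indices are offset by the word-vocabulary length."""
    return vocab_len + sketch_idx_bound(buckets_exp)


if __name__ == "__main__":
    print(sketch_idx_bound(21))     # 2097152
    print(total_rows(10_000, 21))   # 2107152
```

Under that assumption, valid subword indices with an offset of vocab_len fall in the half-open range [vocab_len, vocab_len + idx_bound).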
-
max_n
- Type
uint32_t
-
min_n
- Type
uint32_t
-
subword_indices(self, unicode word, uint64_t offset=0, bool bracket=True, bool with_ngrams=False)
Get the subword indices for a word.
- Parameters
word (str) – The string to extract n-grams from.
offset (int) – The offset to add to each index, e.g. the length of the word vocabulary.
bracket (bool) – Toggles bracketing the input string with < and >.
with_ngrams (bool) – Toggles returning tuples of (ngram, idx).
- Returns
indices – List of n-gram indices, optionally as (str, int) tuples.
- Return type
list
- Raises
TypeError – If word is None.
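The behaviour of the parameters above can be illustrated with a pure-Python sketch. It is not the library's code: the function sketch_subword_indices, its index_ngram callable, and the order in which n-grams are emitted are all assumptions made for illustration. Any callable that maps an n-gram string to an integer can stand in for the indexer's __call__ method.

```python
from typing import Callable, List, Tuple, Union


def sketch_subword_indices(
    word: str,
    index_ngram: Callable[[str], int],  # hypothetical stand-in for the indexer's __call__
    offset: int = 0,
    bracket: bool = True,
    with_ngrams: bool = False,
    min_n: int = 3,
    max_n: int = 6,
) -> List[Union[int, Tuple[str, int]]]:
    """Index every n-gram of length min_n..max_n in word."""
    if word is None:
        raise TypeError("word must not be None")
    if bracket:
        word = f"<{word}>"  # mark the word boundaries
    result: List[Union[int, Tuple[str, int]]] = []
    for start in range(len(word)):
        for n in range(min_n, max_n + 1):
            if start + n > len(word):
                break  # longer n-grams would run past the end of the word
            ngram = word[start:start + n]
            idx = index_ngram(ngram) + offset
            result.append((ngram, idx) if with_ngrams else idx)
    return result


if __name__ == "__main__":
    toy_index = len  # toy indexer for demonstration only: index = n-gram length
    print(sketch_subword_indices("hi", toy_index, offset=100, with_ngrams=True))
    # [('<hi', 103), ('<hi>', 104), ('hi>', 103)]
```

The offset is added after indexing, so passing the size of the word vocabulary keeps subword indices from colliding with word indices in a shared embedding matrix.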
-