Interfaces
-
class
ffp.vocab.vocab.
Vocab
Bases:
abc.ABC
Finalfusion vocabulary interface.
Vocabs provide at least a simple string-to-index mapping and an index-to-string mapping. Vocab is the base type of all vocabulary types.
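The two mappings can be illustrated with a small, self-contained sketch. `ToyVocab` is a hypothetical stand-in written for this example; it is not part of ffp and does not subclass `Vocab`. It only mirrors the member names documented here (`words`, `word_index`, `idx_bound`, `idx`) and assumes that `word_index` behaves like a dictionary from word to index.

```python
from typing import Dict, List, Optional


class ToyVocab:
    """Hypothetical stand-in that mimics the documented Vocab members."""

    def __init__(self, words: List[str]):
        self._words = list(words)
        # String-to-index mapping, built from the index-to-string list.
        self._word_index = {word: i for i, word in enumerate(self._words)}

    @property
    def words(self) -> List[str]:
        return self._words

    @property
    def word_index(self) -> Dict[str, int]:
        return self._word_index

    @property
    def idx_bound(self) -> int:
        # Exclusive upper bound: one past the largest valid index.
        return len(self._words)

    def idx(self, item: str, default: Optional[int] = None) -> Optional[int]:
        # Return the fall-back value instead of raising for unknown items.
        return self._word_index.get(item, default)


vocab = ToyVocab(["the", "cat", "sat"])
print(vocab.idx("cat"))               # 1
print(vocab.idx("dog"))               # None
print(vocab.idx("dog", -1))           # -1
print(vocab.words[vocab.idx("sat")])  # sat
```

Every index returned for a known word is smaller than `idx_bound`, which is what makes the bound usable for sizing an embedding matrix.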
-
abstract property
words
Get the list of known words
- Returns
words – list of known words
- Return type
List[str]
-
abstract property
word_index
Get the index of known words
-
abstract property
idx_bound
The exclusive upper bound of indices in this vocabulary.
- Returns
idx_bound – Exclusive upper bound of indices covered by the vocabulary.
- Return type
-
abstract
idx
(item: str, default: Union[List[int], int, None] = None) → Union[List[int], int, None] Lookup the given query item.
This lookup does not raise an exception if the vocab can’t produce indices.
- Parameters
item (str) – The query item.
default (Optional[Union[int, List[int]]]) – Fall-back value to return if the vocab can’t provide indices.
- Returns
index –
int if there is a single index for a known item.
list of indices if the vocab can provide subword indices for an unknown item.
The default value if the vocab can’t provide indices.
- Return type
-
class
ffp.vocab.subword.
SubwordVocab
Bases:
ffp.vocab.vocab.Vocab
Interface for vocabularies with subword lookups.
-
idx
(item: str, default=None) → Union[List[int], int, None] Lookup the given query item.
This lookup does not raise an exception if the vocab can’t produce indices.
- Parameters
item (str) – The query item.
default (Optional[Union[int, List[int]]]) – Fall-back value to return if the vocab can’t provide indices.
- Returns
index –
int if there is a single index for a known item.
list of indices if the vocab can provide subword indices for an unknown item.
The default value if the vocab can’t provide indices.
- Return type
-
property
idx_bound
The exclusive upper bound of indices in this vocabulary.
- Returns
idx_bound – Exclusive upper bound of indices covered by the vocabulary.
- Return type
-
property
min_n
Get the lower bound of the range of extracted n-grams.
- Returns
min_n – lower bound of n-gram range.
- Return type
-
property
max_n
Get the upper bound of the range of extracted n-grams.
- Returns
max_n – upper bound of n-gram range.
- Return type
-
abstract property
subword_indexer
Get this vocab’s subword Indexer.
The subword indexer produces indices for n-grams.
In case of bucket vocabularies, this is a hash-based indexer (FinalfusionHashIndexer, FastTextIndexer). For explicit subword vocabularies, this is an ExplicitIndexer.
- Returns
subword_indexer – The subword indexer of the vocabulary.
- Return type
-
subwords
(item: str, bracket: bool = True) → List[str] Get the n-grams of the given item as a list.
The n-gram range is determined by the min_n and max_n values.
- Parameters
item (str) – The query item to extract n-grams from.
bracket (bool) – Toggles bracketing the item with ‘<’ and ‘>’ before extraction.
- Returns
ngrams – List of n-grams.
- Return type
List[str]
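As a rough illustration of bracketed n-gram extraction, the sketch below uses a hypothetical helper, `toy_ngrams`, which is not an ffp function. It assumes that both `min_n` and `max_n` are inclusive and that n-grams are counted in characters; the order in which the library actually returns n-grams may differ.

```python
from typing import List


def toy_ngrams(item: str, min_n: int, max_n: int, bracket: bool = True) -> List[str]:
    """Hypothetical helper: all character n-grams with min_n <= n <= max_n."""
    if bracket:
        # Mark the word boundaries before extraction.
        item = f"<{item}>"
    result = []
    for n in range(min_n, max_n + 1):  # assumption: max_n is inclusive
        for start in range(len(item) - n + 1):
            result.append(item[start:start + n])
    return result


print(toy_ngrams("hi", 2, 3))                 # ['<h', 'hi', 'i>', '<hi', 'hi>']
print(toy_ngrams("hi", 2, 3, bracket=False))  # ['hi']
```

With bracketing, prefixes and suffixes such as `<h` and `i>` become distinct from the same characters in the middle of a word.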
-
subword_indices
(item: str, bracket: bool = True) → List[int] Get the subword indices for the given item.
This list does not contain the index for known items.
- Parameters
item (str) – The query item.
bracket (bool) – Toggles bracketing the item with ‘<’ and ‘>’ before extraction.
- Returns
indices – The list of subword indices.
- Return type
List[int]
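The interplay between word indices, subword indices and `idx` can be sketched as follows. Everything here is hypothetical: `toy_subword_indices` and `toy_idx` are illustration-only helpers, `zlib.crc32` is swapped in for the real hash functions used by FinalfusionHashIndexer and FastTextIndexer, and placing the bucket indices directly after the word indices is an assumption about the layout, not a statement about the library.

```python
import zlib
from typing import Dict, List, Optional, Union


def toy_subword_indices(item: str, n_words: int, n_buckets: int,
                        min_n: int, max_n: int, bracket: bool = True) -> List[int]:
    """Hypothetical bucket indexer: hash each n-gram into one of n_buckets."""
    if bracket:
        item = f"<{item}>"
    ngrams = [item[start:start + n]
              for n in range(min_n, max_n + 1)
              for start in range(len(item) - n + 1)]
    # Assumption: subword indices are offset so they follow the word indices.
    return [n_words + zlib.crc32(ng.encode("utf-8")) % n_buckets for ng in ngrams]


def toy_idx(item: str, word_index: Dict[str, int], n_buckets: int,
            min_n: int, max_n: int,
            default: Optional[Union[int, List[int]]] = None
            ) -> Optional[Union[int, List[int]]]:
    """Hypothetical lookup: int for known items, list for unknown, else default."""
    if item in word_index:
        return word_index[item]
    indices = toy_subword_indices(item, len(word_index), n_buckets, min_n, max_n)
    return indices if indices else default


word_index = {"the": 0, "cat": 1}
n_buckets = 16
known = toy_idx("cat", word_index, n_buckets, 3, 4)    # a single int
unknown = toy_idx("dog", word_index, n_buckets, 3, 4)  # a list of bucket indices
too_short = toy_idx("a", word_index, n_buckets, 4, 5)  # no n-grams, falls back
print(known, unknown, too_short)
```

Under these assumptions the exclusive upper bound of all indices is `len(word_index) + n_buckets`, and the list returned for an unknown item never contains a word index.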
-