IO

This module defines some common IO operations and types.

Chunk is the building block of finalfusion embeddings. Each component is serialized as its own, non-overlapping chunk in finalfusion files.

ChunkIdentifier is a unique integer identifier for a Chunk.

TypeId is used to uniquely identify numerical types.

The Header handles the preamble of finalfusion files.

FinalfusionFormatError is raised upon reading from malformed finalfusion files.

class ffp.io.Chunk

Bases: abc.ABC

Basic building blocks of finalfusion files.

write(file: Union[str, bytes, int, os.PathLike])

Write the Chunk as a standalone finalfusion file.

Parameters

file (str, bytes, int, PathLike) – Output file

Raises

TypeError – If the Chunk is a Header.

abstract static chunk_identifier() → ffp.io.ChunkIdentifier

Get the ChunkIdentifier for this Chunk.

Returns

chunk_identifier

Return type

ChunkIdentifier

abstract static read_chunk(file: BinaryIO) → ffp.io.Chunk

Read the Chunk and return it.

The file must be positioned before the contents of the Chunk but after its header.

Parameters

file (BinaryIO) – a finalfusion file containing the given Chunk

Returns

chunk – The chunk read from the file.

Return type

Chunk

abstract write_chunk(file: BinaryIO)

Write the Chunk to a file.

Parameters

file (BinaryIO) – Output file for the Chunk
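The three abstract members above can be illustrated with a small stand-in. This is a minimal sketch that does not import ffp: ToyChunk and ToyText are hypothetical names, the identifier value 5 is arbitrary, and the chunk header layout (a little-endian u32 identifier followed by a u64 length) is an assumption made for the example, not something this page specifies.

```python
import abc
import io
import struct
from typing import BinaryIO


class ToyChunk(abc.ABC):
    """Hypothetical stand-in mirroring the abstract members of Chunk."""

    @staticmethod
    @abc.abstractmethod
    def chunk_identifier() -> int:
        """Return the integer identifying this kind of chunk."""

    @staticmethod
    @abc.abstractmethod
    def read_chunk(file: BinaryIO) -> "ToyChunk":
        """Read the chunk contents; the chunk header is already consumed."""

    @abc.abstractmethod
    def write_chunk(self, file: BinaryIO) -> None:
        """Write the chunk header followed by the chunk contents."""


class ToyText(ToyChunk):
    """Hypothetical chunk holding a single UTF-8 string."""

    def __init__(self, text: str) -> None:
        self.text = text

    @staticmethod
    def chunk_identifier() -> int:
        return 5  # arbitrary value chosen for this sketch

    @staticmethod
    def read_chunk(file: BinaryIO) -> "ToyText":
        # Contents: u32 byte count, then that many UTF-8 bytes.
        (n_bytes,) = struct.unpack("<I", file.read(4))
        return ToyText(file.read(n_bytes).decode("utf-8"))

    def write_chunk(self, file: BinaryIO) -> None:
        payload = self.text.encode("utf-8")
        body = struct.pack("<I", len(payload)) + payload
        # Assumed chunk header: u32 identifier, u64 length of the contents.
        file.write(struct.pack("<IQ", self.chunk_identifier(), len(body)))
        file.write(body)


buffer = io.BytesIO()
ToyText("hello").write_chunk(buffer)

# Position the file after the chunk header, as read_chunk requires.
buffer.seek(struct.calcsize("<IQ"))
restored = ToyText.read_chunk(buffer)
```

Note that the caller, not read_chunk, is responsible for skipping the chunk header, which matches the positioning requirement stated for read_chunk.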

class ffp.io.Header(chunk_ids)

Bases: ffp.io.Chunk

Header Chunk

The header chunk handles the preamble.

property chunk_ids

Get the chunk IDs from the header.

Returns

chunk_ids – List of ChunkIdentifiers in the Header.

Return type

List[ChunkIdentifier]

static chunk_identifier() → ffp.io.ChunkIdentifier

Get the ChunkIdentifier for this Chunk.

Returns

chunk_identifier

Return type

ChunkIdentifier

static read_chunk(file: BinaryIO) → ffp.io.Header

Read the Chunk and return it.

The file must be positioned before the contents of the Chunk but after its header.

Parameters

file (BinaryIO) – a finalfusion file containing the given Chunk

Returns

chunk – The chunk read from the file.

Return type

Header

write_chunk(file: BinaryIO)

Write the Chunk to a file.

Parameters

file (BinaryIO) – Output file for the Chunk

ffp.io.find_chunk(file: BinaryIO, chunks: List[ChunkIdentifier]) → Optional[ffp.io.ChunkIdentifier]

Find a Chunk in a file.

Searches the input file for any of the specified chunks and seeks to the first match, so that the file is positioned after that chunk's header and immediately before its contents.

The Chunk.read_chunk() method can then be invoked on the Chunk corresponding to the returned ChunkIdentifier.

This function seeks the input file to its beginning before searching.

Parameters
  • file (BinaryIO) – finalfusion file

  • chunks (List[ChunkIdentifier]) – Identifiers of the chunks to look for in the input file.

Returns

chunk_id – The first ChunkIdentifier found in the file. None if none of the chunks could be found.

Return type

Optional[ChunkIdentifier]
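The search behaviour described above can be sketched without ffp. This is an illustration under stated assumptions rather than the library's implementation: toy_find_chunk and make_chunk are hypothetical helpers, each chunk is assumed to start with a little-endian u32 identifier and a u64 content length, and the toy stream has no file preamble, whereas a real finalfusion file begins with one.

```python
import io
import struct
from typing import BinaryIO, List, Optional

# Assumed chunk header layout: u32 identifier, u64 length of the contents.
CHUNK_HEADER = struct.Struct("<IQ")


def toy_find_chunk(file: BinaryIO, wanted: List[int]) -> Optional[int]:
    """Seek to the contents of the first chunk whose identifier is wanted."""
    file.seek(0)  # always search from the beginning
    while True:
        raw = file.read(CHUNK_HEADER.size)
        if len(raw) < CHUNK_HEADER.size:
            return None  # end of file reached without a match
        chunk_id, length = CHUNK_HEADER.unpack(raw)
        if chunk_id in wanted:
            # The file is now after the header and before the contents.
            return chunk_id
        file.seek(length, io.SEEK_CUR)  # skip the contents of this chunk


def make_chunk(chunk_id: int, payload: bytes) -> bytes:
    """Build one toy chunk: header followed by payload."""
    return CHUNK_HEADER.pack(chunk_id, len(payload)) + payload


stream = io.BytesIO(make_chunk(1, b"vocab") + make_chunk(2, b"matrix"))

found = toy_find_chunk(stream, [2, 7])
contents = stream.read(6)  # the file is positioned at the chunk contents
missing = toy_find_chunk(stream, [9])
```

In this sketch found is 2, contents is b"matrix", and missing is None because no chunk with identifier 9 exists in the stream.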

class ffp.io.ChunkIdentifier

Bases: enum.IntEnum

Known finalfusion Chunk types.

is_storage() → bool

Return whether this identifier belongs to a storage chunk.

Returns

is_storage

Return type

bool

is_vocab() → bool

Return whether this identifier belongs to a vocabulary chunk.

Returns

is_vocab

Return type

bool
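An IntEnum with predicate methods of this kind might look like the sketch below. ToyIdentifier is a hypothetical stand-in: its member names and integer values are chosen for illustration only and should not be read as the actual members of ChunkIdentifier.

```python
from enum import IntEnum


class ToyIdentifier(IntEnum):
    """Hypothetical identifiers; names and values are illustrative only."""

    HEADER = 0
    VOCAB = 1
    ARRAY = 2
    SUBWORD_VOCAB = 3
    QUANTIZED_ARRAY = 4

    def is_storage(self) -> bool:
        """Return whether this identifier belongs to a storage chunk."""
        return self in (ToyIdentifier.ARRAY, ToyIdentifier.QUANTIZED_ARRAY)

    def is_vocab(self) -> bool:
        """Return whether this identifier belongs to a vocabulary chunk."""
        return self in (ToyIdentifier.VOCAB, ToyIdentifier.SUBWORD_VOCAB)


# An integer read from a file converts directly to an enum member.
ident = ToyIdentifier(2)

# Unknown integers are rejected with ValueError.
try:
    ToyIdentifier(99)
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```

Because the class derives from IntEnum, members compare equal to plain integers, which is convenient when identifiers are read from and written to binary files.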

class ffp.io.TypeId

Bases: enum.IntEnum

Known finalfusion data types.

exception ffp.io.FinalfusionFormatError

Bases: Exception

Exception to specify that the format of a finalfusion file was incorrect.
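A dedicated format exception lets callers distinguish a malformed file from ordinary I/O failures. The sketch below shows the pattern with stand-ins only: ToyFormatError and check_preamble are hypothetical, and the preamble layout (four magic bytes followed by a little-endian u32 version) is an assumption made for the example.

```python
import io
import struct
from typing import BinaryIO


class ToyFormatError(Exception):
    """Hypothetical stand-in for a format error such as FinalfusionFormatError."""


MAGIC = b"FiFu"  # assumed magic bytes for this sketch
PREAMBLE = struct.Struct("<4sI")  # assumed layout: magic, then u32 version


def check_preamble(file: BinaryIO) -> int:
    """Validate the start of the file and return the format version."""
    raw = file.read(PREAMBLE.size)
    if len(raw) < PREAMBLE.size:
        raise ToyFormatError("file too short to contain a preamble")
    magic, version = PREAMBLE.unpack(raw)
    if magic != MAGIC:
        raise ToyFormatError(f"unexpected magic bytes: {magic!r}")
    return version


# A well-formed preamble yields the version number.
version = check_preamble(io.BytesIO(PREAMBLE.pack(MAGIC, 0)))

# A malformed preamble raises the dedicated exception.
try:
    check_preamble(io.BytesIO(b"nope" + bytes(4)))
    error_message = None
except ToyFormatError as exc:
    error_message = str(exc)
```

Catching the dedicated exception type, rather than a bare Exception, keeps errors such as a missing file or a permission problem visible to the caller.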