bio_embeddings.utilities

Various helpers

class bio_embeddings.utilities.FileManagerInterface
abstract create_directory(prefix, stage, directory_name) → str
abstract create_file(prefix, stage, file_name, extension=None) → str
abstract create_prefix(prefix) → str
abstract create_stage(prefix, stage) → str
abstract exists(prefix, stage=None, file_name=None, extension=None) → bool
abstract get_file(prefix, stage, file_name, extension=None) → str
class bio_embeddings.utilities.FileSystemFileManager
create_directory(prefix, stage, directory_name) → str
create_file(prefix, stage, file_name, extension=None) → str
create_prefix(prefix) → str
create_stage(prefix, stage) → str
exists(prefix, stage=None, file_name=None, extension=None) → bool
get_file(prefix, stage, file_name, extension=None) → str
exception bio_embeddings.utilities.InvalidParameterError

Exception for invalid parameter settings

bio_embeddings.utilities.check_required(params: dict, keys: List[str])

Verifies that all required parameters are present in the configuration.

Parameters
  • params (dict) – dictionary with parameters

  • keys (list-like) – set of parameters that must be present in params

Raises

MissingParameterError – if any required key is missing from params
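
The check described above can be sketched without the library installed. This is a minimal illustration, not the library's implementation: check_required_sketch, the stand-in exception class, the error message, and the configuration keys are all assumptions made for the example.

```python
from typing import List


class MissingParameterError(Exception):
    """Stand-in for the library's exception, defined here so the sketch runs alone."""


def check_required_sketch(params: dict, keys: List[str]) -> None:
    # Collect every absent key first, so one error can report all of them.
    missing = [key for key in keys if key not in params]
    if missing:
        raise MissingParameterError(
            "Missing required parameters: " + ", ".join(missing)
        )


# Hypothetical configuration, used only for illustration.
config = {"sequences_file": "proteins.fasta", "prefix": "run_1"}

# All required keys are present: returns None without raising.
result = check_required_sketch(config, ["sequences_file", "prefix"])

# "protocol" is absent, so the check raises.
try:
    check_required_sketch(config, ["sequences_file", "protocol"])
    message = None
except MissingParameterError as error:
    message = str(error)
```

Reporting all missing keys in a single error, rather than stopping at the first one, lets a user fix a configuration in one pass.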

bio_embeddings.utilities.get_device(device: Union[None, str, torch.device] = None) → torch.device

Returns the device the user specified. If no device is specified, defaults to the GPU and falls back to the CPU when no GPU is available.
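
The selection order can be shown as plain logic. The sketch below works on device names instead of torch.device objects so that it runs without PyTorch. pick_device_name and its gpu_available flag are hypothetical, and returning an explicit request unchanged is an assumption about how the real function treats it.

```python
from typing import Optional


def pick_device_name(requested: Optional[str] = None,
                     gpu_available: bool = False) -> str:
    # An explicit request wins (assumption: it is returned without validation).
    if requested is not None:
        return requested
    # No request: prefer the GPU, fall back to the CPU.
    return "cuda" if gpu_available else "cpu"


explicit = pick_device_name("cpu", gpu_available=True)
default_with_gpu = pick_device_name(None, gpu_available=True)
default_without_gpu = pick_device_name(None, gpu_available=False)
```

With PyTorch installed, the flag would typically come from torch.cuda.is_available(), and the resulting name can be passed to torch.device(...).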

bio_embeddings.utilities.get_file_manager(**kwargs)
bio_embeddings.utilities.get_model_directories_from_zip(model: Optional[str] = None, directory: Optional[str] = None, overwrite_cache: bool = False) → str

If the specified asset directory for the model is already in the user cache, returns its path. Otherwise downloads the zipped directory, unpacks it into the cache, and returns the location.

bio_embeddings.utilities.get_model_file(model: Optional[str] = None, file: Optional[str] = None, overwrite_cache: bool = False) → str

If the specified asset for the model is already in the user cache, returns its location. Otherwise downloads the file to the cache and returns the location.
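
Both functions follow a "use the cache, else download" pattern. A minimal sketch of that pattern for a single file is shown below, under stated assumptions: get_cached_file, the cache layout (cache_dir/model/file), and the fetch callable are hypothetical and do not reflect where or how the library stores its assets.

```python
import tempfile
from pathlib import Path
from typing import Callable


def get_cached_file(cache_dir: Path, model: str, file: str,
                    fetch: Callable[[], bytes],
                    overwrite_cache: bool = False) -> str:
    # Assumed layout: one sub-directory per model inside the cache.
    target = cache_dir / model / file
    # Reuse the cached copy unless the caller asks for a fresh download.
    if target.exists() and not overwrite_cache:
        return str(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch())
    return str(target)


calls = []


def fake_download() -> bytes:
    # Hypothetical downloader; it records each call so cache hits are visible.
    calls.append(1)
    return b"weights"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = get_cached_file(cache, "some_model", "model_file", fake_download)
    second = get_cached_file(cache, "some_model", "model_file", fake_download)
    third = get_cached_file(cache, "some_model", "model_file", fake_download,
                            overwrite_cache=True)
    download_count = len(calls)
```

The second call is served from the cache. Only the first call and the call with overwrite_cache=True invoke the downloader.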

bio_embeddings.utilities.read_fasta(path: str) → List[Bio.SeqRecord.SeqRecord]

Helper function to read a FASTA file.

Parameters

path – path to a valid FASTA file

Returns

a list of SeqRecord objects.

bio_embeddings.utilities.reindex_h5_file(h5_file_path: str, mapping_file_path: str)

Renames the dataset keys using the “original_id” column from the mapping file. This operation is generally unsafe, because “original_id” values may contain invalid characters, duplicates, or empty strings.

Some sanity checks are performed before renaming starts, but using this function is discouraged unless you know what you are doing.

Parameters
  • h5_file_path – path to the HDF5 file to re-index

  • mapping_file_path – path to the mapping file (its first column must contain the current keys, and a column “original_id” must contain the new desired ids)

Returns

Nothing – conversion happens in place!
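
The three hazards named above (invalid characters, duplicates, empty strings) can be checked before any renaming. The sketch below is illustrative only: find_unsafe_ids is hypothetical, the source does not say which sanity checks the library performs, and treating "/" as the invalid character is an assumption based on its role as the path separator in HDF5 names.

```python
from collections import Counter
from typing import List


def find_unsafe_ids(original_ids: List[str]) -> List[str]:
    problems = []
    counts = Counter(original_ids)
    # Iterate in sorted order so the report is deterministic.
    for identifier in sorted(counts):
        if identifier == "":
            problems.append("empty id")
        elif "/" in identifier:
            # "/" separates groups in HDF5 paths, so it cannot be part of a key.
            problems.append(f"invalid character in {identifier!r}")
        if counts[identifier] > 1:
            problems.append(f"duplicate id {identifier!r}")
    return problems


# Hypothetical ids, as they might appear in the "original_id" column.
report = find_unsafe_ids(["P1", "P2", "P1", "", "sp/Q9"])
clean_report = find_unsafe_ids(["P1", "P2"])
```

An empty report means none of these particular problems was found. It does not guarantee that the rename is safe.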

bio_embeddings.utilities.reindex_sequences(sequence_records: List[Bio.SeqRecord.SeqRecord], simple=False) → (Bio.SeqRecord.SeqRecord, pandas.core.frame.DataFrame)

Sorts and re-indexes sequence_records IN PLACE (the original list is modified). Returns a DataFrame with the mapping.

Parameters
  • sequence_records – List of sequence records

  • simple – Boolean; if True, use a numerical index (1, 2, 3, 4) instead of the MD5 hash

Returns

A DataFrame holding the mapping: the index contains the new ids, and the columns contain the previous id (“original_id”) and the sequence length.
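
The in-place behaviour and the shape of the mapping can be illustrated with the standard library alone. In this sketch, Record is a hypothetical stand-in for SeqRecord and a plain dict replaces the DataFrame. Sorting longest-first and hashing the sequence string are assumptions, since the source states neither the sort key nor the hash input.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord (id and sequence only)."""
    id: str
    seq: str


def reindex_sketch(records: List[Record], simple: bool = False) -> Dict[str, dict]:
    # Slice assignment rewrites the caller's list itself, not a copy.
    # Assumption: records are sorted longest-first.
    records[:] = sorted(records, key=lambda record: -len(record.seq))
    mapping = {}
    for position, record in enumerate(records, start=1):
        original_id = record.id
        if simple:
            record.id = str(position)
        else:
            # Assumption: the hash is computed over the sequence string.
            record.id = hashlib.md5(record.seq.encode()).hexdigest()
        mapping[record.id] = {
            "original_id": original_id,
            "sequence_length": len(record.seq),
        }
    return mapping


records = [Record("sp|A", "MK"), Record("sp|B", "MKTAYIAK")]
mapping = reindex_sketch(records, simple=True)
new_order = [record.id for record in records]
```

In this sketch, identical sequences produce the same hash, so with simple=False two such records would share one key in the mapping.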

bio_embeddings.utilities.write_fasta_file(sequence_records: List[Bio.SeqRecord.SeqRecord], file_path: str) → None