bio_embeddings.utilities
Various helpers
-
exception bio_embeddings.utilities.InvalidParameterError
Exception for invalid parameter settings.
-
bio_embeddings.utilities.check_required(params: dict, keys: List[str])
Verify that the required set of parameters is present in the configuration.
- Parameters
params (dict) – dictionary with parameters
keys (list-like) – set of parameters that have to be present in params
- Raises
MissingParameterError
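The check described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the library's implementation: `check_required_sketch` is a hypothetical name, the local `MissingParameterError` class is a stand-in for the library's exception, and the error message format is invented for the example.

```python
from typing import List


class MissingParameterError(Exception):
    """Hypothetical stand-in for the library's exception of the same name."""


def check_required_sketch(params: dict, keys: List[str]) -> None:
    """Raise if any entry of `keys` is absent from `params` (assumed behaviour)."""
    missing = [key for key in keys if key not in params]
    if missing:
        # The message format is an assumption made for this example.
        raise MissingParameterError(
            "Missing required parameters: " + ", ".join(missing)
        )


config = {"sequences_file": "proteins.fasta", "prefix": "run_1"}

# All required keys are present, so this returns without raising.
check_required_sketch(config, ["sequences_file", "prefix"])

# "protocol" is not in the configuration, so the check raises.
try:
    check_required_sketch(config, ["sequences_file", "protocol"])
except MissingParameterError as error:
    message = str(error)
```

Whether the real function also treats keys with a `None` value as missing is not stated here, so the sketch only tests for presence.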
-
bio_embeddings.utilities.get_device(device: Union[None, str, torch.device] = None) → torch.device
Returns the device the user specified. If no device was specified, defaults to the GPU, falling back to the CPU if no GPU is available.
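The precedence described above (explicit choice first, then GPU, then CPU) can be sketched without any dependency. The function name `resolve_device_sketch` and the `gpu_available` flag are hypothetical. The real function returns a `torch.device` and presumably queries GPU availability itself, for example via `torch.cuda.is_available()`, which is an assumption here.

```python
from typing import Optional


def resolve_device_sketch(requested: Optional[str], gpu_available: bool) -> str:
    """Return a device name following the documented precedence.

    `gpu_available` stands in for a runtime check; plain strings stand in
    for torch.device objects.
    """
    if requested is not None:
        # An explicit user choice always wins.
        return requested
    # No explicit choice: prefer the GPU, fall back to the CPU.
    return "cuda" if gpu_available else "cpu"


explicit = resolve_device_sketch("cpu", gpu_available=True)
default_with_gpu = resolve_device_sketch(None, gpu_available=True)
default_without_gpu = resolve_device_sketch(None, gpu_available=False)
```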
-
bio_embeddings.utilities.get_model_directories_from_zip(model: Optional[str] = None, directory: Optional[str] = None, overwrite_cache: bool = False) → str
If the specified asset directory for the model is in the user cache, returns the directory path. Otherwise, downloads the zipped directory, unpacks it into the cache, and returns the location.
-
bio_embeddings.utilities.get_model_file(model: Optional[str] = None, file: Optional[str] = None, overwrite_cache: bool = False) → str
If the specified asset for the model is in the user cache, returns the location. Otherwise, downloads the file to the cache and returns the location.
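The cache-then-download pattern shared by the two functions above can be sketched as follows. Everything in this block is an illustration under assumptions: `get_cached_file_sketch`, the `cache_dir/model/file` layout, and the injected `download` callable are hypothetical, and `overwrite_cache=True` is assumed to force a fresh download.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def get_cached_file_sketch(
    cache_dir: Path,
    model: str,
    file: str,
    download: Callable[[Path], None],
    overwrite_cache: bool = False,
) -> str:
    """Return the cached path, downloading first if needed (assumed logic)."""
    target = cache_dir / model / file  # hypothetical cache layout
    if overwrite_cache or not target.is_file():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(target)
    return str(target)


calls: List[Path] = []


def fake_download(destination: Path) -> None:
    """Stand-in for a real download: records the call and writes a dummy file."""
    calls.append(destination)
    destination.write_text("weights")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    # First call: nothing cached yet, so the file is "downloaded".
    first = get_cached_file_sketch(cache, "some_model", "model_file", fake_download)
    # Second call: served from the cache, no download.
    second = get_cached_file_sketch(cache, "some_model", "model_file", fake_download)
    # Third call: overwrite_cache forces a new download.
    third = get_cached_file_sketch(
        cache, "some_model", "model_file", fake_download, overwrite_cache=True
    )
```

Passing the downloader in as a callable keeps the sketch free of network access. The directory variant would add an unpacking step after the download.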
-
bio_embeddings.utilities.read_fasta(path: str) → List[Bio.SeqRecord.SeqRecord]
Helper function to read a FASTA file.
- Parameters
path – path to a valid FASTA file
- Returns
a list of SeqRecord objects.
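For readers unfamiliar with the format, the following dependency-free sketch shows what reading a FASTA file involves. It is not how the library does it: the signature above indicates that Biopython `SeqRecord` objects are returned, whereas this hypothetical `parse_fasta_sketch` returns plain `(header, sequence)` tuples and reads from any text handle.

```python
import io
from typing import List, Optional, TextIO, Tuple


def parse_fasta_sketch(handle: TextIO) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) tuples."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = io.StringIO(">seq1 first protein\nMKT\nAYI\n>seq2\nSEQWENCE\n")
records = parse_fasta_sketch(example)
```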
-
bio_embeddings.utilities.reindex_h5_file(h5_file_path: str, mapping_file_path: str)
Renames the dataset keys using the “original_id” from the mapping file. This operation is generally considered unsafe, because the “original_id” values may contain invalid characters, duplicates, or empty strings.
Some sanity checks are performed before the renaming starts, but using this function is discouraged unless you know what you are doing.
- Parameters
h5_file_path – path to the HDF5 file to re-index
mapping_file_path – path to the mapping file (its first column must contain the current keys, and it must have a column “original_id” holding the desired new ids)
- Returns
Nothing – conversion happens in place!
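The renaming step and the kind of sanity checks mentioned above can be illustrated on in-memory data. This is a sketch under stated assumptions: a plain dict stands in for the HDF5 file, the mapping is passed as CSV text with a hypothetical `id` header for the first column, and the two checks (no empty and no duplicate new ids) are guesses at what "sanity checks" covers, based on the risks named in the description.

```python
import csv
import io
from typing import Dict, List


def reindex_keys_sketch(datasets: Dict[str, List[float]], mapping_csv: str) -> None:
    """Rename the keys of `datasets` in place using the "original_id" column."""
    rows = list(csv.reader(io.StringIO(mapping_csv)))
    header, body = rows[0], rows[1:]
    original_id_column = header.index("original_id")
    # The first column holds the current keys.
    renames = {row[0]: row[original_id_column] for row in body}

    # Assumed sanity checks, run before anything is modified.
    new_ids = [renames[key] for key in datasets]  # KeyError if a key is unmapped
    if any(new_id == "" for new_id in new_ids):
        raise ValueError("Mapping contains an empty original_id")
    if len(set(new_ids)) != len(new_ids):
        raise ValueError("Mapping contains duplicate original_id values")

    # Build the renamed view first, so a new id that equals an old key
    # cannot overwrite an entry that has not been renamed yet.
    renamed = {renames[key]: value for key, value in datasets.items()}
    datasets.clear()
    datasets.update(renamed)


mapping = "id,original_id\nabc123,sp|P12345\ndef456,sp|Q67890\n"
embeddings = {"abc123": [0.1, 0.2], "def456": [0.3, 0.4]}
reindex_keys_sketch(embeddings, mapping)
```

Running the checks before touching the data matters because the conversion happens in place: a failure halfway through would leave a mix of old and new keys.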
-
bio_embeddings.utilities.reindex_sequences(sequence_records: List[Bio.SeqRecord.SeqRecord], simple=False) → (Bio.SeqRecord.SeqRecord, pandas.core.frame.DataFrame)
Sorts and re-indexes sequence_records IN PLACE (the original list is modified). Returns a DataFrame with the mapping.
- Parameters
sequence_records – List of sequence records
simple – Boolean; if True, use a numerical index (1, 2, 3, 4) instead of an MD5 hash
- Returns
A DataFrame with the mapping: it is keyed by the new ids and has a column “original_id” containing the previous id, plus the sequence length.
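The re-indexing can be sketched with the standard library alone. Several details are assumptions made for illustration: the `Record` dataclass stands in for `Bio.SeqRecord.SeqRecord`, the sort order (longest sequence first) and hashing the sequence string with MD5 are guesses, the `sequence_length` column name is invented, and a list of dicts replaces the pandas DataFrame.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord (id and seq only)."""
    id: str
    seq: str


def reindex_records_sketch(
    records: List[Record], simple: bool = False
) -> List[Dict[str, object]]:
    """Sort and re-id `records` in place; return the id mapping as rows."""
    # Assumption: sort by sequence length, longest first.
    records.sort(key=lambda record: len(record.seq), reverse=True)
    mapping: List[Dict[str, object]] = []
    for position, record in enumerate(records, start=1):
        if simple:
            new_id = str(position)  # numerical index: 1, 2, 3, ...
        else:
            # Assumption: the MD5 hash is computed over the sequence string.
            new_id = hashlib.md5(record.seq.encode("utf-8")).hexdigest()
        mapping.append(
            {
                "id": new_id,
                "original_id": record.id,
                "sequence_length": len(record.seq),
            }
        )
        record.id = new_id  # the caller's objects are modified
    return mapping


records = [Record("short one", "MK"), Record("long one", "MKTAYIAK")]
mapping = reindex_records_sketch(records, simple=True)
```

After the call, both the order of `records` and the `id` of each record have changed, which is the in-place behaviour the description warns about. The returned mapping is what makes it possible to recover the original ids later.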