bio_embeddings.extract
Methods for predicting protein properties at both the per-residue and the per-protein level, including supervised (pre-trained) and unsupervised (nearest-neighbour search) approaches.
class bio_embeddings.extract.BasicAnnotationExtractor(model_type: str, device: Union[None, str, torch.device] = None, **kwargs)
get_annotations(raw_embedding: numpy.ndarray) → bio_embeddings.extract.basic.BasicAnnotationExtractor.BasicExtractedAnnotations
get_secondary_structure(raw_embedding: numpy.ndarray) → bio_embeddings.extract.basic.BasicAnnotationExtractor.BasicSecondaryStructureResult
get_subcellular_location(raw_embedding: numpy.ndarray) → bio_embeddings.extract.basic.BasicAnnotationExtractor.BasicSubcellularLocalizationResult
necessary_files = ['secondary_structure_checkpoint_file', 'subcellular_location_checkpoint_file']
bio_embeddings.extract.get_k_nearest_neighbours(pairwise_matrix: numpy.array, k: int = 1) → (typing.List[int], numpy.array)
Parameters
pairwise_matrix – an np.array with columns as queries and rows as targets
k – the number of nearest neighbours to return
Returns
a list of tuples with the indices of the nearest neighbours and the distances to them (sorted by ascending distance)
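The selection step can be illustrated with a minimal standard-library sketch. It is not the bio_embeddings implementation: `k_nearest_sketch` is a hypothetical helper, the matrix is a plain list of lists rather than a NumPy array, and the return shape (one list of `(index, distance)` tuples per query) is chosen for readability and may differ from what the library returns. The layout follows the parameter description above, with queries as columns and targets as rows.

```python
import heapq
from typing import List, Sequence, Tuple


def k_nearest_sketch(
    pairwise_matrix: Sequence[Sequence[float]], k: int = 1
) -> List[List[Tuple[int, float]]]:
    """For every query (column), return its k closest targets (rows).

    Hypothetical helper for illustration only.
    """
    n_targets = len(pairwise_matrix)
    n_queries = len(pairwise_matrix[0]) if n_targets else 0
    result = []
    for q in range(n_queries):
        # Pair each target index with its distance to query q (column q).
        column = [(t, pairwise_matrix[t][q]) for t in range(n_targets)]
        # nsmallest returns the k pairs with the smallest distance, ascending.
        result.append(heapq.nsmallest(k, column, key=lambda pair: pair[1]))
    return result


# Three targets (rows) by two queries (columns); the values are made up.
distances = [
    [0.9, 0.2],
    [0.1, 0.5],
    [0.4, 0.3],
]
neighbours = k_nearest_sketch(distances, k=2)
```

Here `neighbours[0]` holds the two closest targets for the first query and `neighbours[1]` those for the second, each ordered from nearest to farthest.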
bio_embeddings.extract.pairwise_distance_matrix_from_embeddings_and_annotations(query_embeddings_path: str, reference_embeddings_path: str, metric: str = 'euclidean', n_jobs: int = 1) → bio_embeddings.extract.unsupervised_utilities.PairwiseDistanceMatrixResult
Parameters
n_jobs – number of parallel jobs (int); see the scikit-learn documentation
metric – name of the distance metric to use, given as a string; see the scikit-learn documentation
query_embeddings_path – A string defining a path to an h5 file
reference_embeddings_path – A string defining a path to an h5 file
Returns
A tuple containing:
- pairwise_matrix: the pairwise distances between queries and references
- queries: a list of strings defining the queries
- references: a list of strings defining the references
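The structure of the returned tuple can be sketched without the library. The example below is an assumption-laden illustration, not the bio_embeddings code. It takes in-memory dictionaries instead of h5 file paths and computes only Euclidean distances with `math.dist` (Python 3.8+). `PairwiseResultSketch` and `pairwise_euclidean_sketch` are hypothetical names. Rows index queries and columns index references, following the shape convention of scikit-learn's `pairwise_distances(X, Y)`; transpose the matrix if a consumer expects queries as columns.

```python
import math
from typing import Dict, List, NamedTuple, Sequence


class PairwiseResultSketch(NamedTuple):
    """Hypothetical stand-in for the three-part result described above."""

    pairwise_matrix: List[List[float]]
    queries: List[str]
    references: List[str]


def pairwise_euclidean_sketch(
    query_embeddings: Dict[str, Sequence[float]],
    reference_embeddings: Dict[str, Sequence[float]],
) -> PairwiseResultSketch:
    # Keep the identifiers in a fixed order so matrix indices map back to names.
    queries = list(query_embeddings)
    references = list(reference_embeddings)
    # matrix[i][j] is the distance between queries[i] and references[j].
    matrix = [
        [math.dist(query_embeddings[q], reference_embeddings[r]) for r in references]
        for q in queries
    ]
    return PairwiseResultSketch(matrix, queries, references)


# Toy two-dimensional embeddings; real embeddings have far more dimensions.
result = pairwise_euclidean_sketch(
    {"query_a": [0.0, 0.0], "query_b": [3.0, 4.0]},
    {"ref_x": [3.0, 4.0], "ref_y": [0.0, 1.0]},
)
```

Keeping the identifier lists next to the matrix is what lets a nearest-neighbour index be translated back into a protein identifier afterwards.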