bio_embeddings.extract
Methods for predicting protein properties at both the per-residue and the per-protein level, including supervised (pre-trained) and unsupervised (nearest-neighbour search) approaches.
- class bio_embeddings.extract.BasicAnnotationExtractor(model_type: str, device: Union[None, str, torch.device] = None, **kwargs)
- __init__(model_type: str, device: Union[None, str, torch.device] = None, **kwargs)
Initialize the annotation extractor. The paths to the required checkpoint files must be passed as keyword arguments.
- Parameters
secondary_structure_checkpoint_file – path to the secondary structure inference model checkpoint file
subcellular_location_checkpoint_file – path to the subcellular location inference model checkpoint file
- get_annotations(raw_embedding: numpy.ndarray) → bio_embeddings.extract.basic.basic_annotation_extractor.BasicExtractedAnnotations
- get_secondary_structure(raw_embedding: numpy.ndarray) → bio_embeddings.extract.basic.basic_annotation_extractor.BasicSecondaryStructureResult
- get_subcellular_location(raw_embedding: numpy.ndarray) → bio_embeddings.extract.basic.basic_annotation_extractor.SubcellularLocalizationAndMembraneBoundness
- necessary_files = ['secondary_structure_checkpoint_file', 'subcellular_location_checkpoint_file']
- bio_embeddings.extract.get_k_nearest_neighbours(pairwise_matrix: numpy.array, k: int = 1) → Tuple[List[int], numpy.ndarray]
- Parameters
pairwise_matrix – a NumPy array of pairwise distances, with queries as columns and targets as rows
k – the number of nearest neighbours to return
- Returns
a tuple holding the indices of the k nearest neighbours and the distances to them (sorted by distance, ascending)
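To make the lookup concrete, here is a minimal sketch of a k-nearest-neighbour search over a distance matrix laid out as described above (one column per query, one row per target). It is not the library's implementation: the function name `k_nearest_neighbours_sketch` is hypothetical, and it works on plain Python lists rather than NumPy arrays so that it has no dependencies.

```python
from typing import List, Sequence, Tuple


def k_nearest_neighbours_sketch(
    pairwise_matrix: Sequence[Sequence[float]], k: int = 1
) -> Tuple[List[List[int]], List[List[float]]]:
    """Return, per query, the indices of and distances to its k closest targets.

    Assumes the layout pairwise_matrix[target][query] (hypothetical helper,
    not part of bio_embeddings).
    """
    n_targets = len(pairwise_matrix)
    n_queries = len(pairwise_matrix[0]) if n_targets else 0
    if not 1 <= k <= n_targets:
        raise ValueError("k must be between 1 and the number of targets")

    all_indices: List[List[int]] = []
    all_distances: List[List[float]] = []
    for query in range(n_queries):
        # Distances from this query to every target (one column of the matrix).
        column = [pairwise_matrix[target][query] for target in range(n_targets)]
        # Target indices ordered by ascending distance, truncated to k.
        nearest = sorted(range(n_targets), key=column.__getitem__)[:k]
        all_indices.append(nearest)
        all_distances.append([column[target] for target in nearest])
    return all_indices, all_distances


# Toy matrix: 3 targets (rows) x 2 queries (columns).
toy_matrix = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.3],
]
indices, distances = k_nearest_neighbours_sketch(toy_matrix, k=2)
print(indices)    # [[1, 2], [0, 2]]
print(distances)  # [[0.2, 0.5], [0.1, 0.3]]
```

Sorting the full column keeps the sketch short; for large matrices a partial selection (for example `numpy.argpartition`) avoids ordering entries that are discarded anyway.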
- bio_embeddings.extract.pairwise_distance_matrix_from_embeddings_and_annotations(query_embeddings_path: str, reference_embeddings_path: str, metric: str = 'euclidean', n_jobs: int = 1) → bio_embeddings.extract.unsupervised_utilities.PairwiseDistanceMatrixResult
- Parameters
n_jobs – number of parallel jobs (int); see the scikit-learn documentation
metric – name of the distance metric to use, given as a string; see the scikit-learn documentation
query_embeddings_path – path to the h5 file containing the query embeddings
reference_embeddings_path – path to the h5 file containing the reference embeddings
- Returns
A tuple containing:
  - pairwise_matrix: the pairwise distances between queries and references
  - queries: a list of strings defining the queries
  - references: a list of strings defining the references
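The shape of this result can be illustrated with a small, dependency-free sketch. It rests on several assumptions: in-memory dictionaries stand in for the two h5 files, only the Euclidean metric is computed, and the matrix is laid out with one row per reference and one column per query to match the layout documented for get_k_nearest_neighbours (whether the library uses this orientation is not stated here). The names `PairwiseResultSketch` and `pairwise_euclidean_sketch` are hypothetical and not part of bio_embeddings.

```python
import math
from typing import Dict, List, NamedTuple, Sequence


class PairwiseResultSketch(NamedTuple):
    """Hypothetical stand-in for the documented result tuple."""

    pairwise_matrix: List[List[float]]
    queries: List[str]
    references: List[str]


def pairwise_euclidean_sketch(
    query_embeddings: Dict[str, Sequence[float]],
    reference_embeddings: Dict[str, Sequence[float]],
) -> PairwiseResultSketch:
    """Compute Euclidean distances between every reference and every query."""
    # Identifiers keep the insertion order of the dictionaries.
    queries = list(query_embeddings)
    references = list(reference_embeddings)
    # Assumed layout: matrix[reference_index][query_index].
    matrix = [
        [math.dist(reference_embeddings[r], query_embeddings[q]) for q in queries]
        for r in references
    ]
    return PairwiseResultSketch(matrix, queries, references)


# Toy per-protein embeddings keyed by identifier (made-up values).
query_embeddings = {"q1": [0.0, 0.0], "q2": [3.0, 4.0]}
reference_embeddings = {"r1": [0.0, 0.0], "r2": [6.0, 8.0], "r3": [3.0, 0.0]}

result = pairwise_euclidean_sketch(query_embeddings, reference_embeddings)
print(result.queries)      # ['q1', 'q2']
print(result.references)   # ['r1', 'r2', 'r3']
for name, row in zip(result.references, result.pairwise_matrix):
    print(name, row)
```

Returning the identifier lists alongside the matrix is what lets a neighbour index be mapped back to a protein name after the search.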