bio_embeddings.mutagenesis
BETA: in-silico mutagenesis using the substitution probabilities from ProtTrans-Bert-BFD
- class bio_embeddings.mutagenesis.ProtTransBertBFDMutagenesis(device: Union[None, str, torch.device] = None, model_directory: Optional[str] = None, half_precision_model: bool = False)
BETA: in-silico mutagenesis using BertForMaskedLM
- __init__(device: Union[None, str, torch.device] = None, model_directory: Optional[str] = None, half_precision_model: bool = False)
Loads the BERT model for masked language modeling (BertForMaskedLM).
- device: torch.device
- get_sequence_probabilities(sequence: str, temperature: float = 1, start: Optional[int] = None, stop: Optional[int] = None, progress_bar: Optional[tqdm.std.tqdm] = None) -> List[Dict[str, float]]
Returns the likelihood of each of the 20 natural amino acids at every residue position between start (inclusive) and stop (exclusive), given the context of the remainder of the sequence (i.e., by using BERT’s mask token and reconstructing the corrupted sequence). Probabilities may be adjusted by a temperature factor; with the default of 1, no adjustment is made.
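To illustrate what a temperature factor does to a probability distribution, here is a minimal sketch in plain Python. It assumes the common convention of dividing the raw scores (logits) by the temperature before applying the softmax; whether this class applies the temperature in exactly this way is an assumption, and `softmax_with_temperature` and the example scores are hypothetical names and values, not part of bio_embeddings.

```python
import math
from typing import Dict


def softmax_with_temperature(
    logits: Dict[str, float], temperature: float = 1.0
) -> Dict[str, float]:
    """Hypothetical helper: softmax over per-residue scores, scaled by a temperature.

    Assumes the convention softmax(logits / temperature); this is an
    illustration, not the library's implementation.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {aa: score / temperature for aa, score in logits.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    highest = max(scaled.values())
    exponentials = {aa: math.exp(score - highest) for aa, score in scaled.items()}
    total = sum(exponentials.values())
    return {aa: value / total for aa, value in exponentials.items()}


# Made-up scores for three residues at a single position.
example_logits = {"A": 2.0, "G": 1.0, "S": 0.0}
unadjusted = softmax_with_temperature(example_logits)        # temperature = 1
flattened = softmax_with_temperature(example_logits, 5.0)    # temperature > 1
```

Under this convention, a temperature of 1 leaves the distribution unchanged, values above 1 flatten it (less likely residues gain probability), and values below 1 sharpen it.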
- Parameters
sequence – The amino acid sequence. Please pass whole sequences, not regions
start – the start index (inclusive) of the region for which to compute residue probabilities (starting with 0)
stop – the end (exclusive) of the region for which to compute residue probabilities
temperature – temperature for the softmax computation
progress_bar – optional tqdm progress bar
- Returns
An ordered list with one entry per position in the region. Each entry maps each of the 20 natural amino acids to its probability of occurring at that position.
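As a rough sketch of how the returned `List[Dict[str, float]]` might be consumed, the snippet below picks the most probable residue for each position. It assumes the dictionaries are keyed by one-letter amino acid codes and ignores any other keys; `most_likely_residues` and the mock values are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# One-letter codes of the 20 natural amino acids.
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def most_likely_residues(
    probabilities: List[Dict[str, float]], start: int = 0
) -> List[Tuple[int, str, float]]:
    """Hypothetical helper: (position, residue, probability) of the top residue per entry."""
    result = []
    for offset, position_probabilities in enumerate(probabilities):
        # Keep only amino acid keys (assumption: keys are one-letter codes).
        candidates = {
            residue: probability
            for residue, probability in position_probabilities.items()
            if residue in AMINO_ACIDS
        }
        best = max(candidates, key=candidates.get)
        result.append((start + offset, best, candidates[best]))
    return result


# Mock data shaped like the documented return value (values are made up
# and truncated to three residues per position for brevity).
mock_probabilities = [
    {"A": 0.7, "G": 0.2, "S": 0.1},
    {"L": 0.5, "I": 0.4, "V": 0.1},
]
top = most_likely_residues(mock_probabilities, start=5)
```

With the real class, the list would instead come from `get_sequence_probabilities(sequence, start=..., stop=...)` called on a `ProtTransBertBFDMutagenesis` instance, with `start` passed through so that the reported positions refer to the full sequence.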
- model: transformers.models.bert.modeling_bert.BertForMaskedLM
- tokenizer: transformers.models.bert.tokenization_bert.BertTokenizer