bio_embeddings.mutagenesis

BETA: in-silico mutagenesis using the substitution probabilities from ProtTrans-Bert-BFD

class bio_embeddings.mutagenesis.ProtTransBertBFDMutagenesis(device: Union[None, str, torch.device] = None, model_directory: Optional[str] = None, half_precision_model: bool = False)[source]

BETA: in-silico mutagenesis using BertForMaskedLM

__init__(device: Union[None, str, torch.device] = None, model_directory: Optional[str] = None, half_precision_model: bool = False)[source]

Loads the Bert Model for Masked LM

device: torch.device
get_sequence_probabilities(sequence: str, temperature: float = 1, start: Optional[int] = None, stop: Optional[int] = None, progress_bar: Optional[tqdm.std.tqdm] = None) List[Dict[str, float]][source]

Returns the likelihood of each of the 20 natural amino acids at the residue positions between start and stop, given the context of the remainder of the sequence (that is, by replacing each position with BERT’s mask token and reconstructing the corrupted sequence). Probabilities may be adjusted by a temperature factor; with the default of 1, no adjustment is made.

Parameters
  • sequence – The amino acid sequence. Please pass whole sequences, not regions

  • start – the start index (inclusive) of the region for which to compute residue probabilities (starting with 0)

  • stop – the end (exclusive) of the region for which to compute residue probabilities

  • temperature – temperature for the softmax computation

  • progress_bar – optional tqdm progress bar
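To illustrate what the temperature parameter does, here is a minimal sketch of a temperature-scaled softmax. It assumes the conventional definition, in which logits are divided by the temperature before normalisation; the library's internal computation may differ in detail. The function name `softmax_with_temperature` and the example logits are hypothetical and not part of bio_embeddings.

```python
import math
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Turn raw scores into probabilities, dividing by `temperature` first.

    Hypothetical helper for illustration only; not part of bio_embeddings.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {aa: value / temperature for aa, value in logits.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    highest = max(scaled.values())
    exps = {aa: math.exp(value - highest) for aa, value in scaled.items()}
    total = sum(exps.values())
    return {aa: value / total for aa, value in exps.items()}


# Made-up logits for three residues at one masked position.
example_logits = {"A": 2.0, "G": 1.0, "L": 0.0}

sharp = softmax_with_temperature(example_logits, temperature=0.5)
default = softmax_with_temperature(example_logits, temperature=1.0)
flat = softmax_with_temperature(example_logits, temperature=2.0)
```

Under this definition, temperatures below 1 concentrate probability on the highest-scoring residue, and temperatures above 1 flatten the distribution.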

Returns

An ordered list, covering the region, of the probabilities for each of the 20 natural amino acids to be at each position.
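The following sketch shows one way to consume a return value of type List[Dict[str, float]]. It assumes that each dictionary maps one-letter amino acid codes to probabilities and that list entry i corresponds to sequence index start + i; any other keys are ignored. The helper `top_substitutions` and the sample data are hypothetical, and the dictionaries are truncated to a few residues for brevity.

```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids


def top_substitutions(
    sequence: str,
    probabilities: List[Dict[str, float]],
    start: int = 0,
) -> List[Tuple[int, str, str, float]]:
    """Return (index, wild_type, most_likely_residue, probability) per position.

    Hypothetical helper for illustration only; not part of bio_embeddings.
    """
    results = []
    for offset, position_probs in enumerate(probabilities):
        index = start + offset  # assumed mapping from list entry to sequence index
        # Keep only amino acid keys in case the dict carries extra fields.
        residue_probs = {aa: p for aa, p in position_probs.items() if aa in AMINO_ACIDS}
        best = max(residue_probs, key=residue_probs.get)
        results.append((index, sequence[index], best, residue_probs[best]))
    return results


# Made-up probabilities for the region start=1, stop=3 of a toy sequence.
sequence = "MKT"
probabilities = [
    {"K": 0.7, "R": 0.2, "M": 0.1},
    {"T": 0.6, "S": 0.4},
]

hits = top_substitutions(sequence, probabilities, start=1)
```

Positions where the most likely residue differs from the wild type are natural candidates for closer inspection.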

model: transformers.models.bert.modeling_bert.BertForMaskedLM
tokenizer: transformers.models.bert.tokenization_bert.BertTokenizer
bio_embeddings.mutagenesis.probabilities_as_dataframe(mapping_file: pandas.core.frame.DataFrame, probabilities_all: Dict[str, List[Dict[str, float]]], sequences: List[str]) pandas.core.frame.DataFrame[source]

Builds a DataFrame with all the data, suitable for writing to a CSV file.

bio_embeddings.mutagenesis.run(**kwargs)[source]

BETA: in-silico mutagenesis using BertForMaskedLM

optional (see extract stage for details):
  • model_directory

  • device

  • half_precision

  • half_precision_model

  • temperature: temperature for softmax