# Add a new language model/embedder * Pick a name, which should be the one you're using in the publication, and a lowercase version with underscores (snake_case). E.g. for one hot encoding, we use `one_hot_encoding`. The class name is the CamelCase version, in this case `OneHotEncodingEmbedder`. Stay consistent where you place the underscores. * Add all new dependencies in `pyproject.toml` in a new extra * Add an entry to `bio_embeddings/utilities/defaults.yml` with a link to the weights. * Create a new class in `bio_embeddings/embed` that at least implements `EmbedderInterface`, or even better (for GPU based models) `EmbedderWithFallback`. The most simple example is `OneHotEncodingEmbedder`, are more realistic example is `ProtTransT5Embedder` and its subclasses. If you add any new options, add them to `KNOWN_EMBED_OPTIONS` in `bio_embeddings/embed/pipeline.py` * Add the class in `bio_embeddings/embed/__init__.py` * The following two are checked by `SKIP_SLOW_TESTS=1 pytest`: * Add the model size in the docs of `bio_embeddings/embed/__init__.py` * Add it to `DEFAULT_MAX_AMINO_ACIDS` * Add it to the tests following the instructions in `tests/test_embedder_embedding.py` * Write a pipeline with your embedder, see that it works * Send a pull request 🚀