Development Setup

bio_embeddings uses poetry to manage dependencies.

  • Install poetry.

  • Run poetry config virtualenvs.in-project true. This will make sure all python dependencies will be in a folder called .venv (unless you’re using conda).

  • Clone the repository (git pull https://github.com/sacdallago/bio_embeddings)

  • Run poetry install -E all. This will create a new virtualenv, which you can activate with poetry shell or . .venv/bin/activate (use deactivate to get back to your normal environment). If you’re already in a conda environment, poetry will use that environment instead.

  • To check that the environment is active, open a python console and run import bio_embeddings

Tests

We use pytest to check our code, so can run the tests with pytest. Running them all is slow however and takes a lot of disk space, so you can use SKIP_SLOW_TESTS=1 pytest to only run a few fast tests.

Some tests that need RUN_VERY_SLOW_TESTS=1 to be run because they can take a couple of minutes each. FOr example you need RUN_VERY_SLOW_TESTS=1 pytest tests/conservation.py to run the test of the conservation predictor because it uses the large T5 language model.

To create a new test, either add a new function in an existing file under tests/, or create a new file starting with test_ in that folder. All functions inside a test_*.py file starting with test_ are run by pytest.

To get the project root as pathlib.Path, use pytestconfig.rootpath, where pytest will pass pytestconfig to your method. Here, we just check the number of entries in test-data/mapping_file.csv:

from bio_embeddings.utilities import read_mapping_file


def test_mapping_file_length(pytestconfig):
    mapping_file_path = str(
        pytestconfig.rootpath.joinpath("test-data/mapping_file.csv")
    )
    mapping_file = read_mapping_file(mapping_file_path)
    # Check that the mapping file actually has two rows with data
    assert len(mapping_file) == 2

Note that our CI machine doesn’t have a GPU, so the tests still need to pass without a GPU. For tests that need a GPU you can use the following:

import pytest
import torch


@pytest.mark.skipif(
    not torch.cuda.is_available(), reason="Can't test the GPU if there isn't any"
)
def test_my_feature():
    ...

Note that in CI, we skip some embedder tests marked SKIP_NEGLEGTED_EMBEDDER_TESTS for stale and barely used embedder.