Development Setup¶
bio_embeddings uses poetry to manage dependencies.
Install poetry.
Run
poetry config virtualenvs.in-project true
. This will make sure all python dependencies will be in a folder called.venv
(unless you’re using conda).Clone the repository (
git pull https://github.com/sacdallago/bio_embeddings
)Run
poetry install -E all
. This will create a new virtualenv, which you can activate withpoetry shell
or. .venv/bin/activate
(usedeactivate
to get back to your normal environment). If you’re already in a conda environment, poetry will use that environment instead.To check that the environment is active, open a python console and run
import bio_embeddings
Tests¶
We use pytest to check our code, so can run the tests with pytest
. Running them all is slow however and takes a lot of disk space, so you can use SKIP_SLOW_TESTS=1 pytest
to only run a few fast tests.
Some tests that need RUN_VERY_SLOW_TESTS=1
to be run because they can take a couple of minutes each. FOr example you need RUN_VERY_SLOW_TESTS=1 pytest tests/conservation.py
to run the test of the conservation predictor because it uses the large T5 language model.
To create a new test, either add a new function in an existing file under tests/
, or create a new file starting with test_
in that folder. All functions inside a test_*.py
file starting with test_
are run by pytest.
To get the project root as pathlib.Path, use pytestconfig.rootpath
, where pytest will pass pytestconfig
to your method. Here, we just check the number of entries in test-data/mapping_file.csv
:
from bio_embeddings.utilities import read_mapping_file
def test_mapping_file_length(pytestconfig):
mapping_file_path = str(
pytestconfig.rootpath.joinpath("test-data/mapping_file.csv")
)
mapping_file = read_mapping_file(mapping_file_path)
# Check that the mapping file actually has two rows with data
assert len(mapping_file) == 2
Note that our CI machine doesn’t have a GPU, so the tests still need to pass without a GPU. For tests that need a GPU you can use the following:
import pytest
import torch
@pytest.mark.skipif(
not torch.cuda.is_available(), reason="Can't test the GPU if there isn't any"
)
def test_my_feature():
...
Note that in CI, we skip some embedder tests marked SKIP_NEGLEGTED_EMBEDDER_TESTS
for stale and barely used embedder.