# Development Setup

bio_embeddings uses [poetry](https://github.com/python-poetry/poetry) to manage dependencies.

* Install [poetry](https://github.com/python-poetry/poetry#installation).
* Run `poetry config virtualenvs.in-project true`. This makes sure all Python dependencies are installed in a folder called `.venv` (unless you're using conda).
* Clone the repository (`git clone https://github.com/sacdallago/bio_embeddings`).
* Run `poetry install -E all`. This creates a new virtualenv, which you can activate with `poetry shell` or `. .venv/bin/activate` (use `deactivate` to get back to your normal environment). If you're already in a conda environment, poetry will use that environment instead.
* To check that the environment is active, open a Python console and run `import bio_embeddings`.

## Tests

We use [pytest](https://docs.pytest.org/) to check our code, so you can run the tests with `pytest`. Running them all is slow, however, and takes a lot of disk space, so you can use `SKIP_SLOW_TESTS=1 pytest` to run only a few fast tests. Some tests need `RUN_VERY_SLOW_TESTS=1` to run at all because they can take a couple of minutes each. For example, you need `RUN_VERY_SLOW_TESTS=1 pytest tests/conservation.py` to run the test of the conservation predictor because it uses the large T5 language model.

To create a new test, either add a new function to an existing file under `tests/`, or create a new file starting with `test_` in that folder. All functions starting with `test_` inside a `test_*.py` file are run by pytest.

To get the project root as a [pathlib.Path](https://docs.python.org/3/library/pathlib.html#basic-use), use `pytestconfig.rootpath`; pytest passes `pytestconfig` to any test function that declares it as a parameter.
Here, we just check the number of entries in `test-data/mapping_file.csv`:

```python
from bio_embeddings.utilities import read_mapping_file


def test_mapping_file_length(pytestconfig):
    mapping_file_path = str(
        pytestconfig.rootpath.joinpath("test-data/mapping_file.csv")
    )
    mapping_file = read_mapping_file(mapping_file_path)
    # Check that the mapping file actually has two rows with data
    assert len(mapping_file) == 2
```

Note that our CI machine doesn't have a GPU, so the tests still need to pass without one. For tests that need a GPU, you can use the following:

```python
import pytest
import torch


@pytest.mark.skipif(
    not torch.cuda.is_available(), reason="Can't test the GPU if there isn't any"
)
def test_my_feature():
    ...
```

Note that in CI, we skip some embedder tests marked `SKIP_NEGLEGTED_EMBEDDER_TESTS` for stale and barely used embedders.
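
If you add a slow test of your own, one way to tie it to the environment variables mentioned in this section is the same `pytest.mark.skipif` mechanism used for the GPU check. The following is only a minimal sketch: the helper `env_flag`, the marker names `skip_when_fast` and `very_slow`, and the test names are hypothetical and not taken from the bio_embeddings code base, so check the existing tests for the pattern the project actually uses.

```python
import os

import pytest


def env_flag(name: str) -> bool:
    """Return True if the environment variable `name` is set to a non-empty value.

    Hypothetical helper for illustration; not part of bio_embeddings.
    """
    return bool(os.environ.get(name))


# Hypothetical reusable markers built on pytest's real `skipif` marker.
# Skipped when SKIP_SLOW_TESTS is set, runs otherwise.
skip_when_fast = pytest.mark.skipif(
    env_flag("SKIP_SLOW_TESTS"), reason="SKIP_SLOW_TESTS is set"
)
# Skipped unless RUN_VERY_SLOW_TESTS is set.
very_slow = pytest.mark.skipif(
    not env_flag("RUN_VERY_SLOW_TESTS"),
    reason="Set RUN_VERY_SLOW_TESTS=1 to run this test",
)


@skip_when_fast
def test_something_slow():
    ...


@very_slow
def test_something_very_slow():
    ...
```

With this sketch, a plain `pytest` run executes `test_something_slow` and skips `test_something_very_slow`, while `SKIP_SLOW_TESTS=1 pytest` skips both.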