Notebooks
The notebooks in this folder can be executed locally on your machine or on Google Colab (a tool that allows you to run code online). If you run the Notebooks on your own machine, you can skip the “Colab Initialization” code, but you will have to download the files required by the notebook yourself. If you run the Notebooks on Colab, you have to execute the commands that install the pipeline and download the necessary files; these are the code blocks following the “Colab Initialization” header.
Preface
Drawing on experience within our lab and with collaborators, we have created a set of Notebooks that address the tasks most commonly needed. The Notebooks presented here are meant to be “building blocks” for your exploratory projects! We’ve tried to keep the notebooks short and to the point. Often, you will need to grab one piece from here and another from there.
From the manuscript
Purpose | Colab | Notebook |
---|---|---|
Basic Protocol 2 and alternates: use DeepLoc embeddings produced by the pipeline to plot sequence spaces. This is virtually the same as this pipeline example, but here we can tune the UMAP parameters until we obtain a nice graphic to put in a presentation :) | | |
Basic Protocol 3: train a simple machine learning classifier to predict subcellular localizations by training on DeepLoc embeddings. | | |
Exploring modules from the bio_embeddings package
Purpose | Colab | Notebook |
---|---|---|
Use the general purpose embedding objects to embed a sequence passed as string | | |
Use | | |
Embed a sequence and extract annotations using supervised models from | | |
Embed a sequence and transfer GO annotations using unsupervised techniques found in | | |
Proper use cases from collaborations
Purpose | Colab | Notebook |
---|---|---|
Embed a few sequences and try out different ideas to see if the embeddings are able to cluster different sequences | | |
Exploring pipeline output files
Purpose | Colab | Notebook |
---|---|---|
Open an embedding file, the principal output of a pipeline run | | |
Use embeddings produced by the pipeline to plot sequence spaces. This is virtually the same as using the | | |
Use DeepLoc embeddings produced by the pipeline to plot sequence spaces. This is virtually the same as this pipeline example, but here we can tune the UMAP parameters until we obtain a nice graphic to put in a presentation :) | | |
Analyze embedding sets by studying their similarity and transferring annotations | | |
Naïvely plot embeddings to distinguish patterns in your embedded sequences | | |
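
To give a rough feel for what “studying similarity and transferring annotations” can mean, here is a minimal sketch in plain Python. It is not taken from the notebooks: the function names `cosine_similarity` and `transfer_annotation`, the sequence identifiers, the three-dimensional vectors, and the labels are all made up for illustration, and real embeddings have far more dimensions. The idea is simply to give a query sequence the annotation of its most similar reference embedding.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def transfer_annotation(query, references, annotations):
    """Return (id, annotation) of the reference embedding most similar to the query."""
    best_id = max(
        references,
        key=lambda ref_id: cosine_similarity(query, references[ref_id]),
    )
    return best_id, annotations[best_id]


# Toy data, invented for this sketch only.
reference_embeddings = {
    "seq_a": [1.0, 0.0, 0.0],
    "seq_b": [0.0, 1.0, 0.0],
}
reference_annotations = {"seq_a": "Nucleus", "seq_b": "Cytoplasm"}
query_embedding = [0.9, 0.1, 0.0]

nearest_id, transferred = transfer_annotation(
    query_embedding, reference_embeddings, reference_annotations
)
print(nearest_id, transferred)
```

The notebooks may use a different distance measure or consider more than one neighbour; treat this only as a starting point for your own experiments.
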
Advanced use cases
Purpose | Colab | Notebook |
---|---|---|
Train a simple machine learning classifier to predict subcellular localizations by training on DeepLoc embeddings. | | |
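
As a hedged illustration of what a “simple machine learning classifier” on embeddings might look like, the sketch below implements a nearest-centroid classifier in plain Python. This is an assumption chosen for brevity, not the model used in the notebook; the helper names `fit_centroids` and `predict`, the two-dimensional vectors, and the labels are invented toy values.

```python
from collections import defaultdict


def fit_centroids(embeddings, labels):
    """Average the embeddings of each class to obtain one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for vector, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vector):
    """Predict the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((c - v) ** 2 for c, v in zip(centroid, vector))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Toy training set, invented for this sketch only.
train_embeddings = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
train_labels = ["Nucleus", "Nucleus", "Membrane", "Membrane"]

centroids = fit_centroids(train_embeddings, train_labels)
prediction = predict(centroids, [0.8, 0.3])
print(prediction)
```

With real data you would hold out a test set and most likely reach for an established library; the point here is only that embeddings are fixed-length vectors that any standard classifier can consume.
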
Utilities
Purpose | Colab | Notebook |
---|---|---|
Re-index an | | |
Remove identifiers from an annotation file, useful when the pipeline suggests that you do so! :) | | |