PHAGES2050 - a Comprehensive AI-based Framework for Phage Research & Therapy

"Keep calm, use AI for phages and stop AMR"

PHAGES2050 is a Python 3.8+ framework that applies Natural Language Processing (NLP) and Deep Learning to bacteriophage research, therapy and supporting infrastructure, with the goal of fighting antimicrobial-resistant bacteria.

Our project is about developing an AI-based framework for microbiologists and bioinformaticians who hunt, explore and classify phages. Applying the framework will shorten the computational work required to match phages with bacteria for specific patient cases. Having such an organised, freely available framework at hand will help develop personalized phage therapy and make it accessible to people worldwide.

Watch the PHAVES #3 talk to learn more.

Table of Contents

Framework modules | Usage | Documentation | Installation | Community and Contributions | Have a question? | Found a bug? | Team | Change log | Code of Conduct | License

Framework modules

crawlers - functions for scraping bacteriophage data from different sources (MillardLab, NCBI)
features - functions for extracting nucleotide and protein features for Machine Learning classification and deeper analysis
embeddings - pre-trained embedding models for nucleotide and protein vectorization
classifiers - pre-trained Machine Learning models dedicated to bacteriophage research
explore - 2D and 3D data visualization techniques for deeper bacteriophage exploration
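
To make the idea of nucleotide feature extraction concrete, here is a minimal standard-library sketch that computes GC content and normalized k-mer frequencies for a DNA sequence. The function names `gc_content` and `kmer_frequencies` are hypothetical and are not taken from the PHAGES2050 API; the features module may compute different or additional features.

```python
from collections import Counter
from itertools import product
from typing import Dict


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def kmer_frequencies(sequence: str, k: int = 2) -> Dict[str, float]:
    """Return normalized frequencies for every possible A/C/G/T k-mer.

    Windows containing other symbols (for example N) are ignored.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Fix the feature order so every sequence yields a vector of equal length.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}
```

Because every sequence maps to the same ordered set of 4^k k-mers, the resulting dictionaries can be turned into fixed-length vectors that a classifier can consume.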

Usage

The repository includes numerous examples of using the framework in Jupyter Notebook format (*.ipynb). Those most requested by the community are listed below:

Crawlers
  • MillardLab bacteriophage crawler
  • NCBI bacteriophage crawlers (planned):
    • taxonomy, host and other expected metadata
    • complete genome sequences in FASTA format
    • sets of genes and proteins in FASTA format
Embeddings
Classifiers
Explore
  • Bacteriophages in 3D space based on:
    • DNA embedding (planned)
    • protein embedding (planned)
    • biological and biochemical features (planned)
    • custom user features (planned)
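
The crawler items above mention genome, gene and protein sequences in FASTA format. As a rough illustration of what consuming such output could look like, here is a minimal standard-library FASTA reader. The function `read_fasta` and the example records are hypothetical and are not part of PHAGES2050; a production workflow would more likely rely on an established parser.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence."""
    records: Dict[str, List[str]] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may wrap, so collect them per header.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up records used only to demonstrate the function.
example = """>phage_A hypothetical record
ATGCGT
TTAG
>phage_B hypothetical record
GGCC
"""
sequences = read_fasta(example.splitlines())
```

Passing an open file handle instead of `example.splitlines()` works the same way, since a file object also iterates over lines.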

Documentation

The official documentation is hosted on ReadTheDocs: https://phages2050.readthedocs.io

Installation

PHAGES2050 can be installed by running:

pip install phages2050

PHAGES2050 requires Python 3.8.0 or newer. You can also use Conda:

conda install -c conda-forge phages2050

Install from GitHub

If you can't wait for the latest hotness and want to install from GitHub, use:

pip install git+https://github.com/ptynecki/PHAGES2050

Protein embeddings

If you want to use the bacteriophage protein vectorizers, remember to install the extra packages for protein embedding:

pip install -U "bio-embeddings[all] @ git+https://github.com/sacdallago/bio_embeddings.git"
pip install git+https://github.com/facebookresearch/esm.git

Community and Contributions

We are happy to see you willing to make PHAGES2050 better. Development on the latest stable version of Python 3 is preferred; as of this writing, that is 3.8. You can use any operating system.

If you're fixing a bug or adding a new feature, add a test with pytest and check the code with Black and mypy. Before adding any large feature, first open an issue so the idea can be discussed with the core developers and the community.
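
As an illustration of the kind of test a contribution might include, here is a small pytest-style sketch. The helper `reverse_complement` and the file name are hypothetical and do not come from the PHAGES2050 code base; in a real pull request the test would import the function under test from the package instead of defining it inline.

```python
# test_reverse_complement.py (hypothetical file name)

# Translation table that maps each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence (illustrative helper)."""
    return sequence.translate(_COMPLEMENT)[::-1]


def test_reverse_complement_simple():
    assert reverse_complement("ATGC") == "GCAT"


def test_reverse_complement_is_involution():
    # Applying the operation twice should give back the original sequence.
    sequence = "ATGCCGTA"
    assert reverse_complement(reverse_complement(sequence)) == sequence
```

pytest collects functions whose names start with `test_`, so running `pytest` in the directory containing this file would execute both checks. Black and mypy can then be run over the same file before opening the pull request.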

Have a question?

If you have a private question or want to cooperate with us, you can reach us directly via our Phage Directory Slack (channel #PHAGES2050).

Found a bug?

Feel free to open a new issue with a clear title and description in the PHAGES2050 repository. If you have already found a solution to your problem, we would be happy to review your pull request.

Team

Core Developers, Domain Experts, Community Managers and Educators who contribute to PHAGES2050:

  • Piotr Tynecki
  • Yana Minina
  • Iwona Świętochowska
  • Przemysław Mitura
  • Joanna Kazimierczak
  • Arkadiusz Guziński
  • Bogusław Zimnoch
  • Jessica Sacher, PhD
  • Shawna McCallin, PhD
  • Marie-Agnes Petit, PhD
  • Jan Zheng

Change log

The change log will become rather long, so it has been moved to its own file.

See CHANGELOG.md.

Code of Conduct

Everyone interacting in the PHAGES2050 project's development, issue trackers and Slack discussion is expected to follow the Code of Conduct.

License

The PHAGES2050 package and pre-trained models are released under the terms of the MIT License.