/sgpy

Python package for creating and interacting with graph databases of protein domains and their genome coordinates

Note: This project is pre-alpha. It is under heavy development, breaking changes are expected, and the repo may be torn down and rebuilt at any time. Extensive documentation will be made available once the package is ready for general use.

Documentation can be found here: https://socialgene.github.io

Design

The code is organized under a number of submodules/directories:

  • base: core functions of the library
  • cli: all command line interface code
  • clustermap: converts a SocialGene object to clustermap JSON
  • findmybgc
  • hashing
  • hmm: code for working with HMMER
  • neo4j: code for working with SocialGene Neo4j databases
  • parsers: external file parsers (e.g. GenBank, FASTA, HMMER results, etc.)
  • scoring: functions for measuring protein similarity
  • taxonomy
  • utils
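
To see how this layout looks in an installed copy, the following is a minimal sketch that lists the direct submodules of a package using only the standard library. It assumes the package is importable under the name `socialgene`; the helper name `list_submodules` is hypothetical and not part of the library.

```python
import importlib
import pkgutil


def list_submodules(package_name: str) -> list[str]:
    """Return the sorted names of the direct submodules of a package."""
    package = importlib.import_module(package_name)
    # Only packages (not plain modules) expose a __path__ to search.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(info.name for info in pkgutil.iter_modules(search_path))


if __name__ == "__main__":
    # Assumption: the installed package is importable as "socialgene".
    try:
        print(list_submodules("socialgene"))
    except ImportError:
        print("socialgene is not installed in this environment")
```

If the import name differs in your environment, pass that name instead.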

Installation with pip

https://pypi.org/project/socialgene

pip install socialgene
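
After installing, one way to confirm that pip registered the package is to query its metadata. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `socialgene` is taken from the PyPI link above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("socialgene")
    if found is None:
        print("socialgene is not installed in this environment")
    else:
        print(f"socialgene {found} is installed")
```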

Create a conda environment and install the Python package inside it

git clone https://github.com/socialgene/sgpy.git
cd sgpy
make create_conda

Build Python package from source

git clone https://github.com/socialgene/sgpy.git
cd sgpy
make install_python

Build local Docker image

git clone https://github.com/socialgene/sgpy.git
cd sgpy
make build_docker_image

Run pytest tests

git clone https://github.com/socialgene/sgpy.git
cd sgpy
make create_conda
make pytest

Run all tests

git clone https://github.com/socialgene/sgpy.git
cd sgpy
make create_conda
make run_ci

User-facing classes

SocialGene()

This is the main class that most other user-facing classes inherit from (or should inherit from).

FindMyBGC()

SingleProteinSearch()

Common example use cases

Starting with a single input protein and

Starting with a set of proteins (BGC) and

Other

Most of the classes that describe the structure of SocialGene() (e.g. proteins, domains, loci) live in socialgene/src/socialgene/classes/molbio.py