This toolset is being developed for the Build a Cell project to interpret dependencies between genes and to enable unit testing of gene function in both existing and forward-engineered genomes. The near-term goal is to mine prokaryotic genome information from curated databases such as EcoCyc and KEGG and to provide ranked lists of candidate protospacers for CRISPRi knockdown (for unit testing). The current version interfaces with EcoCyc and extracts E. coli annotations to generate an output spreadsheet consisting of:

- Promoter ID
- Promoter orientation
- Sequence surrounding promoters (specifiable)
- Genes in each operon (one operon per promoter)
- Transcription unit ID (one or more transcription units per promoter)
- Positive and negative strand protospacer candidates targeting the sequence surrounding promoters

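As a rough illustration of the spreadsheet layout listed above, the sketch below models one row per promoter and writes rows out as CSV. The field names, the `;` separator for multi-valued cells, and the helper names `PromoterRecord`, `write_records`, and `records_to_csv_text` are all assumptions made for this example; the tool's actual column headers and file format may differ.

```python
import csv
import io
from dataclasses import dataclass, field, fields


@dataclass
class PromoterRecord:
    """One spreadsheet row. Field names are illustrative, not the tool's real headers."""
    promoter_id: str
    orientation: str                      # "+" or "-"
    flanking_sequence: str                # sequence surrounding the promoter
    operon_genes: list = field(default_factory=list)
    transcription_unit_ids: list = field(default_factory=list)
    plus_strand_protospacers: list = field(default_factory=list)
    minus_strand_protospacers: list = field(default_factory=list)


# Column order follows the dataclass field order.
COLUMNS = [f.name for f in fields(PromoterRecord)]


def write_records(records, handle):
    """Write a header plus one CSV row per record; list fields are joined with ';'."""
    writer = csv.writer(handle)
    writer.writerow(COLUMNS)
    for record in records:
        row = []
        for name in COLUMNS:
            value = getattr(record, name)
            row.append(";".join(value) if isinstance(value, list) else value)
        writer.writerow(row)


def records_to_csv_text(records):
    """Convenience wrapper that returns the CSV as a string."""
    buffer = io.StringIO()
    write_records(records, buffer)
    return buffer.getvalue()
```

Calling `records_to_csv_text([PromoterRecord("PM-EXAMPLE", "+", "ACGT", ["geneA", "geneB"])])` yields a header line followed by one row, where `PM-EXAMPLE`, `geneA`, and `geneB` are placeholders rather than real EcoCyc identifiers.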
## Current statistics

- 2152 annotated genes from E. coli integrated (out of an expected 4377 in E. coli K-12)
- 3841 promoters (operons) from E. coli
- 7 protospacers identified per promoter region on average (searching from 60 bp upstream to 60 bp downstream of +1)

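The search window described above (60 bp upstream to 60 bp downstream of +1) can be sketched as follows. This is a minimal illustration, not the project's implementation. It assumes an NGG PAM (as used by S. pyogenes dCas9), a 20-nt protospacer, a 0-based index for the +1 base on the plus strand, and a linear sequence (wrap-around on a circular chromosome is ignored). The function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_window(genome, tss_index, orientation, upstream=60, downstream=60):
    """Return the region around +1, read 5'->3' on the promoter's own strand.

    tss_index is the 0-based position of the +1 base on the plus strand.
    The window is clipped at the sequence ends instead of wrapping around.
    """
    if orientation == "+":
        start = max(0, tss_index - upstream)
        end = min(len(genome), tss_index + downstream + 1)
        return genome[start:end]
    # For a minus-strand promoter, "upstream" lies at higher plus-strand coordinates.
    start = max(0, tss_index - downstream)
    end = min(len(genome), tss_index + upstream + 1)
    return reverse_complement(genome[start:end])


def find_protospacers(seq, length=20):
    """Return (start, protospacer) for every length-mer immediately followed by NGG."""
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % length)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


def protospacers_both_strands(window):
    """Scan the window and its reverse complement.

    Start positions for "-" are relative to the reverse-complemented window.
    """
    return {
        "+": find_protospacers(window),
        "-": find_protospacers(reverse_complement(window)),
    }
```

With the default arguments the window is 121 bases long (60 upstream, the +1 base, and 60 downstream). Whether the tool counts the +1 base this way is an assumption. Ranking the candidates, which the project aims to provide, is not attempted here.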
Simply download OrganismBreadBoard.09-a4 from https://github.com/EndyLab/Gene-Mining-Scripts.git or clone the project:

    git clone https://github.com/EndyLab/Gene-Mining-Scripts.git

To contribute:

- Fork it!
- Create your feature branch: `git checkout -b my-new-feature`
- Commit your changes: `git commit -am 'Add some feature'`
- Push to the branch: `git push origin my-new-feature`
- Submit a pull request!

## Contact

Email us at atg@buildacell.io! Website: https://www.buildacell.io