A way to extract specific information from the Carbohydrate-Active enZYmes (CAZy) database.
License: GNU GPLv3
If you use this tool, please read and cite the paper!
doi: 10.21105/joss.00053
Also make sure to visit and cite the CAZy website:
- http://www.cazy.org/
- Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The Carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:D490–D495. [PMID: 24270786].
cazy-parser is a tool that extracts information from CAZy in a more usable and readable format. First, a script reads the HTML structure of the website and creates a mirror of the database as a tab-delimited file. Second, information is extracted from that local copy according to user-supplied parameters and presented to the user as a set of accession codes.
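As a rough illustration of the first step, the sketch below turns the rows of an HTML table into tab-delimited text using only the Python standard library. It is not the parser shipped with cazy-parser: the `TableRowCollector` class, the HTML fragment, and the accession codes in it are all made up for the example, and the real CAZy pages are more complex than this.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row being built, or None when outside a <tr>
        self._cell = None   # text pieces of the current cell, or None outside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (for example <a>) still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


# Made-up fragment standing in for a downloaded page; not real CAZy markup.
SAMPLE_HTML = """
<table>
  <tr><td>ProteinA</td><td>Escherichia coli</td><td><a href="#">ABC12345.1</a></td></tr>
  <tr><td>ProteinB</td><td>Bacillus subtilis</td><td><a href="#">XYZ67890.1</a></td></tr>
</table>
"""

parser = TableRowCollector()
parser.feed(SAMPLE_HTML)
parser.close()

# Write the collected rows as tab-delimited text (to memory here, not to disk).
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(parser.rows)
table_text = buffer.getvalue()
print(table_text)
```

Keeping the table as a local file means the website only has to be read once; every later query can run against the local copy.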
$ pip install cazy-parser
or
Download the latest source from this link
$ tar -zxvf cazy-parser-x.x.x.tar.gz
$ cd cazy-parser-x.x.x
$ python setup.py install
Please note that both steps require an internet connection.
- Database creation
$ create_cazy_db
(-h for help)
- This script will parse the CAZy database website and create a comma-separated table containing the following information:
- Extract sequences
- Based on the previously generated CSV table, extract the accession codes for a given protein family.
$ extract_cazy_ids --db <database> --family <family code>
(-h for help)
- Optional:
--subfamilies
Create a file for each subfamily, default = False
--characterized
Create a file containing only characterized enzymes, default = False
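To make the effect of the two optional flags concrete, here is a minimal sketch of the kind of filtering they imply. It is an assumption-laden illustration, not the code behind `extract_cazy_ids`: the column names (`family`, `subfamily`, `tag`, `accession`), the sample rows, and the `extract_ids` function are hypothetical, and the real database table may be laid out differently. The group names simply mirror the output file names shown in the examples.

```python
import csv
import io
from collections import defaultdict

# Made-up table; the column names are assumptions for this sketch only.
SAMPLE_DB = (
    "family,subfamily,tag,accession\n"
    "GH43,1,characterized,AAA00001.1\n"
    "GH43,2,,AAA00002.1\n"
    "GH43,,,AAA00003.1\n"
    "GT9,,,BBB00001.1\n"
)


def extract_ids(handle, family, subfamilies=False, characterized=False, delimiter=","):
    """Return {group name: [accession codes]} for one family (hypothetical helper)."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=delimiter):
        if row["family"] != family:
            continue
        # Every entry of the family goes into the main group.
        groups[family].append(row["accession"])
        # With subfamilies enabled, entries are also grouped per subfamily.
        if subfamilies and row["subfamily"]:
            groups[f"{family}_sub{row['subfamily']}"].append(row["accession"])
        # With characterized enabled, characterized entries get their own group.
        if characterized and row["tag"] == "characterized":
            groups[f"{family}_characterized"].append(row["accession"])
    return dict(groups)


result = extract_ids(io.StringIO(SAMPLE_DB), "GH43", subfamilies=True, characterized=True)
for name, ids in result.items():
    print(name, ids)
```

Both flags default to `False` in the sketch, matching the defaults listed above, so a plain call returns a single group for the whole family.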
- Extract all accession codes from family 9 of Glycosyl Transferases.
$ extract_cazy_ids --db CAZy_DB_xx-xx-xxxx.csv --family GT9
This will generate the following files:
GT9.csv
- Extract all accession codes from family 43 of Glycoside Hydrolases, including subfamilies.
$ extract_cazy_ids --db CAZy_DB_xx-xx-xxxx.csv --family GH43 --subfamilies
This will generate the following files:
GH43.csv
GH43_sub1.csv
GH43_sub2.csv
GH43_sub3.csv
(...)
GH43_sub37.csv
- Extract all accession codes from family 42 of Polysaccharide Lyases, including characterized entries.
$ extract_cazy_ids --db CAZy_DB_xx-xx-xxxx.csv --family PL42 --characterized
This will generate the following files:
PL42.fasta
PL42_characterized.fasta
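When several families are needed, the commands above can be assembled from a script. The sketch below only builds and prints the command lines, using the flags documented in this README; the `build_extract_command` helper is hypothetical, and the database file name is the same placeholder used in the examples. The actual call is left commented out because it requires cazy-parser to be installed and a real database file.

```python
import shlex


def build_extract_command(db, family, subfamilies=False, characterized=False):
    """Build the argument list for extract_cazy_ids (hypothetical helper)."""
    cmd = ["extract_cazy_ids", "--db", db, "--family", family]
    if subfamilies:
        cmd.append("--subfamilies")
    if characterized:
        cmd.append("--characterized")
    return cmd


DB_FILE = "CAZy_DB_xx-xx-xxxx.csv"  # placeholder; substitute your own database file

for family in ("GT9", "GH43", "PL42"):
    cmd = build_extract_command(DB_FILE, family, subfamilies=(family == "GH43"))
    print(shlex.join(cmd))
    # To actually run it, import subprocess and uncomment:
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a file name contains spaces.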
Please refer to CONTRIBUTE.md
None, yet.
If you have any inquiries, please contact me at rvhonorato at gmail.com