- Operating system: Linux or Mac
- Python: to check your Python version, open a terminal and type "python2 -V".
- Linux commands: cat, awk, mkdir, echo, grep, sed, cut, sort, and uniq, plus a shell that supports "for" loops.
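A quick way to confirm these prerequisites is to ask the shell where each command lives. The sketch below is not part of ICARES. It relies only on the POSIX `command -v` builtin, and the function name `check_commands` is hypothetical:

```shell
#!/bin/sh
# Print each named command that cannot be found on PATH; return 1 if any are missing.
check_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}

# "for" is a shell keyword, not a separate program, so it is not in the list.
check_commands python2 cat awk mkdir echo grep sed cut sort uniq &&
    echo "all prerequisites found"
```

Any line starting with "missing:" names a program that must be installed before running ICARES.sh.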
Usage:
./ICARES.sh [SAMPLE_LIST] [GENE_REGION] [MEF_FILE] [WORK_SPACE] [Conserved?(Yes/No)]
(When the "Conserved" option is set to "Yes", multi-sample support for sites is disabled.)
Three files must be prepared before running ICARES.
The first file is a list of the "10-column" pileup files, one path per line. The pileup files are generated by Samtools (v0.1.16) for all samples and then filtered with the Galaxy pileup parser (pileup_parser.pl) using the parameter settings 3 9 10 8 20 1 "Yes" "Yes" 2 "Yes" "Yes". For example:
/path/to/sample_A.pileup
/path/to/sample_B.pileup
/path/to/sample_C.pileup
...
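If all filtered pileup files sit in one directory, the list can be generated rather than typed by hand. The sketch below is a hypothetical helper, not part of ICARES. It assumes the files end in ".pileup" and that the filtered pileup is tab-separated. Passing an absolute directory gives absolute paths in the list:

```shell
#!/bin/sh
# Write every *.pileup file found in a directory to a sample list, one path per line.
make_sample_list() {
    dir=$1
    out=$2
    : > "$out"
    for f in "$dir"/*.pileup; do
        [ -e "$f" ] || continue    # skip the literal pattern when nothing matches
        printf '%s\n' "$f" >> "$out"
    done
}

# For each file named in a sample list, report the first line that does not have
# exactly 10 tab-separated columns (tab delimiting is an assumption).
check_pileup_columns() {
    while IFS= read -r f; do
        [ -n "$f" ] || continue
        awk -F'\t' 'NF != 10 { printf "%s: line %d has %d columns\n", FILENAME, FNR, NF; exit }' "$f"
    done < "$1"
}
```

For example, `make_sample_list /path/to sample.list && check_pileup_columns sample.list` writes the list and prints nothing if every file has 10 columns.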
The second file describes the genic regions, one region per line, in five columns: chromosome, start, end, strand, and Ensembl gene ID. For example:
1 10007376 10007694 + ENSG00000202415
1 100111499 100160097 + ENSG00000099260
1 100163798 100164734 + ENSG00000223656
1 100174259 100232187 - ENSG00000156869
1 100250296 100250441 - ENSG00000201491
...
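Before a run, the genic-region file can be checked against the five-column layout shown above. This is a hypothetical sanity check, not something ICARES itself performs. It splits on any whitespace, so it accepts either tabs or spaces, and it flags lines with the wrong field count, non-integer coordinates, start greater than end, or a strand other than "+" or "-":

```shell
#!/bin/sh
# Print lines of a genic-region file that do not match the
# "chromosome start end strand gene_id" layout.
check_genic_region() {
    awk '
        NF != 5                              { print FNR ": expected 5 fields, found " NF; next }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print FNR ": start/end must be integers"; next }
        $2 + 0 > $3 + 0                      { print FNR ": start is greater than end"; next }
        $4 != "+" && $4 != "-"               { print FNR ": strand must be + or -" }
    ' "$1"
}
```

Running `check_genic_region human_ensembl_75.genic_region` prints nothing when every line is well formed.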
The genic-region data can be downloaded from Ensembl BioMart.
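A BioMart export usually needs some rearranging to match the layout above. The sketch below assumes, hypothetically, a tab-separated export with one header line and the attributes selected in this order: gene stable ID, chromosome name, gene start, gene end, and strand, with strand written as 1 or -1. It also assumes tab-separated output is acceptable to ICARES. Adjust the field numbers if your export differs:

```shell
#!/bin/sh
# Convert a BioMart TSV export (columns: gene ID, chromosome, start, end, strand as 1/-1,
# plus one header line) into "chromosome start end strand gene_id" lines.
biomart_to_genic_region() {
    awk -F'\t' 'NR > 1 {
        if ($5 == 1) { strand = "+" }
        else if ($5 == -1) { strand = "-" }
        else { next }    # skip rows with an unexpected strand value
        print $2 "\t" $3 "\t" $4 "\t" strand "\t" $1
    }' "$1"
}
```

For example, `biomart_to_genic_region mart_export.txt > human_ensembl_75.genic_region`, where "mart_export.txt" stands for whatever name the downloaded export has.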
The third file is the mapping error set (MEF_FILE), which can be downloaded from our FTP site: ftp://treeslab1.genomics.sinica.edu.tw/ICARES/MEF/
Once the three files are prepared (e.g., "sample.list", "human_ensembl_75.genic_region", and "hsa.MEF"), run ICARES with one of the following commands, using "Output" as the work space.
To identify clustering sites across samples:
> ./ICARES.sh sample.list human_ensembl_75.genic_region hsa.MEF Output "No"
This mode requires at least two samples from the same species.
To identify conserved sites across several species:
> ./ICARES.sh sample.list human_ensembl_75.genic_region hsa.MEF Output "Yes"
This mode requires at least one sample from the species being analyzed and disables multi-sample support for sites. To compare species, take the list of editing sites from each final result and convert the coordinates to a single reference species before comparing the sites.
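The sample-count requirements of the two modes can be checked before launching a long run. The following is a hypothetical pre-flight helper, not part of ICARES. It verifies that the three input files and every listed pileup are readable, and that the sample list has at least two entries for "No" and at least one for "Yes":

```shell
#!/bin/sh
# Usage: check_icares_inputs SAMPLE_LIST GENE_REGION MEF_FILE Yes|No
check_icares_inputs() {
    sample_list=$1 gene_region=$2 mef=$3 conserved=$4
    for f in "$sample_list" "$gene_region" "$mef"; do
        [ -r "$f" ] || { echo "cannot read: $f" >&2; return 1; }
    done
    n=$(grep -c . "$sample_list" || true)    # number of non-empty lines
    case $conserved in
        No)  min=2 ;;    # clustering sites across samples
        Yes) min=1 ;;    # conserved sites across species
        *)   echo "Conserved option must be Yes or No" >&2; return 1 ;;
    esac
    if [ "$n" -lt "$min" ]; then
        echo "need at least $min sample(s), found $n" >&2
        return 1
    fi
    while IFS= read -r p; do
        [ -z "$p" ] || [ -r "$p" ] || { echo "cannot read pileup: $p" >&2; return 1; }
    done < "$sample_list"
}
```

For example: `check_icares_inputs sample.list human_ensembl_75.genic_region hsa.MEF No && ./ICARES.sh sample.list human_ensembl_75.genic_region hsa.MEF Output "No"`.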
The results are reported in two files: "all_MEFok.sort.all.clustering.re_clustering.nmm" contains the final candidate editing sites, and "nmm.log" is a log file reporting the number of candidates of each type.
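A short summary of a finished run can be printed from these two files. This sketch is hypothetical and rests on two assumptions not stated above: that both files are written to the WORK_SPACE directory (e.g., "Output"), and that the candidate file holds one site per line. The function name `summarize_icares_run` is illustrative only:

```shell
#!/bin/sh
# Print the number of lines in the final candidate file, then the contents of nmm.log.
summarize_icares_run() {
    ws=$1
    result="$ws/all_MEFok.sort.all.clustering.re_clustering.nmm"
    log="$ws/nmm.log"
    for f in "$result" "$log"; do
        [ -r "$f" ] || { echo "cannot read: $f" >&2; return 1; }
    done
    echo "candidate lines: $(wc -l < "$result" | tr -d ' ')"
    echo "--- nmm.log ---"
    cat "$log"
}
```

For example, `summarize_icares_run Output` after either of the commands above.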