How to acquire the KEGG KO information file?
Here is an enquiry email from a current user, which might help others with similar questions.
Hi
I tested one file and generated ##.species.txt with HSDFinder.py.
How do I generate ##.species_ko.txt?
Creating the heatmap?
Hi, glad you have worked it out. The KEGG KO file is obtained from the KEGG KO BLAST engine, GhostKOALA (https://www.kegg.jp/ghostkoala/), which is pretty straightforward to use. Simply submit the protein data along with an email address, and you will receive the KO file. I also detail and demonstrate the steps in the HSDFinder tutorial; please see Step 6 at this link: https://github.com/zx0223winner/HSDFinder/blob/master/Tutorial/Tutorial%20for%20HSDFinder.pdf
The KO file looks like this:
g10.t1 K07566
g11.t1
g12.t1
g13.t1
g14.t1
g15.t1 K09481
g16.t1 K00472
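For anyone who wants to check a KO file before using it, here is a minimal Python sketch that reads the two-column layout shown above. It assumes the columns are separated by whitespace and that a line with only a gene ID means no KO was assigned; the function name `parse_ko_lines` is hypothetical and not part of HSDFinder.

```python
def parse_ko_lines(lines):
    """Map each gene ID to its KO number, or None when no KO is listed."""
    gene_to_ko = {}
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        gene = fields[0]
        # Assumption: a missing second column means "no KO assigned".
        gene_to_ko[gene] = fields[1] if len(fields) > 1 else None
    return gene_to_ko


# The sample lines from this thread, used here as in-memory test data.
sample = """g10.t1\tK07566
g11.t1
g12.t1
g13.t1
g14.t1
g15.t1\tK09481
g16.t1\tK00472
"""

gene_to_ko = parse_ko_lines(sample.splitlines())
annotated = {g: k for g, k in gene_to_ko.items() if k is not None}
print(len(gene_to_ko), "genes,", len(annotated), "with a KO assignment")
# prints: 7 genes, 3 with a KO assignment
```

To run it on a real file, pass an open file handle instead of `sample.splitlines()`, for example `parse_ko_lines(open("##.species_ko.txt"))` with the actual file name substituted.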
Once you have the KO file, you can plot a heatmap that compares either HSDs detected at different thresholds within one species or HSDs from different species (provided you have the respective HSD result file and KO file for each species). Examples are attached.
~Xi
Can't we do this with a command-line version of KEGG? I have at least 50,000 protein sequences in each genome. Is there no access to a command-line version of KEGG KO BLAST to speed up the process?
Did you mean the heatmap or the KO file? As far as I know, the KO file can only be obtained from KEGG. If you are worried about the speed of the online heatmap option on the HSDFinder web server, it is actually not that slow if you try it (e.g., about 10 minutes for the human genome). I can send you the heatmap script, but you would need to be comfortable working in a command-line environment. ~Xi
I can only submit one job at a time to KEGG KO BLAST. It is very slow, perhaps because of the large data set.
You can submit KEGG jobs under different email addresses. It is slow, but definitely worth it, since the KO file is a necessary input. I have not found an easier way to do it so far.
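If you do go the route of several submissions, one possible way to prepare them is to split the protein FASTA file into smaller pieces first. The sketch below is only an illustration: the helper `split_fasta` and the chunk size are hypothetical, and it assumes that the KO assignment for a sequence does not depend on which other sequences are in the same upload, which has not been confirmed in this thread.

```python
def split_fasta(lines, records_per_chunk):
    """Yield lists of FASTA lines, each with at most records_per_chunk records."""
    chunk, count = [], 0
    for line in lines:
        if line.startswith(">"):
            if count == records_per_chunk:
                # Current chunk is full: hand it over and start a new one.
                yield chunk
                chunk, count = [], 0
            count += 1
        chunk.append(line)
    if chunk:
        yield chunk  # last, possibly smaller, chunk


# Tiny in-memory example with made-up sequences.
fasta = """>g1.t1
MKT
>g2.t1
MAA
LLV
>g3.t1
MGG
>g4.t1
MSS
>g5.t1
MPP
""".splitlines(keepends=True)

chunks = list(split_fasta(fasta, records_per_chunk=2))
for i, chunk in enumerate(chunks, start=1):
    n_records = sum(1 for line in chunk if line.startswith(">"))
    print(f"chunk {i}: {n_records} records")
```

Each chunk can then be written to its own file with `"".join(chunk)` and uploaded separately. The returned KO files can simply be concatenated afterwards, since each line describes a single gene.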