eead-csic-compbio/get_homologues

Issue with Input files not being detected by Get_homologues

apoorva004 opened this issue · 6 comments

Hi, I have been trying to run a comparative genome analysis on 49 genomes of interest. All the files are in GenBank format. Running the command on the sample folder produces output, but running the same command on my input folder gives an error.
Here is the command and its output:

apoorva004@APOORVAGOEL24:~/apoorvahomo/get_homologues-x86_64-20210828$ ./get_homologues.pl -d Prokka_on_collection_49_gbk
/get_homologues.pl -i 0 -d Prokka_on_collection_49_gbk -o 0 -X 0 -e 0 -f 0 -r 0 -t all -c 0 -z 0 -I 0 -m local -n 2 -M 0 -G 0 -p 0 -C 75 -S 1 -E 1e-05 -F 1.5 -N 0 -B 50 -b 0 -s 0 -D 0 -g 0 -a '0' -x 0 -R 0 -A 0 -P 0

version 28082021
results_directory=/home/apoorva004/apoorvahomo/get_homologues-x86_64-20210828/Prokka_on_collection_49_gbk_homologues
parameters: MAXEVALUEBLASTSEARCH=0.01 MAXPFAMSEQS=250 BATCHSIZE=100 KEEPSCNDHSPS=1
diamond job:0

checking input files...
Use of uninitialized value $comma_input_files in scalar chop at ./get_homologues.pl line 1055.

0 genomes, 0 sequences

Illegal division by zero at ./get_homologues.pl line 1114.

Hi @apoorva004 ,

  1. have you run $ perl install.pl?

  2. can you please show the results of

    $ ls Prokka_on_collection_49_gbk

  3. and share 3 of those files?

Thanks

3files.zip

  1. have you run $ perl install.pl? Yes.

  2. can you please show the results of $ ls Prokka_on_collection_49_gbk? We got the same output as above.

  3. and share 3 of those files? I have attached 3 files.

Hi @apoorva004 , I have renamed your files to match the accepted extensions (see manual):

ls 3files
2512047070.fna.gb  2523533529.fna.gb  2528768230.fna.gb

It should work for you now. I have updated the script (0e47131) to warn users of this issue. Thanks for your feedback,
Bruno
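
For anyone hitting the same "0 genomes, 0 sequences" message, here is a minimal Python sketch of a pre-flight check that lists files whose names do not end in an expected extension. It is not part of get_homologues. The extension list is an assumption inferred from the renamed files above (.gb) plus the common .gbk variant, and the function name and example file names are hypothetical; please check the manual for the extensions your version actually accepts.

```python
from pathlib import Path

# ASSUMPTION: inferred from the renamed files in this thread (.gb) plus .gbk;
# the authoritative list of accepted extensions is in the manual.
ASSUMED_GENBANK_EXTENSIONS = (".gb", ".gbk")


def split_by_extension(filenames, accepted=ASSUMED_GENBANK_EXTENSIONS):
    """Split file names into (recognized, unrecognized) by suffix only.

    The comparison is case-sensitive and looks only at the file name,
    never at the file contents.
    """
    recognized, unrecognized = [], []
    for name in filenames:
        if name.endswith(accepted):
            recognized.append(name)
        else:
            unrecognized.append(name)
    return recognized, unrecognized


def check_input_dir(directory):
    """Apply split_by_extension to the regular files in a directory."""
    names = sorted(p.name for p in Path(directory).iterdir() if p.is_file())
    return split_by_extension(names)


if __name__ == "__main__":
    # The second file name is made up for illustration.
    ok, bad = split_by_extension(["2512047070.fna.gb", "hypothetical_genome.txt"])
    print("recognized:", ok)
    print("unrecognized:", bad)
```

Calling check_input_dir("Prokka_on_collection_49_gbk") before launching the run would show which files need renaming.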

Hi @eead-csic-compbio,
Thank you for your help, my issue is resolved. I have one more question, though.
I estimated the core and pan-genome sizes by sampling genomes using ./get_homologues.pl -d sample_buch_fasta -c:
I got converged fits for both the pan- and core genomes with this script. However, when I tried to use the output to calculate the cloud, shell and core genomes, I only got results for the core genes; the other compartments show 0:
./parse_pangenome_matrix.pl -m sample_intersection/pangenome_matrix_t0.tab -I -A -B -a 0 -g 0 -e 0 -p -s 1 -l 0 -x 0 -P 100 -S 0

matrix contains 1084 clusters and 48 taxa

cloud size: 0 list: sample_intersection/pangenome_matrix_t0__cloud_list.txt

shell size: 0 list: sample_intersection/pangenome_matrix_t0__shell_list.txt

soft core size: 1084 list: sample_intersection/pangenome_matrix_t0__softcore_list.txt

core size: 1084 (included in soft core) list: sample_intersection/pangenome_matrix_t0__core_list.txt

using default colors, defined in %COLORS

globals controlling R plots: $YLIMRATIO=1.2

shell bar plots: sample_intersection/pangenome_matrix_t0__shell.png , sample_intersection/pangenome_matrix_t0__shell.pdf , sample_intersection/pangenome_matrix_t0__shell.svg

shell circle plots: sample_intersection/pangenome_matrix_t0__shell_circle.png , sample_intersection/pangenome_matrix_t0__shell_circle.pdf , sample_intersection/pangenome_matrix_t0__shell_circle.svg

pan-genome size estimates (Snipen mixture model PMID:19691844): sample_intersection/pangenome_matrix_t0__shell_estimates.tab

Do I need to re-run my genomes with the appropriate script, as mentioned in the manual?

Hi @apoorva004 , in order to compute the complete pangenome, including shell and cloud genes, you'll need to compute clusters of all possible sizes with:

get_homologues.pl -t 0
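
To illustrate why the earlier run reported cloud and shell sizes of 0, here is a rough Python sketch that bins clusters by how many taxa they contain. My understanding is that the default -t all (visible in the echoed command at the top of this thread) keeps only clusters present in every taxon, so every cluster in that matrix lands in the core. The thresholds below (soft core at 95% of taxa, cloud at 2 taxa or fewer) are illustrative assumptions, not necessarily the exact rules parse_pangenome_matrix.pl applies, and the function name is hypothetical.

```python
def classify_clusters(occupancy, n_taxa, softcore_fraction=0.95, cloud_max_taxa=2):
    """Count clusters per pangenome compartment.

    occupancy: one integer per cluster, the number of taxa it occurs in.
    ASSUMPTION: softcore_fraction and cloud_max_taxa are illustrative
    thresholds chosen for this sketch, not values taken from the tool.
    Core clusters are also counted as soft core, as in the report above.
    """
    counts = {"cloud": 0, "shell": 0, "soft core": 0, "core": 0}
    for taxa_present in occupancy:
        if taxa_present == n_taxa:
            counts["core"] += 1
        if taxa_present >= softcore_fraction * n_taxa:
            counts["soft core"] += 1
        elif taxa_present <= cloud_max_taxa:
            counts["cloud"] += 1
        else:
            counts["shell"] += 1
    return counts


if __name__ == "__main__":
    # Matrix from the earlier run: 1084 clusters, each present in all 48 taxa.
    print(classify_clusters([48] * 1084, n_taxa=48))
    # Made-up occupancies of the kind a -t 0 run could produce.
    print(classify_clusters([48, 47, 46, 20, 2, 1], n_taxa=48))
```

With only full-occupancy clusters as input, nothing can fall into the shell or cloud bins, which is consistent with the sizes reported above.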