ababaian/serratus

all new nido/cov assemblies

List of the 990 accessions where there's possibly a new CoV/nido RdRp according to sra_species_table.tsv:
s3://serratus-rayan/pro_new_cov_nido-assembly/all_new_cov_nido.sra

900 of them could be assembled. Here are the outputs:

  • scaffolds.fasta (900): s3://serratus-rayan/pro_new_cov_nido-assembly/all_new_cov_nido.scaffolds.txt
  • gene_clusters.fasta (900): s3://serratus-rayan/pro_new_cov_nido-assembly/all_new_cov_nido.gc.txt
  • gene_clusters.checkv_filtered.fasta (898): s3://serratus-rayan/pro_new_cov_nido-assembly/all_new_cov_nido.gc_cv

An immediate take-away is that most of the checkv_filtered assemblies are empty. So I recommend not using them and instead taking gene_clusters.fasta or, to be even more conservative, the whole scaffolds.fasta.
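To check the "mostly empty" observation on locally downloaded copies, a minimal sketch like the following could count the records in each FASTA file and separate the empty ones. The function names and the idea of passing local file paths on the command line are assumptions for illustration, not part of the pipeline that produced these files.

```python
import sys


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def split_empty_and_nonempty(fasta_paths):
    """Return (empty, nonempty) lists of paths, based on record counts."""
    empty, nonempty = [], []
    for path in fasta_paths:
        if count_fasta_records(path) > 0:
            nonempty.append(path)
        else:
            empty.append(path)
    return empty, nonempty


if __name__ == "__main__":
    # Hypothetical usage: python count_fasta.py *.checkv_filtered.fasta
    empty, nonempty = split_empty_and_nonempty(sys.argv[1:])
    print(f"{len(empty)} empty, {len(nonempty)} non-empty")
```

Running the same script on the gene_clusters.fasta and scaffolds.fasta files gives a direct comparison of how many assemblies survive each filtering level.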

All contigs with a motifator hit:
s3://serratus-rayan/pro_new_cov_nido-assembly/all_new_cov_nido.scaffolds_motifator.whole_contigs_hits.fasta

hmmsearch results versus Pfam-A:

s3://serratus-rayan/pro_new_cov_nido-assembly/all_new_cov_nido.scaffolds_motifator.whole_contigs_hits.fasta.transeq.faa.*
(ran with hmmsearch -A [.sto] --tblout [.tbl] --domtblout [.domtbl] -o [.hmmsearch_stdout] Pfam-A.hmm [contigs.fa])
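For anyone who wants to go through the --tblout files, here is a rough sketch of a parser that keeps the best-scoring Pfam hit per translated sequence. It assumes the standard HMMER3 per-target table layout (18 whitespace-separated columns followed by a free-text description, with comment lines starting with `#`). The function names and the local file name in the usage example are hypothetical.

```python
import sys


def parse_tblout(lines):
    """Parse lines of an hmmsearch --tblout file into a list of dicts."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        # The last column (description) may contain spaces, so cap the split.
        fields = line.split(maxsplit=18)
        if len(fields) < 18:
            continue  # not a well-formed hit line
        hits.append({
            "target": fields[0],         # sequence name (translated contig)
            "query": fields[2],          # Pfam profile name
            "query_acc": fields[3],      # Pfam accession
            "evalue": float(fields[4]),  # full-sequence E-value
            "score": float(fields[5]),   # full-sequence bit score
        })
    return hits


def best_hit_per_target(hits):
    """Keep the hit with the lowest full-sequence E-value for each target."""
    best = {}
    for hit in hits:
        current = best.get(hit["target"])
        if current is None or hit["evalue"] < current["evalue"]:
            best[hit["target"]] = hit
    return best


if __name__ == "__main__":
    # Hypothetical usage: python parse_tbl.py hits.tbl
    with open(sys.argv[1]) as handle:
        for target, hit in best_hit_per_target(parse_tblout(handle)).items():
            print(target, hit["query"], hit["query_acc"], hit["evalue"], sep="\t")
```

The full-sequence E-value is used for ranking here; the per-domain coordinates are only in the --domtblout files, which need a separate parser.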