NAL-i5K/tripal_eutils

error creating assembly if no assembly tag on FTP summary

Closed this issue · 2 comments

program is a required (NOT NULL) column in the analysis table.

When loading assembly 185471 (accidentally, it's a bacterium):

INFO (TRIPAL_EUTILS): Inserting record into Chado: assembly: 185471
[site http://default] [TRIPAL ERROR] [TRIPAL_EUTILS] SQLSTATE[23502]: Not null violation: 7 ERROR:  null value in column "program" violates not-null constraint DETAIL:  Failing row contains (335, ASM70462v1, , null, null, , SAMN02680177, , , 2014-06-11 00:00:00).
______________________________________________________
<FtpPath_GenBank>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/704/625/GCA_000704625.1_ASM70462v1</FtpPath_GenBank>
  <FtpPath_RefSeq>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/704/625/GCF_000704625.1_ASM70462v1</FtpPath_RefSeq>
  <FtpPath_Assembly_rpt/>
  <FtpPath_Stats_rpt/>
  <FtpPath_Regions_rpt/>

There is no stats FTP path (FtpPath_Stats_rpt is empty), which is where we get the tag from. Without that we're SOL.
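For illustration, here is a minimal Python sketch of how the empty report tags could be detected before attempting the insert. It is not the module's actual code: the `DocumentSummary` wrapper element and the `ftp_paths` helper are assumptions made for the example, and only the `FtpPath_*` tags come from the summary quoted above.

```python
import xml.etree.ElementTree as ET

# Fragment of the summary quoted above, wrapped in a root element
# (the wrapper name is an assumption for this sketch).
SUMMARY_FRAGMENT = """
<DocumentSummary>
  <FtpPath_GenBank>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/704/625/GCA_000704625.1_ASM70462v1</FtpPath_GenBank>
  <FtpPath_RefSeq>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/704/625/GCF_000704625.1_ASM70462v1</FtpPath_RefSeq>
  <FtpPath_Assembly_rpt/>
  <FtpPath_Stats_rpt/>
  <FtpPath_Regions_rpt/>
</DocumentSummary>
"""


def ftp_paths(summary_xml):
    """Map each FtpPath_* tag to its URL, or None when the tag is empty."""
    root = ET.fromstring(summary_xml)
    paths = {}
    for element in root.iter():
        if element.tag.startswith("FtpPath_"):
            text = (element.text or "").strip()
            paths[element.tag] = text or None
    return paths


paths = ftp_paths(SUMMARY_FRAGMENT)
# Tags that are present but carry no URL, in document order.
missing = [tag for tag, url in paths.items() if url is None]
print(missing)
```

A check like this would let the loader notice the missing stats path up front instead of failing later on the not-null constraint.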

Interestingly, GenBank still has the info:

Assembly method: MaSuRCA v. 2.0.3.1

Where does it get it from?

The report IS available. It's here: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/704/625/GCF_000704625.1_ASM70462v1/GCF_000704625.1_ASM70462v1_assembly_report.txt
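As a rough sketch of how the assembly method could be pulled out of a report once its text has been downloaded, the Python below scans for an "Assembly method:" line. It assumes that the report carries that line in its header, optionally prefixed with "#"; the `assembly_method` helper and the two-line sample are hypothetical, built only from values quoted in this issue.

```python
import re


def assembly_method(report_text):
    """Return the value of the first 'Assembly method:' line, or None.

    Assumes header lines may start with '#', which this sketch tolerates
    but does not require.
    """
    for line in report_text.splitlines():
        match = re.match(r"^#?\s*Assembly method:\s*(.+?)\s*$", line)
        if match:
            return match.group(1)
    return None


# Hypothetical two-line sample using only values quoted in this issue.
sample = (
    "# Assembly name:  ASM70462v1\n"
    "# Assembly method: MaSuRCA v. 2.0.3.1\n"
)
print(assembly_method(sample))
```

Returning None when the line is absent leaves it to the caller to decide whether to skip the record or fall back to a placeholder program.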

However... that info isn't in the XML.

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/704/625/GCF_000704625.1_ASM70462v1

So... we could try to guess the location of the report if we don't find it? Take the RefSeq path and append the folder name plus _assembly_report.txt?

Shoot, we're having the same problem for locust (https://www.ncbi.nlm.nih.gov/assembly/GCA_000516895.1). This means we really have to fix it.

It only has one link:

<FtpPath_GenBank>ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/516/895/GCA_000516895.1_LocustGenomeV1</FtpPath_GenBank>

The report can be found here: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/516/895/GCA_000516895.1_LocustGenomeV1/GCA_000516895.1_LocustGenomeV1_assembly_stats.txt

So it seems like the correct thing to do is to guess, as suggested above.
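The guessing approach might look something like the Python sketch below. It is only an illustration, not the module's code: `guess_report_urls` is a hypothetical helper. Trying the RefSeq path before the GenBank path, and trying both the _assembly_report.txt and _assembly_stats.txt suffixes, are assumptions based on the two examples in this issue.

```python
def guess_report_urls(
    ftp_path_refseq=None,
    ftp_path_genbank=None,
    suffixes=("_assembly_report.txt", "_assembly_stats.txt"),
):
    """Build candidate report URLs from whichever FTP base paths exist.

    Each candidate is <base>/<last folder of base><suffix>.
    """
    candidates = []
    for base in (ftp_path_refseq, ftp_path_genbank):
        if not base:
            continue  # path missing from the XML summary
        base = base.rstrip("/")
        folder = base.rsplit("/", 1)[-1]
        for suffix in suffixes:
            candidates.append(f"{base}/{folder}{suffix}")
    return candidates


# The locust assembly only has a GenBank path.
locust = guess_report_urls(
    ftp_path_genbank=(
        "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/516/895/"
        "GCA_000516895.1_LocustGenomeV1"
    )
)

# The bacterial assembly has a RefSeq path.
bacteria = guess_report_urls(
    ftp_path_refseq=(
        "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/704/625/"
        "GCF_000704625.1_ASM70462v1"
    )
)

for url in locust + bacteria:
    print(url)
```

The caller would then try each candidate in order and use the first one that actually exists on the FTP server, since a guessed URL is not guaranteed to resolve.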