PoonLab/covizu

Error generating epicov data

GopiGugan opened this issue · 4 comments

[covizu@BEVi ~]$ tail batch.log
...
๐Ÿ„ [12:20:25.736335] start BQ.1.1, 73245 entries
๐Ÿ„ [12:27:05.707608] Parsing output files
Failed to retrieve metadata for accession EPI_ISL_18554387

clusters = json.load(infile)
for cluster in clusters:
    for variant, samples in cluster['nodes'].items():
        revised = []
        for coldate, accn, location, name in samples:
            md = metadata.get(accn, None)
            if md is None:
                print("Failed to retrieve metadata for accession {}".format(accn))
                sys.exit()
            revised.append([name, accn, location, coldate, md['gender'], md['age'], md['status']])
        # replace list of samples
        cluster['nodes'][variant] = revised
return clusters

Can we grep for this accession number in that provision file?

This accession number existed in a previous provision file, but not in the latest file. I am investigating how it is still showing up in the current clusters.json file.

Sometimes sequences are retracted from the database, so a record that appeared in a previous provision file no longer appears in subsequent files.
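To illustrate one way the metadata step could tolerate retracted records, here is a minimal sketch that skips samples with no metadata and reports them, rather than exiting on the first miss. It is not the project's fix: the function name `attach_metadata` and the returned `missing` list are hypothetical, and the sample layout `[coldate, accn, location, name]` is taken from the snippet quoted above.

```python
def attach_metadata(clusters, metadata):
    """Append gender/age/status to each sample, skipping samples without metadata.

    Hypothetical helper: `clusters` is the parsed clusters.json content and
    `metadata` maps accession numbers to dicts with 'gender', 'age', 'status'.
    """
    missing = []  # accessions absent from the current metadata (e.g. retracted)
    for cluster in clusters:
        for variant, samples in cluster['nodes'].items():
            revised = []
            for coldate, accn, location, name in samples:
                md = metadata.get(accn)
                if md is None:
                    # record the accession and drop the sample instead of exiting
                    missing.append(accn)
                    continue
                revised.append([name, accn, location, coldate,
                                md['gender'], md['age'], md['status']])
            # replace list of samples
            cluster['nodes'][variant] = revised
    return clusters, missing
```

The caller could then log `missing` and decide whether the affected lineages need to be reprocessed. Note that dropping samples this way may leave a variant with an empty sample list, which downstream code would have to accept.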

covizu/batch.py

Lines 320 to 338 in 22f4f4c

if args.use_db:
    # Insert all updated records into the database
    for record in result:
        cur.execute('''
            INSERT INTO CLUSTERS
            VALUES (%s, %s)
            ON CONFLICT (lineage) DO UPDATE
            SET cluster_data = %s
        ''', [record['lineage'], json.dumps(record), json.dumps(record)])
    # Retrieve cluster data for other lineages from the database
    for lineage, _ in by_lineage.items():
        if lineage not in updated_lineages:
            cur.execute("SELECT cluster_data FROM CLUSTERS WHERE lineage = '%s'"%lineage)
            cluster_info = cur.fetchone()
            if cluster_info is None:
                cb.callback("Missing CLUSTERS record for lineage {}".format(lineage), level='ERROR')
                sys.exit()
            result.append(cluster_info['cluster_data'])
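Since the cluster data for lineages that were not updated is reused from the database, a retracted accession could persist there. As a rough sketch of how such lineages might be detected, the function below compares cached records against the accessions in the current provision file. The name `find_stale_lineages` and both of its parameters are hypothetical. It also assumes each cached record has a `'lineage'` key and a `'nodes'` mapping whose samples are `[coldate, accn, location, name]`, as in the snippets above.

```python
def find_stale_lineages(cached_records, current_accessions):
    """Map each lineage to the cached accessions missing from the current provision.

    Hypothetical helper: `cached_records` is an iterable of cluster records
    (dicts) and `current_accessions` is a set of accession numbers.
    """
    stale = {}
    for record in cached_records:
        for samples in record['nodes'].values():
            for _coldate, accn, _location, _name in samples:
                if accn not in current_accessions:
                    stale.setdefault(record['lineage'], []).append(accn)
    return stale
```

A non-empty result would suggest those lineages should be treated as updated and reprocessed instead of being read back from the CLUSTERS table.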