Error retrieving a record from the database
Closed this issue · 2 comments
The pipeline is failing because it tries to retrieve a record that does not exist in the database.
For example, sequences for lineage XDT were inserted into the database, but the cluster information was not.
The original thought was that the lineage assignment had changed for a sequence in a recent provisions file. However, that does not seem to be the case.
@GopiGugan currently rebuilding database, will investigate whether cluster information is reproducibly failing to be inserted for XDT
> For example, sequences for lineage XDT were inserted into the database, however, the cluster information was not.
I believe there is no cluster record for XDT because the XDT records were previously filtered out in the filter_problematic function.
covizu/covizu/utils/gisaid_utils.py
Lines 196 to 200 in 0631f5e
Lineages that appear in by_lineage but were not previously inserted into the clusters table should be processed again:
diff --git a/batch.py b/batch.py
index 878ac70..91641d8 100644
--- a/batch.py
+++ b/batch.py
@@ -435,7 +435,17 @@ if __name__ == "__main__":
SELECT DISTINCT LINEAGE FROM NEW_RECORDS;
'''
CUR.execute(UPDATED_LINEAGES_QUERY)
- UPDATED_LINEAGES = [row['lineage'] for row in CUR.fetchall()]
+ new_records_lineages = [row['lineage'] for row in CUR.fetchall()]
+
+ by_lineage_list = list(by_lineage.keys())
+ clusters_lineages_query = '''
+ SELECT DISTINCT LINEAGE FROM CLUSTERS;
+ '''
+ CUR.execute(clusters_lineages_query)
+ clusters_lineages = [row['lineage'] for row in CUR.fetchall()]
+ unique_by_lineage = list(set(by_lineage_list) - set(clusters_lineages))
+
+ UPDATED_LINEAGES = list(set(new_records_lineages).union(set(unique_by_lineage)))
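The selection logic in the patch above can be checked in isolation, without a database connection. The following is only a minimal sketch of that set arithmetic: the helper name `lineages_to_process` and the sample lineage data are hypothetical and not part of batch.py, and the lists stand in for the results of the two `SELECT DISTINCT LINEAGE` queries.

```python
def lineages_to_process(new_records_lineages, by_lineage, clusters_lineages):
    """Return lineages with new records plus lineages lacking a cluster row.

    Hypothetical helper mirroring the set logic in the patch; the inputs
    stand in for the query results and the by_lineage dictionary.
    """
    # Lineages present in by_lineage but absent from the clusters table
    missing_clusters = set(by_lineage) - set(clusters_lineages)
    # Union with the lineages that received new records in this run
    return sorted(set(new_records_lineages) | missing_clusters)


# Hypothetical data: XDT has sequences in by_lineage but no cluster row
by_lineage = {"BA.2": [], "XDT": [], "XBB.1.5": []}
new_records = ["BA.2"]
clusters = ["BA.2", "XBB.1.5"]

print(lineages_to_process(new_records, by_lineage, clusters))
# prints ['BA.2', 'XDT']
```

With this data, XDT is picked up again even though it has no new records, which is the behaviour the patch is meant to add. The sketch sorts the result only to make the output deterministic; the patch itself returns an unordered list.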