ohsu-comp-bio/g2p-aggregator

brca exchange "Pathogenicity_expert" filter

bwalsh opened this issue · 6 comments

The possible values of "Pathogenicity_expert":

"Benign / Little Clinical Significance"
"Likely benign"
"Not Yet Reviewed"
"Pathogenic"
"Uncertain"

We currently filter out records marked "Not Yet Reviewed".
We discussed removing that filter. Can you confirm?
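
To gauge how many records the filter affects, the values can be tallied first. This is a minimal sketch, assuming each record is a dict with a `Pathogenicity_expert` key as in the harvester; `tally_pathogenicity` and the sample records are hypothetical and not taken from the repository or from real BRCA Exchange data.

```python
from collections import Counter


def tally_pathogenicity(records):
    """Count records per 'Pathogenicity_expert' value (hypothetical helper)."""
    return Counter(record.get('Pathogenicity_expert') for record in records)


# Hypothetical sample records, for illustration only.
sample = [
    {'Gene_Symbol': 'BRCA1', 'Pathogenicity_expert': 'Pathogenic'},
    {'Gene_Symbol': 'BRCA2', 'Pathogenicity_expert': 'Not Yet Reviewed'},
    {'Gene_Symbol': 'BRCA2', 'Pathogenicity_expert': 'Not Yet Reviewed'},
]

counts = tally_pathogenicity(sample)
for value, count in counts.most_common():
    print(value, count)
```

Running this over one page of `payload['data']` would show what share of the harvest depends on the "Not Yet Reviewed" decision.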

@bwalsh That is my opinion. Happy to have @ahwagner @malachig and others chime in as well.

@jgoecks

Made the change. brca now has 17,584 items (was 5,715).
One note: unreviewed items do not have a phenotype.

+++ b/harvester/brca.py
@@ -22,10 +22,10 @@ def harvest(genes=None):
         else:
             page_num = page_num + 1
             for record in payload['data']:
-                if not record['Pathogenicity_expert'] == 'Not Yet Reviewed':
-                    gene = record['Gene_Symbol']
-                    gene_data = {'gene': gene, 'brca': record}
-                    yield gene_data
+                # if not record['Pathogenicity_expert'] == 'Not Yet Reviewed':
+                gene = record['Gene_Symbol']
+                gene_data = {'gene': gene, 'brca': record}
+                yield gene_data
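
As an alternative to commenting the check out, the filter could be made a switch so that either behavior is one argument away. This is only a sketch: `filter_records` and `include_unreviewed` are hypothetical names and are not part of harvester/brca.py.

```python
NOT_YET_REVIEWED = 'Not Yet Reviewed'


def filter_records(records, include_unreviewed=False):
    """Yield {'gene', 'brca'} items, optionally skipping unreviewed variants.

    Hypothetical helper; mirrors the shape of the items yielded in the diff.
    """
    for record in records:
        unreviewed = record['Pathogenicity_expert'] == NOT_YET_REVIEWED
        if unreviewed and not include_unreviewed:
            continue
        yield {'gene': record['Gene_Symbol'], 'brca': record}


# Hypothetical sample records, for illustration only.
sample = [
    {'Gene_Symbol': 'BRCA1', 'Pathogenicity_expert': 'Pathogenic'},
    {'Gene_Symbol': 'BRCA2', 'Pathogenicity_expert': 'Not Yet Reviewed'},
]

kept = list(filter_records(sample))
everything = list(filter_records(sample, include_unreviewed=True))
```

With the default, unreviewed variants are excluded; passing `include_unreviewed=True` reproduces the behavior of the change above.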

@bwalsh I get slightly different results when I run this.
source:brca: 17,546
source:brca AND exists:association.phenotype.description: 5,791

g2p-test, by contrast, shows a slightly higher overall count for BRCA and a slightly lower count with a phenotype association. Do you think this is just a matter of when the harvest was run?

Code looks good; I'm just confused about the variation in the numbers.
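
One way to pin down the variation would be to recompute both counts from a single harvest output. This is a sketch under assumptions: it supposes each harvested item is a dict with a `source` field and a nested `association.phenotype.description` field, as the two queries suggest. `has_phenotype`, `count_with_phenotype`, and the sample items are hypothetical.

```python
def has_phenotype(item):
    """True if the item has a non-empty association.phenotype.description."""
    association = item.get('association') or {}
    phenotype = association.get('phenotype') or {}
    return bool(phenotype.get('description'))


def count_with_phenotype(items, source='brca'):
    """Return (total, with_phenotype) for items from the given source."""
    from_source = [item for item in items if item.get('source') == source]
    with_phenotype = sum(1 for item in from_source if has_phenotype(item))
    return len(from_source), with_phenotype


# Hypothetical sample items, for illustration only.
sample = [
    {'source': 'brca',
     'association': {'phenotype': {'description': 'example phenotype'}}},
    {'source': 'brca', 'association': {}},
    {'source': 'other',
     'association': {'phenotype': {'description': 'example phenotype'}}},
]

total, with_phenotype = count_with_phenotype(sample)
print(total, with_phenotype)
```

If both environments are fed the same harvest file and still disagree, the difference is more likely in loading than in harvest timing.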

I'll check g2p-test tomorrow (there were snafus uploading to it).

Per the group discussion today, we're reversing course on this and should exclude "Not Yet Reviewed" variants.

@ahwagner @mayfielg @jgoecks for your review: this is addressed and deployed at https://g2p-test.ddns.net