Limited fields in the gnomAD SQLite database
brettChapman opened this issue · 5 comments
Hi
I've started querying all the fields in gnomAD_DB using your preprocessed gnomAD v3.1.2 SQLite database. There appear to be only 22 columns in the resulting pandas DataFrame, yet the gnomAD VCF files contain many more fields. For example, there is no 'AN_non_topmed_asj_XY' field, yet it's in the VCF. Is there a reason why so many were left out?
Thanks.
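You can confirm which fields a given SQLite file actually contains without going through the pandas query layer. Python's built-in sqlite3 module can read the schema directly. The sketch below does not assume a table name for the preprocessed v3.1.2 file, because it lists every table. `list_columns` is a hypothetical helper and is not part of gnomAD_DB.

```python
import sqlite3


def list_columns(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Return {table_name: [column names]} for every table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    columns = {}
    for table in tables:
        # Escape embedded double quotes so the table name is a safe identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        columns[table] = [row[1] for row in info]
    return columns
```

For example, call `list_columns(sqlite3.connect(path))` with `path` pointing at the downloaded database file. It prints the column names for each table, so you can check directly whether 'AN_non_topmed_asj_XY' is present.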
If I wanted to include more, would I just need to generate my own SQLite DB from the raw VCF and modify the code, say in the YAML file here? https://github.com/KalinNonchev/gnomAD_DB/blob/master/gnomad_db/pkgdata/gnomad_columns.yaml
If I were to update the YAML file with more fields, would the SQLite DB need creating again or would it work with the SQLite database you preprocessed earlier?
Hello @brettChapman,
re: Is there a reason why so many were left out?
Since Zenodo limits uploads to 50 GB, I had to preselect the most commonly used annotations.
re: If I wanted to include more, would I just need to generate my own SQLite DB from the raw VCF and modify the code, say in the YML file here?
Yes, you should just modify the yaml file and include the columns you are interested in.
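Before you edit the YAML file, it can help to check which INFO field names the VCF actually defines. This sketch assumes the entries in gnomad_columns.yaml are named after the VCF INFO IDs, which the thread does not confirm. The helper names are hypothetical and not part of gnomAD_DB.

```python
import gzip
import re
from typing import Iterable

# Matches the ID in a header line such as: ##INFO=<ID=AC,Number=A,...>
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def info_fields(header_lines: Iterable[str]) -> list[str]:
    """Collect INFO field IDs from VCF header lines, stopping at the #CHROM line."""
    fields = []
    for line in header_lines:
        if line.startswith("#CHROM"):
            break  # end of the header; data records follow
        match = INFO_ID.match(line)
        if match:
            fields.append(match.group(1))
    return fields


def read_vcf_info_fields(path: str) -> list[str]:
    """Read INFO IDs from a (b)gzip-compressed VCF; gzip handles multi-member files."""
    with gzip.open(path, "rt") as fh:
        return info_fields(fh)


def missing_fields(wanted: Iterable[str], available: Iterable[str]) -> list[str]:
    """Return the requested field names that the VCF does not define."""
    available_set = set(available)
    return [name for name in wanted if name not in available_set]
```

Call `read_vcf_info_fields` on a local gnomAD VCF path, then pass the result to `missing_fields` together with the names you plan to add. Any name it returns would not match a VCF field.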
re: If I were to update the YAML file with more fields, would the SQLite DB need to be created again or would it work with the SQLite database you preprocessed earlier?
You would have to start from the beginning, since the new attributes have to be read from the raw VCF files and added to the database.
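A rebuild is needed because of how such a database is typically filled. The column list is fixed when the table is created, and each value is read out of the VCF INFO field at that point. A column added later therefore has nothing to draw on unless the VCF is read again.

The sketch below illustrates this with Python's sqlite3 module. It is a generic example, not the package's actual build code. The table name `variants`, the helper names, and the assumption of one ALT allele per record are all hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO string such as 'AC=3;AN=10;lcr' into a dict.

    Flag entries without '=' are recorded with the value '1'.
    """
    parsed = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else "1"
    return parsed


def build_table(
    conn: sqlite3.Connection, vcf_lines: Iterable[str], columns: list[str]
) -> None:
    """Create a hypothetical 'variants' table holding only the chosen INFO columns."""
    quoted = ['"' + c.replace('"', '""') + '"' for c in columns]
    schema = ["chrom TEXT", "pos INTEGER", "ref TEXT", "alt TEXT"] + quoted
    conn.execute(f"CREATE TABLE variants ({', '.join(schema)})")
    placeholders = ", ".join(["?"] * len(schema))

    def rows() -> Iterator[tuple]:
        for line in vcf_lines:
            if line.startswith("#"):
                continue  # skip meta-information and column header lines
            # Assumes one ALT allele per record (no multi-allelic splitting here).
            chrom, pos, _id, ref, alt, _qual, _filter, info = (
                line.rstrip("\n").split("\t")[:8]
            )
            values = parse_info(info)
            # INFO fields absent from a record are stored as NULL.
            yield (chrom, int(pos), ref, alt, *(values.get(c) for c in columns))

    # Rows are streamed from the generator rather than held in memory.
    conn.executemany(f"INSERT INTO variants VALUES ({placeholders})", rows())
    conn.commit()
```

This sketch stores values as the raw INFO strings. A real build would probably also cast numeric fields before inserting them.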
Please let me know if you have further questions or comments.
Best,
Please don't hesitate to reopen this GitHub issue if you have any more questions or need further assistance.