KalinNonchev/gnomAD_DB

Limited fields in the gnomAD SQLite database

brettChapman opened this issue · 5 comments

Hi

I've started querying all the fields in gnomAD_DB using your preprocessed gnomAD SQLite v3.1.2. There appear to be only 22 columns in the resulting pandas dataframe, yet there are many more fields in the gnomAD VCF files. For example, there is no 'AN_non_topmed_asj_XY' field, yet it's in the VCF. Is there a reason why so many were left out?

Thanks.

If I wanted to include more, would I just need to generate my own SQLite DB from the raw VCF and modify the code, say in the YML file here? https://github.com/KalinNonchev/gnomAD_DB/blob/master/gnomad_db/pkgdata/gnomad_columns.yaml

If I were to update the YAML file with more fields, would the SQLite DB need creating again or would it work with the SQLite database you preprocessed earlier?

Hello @brettChapman,

re: Is there a reason why so many were left out?

Since I can upload at most 50 GB to Zenodo, I had to preselect the most common annotations to keep the database under that limit.

re: If I wanted to include more, would I just need to generate my own SQLite DB from the raw VCF and modify the code, say in the YML file here?

Yes, you should just modify the YAML file and include the columns you are interested in.
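To give a rough idea of what pulling an extra column out of the raw VCF involves, here is a minimal sketch in plain Python. It is not the package's own parsing code: the `parse_info` helper, the `wanted` list and the example record are made up for illustration, and only the tab-separated VCF layout with `key=value` pairs in the INFO column is standard.

```python
def parse_info(info_field):
    """Turn a VCF INFO string into a dict; flag entries (no '=') map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


# Made-up data line (columns: CHROM POS ID REF ALT QUAL FILTER INFO).
vcf_line = (
    "chr21\t9411239\t.\tG\tA\t.\tPASS\t"
    "AC=1;AN=1000;AF=0.001;AN_non_topmed_asj_XY=12"
)

# Hypothetical list of INFO fields to keep, i.e. the columns you would add.
wanted = ["AF", "AN_non_topmed_asj_XY"]

fields = vcf_line.rstrip("\n").split("\t")
info = parse_info(fields[7])

# One row per variant: the variant key plus the selected INFO values.
row = {"chrom": fields[0], "pos": int(fields[1]), "ref": fields[3], "alt": fields[4]}
row.update({name: info.get(name) for name in wanted})
print(row)
```

Values come out as strings here; a real build step would also cast them to numeric types before writing them to the database.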

re: If I were to update the YAML file with more fields, would the SQLite DB need to be created again or would it work with the SQLite database you preprocessed earlier?

You would have to start from the beginning since you are going to update the database with new attributes.
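The reason is that a SQLite table only contains the columns it was created with, so a field that was never written cannot be queried afterwards. A small sketch with Python's built-in `sqlite3` module shows the effect; the table name `variants` and its columns are hypothetical and do not reflect the actual schema of the preprocessed database.

```python
import sqlite3

# In-memory stand-in for a database built with a limited set of columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, AF REAL)"
)
conn.execute("INSERT INTO variants VALUES ('21', 9411239, 'G', 'A', 0.001)")

# List the columns that actually exist in the table.
columns = [r[1] for r in conn.execute("PRAGMA table_info(variants)")]
print(columns)

# Asking for a field that was never stored fails at query time.
try:
    conn.execute("SELECT AN_non_topmed_asj_XY FROM variants")
    missing = False
except sqlite3.OperationalError as err:
    missing = True
    print("query failed:", err)
```

Running the same `PRAGMA table_info(...)` query against a downloaded database file is a quick way to check which annotations it includes before deciding whether a rebuild is needed.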

Please let me know if you have further questions or comments.
Best,

Please don't hesitate to reopen this GitHub issue if you have any more questions or need further assistance.