genomejs/dna2json

Extremely Long Parse Time

Closed this issue · 8 comments

I tried to parse my genome from the text file downloaded from 23andMe. It appears to get stuck while parsing the SNPs. I let the script run for 3 hours and it appeared to be stuck on "Parsed 960613 SNPs".

Is there an error log or debug feature that I can use to find out what is happening? The example listed in the readme says "This will take awhile..." What is the average time it takes to parse a genome?

Some other information that may be useful:
The genome file downloaded from 23andMe is 24 MB.
After I stopped the dna2json process at the 3-hour mark, the JSON file was 34 MB.

Thanks for any help you can provide

Parsing the genome isn't usually what takes the longest. The entire file may already be parsed, but for some reason streaming the result through JSON.stringify() to the file system is what ends up taking an insane amount of time. Unsure of what a good solution would be, but maybe just using a synchronous JSON.stringify for the CLI would be a better bet, as streaming may add some unnecessary overhead for what we are trying to do.
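For anyone wondering what the synchronous approach might look like, here is a minimal sketch. It is not dna2json's actual code: the helper names (parseGenomeText, writeJsonSync), the rsid-keyed output shape, and the made-up sample rows are all assumptions for illustration. It relies only on the 23andMe raw data layout (comment lines starting with "#", then tab-separated rsid, chromosome, position, genotype) and assumes the whole result fits in memory, which seems plausible for a 24 MB input.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of dna2json): parse 23andMe raw data text
// into a plain object keyed by rsid. The output shape is an assumption.
function parseGenomeText(text) {
  const snps = {};
  for (const line of text.split(/\r?\n/)) {
    // Skip blank lines and the "#" comment/header lines.
    if (line === '' || line.startsWith('#')) continue;
    const [rsid, chromosome, position, genotype] = line.split('\t');
    snps[rsid] = { chromosome, position: Number(position), genotype };
  }
  return snps;
}

// Hypothetical helper: serialize everything in one synchronous call
// instead of streaming each record through a JSON stringifier.
function writeJsonSync(outPath, data) {
  fs.writeFileSync(outPath, JSON.stringify(data));
}

// Tiny demo with made-up sample rows in the 23andMe column layout.
const sample = [
  '# rsid\tchromosome\tposition\tgenotype',
  'rs4477212\t1\t82154\tAA',
  'rs3094315\t1\t752566\tAG',
].join('\n');

const demoPath = path.join(os.tmpdir(), 'genome-demo.json');
const parsed = parseGenomeText(sample);
writeJsonSync(demoPath, parsed);

// Read the file back to confirm the JSON round-trips.
const roundTripped = JSON.parse(fs.readFileSync(demoPath, 'utf8'));
```

For a real file you would pass fs.readFileSync(inputPath, 'utf8') to parseGenomeText. The trade-off is that the parsed object and the serialized string are both held in memory at once, so this only makes sense while the data stays comfortably small.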

"maybe just using a synchronous JSON.stringify for the CLI would be a better bet"

That sounds great, but how do I do that?

Is there a better way to convert my 23andMe txt file to JSON?

@johnstonian I'll see if I can make the performance any better. Will update with progress.

@contra thanks! I'm very excited about this project. Thanks for making it available!

@contra how did you get your txt file to convert to JSON in a timely fashion?

@johnstonian I didn't. I just waited for a long time.

Just rewrote the whole thing; it should be much faster now.

+1 !!!