slow load_snv_data
marcjwilliams1 opened this issue · 3 comments
I think load_snv_data has become very slow after some recent refactoring. I get the following warning message which might be helpful.
scgenome/scgenome/loaders/snv.py:274: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
So it looks like pandas found mixed types in one of the columns and had to infer them, which I imagine would make reading the CSV file slow.
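For reference, a minimal sketch of what "specify dtype option on import" from the warning could look like. The column names and the inline CSV here are made up for illustration; I'm assuming column 0 is something like a chromosome column that mixes numeric and string labels, which is not confirmed by the loader code. A file this small would not trigger the warning itself, it only shows the shape of the call.

```python
import io

import pandas as pd

# Hypothetical SNV-like table; the column names are illustrative only
# and are not taken from scgenome.
csv_text = "chrom,coord,ref_counts\n1,100,5\nX,200,3\n2,300,7\n"

# Declare the dtype of the first column up front so pandas does not
# have to infer it while parsing. Without this, "1" and "2" could be
# parsed as integers and "X" as a string.
df = pd.read_csv(io.StringIO(csv_text), dtype={"chrom": str})

chrom_values = df["chrom"].tolist()
print(chrom_values)
```

Passing low_memory=False is the other option the warning mentions; it silences the warning but does not fix the column to a single type the way an explicit dtype does.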
Reading the copy number data is fine.
I'm using the most up to date version.
perhaps #14 fixes this?
@marcjwilliams1 Unfortunately, I don't believe it does. #14 only changes test_load_pseudobulk.py.
I'm not sure this is actually a problem in the end. It takes ~1 hour to load the fitness pseudobulks (e.g. SC-2655), and I may just not have noticed that before.
Most of the time is spent summing the counts and reading in the genotyping files, so it has nothing to do with the warning messages. I'll close this issue.
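In case it helps anyone checking this later, here is a rough sketch of how per-stage timings like the ones above could be collected. The two stage functions are hypothetical stand-ins operating on toy data, not the actual scgenome loader code.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Hypothetical stand-ins for the two expensive stages.
def read_genotyping_files():
    return [{"cell": i, "count": i % 3} for i in range(1000)]


def sum_counts(rows):
    return sum(row["count"] for row in rows)


with timed("read_genotyping"):
    rows = read_genotyping_files()

with timed("sum_counts"):
    total = sum_counts(rows)

# Print the slowest stage first.
for label, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {seconds:.6f}s")
```

For a finer breakdown, the standard library's cProfile module would show time per function rather than per stage.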