Odd errors in larger data sets?
Opened this issue · 1 comment
We've been trying out pyclone-vi on our data and we're seeing odd behavior: it works just fine when we put in 10-20 variants per sample, but once we put in the full list of 300-400 mutations, it balks. We're still troubleshooting whether it's somehow our HPC or software install environment, but on the off chance this looks familiar to you, I thought I'd post the error.
The input is data from one sample at a time, in the right format, except that our datasets have no tumor content or error rate column. When the script runs, stdout only shows "Tumour content column not found. Setting values to 1.0.", so we know the file is being found and read up to that point. But then we see the following (again, only when we do not truncate the input to a small number of variants):
Traceback (most recent call last):
File "/opt/conda/bin/pyclone-vi", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/pyclone_vi/cli.py", line 113, in fit
pyclone_vi.run.fit(**kwargs)
File "/opt/conda/lib/python3.8/site-packages/pyclone_vi/run.py", line 29, in fit
log_p_data, mutations, samples = load_data(in_file, density, num_grid_points, precision=precision)
File "/opt/conda/lib/python3.8/site-packages/pyclone_vi/data.py", line 11, in load_data
data, mutations, samples = load_pyclone_data(file_name)
File "/opt/conda/lib/python3.8/site-packages/pyclone_vi/data.py", line 78, in load_pyclone_data
cn, mu, log_pi = cn_priors[(
File "/opt/conda/lib/python3.8/site-packages/pandas/core/generic.py", line 1668, in __hash__
raise TypeError(
TypeError: 'Series' objects are mutable, thus they cannot be hashed
Any gems? Could we have some sort of file parsing issue for a particular variant name (are there certain characters we can't use in a variant ID)? I feel like this is something silly, but I can't put my finger on it.
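In case it helps anyone hitting the same thing, a quick sanity check on the input TSV (column names follow the pyclone-vi input format; the file path is a placeholder) would look something like this:

```python
import pandas as pd

# Placeholder path; columns per the pyclone-vi input format
# (mutation_id, sample_id, ref_counts, alt_counts, major_cn, minor_cn, normal_cn, ...)
df = pd.read_csv("sample1_pyclone_vi_input.tsv", sep="\t")

# Variant IDs containing characters outside a conservative whitelist
odd_ids = df.loc[~df["mutation_id"].astype(str).str.match(r"^[\w.:>-]+$"), "mutation_id"]
print("IDs with unusual characters:", odd_ids.unique())

# Rows that repeat a (mutation_id, sample_id) pair
dups = df[df.duplicated(subset=["mutation_id", "sample_id"], keep=False)]
print(f"{len(dups)} duplicated rows")
print(dups.sort_values("mutation_id"))
```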
Oh. My. Gosh. Just FYI, your code breaks if there is a duplicate mutation_id in a sample's dataset. It doesn't FIX the duplicate, it just breaks. SUPER minor, but for ease of use, perhaps add a quick filter for uniqueness OR a mention in the docs. ;) I KNEW it felt like something stupid... and it was...
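For the record, here is roughly why a duplicate ID blows up with that particular TypeError instead of a clearer message. This is a minimal sketch of the pattern, not the actual load_pyclone_data code: once the mutation_id index contains duplicates, a lookup that should return a single value returns a Series, and a Series cannot be hashed as part of a dict key (the cn_priors key structure below is assumed for illustration).

```python
import pandas as pd

# Two rows sharing a mutation_id, as in the offending input file
df = pd.DataFrame(
    {"mutation_id": ["chr1:12345", "chr1:12345"], "major_cn": [2, 2], "minor_cn": [1, 1]}
).set_index("mutation_id")

row = df.loc["chr1:12345"]  # duplicated index -> a 2-row DataFrame, not a single row
major = row["major_cn"]     # a length-2 Series instead of a scalar

# Stand-in for pyclone-vi's cn_priors lookup; the real key structure is assumed here
cn_priors = {(2, 1): "prior for this copy-number state"}
try:
    cn_priors[(major, row["minor_cn"])]
except TypeError as err:
    print(err)  # 'Series' objects are mutable, thus they cannot be hashed

# Dropping duplicate IDs up front avoids the problem entirely
clean = df.reset_index().drop_duplicates(subset="mutation_id")
```

Either filtering like that before writing the TSV, or a friendlier error message in load_pyclone_data, would probably cover it.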