higlass/clodius

Clodius aggregate with custom assembly: TypeError: Can't broadcast

liz-is opened this issue · 3 comments

Hi there,

I'm working with Drosophila data aligned to dm6 from Flybase, which uses an Ensembl-like naming format (i.e., no 'chr' prefix). Because negspy only includes UCSC-style assemblies, using --assembly dm6 gives me errors like KeyError: 'X'.
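The naming mismatch described above can be illustrated in a few lines. A table keyed with UCSC-style names (chrX, chr2L, ...) has no entry for the bare Flybase name X, which is why the lookup raises a KeyError. The sketch below is hypothetical. It does not use negspy's actual data or API, and the lengths are placeholders. It shows the mismatch and one way to translate names, assuming the main chromosomes differ only by the chr prefix. Renaming is an alternative to the custom chromsizes file used in this issue; it was not tested in this thread.

```python
# Hypothetical UCSC-style lookup table; the lengths are placeholders, not real dm6 sizes.
ucsc_style_sizes = {"chr2L": 100, "chrX": 200}


def to_ucsc_name(name: str) -> str:
    """Add the UCSC 'chr' prefix to a Flybase/Ensembl-style name.

    Assumption: this simple rule only holds for the main chromosomes
    (2L, 2R, 3L, 3R, 4, X, Y); other sequences may be named differently.
    """
    return name if name.startswith("chr") else "chr" + name


try:
    ucsc_style_sizes["X"]  # bare Flybase-style name is not a key
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'X'

print(ucsc_style_sizes[to_ucsc_name("X")])  # prints: 200
```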

So I'm using --chromsizes-filename to point to a file containing the chromosome sizes for my genome version, restricted to the main chromosomes, since my bedgraph has already been filtered to contain only those. Here's the command I'm running and its output:
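A chrom sizes file is two tab-separated columns per line: chromosome name and length. One way to build one restricted to the main chromosomes is to take the first two columns of a samtools FASTA index (.fai), which records each sequence's name and length. Below is a minimal sketch, assuming a .fai made from the same Flybase FASTA the reads were aligned to. The helper names and the file names in the comments are hypothetical. The chromosome list and its order are copied from the chrom-order line in the output below.

```python
from typing import Iterable, List, Sequence, Tuple

# Main chromosomes, in the order shown in the chrom-order log line.
MAIN_CHROMS = ("2L", "2R", "3L", "3R", "4", "X", "Y")


def chromsizes_from_fai(
    fai_lines: Iterable[str], keep: Sequence[str] = MAIN_CHROMS
) -> List[Tuple[str, int]]:
    """Return (name, length) pairs for the chromosomes in `keep`, in `keep` order."""
    sizes = {}
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, length = fields[0], int(fields[1])  # .fai columns 1-2: name, length
        if name in keep:
            sizes[name] = length
    return [(c, sizes[c]) for c in keep if c in sizes]


def write_chromsizes(pairs: Iterable[Tuple[str, int]], path: str) -> None:
    """Write pairs as a two-column, tab-separated chrom sizes file."""
    with open(path, "w") as fh:
        for name, length in pairs:
            fh.write(f"{name}\t{length}\n")


# Hypothetical usage:
# with open("dm6.fa.fai") as fai:
#     write_chromsizes(chromsizes_from_fai(fai), "dm6_chrom_sizes_sanitized.txt")
```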

clodius aggregate bedgraph test_Rep1_10kb_corrected_pc.eigenvector.bed \
--output-file test_Rep1_10kb_corrected_pc.eigenvector.hitile \
--chromosome-col 1 --from-pos-col 2 --to-pos-col 3 --value-col 5 \
--chromsizes-filename dm6_chrom_sizes_sanitized.txt  --nan-value nan --no-header
output file: test_Rep1_10kb_corrected_pc.eigenvector.hitile
assembly_size: 137547960
assembly: hg19
assembly size (max-length) 137547960
max-width 268435456
max_zoom: 18
chunk-size: 16777216
chrom-order [b'2L' b'2R' b'3L' b'3R' b'4' b'X' b'Y']
len(values): 110458336 16777216
line: X	1	120000	A	0.0	.

position: 1 progress: 0.00 elapsed: 8.87 remaining: 1220465716.46
len(data_buffers[curr_zoom]) 16777216
positions[curr_zoom]: 0
len(values): 93681120 16777216
line: X	1	120000	A	0.0	.

[some output removed]

Traceback (most recent call last):
  File "/home/research/vaquerizas/liz/test/env/bin/clodius", line 11, in <module>
    load_entry_point('clodius==0.10.8', 'console_scripts', 'clodius')()
  File "/home/research/vaquerizas/liz/test/env/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/research/vaquerizas/liz/test/env/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/research/vaquerizas/liz/test/env/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/research/vaquerizas/liz/test/env/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/research/vaquerizas/liz/test/env/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/research/vaquerizas/liz/test/env/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/research/vaquerizas/liz/test/env/lib/python3.7/site-packages/clodius/cli/aggregate.py", line 1322, in bedgraph
    chromsizes_filename, zoom_step)
  File "/home/research/vaquerizas/liz/test/env/lib/python3.7/site-packages/clodius/cli/aggregate.py", line 938, in _bedgraph
    values[:chunk_size], nan_values[:chunk_size]
  File "/home/research/vaquerizas/liz/test/env/lib/python3.7/site-packages/clodius/cli/aggregate.py", line 842, in add_values_to_data_buffers
    dsets[curr_zoom][curr_pos:curr_pos+chunk_size] = curr_chunk
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/research/vaquerizas/liz/test/env/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 707, in __setitem__
    for fspace in selection.broadcast(mshape):
  File "/home/research/vaquerizas/liz/test/env/lib/python3.7/site-packages/h5py/_hl/selections.py", line 299, in broadcast
    raise TypeError("Can't broadcast %s -> %s" % (target_shape, self.mshape))
TypeError: Can't broadcast (16777216,) -> (3330232,)
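The last frames show where the shapes diverge. Line 842 of aggregate.py assigns a full chunk_size chunk (16777216 values) to dsets[curr_zoom][curr_pos:curr_pos+chunk_size], but that slice of the dataset resolves to only 3330232 elements, so the chunk cannot be written. The sketch below reproduces the same mismatch with a scaled-down NumPy array (NumPy raises ValueError where h5py raises TypeError). It also shows one defensive way to write a chunk by clipping it to the space that remains. This is an illustration only. The helper name is hypothetical, and whether clipping is the right fix inside clodius is not established here.

```python
import numpy as np


def write_chunk_clipped(dest: np.ndarray, start: int, chunk: np.ndarray) -> int:
    """Hypothetical helper: write as much of `chunk` into dest[start:] as fits.

    Returns the number of values actually written.
    """
    end = min(start + len(chunk), len(dest))
    n = max(end - start, 0)
    if n:
        dest[start:end] = chunk[:n]
    return n


dest = np.zeros(10)   # stands in for a dataset whose slice has 3330232 slots
chunk = np.ones(16)   # stands in for a chunk of 16777216 values

try:
    dest[0:16] = chunk  # the slice is clipped to 10 elements, but the chunk has 16
except ValueError as err:
    print("naive write failed:", err)

written = write_chunk_clipped(dest, 0, chunk)
print("values written:", written)
```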

Any suggestions would be appreciated! I was also wondering whether this is related to #87?

Hey, it sounds like you're doing everything right. Would you mind trying to convert to a bigWig and ingesting that instead?

https://docs.higlass.io/data_preparation.html#creating-bigwig-files
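Taking the bigWig route means producing a plain four-column bedGraph (chrom, start, end, value). UCSC's bedGraphToBigWig expects that file sorted by chromosome and then start position (the equivalent of sort -k1,1 -k2,2n), and it is run as bedGraphToBigWig input.bedGraph chrom.sizes output.bw. The file in the original command keeps its value in column 5 (--value-col 5), so it needs trimming first. Below is a minimal sketch, assuming tab-separated input and assuming that rows whose value is nan should simply be dropped. The function names and the NaN-dropping choice are assumptions, not part of clodius or the HiGlass docs. Column numbers are 1-based, matching the clodius flags.

```python
import math
from typing import Iterable, List, Tuple

Row = Tuple[str, int, int, float]


def to_bedgraph4(
    lines: Iterable[str], chrom_col: int = 1, start_col: int = 2,
    end_col: int = 3, value_col: int = 5,
) -> List[Row]:
    """Pick chrom/start/end/value columns (1-based), drop NaN rows, sort by chrom then start."""
    rows: List[Row] = []
    for line in lines:
        if not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        value = float(f[value_col - 1])  # float("nan") parses without error
        if math.isnan(value):
            continue  # assumption: NaN bins are simply omitted
        rows.append((f[chrom_col - 1], int(f[start_col - 1]), int(f[end_col - 1]), value))
    # Python string order on names matches `sort -k1,1` under LC_ALL=C.
    rows.sort(key=lambda r: (r[0], r[1]))
    return rows


def format_bedgraph(rows: Iterable[Row]) -> str:
    """Render rows as tab-separated four-column bedGraph text."""
    return "".join(f"{c}\t{s}\t{e}\t{v:g}\n" for c, s, e, v in rows)
```

The resulting text can be written to a file and passed to bedGraphToBigWig along with the same chrom sizes file used above.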

We need to either deprecate the clodius aggregate bedgraph functionality or change it to just output bigWig files.

Oh, I didn't realise it was possible to ingest bigWig files directly! Is that new, or did I just completely miss it? I'll give that a try then. It's also nice not to have to create the extra file :)

Ingesting bigWig files directly works well as long as I also provide an appropriate chrom.sizes file, so I'll close this.
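Because the bigWig route depends on an appropriate chrom.sizes file, a quick pre-check can catch two kinds of mismatch before conversion: chromosome names that are missing from the file (the same kind of problem as the original KeyError: 'X'), and intervals that run past a chromosome's listed length. Below is a minimal sketch, assuming four-column (chrom, start, end, value) rows and a two-column tab-separated chrom.sizes file. Both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, int, float]


def read_chromsizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse a two-column, tab-separated chrom.sizes file into {name: length}."""
    sizes: Dict[str, int] = {}
    for line in lines:
        if line.strip():
            name, length = line.split("\t")[:2]
            sizes[name] = int(length)  # int() tolerates the trailing newline
    return sizes


def check_intervals(rows: Iterable[Row], sizes: Dict[str, int]) -> List[str]:
    """Return a human-readable message for each row that does not fit chrom.sizes."""
    problems: List[str] = []
    for chrom, start, end, _value in rows:
        if chrom not in sizes:
            problems.append(f"{chrom}: not in chrom.sizes")
        elif end > sizes[chrom]:
            problems.append(f"{chrom}:{start}-{end} extends past {sizes[chrom]}")
    return problems
```

An empty list means every row names a known chromosome and stays within its length.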