bccp/nbodykit

support for indexing in newer `dask` release

qezlou opened this issue · 2 comments

Hi Yu,

Would it be a lot of work to update the Dask version support in nbodykit? The new version supports indexing, which is pretty useful for large data catalogs. I'd be happy to help if any help is needed!

Thanks,
Mahdi

This sounds good to me. If we need code changes, a pull request is highly welcome.

Let's first clarify the requirements and what needs to be fixed.
nbodykit does not pin a specific dask version, so I would expect the new dask features to be available automatically once a newer version of dask is installed. Did you mean that certain use cases in nbodykit are broken with a newer dask version? Is there a reproducer or code snippet?

Thanks Yu!

The issue I had with indexing a long time ago was that I could not pass boolean indices to a catalog. So, for example, to get the momentum along the line of sight I thought I had to do cat['Vel']*[0,0,1]. Maybe I was wrong and the nbodykit docs never suggested doing so. However, the tests below pass now, so I am happy with it :)

import h5py
import numpy as np
from nbodykit.lab import HDFCatalog

def make_fake_cat():
    # Write 100 random positions in a box of side 5 to an HDF5 file.
    boxsize = 5
    pos = np.random.random((100, 3)) * boxsize
    with h5py.File('test_cat.hdf5', 'w') as f:
        f['Subhalos/pos'] = pos

def indexing_cat():
    # Slice indexing on a catalog column.
    cat = HDFCatalog('test_cat.hdf5', dataset='Subhalos')
    print(cat['pos'][2:6])

def indexing_cat_use_greater():
    # Boolean-mask indexing: keep rows with x > 3 and y > 3.
    cat = HDFCatalog('test_cat.hdf5', dataset='Subhalos')
    ind1 = np.greater(cat['pos'][:, 1], 3)
    ind2 = np.greater(cat['pos'][:, 0], 3)
    # print(ind1.compute())
    print(cat['pos'][ind1 & ind2])

make_fake_cat()
indexing_cat()
indexing_cat_use_greater()
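
For anyone who wants to check the same behaviour without nbodykit or an HDF5 file, here is a minimal sketch on plain dask arrays. It assumes that catalog columns behave like ordinary `dask.array` objects; the arrays `pos` and `vel`, the chunk size, and the choice of z as the line of sight are made up for illustration and are not taken from nbodykit itself.

```python
import numpy as np
import dask.array as da

# Hypothetical stand-ins for catalog columns (not read from a real catalog).
rng = np.random.default_rng(0)
boxsize = 5
pos = da.from_array(rng.random((100, 3)) * boxsize, chunks=(25, 3))
vel = da.from_array(rng.normal(size=(100, 3)), chunks=(25, 3))

# Slice indexing, as in indexing_cat().
first = pos[2:6]

# Boolean-mask indexing, as in indexing_cat_use_greater().
# The number of selected rows is unknown until the result is computed.
mask = (pos[:, 0] > 3) & (pos[:, 1] > 3)
selected = pos[mask].compute()

# Line-of-sight component along z: multiplying by [0, 0, 1] and summing
# over the last axis gives the same values as indexing column 2 directly.
los = np.array([0, 0, 1])
v_los_by_product = (vel * los).sum(axis=1).compute()
v_los_by_index = vel[:, 2].compute()

print(first.compute().shape, selected.shape)
print(np.allclose(v_los_by_product, v_los_by_index))
```

Indexing the column directly avoids building the intermediate (N, 3) product, which is the kind of saving that matters for large catalogs.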