sfu-db/dataprep

Limit scope of compute calculations

tomgallagher opened this issue · 2 comments

This looks like an extremely useful open source library. Congratulations!

I should be able to work this out from the API docs, but the lack of examples means I'm working by trial and error.

Can one limit the scope of the compute function so that it provides only, say, the stats output, in order to reduce the time taken for the calculations?

More generally, do you have any advice on how to improve the performance of dataprep? I'm looking into dask clusters but any other tips would be appreciated...

Thanks

Hi @tomgallagher, you can use the display parameter to control what is shown and computed. For example, compute(df, display = ["Stats"]) computes only the stats. If you want to disable something instead, use the config: for example, compute(df, cfg = {"hist.enable": False}) disables all histograms. For more configurable parameters, please refer to https://github.com/sfu-db/dataprep/blob/develop/docs/source/user_guide/eda/parameter_configurations.ipynb
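
For anyone else looking for a concrete example, here is a minimal sketch of the two approaches above. It makes some assumptions: the import path dataprep.eda.distribution for compute is a guess and may differ between versions, the helper names (disable_plots, compute_stats_only, compute_without) are made up for illustration, and only the "hist.enable" key comes from this thread. Other "<plot>.enable" keys should be checked against the parameter configuration notebook before relying on them.

```python
from typing import Any, Dict, Iterable


def disable_plots(plot_names: Iterable[str]) -> Dict[str, Any]:
    """Build a config dict that switches off the named plot types.

    Follows the "<plot>.enable" key pattern from the reply ("hist.enable").
    Any name other than "hist" is an assumption, not confirmed here.
    """
    return {f"{name}.enable": False for name in plot_names}


def compute_stats_only(df):
    """Request only the stats output via the display parameter."""
    # Assumed import path; adjust to wherever `compute` lives in your
    # installed dataprep version. Imported lazily so the helper above
    # can be used without dataprep installed.
    from dataprep.eda.distribution import compute

    return compute(df, display=["Stats"])


def compute_without(df, plot_names: Iterable[str]):
    """Run compute with the named plot types disabled via the config."""
    from dataprep.eda.distribution import compute  # assumed import path

    return compute(df, cfg=disable_plots(plot_names))


# Building the config needs no dataprep at all:
cfg = disable_plots(["hist"])
print(cfg)  # {'hist.enable': False}
```

With a DataFrame df loaded, compute_stats_only(df) would correspond to the display = ["Stats"] call, and compute_without(df, ["hist"]) to the cfg = {"hist.enable": False} call.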