Limit scope of compute calculations
tomgallagher opened this issue · 2 comments
This looks like an extremely useful open source library. Congratulations!
I should be able to work this out from the API docs, but the lack of examples means I'm trying and erroring.
Can one limit the scope of the compute function to provide only, say, the stats output, in order to reduce the time taken for the calculations?
More generally, do you have any advice on how to improve the performance of dataprep? I'm looking into dask clusters, but any other tips would be appreciated...
Thanks
Hi @tomgallagher, you could try the display parameter to control what to show and compute, e.g., compute(df, display=["Stats"]) to compute only the stats. Or, if you want to disable something, you can use the cfg parameter, e.g., compute(df, cfg={"hist.enable": False}) to disable all histograms. For more configurable parameters, please refer to https://github.com/sfu-db/dataprep/blob/develop/docs/source/user_guide/eda/parameter_configurations.ipynb
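Building on the cfg approach in the reply, here is a minimal sketch that generates a cfg dict switching off every component except the ones you want to keep. It assumes that other components follow the same "<name>.enable" key pattern as "hist.enable". The helper build_disable_cfg and the component names in COMPONENTS are hypothetical, so check the names against the parameter configuration notebook linked above.

```python
from typing import Dict, Iterable, List


def build_disable_cfg(components: Iterable[str], keep: Iterable[str] = ()) -> Dict[str, bool]:
    """Return a cfg dict that sets "<component>.enable" to False for every
    component not listed in `keep`.

    Assumption: each component uses the "<name>.enable" key pattern seen
    with "hist.enable"; verify the names in the dataprep parameter docs.
    """
    keep_set = set(keep)
    # Preserve the input order so the resulting dict is predictable.
    return {f"{name}.enable": False for name in components if name not in keep_set}


# Hypothetical component names, for illustration only.
COMPONENTS: List[str] = ["hist", "bar", "pie", "wordcloud"]

cfg = build_disable_cfg(COMPONENTS, keep=["hist"])
print(cfg)  # {'bar.enable': False, 'pie.enable': False, 'wordcloud.enable': False}
```

The resulting dict could then be passed as cfg to compute, following the same pattern as compute(df, cfg={"hist.enable": False}) in the reply. For the stats-only case, display=["Stats"] is the simpler route. This helper is only useful when you want to switch off a list of components through cfg.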