yohanchatelain/pytracer

Parsing not skipping OneHotEncoder type


Here is the error log:

Exporting...: 305it [01:05,  4.68it/s]
Traceback (most recent call last):
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.8.2/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.8.2/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/project/6003600/niyonx/pytracer/pytracer/main.py", line 59, in <module>
    main()
  File "/project/6003600/niyonx/pytracer/pytracer/main.py", line 53, in main
    pytracer_module_main(args)
  File "/project/6003600/niyonx/pytracer/pytracer/main.py", line 30, in pytracer_module_main
    main(args)
  File "/project/6003600/niyonx/pytracer/pytracer/core/parser.py", line 584, in main
    export.export(stats_value)
  File "/project/6003600/niyonx/pytracer/pytracer/core/inout/exporter/_hdf5.py", line 263, in export
    self.export_arg(row=row,
  File "/project/6003600/niyonx/pytracer/pytracer/core/inout/exporter/_hdf5.py", line 160, in export_arg
    raw_mean = stats.mean()
  File "/project/6003600/niyonx/pytracer/pytracer/core/stats/numpy.py", line 97, in mean
    _mean = np.mean(self._data, axis=0, dtype=np.float64)
  File "<array_function internals>", line 5, in mean
  File "/home/niyonx/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 3419, in mean
    return _methods._mean(a, axis=axis, dtype=dtype,
  File "/home/niyonx/.local/lib/python3.8/site-packages/numpy/core/_methods.py", line 178, in _mean
    ret = umr_sum(arr, axis, dtype, out, keepdims, where=where)
TypeError: float() argument must be a string or a number, not 'OneHotEncoder'
Parsing...: 1529it [00:32, 47.20it/s]
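The failing call is the float64 mean in pytracer/core/stats/numpy.py. Below is a minimal reproduction outside pytracer, assuming the traced values reach np.mean as an object-dtype array. The class Opaque is a hypothetical stand-in for the OneHotEncoder instance, so scikit-learn is not needed:

```python
import numpy as np


class Opaque:
    """Hypothetical stand-in for a non-numeric traced value (e.g. an OneHotEncoder)."""


def reproduce():
    # Assumption: the traced argument values are objects, so NumPy builds an object array.
    data = np.asarray([Opaque(), Opaque()])
    try:
        # Same call as line 97 of pytracer/core/stats/numpy.py in the traceback.
        np.mean(data, axis=0, dtype=np.float64)
    except TypeError as exc:
        return str(exc)
    return None


message = reproduce()
print(message)
```

The exact wording of the TypeError depends on the Python and NumPy versions, but the cause is the same: NumPy cannot convert the object to a float.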

This happens during parsing for the tweedie and poisson tests.

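from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

# log_scale_transformer is defined elsewhere in the test script (not shown here).
column_trans = ColumnTransformer(
    [
        ("binned_numeric", KBinsDiscretizer(n_bins=10),
            ["VehAge", "DrivAge"]),
        ("onehot_categorical", OneHotEncoder(),
            ["VehBrand", "VehPower", "VehGas", "Region", "Area"]),
        ("passthrough_numeric", "passthrough",
            ["BonusMalus"]),
        ("log_scaled_numeric", log_scale_transformer,
            ["Density"]),
    ],
    remainder="drop",
)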

The snippet above is the code in these tests that uses OneHotEncoder().
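One way the parser or exporter could skip such values is to check that the data is numeric before computing statistics. This is a sketch under that assumption: safe_mean is a hypothetical helper, not part of pytracer, and returning None stands for "skip this argument":

```python
import numpy as np


def safe_mean(data):
    """Hypothetical helper: float64 mean over axis 0, or None if data is not numeric."""
    arr = np.asarray(data)
    if arr.dtype == object:
        try:
            # Object arrays may still hold plain Python numbers; try a float cast.
            arr = arr.astype(np.float64)
        except (TypeError, ValueError):
            # Estimators such as OneHotEncoder (or strings) cannot be averaged: skip.
            return None
    elif arr.dtype.kind not in "biuf":
        # Only bool, signed/unsigned integer and float arrays are averaged.
        return None
    return np.mean(arr, axis=0, dtype=np.float64)
```

Checking dtype.kind keeps numeric arrays on the normal path, while the float cast for object arrays still accepts numbers that were stored as Python objects.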