- Custom PySpark transformers and estimators (mode imputer for categorical features, vector disassembler, etc.)
- Impute categorical features with mode
- Disassemble categorical feature into multiple binary columns
- Disassemble vector feature into multiple numeric columns
- Impute NA with constant (string, number or dict)
- Impute categorical features with mode
- Combine with the Spark 2.3 `Imputer` into a savable pipeline
- StringDisassembler vs OneHotEncoderEstimator
- Try VectorDisassembler
- Try ConstantImputer
- Put all custom feature estimators together
./start-notebook.sh
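
To make the listed features concrete, here is a minimal sketch of what mode imputation, string disassembly, and vector disassembly do to the data. It is written in plain Python over lists of dicts so it runs without a Spark session; it is not this repository's PySpark implementation. The function names, the `<column>_<label>` and `<column>_<index>` output naming, and the choice to keep the original columns are all assumptions made for illustration.

```python
from collections import Counter


def fit_mode(rows, col):
    """'Fit' step: return the most frequent non-null value of `col`."""
    counts = Counter(r[col] for r in rows if r.get(col) is not None)
    if not counts:
        raise ValueError(f"column {col!r} has no non-null values")
    return counts.most_common(1)[0][0]


def impute(rows, col, value):
    """'Transform' step: replace nulls in `col` with `value`."""
    return [{**r, col: value if r.get(col) is None else r[col]} for r in rows]


def disassemble_string(rows, col, labels):
    """Expand a categorical column into one 0/1 column per label."""
    return [
        {**r, **{f"{col}_{lab}": int(r[col] == lab) for lab in labels}}
        for r in rows
    ]


def disassemble_vector(rows, col, size):
    """Expand a fixed-length vector column into `size` numeric columns."""
    return [
        {**r, **{f"{col}_{i}": r[col][i] for i in range(size)}}
        for r in rows
    ]


# Hypothetical toy data: one categorical column with a null, one vector column.
rows = [
    {"color": "red", "vec": [1.0, 2.0]},
    {"color": None, "vec": [3.0, 4.0]},
    {"color": "red", "vec": [5.0, 6.0]},
    {"color": "blue", "vec": [7.0, 8.0]},
]

mode = fit_mode(rows, "color")           # learned from the data
filled = impute(rows, "color", mode)     # nulls replaced by the mode
labels = sorted({r["color"] for r in filled})
wide = disassemble_vector(
    disassemble_string(filled, "color", labels), "vec", 2
)
```

The split between `fit_mode` and `impute` mirrors the estimator/transformer distinction: the mode has to be learned from the data first, whereas imputing a constant or disassembling a column needs no fitting once the value, labels, or vector size are known.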