[Enhancement] Separate data preprocessing from plotters
Previously proposed in #81 (comment): it might be good to separate data preprocessing from the plotters. The preprocessing utilities could be private, so users could still pass data in any format and the conversion would stay invisible to them. This could hopefully resolve #131 (comment) too.
Suggestions
Currently almost every plotter accepts various types of data, but at the cost of the plotters being very complex and containing repeated code. I would suggest making each plotter handle only a single (or very few) data type and migrating the following data processing to dedicated utilities:
- Data type conversion to numpy.array or pandas.DataFrame (or some other preferred type)
- Missing value imputation (could wrap scikit-learn)
- Anomalous value handling (NaN or infinity)
Potential Impact
I don't expect this to be breaking (or even visible to users), but it would certainly be a lot of work, as almost the entire code base needs to be refactored.
fully on board with this! as i wrote in #81 (comment):
i'd prefer dataframes over arrays as they have a more powerful API
they can also store more metadata (both in column/index names and in df.attrs) and do a lot of missing value handling automatically
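a small illustration of the two points above, as a sketch only: the column name, index name and the "units" key are made up for the example, and note that pandas documents df.attrs as experimental.

```python
import numpy as np
import pandas as pd

# Hypothetical data: labels live in the column and index names.
df = pd.DataFrame(
    {"height": [1.2, np.nan, 1.8]},
    index=pd.Index(["a", "b", "c"], name="sample"),
)

# Free-form metadata travels with the frame in df.attrs.
df.attrs["units"] = {"height": "m"}

# Reductions skip NaN by default (skipna=True), so no manual masking is needed.
mean_height = df["height"].mean()
n_missing = int(df["height"].isna().sum())
```

a plotter could then read axis labels from df.columns / df.index.name and units from df.attrs instead of taking them as separate arguments.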