Handling Missing Values and Outliers
Is this a unique feature?
- I have checked "open" AND "closed" issues and this is not a duplicate
Is your feature request related to a problem/unavailable functionality? Please describe.
Feature Request: Handling Missing Values and Outliers
Problem
Fine-tuning models is challenging when the training data contains missing values and outliers, which can degrade model performance and prediction accuracy.
Request
Provide better support for detecting and handling missing values and outliers during data preprocessing for fine-tuning.
Proposed Solution
Add automatic or customizable ways to handle missing values and outliers during preprocessing. This could include:
- Missing values: options for filling in missing data (e.g., with the column mean or median) or for letting users define a custom strategy.
- Outliers: tools to detect outliers (e.g., with the Z-score or IQR method) and options to either remove or adjust them.
- Ease of use: make these features simple to access during fine-tuning, so users don’t have to spend a lot of time on data cleaning.
This would improve model performance and save time for users during fine-tuning.
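As a rough illustration of the options listed above, here is a minimal sketch assuming tabular data in a pandas DataFrame with numeric feature columns. The function `clean_dataframe` and its parameters (`fill`, `outlier_method`, `outlier_action`, `threshold`) are hypothetical names chosen for this example, not an existing API. Missing values are filled first (mean, median, or a user-supplied callable), then outliers are detected per column with the Z-score or IQR rule and either clipped to the bounds or dropped.

```python
import numpy as np
import pandas as pd


def clean_dataframe(df, fill="mean", outlier_method="zscore",
                    outlier_action="clip", threshold=None):
    """Hypothetical helper: impute missing values, then clip or drop outliers.

    Only numeric columns are processed; other columns are left unchanged.
    """
    df = df.copy()
    numeric_cols = df.select_dtypes(include="number").columns

    # 1. Missing values: a built-in strategy or a user-supplied callable
    for col in numeric_cols:
        if callable(fill):
            value = fill(df[col])
        elif fill == "mean":
            value = df[col].mean()
        elif fill == "median":
            value = df[col].median()
        else:
            raise ValueError(f"unknown fill strategy: {fill!r}")
        df[col] = df[col].fillna(value)

    # 2. Outliers: per-column lower/upper bounds from the chosen method
    keep = pd.Series(True, index=df.index)
    for col in numeric_cols:
        s = df[col]
        if outlier_method == "zscore":
            k = 3.0 if threshold is None else threshold
            mean, std = s.mean(), s.std(ddof=0)
            lower, upper = mean - k * std, mean + k * std
        elif outlier_method == "iqr":
            k = 1.5 if threshold is None else threshold
            q1, q3 = s.quantile(0.25), s.quantile(0.75)
            lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
        else:
            raise ValueError(f"unknown outlier method: {outlier_method!r}")

        if outlier_action == "clip":
            df[col] = s.clip(lower, upper)  # adjust: cap values at the bounds
        elif outlier_action == "remove":
            keep &= s.between(lower, upper)  # remove: mark rows outside the bounds
        else:
            raise ValueError(f"unknown outlier action: {outlier_action!r}")

    return df[keep]


df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 3.0, 100.0]})
# The NaN becomes 2.5 (the median) and the row holding 100.0 is dropped
print(clean_dataframe(df, fill="median", outlier_method="iqr", outlier_action="remove"))
```

Filling before detecting outliers keeps NaNs out of the bound calculations; a custom strategy can be passed as a callable, e.g. `fill=lambda s: 0.0`.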
Screenshots
Do you want to work on this issue?
Yes
If "yes" to above, please explain how you would technically implement this (issue will not be assigned if this is skipped)
Implementation Plan
- Use pandas and scikit-learn to handle missing values, for example scikit-learn's SimpleImputer to fill in missing data:

    from sklearn.impute import SimpleImputer

    # Replace each missing value (NaN) with the mean of its column
    imputer = SimpleImputer(strategy='mean')
    data = imputer.fit_transform(data)
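- Detect outliers using the Z-score or IQR method, e.g. the Z-score with SciPy:

    import numpy as np
    from scipy import stats

    # Run after imputation: NaNs propagate through zscore by default
    z_scores = stats.zscore(data)
    # Flag values more than 3 standard deviations from their column mean, in either direction
    outliers = np.abs(z_scores) > 3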
- Integrate these into a preprocessing pipeline with scikit-learn's Pipeline for easy fine-tuning.
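The plan mentions IQR but only shows the Z-score, so the sketch below shows one way to combine imputation and IQR-based outlier handling in a scikit-learn Pipeline. `IQRClipper` is a hypothetical custom transformer written for this example, not part of scikit-learn. It learns the IQR bounds in `fit` and clips (caps) values in `transform` instead of removing rows, because a transformer step in a Pipeline cannot drop the matching rows of `y` along with `X`. Learning the bounds in `fit` also means new data at inference time is capped with the bounds learned from the training data.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class IQRClipper(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: learns per-column IQR bounds on fit, clips to them on transform."""

    def __init__(self, k=1.5):
        self.k = k

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        q1, q3 = np.percentile(X, [25, 75], axis=0)
        iqr = q3 - q1
        self.lower_ = q1 - self.k * iqr
        self.upper_ = q3 + self.k * iqr
        return self

    def transform(self, X):
        # Cap values instead of dropping rows, so the number of samples stays the same
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)


preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill NaNs first
    ("clip_outliers", IQRClipper(k=1.5)),          # then cap outliers
    ("scale", StandardScaler()),
])

X_train = np.array([[1.0], [2.0], [np.nan], [3.0], [100.0]])
X_train_clean = preprocess.fit_transform(X_train)
# 1000.0 is capped at the upper bound learned from X_train before scaling
X_new_clean = preprocess.transform(np.array([[1000.0]]))
```

Because every step follows the fit/transform interface, the whole preprocessing chain can be fitted once on the training split and reused unchanged on validation and test data.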
Ensure the issue is not similar to one that is already being worked on. Thanks for your time.
@rohitinu6
Kindly assign me this task, as I am familiar with handling it.
Kindly add the gssoc-ext, hacktoberfest, and hacktoberfest-accepted labels to this issue. @rohitinu6