A machine learning (ML) pipeline chains several steps that turn raw data into a trained model:
- Data collection
- Data cleaning
- Feature extraction (labelling and dimensionality reduction)
- Model training
- Model validation
- Visualisation
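The steps above can be modelled as a chain of functions, where each step consumes the previous step's output. The following is a minimal, hypothetical sketch in plain Python. The toy data and the step functions `collect`, `clean`, `extract_features` and `train` are illustrative stand-ins and are not part of any library. Training is reduced to a single-feature least-squares fit.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A step is any function that takes the previous step's output
# and returns the input for the next step.
Step = Callable[[Any], Any]
Row = Tuple[float, Optional[float]]


def run_pipeline(steps: List[Step], data: Any = None) -> Any:
    """Apply each step in order, feeding each result into the next step."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical toy steps, for illustration only.
def collect(_: Any) -> List[Row]:
    # Stand-in for reading from files, databases or APIs.
    return [(1.0, 2.1), (2.0, None), (3.0, 6.2), (4.0, 7.9)]


def clean(rows: List[Row]) -> List[Row]:
    # Drop records whose target value is missing.
    return [(x, y) for x, y in rows if y is not None]


def extract_features(rows: List[Row]) -> Tuple[List[float], List[float]]:
    # Split records into a single input feature and the target.
    return [x for x, _ in rows], [y for _, y in rows if y is not None]


def train(data: Tuple[List[float], List[float]]) -> Dict[str, float]:
    # Least-squares fit of y = slope * x + intercept (one feature only).
    xs, ys = data
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


model = run_pipeline([collect, clean, extract_features, train])
```

Model validation and visualisation would be appended to the list as further steps. scikit-learn's `Pipeline` class is built on the same idea of chaining transformers and a final estimator.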
Data collection and cleaning come first in any machine learning pipeline, and every later step depends on the quality of their output.
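The following is a minimal sketch of a cleaning step, assuming the raw data arrives as CSV text. It drops exact duplicate rows and treats a small, assumed set of strings (`""`, `"NA"`, `"NaN"`, `"null"`) as missing. It then fills gaps in each numeric column with that column's median. Median imputation is only one common choice among several, and `clean_rows` and `parse_number` are hypothetical helper names.

```python
import csv
import io
import statistics
from typing import Dict, List, Optional

# Strings treated as "missing"; this set is an assumption, adjust it per data source.
MISSING = {"", "NA", "NaN", "null"}


def parse_number(value: str) -> Optional[float]:
    """Convert a raw string to float, or None if it is missing or not numeric."""
    if value.strip() in MISSING:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def clean_rows(raw_csv: str, numeric_cols: List[str]) -> List[Dict[str, Optional[float]]]:
    """Deduplicate rows, parse numeric columns and fill gaps with the column median."""
    seen = set()
    rows: List[Dict[str, Optional[float]]] = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        key = tuple(record.items())
        if key in seen:  # skip exact duplicate rows
            continue
        seen.add(key)
        rows.append({col: parse_number(record[col]) for col in numeric_cols})

    for col in numeric_cols:
        present = [r[col] for r in rows if r[col] is not None]
        if not present:
            continue  # nothing to impute from
        fill = statistics.median(present)
        for r in rows:
            if r[col] is None:
                r[col] = fill
    return rows
```

For example, `clean_rows("a,b\n1,10\n2,NA\n3,30\n3,30\n", ["a", "b"])` drops the duplicated last row. It also fills the missing `b` value with 20.0, the median of 10 and 30.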
House Prices: Advanced Regression Techniques
- train.csv - the training set
- test.csv - the test set
- data_description.txt - full description of each column, originally prepared by Dean De Cock but lightly edited to match the column names used here
- sample_submission.csv - a benchmark submission from a linear regression on year and month of sale, lot square footage, and number of bedrooms
Data fields
Here's a brief version of what you'll find in the data description file.
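The benchmark in sample_submission.csv is described only as a linear regression on year and month of sale, lot square footage, and number of bedrooms. The following is a dependency-free sketch of that kind of model, not the benchmark's actual code. It assumes the columns are named `Id`, `YrSold`, `MoSold`, `LotArea`, `BedroomAbvGr` and `SalePrice`, and that the submission uses an `Id,SalePrice` header. Verify both assumptions against data_description.txt and sample_submission.csv. The model is fitted with ordinary least squares through the normal equations.

```python
import csv
from typing import Iterable, List, Sequence, TextIO, Tuple

# Assumed column names for the four benchmark features and the target.
FEATURES = ["YrSold", "MoSold", "LotArea", "BedroomAbvGr"]
TARGET = "SalePrice"


def read_rows(lines: Iterable[str], with_target: bool = True) -> Tuple[List[str], List[List[float]], List[float]]:
    """Parse CSV lines into ids, feature vectors and (optionally) targets."""
    ids: List[str] = []
    xs: List[List[float]] = []
    ys: List[float] = []
    for rec in csv.DictReader(lines):
        ids.append(rec["Id"])
        # Assumes these columns contain no missing values; clean them first otherwise.
        xs.append([float(rec[name]) for name in FEATURES])
        if with_target:
            ys.append(float(rec[TARGET]))
    return ids, xs, ys


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(xs: Sequence[Sequence[float]], ys: Sequence[float]) -> List[float]:
    """Least squares via the normal equations; weights[0] is the intercept."""
    design = [[1.0, *x] for x in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Intercept plus the weighted sum of the features."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


def write_submission(out: TextIO, ids: Sequence[str], preds: Sequence[float]) -> None:
    """Write predictions using the assumed Id,SalePrice header."""
    writer = csv.writer(out)
    writer.writerow(["Id", TARGET])
    for row_id, pred in zip(ids, preds):
        writer.writerow([row_id, pred])


# Example use with the files listed above ("submission.csv" is a hypothetical name):
# with open("train.csv", newline="") as f:
#     _, x_train, y_train = read_rows(f)
# weights = fit(x_train, y_train)
# with open("test.csv", newline="") as f:
#     test_ids, x_test, _ = read_rows(f, with_target=False)
# with open("submission.csv", "w", newline="") as f:
#     write_submission(f, test_ids, [predict(weights, x) for x in x_test])
```

Solving the normal equations directly is simple, but it can be numerically fragile when features sit on very different scales. Here, years are in the thousands and lot areas in the tens of thousands. `numpy.linalg.lstsq` or scikit-learn's `LinearRegression` are sturdier choices in practice.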