/Data-Preprocessing-Template

A simple template to process your data in an optimal form for you to use in machine learning models

Primary LanguagePython

Data-Preprocessing-Template

A simple template to process your data in an optimal form for you to use in machine learning/regression models

Libraries Required

For this template you will need to install the following libraries using pip install libraryName

  • Pandas
  • Numpy
  • sklearn In addition the csv file should have the dependant varaible at the last column

Funtion Description

The code contains one main function namely

dataPreprocess(dataset,test_size,random_state,feature_scaling)

Return Values

This function returns four outputs:

  • X_train: The test set of dependant values needed to train your model which is plotted on the X-axis.
  • y_train: The test set of independant values needed to train your model which is plotted on the Y-axis.
  • X_test: The test set of dependant values for which you shall test your model against.
  • y_test: The test set of dependant values for which you shall test your model against.

Parameters

  • dataset: This is the name of the csv file containing the dataset. Do not forget to add .csv with your file name.
  • test_size: This is to decide the ratio of test data to training data you want to divide your data into. Its default value is 0.8(80%).
  • random_state: This parameter determines how randomly the data is divided. Its default value is 0.
  • feature_scaling: Set to True if you want to standardize your data range of the dependant and independant varaibles.

Citation

Code idea inspired from Machine Learning A-Z™: Hands-On Python & R In Data Science.

Contact

For more details contact me at moonisrasheed18@gmail.com