Most of the work was data preprocessing, filtering, and cleaning.
I tried 3 different regression algorithms (Linear Regression, RandomForest, and XGBoost)
and compared them by score and Mean Squared Error.
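
As a rough sketch of how such a comparison could be set up, the snippet below fits each regressor on the same split and collects a score and the mean squared error. It is not the project's actual code: the data is synthetic, the hyperparameters are arbitrary, and "score" is assumed to mean the R² value returned by scikit-learn's `score()` method. XGBoost lives in the separate `xgboost` package, so it is only added when that package is installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical); the project used the Kaggle dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)

# 70% train / 30% test, as described in this README.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Hyperparameters here are illustrative assumptions, not the project's settings.
models = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# XGBoost is optional: include it only if the package can be imported.
try:
    from xgboost import XGBRegressor

    models["XGBoost"] = XGBRegressor(n_estimators=100, random_state=0)
except ImportError:
    pass

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    results[name] = {
        "r2": model.score(X_test, y_test),  # R^2 on the held-out 30%
        "mse": mean_squared_error(y_test, predictions),
    }

for name, metrics in results.items():
    print(f"{name}: R^2={metrics['r2']:.3f}, MSE={metrics['mse']:.3f}")
```

Evaluating every model on the same held-out split keeps the numbers comparable across algorithms.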
- Python: Version 3.10
- Scikit-Learn: Version 1.1.2
- Pandas: Version 1.4.3
- Feature engineering was the best thing I learned in this project: I dropped the NaN columns because they were misleading for the regression algorithms, then replaced the remaining NaN values with each column's mean.
- I split the dataset into 70% train and 30% test.
- I tried 3 different regression algorithms and compared their results.
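
The cleaning and splitting steps above might look roughly like the following in pandas and scikit-learn. This is a minimal sketch under assumptions: the column names and values are made up, "NaN columns" is read as columns with no values at all, and `random_state=42` is an arbitrary choice for reproducibility.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with hypothetical column names, standing in for the competition data.
df = pd.DataFrame({
    "area": [50.0, np.nan, 70.0, 65.0, 80.0, 55.0, np.nan, 90.0, 60.0, 75.0],
    "rooms": [2, 3, 3, np.nan, 4, 2, 3, 5, 2, 3],
    "unused": [np.nan] * 10,  # a column that is entirely NaN
    "price": [100.0, 150.0, 160.0, 140.0, 200.0, 110.0, 155.0, 230.0, 120.0, 170.0],
})

# Drop columns that contain no values at all (assumed meaning of "NaN columns").
df = df.dropna(axis=1, how="all")

# Replace the remaining NaN values with the mean of their column.
df = df.fillna(df.mean(numeric_only=True))

# Separate the features from the (hypothetical) target column.
features = df.drop(columns=["price"])
target = df["price"]

# 70% train / 30% test.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
```

A common variation is to split first and compute the column means on the training portion only, so that no information from the test rows leaks into the imputed values.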
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Do not forget to give the project a star! Thanks again!
Distributed under the MIT License. See LICENSE.txt for more information.
- The Kaggle competition
- Via Email: Mahmoud.Nady@Ejust.edu.eg
- Via LinkedIn: https://www.linkedin.com/in/abonady/