Linear Regression Health Costs Calculator (Certification Project)
beaucarnes opened this issue · 10 comments
Create project from the Python machine learning certification.
Hello - I just tried my hand at this challenge and found a couple of things that I am not sure about:
In data science, one usually wants the simplest possible model (linear regression rather than a neural net) so that the influence of individual parameters is easier to understand. Several implementations come to mind before Keras/TF for such a model, especially sklearn.
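To illustrate the interpretability point, here is a minimal sketch using sklearn.linear_model.LinearRegression. The data is synthetic and the feature names (age, bmi, smoker) and effect sizes are assumptions made up for the example; they are not taken from the challenge dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, made-up data: feature names and effect sizes are assumptions.
rng = np.random.default_rng(0)
n = 500
age = rng.integers(18, 65, size=n)
bmi = rng.normal(30, 5, size=n)
smoker = rng.integers(0, 2, size=n)

# Hypothetical linear relationship plus Gaussian noise.
expenses = 250 * age + 300 * bmi + 20000 * smoker + rng.normal(0, 1000, size=n)

X = np.column_stack([age, bmi, smoker])
model = LinearRegression().fit(X, expenses)

# Each coefficient reads directly as "change in expenses per unit of the feature".
coefficients = dict(zip(["age", "bmi", "smoker"], model.coef_))
for name, coef in coefficients.items():
    print(f"{name}: {coef:.1f}")
```

With a linear model, the fitted coefficients can be read off directly, which is much harder to do with the weights of a neural net.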
The next thing is that 'monetary values' are often predicted on a log10/log scale, since hardly anyone cares about the cents; what matters is a correct ballpark number.
- the closer the target distribution is to a bell curve, the 'easier' it is for the model to predict
If one does it this way, the predictions from sklearn.linear_model.LinearRegression, Ridge, Lasso, or XGBRegressor look something like this:
and the metrics are similar to these:
Variance-score (R^2): 0.8546
Mean squared error: 0.0247
Root mean squared error: 0.1570
log root mean squared error: 0.0140
mean absolute error: 0.0856
Accuracy: 92.16%
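For reference, a minimal sketch of how a log10-transformed target can be fitted and scored with sklearn. The data here is synthetic (the features, coefficients, and noise level are assumptions), so the numbers it prints will not match the metrics above. The "log root mean squared error" and "Accuracy" figures are not reproduced because their exact definitions are not stated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic, made-up data: costs with multiplicative noise are right-skewed.
rng = np.random.default_rng(42)
n = 1000
age = rng.integers(18, 65, size=n)
smoker = rng.integers(0, 2, size=n)
log_cost = 3.2 + 0.01 * age + 0.6 * smoker + rng.normal(0, 0.15, size=n)
cost = 10 ** log_cost

X = np.column_stack([age, smoker])
y_log = np.log10(cost)  # train on the log10 scale, not on raw currency values

X_train, X_test, y_train, y_test = train_test_split(
    X, y_log, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
pred_log = model.predict(X_test)

# All metrics below are measured on the log10 scale.
r2 = r2_score(y_test, pred_log)
mse = mean_squared_error(y_test, pred_log)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, pred_log)
print(f"R^2: {r2:.4f}  MSE: {mse:.4f}  RMSE: {rmse:.4f}  MAE: {mae:.4f}")

# Undo the transform to get predictions back in currency units.
pred_cost = 10 ** pred_log
```

Because the model is trained on log10 values, an error of 0.1 corresponds to being off by a factor of about 1.26 in currency terms, which matches the "ballpark" goal better than an absolute error in cents.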
@lefthand3r, thank you for checking out this project and your thorough explanation. Those diagrams are really helpful! These are all things we definitely want to consider before we release this project.
@lefthand3r Thanks for reviewing this and giving your input. Would you be interested in helping to redo this challenge to address the issues you brought up?
@beaucarnes / @scissorsneedfoodtoo Thank you very much for the feedback! It feels awesome to contribute to open source when there are welcoming people like the two of you!
I would love to contribute to the challenge and will also take a look at the other machine learning challenges later this week.
@beaucarnes please tell me how to help.
@lefthand3r Can you update the instructions and solution to incorporate your suggestions, including the use of sklearn?
It will take me a couple of days, but I will suggest something.
@lefthand3r, awesome, looking forward to your suggestions.
Please find my first draft of the challenge here:
https://colab.research.google.com/drive/1W_7_ztx8ahU_8MwUKq1_9Pbjobke-Tf7
@lefthand3r, thank you for your patience and for all of your hard work on this draft! I know very little about data science and machine learning, so I won't be able to make very helpful suggestions with this project. But I read through your descriptions, ran all the cells, and everything LGTM as far as I can tell.
Could you take a look at this @beaucarnes?
@maikroservice Thanks for putting this together. I really like what you've done. I'm trying to think of the best way to turn this into a project for people to complete while guiding them in the correct direction. Are you available to get on a call to discuss this? I think you would have some good insights.