freeCodeCamp/CurriculumExpansion

Linear Regression Health Costs Calculator (Certification Project)

beaucarnes opened this issue · 10 comments

Create project from the Python machine learning certification.

Hello - I just tried my hand at this challenge and found a couple of things that I am not sure about:

In data science, one usually wants the simplest possible model, so that the influence of individual parameters is easier to understand (linear regression over a neural net). Several implementations come to mind before keras/tf for such a model, especially sklearn.

The next thing is that 'monetary values' are often predicted on a log10/log scale, since hardly anyone cares about the cents; what matters is a correct ballpark number, e.g.
image - the closer the target distribution is to a bell curve, the 'easier' it is for the model to predict
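
To illustrate the point about the bell curve, here is a minimal sketch using only the standard library. The `charges` values are synthetic (drawn from a log-normal distribution as a stand-in for cost data, not taken from the project's dataset), and `skewness` is a hypothetical helper written for this example. It only shows that taking log10 of a right-skewed target pulls it toward a symmetric shape:

```python
import math
import random


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n


random.seed(0)

# Assumption: synthetic, log-normally distributed "charges" (right-skewed).
charges = [random.lognormvariate(9.0, 0.9) for _ in range(5000)]

# The log10 transform discussed above.
log_charges = [math.log10(c) for c in charges]

raw_skew = skewness(charges)
log_skew = skewness(log_charges)

print(f"skew of raw charges:   {raw_skew:.2f}")
print(f"skew of log10 charges: {log_skew:.2f}")
```

A skewness near zero means the distribution is roughly symmetric, which is what the transformed target should show here.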

If one does it this way, then the predictions from sklearn.linear_model.LinearRegression, Ridge, Lasso, or XGBRegressor look something like this:
image

and the metrics are similar to these:

Variance-score (R^2): 0.8546
Mean squared error: 0.0247
Root mean squared error: 0.1570
log root mean squared error: 0.0140
mean absolute error: 0.0856
Accuracy: 92.16%
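
For anyone wanting to try this, the approach could be sketched roughly as follows. This is not the notebook from the thread: the data is synthetic, the feature names (`age`, `bmi`, `smoker`) and coefficients are assumptions made up for the example, and the numbers it prints will not match the metrics above. It fits `LinearRegression` on a log10 target and computes R^2, MSE, RMSE, and MAE on the log scale:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Assumption: hypothetical synthetic features, not the real dataset.
age = rng.integers(18, 65, size=n)
bmi = rng.normal(30, 6, size=n)
smoker = rng.integers(0, 2, size=n)
X = np.column_stack([age, bmi, smoker])

# Assumption: charges are built so that log10(charges) is linear in the
# features plus Gaussian noise. The coefficients are invented.
noise = rng.normal(0, 0.15, size=n)
charges = 10 ** (3.0 + 0.015 * age + 0.005 * bmi + 0.6 * smoker + noise)

# Train on the log10 of the target rather than the raw monetary value.
y = np.log10(charges)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Metrics on the log10 scale.
r2 = r2_score(y_test, pred)
mse = mean_squared_error(y_test, pred)
rmse = float(np.sqrt(mse))
mae = mean_absolute_error(y_test, pred)

print(f"Variance-score (R^2): {r2:.4f}")
print(f"Mean squared error: {mse:.4f}")
print(f"Root mean squared error: {rmse:.4f}")
print(f"Mean absolute error: {mae:.4f}")

# Undo the transform to get predictions back in currency units.
pred_charges = 10 ** pred
```

Swapping `LinearRegression` for `Ridge` or `Lasso` from the same module only changes the line that builds the model.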

@lefthand3r, thank you for checking out this project and your thorough explanation. Those diagrams are really helpful! These are all things we definitely want to consider before we release this project.

@lefthand3r Thanks for reviewing this and giving your input. Would you be interested in helping to redo this challenge to address the issues you brought up?

@beaucarnes / @scissorsneedfoodtoo Thank you very much for the feedback! It feels awesome to contribute to open source when there are welcoming people like the two of you!

I would love to contribute to the challenge and will also take a look at the other machine learning challenges later this week.

@beaucarnes please tell me how to help.

@lefthand3r Can you update the instructions and solution to include your suggestions, including sklearn?

It will take me a couple of days, but I will suggest something.

@lefthand3r, awesome, looking forward to your suggestions.

@lefthand3r, thank you for your patience and for all of your hard work on this draft! I know very little about data science and machine learning, so I won't be able to make very helpful suggestions with this project. But I read through your descriptions, ran all the cells, and everything LGTM as far as I can tell.

Could you take a look at this @beaucarnes?

@maikroservice Thanks for putting this together. I really like what you've done. I'm trying to think of the best way to turn this into a project for people to complete while guiding them in the correct direction. Are you available to get on a call to discuss this? I think you would have some good insights.