This is a project where I used data from ENEM (Exame Nacional do Ensino Médio), Brazil's national high school exam, which works like an entrance exam for college. I explain everything in English, but the column names are in Brazilian Portuguese. The project is split into 5 topics, where I discuss each step with you and how it was done. To follow along, just open the Jupyter notebook at this link (https://colab.research.google.com/github/KazumaShachou/Enem2019_Data/blob/main/enem2019.ipynb), save a copy of it to your own repository, and enjoy!
# Topics
Here we will learn how to select the columns we want to study and how to use them. We will use the Python library Pandas to filter the data we want and to make some graphs that give us a statistical view of what we are studying: in this case, the oldest and the youngest people who took the exam, their grades, and whether or not they took the exam only for practice (the so-called "treineiros").
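A minimal sketch of this kind of filtering with Pandas. The rows below are made up, and the column names (`NU_IDADE` for age, `IN_TREINEIRO` for the practice flag, `NU_NOTA_MT` for the math grade) are assumed to follow INEP's microdata dictionary; in the notebook the real data would be loaded from the microdata file instead.

```python
import pandas as pd

# Made-up stand-in for the ENEM microdata (NOT real data).
# Column names are assumed to follow INEP's microdata naming.
df = pd.DataFrame({
    "NU_IDADE": [17, 18, 45, 16, 70, 22],
    "IN_TREINEIRO": [0, 0, 0, 1, 0, 1],  # 1 = took the exam only for practice
    "NU_NOTA_MT": [520.3, 610.8, 480.1, 455.0, 390.4, 700.2],
})

# Keep only the columns we want to study
study = df[["NU_IDADE", "IN_TREINEIRO", "NU_NOTA_MT"]]

# Rows of the oldest and the youngest candidates
oldest = study.loc[study["NU_IDADE"].idxmax()]
youngest = study.loc[study["NU_IDADE"].idxmin()]
print("Oldest:", oldest["NU_IDADE"], "| Youngest:", youngest["NU_IDADE"])

# Average math grade for regular candidates (0) vs. practice takers (1)
mean_by_trainee = study.groupby("IN_TREINEIRO")["NU_NOTA_MT"].mean()
print(mean_by_trainee)
```

For the graphs, the same `study` columns can be passed to Matplotlib or Seaborn, for example a histogram of `NU_IDADE`.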
Here we will go deeper into who took the exam: for example, which state they are from and whether they are poor or not (based on family income). We will see some results that reveal the social inequality in Brazil just by looking at this exam.
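A small sketch of grouping candidates by state and by income. The rows are made up, and the column names are assumptions based on INEP's microdata: `SG_UF_RESIDENCIA` (state of residence) and `Q006` (family income bracket from the socioeconomic questionnaire, where "A" is assumed to be the lowest bracket and later letters higher ones).

```python
import pandas as pd

# Made-up rows (NOT real data); column names assumed from INEP's dictionary.
df = pd.DataFrame({
    "SG_UF_RESIDENCIA": ["SP", "SP", "BA", "BA", "MG", "AM"],
    "Q006": ["A", "D", "A", "B", "Q", "C"],  # income bracket letter
    "NU_NOTA_MT": [450.0, 600.0, 410.0, 470.0, 780.0, 500.0],
})

# How many candidates came from each state
by_state = df["SG_UF_RESIDENCIA"].value_counts()

# Median math grade per income bracket, ordered from lowest to highest letter
by_income = df.groupby("Q006")["NU_NOTA_MT"].median().sort_index()

print(by_state)
print(by_income)
```

To plot the same comparison, a Seaborn box plot such as `sns.boxplot(x="Q006", y="NU_NOTA_MT", data=df, order=sorted(df["Q006"].unique()))` shows the spread of grades per income bracket rather than a single number.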
Now that we know how to analyze social inequality, we will try to find out why so many people in Brazil got low grades and how internet access may influence the results of the exam: for example, whether people who didn't have internet at home got lower grades than people who did. Finally, we will compare the grades of all the subjects with each other. This shows us, for example, whether a person who got a good grade in math also tends to get a good grade in the essay.
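One way to sketch both comparisons with Pandas. The rows are made up, and the column names are assumptions: `Q025` is taken to be the questionnaire item for internet at home ("A" = no, "B" = yes), and the `NU_NOTA_*` columns are the subject grades. The correlation here is Pearson's, which is the default of `DataFrame.corr`.

```python
import pandas as pd

# Made-up rows (NOT real data); column names assumed from INEP's dictionary.
df = pd.DataFrame({
    "Q025": ["A", "B", "B", "A", "B", "B"],  # internet at home: A = no, B = yes
    "NU_NOTA_MT": [420.0, 610.0, 700.0, 450.0, 560.0, 650.0],
    "NU_NOTA_REDACAO": [400.0, 720.0, 880.0, 520.0, 600.0, 760.0],
    "NU_NOTA_LC": [480.0, 590.0, 640.0, 500.0, 550.0, 600.0],
})

# Mean grades for candidates without (A) and with (B) internet at home
internet_means = df.groupby("Q025")[["NU_NOTA_MT", "NU_NOTA_REDACAO"]].mean()

# Pairwise Pearson correlation between the subject grades
grades = ["NU_NOTA_MT", "NU_NOTA_REDACAO", "NU_NOTA_LC"]
correlation = df[grades].corr()

print(internet_means)
print(correlation.round(2))
```

A correlation close to 1 between math and essay would suggest that people who do well in one tend to do well in the other; it does not say that one causes the other. `sns.heatmap(correlation, annot=True)` is a common way to visualize the matrix.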
Now that we have all the data we wanted, we will apply machine learning models and train them to try to predict the math grade using the other subjects as a reference. We want to automate this prediction and compare the predicted math grades with the true ones.
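A minimal sketch of this prediction step with scikit-learn, under clearly stated assumptions: the data is synthetic (generated with NumPy, not taken from ENEM), the model is a plain `LinearRegression`, and a `DummyRegressor` that always predicts the mean serves as a baseline. The notebook's actual models may differ; this only shows the train/test/compare pattern.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

SEED = 42
rng = np.random.default_rng(SEED)

# Synthetic stand-in data (NOT real ENEM grades): other subjects' grades
# and a math grade built from them plus noise.
n = 500
df = pd.DataFrame({
    "NU_NOTA_CN": rng.normal(480, 70, n),
    "NU_NOTA_CH": rng.normal(510, 80, n),
    "NU_NOTA_LC": rng.normal(520, 60, n),
    "NU_NOTA_REDACAO": rng.normal(580, 180, n),
})
df["NU_NOTA_MT"] = (
    0.5 * df["NU_NOTA_CN"] + 0.3 * df["NU_NOTA_CH"]
    + 0.2 * df["NU_NOTA_LC"] + 0.05 * df["NU_NOTA_REDACAO"]
    + rng.normal(0, 40, n)
)

# Features: the other subjects; target: the math grade
X = df[["NU_NOTA_CN", "NU_NOTA_CH", "NU_NOTA_LC", "NU_NOTA_REDACAO"]]
y = df["NU_NOTA_MT"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=SEED
)

# Baseline that always predicts the mean math grade of the training set
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)

# Compare predictions against the true grades of the held-out candidates
mse_baseline = mean_squared_error(y_test, baseline.predict(X_test))
mse_model = mean_squared_error(y_test, model.predict(X_test))
print(f"MSE baseline: {mse_baseline:.1f} | MSE linear model: {mse_model:.1f}")
```

The baseline matters: a model is only useful if its error on the test set is clearly lower than that of simply guessing the average grade.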
Finally, we will use more sophisticated models, compare them with the model from the previous topic, and improve the results. We will use a better model called a decision tree, rely less on randomness, and improve how we test and train this model. We will also understand what overfitting is and why it happens.
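A sketch of how overfitting shows up with a decision tree, on synthetic data (not ENEM grades). Reading "rely less on randomness" as fixing the random seed and reusing the same cross-validation folds is an assumption about what the notebook does. As `max_depth` grows, the training error drops toward zero while the test error stops improving: the tree starts memorizing noise instead of learning the pattern.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_validate
from sklearn.tree import DecisionTreeRegressor

SEED = 42
rng = np.random.default_rng(SEED)

# Synthetic data (NOT real ENEM grades): 4 "other subject" grades and a
# noisy "math grade" that depends linearly on them.
n = 500
X = rng.normal(500, 80, size=(n, 4))
y = X @ np.array([0.5, 0.3, 0.2, 0.05]) + rng.normal(0, 40, n)

# Same shuffled folds (fixed seed) for every depth, so the comparison is fair
cv = KFold(n_splits=5, shuffle=True, random_state=SEED)

results = {}
for depth in [2, 4, 8, None]:  # None = grow until every leaf is pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=SEED)
    scores = cross_validate(
        tree, X, y, cv=cv,
        scoring="neg_mean_squared_error",
        return_train_score=True,
    )
    # scikit-learn reports negated MSE, so flip the sign back
    train_mse = -scores["train_score"].mean()
    test_mse = -scores["test_score"].mean()
    results[depth] = (train_mse, test_mse)
    print(f"max_depth={depth}: train MSE {train_mse:.0f}, test MSE {test_mse:.0f}")
```

A large gap between train and test MSE is the signature of overfitting; picking the depth with the lowest test MSE is a simple way to control it.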
# Requirements
- Jupyter Notebook or Google Colab (I used Colab)
- A Google account (if you are going to use Colab)
- Basic knowledge of Python
# Libraries
Pandas, Matplotlib, scikit-learn (sklearn), NumPy, Seaborn
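# References
- INEP microdata (ENEM data source): http://inep.gov.br/microdados
- Alura Imersão Dados: https://www.alura.com.br/imersao-dados/
- Pandas: https://pandas.pydata.org
- scikit-learn: https://scikit-learn.org
- Matplotlib: https://matplotlib.org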