This portfolio is a collection of work I completed independently and for classes to show my skills and abilities in Data Science / Analytics. During college, I gained experience working in team settings and creating data-driven solutions. There, I learned how to design visualizations for impact, create machine learning models, and teach myself the skills needed to present actionable results to an audience. I hope you enjoy some of my insights.
Send me an email at: swright22@wooster.edu
DM me on Twitter: https://twitter.com/datagirlz19
Message me on LinkedIn: www.linkedin.com/in/swright22
Check out my blog: www.datagirlz19.github.io
Photo by Fredy Martinez on Unsplash
In 2020, the US census recorded information on over 48,000 individuals, which was consolidated into the dataset provided. We were tasked with creating a model to predict whether someone has an annual income of over $50,000.
To do this, we performed basic data manipulation on the variables in the dataset to analyze the provided information and gain some insight. The dataset mostly included categorical variables, so we looked at the unique categories of each one. We found that several categories overlapped, so we merged them to reduce the number of categories and, with it, potential complications.
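The category merging described above can be sketched roughly as follows. The project itself was done in R, so this is only a minimal Python illustration of the idea. The category labels and the grouping in `EDUCATION_GROUPS` are hypothetical and are not taken from the actual dataset or recoding.

```python
from collections import Counter

# Hypothetical mapping from fine-grained categories to broader groups.
# These labels are illustrative assumptions, not the project's actual recoding.
EDUCATION_GROUPS = {
    "Preschool": "No diploma",
    "1st-4th": "No diploma",
    "5th-6th": "No diploma",
    "HS-grad": "High school",
    "Some-college": "Some college",
    "Assoc-voc": "Some college",
    "Bachelors": "Bachelors",
}


def collapse_categories(values, groups, default="Other"):
    """Map each raw category to its broader group; unmapped values go to `default`."""
    return [groups.get(value, default) for value in values]


raw = ["Preschool", "HS-grad", "5th-6th", "Bachelors", "Assoc-voc", "Doctorate"]
collapsed = collapse_categories(raw, EDUCATION_GROUPS)

# Compare the number of unique categories before and after merging.
print(len(set(raw)), "->", len(set(collapsed)))
print(Counter(collapsed))
```

Sending unmapped values to a catch-all group keeps rare categories from becoming their own levels, which is one common way to reduce complications in a model with many categorical predictors.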
The purpose of this project is to predict whether someone has an annual income of over $50,000.
The annual census is taken to help governments provide resources to families in need. However, there are many situations in which this data will not be accurate or complete, such as when leadership changes the definition of a low-income family or when families do not fill out the census accurately or completely. In cases like these, it is important to keep track of low-income families so that the government can find ways to provide them with the necessary aid. This project aims to use specific predictors to identify low-income households (those making under $50,000 annually) so that proper support can be given.
Skills Used:
- Inferential Statistics
- Machine Learning
- Data Visualization
- Predictive Modeling
- R
The purpose of this project is to determine who is more likely to apply to Wooster, and how we can increase those numbers.
The College of Wooster is a small liberal arts school in Ohio that attracts students mainly through its prestigious Independent Study Program, which draws students from around the globe. Despite this appeal, the college struggles to enroll domestic students: only 55% of the students who apply are admitted, and only 16% of the admitted students accept the offer. We were allowed to analyze a dataset of over 2,000 students in the hope of finding insights and proposing small solutions to increase the number of student acceptances.
Link to project: https://github.com/datagirlz19/College-of-Wooster-Admissions-Data
Skills Used:
- Inferential Statistics
- Data Visualization
- R
Twitter can be both a resource for finding urgent information and a tool for communicating useless information about sales and petty gossip.
Twitter can be a resource for finding urgent information, such as real-time reports of natural disasters and requests for help during times of crisis. Hashtags can be useful tools for sifting through the nonsense, but they can also be misused for sales and gossip. This project aims to use Natural Language Processing to determine whether messages posted on Twitter should be classified as natural disaster reports or spam.
The purpose of this project is to use NLP to predict whether a tweet is reporting on a natural disaster/crisis or not.
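Before any classifier can label a tweet, the text has to be turned into numbers. The sketch below shows one common way to do that, a bag-of-words count, using only the Python standard library. It is an illustration under assumptions, not the project's actual pipeline: the cleaning rules, the helper names `tokenize` and `bag_of_words`, and the two example tweets are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet, drop URLs and @mentions, and split it into word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return re.findall(r"[a-z0-9']+", text)


def bag_of_words(tweets):
    """Build a sorted vocabulary and one word-count vector per tweet."""
    token_lists = [tokenize(tweet) for tweet in tweets]
    vocab = sorted({tok for toks in token_lists for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in token_lists:
        row = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            row[index[tok]] = count
        vectors.append(row)
    return vocab, vectors


# Made-up example tweets (assumptions for illustration only).
tweets = [
    "Wildfire spreading near the highway, evacuate now! http://example.com",
    "@shop Huge sale today, prices are on fire",
]
vocab, vectors = bag_of_words(tweets)
print(vocab)
print(vectors)
```

Count vectors like these, paired with a label for each tweet, are the kind of input a model such as a random forest classifier could be trained on. The second example also shows why the task is not trivial: a word like "fire" can appear in a tweet that has nothing to do with a disaster.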
Link to project: https://github.com/datagirlz19/Diaster-Tweets-Classifier
Skills Used:
- Inferential Statistics
- Machine Learning
- Natural Language Processing (NLP)
- Random Forest Classification
- Python
- pandas, seaborn, NumPy
- Jupyter Notebooks
- LaTeX