This project is the result of my participation in the grad-student-organized summer data science workshop run by UC Berkeley's Career Development Initiative for the Physical Sciences (CDIPS).
The annual three-week workshop pairs tech mentors with small groups of students to work on real industry data science projects. It's a great opportunity for PhDs transitioning into tech to learn the tools of the trade, visit work spaces, and connect with experts.
- Leader: Adam Kalman, Lead Data Scientist at Kanjoya
- Team: Jade Zhang, Kevin Pollock, Dharshi Devendran, and Shannon Hateley
The dataset contains every post from January to July 2015 (534,305 entries) on the Experience Project, a social network where users anonymously share life experiences and personal stories.
By asking:
"Which users post to which groups?"
We can learn:
- if bots or trolls are posting to and ruining groups
- if posting patterns indicate attrition problems
- if a user can be recommended new groups
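
As a rough illustration of the "which users post to which groups" question, here is a minimal pandas sketch that builds a user-by-group table of post counts. The column names `user_id` and `group`, and the toy rows, are assumptions made up for this example; they are not the dataset's real schema, and this is not necessarily how the notebooks do it.

```python
import pandas as pd

# Toy stand-in for the posts table. The column names and values are
# hypothetical; swap in the real columns when working with the dataset.
posts = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u2", "u3"],
    "group":   ["group_a", "group_b", "group_a", "group_a", "group_c", "group_b"],
})

# Rows are users, columns are groups, cells are post counts.
user_group_counts = pd.crosstab(posts["user_id"], posts["group"])

# Number of distinct groups each user has posted to.
groups_per_user = (user_group_counts > 0).sum(axis=1)

# Total number of posts per user.
posts_per_user = user_group_counts.sum(axis=1)

print(user_group_counts)
```

A table like this is one possible starting point for the questions above: unusually high `posts_per_user` spread over many groups could flag bot-like accounts, and users with similar rows could be used to suggest new groups.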
- Handling and cleaning large datasets
- Natural Language Processing
- Supervised and unsupervised learning
- Python (IPython Notebook, SciPy, pandas, NumPy)
- R
- Shiny
- Data visualization
See this repo's IPython notebooks for the analysis.