/InsightDataSci

Data on past fellows


Can a Statistician make a smooth transition to Data Scientist?

Since finishing my doctoral degree in biostatistics in 2015, I have wanted to transition from traditional statistics and data analysis to working as a data scientist at a top company. For many reasons, I felt that my skills and toolkit were already outmoded in an era where data has become like currency. Additionally, I wanted to broaden the scope of my work beyond public health. There are many articles that highlight similarities and differences between the two titles. For me, the gap lies in efficient programming skills and experience with messier, less predictable data. Arguably, some statisticians do have these skills, but during my own training

According to my Mango Data Science Radar, I am less of a Data Wrangler than I am an effective Modeller and Communicator. I don't necessarily agree, as I have wrangled my fair share of structured and unstructured data. However, I recognize that I could benefit from more experience in this area.

image

To close this gap, I have taught myself how to program in both R and Python, relying on help from MOOCs (Massive Open Online Courses), YouTube, examples here on GitHub, and many other free resources. I've recently considered entering a program to focus on honing my data science portfolio and to get some help with this career transition in a more structured environment. Also important to me is the option to participate in a program remotely, since I am not located in the traditional places where data scientists seem to be in highest demand.

Unlike many other data science career training programs, Insight Data Science provides information on past participants that gives a unique glimpse into what the program looks for when selecting Fellows. Using Python, I scraped the program's web page to assemble a structured data set with information on more than 700 past data science Fellows. I summarized the data to compare Fellows' information to my own background and interests. Some of my findings are presented here; these discoveries helped me make a decision about applying for the program. Putting together this summary was an interesting and insightful data challenge!
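To illustrate the general idea of turning a web page into structured records, here is a minimal sketch using only Python's standard library. The markup, class names (`fellow`, `name`, `background`, `company`), and sample values are all hypothetical; the real page's structure is not described in this write-up, and this is not necessarily how the notebook in this repository does it.

```python
from html.parser import HTMLParser

# Hypothetical markup: the tags and class names below are assumptions made
# for illustration, not the structure of the actual program page.
SAMPLE_HTML = """
<div class="fellow">
  <span class="name">A. Example</span>
  <span class="background">Physics, PhD</span>
  <span class="company">Example Corp</span>
</div>
<div class="fellow">
  <span class="name">B. Sample</span>
  <span class="background">Biostatistics, PhD</span>
  <span class="company">Sample Inc</span>
</div>
"""

FIELDS = {"name", "background", "company"}


class FellowParser(HTMLParser):
    """Collect one dict per <div class="fellow"> block."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "fellow":
            self.rows.append({})          # start a new record
        elif tag == "span" and cls in FIELDS and self.rows:
            self._field = cls             # next text node belongs to this field

    def handle_data(self, data):
        text = data.strip()
        if self._field is not None and text:
            self.rows[-1][self._field] = text

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_fellows(html):
    """Return a list of dicts, one per fellow found in the HTML string."""
    parser = FellowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


fellows = parse_fellows(SAMPLE_HTML)
```

Each record in `fellows` is a plain dictionary, so the list can be passed to whatever tabular tool is used for the summaries that follow.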

Interesting Findings

Academic Background

Most fellows studied non-statistical fields in their graduate study. Most frequently, Fellows obtained a doctoral degree in a physics-related field. This finding reminded me that many other disciplines now require rigorous data handling and computation.

image

A large percentage of Fellows studied at either Stanford University or the University of California, Berkeley; this makes sense given the program's origins in the San Francisco area as well as the high number of companies hiring data scientists in this geographical area. Other popular institutions are listed below.

top_fellow_schools

Fellow Locations

About 87% of Fellows studied or worked in the US, with 42% of all Fellows originating from a school or company in California. This makes sense, as many of the leading tech/data companies that demand data scientists are located in California.

While the majority of Fellows studied or worked in the US, about 12% worked or completed graduate studies in other countries spanning five continents. This was an interesting finding that makes me wonder how many of these Fellows participated in an on-site Fellowship vs. remotely.
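Percentages like the ones in this section come down to counting values in one column and dividing by the total. A small sketch of that calculation follows; the `country` field name and the toy records are made up for illustration and are not taken from the actual data set.

```python
from collections import Counter


def share_by_value(records, key):
    """Return {value: percent of records}, ordered from most to least common."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.most_common()}


# Toy records with a hypothetical "country" field (not real Fellow data).
toy_records = [
    {"country": "US"},
    {"country": "US"},
    {"country": "US"},
    {"country": "Canada"},
]

shares = share_by_value(toy_records, "country")
```

The same helper works for any categorical field, for example a state or an institution column.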

Fellow Projects

My dataset includes a field that describes Fellows' capstone projects. As a crude analysis, I created a word cloud from these descriptions. Most Fellows created projects that used public data sources to find information, discover insights, make predictions, or make recommendations. This gives me ideas for potential projects that I can build. This analysis might also prove helpful to the Insight Data Science team in confirming or refuting assumptions about their Fellow network and in recruiting future Fellows.

projects_wordcloud_updated
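A word cloud is essentially a picture of word frequencies, so the underlying counts can be sketched without any plotting library. The stopword list and the two project descriptions below are invented for illustration; the real descriptions and the tool used to draw the cloud are not shown here.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real analysis would likely use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "for", "from", "in", "on", "with", "using"}


def word_frequencies(descriptions):
    """Count lowercase words across descriptions, skipping stopwords and very short words."""
    counts = Counter()
    for text in descriptions:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts


# Made-up project descriptions (not taken from the scraped data).
toy_descriptions = [
    "Predicting restaurant ratings from public review data",
    "Recommending hiking trails using public park data",
]

freqs = word_frequencies(toy_descriptions)
top_words = freqs.most_common(2)
```

The resulting counts are the kind of input a word cloud renderer scales its font sizes by.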

Where Fellows are Hired

The majority of Fellows are hired under a title that includes the phrase "Data Scientist". In some cases, Fellows are hired into roles that appear to have seniority, including terms like "Senior" or "Director" in their title.

top_fellow_titles
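One simple way to check statements like these is to flag each job title by substring. The sketch below assumes the titles are available as plain strings; the example titles are invented, and only "Senior" and "Director" (the terms named above) are treated as seniority markers.

```python
SENIORITY_TERMS = ("senior", "director")  # the two terms mentioned in the text


def describe_title(title):
    """Flag whether a title mentions 'Data Scientist' and whether it signals seniority."""
    lowered = title.lower()
    return {
        "is_data_scientist": "data scientist" in lowered,
        "has_seniority": any(term in lowered for term in SENIORITY_TERMS),
    }


# Invented titles for illustration only.
toy_titles = [
    "Data Scientist",
    "Senior Data Scientist",
    "Director of Analytics",
    "Data Engineer",
]

flags = [describe_title(t) for t in toy_titles]
n_data_scientist = sum(f["is_data_scientist"] for f in flags)
n_senior = sum(f["has_seniority"] for f in flags)
```

Substring matching is crude (it would miss abbreviations such as "Sr."), so the counts should be read as a rough lower bound.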

Thus far, Facebook has hired the most Fellows from Insight Data Science, followed by LinkedIn and Stitch Fix. The program itself has hired several Fellows to work in various capacities.

top_hiring_companies

Future Work

  • Cross-variable analyses to see if certain Fellows are more likely to end up being hired by particular companies.

  • Deeper text analysis on project titles and descriptions.

  • I have identified some unstructured data on Fellows from other accelerator programs that I hope to compare and combine with these data. I hope to use this information to help other prospective data scientists compare and contrast programs.
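As a starting point for the cross-variable analysis in the first item, a two-way count table can be built with the standard library alone. The field names (`background`, `company`) and the records are hypothetical placeholders, not values from the scraped data.

```python
from collections import Counter, defaultdict


def cross_tabulate(records, row_key, col_key):
    """Count how often each (row_key, col_key) pair occurs, as nested dicts."""
    table = defaultdict(Counter)
    for record in records:
        table[record[row_key]][record[col_key]] += 1
    return {row: dict(cols) for row, cols in table.items()}


# Placeholder records; both field names and values are assumptions.
toy_records = [
    {"background": "Physics", "company": "Company A"},
    {"background": "Physics", "company": "Company A"},
    {"background": "Physics", "company": "Company B"},
    {"background": "Statistics", "company": "Company B"},
]

table = cross_tabulate(toy_records, "background", "company")
```

With only a few hundred Fellows spread across many companies, most cells in such a table would be small, so any apparent pattern would need to be interpreted cautiously.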

Final Thoughts

I went into this endeavor seeking to learn more about data science career training programs to find the right one for me. In the process, I learned a lot about data scraping and discovered some instances where it may not be appropriate (or permissible) to scrape data. There is a lot more work to be done to discover additional insights into this interesting program. Doing this work has significantly improved my Python programming skills and challenged me to craft a compelling story.