Capstone Project Requirements

Introduction

In this lesson, we'll discuss the requirements for our Capstone Project!

Objectives

You will be able to:

  • Describe all required aspects of the final project
  • Describe what constitutes a successful project

Introduction

Congratulations on making it to the final project! It's been a long journey, but we can finally see the light at the end of the tunnel!

[Image: Actual footage of you seeing the light at the end of the tunnel]

Now that you've learned everything we have to teach you, it's time to show off and flex your data science muscles with your own Capstone Project! This project will allow you to showcase everything you've learned as a data scientist by completing a professional-level data science project of your choosing. This project will be significantly larger than any project you've completed so far, and will be the crown jewel of your portfolio. A strong capstone project is the single most important thing you can do to get the attention of potential employers, so be prepared to put as much effort into this project as possible - the results will be worth it!

[Image: Your portfolio brings all the employers to your inbox]

Topic Requirements

Your project should develop a data product or analysis related to a single topic. You are completely free to choose any topic that interests you, but keep in mind that you will need to complete this project end-to-end, including sourcing your own data. When choosing a topic, think through these questions:

  • What would I be motivated to work on?
  • What data could I use?
  • How could an individual or organization use my product or findings?
  • What will I be able to accomplish in the time I have available?
  • What challenges do I foresee with this project?

Technical Requirements

Your project must meet the following technical requirements:

  1. No Off-The-Shelf Datasets. This project is a chance for you to highlight your critical thinking and data sourcing skills by finding a good dataset to answer a useful question. You can start from a pre-existing dataset, but you should consider combining it with other datasets and/or engineering your own features. The goal is to showcase your ability to find and work with data, so just grabbing a squeaky-clean dataset and using it as-is is out of the question.

  2. Strong Data Exploration, with at least 4 relevant data visualizations. There are few skills that impress employers more than the ability to dive into a new dataset and produce engaging visualizations that communicate important information. For this project, anything worth knowing is worth visualizing. Level up your project by digging into more advanced visualization libraries like seaborn!

  3. Makes use of Supervised Learning. It is great to use Unsupervised Learning techniques as needed in your project (for instance, segmentation with clustering algorithms), but supervised learning should play a central role in answering your question.

  4. Explicitly makes use of a Data Science Process such as OSEMN or CRISP-DM. Select a Data Science Process to use to give structure to your project. Each step in the process should correspond to a section in your Jupyter Notebook.

  5. A well-defined goal with clearly presented results. Your project should provide any background context needed to understand the project you are working on and why it's important. For instance, if you are trying to detect fault lines using earthquake data, you should review the topic and your dataset so that the reader can understand your work. Similarly, the results of your project should be clearly communicated. Do not just tell your audience the final accuracy of your models--be sure to answer "big picture" questions as well. For instance: Would you recommend shipping this model to production, or is more work needed?

NOTE: Inconclusive results are okay--from a purely scientific perspective, they are no more or less important or valuable than any other kinds of results. If your results are inconclusive, you should discuss what your next steps would be from there. For instance, what do you think it would take to get conclusive results--more data? Different data that was unavailable? Both?
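To make the first and third requirements a little more concrete, here is a minimal sketch of combining two data sources, engineering a couple of features, and fitting a supervised model. Everything in it is an assumption made for illustration: the weather and bike-share tables are synthetic stand-ins for data you would source yourself, and the column names (`temp_c`, `rain_mm`, `rides`, `high_demand`) and the choice of logistic regression are hypothetical, not something the rubric prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
dates = pd.date_range("2023-01-01", periods=200, freq="D")

# Hypothetical source 1: daily weather readings (synthetic stand-in
# for data you might pull from an API or scrape yourself).
weather = pd.DataFrame({
    "date": dates,
    "temp_c": rng.normal(12, 8, size=200).round(1),
    "rain_mm": rng.exponential(2.0, size=200).round(1),
})

# Hypothetical source 2: daily bike-share ride counts (also synthetic).
rides = pd.DataFrame({
    "date": dates,
    "rides": (500 + 20 * weather["temp_c"] - 30 * weather["rain_mm"]
              + rng.normal(0, 50, size=200)).round().astype(int),
})

# Combine the two sources on their shared key; validate= raises an
# error if either table unexpectedly has duplicate dates.
df = weather.merge(rides, on="date", how="inner", validate="one_to_one")

# Engineer features that neither raw source provides on its own.
df["is_weekend"] = df["date"].dt.dayofweek >= 5
df["heavy_rain"] = df["rain_mm"] > 5.0

# Supervised target: was demand above the median day?
df["high_demand"] = (df["rides"] > df["rides"].median()).astype(int)

features = ["temp_c", "rain_mm", "is_weekend", "heavy_rain"]
X = df[features].astype(float)
y = df["high_demand"]

# Hold out a test set so the reported metrics reflect unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Report precision and recall per class, not just a single accuracy number.
report = classification_report(y_test, model.predict(X_test))
print(report)
```

In a real project, each of these steps would live in its own section of your notebook, matching the Data Science Process you chose, and the printed metrics would be followed by a plain-language discussion of what they mean for your question.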

Requirements for Online Students Only

Deliverables

For online students, the deliverables for this project consist of the following three components:

  1. A Jupyter notebook for a presentation.
  • The Jupyter notebook will have two components:
    1. An Abstract section that briefly explains your problem, your methodology, your findings, and the business recommendations that follow from those findings. This section should be 1-2 paragraphs long.
    2. The technical analysis for a data science audience. This detailed technical analysis should explicitly follow a Data Science Process as outlined in the previous section. It should be well-formatted and organized, and should contain all code, visualizations, and detailed explanations/analysis.

  2. An organized README.md file in the GitHub repository containing your project code that describes the contents of the repository. This file should be the source of information for navigating through all the code in your repository.

  3. A blog post showcasing your project, with a focus on your methodology and findings. A well-written blog post about your project will probably be the first thing most recruiters and hiring managers see, so really take the time to polish up this blog post and explain your project, methodology, and findings/business recommendations in a clear, concise manner. This blog post should cover everything important about your project, but remember that your audience for this blog post will largely be non-technical. Your blog post should definitely contain visualizations, code snippets, and anything else you find important, but don't get bogged down trying to explain highly technical concepts. Your blog post should provide a link to the GitHub repository containing your actual project, for people who want to really dive into the technical aspects of your project.

Rubric

Online students can find a PDF of the rubric for the final capstone project here.

Requirements for On-Campus Students Only

For on-campus students, your project will be evaluated based on the contents of your GitHub repo, which must contain the following three components:

  1. A Jupyter notebook
  2. A README.md file
  3. Presentation slides

The requirements for these components are described in detail in the rubric for the final capstone project here. You can learn how your teacher will use the rubric to review the project here.

Example Student Project

Take a look at this technical report from a Flatiron student who used tweet data to predict the weekly number of flu cases during flu season. Pay attention to how well structured the project is, and how much she relies on great visualizations to tell her story for her. Your explanations don't have to be wordy - a visualization is worth a thousand words!

Summary

The Capstone Project is the most critical part of the program. It gives you a chance to bring together all the skills you've learned into a realistic project and to practice key "business judgement" and communication skills. Most importantly, it provides employers with a strong signal about your technical abilities, and allows you to show the world what an amazing Data Scientist you've become!

The projects are serious and important - they can be passed and they can be failed. Take the project seriously, put the time in, ask your peers or instructors for help early and often, and treat the review as a job interview, and you'll do great. We're rooting for you to succeed, and we'll only ask you to retake a review if we believe you need to. We'll also provide open and honest feedback so you can improve as quickly and efficiently as possible.