This will be an overview of your approach: a well-articulated summary that presents your problem statement, outlines your proposed methods and models, defines any risks & assumptions, and includes any revisions to your initial goals & criteria, as needed.
In addition, you'll want to create a local database for your data, then import and clean that data. You'll also want to describe your dataset(s), create a data dictionary for them, and perform your initial exploratory data analysis (EDA).
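A minimal sketch of the database and cleaning steps follows. It assumes your raw data arrives as a CSV and that SQLite (Python's built-in `sqlite3` module) is an acceptable local database. The `listings` table, its columns, and the sample rows are all hypothetical placeholders for your own dataset. The cleaning shown is deliberately simple: trim whitespace, treat empty strings as missing, and cast each column to an explicit type.

```python
import csv
import io
import sqlite3

# Hypothetical raw export standing in for your own CSV file.
RAW_CSV = """listing_id,neighborhood,price,bedrooms
1, Mission ,2500,1
2,SoMa,,2
3,Mission,3100,
4, Sunset,2200,2
"""

def clean_value(value, cast):
    """Strip whitespace, map empty strings to None (SQL NULL), and cast the rest."""
    value = value.strip()
    if value == "":
        return None
    return cast(value)

def load_into_sqlite(raw_csv, db_path=":memory:"):
    """Create a local SQLite database and load cleaned rows into a hypothetical `listings` table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listings (
               listing_id   INTEGER PRIMARY KEY,
               neighborhood TEXT,
               price        REAL,
               bedrooms     INTEGER
           )"""
    )
    reader = csv.DictReader(io.StringIO(raw_csv))
    rows = [
        (
            clean_value(r["listing_id"], int),
            clean_value(r["neighborhood"], str),
            clean_value(r["price"], float),
            clean_value(r["bedrooms"], int),
        )
        for r in reader
    ]
    # Parameterized inserts let sqlite3 handle quoting and NULLs.
    conn.executemany("INSERT INTO listings VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn

conn = load_into_sqlite(RAW_CSV)
# COUNT(*) counts every row; COUNT(price) skips the NULL left behind by cleaning.
print(conn.execute("SELECT COUNT(*), COUNT(price) FROM listings").fetchone())
```

`":memory:"` keeps the demo self-contained. Passing a file path such as `"project.db"` gives you a persistent local database instead, although re-running the load against that file would then collide on the primary key. pandas users often do the same thing with `read_csv` and `DataFrame.to_sql`.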
Goal: A summary notebook outlining your project's goals, methods, models, and EDA.
Your work must:
- Articulate “Specific aim”
- Outline proposed methods and models
- Define risks & assumptions
- Revise initial goals & success criteria, as needed
- Create local database
- Describe data cleaning/munging techniques
- Create a data dictionary
- Perform & summarize EDA
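The sketch below shows one way to start a data dictionary and a first-pass EDA summary together. It assumes your cleaned rows are available as a list of dicts, for example rows read back from your local database. The columns, records, and descriptions are hypothetical. Only you can write the descriptions, but the type, missing-value count, and basic summary statistics can be generated automatically.

```python
import statistics

# Hypothetical cleaned records, e.g. rows fetched back from the local database.
ROWS = [
    {"listing_id": 1, "neighborhood": "Mission", "price": 2500.0, "bedrooms": 1},
    {"listing_id": 2, "neighborhood": "SoMa", "price": None, "bedrooms": 2},
    {"listing_id": 3, "neighborhood": "Mission", "price": 3100.0, "bedrooms": None},
    {"listing_id": 4, "neighborhood": "Sunset", "price": 2200.0, "bedrooms": 2},
]

# Hypothetical human-written descriptions: the part of a data dictionary only you can supply.
DESCRIPTIONS = {
    "listing_id": "Unique identifier for each listing",
    "neighborhood": "Neighborhood name, whitespace-trimmed",
    "price": "Monthly rent in USD",
    "bedrooms": "Number of bedrooms",
}

def build_data_dictionary(rows, descriptions):
    """Return one entry per column: inferred type, description, missing count, quick summary."""
    dictionary = []
    for column in rows[0]:
        values = [r[column] for r in rows if r[column] is not None]
        entry = {
            "column": column,
            "type": type(values[0]).__name__ if values else "unknown",
            "description": descriptions.get(column, ""),
            "missing": len(rows) - len(values),
        }
        if values and all(isinstance(v, (int, float)) for v in values):
            # Numeric column: range and central tendency for the EDA summary.
            entry.update(
                min=min(values),
                max=max(values),
                mean=statistics.mean(values),
                median=statistics.median(values),
            )
        else:
            # Categorical column: cardinality is usually the first thing to check.
            entry["unique"] = len(set(values))
        dictionary.append(entry)
    return dictionary

for entry in build_data_dictionary(ROWS, DESCRIPTIONS):
    print(entry)
```

In a notebook you would typically render these entries as a Markdown table or DataFrame and add columns that matter for your project, such as units or the data source.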
Bonus:
- Explain how you intend to evaluate your results. What tuning metric and evaluation approaches do you intend to use?
- Identify 1-2 additional datasets that may help you triangulate your findings. How might these relate to your data?
- Create a blog post of at least 500 words (and 1-2 graphics!) that describes your assumptions and processes for EDA. Link to it in your Jupyter notebook.
- Materials must be submitted in a clearly labeled Jupyter notebook, including:
  - Markdown writeups, code, and visualizations
- Materials must be submitted via a GitHub PR to the instructor's repo.
- Materials must be submitted by the end of Week 8.
- Don’t hesitate to write throwaway code to solve short-term problems.
- Read the docs for whatever technologies you use. Most of the time there is a tutorial you can follow, but not always, so learning to read documentation is crucial to your success!
- Write pseudocode before you write actual code. Thinking through the logic first helps.
- Document everything.
Attached here is a complete rubric for this project.
Your instructors will score each of your technical requirements using the scale below:
Score | Expectations
----- | ------------
**0** | _Incomplete._
**1** | _Does not meet expectations._
**2** | _Meets expectations, good job!_
**3** | _Exceeds expectations, you wonderful creature, you!_
This will serve as a helpful overall gauge of whether you met the project goals, but the more important scores are the individual ones above, which can help you identify where to focus your efforts for the next project!