Data 100

The data science lifecycle involves the following general steps:

  1. Question/Problem Formulation:

    1. What do we want to know or what problems are we trying to solve?

    2. What are our hypotheses?

    3. What are our metrics of success?

  2. Data Acquisition and Cleaning:

    1. What data do we have and what data do we need?

    2. How will we collect more data?

    3. How do we organize the data for analysis?

  3. Exploratory Data Analysis:

    1. Do we already have relevant data?

    2. What are the biases, anomalies, or other issues with the data?

    3. How do we transform the data to enable effective analysis?

  4. Prediction and Inference:

    1. What does the data say about the world?

    2. Does it answer our questions or accurately solve the problem?

    3. How robust are our conclusions?
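
To make the four stages above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the question (whether attending section is associated with higher scores), the toy records, and the helper names `clean`, `group_means`, and `bootstrap_diff` are all hypothetical and do not come from the course materials. The bootstrap interval is just one possible way to probe how robust a conclusion is.

```python
import random
import statistics

# 1. Question (hypothetical): do students who attend section score higher?
# Toy records, invented purely for illustration.
raw_records = [
    {"attended": "yes", "score": "82"},
    {"attended": "no", "score": "71"},
    {"attended": "yes", "score": "90"},
    {"attended": "no", "score": ""},      # missing score
    {"attended": "YES", "score": "77"},   # inconsistent casing
    {"attended": "no", "score": "65"},
    {"attended": "yes", "score": "88"},
    {"attended": "no", "score": "74"},
]


def clean(records):
    """2. Cleaning: drop rows with missing scores, normalize labels, parse numbers."""
    cleaned = []
    for row in records:
        if row["score"].strip() == "":
            continue  # skip rows we cannot analyze
        cleaned.append({
            "attended": row["attended"].strip().lower() == "yes",
            "score": float(row["score"]),
        })
    return cleaned


def split_scores(rows):
    """Separate scores into (attended, did not attend) lists."""
    yes = [r["score"] for r in rows if r["attended"]]
    no = [r["score"] for r in rows if not r["attended"]]
    return yes, no


def group_means(rows):
    """3. EDA: summarize each group with its mean score."""
    yes, no = split_scores(rows)
    return statistics.mean(yes), statistics.mean(no)


def bootstrap_diff(rows, n_resamples=1000, seed=0):
    """4. Inference: approximate 95% bootstrap interval for the difference in means."""
    rng = random.Random(seed)
    yes, no = split_scores(rows)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement and recompute the difference.
        yes_sample = rng.choices(yes, k=len(yes))
        no_sample = rng.choices(no, k=len(no))
        diffs.append(statistics.mean(yes_sample) - statistics.mean(no_sample))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper


rows = clean(raw_records)
mean_yes, mean_no = group_means(rows)
lower, upper = bootstrap_diff(rows)
print(f"mean (attended) = {mean_yes}, mean (did not attend) = {mean_no}")
print(f"approximate 95% interval for the difference: [{lower:.2f}, {upper:.2f}]")
```

With so few rows, the interval mainly shows how uncertain the estimate is. In practice the stages are rarely this linear, since issues found during exploration often send us back to collect or clean more data.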