StoredDataStories

Materials (code & data) for the ODSC East 2020 Virtual Conference workshop: 'Building Data Narratives: An End-to-End Machine Learning Practicum'

Why data narratives?

  • The machine learning campaign is nearing its conclusion. Data has been collected and curated. Features have been engineered. Multiple models have been built and evaluated.
  • Now it's time to communicate the details, conclusions, and actionable insights that follow from the machine learning campaign.
  • There are multiple audiences: fellow data scientists, who are as interested in the mechanics of the campaign as in its outcome; data engineers, who might be tasked with putting the models into production; and managers and customers, whose main interest is the outcome and how the results might be used.
  • The purpose of a data narrative is communication. It can take several forms: a notebook that mixes code, text, results, figures, and explanations into one seamless document; an HTML page; or a document suitable for publication.
  • During this workshop attendees will start with tabular dataframes, build classification and regression models, document the model building process, and prepare presentations (slides) and documents (PDF) that describe the machine learning campaign. We'll demonstrate how one body of code can be used to prepare notebooks, slides, and documents. There will be an emphasis on tools and techniques that produce well-crafted tables, figures, and plots.
  • Attendees are encouraged to bring data around which they would like to build narratives. The scripts used in the workshop read data in tabular form, with the following format:
Identifier  Endpoint  VarName-01  VarName-02  VarName-03  ...
ID-01       edpt-01   Var-01-01   Var-01-02   Var-01-03   ...
ID-02       edpt-02   Var-02-01   Var-02-02   Var-02-03   ...
ID-03       edpt-03   Var-03-01   Var-03-02   Var-03-03   ...
...         ...       ...         ...         ...         ...
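
To check whether your own data fits the layout above before the workshop, something like the following sketch may help. It is not one of the workshop scripts. It assumes the file is whitespace-delimited (the delimiter is not stated here), and the function name `read_narrative_table` and the sample values are hypothetical.

```python
import io


def read_narrative_table(handle):
    """Parse a table whose columns are Identifier, Endpoint, then variables.

    Assumption: fields are separated by whitespace and contain no spaces.
    """
    rows = [line.split() for line in handle if line.strip()]
    if not rows:
        raise ValueError("empty table")

    header, body = rows[0], rows[1:]
    if len(header) < 3:
        raise ValueError("expected Identifier, Endpoint, and at least one variable")

    var_names = header[2:]
    records = []
    for line_no, fields in enumerate(body, start=2):
        # Every data row must have exactly as many fields as the header.
        if len(fields) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} fields, got {len(fields)}")
        records.append({
            "id": fields[0],
            "endpoint": fields[1],
            "features": dict(zip(var_names, fields[2:])),
        })
    return var_names, records


# Hypothetical sample that mirrors the layout shown above.
sample = """\
Identifier Endpoint VarName-01 VarName-02 VarName-03
ID-01 edpt-01 Var-01-01 Var-01-02 Var-01-03
ID-02 edpt-02 Var-02-01 Var-02-02 Var-02-03
"""

var_names, records = read_narrative_table(io.StringIO(sample))
```

Values are kept as strings; converting the endpoint to a class label or a number depends on whether the model is a classifier or a regressor.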

Topics (change frequently, as I circle the target)

  • Literate programming
    mixing code, text, results, figures & explanations into one seamless document / presentation
  • Documents
  • Presentations
  • Minimum Valuable Templates (MVTs)
  • Interactivity
  • Customizing / branding templates
  • Building dashboards
  • Automated reports ...
    ... using RMarkdown
    ... using Jupyter Notebooks
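
For the Jupyter route, one possible way to automate reports is to convert a single notebook into several output formats with `jupyter nbconvert`. The sketch below only builds the command lines. It is not necessarily how the workshop materials do it, and the notebook name `narrative.ipynb`, the `build` directory, and the helper `nbconvert_commands` are hypothetical.

```python
from pathlib import Path

# Output formats: an HTML page, HTML slides, and a PDF document.
FORMATS = ("html", "slides", "pdf")


def nbconvert_commands(notebook, out_dir="build", formats=FORMATS):
    """Return one `jupyter nbconvert` command (as an argument list) per format."""
    nb = Path(notebook)
    return [
        ["jupyter", "nbconvert", "--to", fmt, "--output-dir", str(out_dir), str(nb)]
        for fmt in formats
    ]


# Hypothetical notebook name; nothing is executed here.
commands = nbconvert_commands("narrative.ipynb")
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once Jupyter is installed. The PDF conversion also needs a working LaTeX installation.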

Outline

  • Building a narrative's ground truth
  • Preparing for code / peer review
  • HTML presentations
  • PDF documents
  • Posters / dashboards / ...

Workshop Schedule

Time (hrs:min) Activity
00:00 - 00:15 Introduction
00:15 - 00:45 Ground Truth
00:45 - 01:00 Practicum I
01:00 - 01:30 Notebooks
01:30 - 01:45 Practicum II
01:45 - 02:15 Presentations
02:15 - 02:30 Practicum III
02:30 - 03:00 Documents
03:00 - 03:15 Practicum IV
03:15 - 03:30 Wrap-Up

Software

  • R
  • Python
  • LaTeX