dbt + machine learning: what makes a great baton pass?

Question


Closed this issue 3 years ago · 15 comments

What's your key point?
Problem: dbt has done a great job of building an elegant, common interface between data engineers and data analysts by uniting them on SQL. As the data industry evolves, there's plenty of pain and room to grow in building a similar interface between data scientists and data analysts. There isn't a good answer for who owns what when things go wrong in the machine learning arena. Should the data analyst own fine-tuning the pre-processing data (think: preparing transformed data even further so machine learning models can work with it better)? Should we increase SQL's surface area to build ML models, or should we leave that to non-SQL interfaces (Python/Scala/etc.)? Does this have to be an either/or future?

Key Point: Whatever the interface evolves into, it must center people, create a low bar and high ceiling, and focus on outcomes and not the mystique of features/tools behind a learning curve.

Prior art:
Any other posts that exist on this topic (here or elsewhere).

Link to notes / outline / draft:
Google docs preferred, please set sharing to anyone with the link can view.

Think through first principles: What are the key outcomes of the machine learning workflow vs. the transformation workflow? Which core behaviors are shared, and which are not?
https://fal.ai/
https://mindsdb.com/
https://dvc.org/
https://continual.ai/

Estimated first draft date:
Leave blank if you already shared a draft above.
01/14/2022, Friday: in a Google Doc

Any open questions / requests for help from the group?

Answer 1 · 2022-01-03T15:45:26.000Z

Don't forget about Hex!

Answer 2 · 2022-01-03T19:24:36.000Z

Love this one, Sung! I can work with you on it when you're ready. As usual, I'm curious about stories from your own work that you can build the post around, since so much of the convo on this stuff has been hypothetical so far.

Answer 3 · 2022-01-04T15:32:37.000Z

@krevitt Feel free to check progress in real time: here

Answer 4 · 2022-01-05T14:49:14.000Z

I wonder if there are two separate posts here:

a step-by-step walkthrough of the optimal baton pass as you've seen it play out (or how you could see it working better based on observing poor baton passes)
walkthroughs of how an individual tool works within that baton pass (almost like a tool unboxing)

I feel like tackling both of those in one post is a lot. What do you think?

Answer 5 · 2022-01-05T15:17:26.000Z

@krevitt

Thanks for the suggestion! I'm planning to focus on observed poor baton passes, dig into the core behaviors and outcomes that underlie them, explain conceptually how the tools I've seen so far address those problems, and describe what I'd like to see in an ideal, next-generation workflow.

I don't plan to do a full tool unboxing. I recommend we leave that as an open question for readers: which tool is worth a full unboxing in another blog post?

All of the above should fit in a single blog post. I expect this one to be 1x-1.5x the length of the dbt and Airflow blog post we released together.

Answer 6 · 2022-01-06T17:32:15.000Z

@izzye84 will officially be a co-writer of this blog post! He'll bring the machine learning expertise to the table!

Answer 7 · 2022-01-25T20:05:59.000Z

@sungchun12 What do you think about a publication date for this one? And holler whenever you're ready for an editing pass.

Answer 8 · 2022-01-25T21:29:31.000Z

@krevitt For publication, let's make it happen on Monday, 2/7/2022.

Emilie made a bunch of comments I'll need to ruminate on. I'll send a meeting invite for an editing pass!

Answer 9 · 2022-02-02T17:07:58.000Z

Link to figma visual outline: https://www.figma.com/file/n47XZkyPt3mfHVyc2nrfXV/dbt-%2B-ML-mind-map?node-id=2%3A614

Answer 10 · 2022-02-09T18:27:37.000Z

@sungchun12 Hey Sung! I gave your content a quick editing pass and made some suggestions for small structural changes. In many ways, the content is great! I would love to see the main ideas come forward & stand out a little more so readers can easily track their progress through the content. Here's a quick summary of my suggested edits to that end:

Replace the questions in the intro with the tool paths you offer as a solution
Give the main takeaway, the main benefit of each tool path, and the tradeoffs their own space outside of the narrative so your ideas are clearer; then you can use the "How does this change my story" sections as examples of why your solutions are good ones.
Added and suggested some new H2, H3, and H4 headings for better readability

I'll just need you to go in, review the changes, and add/adjust some suggestions to fit better with your voice. I left a bunch of comments with my rationale for some of my suggestions, so if you have questions or better approaches, please let me know!

Answer 11 · 2022-02-11T16:29:41.000Z

@johnblust I resolved all the comments and it's ready for another review!

Answer 12 · 2022-02-11T17:07:30.000Z


Awesome, I'll review & let you know next steps early next week!! Great job on the fast turnaround @sungchun12

Answer 13 · 2022-02-15T17:57:00.000Z

Izzy and I feel good about this! Keep us updated John!

Answer 14 · 2022-02-16T16:55:45.000Z

@sungchun12 Okay, then we're ready to publish!! Planning to publish this week. I'll let you know when it is live :)

Answer 15 · 2022-02-21T17:41:26.000Z

Closing this issue for now as part of the repo migration. Since we're close to publishing, I want to make sure we link to this discussion in the docs repo (dbt-labs/docs.getdbt.com#1158) for continuity, so future ML-related follow-ups can branch off it.