Timecodes for "DE Zoomcamp Week #7 Office Hours"
alexeygrigorev opened this issue · 2 comments
YouTube video: https://www.youtube.com/watch?v=AcKecyC-ezM
0:00:00 - Announcement: Homework, video, projects, data set
0:02:29 - Data sets available for project selection
0:05:00 - Considerations for choosing project data sets
0:07:32 - Project: Use Airflow for orchestration
0:10:21 - Use orchestrator for project, consider alternatives
0:12:52 - Structured streaming for data processing options
0:16:02 - Spark MLlib: Not good for machine learning
0:18:37 - Spark ML with big data, Snowflake, Splunk
0:21:06 - Cloud options for data storage
0:23:31 - Various data quality tools for analytics
0:26:50 - Integration tools, Metabase, data catalogs
0:29:23 - Open source tools for data engineering
0:31:56 - Projects are more productive than just learning
0:34:24 - Tools evolve, but some things remain
0:36:41 - Overwhelming options, but fundamentals remain
0:39:26 - Deadlines for Spark and Kafka projects
0:42:13 - Learn Scala or Java for programming
0:45:00 - Data workers need to understand concepts
0:47:24 - Transferable cloud concepts outweigh specific experience
0:50:24 - Critical mindset, visualizing dbt docs, data engineering
0:52:55 - Understanding Kafka is important; Spark is not required
0:55:15 - Install and use Apache Hop locally
0:58:04 - Project deadlines, ask for help
Updated timecodes! Thanks :)