- Practice thoroughly from these notebooks and the PDF below
- de-mod-0-get-started-with-pyspark-programming
- de-mod-1-get-started-with-databricks-data-science-and-engineering-workspace
- de-mod-2-transform-data-with-spark
- de-mod-3-manage-data-with-delta-lake
- de-mod-4-build-data-pipelines-with-delta-live-tables
- de-mod-5-deploy-workloads-with-databricks-workflows
- Practice "PracticeExam" questions available in this repo.
- Every option in a "PracticeExam" question can become a question in the actual exam.
- Read Databricks Certified Associate Data Engineer Exam thoroughly.
- Read Manage data with delta lake
- Read Build pipeline with DLT
Repo link
- Data not available due to - vacuum or merge or optimize command
- Delta lake becomes - single source of truth based question
- Delta table contains - single/multiple files for history, metadata and data
- Advantage of delta lake over data lake
- Web application is part of - Control plane
- What needs to be done outside repo - pull/push/commit/clone
- Advantage of repo over notebook version - branching
- Delta lake is ACID compliant
- How to avoid duplicates - MERGE
- Question on INSERT OVERWRITE
- Question on Z order for practice exam
- Why Copy into is not working in this code - reference
- Expect or Drop on violation in DLT - reference
- Unity Catalog GRANT ALL PRIVILEGES - when to use?
- Unity Catalog GRANT USAGE - when to use?
- Advantage of array function
- processingTime = "5 seconds"- refer practice exam
- Practice exam Q36 but Continuous + Production
- Which physical object to create for 10 tables so that other teams can use
- Delete metadata but retain files - external table
- When "Streaming Live"- refer practice exam
- PII data using comment - CREATE TABLE <tbl> COMMENT "Contains PII"
- DESCRIBE DATABASE customer360 to get path
- Adv of gold table over silver table
- Bronze vs raw table
- Practice exam Q31 but which one is silver to bronze code
- How to create dependent task in DLT pipeline
- How to speed up query execution - refer practice exam
- How not to run a particular block of code on Sunday
- Where to see DQ metrics in DLT
- Execute DLT from?
- Save cost by using serverless endpoint or control DBU in sql warehouse
- Manager is worried about cost overruns after project release - how to save cost
- Practice exam Q40
- Reduce cluster cost- add autostop in sql endpoint?
- Practice exam Q1
- Practice exam Q3
- spark.table("mytable") or spark.delta.table("mytable") or spark.sql("mytable") in pyspark
- jdbc driver name for sqlite
- Two tables: march_transaction and april_transaction. Create all_transaction without duplicates: join/merge/union
- Practice exam Q27
- Practice exam Q33
- Check failed status of a task in DLT pipeline?
- Practice exam Q42 - webhook or email alert?
- To speed up query - use cluster pools?
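
For the march_transaction / april_transaction item above, a minimal sketch of why UNION (rather than UNION ALL) removes duplicates. It uses Python's built-in sqlite3 as a stand-in for Spark SQL so it runs anywhere; the columns (id, amount) and the sample rows are hypothetical and not taken from the exam.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Spark SQL; columns and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE march_transaction (id INTEGER, amount REAL);
    CREATE TABLE april_transaction (id INTEGER, amount REAL);
    INSERT INTO march_transaction VALUES (1, 10.0), (2, 20.0);
    INSERT INTO april_transaction VALUES (2, 20.0), (3, 30.0);  -- row 2 appears in both
""")

def count_rows(query):
    """Return the number of rows produced by a SELECT statement."""
    return conn.execute(f"SELECT COUNT(*) FROM ({query})").fetchone()[0]

# UNION ALL keeps every row, including the one present in both tables.
union_all_count = count_rows(
    "SELECT * FROM march_transaction UNION ALL SELECT * FROM april_transaction"
)

# UNION drops rows that are identical across the two inputs.
union_count = count_rows(
    "SELECT * FROM march_transaction UNION SELECT * FROM april_transaction"
)

# Build the combined table without duplicates.
conn.execute(
    "CREATE TABLE all_transaction AS "
    "SELECT * FROM march_transaction UNION SELECT * FROM april_transaction"
)
all_rows = conn.execute("SELECT id, amount FROM all_transaction ORDER BY id").fetchall()

print(union_all_count, union_count, all_rows)
```

Note that in PySpark the DataFrame union() method behaves like UNION ALL and keeps duplicates, so it needs a following distinct() to match SQL UNION.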
No matter what, please attempt the practice exam thoroughly. The answer options in each question can become questions of their own. You will have ample time during the assessment.