These are memory-based questions from the assessment conducted in Nov 2022
Suggestions
- Practice thoroughly from these notebooks and the PDF below
- de-mod-0-get-started-with-pyspark-programming
- de-mod-1-get-started-with-databricks-data-science-and-engineering-workspace
- de-mod-2-transform-data-with-spark
- de-mod-3-manage-data-with-delta-lake
- de-mod-4-build-data-pipelines-with-delta-live-tables
- de-mod-5-deploy-workloads-with-databricks-workflows
- Practice "PracticeExam" questions available in this repo.
- Every option in a "PracticeExam" question becomes a question in the actual exam.
- Read Databricks Certified Associate Data Engineer Exam thoroughly.
- Read Manage data with delta lake
- Read Build pipeline with DLT
Repo link
Questions
- Data not available due to - vacuum or merge or optimize command
- Delta lake becomes - single source of truth based question
- Delta table contains - single/multiple files for history, metadata and data
- Advantage of delta lake over data lake
- Web application is part of - Control plane
- What needs to be done outside repo - pull/push/commit/clone
- Advantage of repo over notebook version - branching
- Delta lake is ACID compliant
- How to avoid duplicates - MERGE
- Question on INSERT OVERWRITE
- Question on Z order for practice exam
- Why Copy into is not working in this code - reference
- Expect or Drop on violation in DLT - reference
- Unity catalog Grant All privileges - When to use?
- Unity catalog Grant Usage- When to use?
- Advantage of array function
- processingTime = "5 seconds"- refer practice exam
- Practice exam Q36 but Continuous + Production
- Which physical object to create for 10 tables so that other teams can use
- Delete metadata but retain file - external table
- When "Streaming Live"- refer practice exam
- PII data using comment - CREATE TABLE <tbl> COMMENT "Contains PII"
- DESCRIBE DATABASE customer360 to get path
- Adv of gold table over silver table
- Bronze vs raw table
- Practice exam Q31 but which one is silver to bronze code
- How to create dependent task in DLT pipeline
- How to speed up query execution - refer practice exam
- How not to run a particular block of code on Sunday
- Where to see DQ metrics in DLT
- Execute DLT from?
- Save cost by using serverless endpoint or control DBU in sql warehouse
- Manager is worried about over costing after project release - how to save cost
- Practice exam Q40
- Reduce cluster cost- add autostop in sql endpoint?
- Practice exam Q1
- Practice exam Q3
- spark.table("mytable") or spark.delta.table("mytable") or spark.sql("mytable") in pyspark
- jdbc driver name for sqlite
- two table = march_transaction and april_transaction. create all_transaction without duplicates = join/merge/union
- Practice exam Q27
- Practice exam Q33
- Check failed status of a task in DLT pipeline?
- Practice exam Q42 - webhook or email alert?
- To speed up query - use cluster pools?
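Two of the items above (skipping a block of code on Sunday, and building all_transaction from march_transaction and april_transaction without duplicates) can be sketched in plain Python. This is only an illustration under assumptions: it uses the standard-library sqlite3 module as a stand-in for Spark SQL, the column names `id` and `amount` are hypothetical, and the helper names `should_run` and `combine_without_duplicates` are made up for this example. The point it shows is that SQL `UNION` removes duplicate rows, which is the behaviour the join/merge/union question relies on.

```python
import sqlite3
from datetime import date


def should_run(today: date) -> bool:
    """Return False on Sundays; date.weekday() is 6 for Sunday."""
    return today.weekday() != 6


def combine_without_duplicates(march_rows, april_rows):
    """Combine two transaction tables with SQL UNION, which drops duplicate rows.

    The (id, amount) schema is a hypothetical example, not the exam's schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE march_transaction (id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE april_transaction (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO march_transaction VALUES (?, ?)", march_rows)
    conn.executemany("INSERT INTO april_transaction VALUES (?, ?)", april_rows)
    # UNION (unlike UNION ALL) keeps only distinct rows across both tables.
    rows = conn.execute(
        "SELECT id, amount FROM march_transaction "
        "UNION "
        "SELECT id, amount FROM april_transaction "
        "ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    if should_run(date.today()):
        print(combine_without_duplicates([(1, 10.0), (2, 20.0)],
                                         [(2, 20.0), (3, 30.0)]))
```

In a notebook the same guard would wrap whatever block should be skipped on Sunday; the exam version may use a different day-of-week variable, so treat the `weekday()` check as one possible form.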
Lessons Learned
No matter what, please attempt the practice exam thoroughly. The answer options in each question become questions of their own. You will have ample time during the assessment.