elephantscale/training-qa

Advanced questions for Hadoop and Spark


Add a new slide deck or modify existing ones

How to determine the number of buckets
What determines the number of files
How replication works and how failover works
Read through the explain plan for a Hive query
How YARN allocates Spark containers
How to size your executor memory
What to look at after running Spark jobs
How to look at YARN logs
Driver memory
Walk through the documentation fast but then spend more time helping understand how things
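
For the executor memory and YARN container items, a small worked calculation might be a useful starting point for a slide. The sketch below assumes Spark's documented default memory overhead (the larger of 384 MiB and 10% of executor memory) and assumes YARN rounds each request up to a multiple of `yarn.scheduler.minimum-allocation-mb`; the rounding behaviour depends on the scheduler and its configuration, so treat the numbers as illustrative. The function names and the 64 GiB node are hypothetical.

```python
import math


def yarn_container_mb(executor_memory_mb,
                      overhead_factor=0.10,
                      min_overhead_mb=384,
                      yarn_min_allocation_mb=1024):
    """Estimate the YARN container size for one Spark executor.

    Assumption: overhead = max(384 MiB, 10% of executor memory), and the
    request is rounded up to a multiple of the YARN minimum allocation.
    """
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    requested_mb = executor_memory_mb + overhead_mb
    # Round up to the next multiple of the minimum allocation.
    return math.ceil(requested_mb / yarn_min_allocation_mb) * yarn_min_allocation_mb


def executors_per_node(node_memory_mb, container_mb):
    """How many containers of this size fit in the memory YARN manages on a node."""
    return node_memory_mb // container_mb


# Hypothetical example: --executor-memory 4g on a node offering 64 GiB to YARN.
container = yarn_container_mb(4096)
print(container)                             # 5120
print(executors_per_node(65536, container))  # 12
```

The point for students is that a 4 GiB executor does not cost 4 GiB of cluster memory: under these assumptions it occupies a 5 GiB container, which changes how many executors fit on each node.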