CCA175 Exam Preparation

Required Skills

You need the skills to transfer data between external systems and your cluster. These include the following:

Convert a set of data values in a given format stored in HDFS into new data values or a new data format and write them into HDFS.

Use Spark SQL to interact with the metastore programmatically in your applications. Generate reports by using queries against loaded data.

Use metastore tables as an input source or an output sink for Spark applications
Understand the fundamentals of querying datasets in Spark
Filter data using Spark
Write queries that calculate aggregate statistics
Join disparate datasets using Spark
Produce ranked or sorted data

This is a practical exam, and the candidate should be familiar with all aspects of generating a result, not just writing code.

Supply command-line options to change your application configuration, such as increasing available memory.
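
To make the command-line options concrete, the sketch below assembles a spark-submit invocation in Python and prints it. The helper build_spark_submit, the script name report.py, and the memory sizes are hypothetical choices for illustration; the flags themselves (--master, --deploy-mode, --driver-memory, --executor-memory, --num-executors, --conf) are standard spark-submit options, and at a terminal they would simply be typed directly.

```python
import shlex


def build_spark_submit(app, driver_memory="2g", executor_memory="4g",
                       num_executors=4, conf=None, app_args=()):
    """Return a spark-submit argument list (hypothetical helper).

    Defaults are illustrative values, not recommendations.
    """
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "client",
        "--driver-memory", driver_memory,      # memory for the driver process
        "--executor-memory", executor_memory,  # memory per executor
        "--num-executors", str(num_executors),
    ]
    # Any other Spark property can be passed as --conf key=value.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)        # the application script or jar
    cmd.extend(app_args)   # arguments forwarded to the application
    return cmd


command = build_spark_submit(
    "report.py",
    executor_memory="8g",  # increase available executor memory
    conf={"spark.sql.shuffle.partitions": "50"},
)
print(shlex.join(command))
# prints: spark-submit --master yarn --deploy-mode client --driver-memory 2g
#         --executor-memory 8g --num-executors 4
#         --conf spark.sql.shuffle.partitions=50 report.py   (on one line)
```

The returned list can be handed to subprocess.run to launch the job. Options must come before the application file, because anything after it is passed to the application itself.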