# Welcome to the HDPCD Repository

This repository contains all the documents related to the HDPCD certification.

You can use this repository to prepare for the Hortonworks Data Platform Certified Developer (HDPCD) certification. The official exam objectives are published at https://hortonworks.com/services/training/certification/exam-objectives/#hdpcd

The following objectives are tested in this certification:

## DATA INGESTION
- Import data from a table in a relational database into HDFS
- Import the results of a query from a relational database into HDFS
- Import a table from a relational database into a new or existing Hive table
- Insert or update data from HDFS into a table in a relational database
- Given a Flume configuration file, start a Flume agent
- Given a configured sink and source, configure a Flume memory channel with a specified capacity
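
The four Sqoop objectives above differ mainly in which flags they combine. The sketch below is a hypothetical Python helper (`sqoop_import` and `sqoop_export` are illustrative names, not part of Sqoop) that assembles the `sqoop` argument list for each case and encodes a few rules from the Sqoop user guide: a free-form `--query` must contain the `$CONDITIONS` token and needs `--target-dir`, a parallel query import also needs `--split-by`, and `--update-key` with `--update-mode allowinsert` turns an export into an upsert. The JDBC URL, user, table and directory names are placeholders.

```python
import shlex
from typing import List, Optional


def sqoop_import(connect: str, username: str, *, table: Optional[str] = None,
                 query: Optional[str] = None, target_dir: Optional[str] = None,
                 split_by: Optional[str] = None, num_mappers: int = 1,
                 hive_table: Optional[str] = None) -> List[str]:
    """Build a `sqoop import` argument list (hypothetical helper)."""
    if (table is None) == (query is None):
        raise ValueError("pass exactly one of table= or query=")
    # -P makes Sqoop prompt for the password instead of taking it on the command line.
    args = ["sqoop", "import", "--connect", connect, "--username", username, "-P"]
    if table is not None:
        args += ["--table", table]
    else:
        # Each mapper replaces $CONDITIONS with its own split predicate.
        if "$CONDITIONS" not in query:
            raise ValueError("a free-form query must contain $CONDITIONS")
        if target_dir is None:
            raise ValueError("--query imports require --target-dir")
        if num_mappers > 1 and split_by is None:
            raise ValueError("parallel --query imports require --split-by")
        args += ["--query", query]
    if target_dir is not None:
        args += ["--target-dir", target_dir]
    if split_by is not None:
        args += ["--split-by", split_by]
    if hive_table is not None:
        # Load the imported files into a Hive table (new or existing).
        args += ["--hive-import", "--hive-table", hive_table]
    args += ["-m", str(num_mappers)]
    return args


def sqoop_export(connect: str, username: str, table: str, export_dir: str, *,
                 update_key: Optional[str] = None,
                 allow_insert: bool = False) -> List[str]:
    """Build a `sqoop export` argument list (hypothetical helper)."""
    args = ["sqoop", "export", "--connect", connect, "--username", username, "-P",
            "--table", table, "--export-dir", export_dir]
    if update_key is not None:
        # Without --update-key, every HDFS record is exported as an INSERT.
        args += ["--update-key", update_key]
        if allow_insert:
            # Update matching rows and insert the rest (not every connector supports it).
            args += ["--update-mode", "allowinsert"]
    elif allow_insert:
        raise ValueError("allow_insert needs update_key")
    return args


if __name__ == "__main__":
    url = "jdbc:mysql://dbhost/retail_db"  # placeholder connection string
    # shlex.join quotes the query so the shell does not expand $CONDITIONS.
    print(shlex.join(sqoop_import(url, "retail_user",
                                  query="SELECT id, name FROM customers WHERE $CONDITIONS",
                                  target_dir="/user/hdpcd/customers_query")))
```

Printing through `shlex.join` shows the single quotes the query needs on a real command line; in an interactive shell you would type the same quoting by hand.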

## DATA TRANSFORMATION
- Write and execute a Pig script
- Load data into a Pig relation without a schema
- Load data into a Pig relation with a schema
- Load data from a Hive table into a Pig relation
- Use Pig to transform data into a specified format
- Transform data to match a given Hive schema
- Group the data of one or more Pig relations
- Use Pig to remove records with null values from a relation
- Store the data from a Pig relation into a folder in HDFS
- Store the data from a Pig relation into a Hive table
- Sort the output of a Pig relation
- Remove the duplicate tuples of a Pig relation
- Specify the number of reduce tasks for a Pig MapReduce job
- Join two datasets using Pig
- Perform a replicated join using Pig
- Run a Pig job using Tez
- Within a Pig script, register a JAR file of User Defined Functions
- Within a Pig script, define an alias for a User Defined Function
- Within a Pig script, invoke a User Defined Function
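
A replicated (fragment-replicate) join in Pig, written `JOIN big BY key, small BY key USING 'replicated';`, loads every relation after the first into memory and streams the first one past it, so the join runs without a reduce phase. The Python sketch below models that idea as an in-memory hash join. `replicated_join` and the sample relations are hypothetical, and the code only imitates the semantics (inner or left outer join, null keys never match); it is not how Pig implements the operator.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]


def replicated_join(big: Iterable[Row], small: Iterable[Row], big_key: int,
                    small_key: int, small_width: int,
                    left_outer: bool = False) -> Iterator[Row]:
    """Hash join that keeps `small` in memory and streams `big` once."""
    # Build phase: index the small ("replicated") relation by its join key.
    index: Dict[Any, List[Row]] = {}
    for row in small:
        key = row[small_key]
        if key is not None:  # null keys never match
            index.setdefault(key, []).append(row)
    # Probe phase: one pass over the big relation, with no sort or shuffle.
    for row in big:
        key = row[big_key]
        matches = index.get(key, []) if key is not None else []
        for match in matches:
            yield row + match
        if not matches and left_outer:
            # Left outer join: keep the unmatched big row, padded with nulls.
            yield row + (None,) * small_width


if __name__ == "__main__":
    # Hypothetical relations: orders(id, customer_id, amount) and customers(id, name).
    orders = [(1, "c1", 10.0), (2, "c2", 5.0), (3, None, 7.5), (4, "c9", 1.0)]
    customers = [("c1", "Ann"), ("c2", "Bob")]
    for joined in replicated_join(orders, customers, big_key=1, small_key=0,
                                  small_width=2):
        print(joined)
```

The design mirrors why Pig asks for the small relations to be listed last: only the right-hand side has to fit in memory, while the left-hand side can be arbitrarily large because it is read once.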

## DATA ANALYSIS
- Write and execute a Hive query
- Define a Hive-managed table
- Define a Hive external table
- Define a partitioned Hive table
- Define a bucketed Hive table
- Define a Hive table from a select query
- Define a Hive table that uses the ORCFile format
- Create a new ORCFile table from the data in an existing non-ORCFile Hive table
- Specify the storage format of a Hive table
- Specify the delimiter of a Hive table
- Load data into a Hive table from a local directory
- Load data into a Hive table from an HDFS directory
- Load data into a Hive table as the result of a query
- Load a compressed data file into a Hive table
- Update a row in a Hive table
- Delete a row from a Hive table
- Insert a new row into a Hive table
- Join two Hive tables
- Run a Hive query using Tez
- Run a Hive query using vectorization
- Output the execution plan for a Hive query
- Use a subquery within a Hive query
- Output data from a Hive query that is totally ordered across multiple reducers
- Set a Hadoop or Hive configuration property from within a Hive query
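
The totally-ordered-output objective turns on the difference between per-reducer and global ordering. In Hive, `ORDER BY` guarantees a total order, classically by sending every row through a single reducer, while `SORT BY` only sorts the rows within each reducer and `DISTRIBUTE BY` routes rows to reducers by hashing the given columns. A total order across several reducers needs range partitioning, the idea behind Hadoop's `TotalOrderPartitioner`: split points are chosen from a sample so that every key sent to reducer i is no larger than any key sent to reducer i+1. The Python sketch below is a hypothetical model of that idea, not Hive's code; the function names and sample keys are illustrative.

```python
import bisect
from typing import List, Sequence


def choose_split_points(sample: Sequence[int], num_reducers: int) -> List[int]:
    """Pick num_reducers - 1 cut points at evenly spaced quantiles of a sample."""
    if not sample or num_reducers < 1:
        raise ValueError("need a non-empty sample and at least one reducer")
    ordered = sorted(sample)
    return [ordered[i * len(ordered) // num_reducers] for i in range(1, num_reducers)]


def range_partition(keys: Sequence[int], split_points: List[int]) -> List[List[int]]:
    """Send each key to the reducer whose key range contains it."""
    parts: List[List[int]] = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        parts[bisect.bisect_right(split_points, key)].append(key)
    return parts


def hash_partition(keys: Sequence[int], num_reducers: int) -> List[List[int]]:
    """Send each key to reducer hash(key) mod n, like hash-based distribution."""
    parts: List[List[int]] = [[] for _ in range(num_reducers)]
    for key in keys:
        parts[hash(key) % num_reducers].append(key)
    return parts


def reduce_and_concatenate(parts: List[List[int]]) -> List[int]:
    """Each reducer sorts its own input; outputs are then read in reducer order."""
    result: List[int] = []
    for part in parts:
        result.extend(sorted(part))
    return result


if __name__ == "__main__":
    keys = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
    splits = choose_split_points(keys, 3)
    print(reduce_and_concatenate(range_partition(keys, splits)))  # globally sorted
    print(reduce_and_concatenate(hash_partition(keys, 3)))        # sorted only per reducer
```

Hash distribution produces output files that are each sorted but interleave when concatenated; range partitioning keeps the parallelism and makes the concatenation sorted, at the cost of sampling the keys first to pick balanced split points.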

I hope you like it. You can visit my LinkedIn profile at https://www.linkedin.com/in/milindjagre/