BigDataHOLExercises

Hands On Labs

Hands on Lab version 1.3 - all docs and scripts.

Big Data Hands On Labs (Abstract)

Story: In this workshop we will explore the Big Data process as it pertains to an insurance business case. Although this is one particular story, the process flow, exercises, methodologies and tools apply universally. We will see how the flow of Acquire -> Organize -> Analyze -> Decide applies to our use case and how the Oracle set of products can help us achieve our goal.

Problem Statement: We are a banking & insurance company and we need to target a marketing campaign for car insurance. Rather than use a scatter-gun approach, we will target existing customers who already have a history with our company. We need to find the customers who are most likely to purchase insurance and focus our sales effort on them. We will therefore study a sample of customers who have already decided whether or not to buy car insurance from our company, and we will try to build a specific profile of the customers who bought it. We will then use the profiles identified through our analysis to target the remaining customers, who have not yet decided whether to acquire our insurance services.

The Methodology: The key to insight is data. Our first step will therefore be to find what data we can aggregate, and from where, to enable the analysis of our customers.

In this workshop we will:

  1. Perform a sentiment analysis
  2. Find data related to banking transactions in our production NoSQL database and aggregate it
  3. Find data in our cash accounts system and aggregate it
  4. Find data from our internal credit department and aggregate it
  5. Centralize and merge all aggregated data into our Oracle 12c Database, to create a 360 degree view of the customer
  6. Analyze the merged data to identify the target audience for our marketing campaign
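
To make the aggregate-and-merge idea in steps 2 to 5 concrete, here is a minimal sketch in plain Python. It is not the lab's implementation: the labs use NoSQL, Hadoop and the Oracle 12c Database for this work. The record layouts, field names (`customer_id`, `amount`, `balance`, `credit_balance`), sample values and helper functions are all hypothetical and serve only to illustrate how per-source aggregates can be combined into one row per customer.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are illustrative only
# and do not come from the lab data sets.
transactions = [  # stand-in for the production NoSQL banking transactions
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 40.0},
]
cash_accounts = [  # stand-in for the cash accounts system
    {"customer_id": 1, "balance": 1500.0},
    {"customer_id": 2, "balance": 300.0},
]
credit_records = [  # stand-in for the internal credit department
    {"customer_id": 1, "credit_balance": 0.0},
    {"customer_id": 2, "credit_balance": 250.0},
]


def aggregate(records, value_field):
    """Sum value_field per customer_id for one data source."""
    totals = defaultdict(float)
    for record in records:
        totals[record["customer_id"]] += record[value_field]
    return dict(totals)


def merge_views(**sources):
    """Merge per-customer aggregates into one row per customer."""
    merged = defaultdict(dict)
    for column_name, totals in sources.items():
        for customer_id, value in totals.items():
            merged[customer_id][column_name] = value
    return dict(merged)


# One merged row per customer: a toy version of the 360 degree view.
customer_view = merge_views(
    total_spent=aggregate(transactions, "amount"),
    cash_balance=aggregate(cash_accounts, "balance"),
    credit_balance=aggregate(credit_records, "credit_balance"),
)

for customer_id, row in sorted(customer_view.items()):
    print(customer_id, row)
```

In the workshop itself, each aggregation runs in the system that holds the data, and the merge happens in the Oracle Database; the sketch only shows the shape of the result that step 6 analyzes.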

Outline:

• Hadoop Word Count
  o Introduction to Hadoop Map Reduce
  o Word Count
• Oracle NoSQL Database
  o Introduction to NoSQL
  o Insert and retrieve data from the NoSQL Database
  o Oracle External Tables pointing to NoSQL data
  o NoSQL and Transactional Data
• Pig Exercise
  o Introduction to Pig
  o Working with PIG
• Hive Coding
  o Introduction to Hive
  o Queries with Hive
• Working with Cloudera Impala
  o Impala Concepts and Architecture
• Working with the Oracle Loader for Hadoop
  o Introduction to the Oracle Loader for Hadoop
  o Loading HDFS data into the Oracle Database
• Working with the Oracle SQL Connector for HDFS
  o Introduction to the Oracle SQL Connector for HDFS
  o Configuring External Tables stored in HDFS
• Oracle ODI and Hadoop
• Oracle Big Data SQL
  o External tables to Hive
  o External tables using JSON formatting
• Introduction to ODI Application Adapter for Hadoop
  o Setup and Reverse Engineering in ODI
  o Using ODI to move data from Hive to Hive
  o Using ODI to move data from Hive to Oracle
• XQuery for Hadoop (OXH)
  o Introduction to Oracle OXH XQuery for Hadoop
  o XML and XQuery primer
  o Overview OXH XQuery hands on exercise
• Programming with R
  o Introduction to Enterprise R
• Working with Cloudera Search (Solr)
  o Introducing Cloudera Search
  o Cloudera Search Features
  o Cloudera Search Tasks and Processes
• Introduction to Spark
  o Example using Scala
  o Example using PySpark