A HIVEQL TASK


Getting Started:


To run the queries on the data provided, the Databricks Community Edition notebook environment is recommended.

Installation Instructions:


If you are new to Databricks, follow these steps:

  • Step 1: Hive is a tool that provides SQL querying of data stored in HDFS/HBase. In addition to Python notebooks, Databricks offers SQL notebooks. The SQL dialect is designed to be compatible with Apache Hive, i.e. queries can be developed in SQL notebooks on Databricks and then run in Hive. From a Python notebook, the %sql magic command should be used to start querying.

  • Step 2: Import the datasets into Databricks and create a new notebook.

  • Step 3: Use the %sql magic command, then start writing queries to solve the tasks (see the example after this list).
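
A minimal sketch of what a query cell might look like, assuming a Python notebook (where the %sql magic is needed) and a hypothetical table called sample_table created from one of the imported CSV files:

    %sql
    -- sample_table is a hypothetical name; replace it with a table
    -- built from one of the imported datasets.
    SELECT *
    FROM sample_table
    LIMIT 10;

In a SQL notebook the %sql prefix can be omitted, since every cell is already interpreted as SQL.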

Importing datasets into a Databricks notebook:

  • Clone the repo and save it to your desired location on your machine.
  • Launch Databricks and create a cluster.
  • Import the CSV files from the repo into Databricks using the file upload/browse option (a sketch of how an uploaded file can be registered as a table is shown below).
  • Copy the commands from the text file into the notebook to see the solution to the task.
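
As a rough sketch, an uploaded CSV can be registered as a table and then queried with HiveQL-style SQL. The table name employees and the /FileStore/tables/ path below are assumptions for illustration; adjust them to match your actual upload location and dataset:

    -- 'employees' and the path are hypothetical; point them at your uploaded file.
    CREATE TABLE IF NOT EXISTS employees
    USING CSV
    OPTIONS (
      path '/FileStore/tables/employees.csv',
      header 'true',
      inferSchema 'true'
    );

    -- Quick check that the data loaded as expected.
    SELECT * FROM employees LIMIT 10;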