/DS503-Big-Data-Management

Design an interface to allow recursive queries implementation with Hive+Hadoop

Primary LanguagePython

Implementing Recursive Queries on Hive+Hadoop

Dependencies

Recommended: follow these instructions for setup https://www.edureka.co/community/1828/installing-hive-hadoop-in-vm

Test Environment

Tested on Ubuntu 20.04 with Hadoop installed in single cluster mode and Hive installed on top.

Setup

If not already in code directory:

cd code/

Then run:

pip install -r requirements.txt

Now you can run the system:

python3 main.py

Background

Recursive queries (or also called Transitive Closure Queries) are a class of queries that first build an initial answer state (say R), and then use this answer state to execute more queries. The results from each execution get augmented to the answer state. The execution continues until no more records are augmented to the answer state. Here is an example of how a recursive query looks like. recursive

This project is to

  • Build a good understanding on the class of recursive queries and how they work
  • Implement a high-level interface for end-users to enter and submit a recursive query.
  • The interface should parse the query, and decide on the sequence of queries it will generate. And then submit these queries (one at a time) to the Hive engine.
  • Hive does not see the recursive queryitself; it just sees one isolated query at a time. The higher-level interface is controlling the execution of the entire recursive query.

Implementation

The solution was implemented to support command line queries and a graphical user interface. The interface is what will be detailed in this section as it is an example implementation of howto leverage the functionality. The interface was completed with PyQt5 following a Model-View-Controller design paradigm. The controller section depicts the interaction with the Hivequery tool. The interface is able to accept both recursive and non-recursive commands enteredinto the multi-line text box. The user may then click Execute! to process the commands. Therecursive results will be saved in a.tmp/directory, however, this can be easily modified.

The user interface of the application execute

The application is processing user-input recursive query written in SQL processing

Final Report

To view the final report for this project, please navigate to this page

Contributers

Shijing Yang syang6@wpi.edu | Davis Catherman dscatherman@wpi.edu