/Rail-tunnel-recommendation-SQL

Modern Big Data Analysis: recommend which pair of United States airports should be connected with a high-speed passenger rail tunnel.

Primary Language: Shell

Project 1: Rail-tunnel-recommendation-SQL

Recommend which pair of United States airports should be connected with a high-speed passenger rail tunnel.

Description 🚝

  1. The two airports are between 300 and 400 miles apart.
  2. Airlines operate an average of at least 5,000 (five thousand) flights per year in each direction between the two airports.
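The two criteria above can be expressed as a single self-join query. The sketch below is only an illustration: it runs the SQL against an in-memory SQLite database through Python's `sqlite3` module instead of Hive or Impala, and the `routes` table, its column names, and the placeholder airport codes and figures are all assumptions invented for the example, not the project's actual schema or data.

```python
import sqlite3

# Hypothetical table: one row per (origin, dest) direction, already
# aggregated to distance and average flights per year. Not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE routes (
        origin TEXT,
        dest TEXT,
        distance_miles REAL,
        avg_flights_per_year REAL
    )
    """
)
conn.executemany(
    "INSERT INTO routes VALUES (?, ?, ?, ?)",
    [
        ("AAA", "BBB", 350, 6000),  # made-up rows with placeholder codes
        ("BBB", "AAA", 350, 5500),
        ("AAA", "CCC", 320, 7000),
        ("CCC", "AAA", 320, 4000),  # too few flights in this direction
        ("DDD", "EEE", 250, 9000),  # busy, but closer than 300 miles
        ("EEE", "DDD", 250, 9000),
    ],
)

# Join each direction to its reverse so both directions can be checked.
# The a.origin < a.dest condition reports every pair only once.
QUERY = """
    SELECT a.origin,
           a.dest,
           a.distance_miles,
           a.avg_flights_per_year AS flights_out,
           b.avg_flights_per_year AS flights_back
    FROM routes a
    JOIN routes b
      ON a.origin = b.dest AND a.dest = b.origin
    WHERE a.origin < a.dest
      AND a.distance_miles BETWEEN 300 AND 400
      AND a.avg_flights_per_year >= 5000
      AND b.avg_flights_per_year >= 5000
    ORDER BY a.avg_flights_per_year + b.avg_flights_per_year DESC
"""


def candidate_pairs(connection):
    """Return the airport pairs that satisfy both criteria."""
    return connection.execute(QUERY).fetchall()


for row in candidate_pairs(conn):
    print(row)
```

Only the AAA/BBB pair passes both filters in this made-up data. Ordering by combined traffic is one possible way to rank the candidates; the project may weigh other factors.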

Project 2: HDFS and S3

Create one new table in HDFS from three data files stored in AWS S3 that describe an underground tunneling project.

Description (Focus) ☁️

Based on your analysis and on other factors, construction has begun on a tunnel connecting San Francisco and Los Angeles. The tunnel will be dug over a period of ten years. It will be dug in three different sections by three tunnel boring machines (TBMs) named Bertha II, Shai-Hulud, and Diggy McDigface.

Each of these TBMs will generate a large volume of data as it operates. Each TBM will generate the data slightly differently. Simulated versions of the three TBM-generated datasets are provided. It is required to create a table on the VM and load these datasets into it.
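As a rough illustration of loading differently formatted files into one table, the sketch below parses three tiny samples with different delimiters and inserts them into a single table, adding a column that records the source TBM. Everything specific here is an assumption: the delimiters, the `year`, `month`, and `dist` columns, and the sample values are invented, and SQLite stands in for the Hive/Impala table on the VM.

```python
import csv
import io
import sqlite3

# Hypothetical samples: the real files' delimiters and columns are not
# described in this README, so these are placeholders only.
SAMPLES = {
    "Bertha II": (",", "2023,1,10.5\n2023,2,11.0\n"),
    "Shai-Hulud": ("\t", "2023\t1\t9.8\n"),
    "Diggy McDigface": (" ", "2023 1 12.1\n"),
}


def load_tbm_rows(samples):
    """Parse each sample with its own delimiter and load it into one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbm_data (tbm TEXT, year INTEGER, month INTEGER, dist REAL)"
    )
    for tbm, (delimiter, text) in samples.items():
        reader = csv.reader(io.StringIO(text), delimiter=delimiter)
        # Tag every row with the machine that produced it.
        rows = [(tbm, int(y), int(m), float(d)) for y, m, d in reader]
        conn.executemany("INSERT INTO tbm_data VALUES (?, ?, ?, ?)", rows)
    return conn


tbm_conn = load_tbm_rows(SAMPLES)
counts = tbm_conn.execute(
    "SELECT tbm, COUNT(*) FROM tbm_data GROUP BY tbm ORDER BY tbm"
).fetchall()
print(counts)
```

In Hive or Impala, a per-file difference like this would more likely be handled through the table definition (for example its `ROW FORMAT DELIMITED FIELDS TERMINATED BY` clause) than through client-side parsing.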

  1. Use Hive and Impala in the Hue environment to manage the database and query the table.
  2. Use the command line to interact with HDFS and AWS S3.
  3. Manage big data in clusters and cloud storage.
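For the command-line step, a minimal sketch of how the copy commands might be assembled is shown below. It only builds and prints `hdfs dfs` command lines and does not run them. The bucket name, object keys, and HDFS directory are hypothetical placeholders, and the `s3a://` scheme assumes the cluster has the S3A connector and credentials configured.

```python
import shlex


def build_copy_commands(bucket, keys, hdfs_dir):
    """Build hdfs dfs commands that copy S3 objects into an HDFS directory."""
    # Create the target directory first; -p makes this safe to repeat.
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    for key in keys:
        source = f"s3a://{bucket}/{key}"
        commands.append(["hdfs", "dfs", "-cp", source, hdfs_dir])
    return commands


# Placeholder names for illustration only, not the project's real locations.
cmds = build_copy_commands(
    "example-bucket",
    ["tbm/file_one.csv", "tbm/file_two.tsv"],
    "/user/hive/warehouse/tbm_data",
)
for cmd in cmds:
    print(shlex.join(cmd))
```

Each printed line could be pasted into a terminal on the VM, or the argument lists could be passed to `subprocess.run` once the paths have been checked.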


🔚