- A big data analytics project that analyzes over 20 GB of airline data using Hadoop MapReduce and Trino.
- Data available at: https://www.bts.gov/topics/airlines-airports-and-aviation
-
For the MapReduce programs:
-
First, compile the Java classes:
javac -classpath `hadoop classpath` *.java
-
Second, create the JAR file:
jar cvf <jobName>.jar *.class
-
Third, put the input data file into HDFS:
hadoop fs -mkdir playGround
hadoop fs -put <airlineData>.csv playGround
-
Fourth, run the MapReduce program:
hadoop jar <jobName>.jar <jobName> playGround/<airlineData>.csv playGround/output
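
The four steps above can be chained into one helper. This is a minimal sketch, not the project's own script: the function name `run_airline_job` and its arguments are hypothetical, `-mkdir -p` is swapped in so that a rerun does not fail on an existing directory, and it assumes the job's driver class takes the HDFS input path and the output path as its two arguments.

```shell
# Sketch only: run_airline_job and its parameters are illustrative names,
# not part of the project.
# Usage: run_airline_job <jobName> <local csv path> [hdfs dir]
run_airline_job() {
  local job="$1" data="$2" dir="${3:-playGround}"

  # 1. Compile the Java classes against the Hadoop classpath.
  javac -classpath "$(hadoop classpath)" *.java || return 1

  # 2. Package the compiled classes into a JAR.
  jar cvf "${job}.jar" *.class || return 1

  # 3. Upload the input file to HDFS (-p: no error if the directory exists).
  hadoop fs -mkdir -p "$dir" || return 1
  hadoop fs -put "$data" "$dir" || return 1

  # 4. Run the job. Assumption: the driver reads <input> <output> in this order.
  hadoop jar "${job}.jar" "$job" "$dir/$(basename "$data")" "$dir/output"
}
```

Hadoop refuses to write into an output directory that already exists, so remove it before rerunning, for example with `hadoop fs -rm -r playGround/output`.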
-
To verify that the program has run and the results are correct:
hadoop fs -ls playGround/output
hadoop fs -cat playGround/output/part-r-00000
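
The check above can be made a little stricter. The sketch below relies on two things that are assumptions about this setup: that the job leaves the `_SUCCESS` marker file MapReduce writes by default on success, and that there may be more than one reducer, so it reads every `part-r-*` file rather than only `part-r-00000`. The name `verify_output` is hypothetical.

```shell
# Sketch only: verify_output is an illustrative helper, not part of the project.
# Usage: verify_output [hdfs output dir]
verify_output() {
  local out="${1:-playGround/output}"

  # A successful job normally leaves an empty _SUCCESS marker in the output dir.
  if ! hadoop fs -test -e "$out/_SUCCESS"; then
    echo "no _SUCCESS marker in $out; the job may have failed" >&2
    return 1
  fi

  # List the output files, then preview the first 20 result lines.
  # The glob is quoted so that Hadoop, not the local shell, expands it.
  hadoop fs -ls "$out" || return 1
  hadoop fs -cat "$out/part-r-*" | head -n 20
}
```

Calling `verify_output` with no argument inspects `playGround/output`.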
-
-
For the Trino commands:
- Start the Trino or Presto shell
- Select a connector that can access the data
- Run the queries
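
The steps above can also be run non-interactively. This is a sketch under assumptions: the Trino CLI is installed as `trino` on the PATH, and the server URL, the `hive` catalog, and the `default` schema are placeholder values to replace with those of your cluster. `trino_query` is a hypothetical helper name.

```shell
# Sketch only: placeholder connection settings, override them for your cluster.
TRINO_SERVER="${TRINO_SERVER:-http://localhost:8080}"
TRINO_CATALOG="${TRINO_CATALOG:-hive}"    # connector (catalog) that can access the data
TRINO_SCHEMA="${TRINO_SCHEMA:-default}"

# Run one SQL statement through the Trino CLI and print the result.
trino_query() {
  trino --server "$TRINO_SERVER" \
        --catalog "$TRINO_CATALOG" \
        --schema "$TRINO_SCHEMA" \
        --execute "$1"
}

# Example call (commented out so that defining the helper has no side effects):
# trino_query "SHOW TABLES"
```

Running the same `trino` command without `--execute` opens the interactive shell instead.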