This example runs on MapR 6.1 and Spark 2.3.1 or later.

Install and fire up the Sandbox using the instructions here: 


Step 1: Log into the Sandbox, create the data directory, MapR Event Store topic and MapR Database table:

Use an SSH client such as PuTTY (Windows) or Terminal (Mac) to log in. See below for an example.
Use userid: mapr and password: mapr.

For VMWare use:  $ ssh mapr@ipaddress 

For Virtualbox use:  $ ssh mapr@ -p 2222 

After logging in, at the Sandbox Unix command line, create a directory for the data for this project:

mkdir /user/mapr/data
hadoop fs -mkdir /user/mapr/data


Step 2: Copy the data file to the MapR sandbox or your MapR cluster

Copy the data files from the project data folder to the /user/mapr/data directory on the sandbox using scp:

For VMWare use:  $ scp  *.json  mapr@<ipaddress>:/mapr/
For Virtualbox use:  $ scp -P 2222 data/*.json  mapr@

This will put the data files into the cluster directory: 

Step 3: To run the code in the Spark Shell:
/opt/mapr/spark/spark-*/bin/spark-shell --master local[2]
 - For YARN, change the --master parameter to yarn: "--master yarn --deploy-mode client" (the older "--master yarn-client" form is deprecated in Spark 2.x)
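
The spark-* wildcard in the path above only works when exactly one Spark version is installed under /opt/mapr/spark. As a minimal sketch for the case where several are present, the helper below picks the highest-versioned directory. The function name latest_spark_home is hypothetical and not part of this project, and it assumes GNU sort (for the -V option), which is the case on a typical Linux sandbox.

```shell
# Hypothetical helper (not part of the project): print the highest-versioned
# spark-* directory under a base directory, defaulting to /opt/mapr/spark.
latest_spark_home() {
  local base="${1:-/opt/mapr/spark}"
  local dir
  # List matching directories, sort them by version number, keep the last one.
  dir="$(ls -d "$base"/spark-*/ 2>/dev/null | sort -V | tail -n 1)"
  if [ -z "$dir" ]; then
    echo "no spark-* directory found under $base" >&2
    return 1
  fi
  # Strip the trailing slash that the directory glob adds.
  printf '%s\n' "${dir%/}"
}
```

With the function defined, the shell could be started with: "$(latest_spark_home)"/bin/spark-shell --master 'local[2]'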


Step 4: To submit the code as a Spark application: build the project and copy the JAR file

Build the project with Maven, either from an IDE such as IntelliJ, Eclipse or NetBeans, or from the command line.
If you build on your laptop, copy the JAR file to your MapR Sandbox afterwards; alternatively, install Maven on your sandbox and build from the Linux command line.
For more information on Maven, Eclipse or NetBeans, search online.

This creates the project JAR in the target directory; the commands below use mapr-spark-flightdelay-1.0.jar.


After building the project on your laptop, you can use scp to copy your JAR file from the project target folder to the MapR Sandbox:

From your laptop command line or with an scp tool:

For VMWare use:  $ scp  nameoffile.jar  mapr@ipaddress:/mapr/

For Virtualbox use:  $ scp -P 2222 target/*.jar  mapr@

This will put the JAR file into the directory: 


Step 5:

 To run the application code for Datasets, DataFrames and Spark SQL

From the Sandbox command line :

/opt/mapr/spark/spark-*/bin/spark-submit --class dataset.Flight --master local[2]  mapr-spark-flightdelay-1.0.jar 

This will read from the file "/mapr/" 

You can optionally pass the file as an input parameter (take a look at the code to see what it does).


 To run the application code for Machine Learning Classification

From the Sandbox command line :

/opt/mapr/spark/spark-*/bin/spark-submit --class machinelearning.Flight --master local[2]  mapr-spark-flightdelay-1.0.jar 

This will read from the file mfs:///mapr/ 

You can optionally pass the file as an input parameter (take a look at the code to see what it does).


Preparation for Structured Streaming with MapR Event Store for Kafka and MapR Database:

Use the maprcli command line interface to create a stream and a topic, get info on the topic, and create a table:

maprcli stream create -path /user/mapr/stream -produceperm p -consumeperm p -topicperm p
maprcli stream topic create -path /user/mapr/stream -topic flights  

To get info on the flights topic:
maprcli stream topic info -path /user/mapr/stream -topic flights

Create the MapR Database table that the streaming job will write to:

maprcli table create -path /user/mapr/flighttable -tabletype json -defaultreadperm p -defaultwriteperm p
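
If you want to repeat this setup with different paths, the commands above can be generated from three values. The sketch below only prints the commands so they can be reviewed before running. The function name print_setup_commands is hypothetical, and the flags are copied from the commands above rather than taken from any further maprcli documentation.

```shell
# Hypothetical helper (not part of the project): print, but do not run, the
# maprcli setup commands for a stream path, a topic name and a table path.
print_setup_commands() {
  if [ "$#" -ne 3 ]; then
    echo "usage: print_setup_commands <stream-path> <topic> <table-path>" >&2
    return 2
  fi
  # Create the stream with public produce, consume and topic permissions.
  printf 'maprcli stream create -path %s -produceperm p -consumeperm p -topicperm p\n' "$1"
  # Create the topic inside that stream.
  printf 'maprcli stream topic create -path %s -topic %s\n' "$1" "$2"
  # Create the JSON table with public default read and write permissions.
  printf 'maprcli table create -path %s -tabletype json -defaultreadperm p -defaultwriteperm p\n' "$3"
}
```

Call it with the stream path, the topic name and the same table path that the consumer and query steps use, for example: print_setup_commands /user/mapr/stream flights <table-path>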

Run the Java code to publish events to the topic:

java -cp ./mapr-spark-flightdelay-1.0.jar:`mapr classpath` streams.MsgProducer

This client will read lines from the file in "/mapr/" and publish them to the topic /user/mapr/stream:flights. 
You can optionally pass the file and topic as input parameters <file topic> 
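
The topic argument uses the full topic name, which joins the stream path and the topic name with a colon (the -path and -topic values given to maprcli earlier). As a small illustration, the two hypothetical helpers below split a full topic name back into those parts; they are not part of this project and assume the name contains a colon.

```shell
# Hypothetical helpers: split a full topic name of the form <stream-path>:<topic>.
# If the argument contains no colon, both functions print it unchanged.

# Print everything before the first colon (the stream path).
stream_path_of() { printf '%s\n' "${1%%:*}"; }

# Print everything after the first colon (the topic name).
topic_name_of() { printf '%s\n' "${1#*:}"; }
```

For /user/mapr/stream:flights, stream_path_of prints /user/mapr/stream and topic_name_of prints flights.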

Optional: run the MapR Streams Java consumer to see what was published:

java -cp mapr-spark-flightdelay-1.0.jar:`mapr classpath` streams.MsgConsumer 


Run the Spark Structured Streaming client to consume events, enrich them, and write them to MapR Database
(use separate consoles if you want to run it at the same time as the Java publisher)

From the Sandbox command line :

/opt/mapr/spark/spark-*/bin/spark-submit --class stream.StructuredStreamingConsumer --master local[2]  mapr-spark-flightdelay-1.0.jar 

This Spark streaming client will consume from the topic /user/mapr/stream:flights, enrich from the saved model at
/mapr/ and write to the table /user/mapr/flighttable.
You can optionally pass the input parameters <topic model table>.
Use Ctrl-C to stop.

In another window, while the streaming code is running, run the code to query MapR Database:

/opt/mapr/spark/spark-*/bin/spark-submit --class sparkmaprdb.QueryFlight --master local[2] \

 Use the MapR Database shell to query the data

Start the MapR Database shell (mapr dbshell) and run find to see the results:

$ /opt/mapr/bin/mapr dbshell

maprdb mapr:> jsonoptions --pretty true --withtags false

maprdb mapr:> find /user/mapr/flighttable --limit 5


 To run the application code for GraphFrames
To read from MapR Database into GraphFrames

From the Sandbox command line :

/opt/mapr/spark/spark-*/bin/spark-submit --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 --class graphmaprdb.Flight --master local[2]  mapr-spark-flightdelay-1.0.jar