Scripts to test Spark on CEPH

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Spark on CEPH

First, copy the runner template and fill in the missing parameters for your setup:

cp run_read.template.sh run_read.sh

Specify the URL of your master:

--master

You will have to specify the Mesos parameters:

--conf spark.mesos.principal=
--conf spark.mesos.secret=
--conf spark.mesos.role=

and some Hadoop properties to access S3:

--conf spark.hadoop.fs.s3a.endpoint=
--conf spark.hadoop.fs.s3a.access.key=
--conf spark.hadoop.fs.s3a.secret.key=

and do not forget to set your HOME directory for the executors (Mesos does not know it):

--conf spark.executorEnv.HOME=
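
To see how the options listed above fit together, here is a minimal Python sketch that assembles them into one `spark-submit` argument list and refuses to continue if any value is left empty. It is only an illustration, not the content of the runner script: the function name `build_spark_submit_args`, the dictionary keys, and the `app` argument are hypothetical, and it assumes the runner ultimately calls `spark-submit`.

```python
import shlex


def build_spark_submit_args(master, mesos, s3a, home, app, app_args=()):
    """Assemble a spark-submit argument list (hypothetical helper).

    The configuration keys come from the README; the function name,
    the dict layout and the `app` argument are assumptions.
    """
    conf = {
        "spark.mesos.principal": mesos.get("principal"),
        "spark.mesos.secret": mesos.get("secret"),
        "spark.mesos.role": mesos.get("role"),
        "spark.hadoop.fs.s3a.endpoint": s3a.get("endpoint"),
        "spark.hadoop.fs.s3a.access.key": s3a.get("access_key"),
        "spark.hadoop.fs.s3a.secret.key": s3a.get("secret_key"),
        "spark.executorEnv.HOME": home,
    }

    # Collect every setting that was left empty, so the user sees them all at once.
    missing = [key for key, value in conf.items() if not value]
    if not master:
        missing.insert(0, "--master")
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))

    args = ["spark-submit", "--master", master]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    args.extend(app_args)
    return args


def as_shell_command(args):
    """Render the argument list as a single, safely quoted shell line."""
    return shlex.join(args)
```

Failing early on an empty value is a design choice of this sketch: a missing setting is reported before the job is submitted rather than surfacing later as an authentication or path error on the cluster.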

Finally, launch a job using:

# reading file on s3
./run.sh read s3a://SPARKJULIEN/xyz_v1.1.4_mass_mock_native.parquet

# writing file on s3
./run.sh write s3a://SPARKJULIEN/xyz_v1.1.4_mass_mock_native.parquet

# reading file on hdfs
./run.sh read hdfs:///user/julien.peloton/xyz_v1.1.4_mass_mock_native.parquet

# writing file on hdfs
./run.sh write hdfs:///user/julien.peloton/xyz_v1.1.4_mass_mock_native.parquet
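
The two kinds of path used above differ in shape: an `s3a://` URI names a bucket followed by an object key, while `hdfs:///` with an empty authority relies on the cluster's default namenode. The following Python sketch, using only the standard library, makes that distinction explicit. The function `describe_target` and the keys of the returned dictionary are hypothetical and are not part of the repository's scripts.

```python
from urllib.parse import urlparse


def describe_target(mode, uri):
    """Split a job target into its parts (hypothetical helper).

    `mode` is "read" or "write", as in the commands above;
    `uri` is an s3a:// or hdfs:// path.
    """
    if mode not in ("read", "write"):
        raise ValueError(f"unknown mode: {mode!r}")

    parsed = urlparse(uri)
    if parsed.scheme == "s3a":
        # s3a://<bucket>/<key>
        return {
            "mode": mode,
            "store": "s3a",
            "bucket": parsed.netloc,
            "key": parsed.path.lstrip("/"),
        }
    if parsed.scheme == "hdfs":
        # hdfs://<namenode>/<path>; an empty namenode means "use the default"
        return {
            "mode": mode,
            "store": "hdfs",
            "namenode": parsed.netloc or None,
            "path": parsed.path,
        }
    raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
```

For instance, `describe_target("read", "s3a://SPARKJULIEN/xyz_v1.1.4_mass_mock_native.parquet")` reports the bucket `SPARKJULIEN`, whereas the `hdfs:///user/julien.peloton/...` form yields no explicit namenode.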