Jenkins jobs to setup hadoop and spark cluster and run spark test
Jenkins- jobs for hadoop and spark setup and TPCDS-run
Pre-requisites:
1. Jenkins is installed on CI machine.Execute below commands for removing useSecurity tag from Jenkins config.xml to remove authentication
ex +g/useSecurity/d +g/authorizationStrategy/d -scwq /var/lib/jenkins/config.xml
sudo -S /etc/init.d/jenkins restart
2. Set passowrdless ssh login for Jenkins user on CI machine to master machine
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh-copy-id -i ~/.ssh/id_rsa.pub unix_user@master_ip
ssh unix_user@master_ip
3. Changes in sudo file to make sudo access passwordless for linux user on master machine
Comment out below lines in sudo visudo file
#Defaults requiretty#Defaults !visiblepw
Also add below line in file
unix_user ALL=(ALL) NOPASSWD: ALL
e.g. testuser ALL=(ALL) NOPASSWD: ALL
How to use:
1. To setup jekins job to use follow below steps one time on CI machine,
git clone https://github.com/nkalband/Jenkins-hadoop-spark-setup-jobs.git
2. cd Jenkins-hadoop-spark-setup-jobs
3. Execute below command on linux prompt to import Jenkins jobs
For Spark build -
java -jar jenkins-cli.jar -noKeyAuth -s http://localhost:8080/ create-job Spark_Weekly_Build_Runnable_dist < ./SPARK-build/Spark_Weekly_Build_Runnable_dist_config.xml
For hadoop setup -
java -jar jenkins-cli.jar -noKeyAuth -s http://localhost:8080/ create-job Setup_hadoop_spark_cluster < ./hadoop-spark-setup/Setup_hadoop_spark_cluster_config.xml
For TPCDS jobs -
java -jar jenkins-cli.jar -noKeyAuth -s http://localhost:8080/ create-job Run_setup_tpcds < ./TPCDS-jobs/Run_setup_tpcds_config.xml
java -jar jenkins-cli.jar -noKeyAuth -s http://localhost:8080/ create-job Run_benchmark_tpcds < ./TPCDS-jobs/Run_benchmark_tpcds_config.xml
4. Access Jenkins using url http://CI-machine-ip:8080 in web browser and then run job `Setup_hadoop_spark_cluster` to setup haddop and spark cluster
5. To setup TPCDS execute jenkins job `Run_setup_tpcds`. While running at start, you can change the default parameters as per memory and core configurations of machines where you want to run the benchmark. At the end of `Run_setup_tpcds` will trigger downstream job for running the benchmark `Run_benchmark_tpcds`