This New Relic standalone integration polls the Apache Spark REST API for metrics and pushes them into New Relic using the Metric API. It uses the New Relic Telemetry SDK for Go.
Requires Apache Spark running in standalone mode (YARN and Mesos are not yet supported).
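As a rough illustration of the poll-and-push flow described above, the sketch below converts one job record into gauge metrics and wraps them in the list-of-batches JSON envelope used by the New Relic Metric API. It is written in Python for readability only; the integration itself is written in Go, and this is not its actual code. The field names follow the job records returned by Spark's `/api/v1/applications/[app-id]/jobs` endpoint. The metric names (`spark.job.*`), the attribute names, and the helper functions are assumptions made for the example.

```python
import json
import time


def job_to_metrics(job, cluster_name, tags=None, now_ms=None):
    """Convert one Spark job record into a list of gauge metrics.

    The metric and attribute names are hypothetical, chosen for illustration.
    """
    timestamp = now_ms if now_ms is not None else int(time.time() * 1000)
    attributes = {"clusterName": cluster_name, "jobId": job["jobId"]}
    attributes.update(tags or {})

    metrics = []
    # Only numeric task counters are converted; missing fields are skipped.
    for field in ("numTasks", "numActiveTasks", "numCompletedTasks", "numFailedTasks"):
        if field in job:
            metrics.append({
                "name": "spark.job." + field,
                "type": "gauge",
                "value": job[field],
                "timestamp": timestamp,
                "attributes": attributes,
            })
    return metrics


def build_payload(metrics):
    """Wrap metrics in the top-level list-of-batches envelope and serialize it."""
    return json.dumps([{"metrics": metrics}])
```

A poller would call `job_to_metrics` for each job returned by the REST API on every polling interval and POST the result of `build_payload` to the Metric API with the insert key.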
- Download the latest package from Release.
- Install the NR Spark Metric plugin using the following command:
sudo tar -xvzf /tmp/nri-spark-metric.tar.gz -C /
The following files will be installed:
/etc/nri-spark-metric/
/etc/nri-spark-metric/nr-spark-metric
/etc/systemd/system/
/etc/systemd/system/nr-spark-metric.service
/etc/init/nr-spark-metric.conf
The integration can be deployed independently on a 64-bit Linux system or as a Databricks integration using a notebook. The sections below describe each option.
- Create the file "nr-spark-metric-settings.yml" in the folder "/etc/nri-spark-metric/" using the following format:
sparkmasterurl: "http://localhost:8080"  <== FQDN of Spark master URL
clustername: mylocalcluster  <== Name of the cluster
insightsapikey: xxxx  <== Insights api key
pollinterval: 5  <== Polling interval
clustermode:  <== Set mode to *spark_driver_mode* for Single Node clusters
tags:  <== Additional tags to be added to metrics
  nr_sample_tag_org: newrelic_labs
  nr_sample_tag_practice: odp
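If the settings file is generated by a provisioning script rather than written by hand, a small helper can render it and catch obvious mistakes first. The sketch below is one way to do that using only the Python standard library. The function name `render_settings` and its validation rules are assumptions for illustration; the keys are the ones listed in the format above.

```python
def render_settings(sparkmasterurl, clustername, insightsapikey,
                    pollinterval=5, clustermode="", tags=None):
    """Render nr-spark-metric-settings.yml content as a string.

    Hypothetical helper: the validation here is illustrative, not the
    integration's own.
    """
    if not sparkmasterurl.startswith(("http://", "https://")):
        raise ValueError("sparkmasterurl must start with http:// or https://")
    if pollinterval <= 0:
        raise ValueError("pollinterval must be a positive number")

    lines = [
        'sparkmasterurl: "%s"' % sparkmasterurl,
        "clustername: %s" % clustername,
        "insightsapikey: %s" % insightsapikey,
        "pollinterval: %d" % pollinterval,
        # An empty clustermode is written as a bare key.
        ("clustermode: %s" % clustermode).rstrip(),
        "tags:",
    ]
    # Tags are nested under "tags:" with two-space indentation.
    for key, value in sorted((tags or {}).items()):
        lines.append("  %s: %s" % (key, value))
    return "\n".join(lines) + "\n"
```

The returned text can then be written to `/etc/nri-spark-metric/nr-spark-metric-settings.yml` by whatever tool manages the host.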
- Run the following command:
service nr-spark-metric start
- Check for metrics in the "Metric" event type in Insights.
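One way to run this check is with an NRQL query that lists the distinct metric names reported under the Spark prefix. The sketch below only builds the query string. The helper name and its defaults are assumptions; `metricName` is the standard attribute on `Metric` data.

```python
def spark_metrics_query(prefix="spark.", since_minutes=30):
    """Build an NRQL query listing the metric names that start with `prefix`."""
    return (
        "SELECT uniques(metricName) FROM Metric "
        "WHERE metricName LIKE '%s%%' SINCE %d minutes ago" % (prefix, since_minutes)
    )
```

An empty result a few polling intervals after the service starts suggests checking the service log and the insert key in the settings file.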
This notebook and configuration are for reference purposes only; customize them to fit the needs of your deployment.
- Create a new notebook to deploy the cluster initialization script.
- Copy the relevant script below. You do not need to set or modify the $DB_ values in the script; Databricks populates them.
  a. Optional: Depending on the cluster install mode, uncomment the SingleNodeCluster section and comment out the Standalone section.
  b. Optional: Install the infrastructure agent, updating it to the latest version.
- Replace the placeholder insightsapikey value in the script with your New Relic Insights Insert Key.
- Add, remove, or update tags as required in the tags section; the sample tags use the nr_sample_tag prefix.
- Run this notebook to deploy the new_relic_install.sh script to DBFS in the configured folder.
- Ensure the script is attached to your cluster and is listed in the notebooks of the cluster
- Running this script will create the file at dbfs:/nr/nri-spark-metric.sh
- Configure target cluster with the newrelic_install.sh cluster-scoped init script using the UI, Databricks CLI, or by invoking the Clusters API. This setting is found in Cluster configuration tab -> Advanced Options -> Init Scripts
- Add dbfs:/nr/nri-spark-metric.sh and click add.
- Restart your cluster
- Metrics should start reporting under the Metrics section in New Relic with the prefix spark.X.X. You should see Job, Stage, Executor, and Stream metrics.
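For the step that attaches the init script through the Clusters API instead of the UI, the cluster specification carries an `init_scripts` list. The sketch below builds that fragment for the DBFS path used in these instructions. The helper name is hypothetical. Supported init-script locations have changed across Databricks releases, so check the fragment against the current Clusters API documentation before relying on it.

```python
import json


def init_script_config(destination="dbfs:/nr/nri-spark-metric.sh"):
    """Return the cluster-spec fragment that registers one DBFS init script."""
    if not destination.startswith("dbfs:/"):
        raise ValueError("expected a dbfs:/ path, got %r" % destination)
    return {"init_scripts": [{"dbfs": {"destination": destination}}]}


if __name__ == "__main__":
    # Print the fragment so it can be merged into a cluster create/edit request.
    print(json.dumps(init_script_config(), indent=2))
```

The fragment is merged into the rest of the cluster specification before the request is sent. The notebook cell that writes the init script follows.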
dbutils.fs.put("dbfs:/nr/nri-spark-metric.sh","""
#!/bin/sh
echo ">>> Check if this is driver? $DB_IS_DRIVER"
echo ">>> Spark Driver ip: $DB_DRIVER_IP"
#Create Cluster init script
cat <<EOF > /tmp/start_spark-metric.sh
#!/bin/sh
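# Databricks sets DB_IS_DRIVER to TRUE or FALSE, so compare the value explicitly
if [ "\$DB_IS_DRIVER" = "TRUE" ]; then
# Root user detection (id -u is portable; UID is not set in plain sh)
if [ "\$(id -u)" = "0" ];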
then
sudo=''
else
sudo='sudo'
fi
echo ">>> Check if this is driver? $DB_IS_DRIVER"
echo ">>> Spark Driver ip: $DB_DRIVER_IP"
# Optional install infra agent
# Add license key
echo "license_key: <<NR LICENCE KEY >>" | \$sudo tee -a /etc/newrelic-infra.yml
#Determine OS version. Assuming this is Ubuntu
OS_VERSION=\$(grep VERSION_ID /etc/os-release | cut -d = -f 2 | xargs echo | cut -d "." -f 1)
echo ">>> OS_VERSION: \$OS_VERSION"
#add Newrelic GPG key
curl -s https://download.newrelic.com/infrastructure_agent/gpg/newrelic-infra.gpg | \$sudo apt-key add -
#Add the infrastructure monitoring agent repository; modify this if the OS version changes
if [ \$OS_VERSION = "18" ];
then
echo ">>> Bionic release"
echo "deb https://download.newrelic.com/infrastructure_agent/linux/apt bionic main" | \$sudo tee -a /etc/apt/sources.list.d/newrelic-infra.list
else
echo ">>> Other release, customize script"
fi
#Refresh repos
\$sudo apt-get update
#Install newrelic-infra
\$sudo apt-get install newrelic-infra -y
## adding logs configuration
echo "logs:
  - name: databricks.\$DB_CLUSTER_NAME
    file: /databricks/driver/logs/*.log
    attributes:
      nrlabs: data
      entity: databricks
      clustername: \$DB_CLUSTER_NAME
      IP: $DB_DRIVER_IP" > /etc/newrelic-infra/logging.d/spark.yml
# end of infra agent install
# Install nr-spark-metric integration
#Download nr-spark-metric integration
\$sudo wget https://github.com/hsinghkalsi/nri-spark/releases/download/1.2.0/nri-spark-metric.tar.gz -P /tmp
#Extract the contents to right place
\$sudo tar -xvzf /tmp/nri-spark-metric.tar.gz -C /
# Check which mode is the cluster running in
# Start of SingleNodeCluster install , using "spark_driver_mode"', uncomment this section and comment out Standalone cluster
# echo ' > SingleNodeCluster, using "spark_driver_mode"'
# DB_DRIVER_PORT=\$(grep -i "CONF_UI_PORT" /tmp/driver-env.sh | cut -d'=' -f2)
# SPARK_CLUSTER_MODE='spark_driver_mode'
# end of SingleNodeCluster install
# Start of Standalone Cluster, use the below section
# Identify driver port in standalone mode
echo ' > Standalone cluster, using "spark_standalone_mode", waiting for master-params...'
while [ -z "\$is_available" ]; do
if [ -e "/tmp/master-params" ]; then
DB_DRIVER_PORT=\$(cat /tmp/master-params | cut -d' ' -f2)
SPARK_CLUSTER_MODE=''
is_available=TRUE
fi
sleep 2
done
# end of Standalone Cluster section
# Configure nr-spark-metric-settings.yml file
echo "sparkmasterurl: http://\$DB_DRIVER_IP:\$DB_DRIVER_PORT
clustername: \$DB_CLUSTER_NAME
insightsapikey: NRII-XXXXXXXXXXXXXXXX
pollinterval: 5
clustermode: \$SPARK_CLUSTER_MODE
tags:
  nr_sample_tag_org: newrelic_labs
  nr_sample_tag_practice: odp" > /etc/nri-spark-metric/nr-spark-metric-settings.yml
# Print the generated settings so they appear in the start log
echo ' >>> Configured nr-spark-metric-settings.yml'
cat /etc/nri-spark-metric/nr-spark-metric-settings.yml
#Enable the service
\$sudo systemctl enable nr-spark-metric.service
#Start the service
\$sudo systemctl start nr-spark-metric.service
\$sudo start nr-spark-metric
fi
EOF
# Start
if [ "$DB_IS_DRIVER" = "TRUE" ]; then
chmod a+x /tmp/start_spark-metric.sh
nohup /tmp/start_spark-metric.sh >> /tmp/start_spark-metric.log 2>&1 &
fi
""",True)
New Relic has open-sourced this project. This project is provided AS-IS WITHOUT WARRANTY OR DEDICATED SUPPORT. Issues and contributions should be reported to the project here on GitHub.
We encourage you to bring your experiences and questions to the Explorers Hub where our community members collaborate on solutions and new ideas.
We encourage your contributions to improve this project! Keep in mind that when you submit your pull request, you'll need to sign the CLA via the click-through using CLA-Assistant. You only have to sign the CLA one time per project. If you have any questions, or to execute our corporate CLA (required if your contribution is on behalf of a company), please drop us an email at opensource@newrelic.com.
New Relic Infrastructure Integration for Apache Spark is licensed under the Apache 2.0 License.