Commit f92c304

[Databricks] Add support for Databricks deployments (#5893)
* [Databricks] Add support for Databricks deployments
* Fix link for data engineering solution
* Changes requested by Jina:
  - Add "SPLUNK_" prefix to relevant variables
  - Use curl for both commands instead of one using wget
  - Fix some typos
  - Add error message when curl command fails
  - Remove "set -x" option from bash script
1 parent 3d448eb commit f92c304

File tree

2 files changed: +247 -0 lines changed

deployments/databricks/README.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# Databricks

## Overview

[Databricks](https://www.databricks.com/) is a data intelligence platform that can be used for
[data sharing](https://www.databricks.com/product/data-sharing),
[data engineering](https://www.databricks.com/solutions/data-engineering),
[artificial intelligence](https://www.databricks.com/product/artificial-intelligence),
[real-time streaming](https://www.databricks.com/product/data-streaming), and more.

The OpenTelemetry Collector can be used to observe the real-time state of a Databricks cluster to
ensure it's operating as expected and to reduce mean time to repair (MTTR) in the case of degraded performance.

## Deployment

### Overview

The Splunk distribution of the OpenTelemetry Collector can be deployed on a Databricks cluster using an
[init script](https://docs.databricks.com/en/init-scripts/index.html).
The Collector runs on every node in the Databricks cluster, gathering host and Apache Spark metrics.

Databricks recommends running [cluster-scoped](https://docs.databricks.com/en/init-scripts/cluster-scoped.html)
init scripts. This deployment can be configured as a cluster-scoped or
[global](https://docs.databricks.com/en/init-scripts/global.html#) init script.

### Configuration

The init script uses the following environment variables. The variables can be set via
[Databricks init script environment variables](https://docs.databricks.com/en/init-scripts/environment-variables.html),
or by setting the values directly in the init script itself.

#### Required Environment Variables

1. `SPLUNK_ACCESS_TOKEN` - Set to your [Splunk Observability Cloud access token](https://docs.splunk.com/observability/en/admin/authentication/authentication-tokens/org-tokens.html).
1. `DATABRICKS_ACCESS_TOKEN` - Set to your [Databricks personal access token](https://docs.databricks.com/en/dev-tools/auth/pat.html).
1. `DATABRICKS_CLUSTER_HOSTNAME` - Hostname of the [Databricks compute resource](https://docs.databricks.com/en/integrations/compute-details.html)
(use the "Server Hostname").

#### Optional Environment Variables

1. `SPLUNK_OTEL_VERSION` - Version of the Splunk distribution of the OpenTelemetry Collector to deploy. Default: `latest`
1. `SPLUNK_REALM` - [Splunk Observability Cloud realm](https://docs.splunk.com/observability/en/get-started/service-description.html#sd-regions)
to send data to. Default: `us0`
1. `SCRIPT_DIR` - Installation path for the Collector and its config on a Databricks node. Default: `/tmp/collector_download`

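As noted above, the variables can also be set directly in the init script rather than through the cluster UI. A minimal sketch of what that could look like at the top of the script; the token values and hostname below are placeholders, not real credentials:

```shell
# Hypothetical example of setting the variables directly in the init script
# instead of via Databricks cluster environment variables.
# The token values and hostname are placeholders.
export SPLUNK_ACCESS_TOKEN="<your-o11y-access-token>"
export DATABRICKS_ACCESS_TOKEN="<your-databricks-pat>"
export DATABRICKS_CLUSTER_HOSTNAME="dbc-a1b2c3d4-e5f6.cloud.databricks.com"

# Optional overrides (defaults shown)
export SPLUNK_OTEL_VERSION="latest"
export SPLUNK_REALM="us0"
export SCRIPT_DIR="/tmp/collector_download"
```

Keep in mind that hard-coding tokens in a script stored in the workspace is less secure than using Databricks secrets in init scripts.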
### How to Deploy

#### Deploy as a cluster-scoped init script

1. Set the required environment variables in your Databricks environment.
1. Use the [deployment script](./deploy_collector.sh) and follow the documentation for how to
[configure a cluster-scoped init script using the UI](https://docs.databricks.com/en/init-scripts/cluster-scoped.html#configure-a-cluster-scoped-init-script-using-the-ui).

#### Deploy as a global init script

1. Set the required environment variables in your Databricks environment.
1. Use the deployment script and follow the documentation for how to
[add a global init script using the UI](https://docs.databricks.com/en/init-scripts/global.html#add-a-global-init-script-using-the-ui).
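For reference, the deployment script points the `apachespark` receiver at the cluster's driver proxy API, assembling the endpoint from `DATABRICKS_CLUSTER_HOSTNAME` and the `DB_CLUSTER_ID` variable that Databricks sets on each node. A small offline sketch of that string assembly (the hostname and cluster ID here are made-up placeholders):

```shell
# Illustrates how the init script assembles the Spark API endpoint scraped by
# the apachespark receiver. Values are hypothetical placeholders; on a real
# node, DB_CLUSTER_ID is provided by Databricks.
DATABRICKS_CLUSTER_HOSTNAME="dbc-example.cloud.databricks.com"
DB_CLUSTER_ID="0123-456789-abcdefgh"
SPARK_ENDPOINT="https://$DATABRICKS_CLUSTER_HOSTNAME/driver-proxy-api/o/0/$DB_CLUSTER_ID/40001"
echo "$SPARK_ENDPOINT"
```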
deployments/databricks/deploy_collector.sh

Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
#!/bin/bash

# Copyright Splunk Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -euo pipefail

# This script is used to deploy the Splunk distribution of the OpenTelemetry Collector
# on the current node of a Databricks cluster. Through UI configuration the script will
# be distributed to and run on every node of the cluster.

# Required Variables:
# - SPLUNK_ACCESS_TOKEN: Splunk o11y access token for sending data to the Splunk o11y backend.
# - DATABRICKS_CLUSTER_HOSTNAME: Hostname of the Databricks compute resource. Use the Server
#     Hostname described here:
#     https://docs.databricks.com/en/integrations/compute-details.html
# - DATABRICKS_ACCESS_TOKEN: Databricks personal access token (PAT) used to connect to the Apache Spark API.
#     Directions for creating a PAT: https://docs.databricks.com/en/dev-tools/auth/pat.html

# Optional Variables:
# - SPLUNK_OTEL_VERSION: Version of the Splunk OpenTelemetry Collector to deploy.
#     Default: "latest". Any explicit version must be >=0.119.0.
# - SCRIPT_DIR: Installation path for the Collector and its config. Default: /tmp/collector_download
# - SPLUNK_REALM: Splunk o11y realm to send data to. Default: us0

DATABRICKS_CLUSTER_HOSTNAME=${DATABRICKS_CLUSTER_HOSTNAME:-}
SPLUNK_ACCESS_TOKEN=${SPLUNK_ACCESS_TOKEN:-}
DATABRICKS_ACCESS_TOKEN=${DATABRICKS_ACCESS_TOKEN:-}

if [ -z "${DATABRICKS_CLUSTER_HOSTNAME}" ]; then
    echo "environment variable 'DATABRICKS_CLUSTER_HOSTNAME' must be set, exiting." >&2
    exit 1
fi

if [ -z "${SPLUNK_ACCESS_TOKEN}" ]; then
    echo "environment variable 'SPLUNK_ACCESS_TOKEN' must be set, exiting." >&2
    exit 1
fi

if [ -z "${DATABRICKS_ACCESS_TOKEN}" ]; then
    echo "environment variable 'DATABRICKS_ACCESS_TOKEN' must be set, exiting." >&2
    exit 1
fi

SPLUNK_OTEL_VERSION=${SPLUNK_OTEL_VERSION:-latest}
OS="linux_amd64"
SPLUNK_OTEL_BINARY_NAME="splunk_otel_collector"
SPLUNK_OTEL_DOWNLOAD_BASE_URL="https://github.com/signalfx/splunk-otel-collector/releases"
SPLUNK_OTEL_API_URL="https://api.github.com/repos/signalfx/splunk-otel-collector/releases/latest"
SCRIPT_DIR=${SCRIPT_DIR:-/tmp/collector_download}
CONFIG_FILENAME="config.yaml"
SPLUNK_OTEL_BINARY_FILE="$SCRIPT_DIR/$SPLUNK_OTEL_BINARY_NAME"
CONFIG_FILE="$SCRIPT_DIR/$CONFIG_FILENAME"
SERVICE_PATH="/etc/systemd/system"
SERVICE_FILE="$SERVICE_PATH/$SPLUNK_OTEL_BINARY_NAME.service"

if [ "$SPLUNK_OTEL_VERSION" = "latest" ]; then
    SPLUNK_OTEL_VERSION=$(curl --silent "$SPLUNK_OTEL_API_URL" |  # Get latest Collector release from the GitHub API
        grep '"tag_name":' |                                      # Get the tag name line
        sed -E 's/.*"([^"]+)".*/\1/')                             # Pluck the latest release version
    if [ -z "$SPLUNK_OTEL_VERSION" ]; then
        echo "Failed to get tag_name for latest release from $SPLUNK_OTEL_API_URL" >&2
        exit 1
    fi
fi

SPLUNK_OTEL_BINARY_DOWNLOAD_URL="${SPLUNK_OTEL_DOWNLOAD_BASE_URL}/download/${SPLUNK_OTEL_VERSION}/otelcol_${OS}"
mkdir -p "$SCRIPT_DIR"
INSTALLED_SPLUNK_OTEL_VERSION=""

if [ -f "$SPLUNK_OTEL_BINARY_FILE" ]; then
    # Output of `otelcol --version` is of the form:
    #     otelcol version vX.X.X
    # Capture the output as an array, then extract the third word (the version)
    INSTALLED_SPLUNK_OTEL_VERSION=($("$SPLUNK_OTEL_BINARY_FILE" --version))
    INSTALLED_SPLUNK_OTEL_VERSION=${INSTALLED_SPLUNK_OTEL_VERSION[2]}
fi

if [ ! -f "$SPLUNK_OTEL_BINARY_FILE" ] || [ "$INSTALLED_SPLUNK_OTEL_VERSION" != "$SPLUNK_OTEL_VERSION" ]; then
    # If a binary is already installed, it's the wrong version and needs to be removed
    # before downloading again to avoid "Text file busy" errors
    if [ -f "$SPLUNK_OTEL_BINARY_FILE" ]; then
        rm "$SPLUNK_OTEL_BINARY_FILE"
    fi

    # Download Splunk's distribution of the OpenTelemetry Collector.
    # --location follows the GitHub release redirect; --fail turns HTTP errors into a non-zero exit.
    curl --fail --location --output "$SPLUNK_OTEL_BINARY_FILE" "$SPLUNK_OTEL_BINARY_DOWNLOAD_URL" || { echo "Failed to download $SPLUNK_OTEL_BINARY_DOWNLOAD_URL" >&2; exit 1; }
    chmod +x "$SPLUNK_OTEL_BINARY_FILE"
else
    echo "Splunk OpenTelemetry Collector '${SPLUNK_OTEL_VERSION}' is already installed"
fi

# The Spark receiver should only run in one instance per cluster. Run it on
# the driver node, as there's exactly one per cluster.
# More info on Databricks init script environment variables:
# https://docs.databricks.com/en/init-scripts/environment-variables.html#use-secrets-in-init-scripts
if [ "${DB_IS_DRIVER:-}" = "TRUE" ]; then
    OPTIONAL_SPARK_RECEIVER=", apachespark"
else
    OPTIONAL_SPARK_RECEIVER=""
fi

collector_config="
extensions:
  bearertokenauth:
    token: $DATABRICKS_ACCESS_TOKEN

receivers:
  apachespark:
    # https://community.databricks.com/t5/data-engineering/how-to-obtain-the-server-url-for-using-spark-s-rest-api/td-p/83410
    endpoint: https://$DATABRICKS_CLUSTER_HOSTNAME/driver-proxy-api/o/0/$DB_CLUSTER_ID/40001
    auth:
      authenticator: bearertokenauth
  # TODO: Identify any additional scrapers that are necessary and useful
  hostmetrics:
    scrapers:
      cpu:
      memory:
      network:

processors:
  batch:
    send_batch_size: 10000
    timeout: 10s
  resourcedetection:
    detectors: [system]
  resource:
    attributes:
      - key: databricks.cluster.name
        value: \"$DB_CLUSTER_NAME\"
        action: upsert
      - key: databricks.cluster.id
        value: \"$DB_CLUSTER_ID\"
        action: upsert
      - key: databricks.node.driver
        value: \"${DB_IS_DRIVER:-FALSE}\"
        action: upsert

exporters:
  signalfx:
    access_token: $SPLUNK_ACCESS_TOKEN
    realm: ${SPLUNK_REALM:-us0}

service:
  extensions: [bearertokenauth]
  pipelines:
    metrics:
      receivers: [hostmetrics$OPTIONAL_SPARK_RECEIVER]
      processors: [batch, resourcedetection, resource]
      exporters: [signalfx]
"

echo "$collector_config" > "$CONFIG_FILE"

collector_service="
[Unit]
Description=Splunk distribution of the OpenTelemetry Collector
StartLimitIntervalSec=0

[Service]
Type=simple
Restart=always
RestartSec=1
User=root
ExecStart=$SPLUNK_OTEL_BINARY_FILE --config $CONFIG_FILE

[Install]
WantedBy=multi-user.target
"

echo "$collector_service" > "$SERVICE_FILE"
chmod 755 "$SERVICE_FILE"

# Run the Collector as a systemd service on the current node.
# Reload systemd first so it picks up a freshly written or updated unit file.
systemctl daemon-reload
systemctl start "$SPLUNK_OTEL_BINARY_NAME"

exit 0
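The `latest`-version lookup in the script boils down to a grep/sed pluck of `tag_name` from the GitHub releases API JSON. A quick offline sketch of just that extraction step, using a fabricated sample line in place of the API response:

```shell
# Offline demonstration of the tag_name extraction used by the script.
# SAMPLE_JSON is a made-up stand-in for one line of the GitHub API response.
SAMPLE_JSON='  "tag_name": "v0.120.0",'
VERSION=$(echo "$SAMPLE_JSON" | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
echo "$VERSION"   # prints v0.120.0
```

The greedy `.*"` in the sed expression anchors the capture to the last quoted string on the line, which is the version tag.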
