
Commit cf5223c

[chore][deployments/databricks] Add docs for deploying as standalone script (#5998)

Authored by crobert-1 and jinja2
Co-authored-by: Jina Jain <[email protected]>

Squashed commit message:

* [chore][deployments/databricks] Add information for standalone script
* Formatting
* wording
* Update deployments/databricks/README.md
* Update deployments/databricks/README.md

1 parent 649e40c commit cf5223c

File tree

1 file changed: +38, -6 lines


deployments/databricks/README.md

@@ -10,20 +10,25 @@
 
 The OpenTelemetry Collector can be used to observe the real-time state of a Databricks cluster to
 ensure it's operating as expected and reduce mean time to repair (MTTR) in the case of downgraded performance.
+This functionality is only relevant for the classic architecture of Databricks; it will not work for
+serverless.
 
 ## Deployment
 
 ### Overview
 
 The Splunk distribution of the OpenTelemetry Collector can be deployed on a Databricks cluster using an
-[init script](https://docs.databricks.com/en/init-scripts/index.html).
+[init script](https://docs.databricks.com/en/init-scripts/index.html) on startup, or by directly running
+the script on each node in an existing running Databricks cluster.
 The Collector will run on every node in a Databricks cluster, gathering host and Apache Spark metrics.
 
+### Init script
+
 Databricks recommends running [cluster-scoped](https://docs.databricks.com/en/init-scripts/cluster-scoped.html)
 init scripts. This can be deployed as cluster-scoped or
 [global](https://docs.databricks.com/en/init-scripts/global.html#).
 
-### Configuration
+#### Configuration
 
 The init script uses the following environment variables. The variables can be set
 via
@@ -44,16 +49,43 @@ or by directly setting the values in the init script itself.
 to send data to. Default: `us0`
 1. `SCRIPT_DIR` - Installation path for the Collector and its config on a Databricks node. Default: `/tmp/collector_download`
 
-### How to Deploy
+#### How to Deploy
 
-#### Deploy as a cluster-scoped init script
+##### Deploy as a cluster-scoped init script
 
 1. Set required environment variables in your Databricks environment.
 1. Use the [deployment script](./deploy_collector.sh) and follow documentation for how to
 [configure a cluster-scoped init script using the UI](https://docs.databricks.com/en/init-scripts/cluster-scoped.html#configure-a-cluster-scoped-init-script-using-the-ui)
 
-#### Deploy as a global-scoped init script
+##### Deploy as a global-scoped init script
 
 1. Set required environment variables in your Databricks environment.
 1. Use the deployment script and follow documentation for how to
-[add a global init script using the UI](https://docs.databricks.com/en/init-scripts/global.html#add-a-global-init-script-using-the-ui).
+[add a global init script using the UI](https://docs.databricks.com/en/init-scripts/global.html#add-a-global-init-script-using-the-ui).
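The diff's context notes that the variables can be set via the UI "or by directly setting the values in the init script itself." As a hypothetical sketch of that second option, the fragment below pins `SCRIPT_DIR` (the only variable named in full in this hunk; its documented default is `/tmp/collector_download`) near the top of an init script. The fallback-assignment idiom is an illustration, not taken from the repository's deployment script.

```shell
# Hypothetical fragment for the top of an init script: set SCRIPT_DIR
# explicitly instead of relying on the cluster environment. The default
# shown matches the documented default, /tmp/collector_download.
SCRIPT_DIR="${SCRIPT_DIR:-/tmp/collector_download}"
export SCRIPT_DIR

# Make sure the installation path exists before the Collector download.
mkdir -p "$SCRIPT_DIR"
echo "Collector install path: $SCRIPT_DIR"
```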
+
+### Standalone script
+
+For long-running clusters, restarting the whole cluster to run an init script on each
+node may not be a feasible option. In this case, the deployment script can be run on
+each node manually.
+
+#### Configuration
+
+The required and optional environment variables outlined in the init script section remain
+the same, but more variables are required.
+
+##### Required Environment Variables
+
+These environment variables are required **in addition** to what's required for init scripts.
+All required environment variables must be set on every node that runs the deployment script.
+
+1. `DB_IS_DRIVER` - whether the script is running on a driver node. (boolean)
+1. `DB_CLUSTER_NAME` - the name of the cluster the script is executing on. (string)
+1. `DB_CLUSTER_ID` - the ID of the cluster on which the script is running. See the [Clusters API](https://docs.databricks.com/api/workspace/clusters). (string)
+
+#### How to deploy
+
+The Databricks cluster provides a web terminal on the driver node. This is a Bash shell
+which can be accessed to deploy the script.
+
+**Note: Investigation is ongoing to determine how to deploy the script on non-driver nodes.**
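A hedged sketch of what a manual run from the driver node's web terminal might look like, assuming the deployment script has been copied to the node as `deploy_collector.sh`. The variable names come from the diff above; the values are placeholders, and the fail-fast preflight check is an illustration rather than part of the repository's script.

```shell
# Placeholder values for the required standalone variables; a real run
# would use the node's actual role and the cluster's real name and ID.
export DB_IS_DRIVER=TRUE
export DB_CLUSTER_NAME="example-cluster"
export DB_CLUSTER_ID="0123-456789-example"

# Fail fast if any required variable is missing before launching the
# deployment script (illustrative preflight, not from the repo).
: "${DB_IS_DRIVER:?DB_IS_DRIVER must be set}"
: "${DB_CLUSTER_NAME:?DB_CLUSTER_NAME must be set}"
: "${DB_CLUSTER_ID:?DB_CLUSTER_ID must be set}"
echo "preflight ok"

# The deployment script itself would then be run on this node, e.g.:
# bash ./deploy_collector.sh
```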
