
Commit cf5223c

[chore][deployments/databricks] Add docs for deploying as standalone script (#5998)

Authored by crobert-1 and jinja2
Co-authored-by: Jina Jain <[email protected]>

Squashed commit message:

* [chore][deployments/databricks] Add information for standalone script
* Formatting
* wording
* Update deployments/databricks/README.md
* Update deployments/databricks/README.md

1 parent 649e40c commit cf5223c

File tree

1 file changed: +38, -6 lines


deployments/databricks/README.md

@@ -10,20 +10,25 @@
 
 The OpenTelemetry Collector can be used to observe the real-time state of a Databricks cluster to
 ensure it's operating as expected and reduce mean time to repair (MTTR) in the case of downgraded performance.
+This functionality is only relevant for the classic architecture of Databricks; it will not work for
+serverless.
 
 ## Deployment
 
 ### Overview
 
 The Splunk distribution of the OpenTelemetry Collector can be deployed on a Databricks cluster using an
-[init script](https://docs.databricks.com/en/init-scripts/index.html).
+[init script](https://docs.databricks.com/en/init-scripts/index.html) on startup, or by directly running
+the script on each node in an existing running Databricks cluster.
 The Collector will run on every node in a Databricks cluster, gathering host and Apache Spark metrics.
 
+### Init script
+
 Databricks recommends running [cluster-scoped](https://docs.databricks.com/en/init-scripts/cluster-scoped.html)
 init scripts. This can be deployed as cluster-scoped or
 [global](https://docs.databricks.com/en/init-scripts/global.html#).
 
-### Configuration
+#### Configuration
 
 The init script uses the following environment variables. The variables can be set
 via
@@ -44,16 +49,43 @@ or by directly setting the values in the init script itself.
 to send data to. Default: `us0`
 1. `SCRIPT_DIR` - Installation path for the Collector and its config on a Databricks node. Default: `/tmp/collector_download`
 
-### How to Deploy
+#### How to Deploy
 
-#### Deploy as a cluster-scoped init script
+##### Deploy as a cluster-scoped init script
 
 1. Set required environment variables in your Databricks environment.
 1. Use the [deployment script](./deploy_collector.sh) and follow documentation for how to
 [configure a cluster-scoped init script using the UI](https://docs.databricks.com/en/init-scripts/cluster-scoped.html#configure-a-cluster-scoped-init-script-using-the-ui)
 
-#### Deploy as a global-scoped init script
+##### Deploy as a global-scoped init script
 
 1. Set required environment variables in your Databricks environment.
 1. Use the deployment script and follow documentation for how to
-[add a global init script using the UI](https://docs.databricks.com/en/init-scripts/global.html#add-a-global-init-script-using-the-ui).
+[add a global init script using the UI](https://docs.databricks.com/en/init-scripts/global.html#add-a-global-init-script-using-the-ui).
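The diff's context notes that the variables can be set via the UI "or by directly setting the values in the init script itself." As a hypothetical sketch of that second option, the fragment below pins `SCRIPT_DIR` (the only variable named in full in this hunk; its documented default is `/tmp/collector_download`) near the top of an init script. The fallback-assignment idiom is an illustration, not taken from the repository's deployment script.

```shell
# Hypothetical fragment for the top of an init script: set SCRIPT_DIR
# explicitly instead of relying on the cluster environment. The default
# shown matches the documented default, /tmp/collector_download.
SCRIPT_DIR="${SCRIPT_DIR:-/tmp/collector_download}"
export SCRIPT_DIR

# Make sure the installation path exists before the Collector download.
mkdir -p "$SCRIPT_DIR"
echo "Collector install path: $SCRIPT_DIR"
```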
+
+### Standalone script
+
+For long-running clusters, restarting the whole cluster to run an init script on each
+node may not be a feasible option. In this case, the deployment script can be run on
+each node manually.
+
+#### Configuration
+
+The required and optional environment variables outlined in the init script section remain
+the same, but more variables are required.
+
+##### Required Environment Variables
+
+These environment variables are required **in addition** to what's required for init scripts.
+All required environment variables must be set on every node that runs the deployment script.
+
+1. `DB_IS_DRIVER` - whether the script is running on a driver node. (boolean)
+1. `DB_CLUSTER_NAME` - the name of the cluster the script is executing on. (string)
+1. `DB_CLUSTER_ID` - the ID of the cluster on which the script is running. See the [Clusters API](https://docs.databricks.com/api/workspace/clusters). (string)
+
+#### How to deploy
+
+The Databricks cluster provides a web terminal on the driver node. This is a Bash shell
+which can be accessed to deploy the script.
+
+**Note: Investigation is ongoing to determine how to deploy the script on non-driver nodes.**
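A hedged sketch of what a manual run from the driver node's web terminal might look like, assuming the deployment script has been copied to the node as `deploy_collector.sh`. The variable names come from the diff above; the values are placeholders, and the fail-fast preflight check is an illustration rather than part of the repository's script.

```shell
# Placeholder values for the required standalone variables; a real run
# would use the node's actual role and the cluster's real name and ID.
export DB_IS_DRIVER=TRUE
export DB_CLUSTER_NAME="example-cluster"
export DB_CLUSTER_ID="0123-456789-example"

# Fail fast if any required variable is missing before launching the
# deployment script (illustrative preflight, not from the repo).
: "${DB_IS_DRIVER:?DB_IS_DRIVER must be set}"
: "${DB_CLUSTER_NAME:?DB_CLUSTER_NAME must be set}"
: "${DB_CLUSTER_ID:?DB_CLUSTER_ID must be set}"
echo "preflight ok"

# The deployment script itself would then be run on this node, e.g.:
# bash ./deploy_collector.sh
```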
