Added functionality for export of failure logs #591

saishreeeee · 2025-06-10T12:03:00Z

What type of PR is this?

Refactor
Feature
Bug Fix
Other

Description

Added error logging to the init function of Error class
Added export_failure_log to TelemetryClient

How is this tested?

Unit tests
E2E Tests

Manually
Queried from a non-existent table
Request Body Summary:
uploadTime: 1749620674701
items: 0 items
protoLogs: 2 logs

    Proto Log #1: Initial connection log
    Proto Log #2: Error log
    {
      "frontend_log_event_id": "b251bea7-c9a8-42e4-b00c-5ea4011d7d17",
      "context": {
        "client_context": {
          "timestamp_millis": 1749716213881,
          "user_agent": "PyDatabricksSqlConnector/4.0.3"
        }
      },
      "entry": {
        "sql_driver_log": {
          "session_id": "<REDACTED>",
          "system_configuration": {
            "driver_version": "4.0.3",
            "os_name": "Darwin",
            "os_version": "24.5.0",
            "os_arch": "arm64",
            "runtime_name": "Python 3.13.3",
            "runtime_version": "3.13.3",
            "runtime_vendor": "CPython",
            "driver_name": "Databricks SQL Python Connector",
            "char_set_encoding": "utf-8",
            "locale_name": "en_US"
          },
          "driver_connection_params": {
            "http_path": "<REDACTED>",
            "mode": "THRIFT",
            "host_info": {
              "host_url": "<REDACTED>",
              "port": 443
            },
            "auth_mech": "PAT"
          },
          "error_info": {
            "error_name": "ServerOperationError",
            "stack_trace": "[TABLE_OR_VIEW_NOT_FOUND] The table or view `non_existent_table` cannot be found. Verify the spelling and correctness of the schema and catalog.\nIf you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.\nTo tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01; line 2 pos 33"
          }
        }
      }
    }          
    === Response Details ===
    Status Code: 200
    Response Body:
    {
      "errors": [],
      "numSuccess": 0,
      "numProtoSuccess": 2,
      "numRealtimeSuccess": 0
    }

N/A

Related Tickets & Documents

PECOBLR-524

Signed-off-by: Sai Shree Pradhan <[email protected]>

samikshya-db · 2025-06-10T12:47:42Z

src/databricks/sql/client.py

-            raise Error("Cannot create cursor from closed connection")
+            raise Error(
+                "Cannot create cursor from closed connection",
+                connection_uuid=self.get_session_id_hex(),


are there plans to include the statement ID too across this PR?

samikshya-db · 2025-06-10T12:48:51Z

Let's elaborate on the testing details; maybe include a redacted log JSON?

src/databricks/sql/exc.py

samikshya-db · 2025-06-12T04:50:08Z

From your JSON, can we skip null fields while building the telemetry request?
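
One way to skip null fields is to filter them while converting the dataclass to a dict. The sketch below assumes the telemetry models are dataclasses; `to_compact_json`, `HostInfo`, and `ConnectionParams` are illustrative names, not the PR's actual helpers.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


def to_compact_json(obj) -> str:
    """Serialize a dataclass instance, omitting fields whose value is None."""
    data = asdict(
        obj,
        # dict_factory is applied to nested dataclasses as well.
        dict_factory=lambda pairs: {k: v for k, v in pairs if v is not None},
    )
    return json.dumps(data)


@dataclass
class HostInfo:
    host_url: str
    port: Optional[int] = None


@dataclass
class ConnectionParams:
    http_path: str
    host_info: HostInfo
    auth_mech: Optional[str] = None
```

With this helper, `ConnectionParams("/sql", HostInfo("example.test"))` serializes without `port` or `auth_mech` keys.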

src/databricks/sql/exc.py

src/databricks/sql/telemetry/telemetry_client.py

Copilot

Pull Request Overview

This PR adds enhanced error logging functionality by exporting failure logs via telemetry and integrating the export call into the Error class. Key changes include:

New export_failure_log method in TelemetryClient with corresponding tests.
Enhanced exception raising throughout the code to include a connection_uuid for improved traceability.
Updates to JSON conversion utilities to use a more compact form for telemetry events.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
tests/unit/test_telemetry.py	Adds test cases for the new export_failure_log functionality (note: duplicate test functions detected).
src/databricks/sql/thrift_backend.py	Propagates connection_uuid through error handling for improved error context.
src/databricks/sql/telemetry/utils.py	Introduces helper functions to filter None values and convert dataclasses to compact JSON.
src/databricks/sql/telemetry/telemetry_client.py	Implements telemetry export for failure logs and adds try/except blocks to handle export errors.
src/databricks/sql/telemetry/models/*	Updates to_json methods to use the new compact JSON conversion.
src/databricks/sql/exc.py & src/databricks/sql/client.py	Updates Error and client exception raises to include connection_uuid in error messages.

Comments suppressed due to low confidence (2)

tests/unit/test_telemetry.py:86

Duplicate test function 'test_export_failure_log' detected; consider renaming or consolidating the two tests to avoid test overrides.

def test_export_failure_log(self, noop_telemetry_client):

src/databricks/sql/telemetry/telemetry_client.py:395

[nitpick] The assignment syntax in the TelemetryClientFactory initialization appears confusing due to extra bracket indentation; please review and simplify the assignment for clarity.

TelemetryClientFactory._clients[connection_uuid] = TelemetryClient(

src/databricks/sql/telemetry/telemetry_client.py

vikrantpuppala · 2025-06-12T10:58:32Z

src/databricks/sql/telemetry/telemetry_client.py

-                        host_url=host_url,
-                        executor=TelemetryClientFactory._executor,
+            with TelemetryClientFactory._lock:
+                if connection_uuid not in TelemetryClientFactory._clients:


this is actually not the connection_uuid but the hex right?

It is the session id of ThriftBackend converted to a hex string. Every connection has only one session, so I used it as a connection identifier.

Yes, but let's not use the variable name connection_uuid, since it is misleading; we should name it exactly what it is.

src/databricks/sql/telemetry/utils.py

vikrantpuppala · 2025-06-12T17:29:44Z

tests/unit/test_telemetry.py

@@ -83,6 +83,12 @@ def test_export_initial_telemetry_log(self, noop_telemetry_client):
            driver_connection_params=MagicMock(), user_agent="test"
        )

+    def test_export_failure_log(self, noop_telemetry_client):


this test and the test at line 141 have the same name, let's dedup

I thought it's alright since they are in different classes? Have done the same thing for test_export_initial_telemetry_log and test_close. Should I change these too?
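
For reference, the general Python behaviour behind this exchange can be sketched as follows: methods with the same name in different classes are independent, while a repeated name inside one class silently replaces the earlier definition. The class names here are hypothetical and deliberately not prefixed with `Test`.

```python
class SuiteA:
    def test_export_failure_log(self):
        return "noop client"


class SuiteB:
    # Same name as in SuiteA, but a separate class namespace: no clash.
    def test_export_failure_log(self):
        return "real client"


class SuiteC:
    def test_export_failure_log(self):
        return "first"

    # Rebinding the name inside the same class replaces the definition above,
    # so a test runner would only ever see this one.
    def test_export_failure_log(self):
        return "second"
```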

src/databricks/sql/telemetry/telemetry_client.py

samikshya-db · 2025-06-12T17:43:08Z

src/databricks/sql/telemetry/telemetry_client.py

+                cls._executor.shutdown(
+                    wait=True
+                )  # This waits for all submitted work to complete
+                logger.debug("Thread pool shutdown completed successfully")


misleading log message, let's add logs appropriately.

I'm sorry, I don't quite understand. Do you mean the submitted work may have completed but with a failure, so I shouldn't say "successfully" in the log message?

There are two things here:

We can mention that the telemetry client has been shut down successfully.

Telemetry logs should not be too verbose; debug/error logging should be avoided as much as possible.

Another thing I noticed across this PR: let's not try to replicate the JDBC telemetry 1:1. We can follow Python best practices instead. For example:

You can explore using a mixin base class or a utility function to make JSON serialization reusable across all the telemetry models; that will make the code cleaner.

TelemetryClientFactory is overly Java-specific; we could use factory functions instead.

Another thing missing from this PR is coverage. Can we add coverage to the PR description?
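
The mixin and factory-function suggestions above might look roughly like this. It is a sketch under assumptions: the name `JsonSerializableMixin` appears in a later commit title on this PR, but its body here is a guess, and `ErrorInfo`, the stub `TelemetryClient`, and `get_telemetry_client` are hypothetical.

```python
import json
import threading
from dataclasses import asdict, dataclass, is_dataclass
from typing import Optional


class JsonSerializableMixin:
    """Gives any dataclass model a shared to_json() that drops None fields."""

    def to_json(self) -> str:
        if not is_dataclass(self):
            raise TypeError(f"{type(self).__name__} must be a dataclass")
        compact = asdict(
            self,
            dict_factory=lambda pairs: {k: v for k, v in pairs if v is not None},
        )
        return json.dumps(compact)


@dataclass
class ErrorInfo(JsonSerializableMixin):
    error_name: str
    stack_trace: Optional[str] = None


class TelemetryClient:
    """Stub client; only stores the identifier it was created for."""

    def __init__(self, session_id_hex: str):
        self.session_id_hex = session_id_hex


_clients = {}
_lock = threading.Lock()


def get_telemetry_client(session_id_hex: str) -> TelemetryClient:
    """Module-level factory function instead of a Java-style factory class."""
    with _lock:
        if session_id_hex not in _clients:
            _clients[session_id_hex] = TelemetryClient(session_id_hex)
        return _clients[session_id_hex]
```

A module already behaves like a singleton in Python, so module-level state plus a function covers what a static factory class would do.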

src/databricks/sql/telemetry/telemetry_client.py

samikshya-db · 2025-06-12T17:46:33Z

src/databricks/sql/client.py

@@ -1156,7 +1181,10 @@ def fetchall(self) -> List[Row]:
        if self.active_result_set:
            return self.active_result_set.fetchall()
        else:
-            raise Error("There is no active result set")
+            raise Error(


do we plan to have custom tags for each message?

src/databricks/sql/thrift_backend.py

src/databricks/sql/client.py

src/databricks/sql/exc.py

src/databricks/sql/client.py

jprakash-db · 2025-06-12T19:11:00Z

src/databricks/sql/telemetry/telemetry_client.py

-            ),
-        )
+            self.export_event(telemetry_frontend_log)
+        except Exception as e:


Why is there a try/except block here? From what I can see, all the errors can only stem from the flush() function, and there we are already catching and logging the exception. It doesn't make sense to have try blocks where no error is thrown.

jprakash-db · 2025-06-12T19:12:59Z

src/databricks/sql/telemetry/telemetry_client.py

-        self.export_event(telemetry_frontend_log)
+    def export_failure_log(self, error_name, error_message):
+        logger.debug("Exporting failure log for connection %s", self._connection_uuid)
+        try:


Same comment on error handling. In telemetry the only source of errors is the network call, and that is handled by the error handling in flush. I don't feel random try blocks are needed. cc @vikrantpuppala
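
The structure being asked for, with error handling concentrated at the network call, can be sketched like this. The `send` callable, the `batch_size` parameter, and the event dictionary shape are assumptions made for illustration; only the method names echo the PR.

```python
import logging

logger = logging.getLogger(__name__)


class TelemetryClient:
    def __init__(self, send, batch_size=2):
        self._send = send  # hypothetical callable that performs the network request
        self._batch_size = batch_size
        self._events = []

    def export_failure_log(self, error_name, error_message):
        # No try/except here: building and queueing an event does no I/O.
        self._export_event({"error_name": error_name, "stack_trace": error_message})

    def _export_event(self, event):
        self._events.append(event)
        if len(self._events) >= self._batch_size:
            self._flush()

    def _flush(self):
        batch, self._events = self._events, []
        try:
            self._send(batch)
        except Exception:
            # The single place where a telemetry failure is expected and swallowed.
            logger.debug("Telemetry upload failed", exc_info=True)
```

A failing `send` therefore never propagates to the caller of `export_failure_log`, even though that method contains no try block of its own.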

src/databricks/sql/telemetry/telemetry_client.py

jprakash-db · 2025-06-12T19:23:04Z

src/databricks/sql/thrift_backend.py

@@ -737,13 +763,15 @@ def _results_message_to_execute_response(self, resp, operation_state):
            or direct_results.resultSet.hasMoreRows
        )
        description = self._hive_schema_to_description(
-            t_result_set_metadata_resp.schema
+            t_result_set_metadata_resp.schema, self._connection_uuid


Why is connection_uuid an argument to _hive_schema_to_description? Is it just for telemetry logging? Can't we use a thread-local instead? We are polluting unrelated functions for the sake of logging. cc @vikrantpuppala
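
A thread-local, as suggested here, could be sketched as below. The helper names and the simplified `hive_schema_to_description` are hypothetical; the point is only that the helper's signature stays free of telemetry arguments.

```python
import threading

_telemetry_context = threading.local()


def set_connection_uuid(connection_uuid):
    """Record the identifier for the current thread only."""
    _telemetry_context.connection_uuid = connection_uuid


def get_connection_uuid():
    return getattr(_telemetry_context, "connection_uuid", None)


def hive_schema_to_description(schema):
    """Hypothetical helper: its parameters are only the data it transforms."""
    if schema is None:
        # The identifier is looked up when needed rather than passed through.
        raise ValueError(f"schema missing (connection {get_connection_uuid()})")
    return [(name, type_name) for name, type_name in schema]
```

One general caveat: the value follows the thread, not the connection object, so a connection shared across threads would need the identifier set in each thread that uses it.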

jprakash-db · 2025-06-12T19:42:41Z

3 broad suggestions:

Can we avoid throwing the blanket Error? Try to use specific errors such as ConnectionError, etc., since telemetry uses the error name.
Add try/except blocks only where you expect an error, or at least rethrow the error from the child functions. Random try blocks don't make sense.
Passing arguments like connection_uuid to functions that don't need them, just for telemetry, doesn't feel right. Either store it as a class variable during class initialisation, or we could explore thread-locals.
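
To illustrate the first suggestion: if telemetry records the exception's class name, a specific subclass yields a more useful `error_name` than the base class. The hierarchy below uses the standard PEP 249 exception names; `error_name_for` is a hypothetical helper, not part of the connector.

```python
class Error(Exception):
    """Base class for all driver errors (PEP 249)."""


class InterfaceError(Error):
    """Problem with the database interface, e.g. using a closed connection."""


class DatabaseError(Error):
    """Problem reported by the database itself."""


class OperationalError(DatabaseError):
    """Failure in the database's operation, e.g. a lost connection."""


def error_name_for(exc: BaseException) -> str:
    """Name that a telemetry event would carry for this exception."""
    return type(exc).__name__
```

Raising `InterfaceError("Cannot create cursor from closed connection")` would then be reported as `InterfaceError` rather than the generic `Error`, while callers that catch `Error` keep working.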

added functionality for export of failure logs

4970db2

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee requested review from deeksha-db, samikshya-db, jprakash-db, jackyhu-db, madhav-db, gopalldb, jayantsing-db, vikrantpuppala and shivam2680 as code owners June 10, 2025 12:03

saishreeeee self-assigned this Jun 10, 2025

saishreeeee temporarily deployed to azure-prod June 10, 2025 12:03 — with GitHub Actions Inactive

samikshya-db reviewed Jun 10, 2025

changed logger.error to logger.debug in exc.py

a8406f8

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee temporarily deployed to azure-prod June 11, 2025 04:07 — with GitHub Actions Inactive

Fix telemetry loss during Python shutdown

3dc222f

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee temporarily deployed to azure-prod June 11, 2025 05:58 — with GitHub Actions Inactive

unit tests for export_failure_log

52a02f0

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee temporarily deployed to azure-prod June 12, 2025 04:52 — with GitHub Actions Inactive

try-catch blocks to make telemetry failures non-blocking for connecto…

2522751

…r operations Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee temporarily deployed to azure-prod June 12, 2025 05:27 — with GitHub Actions Inactive

vikrantpuppala reviewed Jun 12, 2025

removed redundant try/catch blocks, added try/catch block to initiali…

e448aa3

…ze and get telemetry client Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee temporarily deployed to azure-prod June 12, 2025 06:27 — with GitHub Actions Inactive

skip null fields in telemetry request

997af26

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee temporarily deployed to azure-prod June 12, 2025 08:35 — with GitHub Actions Inactive

saishreeeee requested a review from vikrantpuppala June 12, 2025 08:37

Copilot AI reviewed Jun 12, 2025

vikrantpuppala reviewed Jun 12, 2025

removed dup import, renamed func, changed a filter_null_values to lamda

34b1600

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee had a problem deploying to azure-prod June 12, 2025 17:45 — with GitHub Actions Failure

samikshya-db reviewed Jun 12, 2025

saishreeeee had a problem deploying to azure-prod June 12, 2025 18:27 — with GitHub Actions Failure

samikshya-db reviewed Jun 12, 2025

jprakash-db reviewed Jun 12, 2025

removed unnecassary class variable and a redundant try/except block

bed4c77

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee had a problem deploying to azure-prod June 12, 2025 20:57 — with GitHub Actions Failure

public functions defined at interface level

53db85d

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee had a problem deploying to azure-prod June 12, 2025 21:46 — with GitHub Actions Failure

changed export_event and flush to private functions

6f924ca

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee had a problem deploying to azure-prod June 13, 2025 05:37 — with GitHub Actions Failure

formatting

e9d9ce4

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee had a problem deploying to azure-prod June 13, 2025 05:40 — with GitHub Actions Failure

changed connection_uuid to thread local in thrift backend

cb1d203

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee had a problem deploying to azure-prod June 13, 2025 05:59 — with GitHub Actions Failure

made errors more specific

6212710

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee had a problem deploying to azure-prod June 13, 2025 06:35 — with GitHub Actions Failure

revert change to connection_uuid

c68e42f

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee had a problem deploying to azure-prod June 13, 2025 08:45 — with GitHub Actions Failure

reverting change in close in telemetry client

2cdb760

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee had a problem deploying to azure-prod June 13, 2025 08:54 — with GitHub Actions Failure

JsonSerializableMixin

cac7533

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee had a problem deploying to azure-prod June 13, 2025 10:54 — with GitHub Actions Failure

isdataclass check in JsonSerializableMixin

e5609c2

Signed-off-by: Sai Shree Pradhan <[email protected]>

saishreeeee had a problem deploying to azure-prod June 13, 2025 11:19 — with GitHub Actions Failure
