[receiver/sqlserverreceiver] Add query-level data that contain query text and execution plan #36462
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
By the way, is an integration test required for such a change?
Sounds reasonable to me. The number of top slow queries should be configurable. 200 may be too high a default, given that these queries are prone to churn and can cause high cardinality. cc @djaglowski, we discussed this at KubeCon.
It’s not required, but it would be great to have them. That way, we can ensure the component’s stability. Otherwise, we could miss a change in the receiver, or in SQL Server itself (a new version), that breaks this particular functionality. This can be part of a separate issue.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners, add a component label (see Adding Labels via Comments); if you are unsure which component this issue relates to, please ping the code owners.
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Raised #37958 for the initial change; we added log collection in that PR to collect the query text and query plan as well.
…37958)
#### Description
This PR introduces query-level data collection for the `sqlserver` receiver in the logs pipeline. It adds `Top Query` collection in this initial PR (`Top Queries` are the queries that used the most CPU time within a time window). Co-authored-by: @cuichenli (CLA signed)
##### Configuration
We introduced four configuration options for the feature (see the receiver's README for details):
1. `logs.enable_top_query_collection`: enables the collection.
2. `max_sample_query_count`: the initial number of queries to fetch from the database.
3. `top_query_count`: the number of queries to report to the next consumer.
4. `lookback_time`: the query window for each scrape.
##### Workflow
The `sqlserver` receiver fetches M (= `max_sample_query_count`) queries from the database, sorts them by the difference in `total_elapsed_time` (CPU time used), and then reports the first N (= `top_query_count`) queries.
##### New Log Attributes
- `db.total_elapsed_time`
- `db.total_rows`
- `db.total_worker_time`
- `db.total_logical_reads`
- `db.total_logical_writes`
- `db.total_physical_reads`
- `db.execution_count`
- `db.total_grant_kb`
- `db.query_hash`
- `db.query_plan_hash`
- `db.query_text`
- `db.query_plan`
##### Additional dependency
* `hashicorp/golang-lru/v2`
  * License: MPL-2.0
  * Link: https://pkg.go.dev/github.com/hashicorp/golang-lru/v2
  * Already used in the repo
* `DataDog/datadog-agent/pkg/obfuscate`
  * License: Apache 2.0
  * Link: https://pkg.go.dev/github.com/DataDog/datadog-agent/pkg/obfuscate
  * Already used in the repo
##### Example Output
```yaml
resourceLogs:
  - resource:
      attributes:
        - key: db.system.type
          value:
            stringValue: microsoft.sql_server
    scopeLogs:
      - logRecords:
          - attributes:
              - key: computer_name
                value:
                  stringValue: DESKTOP-GHAEGRD
              - key: sql_instance
                value:
                  stringValue: sqlserver
              - key: db.query_hash
                value:
                  stringValue: "307833373834394538373431373145334633"
              - key: db.query_plan_hash
                value:
                  stringValue: "307844333131323930393432394131423530"
              - key: db.total_elapsed_time
                value:
                  intValue: "2"
              - key: db.total_rows
                value:
                  intValue: "1"
              - key: db.total_logical_reads
                value:
                  intValue: "2"
              - key: db.total_logical_writes
                value:
                  intValue: "3"
              - key: db.total_physical_reads
                value:
                  intValue: "4"
              - key: db.execution_count
                value:
                  intValue: "5"
              - key: db.total_worker_time
                value:
                  intValue: "2"
              - key: db.total_grant_kb
                value:
                  intValue: "3095"
              - key: db.query_text
                value:
                  stringValue: with qstats SELECT TOP ( @topNValue ) REPLACE ( @@ServerName, ? ) ...
              - key: db.query_plan
                value:
                  stringValue: <ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"> ...
            body: {}
            spanId: ""
            timeUnixNano: "1739690452135336000"
            traceId: ""
        scope:
          name: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/sqlserverreceiver
          version: development
```
#### Link to tracking issue
Part of #36462
#### Testing
Added
#### Documentation
Updated
---------
Co-authored-by: Will Li <[email protected]>
Co-authored-by: Antoine Toulme <[email protected]>
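The fetch-sort-report workflow above can be sketched in Go. This is a hedged illustration, not the receiver's actual implementation: the `queryStats` type, the `prev` map of per-hash values from the last scrape, and the function names are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// queryStats holds only the fields needed for ranking; the real receiver
// tracks many more columns from sys.dm_exec_query_stats.
type queryStats struct {
	queryHash        string
	totalElapsedTime int64 // cumulative value as reported by SQL Server
}

// topQueries ranks the fetched rows by the growth of total_elapsed_time
// since the previous scrape and returns the first topQueryCount entries.
// prev maps queryHash -> total_elapsed_time observed on the last scrape.
func topQueries(rows []queryStats, prev map[string]int64, topQueryCount int) []queryStats {
	sort.Slice(rows, func(i, j int) bool {
		di := rows[i].totalElapsedTime - prev[rows[i].queryHash]
		dj := rows[j].totalElapsedTime - prev[rows[j].queryHash]
		return di > dj // largest delta first
	})
	if len(rows) > topQueryCount {
		rows = rows[:topQueryCount]
	}
	return rows
}

func main() {
	prev := map[string]int64{"a": 100, "b": 10, "c": 0}
	rows := []queryStats{{"a", 110}, {"b", 90}, {"c", 5}}
	// Deltas: a=10, b=80, c=5, so with top_query_count=2 we report b then a.
	for _, q := range topQueries(rows, prev, 2) {
		fmt.Println(q.queryHash)
	}
}
```

Sorting by the delta rather than the raw cumulative counter is what makes the ranking reflect activity within the scrape window instead of lifetime totals.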
…ery in mssql (#38632)
Co-authored-by: @sincejune (CLA signed)
This PR is associated with #37958. Since the other PR is still under review, we created this one separately to accelerate the review process. Reviewers who have completed the review of the previous PR can prioritize this one.
#### Description
We introduced Query Sample collection in this PR. The scraper records all the currently executing queries once (in most cases) and reports related metrics. This enables users to monitor executed queries and correlate them with Top Query data for deeper insights into troubleshooting and performance optimization.
#### Configuration
We introduced one configuration option for the feature (see the receiver's README for details):
1. `max_sample_query_count`: also added in the other PR. In this change, it is mainly used to configure the cache size for the reported samples. A sample is reported only if the query (identified by queryHash and queryPlanHash) is not in the cache.
#### New Log Attributes
- `sqlserver.db_name`
- `sqlserver.client_address`
- `sqlserver.client_port`
- `sqlserver.query_start`
- `sqlserver.session_id`
- `sqlserver.session_status`
- `sqlserver.request_status`
- `sqlserver.host_name`
- `sqlserver.command`
- `db.query.text`
- `sqlserver.blocking_session_id`
- `sqlserver.wait_type`
- `sqlserver.wait_time`
- `sqlserver.wait_resource`
- `sqlserver.open_transaction_count`
- `sqlserver.transaction_id`
- `sqlserver.percent_complete`
- `sqlserver.estimated_completion_time`
- `sqlserver.cpu_time`
- `sqlserver.total_elapsed_time`
- `sqlserver.reads`
- `sqlserver.writes`
- `sqlserver.logical_reads`
- `sqlserver.transaction_isolation_level`
- `sqlserver.lock_timeout`
- `sqlserver.deadlock_priority`
- `sqlserver.row_count`
- `sqlserver.query_hash`
- `sqlserver.query_plan_hash`
- `sqlserver.context_info`
- `sqlserver.username`
- `sqlserver.wait_code`
- `sqlserver.wait_category`
#### Additional dependency
- hashicorp/golang-lru/v2
  - License: MPL-2.0
  - Link: [pkg.go.dev/github.com/hashicorp/golang-lru/v2](https://pkg.go.dev/github.com/hashicorp/golang-lru/v2)
  - Already used in the repo
- DataDog/datadog-agent/pkg/obfuscate
  - License: Apache 2.0
  - Link: [pkg.go.dev/github.com/DataDog/datadog-agent/pkg/obfuscate](https://pkg.go.dev/github.com/DataDog/datadog-agent/pkg/obfuscate)
  - Already used in the repo
#### Example Output
```yaml
resourceLogs:
  - resource:
      attributes:
        - key: db.system.type
          value:
            stringValue: microsoft.sql_server
    scopeLogs:
      - logRecords:
          - attributes:
              - key: sqlserver.db_name
                value:
                  stringValue: master
              - key: sqlserver.client_address
                value:
                  stringValue: "172.19.0.1"
              - key: sqlserver.client_port
                value:
                  intValue: 59286
              - key: sqlserver.query_start
                value:
                  stringValue: 2025-02-12T16:37:54.843+08:00
              - key: sqlserver.session_id
                value:
                  intValue: 60
              - key: sqlserver.session_status
                value:
                  stringValue: running
              - key: sqlserver.request_status
                value:
                  stringValue: running
              - key: sqlserver.host_name
                value:
                  stringValue: DESKTOP-GHAEGRD
              - key: sqlserver.command
                value:
                  stringValue: SELECT
              - key: db.query.text
                value:
                  stringValue: "SELECT DB_NAME ( r.database_id ), ISNULL ( c.client_net_address, ? ), ISNULL ( c.client_tcp_port, ? ), CONVERT ( NVARCHAR, TODATETIMEOFFSET ( r.start_time, DATEPART ( TZOFFSET, SYSDATETIMEOFFSET ( ) ) ), ? ), s.session_id, s.STATUS, r.STATUS, ISNULL ( s.host_name, ? ), r.command, SUBSTRING ( o.TEXT, ( r.statement_start_offset / ? ) + ? ( ( CASE r.statement_end_offset WHEN - ? THEN DATALENGTH ( o.TEXT ) ELSE r.statement_end_offset END - r.statement_start_offset ) / ? ) + ? ), r.blocking_session_id, ISNULL ( r.wait_type, ? ), r.wait_time, r.wait_resource, r.open_transaction_count, r.transaction_id, r.percent_complete, r.estimated_completion_time, r.cpu_time, r.total_elapsed_time, r.reads, r.writes, r.logical_reads, r.transaction_isolation_level, r.LOCK_TIMEOUT, r.DEADLOCK_PRIORITY, r.row_count, r.query_hash, r.query_plan_hash, ISNULL ( r.context_info, CONVERT ( VARBINARY, ? ) ), s.login_name FROM sys.dm_exec_requests r INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id INNER JOIN sys.dm_exec_connections c ON s.session_id = c.session_id CROSS APPLY sys.dm_exec_sql_text ( r.plan_handle )"
              - key: sqlserver.blocking_session_id
                value:
                  intValue: 0
              - key: sqlserver.wait_type
                value:
                  stringValue: ""
              - key: sqlserver.wait_time
                value:
                  intValue: 0
              - key: sqlserver.wait_resource
                value:
                  stringValue: ""
              - key: sqlserver.open_transaction_count
                value:
                  intValue: 0
              - key: sqlserver.transaction_id
                value:
                  intValue: 11089
              - key: sqlserver.percent_complete
                value:
                  doubleValue: 0
              - key: sqlserver.estimated_completion_time
                value:
                  doubleValue: 0
              - key: sqlserver.cpu_time
                value:
                  intValue: 6
              - key: sqlserver.total_elapsed_time
                value:
                  intValue: 6
              - key: sqlserver.reads
                value:
                  intValue: 0
              - key: sqlserver.writes
                value:
                  intValue: 0
              - key: sqlserver.logical_reads
                value:
                  intValue: 38
              - key: sqlserver.transaction_isolation_level
                value:
                  intValue: 2
              - key: sqlserver.lock_timeout
                value:
                  intValue: -1
              - key: sqlserver.deadlock_priority
                value:
                  intValue: 0
              - key: sqlserver.row_count
                value:
                  intValue: 1
              - key: sqlserver.query_hash
                value:
                  stringValue: "307837304133423133304231303438443444"
              - key: sqlserver.query_plan_hash
                value:
                  stringValue: "307831343032313046363442373838434239"
              - key: sqlserver.context_info
                value:
                  stringValue: ""
              - key: sqlserver.username
                value:
                  stringValue: sa
              - key: sqlserver.wait_code
                value:
                  intValue: 0
              - key: sqlserver.wait_category
                value:
                  stringValue: Unknown
            body:
              stringValue: sample
        scope:
          name: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/sqlserverreceiver
          version: v0.0.1
```
#### Link to tracking issue
Part of #36462
#### Testing
Added
#### Documentation
Updated
---------
Co-authored-by: Chao Weng <[email protected]>
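The dedup-by-cache behavior described above can be sketched as follows. The PR uses `hashicorp/golang-lru/v2`; to keep this sketch self-contained it uses a minimal stand-in LRU built on the standard library, and the type and function names are illustrative, not the receiver's.

```go
package main

import (
	"container/list"
	"fmt"
)

// sampleKey identifies a query sample; the PR deduplicates on the pair
// (queryHash, queryPlanHash).
type sampleKey struct{ queryHash, queryPlanHash string }

// lruSet is a minimal stand-in for hashicorp/golang-lru/v2: it remembers
// the most recently seen keys up to a fixed capacity, which the receiver
// sizes from max_sample_query_count.
type lruSet struct {
	cap   int
	order *list.List // front = most recently seen
	items map[sampleKey]*list.Element
}

func newLRUSet(capacity int) *lruSet {
	return &lruSet{cap: capacity, order: list.New(), items: map[sampleKey]*list.Element{}}
}

// shouldReport returns true only the first time a key is seen while it is
// still resident in the cache; repeated sightings are suppressed.
func (s *lruSet) shouldReport(k sampleKey) bool {
	if el, ok := s.items[k]; ok {
		s.order.MoveToFront(el)
		return false
	}
	if s.order.Len() >= s.cap { // evict the least recently used key
		back := s.order.Back()
		delete(s.items, back.Value.(sampleKey))
		s.order.Remove(back)
	}
	s.items[k] = s.order.PushFront(k)
	return true
}

func main() {
	c := newLRUSet(128) // sized from max_sample_query_count
	k := sampleKey{"0x70A3", "0x1402"}
	fmt.Println(c.shouldReport(k)) // first sighting: report the sample
	fmt.Println(c.shouldReport(k)) // already reported: suppress
}
```

An LRU (rather than an unbounded set) bounds memory while still letting long-evicted queries be reported again if they reappear.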
Component(s)
receiver/sqlserver
Is your feature request related to a problem? Please describe.
Currently, this receiver only produces metrics from the instance perspective, like the number of batch requests received by the entire SQL Server instance.
However, there are no query-level metrics or logs that let users debug their queries. If a user wants to investigate a slow query, instance-level metrics are of little help, as they do not show how a particular query is processed by the server. In this case, query-level metrics for a slow-running query bring more context and greatly help with debugging and optimization.
For instance, we can obtain the physical reads a single query consumed. This metric indicates disk I/O activity; a high value may suggest the query consumed more I/O than the instance can serve, so we know where the bottleneck is and how to fix it.
Describe the solution you'd like
SQL Server has an internal table,
sys.dm_exec_query_stats
, that stores query-level data. We can update the SQL Server receiver to fetch data from this table and produce corresponding metrics. To reduce the load the receiver's query puts on the SQL Server, it may be good to limit the number of queries returned, say, to 200.
Concerns
However, the sanitized query text and the execution plan can be very long, exceeding the length limits of metric storage. The execution plan is also a high-cardinality value that does not fit metric storage.
To solve this problem, I want to propose sending query texts and execution plans in logs.
The query-level metrics would:
- have a `db.query.hash` attribute,
- omit `db.query.text` by default.

For the corresponding log, it would:
- have an `event.name` attribute,
- have a `db.query.hash` attribute (so the metrics and logs can be bonded together),
- include `db.query.text`,
- include `db.query_plan`.

So, every query result fetched from the SQL server would populate a metric and a corresponding log.
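The metric/log pairing above can be sketched as a join on the shared hash. The types and the `correlate` helper are hypothetical, just to show how a backend (or a test) could bond a metric data point to its log record via `db.query.hash`.

```go
package main

import "fmt"

// metricPoint and logRecord are illustrative stand-ins for the receiver's
// output: one metric data point and one log record per fetched query, both
// carrying the db.query.hash attribute.
type metricPoint struct {
	queryHash   string
	elapsedTime int64
}

type logRecord struct {
	queryHash string
	queryText string
	queryPlan string
}

type correlated struct {
	metric metricPoint
	log    logRecord
}

// correlate pairs metrics with their logs via the shared hash attribute.
func correlate(points []metricPoint, logs []logRecord) map[string]correlated {
	byHash := make(map[string]logRecord, len(logs))
	for _, l := range logs {
		byHash[l.queryHash] = l
	}
	out := make(map[string]correlated)
	for _, p := range points {
		if l, ok := byHash[p.queryHash]; ok {
			out[p.queryHash] = correlated{metric: p, log: l}
		}
	}
	return out
}

func main() {
	points := []metricPoint{{"0x3784", 42}}
	logs := []logRecord{{"0x3784", "SELECT TOP ...", "<ShowPlanXML .../>"}}
	joined := correlate(points, logs)
	fmt.Println(joined["0x3784"].log.queryText, joined["0x3784"].metric.elapsedTime)
}
```

This is why keeping the hash on both signals matters: the bulky text and plan live only in the log, while the metric stays small and joinable.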
Describe alternatives you've considered
No response
Additional context
We (a Splunk team) are interested in implementing this feature. However, since using logs like this is not what a typical receiver does, we would like to know whether the community would accept this before we work on the implementation.
Other receivers that have a similar approach:
Related
It is possible to fetch the
trace_id
and
span_id
of a query from the SQL server, but that may be a future plan; it is not in the current scope.