
[receiver/sqlserverreceiver] Add query-level data that contain query text and execution plan #36462


Open
XSAM opened this issue Nov 20, 2024 · 7 comments

@XSAM (Member) commented Nov 20, 2024

Component(s)

receiver/sqlserver

Is your feature request related to a problem? Please describe.

Currently, this receiver produces only instance-level metrics, such as the number of batch requests received by the entire SQL Server instance.

However, there are no query-level metrics or logs to help users debug their queries. If a user wants to investigate a slow query, instance-level metrics do not help much, as they do not show how a particular query is processed by the server. In this case, query-level metrics for the slow query provide more context and greatly help with debugging and optimization.

For instance, we can obtain the number of physical reads a single query performed. This metric indicates disk I/O activity; a high value may suggest the query consumes more I/O than the instance can serve, so we know where the bottleneck is and how to fix it.

Describe the solution you'd like

SQL Server has a dynamic management view, sys.dm_exec_query_stats, that stores query-level statistics. We can update the SQL Server receiver to fetch data from this view and produce corresponding metrics. To reduce the load this collection places on SQL Server, it may be good to limit the number of queries returned, say to 200.

Concerns

However, the sanitized query text and the execution plan can be very long, exceeding the length limits of metric storage. The execution plan is also a high-cardinality value that does not fit metric storage well.

To solve this problem, I propose sending query texts and execution plans as logs.

The query-level metrics would:

  • contain a db.query.hash attribute,
  • but not include db.query.text by default.

The corresponding log record would:

  • contain a set of attributes that identify it as generated by the SQL Server receiver (for example, an event.name attribute),
  • contain a db.query.hash attribute (so the metrics and logs can be correlated),
  • contain a db.query.text attribute,
  • contain a db.query_plan attribute.

So, every query result fetched from SQL Server would populate a metric and a corresponding log record.
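
To make the pairing concrete, here is a rough, simplified sketch of what a single query could produce under this proposal. The metric name and the event.name value below are hypothetical placeholders, and the record shapes are abbreviated; only the db.query.hash, db.query.text, and db.query_plan attributes come from the proposal itself.

```yaml
# Simplified sketch, not actual receiver output.
# Query-level metric data point: keyed by the query hash, no query text.
metric:
  name: sqlserver.query.physical_reads   # hypothetical metric name
  sum:
    dataPoints:
      - asInt: "1024"
        attributes:
          - key: db.query.hash
            value: { stringValue: "0x37849E874171E3F3" }

# Corresponding log record: carries the long, high-cardinality fields.
logRecord:
  attributes:
    - key: event.name
      value: { stringValue: sqlserver.query.stats }   # hypothetical value
    - key: db.query.hash
      value: { stringValue: "0x37849E874171E3F3" }
    - key: db.query.text
      value: { stringValue: "SELECT ... FROM ..." }
    - key: db.query_plan
      value: { stringValue: "<ShowPlanXML ...>" }
```

A backend can then join the metric and the log record on db.query.hash to attach the query text and execution plan to the query-level measurements.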

Describe alternatives you've considered

No response

Additional context

We (a Splunk team) are interested in implementing this feature. But since using logs like this is not what a common receiver would do, we would like to know whether the community would accept this approach before we start working on the implementation.

Other receivers that have a similar approach:

Related

It is possible to fetch the trace_id and span_id of a query from SQL Server, but that may be part of a future plan; it is not in the current scope.

XSAM added the enhancement (New feature or request) and needs triage (New item requiring triage) labels Nov 20, 2024
github-actions bot commented:
Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@XSAM (Member, Author) commented Nov 20, 2024

By the way, is an integration test required for such a change?

@dmitryax (Member) commented:
Sounds reasonable to me. The number of top slow queries should be configurable. 200 may be too many for the default, given that such queries are prone to churn and can cause high cardinality.

cc @djaglowski, we discussed this at KubeCon.

@dmitryax (Member) commented:
> By the way, is an integration test required for such a change?

It’s not required, but it would be great to have them. That way, we can ensure the component’s stability. Otherwise, we can miss it if some change in the receiver or in SQL Server itself (a new version) breaks this particular functionality. This can be part of a separate issue.

github-actions bot commented Feb 5, 2025

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@sincejune (Contributor) commented:
Raised #37958 for the initial change; we added log collection in that PR to collect query text and query plan as well.

github-actions bot removed the Stale label Feb 17, 2025
MovieStoreGuy pushed a commit that referenced this issue Mar 19, 2025
…37958)

#### Description
This PR introduced query-level data collection for the `sqlserver` receiver in the logs pipeline.

We introduced `Top Query` collection in this initial PR (`Top Queries` are the queries that used the most CPU time within a time window).

Co-authored-by: @cuichenli (CLA signed) 
##### Configuration
We introduced four configuration options for this feature (see the receiver's README for details); an illustrative collector config follows the list:
1. `logs.enable_top_query_collection`: enables the collection.
2. `max_sample_query_count`: the number of queries to fetch from the database.
3. `top_query_count`: the number of queries to report to the next consumer.
4. `lookback_time`: the query window for each scrape.
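
As a rough illustration of how these options could be wired into a collector configuration (the connection settings shown are the receiver's existing options; the exact nesting of the new keys may differ from this sketch, so treat the receiver README as the source of truth):

```yaml
receivers:
  sqlserver:
    collection_interval: 60s
    server: 0.0.0.0
    port: 1433
    username: otel
    password: ${env:SQLSERVER_PASSWORD}
    # New options described above; exact placement may differ, see the README.
    logs:
      enable_top_query_collection: true
    max_sample_query_count: 1000   # queries fetched from the database per scrape
    top_query_count: 200           # queries reported to the next consumer
    lookback_time: 60              # query window for each scrape
```

With top-query collection enabled, the receiver would also be referenced from a logs pipeline so the emitted log records have somewhere to go.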

##### Workflow
The `sqlserver` receiver will fetch M (=`max_sample_query_count`) queries from the database, sort them according to the difference in `total_elapsed_time` (CPU time used), and then report the first N (=`top_query_count`) queries.

##### New Log Attributes
- `db.total_elapsed_time`
- `db.total_rows`
- `db.total_worker_time`
- `db.total_logical_reads`
- `db.total_logical_writes`
- `db.total_physical_reads`
- `db.execution_count`
- `db.total_grant_kb`
- `db.query_hash`
- `db.query_plan_hash`
- `db.query_text`
- `db.query_plan`

##### Additional dependency
* `hashicorp/golang-lru/v2`
  * License: MPL-2.0
  * Link: https://pkg.go.dev/github.com/hashicorp/golang-lru/v2
  * Already been used in the repo
* `DataDog/datadog-agent/pkg/obfuscate`
  * License: Apache 2.0
  * Link: https://pkg.go.dev/github.com/DataDog/datadog-agent/pkg/obfuscate
  * Already been used in the repo

##### Example Output
```yaml
resourceLogs:
  - resource:
      attributes:
        - key: db.system.type
          value:
            stringValue: microsoft.sql_server
    scopeLogs:
      - logRecords:
          - attributes:
              - key: computer_name
                value:
                  stringValue: DESKTOP-GHAEGRD
              - key: sql_instance
                value:
                  stringValue: sqlserver
              - key: db.query_hash
                value:
                  stringValue: "307833373834394538373431373145334633"
              - key: db.query_plan_hash
                value:
                  stringValue: "307844333131323930393432394131423530"
              - key: db.total_elapsed_time
                value:
                  intValue: "2"
              - key: db.total_rows
                value:
                  intValue: "1"
              - key: db.total_logical_reads
                value:
                  intValue: "2"
              - key: db.total_logical_writes
                value:
                  intValue: "3"
              - key: db.total_physical_reads
                value:
                  intValue: "4"
              - key: db.execution_count
                value:
                  intValue: "5"
              - key: db.total_worker_time
                value:
                  intValue: "2"
              - key: db.total_grant_kb
                value:
                  intValue: "3095"
              - key: db.query_text
                value:
                  stringValue: with qstats SELECT TOP ( @topNValue ) REPLACE ( @@ServerName, ? ) ...
              - key: db.query_plan
                value:
                  stringValue: <ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"> ...
            body: {}
            spanId: ""
            timeUnixNano: "1739690452135336000"
            traceId: ""
        scope:
          name: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/sqlserverreceiver
          version: development
```
#### Link to tracking issue
Part of #36462

#### Testing
Added

#### Documentation
Updated


---------

Co-authored-by: Will Li <[email protected]>
Co-authored-by: Antoine Toulme <[email protected]>
lightme16 pushed a commit to lightme16/opentelemetry-collector-contrib that referenced this issue Mar 19, 2025
mx-psi pushed a commit that referenced this issue Mar 28, 2025
…ery in mssql (#38632)

Co-authored-by:  @sincejune (CLA signed)

This PR is associated with #37958. Since the other PR is still under review, we created this one separately to accelerate the review process. Reviewers who have completed the review of the previous PR can prioritize this one.

#### Description

We introduced Query Sample collection in this PR. The scraper will record all currently executing queries once (in most cases) and report related metrics. This enables users to monitor executed queries and correlate them with Top Query data for deeper insights into troubleshooting and performance optimization.

#### Configuration
We introduced one configuration option for this feature (see the receiver's README for details); an illustrative snippet follows:
1. `max_sample_query_count`: also added in the other PR. In this change, it is mainly used to configure the cache size for the reported samples. A sample is reported only if the query (identified by queryHash and queryPlanHash) is not already in the cache.
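
A minimal sketch of how this option might be set; the option that actually enables query-sample collection is not named in this description and is omitted here, so consult the receiver README for the complete configuration:

```yaml
receivers:
  sqlserver:
    collection_interval: 60s
    server: 0.0.0.0
    port: 1433
    username: otel
    password: ${env:SQLSERVER_PASSWORD}
    # Shared with top-query collection; in this change it also bounds the LRU
    # cache of (queryHash, queryPlanHash) pairs that were already reported.
    max_sample_query_count: 1000
```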

#### New Log Attributes

- `sqlserver.db_name`
- `sqlserver.client_address`
- `sqlserver.client_port`
- `sqlserver.query_start`
- `sqlserver.session_id`
- `sqlserver.session_status`
- `sqlserver.request_status`
- `sqlserver.host_name`
- `sqlserver.command`
- `db.query.text`
- `sqlserver.blocking_session_id`
- `sqlserver.wait_type`
- `sqlserver.wait_time`
- `sqlserver.wait_resource`
- `sqlserver.open_transaction_count`
- `sqlserver.transaction_id`
- `sqlserver.percent_complete`
- `sqlserver.estimated_completion_time`
- `sqlserver.cpu_time`
- `sqlserver.total_elapsed_time`
- `sqlserver.reads`
- `sqlserver.writes`
- `sqlserver.logical_reads`
- `sqlserver.transaction_isolation_level`
- `sqlserver.lock_timeout`
- `sqlserver.deadlock_priority`
- `sqlserver.row_count`
- `sqlserver.query_hash`
- `sqlserver.query_plan_hash`
- `sqlserver.context_info`
- `sqlserver.username`
- `sqlserver.wait_code`
- `sqlserver.wait_category`

#### Additional dependency
- `hashicorp/golang-lru/v2`
  - License: MPL-2.0
  - Link: [pkg.go.dev/github.com/hashicorp/golang-lru/v2](https://pkg.go.dev/github.com/hashicorp/golang-lru/v2)
  - Already been used in the repo
- `DataDog/datadog-agent/pkg/obfuscate`
  - License: Apache 2.0
  - Link: [pkg.go.dev/github.com/DataDog/datadog-agent/pkg/obfuscate](https://pkg.go.dev/github.com/DataDog/datadog-agent/pkg/obfuscate)
  - Already been used in the repo

#### Example Output
```yaml
resourceLogs:
  - resource:
      attributes:
        - key: db.system.type
          value:
            stringValue: microsoft.sql_server
    scopeLogs:
      - logRecords:
          - attributes:
              - key: sqlserver.db_name
                value:
                  stringValue: master
              - key: sqlserver.client_address
                value:
                  stringValue: "172.19.0.1"
              - key: sqlserver.client_port
                value:
                  intValue: 59286
              - key: sqlserver.query_start
                value:
                  stringValue: 2025-02-12T16:37:54.843+08:00
              - key: sqlserver.session_id
                value:
                  intValue: 60
              - key: sqlserver.session_status
                value:
                  stringValue: running
              - key: sqlserver.request_status
                value:
                  stringValue: running
              - key: sqlserver.host_name
                value:
                  stringValue: DESKTOP-GHAEGRD
              - key: sqlserver.command
                value:
                  stringValue: SELECT
              - key: db.query.text
                value:
                  stringValue: "SELECT DB_NAME ( r.database_id ), ISNULL ( c.client_net_address, ? ), ISNULL ( c.client_tcp_port, ? ), CONVERT ( NVARCHAR, TODATETIMEOFFSET ( r.start_time, DATEPART ( TZOFFSET, SYSDATETIMEOFFSET ( ) ) ), ? ), s.session_id, s.STATUS, r.STATUS, ISNULL ( s.host_name, ? ), r.command, SUBSTRING ( o.TEXT, ( r.statement_start_offset / ? ) + ? ( ( CASE r.statement_end_offset WHEN - ? THEN DATALENGTH ( o.TEXT ) ELSE r.statement_end_offset END - r.statement_start_offset ) / ? ) + ? ), r.blocking_session_id, ISNULL ( r.wait_type, ? ), r.wait_time, r.wait_resource, r.open_transaction_count, r.transaction_id, r.percent_complete, r.estimated_completion_time, r.cpu_time, r.total_elapsed_time, r.reads, r.writes, r.logical_reads, r.transaction_isolation_level, r.LOCK_TIMEOUT, r.DEADLOCK_PRIORITY, r.row_count, r.query_hash, r.query_plan_hash, ISNULL ( r.context_info, CONVERT ( VARBINARY, ? ) ), s.login_name FROM sys.dm_exec_requests r INNER JOIN sys.dm_exec_sessions s ON r.session_id = s.session_id INNER JOIN sys.dm_exec_connections c ON s.session_id = c.session_id CROSS APPLY sys.dm_exec_sql_text ( r.plan_handle )"
              - key: sqlserver.blocking_session_id
                value:
                  intValue: 0
              - key: sqlserver.wait_type
                value:
                  stringValue: ""
              - key: sqlserver.wait_time
                value:
                  intValue: 0
              - key: sqlserver.wait_resource
                value:
                  stringValue: ""
              - key: sqlserver.open_transaction_count
                value:
                  intValue: 0
              - key: sqlserver.transaction_id
                value:
                  intValue: 11089
              - key: sqlserver.percent_complete
                value:
                  doubleValue: 0
              - key: sqlserver.estimated_completion_time
                value:
                  doubleValue: 0
              - key: sqlserver.cpu_time
                value:
                  intValue: 6
              - key: sqlserver.total_elapsed_time
                value:
                  intValue: 6
              - key: sqlserver.reads
                value:
                  intValue: 0
              - key: sqlserver.writes
                value:
                  intValue: 0
              - key: sqlserver.logical_reads
                value:
                  intValue: 38
              - key: sqlserver.transaction_isolation_level
                value:
                  intValue: 2
              - key: sqlserver.lock_timeout
                value:
                  intValue: -1
              - key: sqlserver.deadlock_priority
                value:
                  intValue: 0
              - key: sqlserver.row_count
                value:
                  intValue: 1
              - key: sqlserver.query_hash
                value:
                  stringValue: "307837304133423133304231303438443444"
              - key: sqlserver.query_plan_hash
                value:
                  stringValue: "307831343032313046363442373838434239"
              - key: sqlserver.context_info
                value:
                  stringValue: ""
              - key: sqlserver.username
                value:
                  stringValue: sa
              - key: sqlserver.wait_code
                value:
                  intValue: 0
              - key: sqlserver.wait_category
                value:
                  stringValue: Unknown
            body:
              stringValue: sample
        scope:
          name: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/sqlserverreceiver
          version: v0.0.1

```

#### Link to tracking issue
Part of #36462

#### Testing
Added

#### Documentation
Updated

---------

Co-authored-by: Chao Weng <[email protected]>
dmathieu pushed a commit to dmathieu/opentelemetry-collector-contrib that referenced this issue Apr 8, 2025
github-actions bot commented:
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label Apr 21, 2025
Fiery-Fenix pushed a commit to Fiery-Fenix/opentelemetry-collector-contrib that referenced this issue Apr 24, 2025