@trosenblatt trosenblatt commented Dec 9, 2025

Description

This PR adds timing metrics to the scan_async return value to enable better observability of scanning performance, particularly for tracking CPU time vs I/O time in async scans.

Related JIRA: SDS-1904

Changes

Core Changes in dd-sds

  1. New ScanMetrics struct - Captures timing information:

    • total_duration: Total time from scan start to completion
    • io_duration: Time spent in I/O operations (async rules making network requests)
    • num_async_rules: Count of async rules executed
  2. New ScanResult struct - Wraps scan results with metrics:

    • matches: Vector of RuleMatch objects (existing functionality)
    • metrics: ScanMetrics with timing information
  3. Updated scan_async return type:

    • Changed from Result<Vec<RuleMatch>, ScannerError>
    • To: Result<ScanResult, ScannerError>
  4. I/O Duration Tracking:

    • Each async rule now tracks its I/O duration
    • Total I/O duration is accumulated from all async rules
    • Enables calculating CPU time as: total_duration - io_duration
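As a rough sketch of the shapes described above, the new structs and the CPU-time calculation might look like the following. The field types (`Duration`, `usize`) are assumptions, since the description only names the fields. `RuleMatch` is reduced to a placeholder, and `cpu_duration` is a hypothetical helper that the PR does not claim to add.

```rust
use std::time::Duration;

// Placeholder standing in for the real dd-sds match type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct RuleMatch {
    pub rule_index: usize,
}

// Timing information for one scan. Field names come from the PR
// description; the field types are assumed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct ScanMetrics {
    pub total_duration: Duration,
    pub io_duration: Duration,
    pub num_async_rules: usize,
}

impl ScanMetrics {
    // Hypothetical helper: total_duration - io_duration, clamped at zero
    // so that an accumulated I/O time larger than the total cannot panic.
    pub fn cpu_duration(&self) -> Duration {
        self.total_duration.saturating_sub(self.io_duration)
    }
}

// Wraps the existing matches together with the new metrics.
#[derive(Debug, Clone, PartialEq)]
pub struct ScanResult {
    pub matches: Vec<RuleMatch>,
    pub metrics: ScanMetrics,
}
```

The saturating subtraction is a defensive choice in this sketch: if I/O durations from several async rules are summed and those rules overlap in time, the sum could in principle exceed the wall-clock total.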

Breaking Changes

  • scan_async() and scan_async_with_options() now return Result<ScanResult, ScannerError> instead of Result<Vec<RuleMatch>, ScannerError>
  • Callers need to access matches via result.matches instead of using the result directly
  • Tests updated to use the new return structure
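For callers that only care about matches, the migration can be as small as mapping the new result back to its `matches` field. The sketch below uses placeholder definitions for `RuleMatch`, `ScanMetrics`, `ScanResult`, and `ScannerError` so that it compiles on its own. `into_matches` is a hypothetical helper name, not part of dd-sds.

```rust
use std::time::Duration;

// Placeholder types standing in for the real dd-sds definitions (assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct RuleMatch {
    pub rule_index: usize,
}

#[derive(Debug, Clone, Copy, Default)]
pub struct ScanMetrics {
    pub total_duration: Duration,
    pub io_duration: Duration,
    pub num_async_rules: usize,
}

#[derive(Debug, Clone)]
pub struct ScanResult {
    pub matches: Vec<RuleMatch>,
    pub metrics: ScanMetrics,
}

#[derive(Debug)]
pub struct ScannerError;

// Hypothetical shim: recovers the old return shape from the new one by
// keeping `matches` and dropping `metrics`.
pub fn into_matches(
    result: Result<ScanResult, ScannerError>,
) -> Result<Vec<RuleMatch>, ScannerError> {
    result.map(|scan| scan.matches)
}
```

Code that previously iterated over the returned vector directly would iterate over `result.matches` instead, or pass the result through a shim like this one.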

Usage in sds-shared-library

This change enables the sds-shared-library JNI bindings to report a new metric: scan_cpu_duration_ns

For Synchronous Scans:

  • Measures total wall-clock time including JNI encoding/decoding overhead

For Asynchronous Scans:

  • Calculates CPU time as: (completion_time - start_time) - io_duration
  • Includes JNI overhead (encoding/decoding between Java and Rust)
  • Excludes I/O time (network requests for validators)
  • Excludes queue waiting time

This gives an approximate CPU-time metric for both sync and async scans (wall-clock time, with measured I/O time subtracted in the async case), which supports performance analysis and resource planning.
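The async calculation above can be sketched as a small function. The metric name `scan_cpu_duration_ns` comes from this description, but the function itself, its signature, and the clamping behavior are assumptions made for illustration.

```rust
use std::time::{Duration, Instant};

// Hypothetical helper: (completion_time - start_time) - io_duration,
// reported in nanoseconds. Both subtractions saturate at zero, and the
// nanosecond count is clamped to u64::MAX instead of overflowing.
pub fn scan_cpu_duration_ns(
    start_time: Instant,
    completion_time: Instant,
    io_duration: Duration,
) -> u64 {
    let wall_clock = completion_time.saturating_duration_since(start_time);
    let cpu = wall_clock.saturating_sub(io_duration);
    u64::try_from(cpu.as_nanos()).unwrap_or(u64::MAX)
}
```

For a synchronous scan there is no I/O to subtract, so the same function can be called with `Duration::ZERO` as `io_duration`.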

Testing

  • Updated existing async rule tests to use new ScanResult structure
  • All tests pass with the new return type
  • Verified timing metrics are correctly populated

trosenblatt and others added 5 commits December 9, 2025 16:51
This commit adds comprehensive timing metrics for scanner operations:

1. **I/O Duration Tracking**:
   - Added `io_duration` field to `AsyncRuleInfo` to track time spent in async I/O operations
   - Modified `process_async` to measure duration of async operations using `Instant::now()`
   - `internal_scan` now aggregates I/O duration from all async jobs

2. **CPU Duration Histogram Metric**:
   - Added `cpu_duration` histogram field to `ScannerMetrics` using highcard labels
   - Metric name: `scanning.cpu_duration` (in nanoseconds)
   - Calculated as: `total_duration - io_duration` to exclude I/O wait time
   - Recorded in `internal_scan_with_metrics` after each scan

3. **New Return Types for Async Scan Methods**:
   - Created `ScanMetrics` struct containing:
     - `total_duration`: Total scan time
     - `io_duration`: Time spent in I/O operations
     - `num_async_rules`: Number of async rules executed
   - Created `ScanResult` struct containing matches and metrics
   - Updated `scan_async` and `scan_async_with_options` to return `ScanResult`
   - Synchronous scan methods remain unchanged for backward compatibility

4. **Updated Tests**:
   - Fixed async test assertions to work with new `ScanResult` return type
   - All 293 tests passing
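A minimal sketch of the I/O timing idea from item 1: wrap each async operation with `Instant::now()`, then sum the per-job durations. `AsyncRuleInfo` here is a one-field placeholder. `timed`, `total_io_duration`, and `block_on` are hypothetical names, and this is not the actual `process_async` or `internal_scan` code. The tiny `block_on` is only there so the sketch runs without an async runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

// Placeholder for the per-rule bookkeeping struct (assumption: one field).
pub struct AsyncRuleInfo {
    pub io_duration: Duration,
}

// Awaits `fut` and returns its output with the elapsed wall-clock time.
pub async fn timed<F: Future>(fut: F) -> (F::Output, Duration) {
    let start = Instant::now();
    let output = fut.await;
    (output, start.elapsed())
}

// Sums the I/O durations recorded by all async jobs.
pub fn total_io_duration(jobs: &[AsyncRuleInfo]) -> Duration {
    jobs.iter().map(|job| job.io_duration).sum()
}

// Waker that does nothing; sufficient for the polling loop below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Busy-polling executor for demonstration only; real code would use a
// proper async runtime.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}
```

Note that a duration measured this way covers the whole awaited future, so it includes any time the task spends waiting to be polled, not only time blocked on the network.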

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>