
Conversation

@andsel
Contributor

@andsel andsel commented Dec 4, 2025

Release notes

Exposes batch size metrics for last 1, 5 and 15 minutes.

What does this PR do?

Updates the stats API response to also expose 1m, 5m and 15m average batch metrics.

Changed the response map returned by the refine_batch_metrics method (the result of an API query to _node/stats) so that it contains the average values over the last 1, 5 and 15 minutes for event_count and batch_size. This data is published once it becomes available from the metric collector.
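
The publishing behavior described above can be sketched as follows. This is a hedged illustration only: the helper name and input shape are assumptions for the sketch, not the actual refine_batch_metrics implementation in Logstash.

```ruby
# Windows in the order they appear in the stats response.
WINDOW_KEYS = %w[lifetime last_1_minute last_5_minutes last_15_minutes]

# Keep only the windows the metric collector has already produced,
# so e.g. last_15_minutes is absent until its window has filled.
def publish_available_windows(flow_values)
  WINDOW_KEYS.select { |w| flow_values.key?(w) }
             .to_h { |w| [w, flow_values[w]] }
end

# Right after startup only the lifetime average exists:
publish_available_windows({ "lifetime" => 125.0 })
# => {"lifetime"=>125.0}

# After a minute of uptime the 1m window appears alongside it:
publish_available_windows({ "lifetime" => 125.0, "last_1_minute" => 130.0 })
# => {"lifetime"=>125.0, "last_1_minute"=>130.0}
```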

Why is it important/What is the impact to the user?

This feature lets Logstash users meter average batch values over recent time windows.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • [ ] I have added tests that prove my fix is effective or that my feature works. This feature relies on ExtendedFlowMetric, whose time-window management is extensively tested. To create a test at the API level we would need to generate load for at least the time-window duration and then check the API response; tests that run for minutes are not feasible.

Author's Checklist

  • [ ]

How to test this PR locally

Use the same test harness proposed in #18000, switch pipeline.batch.metrics.sampling_mode to full, and monitor the result of _node/stats for 1, 5, and 15 minutes with:

curl http://localhost:9600/_node/stats | jq .pipelines.main.batch

Related issues

Use cases

Screenshots

Logs

@andsel andsel self-assigned this Dec 4, 2025
@github-actions
Contributor

github-actions bot commented Dec 4, 2025

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • /run exhaustive tests : Run the exhaustive tests Buildkite pipeline.

@mergify
Contributor

mergify bot commented Dec 4, 2025

This pull request does not have a backport label. Could you fix it @andsel? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

@andsel andsel changed the title Updates stats API response to expose also 1m, 5m and 15m average batc… Exposes average batch metrics at 1, 5 and 15 minutes time window. Dec 4, 2025
@andsel andsel marked this pull request as draft December 4, 2025 16:51
@andsel
Contributor Author

andsel commented Dec 4, 2025

run exhaustive test

…can't be created (1 minute and more intervals)
@andsel andsel marked this pull request as ready for review December 5, 2025 10:53
Member

@donoghuc donoghuc left a comment


Should we add a few cases in the integration test? Seems like we have a place it would be easy to add

Stud.try(max_retry.times, [StandardError, RSpec::Expectations::ExpectationNotMetError]) do
# node_stats can fail if the stats subsystem isn't ready
result = logstash_service.monitoring_api.node_stats rescue nil
expect(result).not_to be_nil
# we use fetch here since we want failed fetches to raise an exception
# and trigger the retry block
batch_stats = result.fetch("pipelines").fetch(pipeline_id).fetch("batch")
expect(batch_stats).not_to be_nil
expect(batch_stats["event_count"]).not_to be_nil
expect(batch_stats["event_count"]["average"]).not_to be_nil
expect(batch_stats["event_count"]["average"]["lifetime"]).not_to be_nil
expect(batch_stats["event_count"]["average"]["lifetime"]).to be_a_kind_of(Numeric)
expect(batch_stats["event_count"]["average"]["lifetime"]).to be > 0
expect(batch_stats["event_count"]["current"]).not_to be_nil
expect(batch_stats["event_count"]["current"]).to be >= 0
expect(batch_stats["byte_size"]).not_to be_nil
expect(batch_stats["byte_size"]["average"]).not_to be_nil
expect(batch_stats["byte_size"]["average"]["lifetime"]).not_to be_nil
expect(batch_stats["byte_size"]["average"]["lifetime"]).to be_a_kind_of(Numeric)
expect(batch_stats["byte_size"]["average"]["lifetime"]).to be > 0
expect(batch_stats["byte_size"]["current"]).not_to be_nil
expect(batch_stats["byte_size"]["current"]).to be >= 0
end
end
end

@andsel
Contributor Author

andsel commented Dec 9, 2025

That would be the right place, but verifying even just the 1 minute average flow metric would mean the test has to sleep and wait for 1 minute before the value appears in the response. Waiting 5 or 15 minutes to execute a test would be a great waste, WDYT?

@andsel andsel requested a review from donoghuc December 9, 2025 15:49
# reading and retrieve unrelated values
current_data_point = stats[:batch][:current]
{
# average returns a FlowMetric, and we need to invoke getValue to obtain the map with metric details.
Member


Having a hard time parsing this comment, not sure exactly what the message is supposed to convey.

Contributor Author


It's more of a reminder for the reader of why, for this metric, we need to invoke the getValue method.
The reason is that the flow metric nested in batch/[event_count | byte_size]/average must be explicitly queried to return the sub-document containing the lifetime, last_1_minute, etc. metric values:

Usually the paths lead to single-value metric objects, but this one is a composite that needs an explicit query to grab the contained values.
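
The scalar-vs-composite distinction can be illustrated with stand-in classes. These are hypothetical types for the sketch, not the real Logstash/Java metric classes; the real FlowMetric lives on the Java side and exposes getValue.

```ruby
# A scalar metric exposes its value directly.
class ScalarMetric
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

# A composite flow metric must be asked explicitly for its map of
# window values, analogous to FlowMetric#getValue in the thread above.
class CompositeFlowMetric
  def initialize(windows)
    @windows = windows
  end

  def get_value
    @windows
  end
end

current = ScalarMetric.new(0)
average = CompositeFlowMetric.new("lifetime" => 1428, "last_1_minute" => 1683)

current.value      # => 0
average.get_value  # => {"lifetime"=>1428, "last_1_minute"=>1683}
```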

Contributor Author


@donoghuc let me know if you want me to reword the sentence so that it sounds more meaningful (a good suggestion from a native speaker is very welcome :-) )

Member

@donoghuc donoghuc left a comment


ah, i did not consider the time cost of that test 😅

Local testing shows it is working as advertised

➜  ~ curl -s http://localhost:9600/_node/stats/pipelines | jq '.pipelines.main.batch'
{
  "event_count": {
    "current": 0,
    "average": {
      "lifetime": 1,
      "last_1_minute": 1,
      "last_5_minutes": 1,
      "last_15_minutes": 1
    }
  },
  "byte_size": {
    "current": 0,
    "average": {
      "lifetime": 1428,
      "last_1_minute": 1683,
      "last_5_minutes": 1448,
      "last_15_minutes": 1445
    }
  }
}

@elasticmachine

💚 Build Succeeded

History

cc @andsel

@andsel andsel merged commit e0acfe7 into elastic:main Dec 11, 2025
11 checks passed

Development

Successfully merging this pull request may close these issues.

Expose the meter of average value of batch's byte size and event count for 1m, 5m 15m windows
