[scraperhelper] Can't run scrapers in parallel #13113

Open
dehaansa opened this issue May 29, 2025 · 5 comments · May be fixed by #13167
Comments

@dehaansa

dehaansa commented May 29, 2025

Component(s)

scraper/scraperhelper

Describe the issue you're reporting

As reported in this issue for the sqlqueryreceiver, the scraperhelper's controller always runs scrapers in series. In the case of the sqlqueryreceiver this means that each query is run sequentially rather than leveraging the connection pool options and running in parallel.

I think there are a few options for how to improve the behavior.

  1. No change to scraperhelper. To get the benefits of a connection pool, for example, that logic would need to be embedded inside a single scraper instead of the sqlqueryreceiver's current pattern of one scraper per query.

  2. Always parallelize scrapers. This might cause issues in cases where scrapers could conflict with each other. However, it is what I would expect the behavior to be as a user, especially if the scraperhelper package continues to evolve (some examples in Scraper feedback, and what to do next? #11238).

  3. Configurable parameter in the scraper controller to run scrapers in parallel. This provides the benefits of parallelism without potentially breaking existing uses of the package. I think parallel should be the default, but if we don't want to change existing behavior it could be opt-in.

  4. Configurable parameter(s) in the scraper definition to define whether an individual scraper should be run in parallel. This feels excessive to me, but it allows for the case where some scrapers must run exclusively of each other. We could get deep in the weeds here with marking dependencies/conflicts, dividing scrapers into sets that can run in parallel, etc., if that's something we want to support.

I'm in favor of changing the behavior to always parallelize (option 2); however, existing uses of the scraper packages will need to be evaluated to be sure this is safe.

@dehaansa
Author

CC @bogdandrutu as you appear to have done recent work on the scraper packages

@josepcorrea

One of the main reasons we believe this is a bug is that the current sequential execution of scrapers can break the expected collection_interval behavior.

For example, if the collection_interval is set to 3 minutes and one SQL query takes 5 minutes to complete, the following queries won’t start until that one finishes. As a result, even lightweight queries that should run every 3 minutes might actually be delayed significantly, leading to inaccurate or outdated metrics.

This behavior defeats the purpose of defining a consistent scrape interval and can be especially problematic in environments where some queries are much heavier than others.

@josepcorrea

Additionally, the max_open_conn property seems somewhat meaningless in this context, since with sequential execution, there is never more than one connection used at a time. This defeats the purpose of tuning connection pool limits for performance.

It's also worth noting that max_open_conn was mentioned as part of a fix in a related issue: open-telemetry/opentelemetry-collector-contrib#39270 — however, it appears that the underlying sequential execution behavior still limits its effectiveness.

@andrzej-stencel
Member

That's correct, the scrapers are currently run sequentially for both logs and metrics. I agree parallel behavior would be desirable in certain circumstances, though not necessarily in all of them.

I'm in favor of option 3: make the controller programmatically configurable to run scrapers in parallel or sequentially. This way we can keep the current sequential behavior of existing scrapers, parallelize the scrapers we choose, and/or create new scrapers with whichever behavior best fits the scenario.

On the programmatic level, I'd rather either have no default and make it mandatory to choose parallel or sequential behavior, or have sequential as the default. Perhaps this could be a new ControllerOption like WithParallel?
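As a rough sketch of what such an option could look like, following the functional-option pattern the scraperhelper package already uses. The `ControllerOption` and `WithParallel` names follow the comment's suggestion; the `controller` struct here is a stand-in, not the real scraperhelper type:

```go
package main

import "fmt"

// controller is a stand-in for scraperhelper's controller; only the
// field relevant to this sketch is shown.
type controller struct {
	parallel bool
}

// ControllerOption configures a controller (functional-option pattern).
type ControllerOption func(*controller)

// WithParallel is the hypothetical opt-in discussed above: scrapers run
// concurrently only when the scraper author asks for it, keeping
// sequential execution as the default.
func WithParallel() ControllerOption {
	return func(c *controller) { c.parallel = true }
}

func newController(opts ...ControllerOption) *controller {
	c := &controller{} // parallel defaults to false, i.e. sequential
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	fmt.Println(newController().parallel)               // false
	fmt.Println(newController(WithParallel()).parallel) // true
}
```

Keeping sequential as the zero-value default means no existing scraper changes behavior unless its author explicitly passes the option.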

@dehaansa
Author

dehaansa commented Jun 6, 2025

I put together a POC here if anyone would like to review an implementation of option 3 with sequential as the default; I'm going to evaluate it in contrib tomorrow.
