
Conversation

@MahdiBM
Contributor

@MahdiBM MahdiBM commented Jul 15, 2025

Description

Only the last 2 commits are related to this PR. The other commit is for PR [3/4]. We'll have to merge these in order.

Partially resolves #327.

Use-case CI file: https://github.com/MahdiBM/swift-dns/blob/mmbm-try-new-bench-ci/.github/workflows/update-benchmark-thresholds.yml

Waiting on your feedback first. Once the changes are settled, I can add the docs as well.

Feel free to review on your own schedule.

How Has This Been Tested?

Manually in my PRs. I'll be happy to add unit tests where you see appropriate as well.

Minimal checklist:

  • I have performed a self-review of my own code
  • I have added DocC code-level documentation for any public interfaces exported by the package
  • I have added unit and/or integration tests that prove my fix is effective or that my feature works

@codecov

codecov bot commented Jul 15, 2025

Codecov Report

❌ Patch coverage is 60.17897% with 534 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.30%. Comparing base (3db567f) to head (75ac53b).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
Sources/Benchmark/BenchmarkThresholds.swift 0.55% 180 Missing ⚠️
Sources/Benchmark/BenchmarkResult.swift 54.55% 80 Missing ⚠️
Sources/Benchmark/BenchmarkMetric+Defaults.swift 44.71% 47 Missing ⚠️
Sources/Benchmark/Benchmark.swift 44.26% 34 Missing ⚠️
...ark/MallocStats/MallocStatsProducer+jemalloc.swift 52.78% 34 Missing ⚠️
.../Benchmark/Benchmark+ConvenienceInitializers.swift 0.00% 32 Missing ⚠️
...urces/Benchmark/BenchmarkThresholds+Defaults.swift 21.95% 32 Missing ⚠️
Sources/Benchmark/OutputSuppressor.swift 0.00% 26 Missing ⚠️
Sources/Benchmark/BenchmarkRunner.swift 57.14% 21 Missing ⚠️
Sources/Benchmark/BenchmarkExecutor.swift 82.14% 15 Missing ⚠️
... and 7 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #331      +/-   ##
==========================================
- Coverage   69.48%   65.30%   -4.18%     
==========================================
  Files          33       33              
  Lines        3938     4331     +393     
==========================================
+ Hits         2736     2828      +92     
- Misses       1202     1503     +301     
Files with missing lines Coverage Δ
Sources/Benchmark/ARCStats/ARCStats.swift 100.00% <100.00%> (ø)
Sources/Benchmark/ARCStats/ARCStatsProducer.swift 85.29% <100.00%> (+0.92%) ⬆️
Sources/Benchmark/BenchmarkClock.swift 30.14% <100.00%> (-3.58%) ⬇️
Sources/Benchmark/Blackhole.swift 25.00% <ø> (ø)
Sources/Benchmark/MallocStats/MallocStats.swift 100.00% <100.00%> (ø)
Sources/Benchmark/Progress/ProgressElements.swift 92.54% <100.00%> (+0.11%) ⬆️
Tests/BenchmarkTests/BenchmarkMetricsTests.swift 97.75% <100.00%> (-1.12%) ⬇️
Tests/BenchmarkTests/BenchmarkResultTests.swift 100.00% <100.00%> (ø)
Tests/BenchmarkTests/BenchmarkRunnerTests.swift 92.68% <100.00%> (+0.38%) ⬆️
Tests/BenchmarkTests/BenchmarkTests.swift 97.37% <100.00%> (+0.31%) ⬆️
... and 19 more

Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fc7e44d...75ac53b. Read the comment docs.


if !skipLoadingBenchmarksFlagIsValid {
    print("")
    print(
        "Flag --skip-loading-benchmark-targets is only valid for 'thresholds check' operations."
Contributor Author

As discussed in #327 and in other, older issues, the only reason you might need to build benchmark targets for the thresholds check command is to load the threshold tolerances that are defined in code in configuration.thresholds.
For users who have moved to static threshold files with range/relative tolerances, this is pointless and only a waste of time.
So I've added an option to skip building benchmark targets for thresholds check.
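
Roughly, the decision boils down to something like this minimal sketch; the function and parameter names here are hypothetical, not the plugin's actual identifiers:

/// Benchmark targets only need to be built/loaded when the in-code
/// `configuration.thresholds` tolerances are actually going to be used.
func shouldBuildBenchmarkTargets(
    operation: String,  // e.g. "thresholds check"
    skipLoadingBenchmarkTargets: Bool
) -> Bool {
    guard operation == "thresholds check" else {
        // Every other operation needs the built targets anyway.
        return true
    }
    // For `thresholds check`, users relying purely on static threshold files
    // (with range/relative tolerances) can opt out of building the targets.
    return !skipLoadingBenchmarkTargets
}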

Contributor Author

Checking the benchmark runs, this has reduced the "compare" step's runtime in CI from 30s to 15s, so a 15s saving.
The package itself is very lightweight; it just has 2 benchmark targets, which isn't abnormal, but it does mean 2 targets need to be built.

Contributor Author

In a pretty chunky project, that step takes 68s for a single benchmark target. I'd assume at least 30s of that will be shaved off with this change.

That CI takes 18 minutes overall though, so these are nice savings but not a big deal.

Comment on lines +289 to +295
for values in results {
    outputResults[values.metric] = .absolute(
        Int(values.statistics.histogram.valueAtPercentile(90.0))
    )
}
Contributor Author

This is the old path of simply replacing the static threshold files with the new benchmark results.

If runNumber is 1 (we need to run the benchmarks multiple times for range/relative, so it can be more than 1), or if the user simply hasn't requested any relative/range tolerances, then we do what we always did.

Comment on lines +296 to +299
/// If it's not the first run and any of relative/range are specified, then
/// merge the new results with the existing thresholds.
Contributor Author

Informational comments.
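
Taken together with the previous comment, the logic amounts to something like this minimal sketch; all type and function names below are simplified stand-ins, not the PR's actual API:

enum Threshold {
    case absolute(Int)
}

/// `metric` and `p90` stand in for `values.metric` and
/// `values.statistics.histogram.valueAtPercentile(90.0)` in the real code.
struct MetricResult {
    var metric: String
    var p90: Int
}

func updatedThresholds(
    results: [MetricResult],
    existing: [String: Threshold],
    runNumber: Int,
    wantsRangeOrRelative: Bool,
    merge: ([MetricResult], [String: Threshold]) -> [String: Threshold]
) -> [String: Threshold] {
    if runNumber == 1 || !wantsRangeOrRelative {
        // Old path: overwrite the static thresholds with the new p90 values.
        var output: [String: Threshold] = [:]
        for values in results {
            output[values.metric] = .absolute(values.p90)
        }
        return output
    }
    // Not the first run, and relative/range tolerances were requested:
    // merge the new results into the thresholds that are already on disk.
    return merge(results, existing)
}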

Comment on lines +63 to +75
if !self.checkValid() {
    print(
        """
        Warning: Got invalid relative threshold values. base: \(self.base), tolerancePercentage: \(self.tolerancePercentage).
        These must satisfy the following conditions:
        - base must be non-negative
        - tolerancePercentage must be finite
        - tolerancePercentage must be non-negative
        - tolerancePercentage must be less than or equal to 100
        """
    )
}
Contributor Author

Fewer preconditions, more warnings. As mentioned in the other PRs, preconditions are hard to debug unless you attach LLDB. In CI they'll look like silent failures, and users might think this package is buggy.
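
For reference, a minimal sketch of what the validity check behind this warning could look like, based only on the conditions listed above (the struct shape and property types are assumptions, not necessarily the PR's exact code):

struct RelativeThreshold {
    var base: Int
    var tolerancePercentage: Double

    /// Mirrors the conditions listed in the warning message above.
    func checkValid() -> Bool {
        base >= 0 &&
            tolerancePercentage.isFinite &&
            tolerancePercentage >= 0 &&
            tolerancePercentage <= 100
    }
}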

Comment on lines 80 to 82
let diff = Double(value - base)
let deviation = (base == 0) ? 0 : (diff / Double(base) * 100)
Contributor Author

Because we accept 0 values now too, the base == 0 guard is needed so we don't divide by zero.
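
A worked example of the formula above with hypothetical numbers, showing the base == 0 case that is now allowed:

let base = 0
let value = 7
let diff = Double(value - base)                               // 7.0
let deviation = (base == 0) ? 0 : (diff / Double(base) * 100) // 0.0 rather than a division by zero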

@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch 23 times, most recently from 4303330 to f507fa2 on July 21, 2025 at 16:27
@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch 4 times, most recently from 340009f to 1dabba7 on August 10, 2025 at 14:31
@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch 2 times, most recently from c076d38 to 75ac53b on August 17, 2025 at 19:54
@MahdiBM
Contributor Author

MahdiBM commented Aug 18, 2025

For the record, I can't come up with a reason why someone would want both "range" and "relative" thresholds together in the same object of the static threshold files. We could remove that option and make it strictly "absolute" OR "range" OR "relative", instead of "absolute" OR "range and/or relative" as it is now.
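
A hypothetical sketch of that stricter alternative, where each metric's static threshold tolerance is exactly one of the three kinds rather than possibly a mix of range and relative (the case names are assumptions, not the PR's actual API):

enum ThresholdTolerance {
    case absolute(Int)
    case range(min: Int, max: Int)
    case relative(base: Int, tolerancePercentage: Double)
}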

@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch from 75ac53b to 0a95908 on August 26, 2025 at 16:15
Comment on lines +502 to +506
let results: [Result<Void, Error>] = (0..<max(totalRunCount, 1))
    .map { runIdx in

This code creates an interim array by calling .map on (0..<max(totalRunCount, 1)) instead of iterating directly. This violates the anti-pattern rule that says to avoid creating interim arrays when a direct loop can be used instead. The code should be refactored to a standard for loop, e.g. 'for runIdx in 0..<max(totalRunCount, 1) { ... }', to avoid the unnecessary array allocation.

Suggested change
- let results: [Result<Void, Error>] = (0..<max(totalRunCount, 1))
-     .map { runIdx in
+ var results: [Result<Void, Error>] = []
+ for runIdx in 0..<max(totalRunCount, 1) {

Spotted by Graphite Agent (based on custom rule: Avoid iteration anti pattern for arrays)


@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch from 0a95908 to 58b1962 on December 3, 2025 at 15:50
@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch from 58b1962 to 064e0c3 on December 3, 2025 at 16:04
…ptions-bak' into mmbm-range-relative-thresholds-options-bak
@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch from 064e0c3 to 89835de on December 3, 2025 at 16:05
Comment on lines +57 to +58
let dataTypeHeader =
    "#datatype tag,tag,tag,tag,tag,tag,tag,tag,tag,double,double,double,long,long,dateTime\n"

This line is longer than 140 characters and uses a string literal. According to the API Design Guidelines, when lines are more than 140 characters and use a string literal, you should suggest a line broken version using triple-quote ("""...""") Swift string literals instead. The dataTypeHeader string should be broken into multiple lines using triple-quote syntax for better readability.

Suggested change
- let dataTypeHeader =
-     "#datatype tag,tag,tag,tag,tag,tag,tag,tag,tag,double,double,double,long,long,dateTime\n"
+ let dataTypeHeader = """
+     #datatype tag,tag,tag,tag,tag,tag,tag,tag,tag,double,double,double,long,long,dateTime
+     """

Spotted by Graphite Agent (based on custom rule: Swift API Guidelines)


Comment on lines +521 to +551
return Result<Void, Error> {
    try withCStrings(args) { cArgs in
        /// We'll decrement this in the success path
        allFailureCount += 1

        if debug > 0 {
            print("To debug, start \(benchmarkToolName) in LLDB using:")
            print("lldb \(benchmarkTool.string)")
            print("")
            print("Then launch \(benchmarkToolName) with:")
            print("run \(args.dropFirst().joined(separator: " "))")
            print("")
            return
        }

        var pid: pid_t = 0
        var status = posix_spawn(&pid, benchmarkTool.string, nil, nil, cArgs, environ)

        if status == 0 {
            if waitpid(pid, &status, 0) != -1 {
                // Ok, this sucks, but there is no way to get a C support target for plugins and
                // the way the status is extracted portably is with macros - so we just need to
                // reimplement the logic here in Swift according to the waitpid man page to
                // get some nicer feedback on failure reason.
                guard let waitStatus = ExitCode(rawValue: (status & 0xFF00) >> 8) else {
                    print("One or more benchmarks returned an unexpected return code \(status)")
                    throw MyError.benchmarkUnexpectedReturnCode
                }
                switch waitStatus {
                case .success:
                    allFailureCount -= 1

Logic bug with allFailureCount when debug mode is enabled. The counter is incremented at line 524 before each operation, then decremented at line 551 only on success. However, when debug > 0 (line 526), the function returns early at line 533 without decrementing the counter. This causes allFailureCount to be greater than 0 even though the Result is .success. Later at line 585, the code will incorrectly think there were failures and attempt to find the first failure (line 593-605), which will throw MyError.unknownFailure because no actual failures exist in the results array.

// Fix: Decrement counter before returning in debug mode
if debug > 0 {
    allFailureCount -= 1  // Add this line
    print("To debug, start \(benchmarkToolName) in LLDB using:")
    // ... rest of debug output
    return
}
Suggested change (identical to the current code except for the one added line, marked with +):

  return Result<Void, Error> {
      try withCStrings(args) { cArgs in
          /// We'll decrement this in the success path
          allFailureCount += 1
          if debug > 0 {
+             allFailureCount -= 1
              print("To debug, start \(benchmarkToolName) in LLDB using:")
              print("lldb \(benchmarkTool.string)")
              print("")
              print("Then launch \(benchmarkToolName) with:")
              print("run \(args.dropFirst().joined(separator: " "))")
              print("")
              return
          }
          var pid: pid_t = 0
          var status = posix_spawn(&pid, benchmarkTool.string, nil, nil, cArgs, environ)
          if status == 0 {
              if waitpid(pid, &status, 0) != -1 {
                  // Ok, this sucks, but there is no way to get a C support target for plugins and
                  // the way the status is extracted portably is with macros - so we just need to
                  // reimplement the logic here in Swift according to the waitpid man page to
                  // get some nicer feedback on failure reason.
                  guard let waitStatus = ExitCode(rawValue: (status & 0xFF00) >> 8) else {
                      print("One or more benchmarks returned an unexpected return code \(status)")
                      throw MyError.benchmarkUnexpectedReturnCode
                  }
                  switch waitStatus {
                  case .success:
                      allFailureCount -= 1

Spotted by Graphite Agent


Development

Successfully merging this pull request may close these issues: Support a Range as threshold-tolerance
