
Conversation

@MahdiBM
Contributor

@MahdiBM MahdiBM commented Jul 15, 2025

Description

Only the last 2 commits are related to this PR. The other commit is for PR [3/4]. We'll have to merge these in order.

Partially resolves #327.

Use-case CI file: https://github.com/MahdiBM/swift-dns/blob/mmbm-try-new-bench-ci/.github/workflows/update-benchmark-thresholds.yml

Waiting on your feedback first. Once the changes are settled, I can add the docs as well.

Feel free to review on your own schedule.

How Has This Been Tested?

Manually in my PRs. I'll be happy to add unit tests where you see appropriate as well.

Minimal checklist:

  • I have performed a self-review of my own code
  • I have added DocC code-level documentation for any public interfaces exported by the package
  • I have added unit and/or integration tests that prove my fix is effective or that my feature works

@codecov

codecov bot commented Jul 15, 2025

Codecov Report

❌ Patch coverage is 60.17897% with 534 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.30%. Comparing base (3db567f) to head (75ac53b).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
Sources/Benchmark/BenchmarkThresholds.swift 0.55% 180 Missing ⚠️
Sources/Benchmark/BenchmarkResult.swift 54.55% 80 Missing ⚠️
Sources/Benchmark/BenchmarkMetric+Defaults.swift 44.71% 47 Missing ⚠️
Sources/Benchmark/Benchmark.swift 44.26% 34 Missing ⚠️
...ark/MallocStats/MallocStatsProducer+jemalloc.swift 52.78% 34 Missing ⚠️
.../Benchmark/Benchmark+ConvenienceInitializers.swift 0.00% 32 Missing ⚠️
...urces/Benchmark/BenchmarkThresholds+Defaults.swift 21.95% 32 Missing ⚠️
Sources/Benchmark/OutputSuppressor.swift 0.00% 26 Missing ⚠️
Sources/Benchmark/BenchmarkRunner.swift 57.14% 21 Missing ⚠️
Sources/Benchmark/BenchmarkExecutor.swift 82.14% 15 Missing ⚠️
... and 7 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #331      +/-   ##
==========================================
- Coverage   69.48%   65.30%   -4.18%     
==========================================
  Files          33       33              
  Lines        3938     4331     +393     
==========================================
+ Hits         2736     2828      +92     
- Misses       1202     1503     +301     
Files with missing lines Coverage Δ
Sources/Benchmark/ARCStats/ARCStats.swift 100.00% <100.00%> (ø)
Sources/Benchmark/ARCStats/ARCStatsProducer.swift 85.29% <100.00%> (+0.92%) ⬆️
Sources/Benchmark/BenchmarkClock.swift 30.14% <100.00%> (-3.58%) ⬇️
Sources/Benchmark/Blackhole.swift 25.00% <ø> (ø)
Sources/Benchmark/MallocStats/MallocStats.swift 100.00% <100.00%> (ø)
Sources/Benchmark/Progress/ProgressElements.swift 92.54% <100.00%> (+0.11%) ⬆️
Tests/BenchmarkTests/BenchmarkMetricsTests.swift 97.75% <100.00%> (-1.12%) ⬇️
Tests/BenchmarkTests/BenchmarkResultTests.swift 100.00% <100.00%> (ø)
Tests/BenchmarkTests/BenchmarkRunnerTests.swift 92.68% <100.00%> (+0.38%) ⬆️
Tests/BenchmarkTests/BenchmarkTests.swift 97.37% <100.00%> (+0.31%) ⬆️
... and 19 more

Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fc7e44d...75ac53b. Read the comment docs.


if !skipLoadingBenchmarksFlagIsValid {
    print("")
    print(
        "Flag --skip-loading-benchmark-targets is only valid for 'thresholds check' operations."
Contributor Author

As discussed in #327 and in other, older issues, the only reason you might need to build benchmark targets for the thresholds check command is to load the threshold tolerances that are defined in code in configuration.thresholds.
For users who have moved to static threshold files with range/relative tolerances, this is pointless and only a waste of time.
So I've added an option to skip building benchmark targets for thresholds check.
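
Roughly, the decision boils down to something like this minimal sketch; the function and parameter names here are hypothetical, not the plugin's actual identifiers:

/// Benchmark targets only need to be built/loaded when the in-code
/// `configuration.thresholds` tolerances are actually going to be used.
func shouldBuildBenchmarkTargets(
    operation: String,  // e.g. "thresholds check"
    skipLoadingBenchmarkTargets: Bool
) -> Bool {
    guard operation == "thresholds check" else {
        // Every other operation needs the built targets anyway.
        return true
    }
    // For `thresholds check`, users relying purely on static threshold files
    // (with range/relative tolerances) can opt out of building the targets.
    return !skipLoadingBenchmarkTargets
}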

Contributor Author

Checking the benchmark runs, this has reduced the "compare" step's runtime in CI from 30s to 15s, so a 15s saving.
The package itself is very lightweight; it just has 2 benchmark targets, which isn't abnormal, but it does mean 2 targets need to be built.

Contributor Author

In a pretty chunky project, that step takes 68s for a single benchmark target. I'd assume at least 30s of that will be shaved off with this change.

That CI takes 18 minutes overall though, so these are nice savings but not a big deal.

Comment on lines +289 to +295
for values in results {
    outputResults[values.metric] = .absolute(
        Int(values.statistics.histogram.valueAtPercentile(90.0))
    )
}
Contributor Author

This is the old path of simply replacing the static threshold files with the new benchmark results.

If runNumber is 1 (we need to run the benchmarks multiple times for range/relative, so it can be more than 1), or if the user simply hasn't requested any relative/range tolerances, then we do what we always did.

Comment on lines +296 to +299
/// If it's not the first run and any of relative/range are specified, then
/// merge the new results with the existing thresholds.
Contributor Author

Informational comments.
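
Taken together with the previous comment, the logic amounts to something like this minimal sketch; all type and function names below are simplified stand-ins, not the PR's actual API:

enum Threshold {
    case absolute(Int)
}

/// `metric` and `p90` stand in for `values.metric` and
/// `values.statistics.histogram.valueAtPercentile(90.0)` in the real code.
struct MetricResult {
    var metric: String
    var p90: Int
}

func updatedThresholds(
    results: [MetricResult],
    existing: [String: Threshold],
    runNumber: Int,
    wantsRangeOrRelative: Bool,
    merge: ([MetricResult], [String: Threshold]) -> [String: Threshold]
) -> [String: Threshold] {
    if runNumber == 1 || !wantsRangeOrRelative {
        // Old path: overwrite the static thresholds with the new p90 values.
        var output: [String: Threshold] = [:]
        for values in results {
            output[values.metric] = .absolute(values.p90)
        }
        return output
    }
    // Not the first run, and relative/range tolerances were requested:
    // merge the new results into the thresholds that are already on disk.
    return merge(results, existing)
}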

Comment on lines +63 to +75
if !self.checkValid() {
    print(
        """
        Warning: Got invalid relative threshold values. base: \(self.base), tolerancePercentage: \(self.tolerancePercentage).
        These must satisfy the following conditions:
        - base must be non-negative
        - tolerancePercentage must be finite
        - tolerancePercentage must be non-negative
        - tolerancePercentage must be less than or equal to 100
        """
    )
}
Contributor Author

Fewer preconditions, more warnings. As mentioned in the other PRs, preconditions are hard to debug unless you attach LLDB. In CI they'll look like silent failures, and users might think this package is buggy.
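
For reference, a minimal sketch of what the validity check behind this warning could look like, based only on the conditions listed above (the struct shape and property types are assumptions, not necessarily the PR's exact code):

struct RelativeThreshold {
    var base: Int
    var tolerancePercentage: Double

    /// Mirrors the conditions listed in the warning message above.
    func checkValid() -> Bool {
        base >= 0 &&
            tolerancePercentage.isFinite &&
            tolerancePercentage >= 0 &&
            tolerancePercentage <= 100
    }
}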

Comment on lines 80 to 82
let diff = Double(value - base)
let deviation = (base == 0) ? 0 : (diff / Double(base) * 100)
Contributor Author

Because we accept 0 values now too, the base == 0 guard is needed so we don't divide by zero.
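
A worked example of the formula above with hypothetical numbers, showing the base == 0 case that is now allowed:

let base = 0
let value = 7
let diff = Double(value - base)                               // 7.0
let deviation = (base == 0) ? 0 : (diff / Double(base) * 100) // 0.0 rather than a division by zero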

@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch 23 times, most recently from 4303330 to f507fa2 on July 21, 2025 at 16:27
@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch 4 times, most recently from 340009f to 1dabba7 on August 10, 2025 at 14:31
@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch 2 times, most recently from c076d38 to 75ac53b on August 17, 2025 at 19:54
@MahdiBM
Contributor Author

MahdiBM commented Aug 18, 2025

For the record, I can't come up with a reason why someone would want both "range" and "relative" thresholds together in the same object of the static threshold files. We could remove that option and make it strictly "absolute" OR "range" OR "relative", instead of "absolute" OR "range and/or relative" as it is now.
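
A hypothetical sketch of that stricter alternative, where each metric's static threshold tolerance is exactly one of the three kinds rather than possibly a mix of range and relative (the case names are assumptions, not the PR's actual API):

enum ThresholdTolerance {
    case absolute(Int)
    case range(min: Int, max: Int)
    case relative(base: Int, tolerancePercentage: Double)
}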

@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch from 75ac53b to 0a95908 on August 26, 2025 at 16:15
Comment on lines +502 to +506
let results: [Result<Void, Error>] = (0..<max(totalRunCount, 1))
    .map { runIdx in

This code creates an interim array by calling .map on (0..<max(totalRunCount, 1)) instead of iterating directly. This violates the anti-pattern rule that says to avoid creating interim arrays when a direct loop can be used instead. The code should be refactored to a standard for loop, e.g. 'for runIdx in 0..<max(totalRunCount, 1) { ... }', to avoid the unnecessary array allocation.

Suggested change
- let results: [Result<Void, Error>] = (0..<max(totalRunCount, 1))
-     .map { runIdx in
+ var results: [Result<Void, Error>] = []
+ for runIdx in 0..<max(totalRunCount, 1) {

Spotted by Graphite Agent (based on custom rule: Avoid iteration anti pattern for arrays)


@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch from 0a95908 to 58b1962 on December 3, 2025 at 15:50
@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch from 58b1962 to 064e0c3 on December 3, 2025 at 16:04
…ptions-bak' into mmbm-range-relative-thresholds-options-bak
@MahdiBM MahdiBM force-pushed the mmbm-range-relative-thresholds-options branch from 064e0c3 to 89835de on December 3, 2025 at 16:05
Comment on lines +57 to +58
let dataTypeHeader =
    "#datatype tag,tag,tag,tag,tag,tag,tag,tag,tag,double,double,double,long,long,dateTime\n"

This line is longer than 140 characters and uses a string literal. According to the API Design Guidelines, when lines are more than 140 characters and use a string literal, you should suggest a line broken version using triple-quote ("""...""") Swift string literals instead. The dataTypeHeader string should be broken into multiple lines using triple-quote syntax for better readability.

Suggested change
- let dataTypeHeader =
-     "#datatype tag,tag,tag,tag,tag,tag,tag,tag,tag,double,double,double,long,long,dateTime\n"
+ let dataTypeHeader = """
+     #datatype tag,tag,tag,tag,tag,tag,tag,tag,tag,double,double,double,long,long,dateTime
+     """

Spotted by Graphite Agent (based on custom rule: Swift API Guidelines)


Comment on lines +521 to +551
return Result<Void, Error> {
    try withCStrings(args) { cArgs in
        /// We'll decrement this in the success path
        allFailureCount += 1

        if debug > 0 {
            print("To debug, start \(benchmarkToolName) in LLDB using:")
            print("lldb \(benchmarkTool.string)")
            print("")
            print("Then launch \(benchmarkToolName) with:")
            print("run \(args.dropFirst().joined(separator: " "))")
            print("")
            return
        }

        var pid: pid_t = 0
        var status = posix_spawn(&pid, benchmarkTool.string, nil, nil, cArgs, environ)

        if status == 0 {
            if waitpid(pid, &status, 0) != -1 {
                // Ok, this sucks, but there is no way to get a C support target for plugins and
                // the way the status is extracted portably is with macros - so we just need to
                // reimplement the logic here in Swift according to the waitpid man page to
                // get some nicer feedback on failure reason.
                guard let waitStatus = ExitCode(rawValue: (status & 0xFF00) >> 8) else {
                    print("One or more benchmarks returned an unexpected return code \(status)")
                    throw MyError.benchmarkUnexpectedReturnCode
                }
                switch waitStatus {
                case .success:
                    allFailureCount -= 1

Logic bug with allFailureCount when debug mode is enabled. The counter is incremented at line 524 before each operation, then decremented at line 551 only on success. However, when debug > 0 (line 526), the function returns early at line 533 without decrementing the counter. This causes allFailureCount to be greater than 0 even though the Result is .success. Later at line 585, the code will incorrectly think there were failures and attempt to find the first failure (line 593-605), which will throw MyError.unknownFailure because no actual failures exist in the results array.

// Fix: Decrement counter before returning in debug mode
if debug > 0 {
    allFailureCount -= 1  // Add this line
    print("To debug, start \(benchmarkToolName) in LLDB using:")
    // ... rest of debug output
    return
}
Suggested change (identical to the current code except for the one added line, marked with +):

  return Result<Void, Error> {
      try withCStrings(args) { cArgs in
          /// We'll decrement this in the success path
          allFailureCount += 1
          if debug > 0 {
+             allFailureCount -= 1
              print("To debug, start \(benchmarkToolName) in LLDB using:")
              print("lldb \(benchmarkTool.string)")
              print("")
              print("Then launch \(benchmarkToolName) with:")
              print("run \(args.dropFirst().joined(separator: " "))")
              print("")
              return
          }
          var pid: pid_t = 0
          var status = posix_spawn(&pid, benchmarkTool.string, nil, nil, cArgs, environ)
          if status == 0 {
              if waitpid(pid, &status, 0) != -1 {
                  // Ok, this sucks, but there is no way to get a C support target for plugins and
                  // the way the status is extracted portably is with macros - so we just need to
                  // reimplement the logic here in Swift according to the waitpid man page to
                  // get some nicer feedback on failure reason.
                  guard let waitStatus = ExitCode(rawValue: (status & 0xFF00) >> 8) else {
                      print("One or more benchmarks returned an unexpected return code \(status)")
                      throw MyError.benchmarkUnexpectedReturnCode
                  }
                  switch waitStatus {
                  case .success:
                      allFailureCount -= 1

Spotted by Graphite Agent


Development

Successfully merging this pull request may close these issues: Support a Range as threshold-tolerance
