Conversation

@mcopik (Collaborator) commented Aug 9, 2022

The features that we're planning to implement here are:

  • Building C++ functions with CMake.
  • Building and handling functions' Docker containers for C++.
  • Providing base images with AWS SDK, Boost, and Redis support.
  • Test sleep benchmarks.
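
For illustration, a minimal sketch of such a C++ sleep function, built directly on the aws-lambda-cpp runtime (the actual wrapper code in this PR differs; the payload handling here is a simplifying assumption):

#include <chrono>
#include <string>
#include <thread>

#include <aws/lambda-runtime/runtime.h>

using namespace aws::lambda_runtime;

// Sketch: sleep for the requested number of seconds and return JSON.
static invocation_response handler(invocation_request const& req) {
    // Assumes the payload is a bare integer; real code would parse
    // the JSON payload properly.
    int seconds = std::stoi(req.payload);
    std::this_thread::sleep_for(std::chrono::seconds(seconds));
    std::string body = "{ \"result\": " + std::to_string(seconds) + " }";
    return invocation_response::success(body, "application/json");
}

int main() {
    run_handler(handler);
    return 0;
}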

Summary by CodeRabbit

  • New Features

    • First-class C++ support: new C++ benchmarks, packaging/build flow, container images and installer; CLI/tooling now accepts "cpp".
    • New C++ runtime wrappers for storage, Redis, S3 and DynamoDB; per-invocation timing surfaced in results.
  • Refactor

    • Unified Language/Architecture enums across packaging and deployment flows.
  • Chores

    • Updated Docker build tooling, CI orb, docs and configs; safer storage stop and cache handling.


@coderabbitai coderabbitai bot commented Jun 18, 2025

Walkthrough

Adds first-class C++ support across benchmarks and tooling: C++ benchmark implementations and wrappers, an AWS C++ Lambda handler, C++ storage/Redis/DynamoDB wrappers, Dockerfiles and installer for C++ dependency images, a C++ dependency model, Language/Architecture enums, packaging/build refactors, and test/regression adjustments.

Changes

Cohort / File(s) and Change Summary:

  • Microbenchmark (sleep)
    Files: benchmarks/000.microbenchmarks/010.sleep/config.json, benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp
    Enabled cpp in config and added a C++ implementation that sleeps for the requested seconds and returns JSON.
  • AWS C++ runtime & wrappers
    Files: benchmarks/wrappers/aws/cpp/handler.cpp, benchmarks/wrappers/aws/cpp/key-value.*, benchmarks/wrappers/aws/cpp/redis.*, benchmarks/wrappers/aws/cpp/storage.*, benchmarks/wrappers/aws/cpp/utils.*
    New AWS Lambda C++ handler and wrappers (S3/DynamoDB/Redis) implementing timing, retries/backoff, streaming, cold-start/container metadata and a helper time utility.
  • Multimedia thumbnailer (C++)
    Files: benchmarks/200.multimedia/210.thumbnailer/cpp/*
    New OpenCV C++ thumbnailer: image download, resize, encode/upload, UUID output naming and timing metrics.
  • Image-recognition (C++)
    Files: benchmarks/400.inference/411.image-recognition/cpp/*
    Added LibTorch/OpenCV-based recognition helpers and partial C++ handler scaffolding (model loading, preprocessing, recognition functions).
  • Scientific benchmarks (C++)
    Files: benchmarks/500.scientific/501.graph-pagerank/cpp/*, benchmarks/500.scientific/503.graph-bfs/cpp/*
    Added igraph-based C++ implementations for PageRank and BFS, returning timing/measurement JSON.
  • C++ support registration
    Files: config/systems.json, benchmarks/*/config.json
    Registered the cpp language in the systems config, added cpp_dependencies to many benchmarks and listed C++ deployment files.
  • Dockerfiles & installer
    Files: dockerfiles/aws/cpp/Dockerfile.build, dockerfiles/aws/cpp/Dockerfile.function, dockerfiles/aws/cpp/Dockerfile.dependencies-*, dockerfiles/cpp_installer.sh, dockerfiles/aws/nodejs/Dockerfile.build
    New multi-stage Dockerfiles for building C++ dependencies and function images, plus a C++ installer script; small change in Node.js build deps.
  • C++ dependency model
    Files: sebs/cpp_dependencies.py
    New CppDependencies model and CppDependencyConfig with helpers to compute required deps, emit CMake snippets and generate Dockerfile fragments.
  • Core typing & enums
    Files: sebs/types.py
    Added Language and Architecture enums with (de)serialization utilities used across packaging and runtimes.
  • Packaging & build refactors
    Files: sebs/benchmark.py, sebs/aws/aws.py, sebs/faas/container.py, sebs/faas/system.py, sebs/faas/function.py, sebs/azure/azure.py, sebs/gcp/gcp.py, sebs/openwhisk/openwhisk.py, sebs/local/local.py
    Replaced string-based language handling with the Language enum, added C++ packaging (CMake-based), changed package_code signatures/return shapes, threaded an optional container_uri through create/update flows, and adjusted runtime resolution.
  • Cache & regression/test harness
    Files: sebs/cache.py, sebs/regression.py
    Cache handling made aware of code_location vs. container URI; regression extended with a C++ benchmark list, AWSTestSequenceCpp and added per-test metadata (architecture, deployment_type, benchmark).
  • Tools & image builder
    Files: tools/build_docker_images.py
    Added dependency-image build support, new CLI args (--type dependencies, --type-tag, --parallel), added a cpp option and dependency build loops.
  • New experiment configs
    Files: config/cpp.json, config/nodejs.json, config/python.json
    Added runtime/experiment configs for C++ and updated Node.js/Python experiment definitions.
  • Docs & CI
    Files: docs/storage.md, .circleci/config.yml, README.md, docs/build.md, docs/benchmarks.md
    Storage docs updated, CircleCI Python orb bumped (→ 2.1), README Python requirement raised to 3.10+, build docs updated with C++ packaging and the Docker image flow.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as Developer CLI
    participant SeBS as SeBS packager
    participant Builder as Docker builder
    participant Registry as Image registry
    participant Cloud as Cloud API
    participant Lambda as Managed runtime
    participant Handler as C++ Lambda handler
    participant Storage as S3/DynamoDB/Redis

    CLI->>SeBS: package/build --language cpp
    SeBS->>SeBS: generate CMakeLists.txt & deployment package
    SeBS->>Builder: build dependency images & function image
    Builder->>Registry: push images (optional)
    SeBS->>Cloud: create_function(runtime="provided.al2", package/container)
    CLI->>Lambda: invoke function
    Lambda->>Handler: start (cold-start metadata)
    Handler->>Handler: parse payload, call user C++ function
    Handler->>Storage: S3/DynamoDB/Redis operations
    Handler-->>Lambda: return JSON result with measurements
    Lambda->>CLI: response

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Areas needing focused review:
    • sebs/cpp_dependencies.py: correctness of CMake fragments, runtime_paths and Dockerfile generation.
    • Dockerfiles (function + dependencies): multi-stage assembly, libraries (LibTorch/OpenCV/igraph/Boost/hiredis) and file copy/runtime paths.
    • sebs/types.py: enum (de)serialization and the Architecture.deserialize implementation (check for typos/undefined variables).
    • Packaging/signature changes in sebs/benchmark.py, sebs/faas/*, sebs/aws/aws.py: API shape changes, cache interactions and return-value expectations.
    • AWS C++ handler and wrappers: JSON shaping, timing/reporting, retry/backoff correctness and resource cleanup.
    • tools/build_docker_images.py: dependency build loops, tag composition and parallel/workers handling.
    • sebs/cache.py and regression/test harness: cache invariants, container vs code package semantics and AWSTestSequenceCpp registration.

Possibly related PRs

Poem

"I nibble CMake leaves at dawn,
Docker burrows hum and spawn,
Enums hop in, binaries compile,
Benchmarks tumble, tests run in style,
The rabbit hops — build done with a grin." 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 6.96%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main change: adding C++ benchmark support. It directly reflects the core objective and is clear without being overly verbose.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 25

🔭 Outside diff range comments (4)
benchmarks/wrappers/aws/cpp/redis.hpp (1)

1-27: Add include guards and improve class design.

The header file lacks include guards and has several design issues that should be addressed:

  1. Missing include guards: The header should have include guards to prevent multiple inclusions
  2. Constructor should be explicit: Single-parameter constructors should be explicit to prevent implicit conversions
  3. Missing const correctness: Methods that don't modify state should be const
  4. Raw pointer management: Consider RAII principles for redisContext*
+#ifndef REDIS_HPP
+#define REDIS_HPP
+
 #include <cstdint>
 #include <string>

 #include <aws/core/utils/memory/stl/AWSString.h>

 #include <hiredis/hiredis.h>

 class Redis
 {
   redisContext* _context;
 public:

-  Redis(std::string redis_hostname, int redis_port);
+  explicit Redis(const std::string& redis_hostname, int redis_port);
   ~Redis();

-  bool is_initialized();
+  bool is_initialized() const;

-  uint64_t download_file(Aws::String const &key, int &required_retries, bool with_backoff);
+  uint64_t download_file(const Aws::String& key, int& required_retries, bool with_backoff) const;

-  uint64_t upload_file(Aws::String const &key, int size, char* pBuf);
+  uint64_t upload_file(const Aws::String& key, int size, const char* pBuf);

-  uint64_t delete_file(std::string const &key);
+  uint64_t delete_file(const std::string& key) const;

 };

+#endif // REDIS_HPP
benchmarks/wrappers/aws/cpp/key-value.hpp (1)

1-32: Add include guards and improve interface design.

The header file has several areas for improvement:

  1. Missing include guards: Add them to prevent multiple inclusions
  2. Const correctness: Methods that don't modify state should be const
  3. Parameter const-correctness: Input parameters should be const references
  4. Comment about non-copyable/non-movable: Should be enforced with deleted copy/move operators
+#ifndef KEY_VALUE_HPP
+#define KEY_VALUE_HPP
+
 #include <cstdint>
 #include <string>
 #include <initializer_list>
 #include <memory>

 #include <aws/core/utils/memory/stl/AWSString.h>
 #include <aws/dynamodb/DynamoDBClient.h>

 class KeyValue
 {
-  // non-copyable, non-movable
   std::shared_ptr<Aws::DynamoDB::DynamoDBClient> _client;
 public:

   KeyValue();
+  
+  // non-copyable, non-movable
+  KeyValue(const KeyValue&) = delete;
+  KeyValue& operator=(const KeyValue&) = delete;
+  KeyValue(KeyValue&&) = delete;
+  KeyValue& operator=(KeyValue&&) = delete;

-uint64_t download_file(Aws::String const &bucket,
-                        Aws::String const &key,
+uint64_t download_file(const Aws::String& bucket,
+                        const Aws::String& key,
                         int& required_retries,
                         double& read_units,
-                        bool with_backoff = false);
+                        bool with_backoff = false) const;

-uint64_t upload_file(Aws::String const &bucket,
-                        Aws::String const &key,
+uint64_t upload_file(const Aws::String& bucket,
+                        const Aws::String& key,
                           double& write_units,
                           int size,
-                          unsigned char* pBuf);
+                          const unsigned char* pBuf);

 };

+#endif // KEY_VALUE_HPP
benchmarks/wrappers/aws/cpp/storage.hpp (1)

1-29: Add include guards and improve interface consistency.

The header file needs several improvements:

  1. Missing include guards: Add them for safety
  2. Interface consistency: Consider aligning parameter patterns with other wrapper classes
  3. Const correctness: Download method should be const
+#ifndef STORAGE_HPP
+#define STORAGE_HPP
+
 #include <cstdint>

 #include <aws/core/utils/memory/stl/AWSString.h>
 #include <aws/s3/S3Client.h>

 class Storage
 {
   Aws::S3::S3Client _client;
 public:

-  Storage(Aws::S3::S3Client && client):
-    _client(client)
+  explicit Storage(Aws::S3::S3Client&& client):
+    _client(std::move(client))
   {}

   static Storage get_client();

-  uint64_t download_file(Aws::String const &bucket,
-                          Aws::String const &key,
+  uint64_t download_file(const Aws::String& bucket,
+                          const Aws::String& key,
                           int &required_retries,
-                          bool report_dl_time);
+                          bool report_dl_time) const;

-  uint64_t upload_random_file(Aws::String const &bucket,
-                          Aws::String const &key,
-                          int size, char* pBuf);
+  uint64_t upload_random_file(const Aws::String& bucket,
+                          const Aws::String& key,
+                          int size, const char* pBuf);

 };

+#endif // STORAGE_HPP
tools/build_docker_images.py (1)

51-151: CRITICAL: Resolve Git merge conflicts before merging.

The file contains unresolved Git merge conflict markers (<<<<<<<, =======, >>>>>>>) that cause syntax errors and prevent the code from running. These conflicts must be resolved manually.

The conflicts appear to be between:

  • HEAD version with dot notation: f"{args.type}.{args.type_tag}"
  • Branch a9f3c27 with hyphen notation: f"{args.type}-{args.type_tag}"

Please resolve these conflicts by choosing the appropriate format and removing all conflict markers.

🧹 Nitpick comments (18)
dockerfiles/aws/nodejs/Dockerfile.build (1)

5-5: Optimize yum layer by cleaning cache.

To slim down the final image, combine the install and cleanup steps into a single RUN and clear the yum cache:

-RUN yum install -y cmake curl libcurl libcurl-devel
+RUN yum install -y cmake curl libcurl libcurl-devel && \
+    yum clean all && \
+    rm -rf /var/cache/yum
dockerfiles/aws/cpp/Dockerfile.dependencies-hiredis (1)

7-8: Consider adding error handling and cleanup.

The build process could benefit from explicit error handling and cleanup to ensure robust builds and smaller image sizes.

-RUN   git clone https://github.com/redis/hiredis.git && cd hiredis\
-      && PREFIX=/opt make -j${WORKERS} install
+RUN set -e && \
+    git clone --depth 1 --branch v1.2.0 https://github.com/redis/hiredis.git && \
+    cd hiredis && \
+    PREFIX=/opt make -j${WORKERS} install && \
+    cd .. && \
+    rm -rf hiredis
benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp (2)

7-7: Remove unused include.

The iostream header is included but never used in this file.

 #include <thread>
-#include <iostream>

16-17: Remove commented-out code.

Commented-out code should be removed to keep the codebase clean. If this alternative implementation is needed for reference, consider documenting it properly or moving it to documentation.

-  //std::string res_json = "{ \"result\": " + std::to_string(sleep) + "}";
-  //return aws::lambda_runtime::invocation_response::success(res_json, "application/json");
dockerfiles/aws/cpp/Dockerfile.dependencies-runtime (1)

8-10: Consider improving build robustness and cleanup.

The build process could benefit from better error handling and cleanup to reduce image size and improve reliability.

-RUN   git clone https://github.com/awslabs/aws-lambda-cpp.git\
-      && cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/opt -B aws-lambda-cpp/build -S aws-lambda-cpp\
-      && cmake3 --build aws-lambda-cpp/build --parallel ${WORKERS} --target install
+RUN set -e && \
+    git clone --depth 1 --branch v0.2.8 https://github.com/awslabs/aws-lambda-cpp.git && \
+    cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/opt \
+           -B aws-lambda-cpp/build -S aws-lambda-cpp && \
+    cmake3 --build aws-lambda-cpp/build --parallel ${WORKERS} --target install && \
+    rm -rf aws-lambda-cpp
dockerfiles/aws/cpp/Dockerfile.dependencies-sdk (1)

9-9: Consider adding integrity verification and optimizing cmake configuration.

The cmake configuration looks appropriate, but consider adding integrity checks and optimization flags.

-      && cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_ONLY="s3;dynamodb;sqs" -DENABLE_TESTING=OFF -DCMAKE_INSTALL_PREFIX=/opt/ ..\
+      && cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_ONLY="s3;dynamodb;sqs" -DENABLE_TESTING=OFF -DCMAKE_INSTALL_PREFIX=/opt/ -DCPP_STANDARD=17 -DCMAKE_CXX_FLAGS="-O3" ..\
dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1)

13-15: Remove commented code.

The commented lines should be removed as they add confusion and don't serve any purpose in the final implementation.

-#RUN curl -LO https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz\
-#      && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0 && ./bootstrap.sh --prefix=/opt\
-#      && ./b2 -j${WORKERS} --prefix=/opt cxxflags="-fPIC" link=static install
config/systems.json (1)

125-137: Consider explicit architecture specification for consistency.

The C++ configuration uses "all" for architecture while Python and Node.js specify explicit architectures ("x64", "arm64"). This inconsistency might cause confusion and could limit architecture-specific optimizations.

Consider updating to be consistent with other languages:

       "cpp": {
         "base_images": {
-          "all": "amazon/aws-lambda-provided:al2.2022.04.27.09"
+          "x64": "amazon/aws-lambda-provided:al2.2022.04.27.09",
+          "arm64": "amazon/aws-lambda-provided:al2.2022.04.27.09"
         },
         "dependencies": [
             "runtime", "sdk", "boost", "hiredis"
         ],
-        "versions": ["all"],
+        "versions": ["all"],
         "images": ["build"],
dockerfiles/aws/cpp/Dockerfile.build (1)

18-21: Optimize script copying and permission setting.

Consider combining COPY and chmod operations to reduce layers.

-COPY docker/entrypoint.sh /sebs/entrypoint.sh
-COPY docker/cpp_installer.sh /sebs/installer.sh
-RUN chmod +x /sebs/entrypoint.sh
-RUN chmod +x /sebs/installer.sh
+COPY --chmod=755 docker/entrypoint.sh /sebs/entrypoint.sh
+COPY --chmod=755 docker/cpp_installer.sh /sebs/installer.sh
benchmarks/wrappers/aws/cpp/redis.hpp (1)

11-11: Consider using smart pointers for automatic resource management.

The raw redisContext* pointer requires manual memory management. Consider using std::unique_ptr<redisContext, void(*)(redisContext*)> with a custom deleter for automatic cleanup.

+#include <memory>
+
 class Redis
 {
-  redisContext* _context;
+  std::unique_ptr<redisContext, void(*)(redisContext*)> _context;
 public:
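
A minimal sketch of how the constructor could wire up the custom deleter (assuming hiredis's redisConnect/redisFree; this completes the diff above and is not part of the PR):

#include <memory>
#include <string>

#include <hiredis/hiredis.h>

// Sketch only: RAII ownership of the hiredis context.
class Redis
{
  std::unique_ptr<redisContext, void (*)(redisContext*)> _context;
public:
  explicit Redis(const std::string& hostname, int port)
      : _context(redisConnect(hostname.c_str(), port), &redisFree) {}

  // redisConnect may return null or set err on failure.
  bool is_initialized() const {
    return _context != nullptr && _context->err == 0;
  }
  // No user-written destructor needed: redisFree runs automatically.
};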
benchmarks/wrappers/aws/cpp/storage.hpp (1)

18-26: Consider interface consistency across wrapper classes.

The method signatures differ from other wrapper classes (Redis, KeyValue). Consider standardizing parameter patterns for consistency, especially for retry handling and timing measurements.

For example, the Redis class uses int& required_retries and bool with_backoff, while this class uses bool report_dl_time. Consider aligning these interfaces for consistency across the AWS wrapper ecosystem.
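
One possible shape for such a shared interface (purely illustrative; none of these names exist in the PR):

#include <cstdint>
#include <string>

// Hypothetical common result type and interface, for illustration only.
struct TransferResult {
  uint64_t time_us;      // measured transfer time in microseconds
  int required_retries;  // retries performed before success
};

class ObjectBackend {
public:
  virtual ~ObjectBackend() = default;
  // Every backend reports retries and takes the same backoff switch,
  // instead of per-class flags such as report_dl_time.
  virtual TransferResult download_file(const std::string& key,
                                       bool with_backoff) = 0;
  virtual TransferResult upload_file(const std::string& key,
                                     const char* buf, int size) = 0;
};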

benchmarks/wrappers/aws/cpp/key-value.cpp (1)

19-20: Consider making configuration more flexible.

The hardcoded CA file path and commented region configuration reduce flexibility across different environments.

Consider reading these from environment variables:

-  //config.region = "eu-central-1";
-  config.caFile = "/etc/pki/tls/certs/ca-bundle.crt";
+  const char* region = std::getenv("AWS_REGION");
+  if (region) {
+    config.region = region;
+  }
+  const char* caFile = std::getenv("AWS_CA_BUNDLE");
+  config.caFile = caFile ? caFile : "/etc/pki/tls/certs/ca-bundle.crt";
benchmarks/wrappers/aws/cpp/handler.cpp (1)

23-23: Fix typo in comment.

"Gateaway" should be "Gateway".

-  // HTTP trigger with API Gateaway sends payload as a serialized JSON
+  // HTTP trigger with API Gateway sends payload as a serialized JSON
benchmarks/wrappers/aws/cpp/storage.cpp (3)

46-50: Clarify the NOP operation purpose.

The current implementation to prevent compiler optimization is unusual. Consider a simpler approach or document why this specific pattern is needed.

             // Perform NOP on result to prevent optimizations
             std::stringstream ss;
             ss << s.rdbuf();
-            std::string first(" ");
-            ss.get(&first[0], 1);
+            // Simply access the stream to prevent optimization
+            volatile auto size = ss.tellp();
+            (void)size;  // Suppress unused variable warning

59-63: Remove commented code or document why it's preserved.

The commented backoff logic should either be removed or documented if it's intentionally kept for future use.

         } else {
             retries += 1;
-            //int sleep_time = retries;
-            //if (retries > 100) {
-            //    sleep_time = retries * 2;
-            //}
-            //std::this_thread::sleep_for(std::chrono::milliseconds(sleep_time));
+            // TODO: Consider adding backoff delay for failed downloads
         }

75-78: Simplify the comment about bufferstream usage.

The comment is quite long and could be more concise while still explaining the rationale.

-    /**
-     * We use Boost's bufferstream to wrap the array as an IOStream. Usign a light-weight streambuf wrapper, as many solutions 
-     * (e.g. https://stackoverflow.com/questions/13059091/creating-an-input-stream-from-constant-memory) on the internet
-     * suggest does not work because the S3 SDK relies on proper functioning tellp(), etc... (for instance to get the body length).
-     */
+    // Use Boost's bufferstream to wrap the buffer as an IOStream.
+    // The S3 SDK requires proper tellp() functionality, so simple streambuf wrappers don't work.
sebs/aws/aws.py (1)

671-677: Simplify the conditional structure.

The elif after return is unnecessary and can be simplified for better readability.

 def cloud_runtime(self, language: Language, language_version: str):
     if language in [Language.NODEJS, Language.PYTHON]:
         return ("{}{}".format(language, language_version),)
-    elif language == Language.CPP:
+    if language == Language.CPP:
         return "provided.al2"
-    else:
-        raise NotImplementedError()
+    raise NotImplementedError()
sebs/benchmark.py (1)

18-19: Clean up duplicate import.

Remove the duplicate Language import on line 19.

 from sebs.utils import find_benchmark, project_absolute_path, LoggingBase
 from sebs.types import BenchmarkModule, Language
-from sebs.types import Language
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d68834 and 9a96d72.

📒 Files selected for processing (30)
  • benchmarks/000.microbenchmarks/010.sleep/config.json (1 hunks)
  • benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/handler.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/key-value.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/key-value.hpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/redis.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/redis.hpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/storage.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/storage.hpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/utils.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/utils.hpp (1 hunks)
  • config/systems.json (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.build (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-hiredis (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-runtime (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-sdk (1 hunks)
  • dockerfiles/aws/nodejs/Dockerfile.build (1 hunks)
  • dockerfiles/cpp_installer.sh (1 hunks)
  • sebs.py (1 hunks)
  • sebs/aws/aws.py (6 hunks)
  • sebs/azure/azure.py (4 hunks)
  • sebs/benchmark.py (11 hunks)
  • sebs/faas/container.py (9 hunks)
  • sebs/faas/function.py (9 hunks)
  • sebs/faas/system.py (2 hunks)
  • sebs/gcp/gcp.py (4 hunks)
  • sebs/openwhisk/openwhisk.py (3 hunks)
  • sebs/types.py (2 hunks)
  • tools/build_docker_images.py (5 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
sebs/faas/container.py

[refactor] 143-143: Too many arguments (7/5)

(R0913)


[refactor] 143-143: Too many positional arguments (7/5)

(R0917)


[refactor] 143-143: Too many local variables (19/15)

(R0914)

tools/build_docker_images.py

[error] 51-51: Parsing failed: 'invalid syntax (build_docker_images, line 51)'

(E0001)

sebs/aws/aws.py

[refactor] 672-677: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

🪛 Ruff (0.11.9)
sebs/faas/function.py

251-253: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

tools/build_docker_images.py

51-51: SyntaxError: Expected a statement


51-51: SyntaxError: Expected a statement


51-51: SyntaxError: Expected a statement


51-51: SyntaxError: Expected a statement


52-52: SyntaxError: Unexpected indentation


53-53: SyntaxError: Expected a statement


53-53: SyntaxError: Expected a statement


53-53: SyntaxError: Expected a statement


53-53: SyntaxError: Expected a statement


53-54: SyntaxError: Expected a statement


54-54: SyntaxError: Unexpected indentation


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected ',', found name


59-59: SyntaxError: Expected ',', found name


59-59: SyntaxError: Expected ',', found name


109-109: SyntaxError: Expected a statement


109-109: SyntaxError: Expected a statement


109-109: SyntaxError: Expected a statement


109-109: SyntaxError: Expected a statement


111-111: SyntaxError: Unexpected indentation


136-136: SyntaxError: unindent does not match any outer indentation level


136-136: SyntaxError: Expected a statement


136-137: SyntaxError: Expected a statement


137-137: SyntaxError: Unexpected indentation


138-138: SyntaxError: Expected a statement


138-138: SyntaxError: Expected a statement


138-138: SyntaxError: Expected a statement


138-138: SyntaxError: Expected a statement


138-139: SyntaxError: Expected a statement


140-140: SyntaxError: Unexpected indentation


149-149: SyntaxError: unindent does not match any outer indentation level


149-149: SyntaxError: Expected a statement


149-150: SyntaxError: Expected a statement


150-150: SyntaxError: Unexpected indentation


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected ',', found name


151-151: SyntaxError: Expected ',', found name


151-151: SyntaxError: Expected ',', found name


152-152: SyntaxError: Unexpected indentation


152-152: SyntaxError: Expected a statement


152-152: SyntaxError: Expected a statement


152-153: SyntaxError: Expected a statement


153-153: SyntaxError: Unexpected indentation

sebs/benchmark.py

19-19: Redefinition of unused Language from line 18

Remove definition: Language

(F811)

🔇 Additional comments (33)
dockerfiles/aws/nodejs/Dockerfile.build (1)

4-5: Update comment and review package dependencies for user management.

The # useradd, groupmod comment no longer matches the installed packages—you're now installing C++ build tools (cmake, curl, libcurl, libcurl-devel) but not shadow-utils (which provides useradd/groupmod). If you still rely on gosu to switch users, either re-add shadow-utils to the install line or remove/update the comment to reflect the new purpose.

sebs/types.py (1)

45-57: Well-structured Architecture enum implementation.

The Architecture enum with its serialize/deserialize methods follows good patterns and provides type safety for the codebase.

benchmarks/000.microbenchmarks/010.sleep/config.json (1)

4-4: LGTM - Consistent language addition.

The addition of "cpp" to the languages array is consistent with the existing configuration pattern and enables C++ benchmark support.

sebs.py (1)

67-67: LGTM - Correct CLI integration.

The addition of "cpp" to the language choices properly integrates C++ support into the command-line interface.

sebs/faas/system.py (1)

16-16: LGTM! Excellent type safety improvement.

The addition of the Language enum import and updating the method signature to use strongly-typed Language instead of string-based language_name significantly improves type safety and code maintainability. This change aligns well with the broader refactoring across the codebase.

Also applies to: 174-174

dockerfiles/aws/cpp/Dockerfile.dependencies-runtime (1)

9-9: Approve the CMake configuration choices.

The CMake configuration looks well-thought-out:

  • Release build type for optimized binaries
  • BUILD_SHARED_LIBS=OFF for static linking (appropriate for Lambda)
  • Custom install prefix to /opt for multi-stage builds
  • Parallel builds using ${WORKERS} for build efficiency
config/systems.json (1)

129-131: Verify dependency order and completeness.

The dependencies list looks comprehensive, but ensure the order matches any build dependencies (e.g., runtime before SDK).

#!/bin/bash
# Verify that all dependency Dockerfiles exist for the listed dependencies
echo "Checking for dependency Dockerfiles..."
for dep in runtime sdk boost hiredis; do
    dockerfile="dockerfiles/aws/cpp/Dockerfile.dependencies-${dep}"
    if [ -f "$dockerfile" ]; then
        echo "✓ Found: $dockerfile"
    else
        echo "✗ Missing: $dockerfile"
    fi
done
sebs/gcp/gcp.py (3)

27-27: Good refactoring for type safety.

The import of the Language enum improves type safety and consistency across the codebase.


142-147: Type-safe dictionary access implementation looks correct.

The refactoring properly updates both CONFIG_FILES and HANDLER dictionaries to use Language enum keys, and all access patterns use the typed language parameter.


128-148: I didn’t locate the Python Language enum – let’s search for its class definition and inspect it for a CPP member:

#!/bin/bash
# Locate and display the Python enum for Language
echo "Searching for Language enum in .py files..."
rg -n "class Language" -g "*.py"

echo -e "\nShowing definition and first 20 lines after declaration:"
file=$(rg -l "class Language" -g "*.py" | head -n1)
if [[ -n "$file" ]]; then
  start_line=$(grep -n "class Language" "$file" | head -n1 | cut -d: -f1)
  sed -n "${start_line},$((start_line+20))p" "$file"
else
  echo "❌ Language enum definition file not found."
fi
dockerfiles/aws/cpp/Dockerfile.build (2)

23-26: Dependency aggregation looks correct.

The sequential copying from all dependency stages properly aggregates the built libraries and headers into /opt.


30-31:

#!/bin/bash
echo "Locating Dockerfile.build files..."
fd --type f 'Dockerfile.build'

echo
echo "Printing lines 1–60 of dockerfiles/aws/cpp/Dockerfile.build:"
sed -n '1,60p' dockerfiles/aws/cpp/Dockerfile.build

echo
echo "Looking for any COPY commands or /sebs references:"
grep -R -n 'COPY|sebs' dockerfiles/aws/cpp/Dockerfile.build

sebs/openwhisk/openwhisk.py (3)

21-21: Good addition of type safety with Language enum import.

The import of the Language enum enhances type safety and consistency across the codebase.

100-100: Parameter type change improves type safety.

Changing from language_name: str to language: Language provides compile-time type checking and prevents invalid language strings.

111-111: Correct usage of language.value for backward compatibility.

Using language.value maintains compatibility with downstream methods expecting string identifiers.

sebs/azure/azure.py (3)

27-27: Good addition of Language enum import for type safety.

The import enhances type consistency across cloud provider modules.

122-122: Parameter type change improves type safety.

The change from string to Language enum parameter provides better type checking.

155-155: Dictionary lookup now uses proper enum type.

Good use of the Language enum for dictionary access instead of string.

benchmarks/wrappers/aws/cpp/key-value.hpp (1)

13-13: Good use of shared_ptr for AWS client management.

Using shared_ptr for the DynamoDB client is appropriate as it handles the AWS SDK object lifecycle correctly.

benchmarks/wrappers/aws/cpp/storage.hpp (2)

12-14: Good use of move semantics in constructor.

The constructor correctly uses move semantics to efficiently transfer ownership of the S3Client, avoiding unnecessary copies.

16-16: Static factory method follows good design patterns.

The get_client() static factory method is a good design choice for encapsulating client creation logic.

sebs/faas/container.py (3)

12-12: LGTM: Type-safe language handling.

The import of the Language enum improves type safety and consistency across the codebase.

143-151: LGTM: Method signature update for type safety.

The change from language_name: str to language: Language parameter improves type safety and aligns with the broader refactoring to use strongly-typed language identifiers.

164-164: LGTM: Consistent use of Language enum.

The usage of language.value to extract the string representation is correct and consistent throughout the method.

Also applies to: 188-188, 201-201

sebs/aws/aws.py (3)

25-25: LGTM: Import addition aligns with type safety improvements.

The import of the Language enum from sebs.types supports the refactoring to use strongly-typed language identifiers throughout the codebase.

121-121: LGTM: Parameter type improved with enum.

Changing the parameter type from string to Language enum enhances type safety and prevents invalid language values.

139-175:

#!/bin/bash
set -eo pipefail

# 1. List all files under benchmarks/ to inspect their extensions
find benchmarks/ -type f

# 2. Search for any 'zip' commands or references to 'benchmark.zip' in those files
grep -R --include="*.sh" --include="Makefile" --include="*.py" -nE "zip|benchmark\.zip" benchmarks/ || true

# 3. Search for any build directory creations or references
grep -R -nE "mkdir.*build|cmake|make" benchmarks/ || true
sebs/faas/function.py (3)

12-12: LGTM: Centralized enum imports improve consistency.

Importing Language and Architecture from the shared sebs.types module enhances modularity and ensures consistent enum usage across the codebase.


221-227: LGTM: Enhanced JSON parsing for C++ triggers.

The enhanced parsing logic correctly handles AWS C++ trigger responses that may have nested JSON bodies, improving compatibility with the new C++ runtime.
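
For context, the nesting arises on the producer side. A hedged sketch (assuming the C++ handler wraps its measurements in an API Gateway-style envelope; names are illustrative):

#include <aws/core/utils/json/JsonSerializer.h>

// Sketch: the outer response is JSON, but "body" holds a string that
// itself contains serialized JSON, so consumers must parse twice
// (the outer object first, then the body string).
Aws::Utils::Json::JsonValue make_response(
    const Aws::Utils::Json::JsonValue& measurements) {
  Aws::Utils::Json::JsonValue response;
  response.WithInteger("statusCode", 200);
  response.WithString("body", measurements.View().WriteCompact());
  return response;
}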


290-292: LGTM: Consistent use of enum deserialization.

The update to use Language.deserialize() maintains consistency with the typed enum approach throughout the codebase.

sebs/benchmark.py (3)

7-7: LGTM: Appropriate import for CMake script formatting.

The textwrap import is correctly added to support the dedenting of the CMake script in the new C++ packaging method.


410-450: Well-structured C++ packaging method with future improvement opportunities.

The CMake script generation is well-implemented and correctly handles C++ Lambda runtime dependencies. The FIXME comments appropriately identify areas for future enhancement.

Consider addressing the FIXME comments in future iterations:

  • Generic configuration for different cloud providers
  • Optional dependency configuration
  • Dynamic dependency resolution

629-629: LGTM: Enhanced error logging for debugging.

Adding stderr decoding to the Docker build error logging provides better diagnostic information when builds fail.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
sebs/aws/aws.py (1)

676-682: Simplify the conditional structure.

The elif after return is unnecessary and can be simplified for better readability.

 def cloud_runtime(self, language: Language, language_version: str):
     if language in [Language.NODEJS, Language.PYTHON]:
         return ("{}{}".format(language, language_version),)
-    elif language == Language.CPP:
+    if language == Language.CPP:
         return "provided.al2"
-    else:
-        raise NotImplementedError()
+    raise NotImplementedError()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a96d72 and b5fbb12.

📒 Files selected for processing (5)
  • sebs/aws/aws.py (6 hunks)
  • sebs/benchmark.py (16 hunks)
  • sebs/faas/container.py (6 hunks)
  • sebs/faas/function.py (3 hunks)
  • sebs/openwhisk/openwhisk.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • sebs/openwhisk/openwhisk.py
  • sebs/faas/function.py
🧰 Additional context used
🪛 Pylint (3.3.7)
sebs/aws/aws.py

[refactor] 677-682: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

sebs/faas/container.py

[refactor] 137-137: Too many arguments (7/5)

(R0913)


[refactor] 137-137: Too many positional arguments (7/5)

(R0917)


[refactor] 137-137: Too many local variables (19/15)

(R0914)

🔇 Additional comments (15)
sebs/faas/container.py (3)

12-12: LGTM: Good type safety improvement.

The addition of the Language enum import supports the transition from string-based language handling to a more type-safe approach.


102-106: LGTM: Improved code formatting.

The parameter formatting improves readability without changing functionality.


140-140: LGTM: Proper enum usage throughout the method.

The method correctly uses Language enum type for the parameter and language.value when string representation is needed for Docker operations and file paths.

Also applies to: 158-158, 181-181, 193-193

sebs/aws/aws.py (3)

25-25: LGTM: Type safety improvement.

Adding the Language enum import supports better type safety throughout the codebase.


121-121: LGTM: Proper enum adoption.

The method signature correctly uses the Language enum, and the CONFIG_FILES dictionary properly maps enum values to their respective configuration files.

Also applies to: 144-146


148-180: LGTM: Well-structured C++ packaging logic.

The conditional logic properly handles different languages. The C++ implementation assumes the build script creates a zip file in the build directory, which aligns with typical CMake build patterns for AWS Lambda.

sebs/benchmark.py (9)

7-7: LGTM: Required imports for C++ support.

The textwrap import is needed for the CMakeLists.txt generation, and the Language enum import supports the type safety improvements.

Also applies to: 18-18


30-30: LGTM: Proper type annotation.

The type annotation correctly uses the Language enum for better type safety.


180-180: LGTM: Consistent enum usage in hash operations.

The hash methods now properly use the Language enum, maintaining consistency with the broader refactoring.

Also applies to: 253-253


257-282: LGTM: Comprehensive C++ file support.

The file handling dictionaries properly include C++ file types (*.cpp, *.hpp, dependencies.json) and integrate them consistently with existing Python and Node.js support.


327-334: LGTM: Updated copy logic for C++ files.

The copy_code method correctly handles C++ files using the Language enum.


413-454: Well-implemented C++ CMakeLists.txt generation.

The CMakeLists.txt generation includes all necessary dependencies for AWS Lambda C++ runtime:

  • AWS Lambda runtime
  • Boost libraries
  • AWS SDK (S3, DynamoDB)
  • Hiredis for Redis support

The FIXME comments appropriately mark areas for future enhancement (generic deployment support, optional dependencies).


541-546: LGTM: C++ package file configuration.

Correctly identifies CMakeLists.txt as the package file for C++ builds, consistent with CMake build systems.


632-632: LGTM: Improved error logging.

Adding stderr output to error logging provides better debugging information when Docker builds fail.


643-643: LGTM: Updated method signature for type safety.

The build method now properly uses the Language enum in its callable parameter type, maintaining consistency with the broader refactoring.

Also applies to: 684-684

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 24

♻️ Duplicate comments (1)
benchmarks/wrappers/aws/cpp/storage.cpp (1)

16-24: Configure AWS region from environment variable.

The config.region is not set, which may cause the client to fall back to a default region. As noted in past reviews, the region should be configurable from the AWS_REGION environment variable for consistency.

Apply this diff to set the region from the environment:

 sebs::Storage sebs::Storage::get_client() {
   Aws::Client::ClientConfiguration config;
+  const char* region = std::getenv("AWS_REGION");
+  if (region) {
+    config.region = region;
+  }
+  config.caFile = "/etc/pki/tls/certs/ca-bundle.crt";
 
   char const TAG[] = "LAMBDA_ALLOC";
🧹 Nitpick comments (28)
benchmarks/200.multimedia/210.thumbnailer/cpp/function.hpp (1)

5-27: Consider adding input validation.

The function doesn't validate that in is non-empty or that width and height are positive. While OpenCV may handle some edge cases, explicit validation would catch issues earlier and provide clearer error messages.

Consider adding validation at the start of the function:

void thumbnailer(cv::Mat &in, int64_t width, int64_t height, cv::Mat &out)
{
  if (in.empty()) {
    throw std::invalid_argument("Input image is empty");
  }
  if (width <= 0 || height <= 0) {
    throw std::invalid_argument("Width and height must be positive");
  }
  // ... rest of implementation
}
sebs/cpp_dependencies.py (2)

56-61: Consider using a custom exception type.

The generic Exception works but a custom exception type would make error handling more specific for callers who want to catch C++ dependency errors separately.

Consider defining a custom exception:

+class CppDependencyError(Exception):
+    """Exception raised for unknown C++ dependencies."""
+    pass
+
 class CppDependencies(str, Enum):
     # ... existing code ...
     
     @staticmethod
     def deserialize(val: str) -> CppDependencies:
         for member in CppDependencies:
             if member.value == val:
                 return member
-        raise Exception(f"Unknown C++ dependency type {val}")
+        raise CppDependencyError(f"Unknown C++ dependency type {val}")

22-54: Cache the dependency dictionary to avoid recreating it.

The _dependency_dictionary method creates a new dictionary on each call. Since the configuration is static, it can be built once at module level. Note that it cannot simply become a class attribute inside the Enum body, because non-descriptor attributes there are turned into enum members.

Apply this diff to cache the dictionary:

 class CppDependencies(str, Enum):
     """
     Enum for C++ dependencies used in the benchmarks.
     """
     TORCH = "torch"
     OPENCV = "opencv"
     IGRAPH = "igraph"
     BOOST = "boost"
     HIREDIS = "hiredis"
     
-    @staticmethod
-    def _dependency_dictionary() -> dict[str, CppDependencyConfig]:
-        return {
-            CppDependencies.TORCH: CppDependencyConfig(
-                docker_img="dependencies-torch.aws.cpp.all",
-                cmake_package="Torch",
-                cmake_libs="${TORCH_LIBRARIES} -lm",
-                cmake_dir="${TORCH_INCLUDE_DIRS}"
-            ),
-            # ... rest of dictionary ...
-        }
-    
     @staticmethod
     def to_cmake_list(dependency: CppDependencies) -> str:
         """
         Returns the CMake target for the given C++ dependency.
         """
         if dependency not in CppDependencies:
             raise ValueError(f"Unknown C++ dependency {dependency}")
-        dependency_config = CppDependencies._dependency_dictionary()[dependency]
+        dependency_config = _DEPENDENCY_DICT[dependency]
         # ... rest of method ...
+
+# Built once at import time, outside the Enum body:
+_DEPENDENCY_DICT: dict[CppDependencies, CppDependencyConfig] = {
+    CppDependencies.TORCH: CppDependencyConfig(
+        docker_img="dependencies-torch.aws.cpp.all",
+        cmake_package="Torch",
+        cmake_libs="${TORCH_LIBRARIES} -lm",
+        cmake_dir="${TORCH_INCLUDE_DIRS}"
+    ),
+    # ... rest of dictionary ...
+}
dockerfiles/aws/cpp/Dockerfile.dependencies-torch (1)

18-18: Add checksum verification for downloaded files.

The Dockerfile downloads LibTorch and torchvision without verifying checksums or signatures. Adding verification would protect against compromised downloads and ensure build reproducibility.

Consider adding SHA256 verification:

# For LibTorch
RUN curl -L -o torchlib.zip https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.7.0%2Bcpu.zip && \
    echo "EXPECTED_SHA256  torchlib.zip" | sha256sum -c -

# For torchvision
RUN curl -LO https://github.com/pytorch/vision/archive/refs/tags/v${TORCHVISION_VERSION}.zip && \
    echo "EXPECTED_SHA256  v${TORCHVISION_VERSION}.zip" | sha256sum -c -

Also applies to: 33-35

dockerfiles/aws/cpp/Dockerfile.dependencies-igraph (2)

10-15: Avoid redundant CMake installs; pick one source.

You install cmake3 via yum and then overlay a tarball. Keep only one to reduce build time and ambiguity on PATH.

Option A (use distro cmake3): remove the Kitware tarball block. Option B (use Kitware): drop cmake3 from yum list:

-    yum install -y cmake3 wget unzip curl git gcc gcc-c++ \
+    yum install -y wget unzip curl git gcc gcc-c++ \
                     make tar gzip libpng-devel zlib-devel \
                     libjpeg-turbo-devel python3-devel python3-pip && \

Also applies to: 17-21


5-6: Robust parallelism and toolchain hints.

Default WORKERS can be empty; prefer nproc, and export paths so builds can find /opt artifacts later.

Apply this diff:

 ARG WORKERS
 ENV WORKERS=${WORKERS}
@@
-    make -j${WORKERS} && \
+    make -j"$(test -n "$WORKERS" && echo "$WORKERS" || nproc)" && \
     make install
@@
 FROM ${BASE_IMAGE}
 
 WORKDIR /app
+ENV CMAKE_PREFIX_PATH=/opt \
+    PKG_CONFIG_PATH=/opt/lib64/pkgconfig:/opt/lib/pkgconfig \
+    LD_LIBRARY_PATH=/opt/lib64:/opt/lib:${LD_LIBRARY_PATH}
+# Optional on AL2:
+# RUN ldconfig

Also applies to: 34-35

benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp (2)

30-31: Units mismatch: variables named “..._ms” hold microseconds.

Either rename or convert to milliseconds. Suggest converting to ms to match names.

Apply this diff:

-    graph_generation_time_ms = (timeSinceEpochMicrosec() - start_time);
+    graph_generation_time_ms = (timeSinceEpochMicrosec() - start_time) / 1000;
@@
-    compute_pr_time_ms = (timeSinceEpochMicrosec() - start_time);
+    compute_pr_time_ms = (timeSinceEpochMicrosec() - start_time) / 1000;

Also applies to: 41-42


4-6: Include for fabs and tighten error reporting.

Ensure fabs is declared; optional: print eigenvalue to aid debugging.

Apply this diff:

 #include <cfloat>
+#include <cmath>
@@
-  if (fabs(value - 1.0) > 32*DBL_EPSILON) {
-      fprintf(stderr, "PageRank failed to converge.\n");
+  if (std::fabs(value - 1.0) > 32*DBL_EPSILON) {
+      fprintf(stderr, "PageRank failed to converge, eigenvalue=%f\n", (double)value);
       igraph_vector_destroy(&pagerank);
       igraph_destroy(&graph);
       return 1;
   }

Also applies to: 43-47

config/systems.json (1)

126-133: Consider updating the base image and scoping heavy deps.

Base image tag al2.2022.04.27.09 is old; torch/opencv add significant size. If feasible, move “torch/opencv” into optional, per-benchmark dependencies to slim generic images.

If you want, I can propose a split of dependencies into “core” (runtime, sdk, boost, hiredis, igraph) and “extras” (opencv, torch) and update the dependency Docker build accordingly.

benchmarks/500.scientific/503.graph-bfs/cpp/main.cpp (3)

15-24: Seed once; current code seeds twice when seed exists.

Simplify to set seed value then seed RNG once.

Apply this diff:

-    if (request.ValueExists("seed")) {
-        seed = request.GetInteger("seed");
-        igraph_rng_seed(igraph_rng_default(), seed);
-    } else {
+    if (request.ValueExists("seed")) {
+        seed = request.GetInteger("seed");
+    } else {
         std::random_device rd;
         seed = rd();
     }
     igraph_rng_seed(igraph_rng_default(), seed);

41-62: Check igraph return codes and report errors.

Add error handling for igraph_bfs and barabasi to avoid silent failures.

Example:

-    igraph_bfs(
+    if (igraph_bfs(
         &graph,
         0,          // root vertex
         nullptr,    // roots
         IGRAPH_ALL, // neimode
         true,       // unreachable
         nullptr,    // restricted
         &order,     // order
         nullptr,    // rank
         nullptr,    // father
         nullptr,    // pred
         nullptr,    //succ,
         nullptr,    // dist
         nullptr,    // callback
         nullptr     // extra
-    );
+    ) != IGRAPH_SUCCESS) {
+        igraph_vector_int_destroy(&order);
+        igraph_destroy(&graph);
+        Aws::Utils::Json::JsonValue err;
+        err.WithString("error", "igraph_bfs failed");
+        return err;
+    }

73-79: Use a consistent key: “measurements” (plural) like other benchmarks.

Current key is “measurement”.

Apply this diff:

-    Aws::Utils::Json::JsonValue measurement;
-    measurement.WithInt64("graph_generating_time", graph_generating_time);
-    measurement.WithInt64("compute_time", process_time);
-    result.WithObject("measurement", std::move(measurement));
+    Aws::Utils::Json::JsonValue measurements;
+    measurements.WithInt64("graph_generating_time", graph_generating_time);
+    measurements.WithInt64("compute_time", process_time);
+    result.WithObject("measurements", std::move(measurements));
benchmarks/200.multimedia/210.thumbnailer/cpp/main.cpp (3)

51-53: Use std::vector<unsigned char> and direct imdecode.

Wrapping a std::vector<char> in cv::Mat before cv::imdecode is suboptimal and type-mismatched for OpenCV, which expects unsigned bytes.

Apply this diff:

-  std::vector<char> vectordata(body_str.begin(), body_str.end());
-  cv::Mat image = imdecode(cv::Mat(vectordata), 1);
+  std::vector<unsigned char> vectordata(body_str.begin(), body_str.end());
+  cv::Mat image = cv::imdecode(vectordata, cv::IMREAD_COLOR);
+  if (image.empty()) {
+    Aws::Utils::Json::JsonValue error;
+    error.WithString("error", "Failed to decode input image.");
+    return error;
+  }

88-93: Remove unused copy and rely on WithInt64 for metrics.

Avoid extra Aws::String allocation; use consistent 64-bit setters.

Apply this diff:

-  Aws::String upload_data(out_buffer.begin(), out_buffer.end());
-
   uint64_t upload_time = client_.upload_random_file(
       bucket_name, key_name, true,
       reinterpret_cast<char *>(out_buffer.data()), out_buffer.size());
@@
-  measurements.WithInteger("download_time", download_time);
-  measurements.WithInteger("upload_time", upload_time);
-  measurements.WithInteger("compute_time", computing_time);
-  measurements.WithInteger("download_size", vectordata.size());
-  measurements.WithInteger("upload_size", out_buffer.size());
+  measurements.WithInt64("download_time", download_time);
+  measurements.WithInt64("upload_time", upload_time);
+  measurements.WithInt64("compute_time", computing_time);
+  measurements.WithInt64("download_size", static_cast<long long>(vectordata.size()));
+  measurements.WithInt64("upload_size", static_cast<long long>(out_buffer.size()));

Also applies to: 94-99


63-65: Check imencode success and non-empty output.

Guard against failed encoding before upload.

Apply this diff:

-  cv::imencode(".jpg", image2, out_buffer);
+  if (!cv::imencode(".jpg", image2, out_buffer) || out_buffer.empty()) {
+    Aws::Utils::Json::JsonValue error;
+    error.WithString("error", "Failed to encode thumbnail.");
+    return error;
+  }
benchmarks/500.scientific/501.graph-pagerank/cpp/main.cpp (3)

17-17: Unused storage client.

Either use it or remove it to avoid warnings.


9-13: Drop unused includes.

fstream, random, and vector appear unused; remove them to speed up builds.


2-3: Build error: AWS headers not found.

Ensure AWS SDK dev headers are installed and include paths are set in CMake/Docker (e.g., aws-sdk-cpp with s3/core, target_link_libraries + target_include_directories).

Can you confirm the Docker image used here already installs aws-sdk-cpp and exposes headers for compilation?

benchmarks/wrappers/aws/cpp/storage.hpp (2)

12-13: Encapsulation: avoid public client field.

Make _client private and expose a client() accessor to prevent external mutation.

-class Storage
-{
-public:
-  Aws::S3::S3Client _client;
+class Storage {
+public:
+  const Aws::S3::S3Client& client() const { return _client; }
+  Aws::S3::S3Client& client() { return _client; }
+private:
+  Aws::S3::S3Client _client;

2-2: Clang 'cstdint' not found.

Likely a tooling environment issue; ensure a proper C++ toolchain (libstdc++/libc++ headers) is present in the build image.

benchmarks/400.inference/411.image-recognition/cpp/main.cpp (4)

56-65: Set eval mode after loading the model.

Use inference settings to avoid training-time layers behavior.

             module = torch::jit::load("./resnet50.pt");
+            module.eval();

78-86: Disable autograd during inference.

Wrap forward in no-grad.

-        torch::Tensor out_tensor = module.forward({input_tensor}).toTensor();
+        torch::NoGradGuard no_grad;
+        torch::Tensor out_tensor = module.forward({input_tensor}).toTensor();

111-116: Return a meaningful status until S3 I/O + inference path is wired.

Avoid returning an empty object.

-    Aws::Utils::Json::JsonValue val;
-    Aws::Utils::Json::JsonValue result;
-    Aws::Utils::Json::JsonValue measurements;
-
-    return val;
+    Aws::Utils::Json::JsonValue resp;
+    resp.WithString("status", "not_implemented");
+    return resp;

I can add S3 download (Storage::download_file), model caching to /tmp, and inference wiring next.


1-2: Build error: AWS headers not found.

Confirm the C++ base image installs aws-sdk-cpp and proper include paths.

sebs/aws/aws.py (1)

135-141: Confirm ECR build accepts Language enum.

If build_base_image expects a string, convert via language.value.

tools/build_docker_images.py (1)

102-117: Harden 'dependencies' branch against unknown language keys.

Add a defensive check to avoid KeyError.

-    elif args.type == "dependencies":
-        if args.language:
-            if "dependencies" in system_config["languages"][args.language]:
-                language_config = system_config["languages"][args.language]
+    elif args.type == "dependencies":
+        if args.language:
+            if args.language not in system_config["languages"]:
+                raise RuntimeError(f"Unknown language: {args.language}")
+            if "dependencies" in system_config["languages"][args.language]:
+                language_config = system_config["languages"][args.language]
benchmarks/wrappers/aws/cpp/storage.cpp (1)

95-95: Fix typo in comment.

Line 95 has a typo: "Usign" should be "Using".

-  * We use Boost's bufferstream to wrap the array as an IOStream. Usign a
+  * We use Boost's bufferstream to wrap the array as an IOStream. Using a
sebs/benchmark.py (1)

648-648: Consider using logging.exception for better error context.

When logging errors inside an exception handler, logging.exception() automatically includes the stack trace, providing more debugging context than logging.error().

-                    self.logging.error(f"Stderr: {e.stderr.decode()}")
+                    self.logging.exception(f"Package build failed. Stderr: {e.stderr.decode()}")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef25201 and 5a1557d.

📒 Files selected for processing (21)
  • benchmarks/200.multimedia/210.thumbnailer/config.json (1 hunks)
  • benchmarks/200.multimedia/210.thumbnailer/cpp/function.hpp (1 hunks)
  • benchmarks/200.multimedia/210.thumbnailer/cpp/main.cpp (1 hunks)
  • benchmarks/400.inference/411.image-recognition/config.json (1 hunks)
  • benchmarks/400.inference/411.image-recognition/cpp/main.cpp (1 hunks)
  • benchmarks/500.scientific/501.graph-pagerank/config.json (1 hunks)
  • benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp (1 hunks)
  • benchmarks/500.scientific/501.graph-pagerank/cpp/main.cpp (1 hunks)
  • benchmarks/500.scientific/503.graph-bfs/config.json (1 hunks)
  • benchmarks/500.scientific/503.graph-bfs/cpp/main.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/storage.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/storage.hpp (1 hunks)
  • config/systems.json (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.build (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-igraph (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-opencv (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-torch (1 hunks)
  • sebs/aws/aws.py (6 hunks)
  • sebs/benchmark.py (16 hunks)
  • sebs/cpp_dependencies.py (1 hunks)
  • tools/build_docker_images.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dockerfiles/aws/cpp/Dockerfile.build
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: mcopik
PR: spcl/serverless-benchmarks#99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
🧬 Code graph analysis (9)
benchmarks/200.multimedia/210.thumbnailer/cpp/main.cpp (2)
benchmarks/wrappers/aws/cpp/storage.cpp (2)
  • get_client (16-24)
  • get_client (16-16)
benchmarks/200.multimedia/210.thumbnailer/cpp/function.hpp (2)
  • thumbnailer (5-27)
  • thumbnailer (5-5)
benchmarks/wrappers/aws/cpp/storage.cpp (3)
benchmarks/wrappers/aws/cpp/storage.hpp (4)
  • Storage (14-16)
  • bucket (20-24)
  • bucket (34-35)
  • bucket (37-41)
benchmarks/wrappers/aws/cpp/key-value.cpp (2)
  • download_file (28-75)
  • download_file (28-30)
benchmarks/wrappers/aws/cpp/utils.cpp (2)
  • timeSinceEpochMicrosec (6-10)
  • timeSinceEpochMicrosec (6-6)
sebs/aws/aws.py (3)
sebs/types.py (1)
  • Language (25-35)
sebs/benchmark.py (3)
  • language (153-154)
  • language_version (161-162)
  • benchmark (101-102)
sebs/utils.py (1)
  • execute (43-53)
benchmarks/500.scientific/501.graph-pagerank/cpp/main.cpp (2)
benchmarks/wrappers/aws/cpp/storage.cpp (2)
  • get_client (16-24)
  • get_client (16-16)
benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp (2)
  • graph_pagerank (8-55)
  • graph_pagerank (8-9)
sebs/benchmark.py (3)
sebs/cpp_dependencies.py (2)
  • CppDependencies (12-86)
  • to_cmake_list (64-86)
sebs/utils.py (2)
  • find_benchmark (144-147)
  • project_absolute_path (18-19)
sebs/types.py (2)
  • BenchmarkModule (5-7)
  • Language (25-35)
sebs/cpp_dependencies.py (1)
sebs/benchmark.py (1)
  • deserialize (68-79)
benchmarks/400.inference/411.image-recognition/cpp/main.cpp (1)
benchmarks/wrappers/aws/cpp/storage.cpp (2)
  • get_client (16-24)
  • get_client (16-16)
tools/build_docker_images.py (1)
sebs/config.py (1)
  • version (67-68)
benchmarks/wrappers/aws/cpp/storage.hpp (1)
benchmarks/wrappers/aws/cpp/storage.cpp (1)
  • client (22-22)
🪛 Clang (14.0.6)
benchmarks/200.multimedia/210.thumbnailer/cpp/main.cpp

[error] 2-2: 'aws/core/Aws.h' file not found

(clang-diagnostic-error)

benchmarks/wrappers/aws/cpp/storage.cpp

[error] 2-2: 'memory' file not found

(clang-diagnostic-error)

benchmarks/500.scientific/503.graph-bfs/cpp/main.cpp

[error] 1-1: 'aws/core/utils/json/JsonSerializer.h' file not found

(clang-diagnostic-error)

benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp

[error] 2-2: 'igraph.h' file not found

(clang-diagnostic-error)

benchmarks/500.scientific/501.graph-pagerank/cpp/main.cpp

[error] 2-2: 'aws/core/Aws.h' file not found

(clang-diagnostic-error)

benchmarks/200.multimedia/210.thumbnailer/cpp/function.hpp

[error] 2-2: 'opencv2/opencv.hpp' file not found

(clang-diagnostic-error)

benchmarks/400.inference/411.image-recognition/cpp/main.cpp

[error] 1-1: 'aws/core/Aws.h' file not found

(clang-diagnostic-error)

benchmarks/wrappers/aws/cpp/storage.hpp

[error] 2-2: 'cstdint' file not found

(clang-diagnostic-error)

🪛 Ruff (0.14.1)
sebs/aws/aws.py

160-160: Function call with shell=True parameter identified, security issue

(S604)

sebs/benchmark.py

262-262: Probable use of insecure hash functions in hashlib: md5

(S324)


648-648: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

sebs/cpp_dependencies.py

61-61: Create your own exception

(TRY002)


61-61: Avoid specifying long messages outside the exception class

(TRY003)


69-69: Avoid specifying long messages outside the exception class

(TRY003)

tools/build_docker_images.py

117-117: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (14)
benchmarks/500.scientific/503.graph-bfs/config.json (1)

4-6: LGTM!

The addition of C++ language support and igraph dependency is correctly configured.

benchmarks/200.multimedia/210.thumbnailer/config.json (1)

4-6: LGTM!

The C++ language support with opencv and boost dependencies is properly configured.

benchmarks/400.inference/411.image-recognition/config.json (1)

4-6: LGTM!

C++ support with torch and opencv dependencies is correctly configured.

benchmarks/500.scientific/501.graph-pagerank/config.json (1)

4-6: LGTM!

C++ language support with igraph dependency is properly configured.

dockerfiles/aws/cpp/Dockerfile.dependencies-opencv (1)

8-10: Verify OpenCV version is appropriate.

OpenCV 4.5.0 was released in December 2020. Ensure this version meets security and functional requirements, or consider upgrading to a more recent release for potential bug fixes and improvements.

dockerfiles/aws/cpp/Dockerfile.dependencies-torch (1)

18-18: LibTorch and torchvision versions are outdated and warrant verification.

LibTorch 1.7.0 and torchvision 0.8.0 were both released on October 27, 2020, making them five years old; the current stable release as of October 27, 2025 is PyTorch 2.9. The appropriate action depends on your build and security requirements: verify that these pinned versions satisfy current security standards and compatibility needs, or upgrade to a more recent release if feasible.

config/systems.json (2)

124-138: CPP architecture gating confirmed—implicit rejection via unsupported versions.

The toolchain does reject arm64 for CPP, but through implicit gating. When CPP+arm64 is attempted, supported_language_versions() in sebs/config.py returns an empty list (since config/systems.json defines CPP base_images only for x64), causing get_function() in sebs/faas/system.py to raise an exception before deployment. All invocation paths converge on this check, ensuring rejection happens.

However, the error message ("Unsupported version") is misleading—it doesn't clarify that the issue is missing architecture support. Consider adding explicit language-architecture compatibility validation with a clearer error message early in the deployment flow.
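
A sketch of such an early check; the supported pairs are an assumption based on the current config, where C++ is x64-only:

# Hypothetical validation at the start of the deployment flow.
SUPPORTED_PAIRS = {
    ("python", "x64"), ("python", "arm64"),
    ("nodejs", "x64"), ("nodejs", "arm64"),
    ("cpp", "x64"),
}

def validate_language_architecture(language: str, architecture: str) -> None:
    if (language, architecture) not in SUPPORTED_PAIRS:
        raise ValueError(
            f"Language '{language}' is not supported on architecture "
            f"'{architecture}' for this platform."
        )

validate_language_architecture("cpp", "x64")     # passes
# validate_language_architecture("cpp", "arm64") # raises with a clear message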


126-129: Correct the registry path inconsistency in the multi-OS image suggestion.

The AL2023 variant suggestion is valid for future-proofing, but the example has a critical flaw: the AL2 image uses amazon/aws-lambda-provided while the AL2023 image uses public.ecr.aws/lambda/provided. AWS official documentation recommends public.ecr.aws/lambda/provided as the canonical path. Both should use the same registry for consistency.

Corrected example:

"base_images": {
  "x64": {
    "all": {
      "al2": "public.ecr.aws/lambda/provided:al2",
      "al2023": "public.ecr.aws/lambda/provided:al2023"
    }
  }
}
benchmarks/wrappers/aws/cpp/storage.cpp (1)

64-82: LGTM!

This overload has proper error handling with logging and returns appropriate failure indicators.

sebs/benchmark.py (5)

13-14: LGTM!

The import of CppDependencies is necessary for the C++ benchmark support being added.


28-40: LGTM!

The integration of cpp_dependencies into BenchmarkConfig follows the existing pattern for other configuration fields and properly handles the optional nature of the field.

Also applies to: 66-79


260-289: Good refactor to use Language enum.

The migration from string-based language keys to the Language enum improves type safety and consistency across the codebase.

Note: The static analysis tool flags MD5 usage at line 262, but MD5 is appropriate here for cache invalidation purposes (non-cryptographic hashing).


557-562: LGTM!

The PACKAGE_FILES dictionary correctly maps C++ to CMakeLists.txt, following the same pattern as Python and Node.js.


742-747: LGTM!

The formatting adjustment improves readability by clearly showing self as the first parameter.

Comment on lines +7 to +27
try
{
// Calculate thumbnail size while maintaining aspect ratio
int orig_width = in.cols;
int orig_height = in.rows;

double scale_w = static_cast<double>(width) / orig_width;
double scale_h = static_cast<double>(height) / orig_height;
double scale = std::min(scale_w, scale_h); // Use smaller scale to fit within bounds

int new_width = static_cast<int>(orig_width * scale);
int new_height = static_cast<int>(orig_height * scale);

// Resize image (equivalent to PIL's thumbnail method)
cv::resize(in, out, cv::Size(new_width, new_height), cv::INTER_LINEAR);
}
catch (const cv::Exception &e)
{
std::cerr << "OpenCV error: " << e.what() << std::endl;
}
}

⚠️ Potential issue | 🟠 Major

Rethrow exceptions to avoid swallowing errors.

The catch block logs the error but doesn't rethrow, leaving the out parameter in an undefined state. The caller has no way to detect that the operation failed, which could lead to silent failures downstream.

Apply this diff to rethrow the exception:

   catch (const cv::Exception &e)
   {
     std::cerr << "OpenCV error: " << e.what() << std::endl;
+    throw;
   }
🤖 Prompt for AI Agents
In benchmarks/200.multimedia/210.thumbnailer/cpp/function.hpp around lines 7 to
27, the catch block logs OpenCV errors but swallows the exception leaving `out`
undefined and the caller unaware of failure; after logging the error, rethrow
the exception (use `throw;` to preserve the original exception) so the failure
propagates to the caller and they can handle it appropriately.

Comment on lines +31 to +34
auto image_name = request.GetObject("object").GetString("key");
auto width = request.GetObject("object").GetInteger("width");
auto height = request.GetObject("object").GetInteger("height");


⚠️ Potential issue | 🟠 Major

Validate presence and types of object.key/width/height.

Avoid crashes when fields are missing or wrong types.

Apply this diff:

-  auto image_name = request.GetObject("object").GetString("key");
-  auto width = request.GetObject("object").GetInteger("width");
-  auto height = request.GetObject("object").GetInteger("height");
+  if (!request.ValueExists("object") ||
+      !request.GetObject("object").ValueExists("key") ||
+      !request.GetObject("object").ValueExists("width") ||
+      !request.GetObject("object").ValueExists("height")) {
+    Aws::Utils::Json::JsonValue error;
+    error.WithString("error", "Missing object.key/width/height.");
+    return error;
+  }
+  auto image_name = request.GetObject("object").GetString("key");
+  auto width = request.GetObject("object").GetInteger("width");
+  auto height = request.GetObject("object").GetInteger("height");
+  if (width <= 0 || height <= 0) {
+    Aws::Utils::Json::JsonValue error;
+    error.WithString("error", "Width/height must be positive.");
+    return error;
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto image_name = request.GetObject("object").GetString("key");
auto width = request.GetObject("object").GetInteger("width");
auto height = request.GetObject("object").GetInteger("height");
if (!request.ValueExists("object") ||
!request.GetObject("object").ValueExists("key") ||
!request.GetObject("object").ValueExists("width") ||
!request.GetObject("object").ValueExists("height")) {
Aws::Utils::Json::JsonValue error;
error.WithString("error", "Missing object.key/width/height.");
return error;
}
auto image_name = request.GetObject("object").GetString("key");
auto width = request.GetObject("object").GetInteger("width");
auto height = request.GetObject("object").GetInteger("height");
if (width <= 0 || height <= 0) {
Aws::Utils::Json::JsonValue error;
error.WithString("error", "Width/height must be positive.");
return error;
}
🤖 Prompt for AI Agents
In benchmarks/200.multimedia/210.thumbnailer/cpp/main.cpp around lines 31 to 34,
the code assumes request.GetObject("object") always contains "key", "width", and
"height" with correct types; add defensive checks to validate existence and
types before using them: verify the top-level "object" exists and is an object,
check that "key" exists and is a string, and that "width" and "height" exist and
are integers (or convertible), and if any check fails return or throw a
descriptive error (or set an error response) instead of proceeding; use the
appropriate API (HasMember/isString/isInt or equivalent) to test types and
extract values safely.

Comment on lines +37 to +43
int w = image.size().width, h = image.size().height;
cv::Size scale((int)256 * ((float)w) / h, 256);
cv::resize(image, image, scale);
w = image.size().width, h = image.size().height;
image = image(cv::Range(16, 240), cv::Range(80, 304));
image.convertTo(image, CV_32FC3, 1.0f / 255.0f);


⚠️ Potential issue | 🔴 Critical

Unsafe fixed ROI crop; center-crop robustly after resizing the shorter side.

Current indices can go out of bounds (e.g., width 256). Replace with size-aware crop.

-    int w = image.size().width, h = image.size().height;
-    cv::Size scale((int)256 * ((float)w) / h, 256);
-    cv::resize(image, image, scale);
-    w = image.size().width, h = image.size().height;
-    image = image(cv::Range(16, 240), cv::Range(80, 304));
-    image.convertTo(image, CV_32FC3, 1.0f / 255.0f);
+    int w = image.cols, h = image.rows;
+    int short_side = std::min(w, h);
+    cv::Size scaled(
+        static_cast<int>(std::round(w * 256.0 / short_side)),
+        static_cast<int>(std::round(h * 256.0 / short_side)));
+    cv::resize(image, image, scaled);
+    w = image.cols; h = image.rows;
+    int x = std::max(0, std::min((w - kIMAGE_SIZE) / 2, w - kIMAGE_SIZE));
+    int y = std::max(0, std::min((h - kIMAGE_SIZE) / 2, h - kIMAGE_SIZE));
+    image = image(cv::Rect(x, y, kIMAGE_SIZE, kIMAGE_SIZE));
+    image.convertTo(image, CV_32FC3, 1.0f / 255.0f);

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +96 to +109
auto bucket_obj = request.GetObject("bucket");
if (!bucket_obj.IsObject())
{
Aws::Utils::Json::JsonValue error;
error.WithString("error", "Bucket object is not valid.");
return error;
}

auto bucket_name = bucket_obj.GetString("bucket");
auto input_prefix = bucket_obj.GetString("input");
auto model_prefix = bucket_obj.GetString("model");
auto key = request.GetObject("object").GetString("input");
auto model_key = request.GetObject("object").GetString("model");


⚠️ Potential issue | 🟠 Major

Validate request schema before GetString() calls.

Guard against missing keys to prevent exceptions.

-    auto bucket_obj = request.GetObject("bucket");
-    if (!bucket_obj.IsObject())
+    if (!request.ValueExists("bucket") || !request.GetObject("bucket").IsObject())
     {
         Aws::Utils::Json::JsonValue error;
         error.WithString("error", "Bucket object is not valid.");
         return error;
     }
-
-    auto bucket_name = bucket_obj.GetString("bucket");
-    auto input_prefix = bucket_obj.GetString("input");
-    auto model_prefix = bucket_obj.GetString("model");
-    auto key = request.GetObject("object").GetString("input");
-    auto model_key = request.GetObject("object").GetString("model");
+    auto bucket_obj = request.GetObject("bucket");
+    if (!bucket_obj.ValueExists("bucket") || !bucket_obj.ValueExists("input") || !bucket_obj.ValueExists("model")
+        || !request.ValueExists("object")
+        || !request.GetObject("object").ValueExists("input")
+        || !request.GetObject("object").ValueExists("model")) {
+        Aws::Utils::Json::JsonValue error;
+        error.WithString("error", "Missing required keys: bucket.bucket/input/model and object.input/model.");
+        return error;
+    }
+    auto bucket_name = bucket_obj.GetString("bucket");
+    auto input_prefix = bucket_obj.GetString("input");
+    auto model_prefix = bucket_obj.GetString("model");
+    auto key = request.GetObject("object").GetString("input");
+    auto model_key = request.GetObject("object").GetString("model");
🤖 Prompt for AI Agents
In benchmarks/400.inference/411.image-recognition/cpp/main.cpp around lines 96
to 109, the code calls GetString() on JSON members without checking they exist
or are strings; add guards that verify bucket_obj has the keys "bucket",
"input", and "model" and that request.GetObject("object") has "input" and
"model", and confirm each member IsString()/HasMember() (or equivalent) before
calling GetString(); if any check fails, build and return an
Aws::Utils::Json::JsonValue error with a clear message indicating which key is
missing or of the wrong type.

Comment on lines +17 to +23
uint64_t start_time = timeSinceEpochMicrosec();
igraph_barabasi_game(
/* graph= */ &graph,
/* n= */ size,
/* power= */ 1,
/* m= */ 10,
/* outseq= */ NULL,

⚠️ Potential issue | 🟠 Major

Boundary checks for Barabási–Albert parameters.

m=10 requires size > 10; for smaller graphs igraph may error. Guard inputs or scale m.

Apply this diff:

-  igraph_rng_seed(igraph_rng_default(), seed);
+  igraph_rng_seed(igraph_rng_default(), seed);
   {
     uint64_t start_time = timeSinceEpochMicrosec();
-    igraph_barabasi_game(
+    int m = size > 1 ? std::min(10, size - 1) : 1;
+    igraph_barabasi_game(
       /* graph=    */ &graph,
       /* n=        */ size,
       /* power=    */ 1,
-      /* m=        */ 10,
+      /* m=        */ m,

Also applies to: 29-35

🤖 Prompt for AI Agents
In benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp around lines
17-23 (and similarly at 29-35), the call uses m=10 unconditionally which will
fail when size <= 10; compute a safe m before calling igraph_barabasi_game
(e.g., cap m to size-1 and ensure it is at least 1: m = (size > 1) ? min(10,
(int)size - 1) : 1), then pass that m to igraph_barabasi_game; apply the same
defensive m calculation to the second occurrence at lines 29-35.

Comment on lines +158 to +166
# FIXME: use zipfile
# create zip with hidden directory but without parent directory
execute("zip -qu -r9 {}.zip * .".format(benchmark), shell=True, cwd=directory)
benchmark_archive = "{}.zip".format(os.path.join(directory, benchmark))
self.logging.info("Created {} archive".format(benchmark_archive))

bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
mbytes = bytes_size / 1024.0 / 1024.0
self.logging.info("Zip archive size {:2f} MB".format(mbytes))

⚠️ Potential issue | 🟠 Major

Avoid shell zip (S604) and fix archive sizing path.

Use zipfile; also benchmark_archive already has directory prefix.

-            # FIXME: use zipfile
-            # create zip with hidden directory but without parent directory
-            execute("zip -qu -r9 {}.zip * .".format(benchmark), shell=True, cwd=directory)
-            benchmark_archive = "{}.zip".format(os.path.join(directory, benchmark))
-            self.logging.info("Created {} archive".format(benchmark_archive))
-
-            bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
+            benchmark_archive = os.path.join(directory, f"{benchmark}.zip")
+            from zipfile import ZipFile, ZIP_DEFLATED
+            with ZipFile(benchmark_archive, "w", compression=ZIP_DEFLATED) as zf:
+                for entry in os.listdir(directory):
+                    if entry == os.path.basename(benchmark_archive):
+                        continue
+                    zf.write(os.path.join(directory, entry), arcname=entry)
+            self.logging.info("Created {} archive".format(benchmark_archive))
+            bytes_size = os.path.getsize(benchmark_archive)
             mbytes = bytes_size / 1024.0 / 1024.0
             self.logging.info("Zip archive size {:2f} MB".format(mbytes))
🧰 Tools
🪛 Ruff (0.14.1)

160-160: Function call with shell=True parameter identified, security issue

(S604)

🤖 Prompt for AI Agents
In sebs/aws/aws.py around lines 158-166, replace the shell zip call and
incorrect size path with Python's zipfile: open a ZipFile in write mode, walk
the target directory and add files (including hidden files) using
arcname=os.path.relpath(path, start=directory) to avoid including the parent
directory, and ensure you close the archive (use context manager). Set
benchmark_archive to the absolute path of the created .zip (do not prepend
directory twice) and compute bytes_size with os.path.getsize(benchmark_archive);
then log the created archive path and its size in MB.

Comment on lines +171 to +176
benchmark_archive = os.path.join(directory, "build", "benchmark.zip")
self.logging.info("Created {} archive".format(benchmark_archive))

bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
mbytes = bytes_size / 1024.0 / 1024.0
self.logging.info("Zip archive size {:2f} MB".format(mbytes))

⚠️ Potential issue | 🟠 Major

Fix size path for C++ archive.

Don’t re-join directory; path already absolute.

-            bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
+            bytes_size = os.path.getsize(benchmark_archive)
🤖 Prompt for AI Agents
In sebs/aws/aws.py around lines 171 to 176, the code incorrectly calls
os.path.getsize(os.path.join(directory, benchmark_archive)) even though
benchmark_archive is already an absolute path; replace that double-join with
os.path.getsize(benchmark_archive) so the correct file is sized, then compute
mbytes and log as before.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sebs/benchmark.py (1)

492-503: C++ dependency installation wiring is correct; tighten error logging around volumes.

  • Mapping PACKAGE_FILES by Language and using CMakeLists.txt for C++ makes the Docker install step language‑agnostic and enables the CMake‑based C++ builds.
  • Logging the container’s stderr (e.stderr.decode()) is very helpful for diagnosing build failures, nice improvement.

One minor correctness issue: when docker_copy_build_files is True, volumes is never defined, but the exception handler still logs Docker mount volumes: {volumes}, which will raise UnboundLocalError and hide the real Docker error. Initializing volumes: Dict[str, Dict[str, str]] = {} before the if not check_flag block (or guarding the log) would fix this. You might also consider self.logging.exception("Package build failed!") to record the traceback alongside stderr.

Also applies to: 558-565, 647-651

♻️ Duplicate comments (2)
sebs/benchmark.py (2)

657-661: build callback now passes Language; ensure all callers’ signatures match.

Updating deployment_build_step’s type to accept Language and actually passing self.language is consistent with the enum migration and keeps the early‑return behavior for cached/container builds intact. This change is breaking for any caller that still expects a str language name; previous review comments already pointed out sebs/local/local.py in that context, so just double‑check all call sites have been updated.

Also applies to: 699-707, 726-731


420-471: C++ CMakeLists generation is reasonable, but fix the logging string and TODOs.

The generated CMakeLists correctly pulls in the AWS Lambda C++ runtime and appends per‑dependency snippets via CppDependencies.to_cmake_list, then writes a dedented script. Two follow‑ups:

  1. The logging uses a literal "{benchmark}" placeholder, so the benchmark name is never interpolated. Reuse the earlier suggestion to switch this to an f‑string:
-        self.logging.info(
-            "Dependencies for CPP benchmark {benchmark} are "
-            + str(len(self._benchmark_config._cpp_dependencies))
-            + " dependencies."
-        )
+        self.logging.info(
+            f"Dependencies for CPP benchmark {self.benchmark} are "
+            f"{len(self._benchmark_config._cpp_dependencies)} dependencies."
+        )
  2. The CMake fragment is AWS‑specific (aws-lambda-runtime, AWSSDK, PkgConfig/hiredis) and always links HIREDIS regardless of whether it appears in cpp_dependencies. That matches today’s “AWS‑only C++” reality but contradicts the # FIXME comments; when you generalize C++ to other platforms or want truly optional hiredis, this will need to be revisited.
🧹 Nitpick comments (6)
sebs/cpp_dependencies.py (4)

6-14: Consider using @dataclass for CppDependencyConfig.

CppDependencyConfig is a pure data holder; converting it to a @dataclass would remove boilerplate and make it clearer that it has no behavior.
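
A minimal sketch of the dataclass form; the field names are hypothetical, the real attribute set lives in sebs/cpp_dependencies.py:

from dataclasses import dataclass, field
from typing import List

@dataclass
class CppDependencyConfig:
    # Illustrative fields only - substitute the actual ones.
    cmake_package: str
    link_libraries: List[str] = field(default_factory=list)
    include_dirs: List[str] = field(default_factory=list)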


27-58: Dependency dictionary and enum wiring look correct.

Enum values ("torch", "opencv", "igraph", "boost", "hiredis") match the docker image name prefixes and CMake package names, and _dependency_dictionary() wires them consistently. Rebuilding the dict on each call is fine for this tiny set; if this becomes hot, you could cache it in a class variable.
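
If it ever does become hot, a sketch of the cached variant (entries abbreviated; functools.lru_cache gives the class-variable effect without manual state):

from functools import lru_cache

@lru_cache(maxsize=1)
def dependency_dictionary() -> dict:
    # Built once on first call, then reused.
    return {"torch": "Torch", "opencv": "OpenCV", "igraph": "igraph",
            "boost": "Boost", "hiredis": "hiredis"}

assert dependency_dictionary() is dependency_dictionary()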


60-65: Use a more specific exception type for deserialize.

Raising a bare Exception here is functional but coarse. Prefer ValueError or a small custom exception (e.g., CppDependencyError) to make error handling clearer and to address the TRY002/TRY003 lints.


67-95: to_cmake_list generation is fine; membership check is slightly redundant.

The emitted CMake snippets (find_package, optional include dirs, link libraries) look correct and compatible with the configs. Given that the parameter is already typed as CppDependencies, the if dependency not in CppDependencies guard is defensive but rarely needed; you can keep it or drop it for brevity.

sebs/benchmark.py (2)

7-8: BenchmarkConfig C++ dependency plumbing looks good; consider unifying Language imports.

Adding cpp_dependencies with a None default and normalizing to an empty list is correct, and the deserialize path properly maps JSON "cpp_dependencies" to CppDependencies. One small cleanup: this file imports Language from sebs.types, but deserialize still re‑imports Language from sebs.faas.function; if those ever diverge, you’ll get subtle bugs. It would be cleaner to use the same Language source everywhere.

Also applies to: 13-14, 20-21, 27-41, 68-79


184-188: Language‑aware hashing and copy logic for C++ look consistent.

Passing self.language into hash_directory and keying FILES/WRAPPERS/copy rules by Language (including the C++ patterns and dependencies.json) correctly extends the hash and copy behavior to C++. MD5 is acceptable here since it’s only used for cache invalidation, but if you ever reuse this hash for any security‑sensitive purpose, consider switching to SHA‑256.

Also applies to: 260-289, 332-340
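
The swap itself is mechanical; a sketch, with a stand-in payload for the hashed file contents:

import hashlib

# md5 -> sha256 is a drop-in change if the digest ever guards anything
# security-sensitive; for cache invalidation, md5 remains acceptable.
digest = hashlib.sha256(b"example file contents").hexdigest()
print(digest[:12])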

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a1557d and 9fb0dd3.

📒 Files selected for processing (4)
  • config/systems.json (1 hunks)
  • sebs/benchmark.py (16 hunks)
  • sebs/cpp_dependencies.py (1 hunks)
  • sebs/gcp/gcp.py (4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
📚 Learning: 2025-06-19T16:19:09.758Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.

Applied to files:

  • config/systems.json
  • sebs/benchmark.py
📚 Learning: 2025-06-19T16:18:56.877Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.

Applied to files:

  • config/systems.json
🧬 Code graph analysis (3)
sebs/gcp/gcp.py (2)
sebs/types.py (1)
  • Language (25-35)
sebs/benchmark.py (1)
  • language (153-154)
sebs/cpp_dependencies.py (1)
sebs/benchmark.py (1)
  • deserialize (68-79)
sebs/benchmark.py (2)
sebs/cpp_dependencies.py (3)
  • CppDependencies (16-95)
  • deserialize (61-65)
  • to_cmake_list (68-95)
sebs/types.py (1)
  • Language (25-35)
🪛 Ruff (0.14.6)
sebs/cpp_dependencies.py

65-65: Create your own exception

(TRY002)


65-65: Avoid specifying long messages outside the exception class

(TRY003)


73-73: Avoid specifying long messages outside the exception class

(TRY003)

sebs/benchmark.py

262-262: Probable use of insecure hash functions in hashlib: md5

(S324)


649-649: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (4)
config/systems.json (1)

124-138: AWS C++ language entry looks consistent with design (AWS‑only, x64‑only).

The new aws.languages.cpp block aligns with the rest of the schema and with the decision to support C++ only on AWS. Dependency names and deployment file list look coherent with the C++ wrappers and CMakeLists (handler, key‑value, storage, redis, utils; main.cpp stays benchmark‑specific). The only thing to keep in mind is that there’s no arm64 base image for C++, so higher‑level config/CLI should prevent selecting cpp with arm64 for AWS.

sebs/gcp/gcp.py (1)

27-28: Enum‑based language handling for GCP packaging looks correct.

Switching package_code to accept language: Language and keying CONFIG_FILES/HANDLER by Language.PYTHON/Language.NODEJS keeps behavior intact while improving type‑safety. The absence of a Language.CPP entry is appropriate since GCP doesn’t support C++ in this project; any accidental use would fail fast with a KeyError.

Also applies to: 125-148, 149-159

sebs/benchmark.py (2)

472-481: C++ branch in add_deployment_package is wired correctly.

Routing Language.CPP to add_deployment_package_cpp keeps the existing behavior for Python/NodeJS untouched while enabling C++ packaging. The else: raise NotImplementedError guard will still catch any future unsupported languages.


743-803: prepare_input signature and internals remain sound after refactor.

The new signature (self, system_resources, size, replace_existing=False) is clearer, and the storage/nosql handling logic is unchanged apart from minor formatting. Passing _benchmark_data_path, storage bucket info, and optional nosql_func into generate_input still matches the expected BenchmarkModuleInterface.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1)

7-12: Fix compiler mismatch and add checksum verification for the downloaded archive.

There are two critical issues:

  1. Compiler mismatch (line 10): The yum install on line 7 installs gcc-c++, but the code attempts to use gcc10-c++, which is not installed. The which gcc10-c++ command will fail and break the build. Either install gcc10-c++ explicitly via yum (if available), or use the installed gcc or g++ as a fallback.

  2. Missing integrity verification (line 8): The downloaded Boost archive lacks checksum verification, creating a supply-chain security risk. Verify the archive against a known SHA256 checksum before extraction.

Apply this diff to fix both issues:

-RUN curl -LO https://archives.boost.io/release/1.79.0/source/boost_1_79_0.tar.gz\
-      && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0\
-      && echo "using gcc : : $(which gcc10-c++) ;"  >> tools/build/src/user-config.jam\
+RUN curl -LO https://archives.boost.io/release/1.79.0/source/boost_1_79_0.tar.gz\
+      && echo "d6eef4b4cacb2183f2bf265a5a03a354c3cebf82140bfe5b4ae831dcf72e4e8 boost_1_79_0.tar.gz" | sha256sum -c -\
+      && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0\
+      && GCC_CMD=$(which g++) && echo "using gcc : : ${GCC_CMD} ;"  >> tools/build/src/user-config.jam\

This change:

  • Adds SHA256 checksum verification before extraction.
  • Replaces the hardcoded gcc10-c++ with a flexible lookup of the available g++ compiler.
🧹 Nitpick comments (2)
dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1)

3-4: Set a default value for the WORKERS variable.

Line 4 sets WORKERS from the build argument, but without a default value it may be unset at runtime, potentially causing issues with the build parallelism in line 12. Provide a sensible default (e.g., 4 or based on CPU count).

 ARG WORKERS
-ENV WORKERS=${WORKERS}
+ENV WORKERS=${WORKERS:-4}
sebs/types.py (1)

48-54: Consider adding __str__ method for consistency with Language enum.

Language defines __str__ returning self.value, but Architecture only has serialize(). If both enums are used similarly (e.g., in string formatting or logging), adding __str__ would provide consistent behavior.

     def serialize(self) -> str:
         return self.value
+
+    def __str__(self) -> str:
+        return self.value
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9fb0dd3 and c6d5e0e.

📒 Files selected for processing (2)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1 hunks)
  • sebs/types.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
🧬 Code graph analysis (1)
sebs/types.py (2)
sebs/benchmark.py (2)
  • deserialize (68-79)
  • serialize (291-292)
sebs/faas/function.py (13)
  • deserialize (37-40)
  • deserialize (53-56)
  • deserialize (71-74)
  • deserialize (113-116)
  • deserialize (159-167)
  • deserialize (267-268)
  • deserialize (281-282)
  • deserialize (312-316)
  • deserialize (395-396)
  • serialize (262-263)
  • serialize (277-278)
  • serialize (318-319)
  • serialize (382-391)
🪛 Ruff (0.14.7)
sebs/types.py

35-35: Create your own exception

(TRY002)


35-35: Avoid specifying long messages outside the exception class

(TRY003)


60-60: Create your own exception

(TRY002)


60-60: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
sebs/types.py (1)

25-38: LGTM!

The Language enum is well-implemented with proper deserialization logic and string representation. The deserialize method correctly uses val in the exception message for debugging purposes.

Comment on lines +13 to +15
#RUN curl -LO https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz\
# && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0 && ./bootstrap.sh --prefix=/opt\
# && ./b2 -j${WORKERS} --prefix=/opt cxxflags="-fPIC" link=static install

🛠️ Refactor suggestion | 🟠 Major

Remove or document the commented-out alternative build path.

Lines 13–15 contain commented-out code that uses an alternative Boost download URL from boostorg.jfrog.io. Clarify the intent: either remove this dead code or add an explanatory comment documenting why the alternative approach is retained (e.g., for future fallback or testing).

-#RUN curl -LO https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz\
-#      && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0 && ./bootstrap.sh --prefix=/opt\
-#      && ./b2 -j${WORKERS} --prefix=/opt cxxflags="-fPIC" link=static install
-
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#RUN curl -LO https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz\
# && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0 && ./bootstrap.sh --prefix=/opt\
# && ./b2 -j${WORKERS} --prefix=/opt cxxflags="-fPIC" link=static install
🤖 Prompt for AI Agents
In dockerfiles/aws/cpp/Dockerfile.dependencies-boost around lines 13 to 15,
there is a commented-out alternative Boost build block using boostorg.jfrog.io;
either delete these three commented lines to remove dead code or replace them
with a brief explanatory comment stating why the alternate URL/build is
preserved (e.g., fallback for network issues, historical reference, or testing),
including the Boost version and rationale for keeping it so future maintainers
understand intent.

Comment on lines +55 to +60
@staticmethod
def deserialize(val: str) -> Architecture:
for member in Architecture:
if member.value == val:
return member
raise Exception(f"Unknown architecture type {member}")

⚠️ Potential issue | 🔴 Critical

Fix variable name in exception message.

The exception references member instead of val. After the loop completes, member will be Architecture.ARM (the last member), so the error message will incorrectly report "Unknown architecture type arm64" instead of the actual unknown input value.

     @staticmethod
     def deserialize(val: str) -> Architecture:
         for member in Architecture:
             if member.value == val:
                 return member
-        raise Exception(f"Unknown architecture type {member}")
+        raise Exception(f"Unknown architecture type {val}")
🧰 Tools
🪛 Ruff (0.14.7)

60-60: Create your own exception

(TRY002)


60-60: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In sebs/types.py around lines 55 to 60, the deserialize method raises an
Exception that interpolates the loop variable `member` instead of the input
`val`, causing incorrect error text; update the raise to reference `val` (e.g.,
raise Exception(f"Unknown architecture type {val}")) so the actual unknown input
is reported (optionally consider using ValueError for a more appropriate
exception type).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
sebs/aws/aws.py (3)

175-185: Critical: cloud_runtime still missing .value for enum string conversion.

Despite past review comments claiming this was fixed, lines 179 and 183 still don't use .value to extract the string from the Language enum. The f-string f"{language}{language_version}" will render as "Language.PYTHON3.10" instead of "python3.10".

Apply this diff to fix the runtime string construction:

         if language == Language.NODEJS:
-            return f"{language}{language_version}.x"
+            return f"{language.value}{language_version}.x"
         elif language == Language.CPP:
             return "provided.al2"
         elif language in [Language.PYTHON]:
-            return f"{language}{language_version}"
+            return f"{language.value}{language_version}"
         else:
             raise NotImplementedError()

146-163: Major: Shell injection risk and incorrect path handling remain unfixed.

Despite past review comments, multiple issues persist:

  1. Line 148: shell=True with untrusted input (S604) is a security risk
  2. Line 161: Double path join os.path.join(directory, benchmark_archive) where benchmark_archive already includes the directory prefix for C++ (line 158)

For the Python/Node.js path (lines 146-154), replace shell execution with Python's zipfile:

-            # FIXME: use zipfile
-            # create zip with hidden directory but without parent directory
-            execute("zip -qu -r9 {}.zip * .".format(benchmark), shell=True, cwd=directory)
-            benchmark_archive = "{}.zip".format(os.path.join(directory, benchmark))
+            benchmark_archive = os.path.join(directory, f"{benchmark}.zip")
+            from zipfile import ZipFile, ZIP_DEFLATED
+            with ZipFile(benchmark_archive, "w", compression=ZIP_DEFLATED) as zf:
+                for entry in os.listdir(directory):
+                    if entry != os.path.basename(benchmark_archive):
+                        entry_path = os.path.join(directory, entry)
+                        if os.path.isfile(entry_path):
+                            zf.write(entry_path, arcname=entry)
             self.logging.info("Created {} archive".format(benchmark_archive))
 
-            bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
+            bytes_size = os.path.getsize(benchmark_archive)

For the C++ path (lines 156-163), fix the duplicate path join:

         elif language == Language.CPP:
             # lambda C++ runtime build scripts create the .zip file in build directory
             benchmark_archive = os.path.join(directory, "build", "benchmark.zip")
             self.logging.info("Created {} archive".format(benchmark_archive))
 
-            bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
+            bytes_size = os.path.getsize(benchmark_archive)

136-144: Major: Self-move risk in function directory creation.

Lines 139-144 can attempt to move the function directory into itself, causing a crash on re-pack.

Apply this diff:

-            os.makedirs(function_dir)
+            os.makedirs(function_dir, exist_ok=True)
             # move all files to 'function' except handler.py
             for file in os.listdir(directory):
-                if file not in package_config:
+                if file not in package_config and file != "function":
                     file = os.path.join(directory, file)
                     shutil.move(file, function_dir)
sebs/benchmark.py (1)

422-469: Critical: Logging statement has syntax error.

Line 460-463 mixes f-string syntax with string concatenation incorrectly, causing a runtime error.

Apply this diff to fix:

         self.logging.info(
-            f"CPP benchmark {self.benchmark} has "
-            + str(len(self._benchmark_config._cpp_dependencies))
-            + " dependencies."
+            f"CPP benchmark {self.benchmark} has "
+            f"{len(self._benchmark_config._cpp_dependencies)} dependencies."
         )

Additionally, consider making the hardcoded source file list at lines 429-431 configurable via the benchmark's config.json for flexibility with different file structures; a sketch follows below.
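
A sketch of reading that list from config.json; the "cpp_sources" key is hypothetical:

import json

DEFAULT_SOURCES = ["main.cpp"]  # mirrors the current hardcoded behavior

def cpp_source_files(config_path: str) -> list:
    with open(config_path) as f:
        cfg = json.load(f)
    # Fall back to the default when a benchmark does not override the key.
    return cfg.get("cpp_sources", DEFAULT_SOURCES)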

🧹 Nitpick comments (4)
sebs/cache.py (2)

382-395: Consider using explicit key checks instead of try/except for control flow.

Using exception handling for control flow is generally discouraged in Python. Consider using explicit checks:

-            package_exists = True
-            try:
-                config[deployment_name][language][main_key][key]
-            except KeyError:
-                package_exists = False
-
-                """
-                    We have no such cache entry - fallback.
-                    However, we still have directory, a possible leftover after crash.
-                    Whatever was left, we remove it since we have no information what is there.
-                """
-                if os.path.exists(cached_dir):
-                    shutil.rmtree(cached_dir)
+            package_exists = (
+                deployment_name in config
+                and language in config.get(deployment_name, {})
+                and main_key in config.get(deployment_name, {}).get(language, {})
+                and key in config.get(deployment_name, {}).get(language, {}).get(main_key, {})
+            )
+
+            if not package_exists:
+                """
+                    We have no such cache entry - fallback.
+                    However, we still have directory, a possible leftover after crash.
+                    Whatever was left, we remove it since we have no information what is there.
+                """
+                if os.path.exists(cached_dir):
+                    shutil.rmtree(cached_dir)

396-434: Logic is correct; consider extracting shared code copying logic.

The update path correctly handles both code packages and containers, with appropriate metadata updates. The logic for copying code directories/ZIP files (lines 398-416) is duplicated from add_code_package. While acceptable for this PR, consider extracting this to a private helper method in a follow-up to reduce duplication.
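
A sketch of the extracted helper (names hypothetical; it mirrors the copy logic described above and works for both a code directory and a ZIP artifact):

import os
import shutil

def copy_code_package(src: str, cached_dir: str) -> str:
    os.makedirs(cached_dir, exist_ok=True)
    dst = os.path.join(cached_dir, os.path.basename(src))
    if os.path.isdir(src):
        shutil.copytree(src, dst, dirs_exist_ok=True)
    else:
        shutil.copy2(src, dst)
    return dst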

sebs/faas/container.py (1)

217-227: Improve error logging in exception handler.

The error handling correctly logs build failures, but per Ruff TRY400, use logging.exception instead of logging.error in the except block to automatically include stack traces.

Apply this diff:

         except docker.errors.BuildError as e:
-            self.logging.error("Docker build failed!")
+            self.logging.exception("Docker build failed!")
 
             for chunk in e.build_log:
                 if "stream" in chunk:
-                    self.logging.error(chunk["stream"])
+                    self.logging.exception(chunk["stream"])
-            raise e
+            raise
sebs/cpp_dependencies.py (1)

88-92: Optional: Consider custom exception class for consistency.

Ruff suggests creating a custom exception class instead of raising generic Exception with a message. While not critical, this would improve consistency with Python best practices.

Consider creating a custom exception:

+class UnknownDependencyError(ValueError):
+    """Raised when an unknown C++ dependency type is encountered."""
+    pass
+
 @staticmethod
 def deserialize(val: str) -> CppDependencies:
     for member in CppDependencies:
         if member.value == val:
             return member
-    raise Exception(f"Unknown C++ dependency type {val}")
+    raise UnknownDependencyError(val)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6d5e0e and 27e053d.

📒 Files selected for processing (9)
  • config/systems.json (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.function (1 hunks)
  • sebs/aws/aws.py (6 hunks)
  • sebs/benchmark.py (19 hunks)
  • sebs/cache.py (2 hunks)
  • sebs/cpp_dependencies.py (1 hunks)
  • sebs/faas/container.py (9 hunks)
  • sebs/faas/system.py (7 hunks)
  • sebs/regression.py (6 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
📚 Learning: 2025-06-19T16:19:09.758Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.

Applied to files:

  • config/systems.json
  • sebs/regression.py
  • sebs/benchmark.py
  • dockerfiles/aws/cpp/Dockerfile.function
📚 Learning: 2025-06-19T16:18:56.877Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.

Applied to files:

  • config/systems.json
  • sebs/regression.py
📚 Learning: 2025-06-19T16:15:01.811Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.

Applied to files:

  • sebs/regression.py
🧬 Code graph analysis (6)
sebs/regression.py (2)
sebs/sebs.py (2)
  • get_deployment (83-142)
  • logging_filename (40-41)
sebs/aws/aws.py (1)
  • initialize (82-94)
sebs/cache.py (1)
sebs/benchmark.py (7)
  • code_package (114-115)
  • code_location (122-132)
  • language_version (168-169)
  • architecture (184-185)
  • container_deployment (188-189)
  • language (160-161)
  • code_size (151-152)
sebs/cpp_dependencies.py (1)
sebs/benchmark.py (1)
  • deserialize (69-80)
sebs/faas/container.py (2)
sebs/types.py (1)
  • Language (25-38)
sebs/config.py (1)
  • benchmark_base_images (60-65)
sebs/aws/aws.py (4)
sebs/types.py (1)
  • Language (25-38)
sebs/faas/system.py (1)
  • container_client (71-72)
sebs/aws/container.py (1)
  • ECRContainer (13-79)
sebs/benchmark.py (4)
  • language (160-161)
  • language_version (168-169)
  • architecture (184-185)
  • benchmark (102-103)
sebs/faas/system.py (3)
sebs/benchmark.py (11)
  • benchmark (102-103)
  • Benchmark (96-963)
  • language (160-161)
  • language_version (168-169)
  • architecture (184-185)
  • is_cached (135-136)
  • is_cached (139-140)
  • container_deployment (188-189)
  • container_uri (155-157)
  • build (660-807)
  • code_location (122-132)
sebs/faas/container.py (1)
  • DockerContainer (16-236)
sebs/types.py (1)
  • Language (25-38)
🪛 Hadolint (2.14.0)
dockerfiles/aws/cpp/Dockerfile.function

[error] 6-6: unexpected '{'
expecting '#', '', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input

(DL1000)

🪛 Ruff (0.14.7)
sebs/cpp_dependencies.py

92-92: Create your own exception

(TRY002)


92-92: Avoid specifying long messages outside the exception class

(TRY003)


100-100: Avoid specifying long messages outside the exception class

(TRY003)

sebs/faas/container.py

222-222: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


226-226: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


227-227: Use raise without specifying exception name

Remove exception name

(TRY201)

sebs/aws/aws.py

126-126: Unused method argument: language_version

(ARG002)


127-127: Unused method argument: architecture

(ARG002)


129-129: Unused method argument: is_cached

(ARG002)


148-148: Function call with shell=True parameter identified, security issue

(S604)

sebs/benchmark.py

268-268: Probable use of insecure hash functions in hashlib: md5

(S324)


652-652: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (17)
sebs/cache.py (1)

250-270: LGTM! Clean handling of container vs code package caching.

The conditional logic correctly differentiates between code packages (with a local code location) and container deployments (where code is pushed directly to a registry). Using relative paths for cached locations is a good practice for cache portability.

dockerfiles/aws/cpp/Dockerfile.function (1)

5-6: Static analysis warning is a false positive for template placeholders.

The Hadolint warning about unexpected { characters on line 6 is expected and can be safely ignored. The {{DEPENDENCY_IMAGES}} and {{DEPENDENCY_COPIES}} placeholders are intentional template markers that will be dynamically replaced by sebs/cpp_dependencies.py::CppDependencies.generate_dockerfile() during the build process. This templating approach enables selective inclusion of only the required C++ dependencies for each benchmark.

Also applies to: 20-21
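
For reference, a sketch of the substitution step; the template keys come from the Dockerfile, the replacement strings are illustrative:

TEMPLATE = """{{DEPENDENCY_IMAGES}}
FROM base AS function
{{DEPENDENCY_COPIES}}
"""

rendered = (
    TEMPLATE
    .replace("{{DEPENDENCY_IMAGES}}", "FROM deps-opencv AS opencv")
    .replace("{{DEPENDENCY_COPIES}}", "COPY --from=opencv /opt /opt")
)
print(rendered)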

config/systems.json (1)

125-144: LGTM! C++ language configuration is well-structured.

The C++ configuration for AWS is properly aligned with the broader PR changes:

  • Base images correctly reference Amazon Linux 2 Lambda runtime
  • Dependency list matches the CppDependencies enum (runtime, sdk, boost, hiredis, opencv, igraph, torch)
  • Deployment files comprehensively include all necessary C++ wrappers and headers

Based on learnings, C++ support is correctly scoped to AWS only.

sebs/regression.py (5)

31-37: C++ benchmark list properly initialized.

The CPP benchmark list includes appropriate benchmarks that have C++ implementations (sleep, thumbnailer, image-recognition, graph-pagerank, graph-bfs), consistent with the project's C++ support scope.


155-159: Good addition of per-test metadata attributes.

Storing test_architecture, test_deployment_type, and test_benchmark as function attributes enables precise filtering later in the test discovery phase. This is cleaner than parsing test names.
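
A minimal sketch of the pattern (the attribute names mirror those quoted above; the factory make_test and its body are hypothetical):

def make_test(benchmark: str, architecture: str, deployment_type: str):
    def test(self):
        pass  # the real generated test would deploy and invoke the benchmark
    # attach metadata for later filtering instead of encoding it in the name
    test.test_benchmark = benchmark
    test.test_architecture = architecture
    test.test_deployment_type = deployment_type
    return test

# during discovery: getattr(test, "test_architecture", None) == "arm64", etc.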


216-236: AWSTestSequenceCpp follows established patterns.

The implementation is consistent with AWSTestSequencePython and AWSTestSequenceNodejs, using the same lock-based initialization and logging approach.


528-536: Consistent use of test metadata for filtering.

The refactored filtering correctly uses the new test_architecture and test_deployment_type attributes instead of parsing test names, improving maintainability.


454-462: The cpp+package exclusion for 411.image-recognition is intentional and justified.

The benchmark's C++ dependencies (torch and opencv) are heavy compiled libraries. AWS Lambda package deployments have a strict size limit (250 MB unzipped), while container deployments can scale to ~10GB. The Python package.sh script explicitly measures and manages package sizes, including conditional logic to strip and compress torch for specific Python versions (3.8/3.9), confirming that the developers actively address size constraints. The exclusion for cpp+package deployments prevents deployment failures due to package size limits, while container deployments can accommodate the larger compiled libraries.

sebs/faas/container.py (1)

132-196: Enhanced build_base_image signature and custom Dockerfile support.

The updated signature correctly:

  1. Accepts Language enum instead of string for type safety
  2. Adds builder_image parameter for the build stage base
  3. Returns size in MB as third tuple element for metrics

The custom Dockerfile detection (lines 176-185) is essential for C++ benchmarks, which require dependency-specific Dockerfile generation via CppDependencies.generate_dockerfile().

sebs/faas/system.py (3)

70-72: Well-designed extension points for container builds.

The container_client property and finalize_container_build method provide clean hooks for platform-specific container build logic without breaking the base abstraction. AWS overrides container_client to return its ECRContainer, while other platforms return None.

Also applies to: 184-187
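
A rough illustration of the hook pattern, with simplified stand-in classes rather than the actual sebs types:

from typing import Optional

class DockerContainer:  # placeholder for sebs.faas.container.DockerContainer
    pass

class System:
    @property
    def container_client(self) -> Optional[DockerContainer]:
        return None  # platforms without container support keep the default

class AWS(System):
    def __init__(self, ecr_client: DockerContainer):
        self._ecr_client = ecr_client

    @property
    def container_client(self) -> Optional[DockerContainer]:
        return self._ecr_client  # AWS returns its ECR-backed client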


173-181: API signature consistently updated for Language enum.

The package_code signature change from language_name: str to language: Language strengthens type safety and aligns with the enum-based language handling throughout the codebase. The simplified return type Tuple[str, int] (removing container_uri) is cleaner since container builds are now handled separately.


318-323: Improved logging clarity for container vs package deployments.

The refactored logging (lines 318-323) clearly distinguishes between cached container functions (with container URI) and package functions (with code location), improving debugging experience.

sebs/cpp_dependencies.py (2)

154-199: Dockerfile generation logic correctly handles wildcards and dependencies.

The generate_dockerfile method properly:

  1. Computes required dependencies including core deps (SDK, RUNTIME, BOOST, HIREDIS)
  2. Generates FROM statements with stage aliases matching dependency values
  3. Handles both wildcard paths (lines 188-190) and exact paths (line 192) for COPY statements
  4. Correctly splits wildcard paths to get destination directories

The wildcard handling is particularly important for libraries like libaws* that expand to multiple files.
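
A small sketch of that wildcard/exact split (the helper name and exact COPY layout are assumptions; the real logic is in generate_dockerfile):

import os

def copy_statement(stage: str, path: str) -> str:
    if "*" in path:
        # wildcard paths (e.g. /opt/lib/libaws*) copy into the parent directory
        return f"COPY --from={stage} {path} {os.path.dirname(path)}/"
    # exact paths are copied to the same location in the final image
    return f"COPY --from={stage} {path} {path}"

# copy_statement("sdk", "/opt/lib/libaws*")
#   -> 'COPY --from=sdk /opt/lib/libaws* /opt/lib/'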


94-122: CMake integration is well-structured.

The to_cmake_list method generates proper CMake commands:

  • find_package with REQUIRED flag
  • Conditional target_include_directories when cmake_dir is set
  • target_link_libraries with appropriate library variables

The use of CMake variables like ${AWSSDK_LINK_LIBRARIES} and ${TORCH_LIBRARIES} follows CMake best practices.
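
For illustration, a Python sketch that emits the three kinds of CMake lines named above (the helper and its parameters are hypothetical; only the emitted commands follow the review):

from typing import Optional

def cmake_snippet(target: str, package: str, libs_var: str,
                  cmake_dir: Optional[str] = None) -> str:
    lines = [f"find_package({package} REQUIRED)"]
    if cmake_dir is not None:
        # emitted only when the dependency sets cmake_dir
        lines.append(f"target_include_directories({target} PRIVATE {cmake_dir})")
    lines.append(f"target_link_libraries({target} PRIVATE ${{{libs_var}}})")
    return "\n".join(lines)

# cmake_snippet("benchmark", "Torch", "TORCH_LIBRARIES") produces:
#   find_package(Torch REQUIRED)
#   target_link_libraries(benchmark PRIVATE ${TORCH_LIBRARIES})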

sebs/benchmark.py (3)

267-295: Language enum integration in hash computation is correct.

The hash_directory method properly:

  1. Accepts Language enum parameter for type safety
  2. Uses Language enum values as dictionary keys (lines 270-273, 275-278)
  3. Extracts string value with language.value for path construction (line 289)

This ensures hash consistency across the codebase.


660-807: Build method refactoring correctly separates container and package paths.

The refactored build logic properly:

  1. Returns 4-tuple (rebuilt, code_location, container_deployment, container_uri) for explicit path tracking
  2. Generates custom Dockerfiles for C++ container builds (lines 720-740)
  3. Calls container build via container_client.build_base_image (lines 742-750)
  4. Handles OpenWhisk's requirement for both container and package (lines 762-770)
  5. Falls back to package-only path for non-container deployments (lines 772-793)

The separation between container and package build paths is much cleaner than the previous implementation.


490-501: Good extraction of builder image name logic.

The new builder_image_name helper method reduces duplication between install_dependencies and build methods. Returning both versioned and unversioned names supports the fallback pattern at lines 532-545.

Comment on lines +404 to +409
if os.path.isdir(code_package.code_location):
cached_location = os.path.join(cached_dir, "code")
# could be replaced with dirs_exists_ok in copytree
# available in 3.8
shutil.rmtree(cached_location)
shutil.copytree(src=code_package.code_location, dst=cached_location)

⚠️ Potential issue | 🟠 Major

Potential FileNotFoundError when cached_location doesn't exist.

shutil.rmtree(cached_location) is called unconditionally, but the directory may not exist (e.g., if the package was cached but the code directory was never created, or after a partial cleanup). This will raise FileNotFoundError.

Apply this diff to add an existence check:

                     if os.path.isdir(code_package.code_location):
                         cached_location = os.path.join(cached_dir, "code")
                         # could be replaced with dirs_exists_ok in copytree
                         # available in 3.8
-                        shutil.rmtree(cached_location)
+                        if os.path.exists(cached_location):
+                            shutil.rmtree(cached_location)
                         shutil.copytree(src=code_package.code_location, dst=cached_location)

Alternatively, if Python 3.8+ is the minimum supported version, you can use dirs_exist_ok=True as noted in the comment:

                     if os.path.isdir(code_package.code_location):
                         cached_location = os.path.join(cached_dir, "code")
-                        # could be replaced with dirs_exists_ok in copytree
-                        # available in 3.8
-                        shutil.rmtree(cached_location)
-                        shutil.copytree(src=code_package.code_location, dst=cached_location)
+                        shutil.copytree(src=code_package.code_location, dst=cached_location, dirs_exist_ok=True)
🤖 Prompt for AI Agents
In sebs/cache.py around lines 404 to 409, calling shutil.rmtree(cached_location)
unconditionally can raise FileNotFoundError when cached_location doesn't exist;
update the code to check for existence before removing (e.g., if
os.path.exists(cached_location): shutil.rmtree(cached_location)) or, if Python
3.8+ is guaranteed, use shutil.copytree(..., dirs_exist_ok=True) and remove the
unconditional rmtree call so the copy will overwrite/create as needed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
sebs.py (2)

163-169: Use context manager for file reading.

The file is opened without a context manager, which could lead to resource leaks if an exception occurs during JSON parsing.

     if storage_configuration is not None:
 
         for cfg_f in storage_configuration:
             sebs_client.logging.info(f"Loading storage configuration from {cfg_f}")
 
-            cfg = json.load(open(cfg_f, 'r'))
+            with open(cfg_f, 'r') as f:
+                cfg = json.load(f)
             append_nested_dict(config_obj, ["deployment", deployment, "storage"], cfg)

360-372: Unused benchmark_input_size argument.

The benchmark_input_size CLI argument is captured but never used. Currently, the regression_suite hardcodes size="test" (line 110 in regression.py). Should this CLI argument override the test size?

Additionally, prefix unused unpacked variables with underscore to satisfy linters:

-def regression(benchmark_input_size, benchmark_name, storage_configuration, **kwargs):
+def regression(benchmark_input_size, benchmark_name, storage_configuration, **kwargs):
     # for regression, deployment client is initialized locally
     # disable default initialization
-    (config, output_dir, logging_filename, sebs_client, _) = parse_common_params(
+    (config, _output_dir, _logging_filename, sebs_client, _) = parse_common_params(
         initialize_deployment=False, storage_configuration=storage_configuration, **kwargs
     )
sebs/regression.py (1)

217-237: Consider extracting common AWS deployment logic.

The get_deployment method is nearly identical across AWSTestSequencePython, AWSTestSequenceNodejs, and now AWSTestSequenceCpp. The only difference is the class-level lock reference.

This follows the existing pattern in the codebase, so acceptable for this PR. A future refactor could introduce a base AWSTestSequenceBase class with a shared get_deployment implementation.
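
Such a refactor could look roughly like this (names beyond AWSTestSequenceCpp are hypothetical):

import threading

class AWSTestSequenceBase:
    lock: threading.Lock  # each subclass supplies its own class-level lock
    _deployment = None

    @classmethod
    def get_deployment(cls, factory):
        # lock-protected: one lazily created deployment client per subclass
        with cls.lock:
            if cls._deployment is None:
                cls._deployment = factory()
            return cls._deployment

class AWSTestSequenceCpp(AWSTestSequenceBase):
    lock = threading.Lock()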

config/nodejs.json (1)

1-101: Node.js config looks good; consider deduplicating shared experiment schema

The Node.js invocation‑overhead config is consistent with the C++/Python configs and uses sensible defaults for arm64/Node 16. If you expect these three files to stay in lockstep, you might later factor the common experiment/deployment structure into a shared base and keep only language‑specific differences here, to avoid drift.

sebs/openwhisk/openwhisk.py (1)

104-139: Tighten package_code robustness and make language support explicit

The refactor to use Language and return just (archive, size) looks good and fits the shared interface, and limiting CONFIG_FILES to Python/Node.js matches the fact that OpenWhisk doesn’t support C++ here. To make this a bit more robust and clearer:

  • Guard against unsupported languages explicitly instead of relying on a KeyError, e.g.:
-        CONFIG_FILES = {
-            Language.PYTHON: ["__main__.py"],
-            Language.NODEJS: ["index.js"],
-        }
-        package_config = CONFIG_FILES[language]
+        CONFIG_FILES = {
+            Language.PYTHON: ["__main__.py"],
+            Language.NODEJS: ["index.js"],
+        }
+        try:
+            package_config = CONFIG_FILES[language]
+        except KeyError:
+            raise NotImplementedError(f"OpenWhisk packaging not implemented for {language}")
  • Improve the zip subprocess call both for style and failure handling:
-        subprocess.run(
-            ["zip", benchmark_archive] + package_config,
-            stdout=subprocess.DEVNULL,
-            cwd=directory,
-        )
+        subprocess.run(
+            ["zip", benchmark_archive, *package_config],
+            stdout=subprocess.DEVNULL,
+            cwd=directory,
+            check=True,
+        )

This addresses the Ruff hint about list concatenation and ensures a failed zip will surface immediately instead of proceeding with a missing/corrupt archive.

  • If Ruff’s ARG002 warnings about language_version, architecture, and is_cached are noisy but you need to keep the unified signature, consider adding a short comment and # noqa: ARG002 on the definition, or using them in a no‑op tuple assignment, to document that they are intentionally unused for OpenWhisk.

These are all small robustness/clarity tweaks; the overall design and the fact that C++ is intentionally unsupported for OpenWhisk look correct.

Based on learnings, OpenWhisk is expected to support only Python and Node.js here, not C++.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27e053d and 2c206a6.

📒 Files selected for processing (7)
  • config/cpp.json (1 hunks)
  • config/nodejs.json (1 hunks)
  • config/python.json (1 hunks)
  • docs/storage.md (2 hunks)
  • sebs.py (5 hunks)
  • sebs/openwhisk/openwhisk.py (6 hunks)
  • sebs/regression.py (7 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
📚 Learning: 2025-06-19T16:19:09.758Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.

Applied to files:

  • sebs/regression.py
  • sebs/openwhisk/openwhisk.py
📚 Learning: 2025-06-19T16:18:56.877Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.

Applied to files:

  • sebs/openwhisk/openwhisk.py
🧬 Code graph analysis (3)
sebs.py (2)
benchmarks/wrappers/azure/python/storage.py (1)
  • storage (8-59)
benchmarks/wrappers/local/python/storage.py (1)
  • storage (7-57)
sebs/regression.py (2)
sebs/faas/function.py (2)
  • Trigger (178-268)
  • TriggerType (179-189)
sebs/sebs.py (2)
  • get_deployment (83-142)
  • logging_filename (40-41)
sebs/openwhisk/openwhisk.py (3)
sebs/types.py (1)
  • Language (25-38)
sebs/openwhisk/container.py (1)
  • OpenWhiskContainer (9-46)
sebs/aws/aws.py (2)
  • config (46-47)
  • package_code (122-168)
🪛 markdownlint-cli2 (0.18.1)
docs/storage.md

130-130: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.7)
sebs.py

360-360: Unused function argument: benchmark_input_size

(ARG001)


363-363: Unpacked variable output_dir is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


363-363: Unpacked variable logging_filename is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

sebs/openwhisk/openwhisk.py

108-108: Unused method argument: language_version

(ARG002)


109-109: Unused method argument: architecture

(ARG002)


111-111: Unused method argument: is_cached

(ARG002)


123-123: subprocess call: check for execution of untrusted input

(S603)


124-124: Consider ["zip", benchmark_archive, *package_config] instead of concatenation

Replace with ["zip", benchmark_archive, *package_config]

(RUF005)

🔇 Additional comments (9)
sebs.py (2)

67-67: LGTM!

Adding "cpp" to language choices aligns with the PR objective to support C++ benchmarks.


455-477: LGTM!

Good defensive improvement. The guards prevent KeyError when configuration sections are missing, which aligns with the selective section handling in storage_start.

sebs/regression.py (5)

16-38: LGTM!

Benchmark lists are appropriately extended. The benchmarks_cpp list correctly includes benchmarks that have C++ implementations, and "010.sleep" is now consistently available across all languages.


156-160: LGTM!

Clean approach for attaching per-test metadata. This enables proper filtering based on architecture and deployment type without parsing test names.


445-473: LGTM!

The filtering logic correctly:

  1. Excludes all C++ tests for arm64 architecture (lines 455-456)
  2. Excludes 411.image-recognition for C++ package deployments (lines 462-463)

This aligns with the retrieved learnings that C++ support is limited to AWS. Based on learnings, C++ is not supported on Azure, GCP, or OpenWhisk.


496-497: LGTM!

C++ test sequence correctly added only for AWS provider. Based on learnings, C++ support is limited to AWS Lambda and is intentionally not available for Azure, GCP, or OpenWhisk.


529-540: LGTM!

Proper usage of the metadata attributes attached during test generation. This enables accurate per-test filtering.

config/cpp.json (1)

1-101: C++ experiment config aligns with AWS-only support

The schema matches the other language configs and correctly pins experiments.deployment and deployment.name to aws while keeping other providers as placeholders. Given that C++ is only supported on AWS in this framework, this separation looks appropriate; just avoid wiring C++ experiments to Azure/OpenWhisk until support is added.

Based on learnings, C++ support is currently limited to AWS only.

config/python.json (1)

1-101: Verify Python architecture token matches what the system expects

Here architecture is set to "x64", while other configs use "arm64". Please double‑check that "x64" is one of the accepted architecture identifiers in your deployment and packaging logic (e.g., vs "x86_64" or other canonical values) so experiments don’t silently misclassify or fall back.
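
If stricter validation is desired, a tiny guard along these lines would fail fast (the accepted set is an assumption based on the tokens appearing in these configs):

ACCEPTED_ARCHITECTURES = {"x64", "arm64"}

def validate_architecture(arch: str) -> str:
    # fail fast instead of silently falling back to a default
    if arch not in ACCEPTED_ARCHITECTURES:
        raise ValueError(
            f"Unknown architecture {arch!r}; expected one of {sorted(ACCEPTED_ARCHITECTURES)}"
        )
    return arch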

Comment on lines +130 to 131
```

⚠️ Potential issue | 🟡 Minor

Remove erroneous code block syntax.

Lines 130–131 open and close a code block without specifying a language, creating invalid markdown. This appears to be unintentional. The bash code block should close at line 129.

Apply this diff to remove the erroneous code block:

 healthy: 192.168.0.20:9012
-```
-```
📝 Committable suggestion


Suggested change
```
healthy: 192.168.0.20:9012
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

130-130: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/storage.md around lines 130 to 131, there is an extraneous code-block
closing/backtick sequence that opens and closes an empty code block (invalid
markdown) after the intended bash block; remove the stray three backticks on
lines 130–131 so the bash code block properly ends at line 129 and no
empty/invalid code block remains.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sebs/benchmark.py (1)

904-913: Fix inverted logic in code_package_modify – currently breaks zip-based deployments.

The new implementation:

def code_package_modify(self, filename: str, data: bytes):
    if self.container_deployment or not self.code_package_is_archive():
        assert self.code_location is not None
        self._update_zip(self.code_location, filename, data)
        ...
    else:
        raise NotImplementedError()

modifies the package when either:

  • container_deployment is True, or
  • the package is not an archive,

and raises NotImplementedError for the normal “zip-based deployment” case (non-container + archive). This is the opposite of the intended behavior and will prevent updates to standard zip packages (e.g., AWS zip functions) while attempting to treat container or directory-based packages as zips.

A safer implementation is:

     def code_package_modify(self, filename: str, data: bytes):
-        if self.container_deployment or not self.code_package_is_archive():
+        # Only support modifying zip-based, non-container code packages.
+        if (not self.container_deployment) and self.code_package_is_archive():
             assert self.code_location is not None
             self._update_zip(self.code_location, filename, data)
             new_size = self.code_package_recompute_size() / 1024.0 / 1024.0
             self.logging.info(
                 f"Modified zip package {self.code_location}, new size {new_size} MB"
             )
         else:
-            raise NotImplementedError()
+            raise NotImplementedError(
+                "Code package modification is only supported for zip-based deployments."
+            )
♻️ Duplicate comments (1)
sebs/aws/aws.py (1)

122-169: AWS package_code behavior is fine; shell-based zip is the main remaining concern.

The split between Python/Node.js (local zip creation) and C++ (prebuilt build/benchmark.zip) packaging looks consistent, and returning (benchmark_archive, bytes_size) matches the new interface.

The execute("zip -qu -r9 ...", shell=True, ...) call relies only on maintainer-controlled inputs here, so it’s low risk, but moving to zipfile.ZipFile would avoid shelling out and address the S604 warning. This has been suggested before and can be picked up when you next touch this path.

🧹 Nitpick comments (10)
sebs/local/local.py (2)

114-142: Compute package size from directory contents, not the directory inode.

bytes_size = os.path.getsize(directory) returns the size of the directory entry, not the total size of the packaged code, so the logged “Function size” is misleading. Using the existing Benchmark.directory_size helper here would give a more accurate value.

-        bytes_size = os.path.getsize(directory)
+        from sebs.benchmark import Benchmark
+        bytes_size = Benchmark.directory_size(directory)

282-307: Local correctly forbids container deployment but leaves container_uri unused.

The explicit NotImplementedError when container_deployment is True keeps Local aligned with the stated “no container deployment” support, which is good. Since container_uri isn’t used in either create_function or update_function, you may want to rename it to _container_uri to quiet linters, but behavior is otherwise fine.

sebs/openwhisk/openwhisk.py (3)

104-139: Enum-based packaging for OpenWhisk is fine; subprocess zip is acceptable but could be simplified.

Using Language.PYTHON / Language.NODEJS keys and package_config = CONFIG_FILES[language] is consistent with the new Language enum model. The subprocess.run(["zip", benchmark_archive] + package_config, ...) call operates only on maintainer-controlled paths, so it’s acceptable, though you could switch to ["zip", benchmark_archive, *package_config] for clarity or to Python’s zipfile in the future.


173-185: OpenWhisk now explicitly requires container deployment.

The guard:

if not container_deployment:
    raise RuntimeError("Non-container deployment is not supported in OpenWhisk!")

makes the constraint explicit, which is good. Just ensure all call sites pass container_deployment=True for OpenWhisk (and that tests don’t assume zip-only deployments).

Also note that container_uri is accepted but unused here; that’s harmless but could be renamed to _container_uri to appease linters.


273-300: Clarify error messaging in update_function for non-container deployments.

update_function raises when not container_deployment with a message about code location and container deployment. Since code_location is checked separately just below, you could simplify the first message to focus purely on the unsupported non-container mode, leaving the second check to explain missing code_location.

This is cosmetic only; behavior is correct.

sebs/azure/azure.py (1)

119-177: Azure packaging correctly adopts Language and validates code_location.

Using EXEC_FILES / CONFIG_FILES keyed by Language.PYTHON and Language.NODEJS matches the documented lack of C++ support on Azure while keeping the API future-proof. The _mount_function_code check for code_package.code_location is None is a useful early failure to avoid confusing downstream errors.

container_uri being unused in update_function is fine given Azure doesn’t support container deployment; you can prefix it with _ later if tooling noise becomes an issue.

sebs/benchmark.py (4)

28-80: CPP dependency plumbing into BenchmarkConfig looks good; consider unifying the Language import.

Adding cpp_dependencies: Optional[List[CppDependencies]] and deserializing via CppDependencies.deserialize aligns with the new C++ dependency model. The only minor nit is that BenchmarkConfig.deserialize still imports Language from sebs.faas.function while this module already imports Language from sebs.types; consolidating on the latter would avoid type drift across modules.


422-469: C++ CMake packaging works but duplicates some core dependency logic.

add_deployment_package_cpp generates a CMakeLists.txt that:

  • Adds the fixed set of C++ source files.
  • Always links aws-lambda-runtime, AWSSDK, and hiredis via PkgConfig.
  • Appends per-benchmark dependencies via CppDependencies.to_cmake_list.

This is consistent with “AWS-only C++” support and should work, but it partially duplicates the core dependencies (SDK/runtime/BOOST/HIREDIS) that CppDependencies.get_required_dependencies already models. Longer term, consider driving both the Dockerfile and CMake generation from CppDependencies.get_required_dependencies to keep the dependency set in one place.


660-782: build() ignores container_build_step, which may break deferred packaging for container-based deployments.

In the container-deployment branch:

if container_build_step is not None:
    self._code_location, self._code_size = package_build_step(
        os.path.abspath(self._output_dir),
        self.language,
        self.language_version,
        self.architecture,
        self.benchmark,
        self.is_cached_valid,
    )

the container_build_step argument is never used; package_build_step is called instead. That means any provider intending to use a special, deferred packaging step for container deployments (e.g., via a finalize_container_build hook) cannot, and the extra callback is effectively dead.

If the intent was to allow providers like OpenWhisk to plug in a post-container packaging callback, consider:

-        if container_build_step is not None:
-            self._code_location, self._code_size = package_build_step(
+        if container_build_step is not None:
+            self._code_location, self._code_size = container_build_step(
                 os.path.abspath(self._output_dir),
                 self.language,
                 self.language_version,
                 self.architecture,
                 self.benchmark,
                 self.is_cached_valid,
             )

Please verify how build() is invoked for each system and whether any caller currently relies on container_build_step.


650-653: Using logging.error with decoded stderr is fine; logging.exception is optional.

Logging e.stderr.decode() and the Docker volumes on docker.errors.ContainerError is useful for debugging failed dependency builds. If you want stack traces as well, you could switch the first call to self.logging.exception(...), but that’s optional.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c206a6 and 5a893a8.

📒 Files selected for processing (6)
  • sebs/aws/aws.py (8 hunks)
  • sebs/azure/azure.py (7 hunks)
  • sebs/benchmark.py (19 hunks)
  • sebs/gcp/gcp.py (6 hunks)
  • sebs/local/local.py (5 hunks)
  • sebs/openwhisk/openwhisk.py (10 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
📚 Learning: 2025-06-19T16:18:56.877Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.

Applied to files:

  • sebs/openwhisk/openwhisk.py
  • sebs/azure/azure.py
📚 Learning: 2025-06-19T16:19:09.758Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.

Applied to files:

  • sebs/openwhisk/openwhisk.py
  • sebs/benchmark.py
  • sebs/azure/azure.py
🧬 Code graph analysis (6)
sebs/openwhisk/openwhisk.py (3)
sebs/types.py (1)
  • Language (25-38)
sebs/openwhisk/container.py (1)
  • OpenWhiskContainer (9-46)
sebs/benchmark.py (4)
  • language (160-161)
  • language_version (168-169)
  • architecture (184-185)
  • benchmark (102-103)
sebs/aws/aws.py (2)
sebs/types.py (1)
  • Language (25-38)
sebs/benchmark.py (3)
  • language (160-161)
  • language_version (168-169)
  • benchmark (102-103)
sebs/benchmark.py (3)
sebs/cpp_dependencies.py (4)
  • CppDependencies (22-199)
  • deserialize (88-92)
  • to_cmake_list (95-122)
  • generate_dockerfile (155-199)
sebs/faas/container.py (2)
  • DockerContainer (16-236)
  • name (19-20)
sebs/types.py (1)
  • Language (25-38)
sebs/gcp/gcp.py (2)
sebs/types.py (1)
  • Language (25-38)
sebs/benchmark.py (8)
  • language (160-161)
  • language_version (168-169)
  • architecture (184-185)
  • benchmark (102-103)
  • is_cached (135-136)
  • is_cached (139-140)
  • code_package (114-115)
  • code_location (122-132)
sebs/local/local.py (3)
sebs/types.py (1)
  • Language (25-38)
sebs/benchmark.py (5)
  • language (160-161)
  • architecture (184-185)
  • benchmark (102-103)
  • container_uri (155-157)
  • container_deployment (188-189)
sebs/experiments/config.py (2)
  • architecture (37-38)
  • container_deployment (41-42)
sebs/azure/azure.py (2)
sebs/types.py (1)
  • Language (25-38)
sebs/benchmark.py (4)
  • language (160-161)
  • benchmark (102-103)
  • code_package (114-115)
  • code_location (122-132)
🪛 Ruff (0.14.7)
sebs/openwhisk/openwhisk.py

108-108: Unused method argument: language_version

(ARG002)


109-109: Unused method argument: architecture

(ARG002)


111-111: Unused method argument: is_cached

(ARG002)


123-123: subprocess call: check for execution of untrusted input

(S603)


124-124: Consider ["zip", benchmark_archive, *package_config] instead of concatenation

Replace with ["zip", benchmark_archive, *package_config]

(RUF005)


182-182: Avoid specifying long messages outside the exception class

(TRY003)


224-227: Avoid specifying long messages outside the exception class

(TRY003)


278-278: Unused method argument: container_uri

(ARG002)


281-284: Avoid specifying long messages outside the exception class

(TRY003)


297-300: Avoid specifying long messages outside the exception class

(TRY003)

sebs/aws/aws.py

126-126: Unused method argument: language_version

(ARG002)


127-127: Unused method argument: architecture

(ARG002)


129-129: Unused method argument: is_cached

(ARG002)


148-148: Function call with shell=True parameter identified, security issue

(S604)


338-338: Avoid specifying long messages outside the exception class

(TRY003)

sebs/benchmark.py

268-268: Probable use of insecure hash functions in hashlib: md5

(S324)


652-652: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


936-936: Avoid specifying long messages outside the exception class

(TRY003)

sebs/gcp/gcp.py

129-129: Unused method argument: language_version

(ARG002)


130-130: Unused method argument: architecture

(ARG002)


132-132: Unused method argument: is_cached

(ARG002)


197-197: Avoid specifying long messages outside the exception class

(TRY003)


364-364: Unused method argument: container_uri

(ARG002)


371-371: Avoid specifying long messages outside the exception class

(TRY003)

sebs/local/local.py

118-118: Unused method argument: language_version

(ARG002)


119-119: Unused method argument: architecture

(ARG002)


120-120: Unused method argument: benchmark

(ARG002)


121-121: Unused method argument: is_cached

(ARG002)


287-287: Unused method argument: container_uri

(ARG002)


303-303: Unused method argument: container_uri

(ARG002)

sebs/azure/azure.py

123-123: Unused method argument: language_version

(ARG002)


124-124: Unused method argument: architecture

(ARG002)


126-126: Unused method argument: is_cached

(ARG002)


259-259: Unused method argument: container_uri

(ARG002)


385-385: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (13)
sebs/openwhisk/openwhisk.py (1)

46-76: Container client refactor for OpenWhisk looks consistent.

Using a private _container_client with a typed container_client property keeps this in line with other providers and centralizes container handling cleanly.

sebs/aws/aws.py (2)

53-56: Exposing container_client via a property is a clean API.

The container_client property returning self.ecr_client neatly standardizes container access across systems and keeps the public surface coherent.


320-339: Guarding against missing code packages in update_function is correct.

The explicit:

if package is None:
    raise RuntimeError("Code location is not set for zip deployment")

prevents silent misconfigurations for zip-based deployments and makes failures much easier to diagnose.

sebs/gcp/gcp.py (2)

125-183: GCP packaging correctly shifts to Language and enforces non-container deployment.

Enum-keyed CONFIG_FILES / HANDLER and package_config = CONFIG_FILES[language] are consistent with the new Language model, and renaming handler.py/handler.js to main.py/index.js matches GCP’s expectations. The explicit checks that container deployment is unsupported and that code_location must be set for deployments keep error modes clear.


359-372: Runtime checks for code_location in GCP updates are appropriate.

The guard:

if code_package.code_location is None:
    raise RuntimeError("Code location is not set for GCP deployment")

ensures function updates don’t proceed with an undefined package path, preventing obscure downstream failures during storage upload or patch operations.

sebs/benchmark.py (8)

266-295: Language-aware hashing and wrapper inclusion are sensible.

Extending hash_directory to use Language-keyed FILES/WRAPPERS (including C++ patterns and wrapper files under benchmarks/wrappers/{deployment}/{language}) ensures any language- or deployment-specific code changes invalidate the cache as expected. MD5 is acceptable here since this is purely for change detection, not security.
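
For context, change-detection hashing of a single file typically looks like this (illustrative, not the exact hash_directory code):

import hashlib

def file_digest(path: str) -> str:
    md5 = hashlib.md5()  # change detection only, not a security boundary
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
    return md5.hexdigest()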


337-351: C++ file patterns in copy_code are wired correctly.

Including "*.cpp", "*.hpp", and "dependencies.json" for Language.CPP mirrors the hashing logic and ensures all relevant C++ sources and dependency metadata are copied into the build directory.


490-502: builder_image_name cleanly centralizes build image naming.

Returning both an unversioned and SeBS-versioned build image tag based on deployment, language_name, and language_version makes the subsequent Docker logic in install_dependencies and build more readable and consistent.


503-567: Language-aware dependency installation and C++ support look correct.

Using PACKAGE_FILES = {Language.PYTHON: "requirements.txt", Language.NODEJS: "package.json", Language.CPP: "CMakeLists.txt"} and selecting PACKAGE_FILES[self.language] ensures the Docker build images only run when a corresponding dependency file exists. Combined with builder_image_name, this keeps Python/Node.js/C++ handling uniform.
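
The mapping quoted above, written out (the enum values follow sebs/types.py as cited in this review; the lookup helper is illustrative):

from enum import Enum

class Language(Enum):
    PYTHON = "python"
    NODEJS = "nodejs"
    CPP = "cpp"

PACKAGE_FILES = {
    Language.PYTHON: "requirements.txt",
    Language.NODEJS: "package.json",
    Language.CPP: "CMakeLists.txt",
}

def needs_dependency_build(language: Language, benchmark_files: set) -> bool:
    # the Docker build image only runs when the dependency file exists
    return PACKAGE_FILES[language] in benchmark_files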


702-782: Container path’s use of package_build_step for OpenWhisk-style code packages should be double-checked.

For container_deployment=True, you both:

  • Build the container image via container_client.build_base_image(...), and
  • Optionally create a code package using package_build_step(...).

That’s fine for AWS (where the extra package is unused), but for providers that require a separate code package in addition to the container (e.g., OpenWhisk), it’s important that the function passed as package_build_step really is the platform-specific packaging routine. Please confirm that OpenWhisk (and any other container-only providers) are wired this way.


795-807: Return tuple from build() matches new packaging/container model.

Returning (True, self._code_location, self._container_deployment, self._container_uri) after updating the cache is consistent with how callers distinguish between package- and container-based flows, and matches the earlier fast-path returns when the cache is valid.


819-877: Input preparation refactor mainly improves clarity.

The updated prepare_input signature and the reformatted allocate_nosql loop:

for (
    name,
    table_properties,
) in self._benchmark_input_module.allocate_nosql().items():

are functionally equivalent to the prior version and keep behavior unchanged. Storage and NoSQL setup still follow the same contract defined by BenchmarkModuleInterface.


919-941: Archive detection and size recomputation helpers are consistent with the new deployment model.

code_package_is_archive and code_package_recompute_size now correctly treat container deployments as non-archives and force an explicit code_location presence for package-based deployments. Once code_package_modify is fixed as above, these helpers will behave as expected.

Comment on lines +495 to +499
billed_time = int(aws_vals["Billed Duration"])
memory = int(aws_vals["Memory Size"])
output.billing.billed_time = billed_time
output.billing.memory = memory
output.billing.gb_seconds = billed_time * memory

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "aws.py" -path "*/aws/*" | head -5

Repository: spcl/serverless-benchmarks

Length of output: 52


🏁 Script executed:

git ls-files | grep -E "(sebs|aws)" | head -20

Repository: spcl/serverless-benchmarks

Length of output: 914


🏁 Script executed:

head -20 sebs/aws/aws.py

Repository: spcl/serverless-benchmarks

Length of output: 611


🏁 Script executed:

wc -l sebs/aws/aws.py

Repository: spcl/serverless-benchmarks

Length of output: 89


🏁 Script executed:

sed -n '485,510p' sebs/aws/aws.py

Repository: spcl/serverless-benchmarks

Length of output: 1210


🏁 Script executed:

rg "gb_seconds" --type py

Repository: spcl/serverless-benchmarks

Length of output: 581


🏁 Script executed:

rg -A 5 "class.*[Bb]illing" sebs/ --type py | head -40

Repository: spcl/serverless-benchmarks

Length of output: 307


🏁 Script executed:

grep -r "Billed Duration" sebs/ --type py

Repository: spcl/serverless-benchmarks

Length of output: 187


🏁 Script executed:

grep -B 5 -A 20 "class ExecutionBilling" sebs/faas/function.py

Repository: spcl/serverless-benchmarks

Length of output: 598


🏁 Script executed:

grep -B 5 -A 10 "Billed Duration\|Memory Size" sebs/aws/aws.py

Repository: spcl/serverless-benchmarks

Length of output: 812


🏁 Script executed:

rg "aws_vals\[" sebs/aws/aws.py -A 1 -B 1 | head -50

Repository: spcl/serverless-benchmarks

Length of output: 860


🏁 Script executed:

rg "Billed Duration|Duration" sebs/ tests/ --type py -B 2 -A 2 | head -60

Repository: spcl/serverless-benchmarks

Length of output: 749


🏁 Script executed:

find . -name "*.py" -path "*/test*" | xargs grep -l "billing\|gb_seconds" 2>/dev/null

Repository: spcl/serverless-benchmarks

Length of output: 104


🏁 Script executed:

cat tests/aws/invoke_function_sdk.py

Repository: spcl/serverless-benchmarks

Length of output: 2879


🏁 Script executed:

find . -type f \( -name "*.md" -o -name "*.rst" -o -name "*.txt" \) | xargs grep -i "billed duration\|memory size" 2>/dev/null

Repository: spcl/serverless-benchmarks

Length of output: 192


🏁 Script executed:

rg "Duration.*second|second.*Duration" --type py -i

Repository: spcl/serverless-benchmarks

Length of output: 191


🌐 Web query:

AWS Lambda CloudWatch logs "Billed Duration" units milliseconds seconds

💡 Result:

Billed Duration is reported in milliseconds. [1][2]

Sources:

  • AWS Lambda pricing (duration billed rounded to nearest 1 ms). [1]
  • AWS announcement about 1 ms billing granularity (Dec 2020). [2]

🏁 Script executed:

rg "/ 1000|/ 1024|conversion" sebs/aws/ sebs/faas/ --type py -B 1 -A 1 | head -40

Repository: spcl/serverless-benchmarks

Length of output: 965


🏁 Script executed:

grep -B 3 -A 3 "int(float" sebs/aws/aws.py

Repository: spcl/serverless-benchmarks

Length of output: 574


🏁 Script executed:

python3 << 'EOF'
# Example: 1000 ms (1 second), 128 MB memory
billed_time_ms = 1000
memory_mb = 128

# Current incorrect calculation
incorrect = billed_time_ms * memory_mb
print(f"Current (incorrect): {billed_time_ms} ms * {memory_mb} MB = {incorrect}")
print(f"  Units: ms × MB (dimensionally wrong)")

# Correct calculation
correct = (billed_time_ms / 1000.0) * (memory_mb / 1024.0)
print(f"\nCorrect: ({billed_time_ms} ms / 1000) * ({memory_mb} MB / 1024) = {correct}")
print(f"  Units: seconds × GB = GB-seconds")
print(f"  As int: {int(correct)}")
EOF

Repository: spcl/serverless-benchmarks

Length of output: 156


Fix gb_seconds calculation to convert milliseconds and megabytes to GB-seconds.

The current computation multiplies milliseconds by megabytes, which has incorrect units. AWS billing uses GB-seconds, requiring conversion from the extracted values:

billed_time = int(aws_vals["Billed Duration"])  # in milliseconds
memory = int(aws_vals["Memory Size"])  # in megabytes
output.billing.gb_seconds = billed_time * memory  # wrong: ms × MB

Should be:

billed_time_ms = int(aws_vals["Billed Duration"])
memory_mb = int(aws_vals["Memory Size"])
output.billing.billed_time = billed_time_ms
output.billing.memory = memory_mb
output.billing.gb_seconds = int((billed_time_ms / 1000.0) * (memory_mb / 1024.0))
🤖 Prompt for AI Agents
In sebs/aws/aws.py around lines 495 to 499, the gb_seconds calculation
multiplies billed time (milliseconds) by memory (megabytes) producing incorrect
units; convert milliseconds to seconds and megabytes to gigabytes before
multiplying. Set billed_time and memory from aws_vals as integers, then compute
gb_seconds as int((billed_time / 1000.0) * (memory / 1024.0)) and assign those
values to output.billing.billed_time, output.billing.memory, and
output.billing.gb_seconds accordingly.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.circleci/config.yml (2)

54-56: Add C++ Docker image builds to test-aws job.

This PR adds C++ benchmark support, but the test-aws job only pulls and caches Python 3.7 and Node.js 12.x Docker images. No C++ images are being built or tested in the CI pipeline.

Add Docker image pulling and saving for C++ builds. For example:

              docker pull mcopik/serverless-benchmarks:build.aws.python.3.7
              docker pull mcopik/serverless-benchmarks:build.aws.nodejs.12.x
+             docker pull mcopik/serverless-benchmarks:build.aws.cpp

Alternatively, if C++ images should be built from Dockerfiles in this repository, ensure they are built and cached in this step.


1-41: Add C++ build and linting jobs to the CI pipeline.

The PR adds C++ benchmarks with CMake builds, but there are no C++-specific jobs in this configuration to:

  1. Build C++ functions with CMake
  2. Verify C++ code formatting/linting
  3. Test the generated Docker containers

Consider adding a dedicated C++ build job that:

  • Installs CMake and C++ compiler dependencies
  • Builds C++ benchmarks
  • Validates Docker container generation
  • Verifies the base images (AWS SDK, Boost, Redis) are correctly configured

Would you like me to suggest a complete C++ build job configuration based on your CMake setup and Dockerfile structure?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a893a8 and 2b1aa03.

📒 Files selected for processing (2)
  • .circleci/config.yml (1 hunks)
  • README.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
🔇 Additional comments (3)
README.md (1)

85-85: Verify Python 3.10+ requirement is necessary.

The Python version bump from 3.7+ to 3.10+ is a breaking change affecting users on Python 3.7–3.9. While this aligns with the broader codebase changes (Language enum, type system), it's unclear whether this version bump is strictly required for C++ support or driven by other dependencies.

Please clarify: Is 3.10+ a hard requirement for the new C++ functionality, or can support for earlier Python versions be maintained? If this is a hard requirement, consider documenting the reason in the README or a migration guide.

.circleci/config.yml (2)

8-10: Verify Python 3.10 support for all dependencies.

Pinning the linting job to Python 3.10 explicitly improves reproducibility. However, ensure that all dependencies (especially AWS SDK, Boost, and Redis libraries being installed on line 21) support Python 3.10.


79-83: Include test-aws job in the main workflow.

The main workflow (lines 79-83) only runs the linting job. The test-aws job is defined but never executed. This means integration tests are not running in the CI pipeline.

Verify whether test-aws should be included in the workflow:

  main:
    jobs:
      - linting
+     - test-aws

Also consider whether the test-aws job should now be renamed to test-aws-python or if a new test-aws-cpp job should be added alongside it.


 orbs:
-  python: circleci/python@1.4.0
+  python: circleci/python@2.1

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

circleci python orb 2.1 breaking changes compared to 1.4.0

💡 Result:

Short summary of likely breaking changes when moving circleci/python from 1.4.0 → 2.1

  • Major-version semantics: 2.x is a breaking release — expect incompatible changes to commands/parameters/executors. [1]
  • Default images / Python versions bumped (newer cimg images); some orb steps now run on images that enforce PEP 668 (can break orb commands that try to pip-install into system-managed envs). Update any steps that assume older Python image behavior. [2]
  • Migration from Python wrapper-scripts to shell-based implementations for some orbs/commands (done to avoid PEP 668 issues) — this can change how environment/exit codes/outputs behave and may rename parameters. [2]
  • Executors, command names or parameters may have been renamed or removed (minor/patch should be compatible, but major bumps may remove elements present in 1.4.0). Test and verify usages in your config. [1][3]

Action items

  • Inspect the circleci/python orb page and its changelog/versions in the Orb Registry for exact diffs and element-level changes. [3]
  • Run circleci orb source circleci/python@1.4.0 and circleci/python@2.1 (or view the registry source) and diff commands/jobs/executors used by your config.
  • Test in a branch and pin the orb to an exact semver (e.g. an exact circleci/python 2.1.x release) before rolling out.

Sources

  • Orb SemVer / breaking-change guidance. [1]
  • “externally-managed-environment” / PEP 668 note and migration to shell scripts. [2]
  • circleci/python orb registry page (versions & docs). [3]

🏁 Script executed:

# Check the full CircleCI config to understand Python version usage and executor configuration
cat -n .circleci/config.yml | head -100

Repository: spcl/serverless-benchmarks

Length of output: 3258


🏁 Script executed:

# Check if there are any CI logs or test results that show the orb 2.1 is working
git log --oneline --all -20 | head -20

Repository: spcl/serverless-benchmarks

Length of output: 977


🌐 Web query:

circleci python orb 2.1 default Python version executor

💡 Result:

The python/default executor in the circleci/python 2.x orb defaults to Python 3.8 (the executor has a parameter tag with default "3.8"). You can override it by declaring the executor with a tag, e.g.:

executor:
  name: python/default
  tag: "3.11"

[1][2]

Sources

  • CircleCI Discuss showing executor definition (default tag '3.8'). [1]
  • CircleCI docs / Python orb usage (shows using python/default executor and overriding via tag). [2]

Add explicit Python version to test-aws executor to match linting job.

The Python orb 2.1 introduces breaking changes from 1.4.0, including a bump to newer default images with PEP 668 enforcement. The linting job correctly specifies Python 3.10 explicitly, but the test-aws job uses python/default without a tag parameter (line 42), which defaults to Python 3.8 in orb 2.1. Add an explicit tag: '3.10' to the test-aws executor to ensure consistency and avoid unexpected behavior from implicit version differences.

🤖 Prompt for AI Agents
In .circleci/config.yml around lines 4 to 4 (and the test-aws executor usage
near line 42), the test-aws job relies on the orb's default Python image which
in python orb v2.1 defaults to 3.8 and can cause mismatches with the linting
job; update the test-aws executor declaration to explicitly set the python tag
to '3.10' (e.g., add tag: '3.10' under the python executor) so both jobs use the
same Python 3.10 image and avoid PEP 668/behavior differences from the orb
default.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
docs/build.md (1)

56-56: Replace the weak intensifier "very" with a more precise adjective.

The phrase "very small size limit" relies on an over-used intensifier. Consider using more descriptive language such as "minimal", "strict", or "tight" to better convey the constraint.

Example revision:

-Since OpenWhisk has a very small size limit on code packages, we deploy all functions as Docker images.
+Since OpenWhisk has a strict size limit on code packages, we deploy all functions as Docker images.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b1aa03 and 5ba7dee.

📒 Files selected for processing (2)
  • docs/benchmarks.md (1 hunks)
  • docs/build.md (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
📚 Learning: 2025-06-19T16:15:01.811Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.

Applied to files:

  • docs/benchmarks.md
  • docs/build.md
📚 Learning: 2025-06-19T16:19:09.758Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.

Applied to files:

  • docs/benchmarks.md
  • docs/build.md
📚 Learning: 2025-06-19T16:18:56.877Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.

Applied to files:

  • docs/benchmarks.md
  • docs/build.md
🪛 LanguageTool
docs/benchmarks.md

[locale-violation] ~32-~32: The phrase ‘In future’ is British English. Did you mean: “In the future”?
Context: ...ue to 30 second timeout of API gateway. In future, we might support Docker-based deployme...

(IN_FUTURE)

docs/build.md

[style] ~56-~56: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...taken in OpenWhisk. Since OpenWhisk has a very small size limit on code packages, we deploy ...

(EN_WEAK_ADJECTIVE)

🔇 Additional comments (3)
docs/build.md (2)

23-46: Excellent documentation of the build workflow.

The new "Code Package Build" section provides clear, step-by-step narrative of the build process with concrete examples (e.g., PyTorch, ffmpeg). The decomposition into logical stages (Query Cache, Prepare Environment, etc.) aligns well with the architecture and will help users understand the framework's build pipeline.


48-60: Platform coverage is clearly documented.

The updated Docker Image Build section accurately reflects C++ support limitations (AWS Lambda only) and provides explicit platform compatibility notes. The distinction between AWS Lambda and OpenWhisk workflows is well-articulated.

docs/benchmarks.md (1)

25-26: ARM limitation note is clear and well-placed.

The note correctly documents that the ARM architecture is available only for AWS Lambda and that C++ benchmarks are not supported on ARM. Verify that the benchmark table entries above remain consistent with this note.


| Type | Benchmark | Languages | Architecture | Description |
| :--- | :---: | :---: | :---: | :---: |
| Webapps | 010.sleep | Python, Node.js, C++ | x64, arm64 | Customizable sleep microbenchmark. |

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n docs/benchmarks.md | head -40

Repository: spcl/serverless-benchmarks

Length of output: 3857


C++ benchmarks incorrectly list ARM architecture support, contradicting the documentation note.

Lines 6, 10, 14, and 16 claim C++ support on both x64 and arm64. However, the note at line 26 explicitly states "C++ benchmarks are currently not supported on the ARM architecture." Update the table to show x64 only for all C++ benchmarks.

Lines 6, 10, 14, 16:

-| Webapps      | 010.sleep    | Python, Node.js, C++ | x64, arm64 | Customizable sleep microbenchmark. |
+| Webapps      | 010.sleep    | Python, Node.js, C++ | x64 | Customizable sleep microbenchmark. |
-| Multimedia      | 210.thumbnailer    | Python, Node.js, C++ | x64, arm64 | Generate a thumbnail of an image. |
+| Multimedia      | 210.thumbnailer    | Python, Node.js, C++ | x64 | Generate a thumbnail of an image. |
-| Scientific      | 501.graph-pagerank    | Python, C++ | x64, arm64 | PageRank implementation with igraph. |
+| Scientific      | 501.graph-pagerank    | Python, C++ | x64 | PageRank implementation with igraph. |
-| Scientific      | 503.graph-bfs    | Python, C++ | x64, arm64 | Breadth-first search (BFS) implementation with igraph. |
+| Scientific      | 503.graph-bfs    | Python, C++ | x64 | Breadth-first search (BFS) implementation with igraph. |

Also fix line 32: change "In future" to "In the future".

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In docs/benchmarks.md around lines 6, 10, 14, and 16, the table rows listing C++
benchmarks currently show "x64, arm64" but the document note states C++ is not
supported on ARM; update each of those C++ benchmark rows to list only "x64" as
the architecture. Also on line 32 change the phrase "In future" to "In the
future".

> [!WARNING]
> Benchmark `411.image-recognition` does not work on AWS with Python 3.9 due to excessive code size. While it is possible to ship the benchmark by zipping `torchvision` and `numpy` (see `benchmarks/400.inference/411.image-recognition/python/package.sh`), this significantly affects cold startup. On the lowest supported memory configuration of 512 MB, the cold startup can reach 30 seconds, making HTTP trigger unusable due to 30 second timeout of API gateway. In future, we might support Docker-based deployment on AWS that are not affected by code size limitations.

⚠️ Potential issue | 🟡 Minor

Fix British English phrasing.

Line 32 uses "In future", which is British English. Align it with the American English conventions used elsewhere in the codebase.

-In future, we might support Docker-based deployment on AWS that are not affected by code size limitations.
+In the future, we might support Docker-based deployment on AWS that are not affected by code size limitations.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
> Benchmark `411.image-recognition` does not work on AWS with Python 3.9 due to excessive code size. While it is possible to ship the benchmark by zipping `torchvision` and `numpy` (see `benchmarks/400.inference/411.image-recognition/python/package.sh`), this significantly affects cold startup. On the lowest supported memory configuration of 512 MB, the cold startup can reach 30 seconds, making HTTP trigger unusable due to 30 second timeout of API gateway. In future, we might support Docker-based deployment on AWS that are not affected by code size limitations.
> Benchmark `411.image-recognition` does not work on AWS with Python 3.9 due to excessive code size. While it is possible to ship the benchmark by zipping `torchvision` and `numpy` (see `benchmarks/400.inference/411.image-recognition/python/package.sh`), this significantly affects cold startup. On the lowest supported memory configuration of 512 MB, the cold startup can reach 30 seconds, making HTTP trigger unusable due to 30 second timeout of API gateway. In the future, we might support Docker-based deployment on AWS that are not affected by code size limitations.
🧰 Tools
🪛 LanguageTool

[locale-violation] ~32-~32: The phrase ‘In future’ is British English. Did you mean: “In the future”?
Context: ...ue to 30 second timeout of API gateway. In future, we might support Docker-based deployme...

(IN_FUTURE)

🤖 Prompt for AI Agents
In docs/benchmarks.md around line 32, the phrase "In future" uses British
English; replace it with American English by changing it to "In the future" so
the wording aligns with the rest of the codebase; update the sentence
accordingly and run a quick grep to ensure no other occurrences remain.

docs/build.md Outdated
An alternative to uploading a code package is to deploy a Docker image. This option is not supported on every platform, i.e., Azure Functions and Google Cloud Functions do not support custom Docker images.

On AWS Lambda, we support deploying functions as Docker images for Python, Node.js, and C++ functions.
We also support building arm64 images for this platform, except for C++ functions (extension to ARM functions is planned in future).

⚠️ Potential issue | 🟡 Minor

Clarify or remove the speculative future reference.

The phrase "extension to ARM functions is planned in future" is vague and potentially outdated for documentation. Either provide a concrete timeline/status or remove this statement to avoid confusion.

🤖 Prompt for AI Agents
In docs/build.md around line 53, the sentence mentioning "extension to ARM
functions is planned in future" is vague and potentially outdated; either remove
that speculative clause or replace it with a concrete, accurate status (e.g.,
"not currently supported" or "planned for Q3 2025") and, if available, add a
link or short note to an issue/roadmap for tracking. Update the line to state
only the current support status for arm64 images and, if you choose to keep a
future note, provide a specific timeframe or reference to an open issue/roadmap
entry.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/build.md (1)

68-68: Replace vague future-tense statement with current status.

The phrase "extension to ARM functions is planned in the future" is speculative and unhelpful for documentation. State the current limitation clearly instead.

Apply this diff to replace the speculative clause with a concrete statement:

-We support deploying functions as Docker images for Python, Node.js, and C++ functions. We also support building arm64 images for this platform, except for C++ functions (extension to ARM functions is planned in the future).
+We support deploying functions as Docker images for Python, Node.js, and C++ functions. We also support building arm64 images for Python and Node.js; C++ functions are not currently supported on ARM architectures.
🧹 Nitpick comments (1)
docs/build.md (1)

72-72: Use a more precise adjective.

The intensifier "very" is weak; use a more precise term to convey the constraint.

Apply this diff to improve word choice:

-This platform has a very small size limit on code packages. Thus, we deploy all functions as Docker images.
+This platform has a strict size limit on code packages. Thus, we deploy all functions as Docker images.

Alternatives: "tight" or simply "small" if brevity is preferred. Based on static analysis feedback.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ba7dee and a1464e7.

📒 Files selected for processing (1)
  • docs/build.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
📚 Learning: 2025-06-19T16:19:09.758Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.

Applied to files:

  • docs/build.md
🪛 LanguageTool
docs/build.md

[style] ~72-~72: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ure). ### OpenWhisk This platform has a very small size limit on code packages. Thus, we d...

(EN_WEAK_ADJECTIVE)

🪛 markdownlint-cli2 (0.18.1)
docs/build.md

6-6: Link fragments should be valid

(MD051, link-fragments)

-By default, we deploy benchmark code as package uploaded to the serverless platform.
-However, on some platforms we use [Docker images](#docker-image-build) instead.
+By default, we deploy benchmark code as a package uploaded to the serverless platform. There, we use custom **build images** to install dependencies in an environment resembling the
+function executor in the cloud. However, on some platforms, we deploy functions as [Docker images](#docker-image-build) instead. There, we create one **function image** for each benchmark and configuration (language, language version, architecture).

⚠️ Potential issue | 🟡 Minor

Fix invalid link fragment reference.

The link fragment #docker-image-build does not match the actual heading. The section titled "## Docker Image Deployment" (line 39) would generate the anchor #docker-image-deployment.

Apply this diff to correct the link:

-function executor in the cloud. However, on some platforms, we deploy functions as [Docker images](#docker-image-build) instead. There, we create one **function image** for each benchmark and configuration (language, language version, architecture).
+function executor in the cloud. However, on some platforms, we deploy functions as [Docker images](#docker-image-deployment) instead. There, we create one **function image** for each benchmark and configuration (language, language version, architecture).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function executor in the cloud. However, on some platforms, we deploy functions as [Docker images](#docker-image-build) instead. There, we create one **function image** for each benchmark and configuration (language, language version, architecture).
function executor in the cloud. However, on some platforms, we deploy functions as [Docker images](#docker-image-deployment) instead. There, we create one **function image** for each benchmark and configuration (language, language version, architecture).
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

6-6: Link fragments should be valid

(MD051, link-fragments)

🤖 Prompt for AI Agents
In docs/build.md around line 6, the link fragment #[docker-image-build] is
incorrect; replace it with the correct heading anchor #[docker-image-deployment]
so the link points to the "## Docker Image Deployment" section (update the
Markdown link to use #docker-image-deployment).
