Conversation

@Amar3tto
Collaborator

No description provided.

@gemini-code-assist

Summary of Changes

Hello @Amar3tto, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is a validation effort to ensure compatibility and functionality with the upcoming Apache Beam 2.69.0 release candidate. It involves a comprehensive update of both Java and Python Beam dependencies, along with necessary build system configurations and code adjustments to align with the new Beam version's APIs. The changes are specifically for testing purposes, as indicated by the [DO NOT MERGE] prefix in the title.

Highlights

  • Apache Beam Version Upgrade: The project's Apache Beam dependencies have been updated to version 2.69.0 for Java and 2.69.0rc3 for Python across various configuration and requirements files.
  • Maven Repository Configuration: The pom.xml now includes a specific Apache Beam release candidate Maven repository to fetch the updated Beam artifacts.
  • Maven Profile Activation Logic: The validateCandidate Maven profile's activation has been changed from activeByDefault=false to jdk=[1.), ensuring it is active for all JDK versions.
  • Python Dependency Refinements: Python requirements files have been updated with the new apache-beam version and corresponding hash changes. The dill dependency has been removed, and google-cloud-secret-manager has been added.
  • Beam API Adaptation in SpannerIO: The SpannerIO.java file has been modified to adapt to potential changes in the Apache Beam OutputReceiver API, specifically by introducing an OutputBuilderSupplier to correctly handle windowed values.
  • Test Case Adjustment: A test case in DataStreamMongoDBToFirestoreTest.java was updated to assert NoSuchMethodError instead of IllegalArgumentException, reflecting changes in the underlying Beam API behavior.

@codecov

codecov bot commented Oct 21, 2025

Codecov Report

❌ Patch coverage is 14.28571% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.99%. Comparing base (7809c6b) to head (c5c1789).
⚠️ Report is 25 commits behind head on main.

Files with missing lines                                  Patch %   Lines
...le/cloud/teleport/spanner/spannerio/SpannerIO.java     14.28%    12 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2907      +/-   ##
============================================
- Coverage     50.07%   49.99%   -0.08%     
- Complexity     4971     5350     +379     
============================================
  Files           967      967              
  Lines         59318    59334      +16     
  Branches       6445     6448       +3     
============================================
- Hits          29701    29666      -35     
- Misses        27511    27566      +55     
+ Partials       2106     2102       -4     
Components                                                Coverage Δ
spanner-templates                                         70.46% <14.28%> (-0.03%) ⬇️
spanner-import-export                                     68.99% <14.28%> (-0.07%) ⬇️
spanner-live-forward-migration                            79.69% <ø> (ø)
spanner-live-reverse-replication                          77.42% <ø> (ø)
spanner-bulk-migration                                    88.21% <ø> (ø)

Files with missing lines                                  Coverage Δ
...le/cloud/teleport/spanner/spannerio/SpannerIO.java     68.38% <14.28%> (-0.95%) ⬇️

... and 5 files with indirect coverage changes
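
The figures in the report can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the numbers quoted above. It assumes that project coverage is hits divided by lines and that the percentage is truncated rather than rounded to two decimals (an inference from the reported 49.99%, not something the report states). The patch total of 14 changed lines is likewise inferred from 12 missing lines at 14.28571%.

```python
import math


def truncated_percent(part: int, whole: int) -> float:
    """Percentage truncated (not rounded) to two decimal places."""
    return math.floor(part / whole * 10_000) / 100


# Figures copied from the coverage diff: (hits, misses, partials, lines).
base = (29701, 27511, 2106, 59318)
head = (29666, 27566, 2102, 59334)

for hits, misses, partials, lines in (base, head):
    # Each line is counted once as a hit, a miss, or a partial.
    assert hits + misses + partials == lines

base_cov = truncated_percent(base[0], base[3])
head_cov = truncated_percent(head[0], head[3])
delta = round(head_cov - base_cov, 2)
print(base_cov, head_cov, delta)

# Assumption: 14 changed lines in total, of which 12 are missing coverage.
patch_total = 14
patch_cov = round((patch_total - 12) / patch_total * 100, 5)
print(patch_cov)
```

Under these assumptions the script reproduces the reported 50.07%, 49.99%, the -0.08 delta, and the 14.28571% patch coverage.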

@Abacn
Contributor

Abacn commented Oct 22, 2025

JdbcToBigQueryYamlIT.testJdbcToBigQuery and testJdbcToBigQueryWithoutDriverJars failed with:

Error message from worker: org.apache.beam.sdk.util.UserCodeException: java.lang.NullPointerException
	org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
	org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$4$DoFnInvoker.invokeProcessElement(Unknown Source)
	org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForWindowObservingParDo(FnApiDoFnRunner.java:638)
	org.apache.beam.sdk.util.UnboundedScheduledExecutorService$ScheduledFutureTask.run(UnboundedScheduledExecutorService.java:163)
	java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NullPointerException
	java.base/java.util.regex.Matcher.getTextLength(Matcher.java:1770)
	java.base/java.util.regex.Matcher.reset(Matcher.java:416)
	java.base/java.util.regex.Matcher.<init>(Matcher.java:253)
	java.base/java.util.regex.Pattern.matcher(Pattern.java:1134)
	org.apache.beam.sdk.io.FileSystems.parseScheme(FileSystems.java:541)
	org.apache.beam.sdk.io.FileSystems.matchNewResource(FileSystems.java:663)
	org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.resolveTempLocation(BigQueryHelpers.java:752)
	org.apache.beam.sdk.io.gcp.bigquery.BatchLoads$4.getTempFilePrefix(BatchLoads.java:598)

Job IDs: 2025-10-21_19_17_09-8445822678697657959 and 2025-10-21_19_17_09-12248170799157878180

Unfortunately, the fix in apache/beam#36564 wasn't complete.

But reading the code, how could tempLocationRoot be null...
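
The bottom of the trace shows a regex matcher being built from a null string inside FileSystems.parseScheme. As a rough Python analogue of that failure mode (this is not Beam's code; `parse_scheme` and the pattern are hypothetical names made up for illustration), a guard that fails fast with a readable message instead of an opaque null error might look like this:

```python
import re
from typing import Optional

# Hypothetical scheme pattern for illustration only; not copied from Beam.
_SCHEME = re.compile(r"(?P<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):/.*")


def parse_scheme(spec: Optional[str]) -> str:
    """Return the URI scheme of spec, or "file" when no scheme is present."""
    if spec is None:
        # Without this guard, _SCHEME.match(None) raises a TypeError that
        # says nothing about which option was missing.
        raise ValueError("temp location is not set; pass --temp_location")
    match = _SCHEME.match(spec)
    return match.group("scheme").lower() if match else "file"


print(parse_scheme("gs://example-bucket/tmp"))
print(parse_scheme("/local/tmp"))
```

The point of the sketch is only that a missing value surfaces at the first string operation that touches it, so the error appears far from where the option was dropped.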

@Amar3tto
Collaborator Author

Can we fix these tests, or is a new RC needed?

@Abacn
Contributor

Abacn commented Oct 22, 2025

This one is caused by apache/beam#36139, which switched the BQIO implementation for Beam YAML. That the NPE still occurs suggests it wasn't caused by ValueProvider. The pipeline option is indeed not passed through when a managed transform is combined with YAML. apache/beam#33074 is still open.

@Abacn
Contributor

Abacn commented Oct 22, 2025

bigQueryLoadingTemporaryDirectory isn't effective for YAML templates. Let me see if this can be fixed by a config change.

@tarun-google
Contributor

tarun-google commented Oct 22, 2025

temp_location is a requirement after the migration to BigQueryManagedIO batch writes. We need a fix like apache/beam#36336.

@Abacn
Contributor

Abacn commented Oct 22, 2025

It's on the Dataflow runner, so the temp location should be present. In fact, in the template launcher log I can see:

template-container-args: {
...
"environment":{
...
"stagingLocation":"gs://dataflow-staging-us-west2-269744978479/staging",
"tempLocation":"gs://dataflow-staging-us-west2-269744978479/tmp"
  },

YAML pipeline launcher options:

main.py --project=cloud-teleport-testing ...  --template_location=gs://dataflow-staging-us-west2-269744978479/staging/template_launches/2025-10-21_19_17_09-12248170799157878180/job_object --temp_location=gs://dataflow-staging-us-west2-269744978479/tmp 

(That there are two temp_location parameters is dubious, though.)
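
One quick way to confirm a suspicion like this is to scan the launcher's argument list for repeated flags. The following is a minimal sketch, not part of the template launcher: `repeated_flags` is a hypothetical helper, it handles only the `--name=value` form, and the project and bucket names are placeholders.

```python
from typing import Dict, List


def repeated_flags(argv: List[str]) -> Dict[str, List[str]]:
    """Map each flag given more than once to its values, in order of appearance."""
    values: Dict[str, List[str]] = {}
    for arg in argv:
        if not arg.startswith("--"):
            continue  # skip the script name and positional arguments
        name, _, value = arg[2:].partition("=")
        values.setdefault(name, []).append(value)
    return {name: vals for name, vals in values.items() if len(vals) > 1}


# Hypothetical argument list; all values are placeholders.
argv = [
    "main.py",
    "--project=example-project",
    "--temp_location=gs://example-bucket/tmp",
    "--template_location=gs://example-bucket/staging/job_object",
    "--temp_location=gs://example-bucket/tmp",
]
print(repeated_flags(argv))
```

Seeing the values side by side shows whether the duplicates agree or whether a later one could override an earlier one.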

@tarun-google
Contributor

I would check whether they are actually being picked up. In the Beam repo I had to make the fix in apache/beam#36302. If it's delaying the release, we can choose to revert the migration and I can fix the Templates repo.

@Abacn
Contributor

Abacn commented Oct 23, 2025

Also, an update here: this is actually WAI (working as intended) because YAML BQIO switched to use the managed transform, which has not been upgraded to 2.69.0 yet; see apache/beam#33074 (comment). I think we're good for now.

@Amar3tto closed this Oct 29, 2025

3 participants