
Conversation

@pacoavila808

Implements a basic Dataflow template for copying documents from one Firestore database to another.

@gemini-code-assist

Summary of Changes

Hello @pacoavila808, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a new Google Cloud Dataflow template that enables the batch copying of documents from one Firestore database to another. It provides a flexible and robust solution for data migration or replication, allowing users to specify source and destination environments, select specific collections, and maintain data consistency. The template is designed for efficient execution, utilizing Firestore's native querying and batch writing features.

Highlights

  • New Dataflow Template: Introduces a new Google Cloud Dataflow template specifically designed for copying documents between two Firestore databases.
  • Configurable Parameters: The template supports various parameters, including source and destination project IDs, database IDs, and optional collection IDs for selective replication. It also allows specifying a read time for consistent data snapshots.
  • Efficient Data Transfer: The pipeline leverages Firestore's partition query capabilities to efficiently read documents in parallel and prepares them for batch writing to the destination, ensuring data consistency at the end of the process.
  • Comprehensive Documentation: Includes a detailed README with instructions for building, staging, and running the template, along with guidance for Terraform integration.
  • Modular Design: The core logic is broken down into dedicated PTransform and DoFn classes for creating partition queries, extracting documents, and preparing write operations.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@pacoavila808
Author

/gemini review


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a new Dataflow batch template for copying documents between two Firestore databases. The implementation is well-structured, using partitioned reads for scalability and breaking down the logic into reusable transforms. The accompanying unit tests for the transforms are thorough.

I've identified a critical issue where the pipeline will fail at submission if run with default autoscaling settings (maxNumWorkers=0). I've also found some areas for improvement regarding documentation, error handling, and making the code more robust and readable. Please see the detailed comments for suggestions.

Comment on lines 17 to 22
/**
* Templates for streaming data from DataStream to MongoDB.
*
* <p>This package contains a template for streaming data from DataStream to MongoDB using the
* MongoDB IO connector. The template allows direct writing of BSON documents.
*/


high

The package documentation appears to be copied from another template and is incorrect. It refers to 'DataStream to MongoDB' instead of describing the Firestore to Firestore template.

/**
 * Templates for batch copying data from one Firestore database to another.
 *
 * <p>This package contains a template for copying documents between Firestore databases.
 */

groupName = "Destination",
order = 4,
description = "Destination Project ID",
helpText = "The destination project to write to. Defaults to the source project if not set",


medium

The help text is missing a period at the end. This causes a formatting issue in the generated README.md file where the example text runs directly into the preceding sentence.

Suggested change
- helpText = "The destination project to write to. Defaults to the source project if not set",
+ helpText = "The destination project to write to. Defaults to the source project if not set.",

);
PartitionQueryRequest request = PartitionQueryRequest.newBuilder()
.setParent(DocumentRootName.of(projectId, databaseId).toString())
.setPartitionCount(partitionCount - 1)


medium

The logic partitionCount - 1 is confusing. The partitionCount in PartitionQueryRequest is documented as 'The desired number of partitions to return'. Subtracting 1 here is not intuitive and makes the code prone to errors if the calling code doesn't ensure partitionCount is greater than 1.

Please add a comment explaining why 1 is subtracted. If there's no strong reason, consider removing the subtraction and adjusting the calling code to pass the desired number of partitions directly. This would make the logic clearer.
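If the subtraction is intentional, it is probably because a partition query returns cursors, and N cursors split the document space into N + 1 contiguous ranges, so a caller wanting `partitionCount` ranges should request one fewer cursor. A minimal plain-Java sketch of that arithmetic (hypothetical helper name; not the template's actual code, and no Firestore dependencies):

```java
public class PartitionMath {
    /**
     * A partition query that returns {@code cursorCount} cursors divides the
     * document space into {@code cursorCount + 1} contiguous ranges. So to end
     * up with {@code desiredPartitions} ranges, request one fewer cursor.
     */
    static long cursorsFor(int desiredPartitions) {
        if (desiredPartitions < 1) {
            throw new IllegalArgumentException("desiredPartitions must be >= 1");
        }
        return desiredPartitions - 1L;
    }

    public static void main(String[] args) {
        long cursors = cursorsFor(8);
        System.out.println(cursors + " cursors -> " + (cursors + 1) + " partitions");
    }
}
```

Capturing this relationship in a named helper (or at least a comment) would make the `partitionCount - 1` intent explicit at the call site.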

Comment on lines 37 to 38
String originalName = doc.getName();
String path = originalName.substring(originalName.indexOf("/documents/") + 1);


medium

This string manipulation is brittle. If originalName does not contain /documents/, indexOf will return -1, and substring will be called with 0. This would use the entire originalName as the path, leading to a malformed destination document name.

It's safer to validate that /documents/ exists in the name before proceeding.

Suggested change
- String originalName = doc.getName();
- String path = originalName.substring(originalName.indexOf("/documents/") + 1);
+ String originalName = doc.getName();
+ int documentsPathIndex = originalName.indexOf("/documents/");
+ if (documentsPathIndex < 0) {
+   throw new IllegalArgumentException("Invalid document name format: " + originalName);
+ }
+ String path = originalName.substring(documentsPathIndex + 1);
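To make the failure mode concrete, here is a self-contained sketch of the validated extraction (hypothetical class and method names, plain Java, no Firestore dependencies), showing that a well-formed resource name yields the expected relative path while a malformed one fails fast:

```java
public class DocumentPathExtractor {
    /**
     * Extracts the path starting at the "documents" segment, e.g.
     * "projects/p/databases/d/documents/users/alice" -> "documents/users/alice".
     * Throws instead of silently producing a malformed destination name.
     */
    static String extractPath(String fullName) {
        int documentsPathIndex = fullName.indexOf("/documents/");
        if (documentsPathIndex < 0) {
            throw new IllegalArgumentException("Invalid document name format: " + fullName);
        }
        return fullName.substring(documentsPathIndex + 1);
    }

    public static void main(String[] args) {
        System.out.println(extractPath("projects/p/databases/d/documents/users/alice"));
    }
}
```

With the unguarded `indexOf(...) + 1`, a name lacking "/documents/" would silently become `fullName.substring(0)`, i.e. the entire source name, which is exactly the malformed-destination bug described above.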

// or mocking of FirestoreIO's builders and transforms.
// This test just checks if the pipeline can be constructed without errors.
@Test
public void testPipelineConstruction() {


medium

This test is quite basic and mainly serves as a placeholder. It constructs a partial pipeline but doesn't run it or assert any behavior. While the comment acknowledges that fully testing FirestoreIO is complex, the test could be improved to provide more value.

Consider using the TestPipeline rule and running it to verify that the pipeline can be fully constructed with the provided options without errors. You could also use PAssert to verify the intermediate PCollections before they are passed to FirestoreIO, which would provide more confidence in the pipeline's logic.
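A rough sketch of what such a test could look like (hypothetical class name and a stand-in transform; the real test would apply the template's actual PTransform and would need the Beam SDK and JUnit on the classpath):

```java
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.junit.Rule;
import org.junit.Test;

public class FirestoreToFirestoreTransformTest {

  @Rule public final transient TestPipeline pipeline = TestPipeline.create();

  @Test
  public void intermediatePCollectionHasExpectedContents() {
    PCollection<String> paths =
        pipeline
            .apply(Create.of("projects/p/databases/d/documents/users/alice"))
            // Stand-in for the template's real transform (e.g. the DoFn that
            // rewrites document names); replace with the actual PTransform.
            .apply(MapElements.into(TypeDescriptors.strings())
                .via(name -> name.substring(name.indexOf("/documents/") + 1)));

    // Assert on the intermediate PCollection before it reaches FirestoreIO.
    PAssert.that(paths).containsInAnyOrder("documents/users/alice");
    pipeline.run().waitUntilFinish();
  }
}
```

Running the pipeline on the direct runner this way exercises the transform logic end to end without needing a live Firestore instance.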
