feat(storage): calculate crc32c by default and pass checksum in trailing and per-chunk request #13205

krishnamd-jkp · 2025-10-22T12:13:13Z

calculate crc32c by default and pass checksum in trailing and per-chunk request

gemini-code-assist · 2025-10-22T12:13:31Z

Summary of Changes

Hello @krishnamd-jkp, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant improvement to data integrity within the storage client by enabling automatic CRC32C checksum calculation for all write operations by default. This ensures that data is verified during transmission, reducing the risk of corruption. Additionally, it provides flexibility for advanced users by allowing them to disable this automatic checksumming if their specific use case requires it, without preventing the use of manually provided checksums.

Highlights

Default CRC32C Calculation: The system now automatically calculates CRC32C checksums for data being written by default, enhancing data integrity.
Checksum Transmission: Checksums are now consistently passed in both trailing requests and per-chunk requests during data writes.
Disable CRC32C Option: A new DisableCRC32C option has been introduced, allowing users to opt-out of the automatic CRC32C calculation and validation if needed, while still supporting user-provided checksums.
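For readers unfamiliar with CRC32C: it is the CRC-32 variant that uses the Castagnoli polynomial, and Go's standard library exposes it through `hash/crc32`. The following is a minimal standalone sketch, not the storage client's code; the helper name `crc32cOf` is an assumption made for illustration, while `crc32cTable` mirrors the identifier that appears in the diff under review.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// crc32cTable selects the Castagnoli polynomial, which is what "CRC32C" means.
var crc32cTable = crc32.MakeTable(crc32.Castagnoli)

// crc32cOf is a hypothetical helper that returns the CRC32C of data.
func crc32cOf(data []byte) uint32 {
	return crc32.Checksum(data, crc32cTable)
}

func main() {
	// "123456789" is the conventional CRC check input.
	fmt.Printf("%08x\n", crc32cOf([]byte("123456789"))) // prints e3069283
}
```

Note that `crc32.ChecksumIEEE` uses a different polynomial and would not produce a value that matches a CRC32C checksum.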

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces default CRC32C checksum calculation for gRPC uploads, which is a great enhancement for data integrity. The implementation includes options to disable this behavior or provide user-defined checksums. The changes are well-structured and include new unit tests for the checksum logic. I have a few suggestions to improve code clarity and reduce duplication, which I've detailed in the comments.

krishnamd-jkp · 2025-10-23T18:16:30Z

Ad hoc benchmarking results:

Metrics for 10 files of 64MB each, uploaded by 10 workers running in parallel, with a test execution time of 5 minutes.

Without warmup
release 1.57.0
Total throughput extrapolated to 100MB: 0.22 GiB/s
Median throughput extrapolated to 100MB: 0.26 GiB/s
Median upload time: 3.7661s
P10 upload time: 3.1413s
P90 upload time: 6.1835s

PR
Total throughput extrapolated to 100MB: 0.27 GiB/s
Median throughput extrapolated to 100MB: 0.24 GiB/s
Median upload time: 3.633s
P10 upload time: 2.894s
P90 upload time: 6.295s

With 5 min warmup
release 1.57.0
Total throughput extrapolated to 100MB: 0.29 GiB/s
Median throughput extrapolated to 100MB: 0.31 GiB/s
Median upload time: 3.1737s
P10 upload time: 2.8136s
P90 upload time: 4.3056s

PR
Total throughput extrapolated to 100MB: 0.25 GiB/s
Median throughput extrapolated to 100MB: 0.30 GiB/s
Median upload time: 3.2925s
P10 upload time: 2.8041s
P90 upload time: 5.9242s

tritone

Couple comments... I would also suggest the following:

Take a look at #12477; do we want to support user-provided trailing checksums as well?
Can we do a CPU profile of the benchmark to understand the overhead of the automatic checksumming?

tritone · 2025-10-23T18:50:44Z

storage/writer.go

 	// point, the checksum will be ignored.
 	SendCRC32C bool

+	// DisableCRC32C disables the automatic CRC32C checksum calculation and


Note that this only works for gRPC, and does not work for unfinalized writes to appendable objects. I would also rename it something like DisableAutoChecksum maybe?

Also, I think the docs for both this and SendCRC32C are a little unclear how they work together. I would want the user to understand:

If they provide a checksum via SendCRC32C, we will not do checksum calculations and send their checksum on the first request to GCS.

If they do not provide a checksum via SendCRC32C, do not set DisableCRC32C, and this is a finalized write via gRPC, we will automatically calculate the checksum and send it on the last message.

If they set DisableCRC32C, we will never calculate or send a checksum.

Does that sound right? And does it match the actual behavior?

Yeah, I think the name DisableAutoChecksum works better.
The behavior is:

If DisableAutoChecksum is set, checksum calculation in the writer is disabled: both the per-chunk checksums and the full-object checksum calculation are skipped. However, if the user provides their own checksum, it is sent on both the first and the last write.

If DisableAutoChecksum is not set, the writer sends a per-chunk checksum to GCS with each chunk. On the final write, the gRPC writer prioritizes the user's checksum over the auto-calculated one: if the user provided a checksum, the writer sends it to GCS; otherwise, it sends the auto-calculated checksum.

Ah, then maybe we should disambiguate between per-message checksums and whole-object checksums? If this is already implemented in other clients, do they offer separate options for each of these, or just one?

Can you check if this clarifies the behavior?
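The rules described in this thread can be sketched as a small decision function. This is only an illustration under stated assumptions: the type `checksumPolicy` and its methods are hypothetical and are not the writer's real API, and the sketch covers only the per-chunk decision and the final-write decision, not the user checksum that is also sent on the first write.

```go
package main

import "fmt"

// checksumPolicy is a hypothetical stand-in for the writer settings
// discussed above; it is not a type from the storage package.
type checksumPolicy struct {
	DisableAutoChecksum bool   // user opted out of automatic checksumming
	SendCRC32C          bool   // user supplied their own checksum
	UserCRC32C          uint32 // the user-supplied value, if any
}

// sendsPerChunk reports whether per-chunk checksums are attached.
func (p checksumPolicy) sendsPerChunk() bool {
	return !p.DisableAutoChecksum
}

// finalChecksum returns the full-object checksum to send on the final
// write, and false if no checksum is sent at all.
func (p checksumPolicy) finalChecksum(autoCRC uint32) (uint32, bool) {
	switch {
	case p.SendCRC32C:
		return p.UserCRC32C, true // the user's checksum always wins
	case !p.DisableAutoChecksum:
		return autoCRC, true // fall back to the auto-calculated value
	default:
		return 0, false // auto disabled and nothing supplied
	}
}

func main() {
	policies := []checksumPolicy{
		{},
		{SendCRC32C: true, UserCRC32C: 7},
		{DisableAutoChecksum: true},
		{DisableAutoChecksum: true, SendCRC32C: true, UserCRC32C: 7},
	}
	for _, p := range policies {
		sum, ok := p.finalChecksum(9)
		fmt.Printf("%+v perChunk=%v final=%d sent=%v\n", p, p.sendsPerChunk(), sum, ok)
	}
}
```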

krishnamd-jkp · 2025-10-24T05:24:55Z

@tritone

> Couple comments... I would also suggest the following:
>
> Take a look at feat(storage): send trailing checksums for gRPC resumable uploads #12477; do we want to support user-provided trailing checksums as well?

This PR handles trailing checksums as well. Please check the "getObjectChecksums" method:

```go
if !finishWrite {
	return nil
}

// send user's checksum on last write op if available
if sendCRC32C {
	return toProtoChecksums(sendCRC32C, attrs)
}
```

krishnamd-jkp · 2025-10-24T10:15:25Z

CPU profile of the benchmark:

The profile was captured over a duration of 300.11 seconds, with a total of 137.21 seconds of CPU time sampled.

Top 5 CPU-consuming functions:

System calls (internal/runtime/syscall.Syscall6): 66.58s (48.52%)
Encryption (crypto/.../gcm.gcmAesEnc): 15.00s (10.93%)
Memory operations (runtime.memmove): 14.66s (10.68%)
Concurrency (runtime.futex): 10.08s (7.35%)
Checksumming (hash/crc32.castagnoliSSE42Triple): 8.38s (6.11%)
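For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of capturing a CPU profile around a checksumming loop with `runtime/pprof`. It is not the benchmark used above: the function `profileChecksumming`, the buffer size, and the round count are assumptions chosen for illustration, and the profile is written to memory rather than to a file.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
	"runtime/pprof"
)

var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// profileChecksumming is a hypothetical helper: it folds buf into a running
// CRC32C `rounds` times while a CPU profile is being recorded, and returns
// the encoded profile together with the final checksum.
func profileChecksumming(buf []byte, rounds int) (*bytes.Buffer, uint32, error) {
	var prof bytes.Buffer
	if err := pprof.StartCPUProfile(&prof); err != nil {
		return nil, 0, err
	}
	var sum uint32
	for i := 0; i < rounds; i++ {
		sum = crc32.Update(sum, castagnoli, buf)
	}
	// StopCPUProfile flushes the profile into prof before returning.
	pprof.StopCPUProfile()
	return &prof, sum, nil
}

func main() {
	prof, sum, err := profileChecksumming(make([]byte, 1<<20), 256)
	if err != nil {
		fmt.Println("profiling failed:", err)
		return
	}
	fmt.Printf("crc32c=%08x, profile size=%d bytes\n", sum, prof.Len())
}
```

In practice the profile would be written to a file and inspected with `go tool pprof`, which is where a per-function breakdown like the one above comes from.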

tritone · 2025-10-24T15:45:57Z

> CPU profile benchmarking -
>
> The profile was captured over a duration of 300.11 seconds, with a total of 137.21 seconds of CPU time sampled.
>
> Top 5 CPU consuming functions -
>
> System Calls (internal/runtime/syscall.Syscall6): 66.58s (48.52%)
> Encryption (crypto/.../gcm.gcmAesEnc): 15.00s (10.93%)
> Memory Operations (runtime.memmove): 14.66s (10.68%) of the CPU time
> Concurrency (runtime.futex): 10.08s (7.35%).
> Checksumming (hash/crc32.castagnoliSSE42Triple): 8.38s (6.11%) of the CPU time.

Cool, I would say this is around what I would have expected. We probably should note in the godoc that SDK auto checksumming has some amount of increased CPU overhead.

tritone · 2025-10-30T19:23:03Z

storage/writer.go

+	// checksum will be sent to GCS for validation by the gRPC writer on final write.
+	//
+	// Note: DisableAutoChecksum must be set to true BEFORE the first call to
+	// Writer.Write(). This flag Works only with gRPC writer.


IMO this is still not super clear. We should communicate that automatic checksumming only works with gRPC, not specifically this flag. The godoc for SendCRC32C should probably be updated as well.

Where did we land on allowing the user to control chunk vs. whole-object checksums independently, versus only giving one flag? If we want to allow fine-grained control, we could make this field something like *AutoChecksumConfig with separate bools for per-message and per-object.

Also, remove random capitalized words in the godoc (Works, This).

I think giving users a fine-grained option to disable per-chunk and whole-object checksums individually would confuse them, given that these are default settings. I made some changes that I think clarify this a bit.

tritone · 2025-10-30T19:39:52Z

storage/grpc_writer.go

 	case w.writesChan <- cmd:
+		// update fullObjectChecksum on every write and send it on finalWrite
+		if !w.disableAutoChecksum {
+			w.fullObjectChecksum = crc32.Update(w.fullObjectChecksum, crc32cTable, p)


Just wanted to note that you are doing the work twice here for each set of bytes between here and L945. In theory it should be possible to calculate the per-message checksum once per buffer and then use those sums to update the full object checksum as well. It doesn't look like there is an easy interface to do this with in Go, but maybe worth considering if you are trying to save CPU.

Without retries, yes, the same calculation is done twice. With retries, however, the buffer on this line and the buffer at L945 can be out of sync, and I cannot use the checksum from L945 to update the full-object checksum because the same bytes may be sent multiple times. So I had to keep these two computations separate.

PR created by the Librarian CLI to initialize a release. Merging this PR will auto trigger a release.

Librarian Version: v0.7.0
Language Image: us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/librarian-go@sha256:718167d5c23ed389b41f617b3a00ac839bdd938a6bd2d48ae0c2f1fa51ab1c3d

<details><summary>storage: 1.58.0</summary>

## [1.58.0](storage/v1.57.2...storage/v1.58.0) (2025-12-03)

### Features

* add object contexts in Go GCS SDK (#13390) ([079c4d9](079c4d96))
* calculate crc32c by default and pass checksum in trailing and per-chunk request (#13205) ([2ab1c77](2ab1c778))
* add support for partial success in ListBuckets (#13320) ([d91e47f](d91e47f2))

### Bug Fixes

* omit empty filter in http list object request (#13434) ([377eb13](377eb13b))

</details>

---------

Co-authored-by: Priti Chattopadhyay <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

krishnamd-jkp requested review from a team as code owners October 22, 2025 12:13

product-auto-label bot added the api: storage Issues related to the Cloud Storage API. label Oct 22, 2025

gemini-code-assist bot reviewed Oct 22, 2025

krishnamd-jkp force-pushed the crc32c branch from 493d330 to f925f64 Compare October 22, 2025 12:20

krishnamd-jkp added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 22, 2025

kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 22, 2025

krishnamd-jkp added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 22, 2025

kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 22, 2025

krishnamd-jkp force-pushed the crc32c branch from 5f9524b to 44fdf30 Compare October 22, 2025 16:45

cpriti-os requested changes Oct 23, 2025

krishnamd-jkp requested a review from cpriti-os October 23, 2025 10:41

tritone reviewed Oct 23, 2025

tritone mentioned this pull request Oct 23, 2025

feat(storage): send trailing checksums for gRPC resumable uploads #12477

Closed

cpriti-os requested changes Oct 24, 2025

krishnamd-jkp requested review from cpriti-os and tritone October 24, 2025 18:32

krishnamd-jkp added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 25, 2025

kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 25, 2025

tritone reviewed Oct 30, 2025

cpriti-os previously approved these changes Nov 5, 2025

cpriti-os requested a review from tritone November 10, 2025 04:22

krishnamd-jkp added 3 commits November 11, 2025 05:56

feat(storage): calculate crc32c by default and pass checksum in trail…

0056236

…ing and per-chunk request

fix(storage): lock fullObjectChecksum with mutex when accessing

c5ae72b

fix(storage): resolve comments

d43e043

krishnamd-jkp added 2 commits November 11, 2025 05:56

fix(storage): remove unnecessary mutex lock

052d924

fix(storage): clarify comments

17771dd

krishnamd-jkp dismissed cpriti-os’s stale review via 17771dd November 13, 2025 07:44

krishnamd-jkp force-pushed the crc32c branch from ccb7bf2 to 17771dd Compare November 13, 2025 07:44

tritone approved these changes Nov 17, 2025

cpriti-os approved these changes Nov 18, 2025

krishnamd-jkp merged commit 2ab1c77 into googleapis:main Nov 18, 2025
9 of 10 checks passed

This was referenced Dec 2, 2025

chore: librarian release pull request: 20251202T053138Z #13432

Closed

chore: librarian release pull request: 20251203T063934Z #13438

Merged
