Skip to content

Conversation

@tgeoghegan
Copy link

StatementCache aims to prevent redundant SQL statement preparation with a simple cache. However in cases where many, many writers are racing to prepare a statement, lots of redundant and expensive work can be done because tokio_postgres::client::Client::prepare_typed is called before StatementCache::insert, and only the latter takes the lock on the cache.

This commit allows entries in the StatementCache map to be either empty, a Semaphore, or a prepared Statement. The first call to prepare_typed() will insert a Semaphore. Then, tasks will wait for a semaphore permit before preparing the query, to ensure that only one task sends a PREPARE statement at once. At both steps in the process, the map will be read via a read lock first, then again via a write lock, before updating the entry.

@bikeshedder bikeshedder added A-postgres Area: PostgreSQL support / deadpool-postgres enhancement New feature or request labels Jul 7, 2025
@bikeshedder
Copy link
Collaborator

Great addition to the library! <3

I'm currently busy with another PR and will merge this once the workspace removal is complete:

@tgeoghegan
Copy link
Author

Cool, thank you for taking a look. Please let me know if there's anything I can do to improve this change or make it easier to merge, such as rebasing after #429 merges.

@bikeshedder bikeshedder force-pushed the main branch 3 times, most recently from 7a9da3d to f713c85 Compare September 2, 2025 11:51
Copy link
Collaborator

@bikeshedder bikeshedder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR needs to be rebased on the latest main branch. Apart from that and a small change request (Either is too generic for my taste) it's a very cool addition.

Adding tests to test for possible race conditions would be the cherry on top.

`StatementCache` aims to prevent redundant SQL statement preparation
with a simple cache. However in cases where many, many writers are
racing to prepare a statement, lots of redundant and expensive work can
be done because `tokio_postgres::client::Client::prepare_typed` is
called before `StatementCache::insert`, and only the latter takes the
lock on the cache.

This commit allows entries in the `StatementCache` map to be either
empty, a Semaphore, or a prepared Statement. The first call to
`prepare_typed()` will insert a `Semaphore`. Then, tasks will wait for a
semaphore permit before preparing the query, to ensure that only one
task sends a `PREPARE` statement at once. At both steps in the process,
the map will be read via a read lock first, then again via a write lock,
before updating the entry.
@tgeoghegan tgeoghegan force-pushed the redundant-statement-prep branch from 965d66f to c9699fe Compare October 30, 2025 15:08
@tgeoghegan
Copy link
Author

tgeoghegan commented Oct 30, 2025

I rebased and addressed your comment in crates/deadpool-postgres/src/lib.rs. Testing for race conditions sounds like a good idea. Does the project have anything in place to do that? If not, how would you want to go about doing it? Perhaps a test that sets up a DB connection and then creates 100s, or 1000s of futures all trying to prepare a statement?

I'm also unsure what to do about the MSRV and "Check re-exported features" check failures. I don't believe my PR changes anything about the Rust version in use or crate features, so perhaps these checks are also failing on main?

@divergentdave
Copy link

Testing for race conditions sounds like a good idea. Does the project have anything in place to do that? If not, how would you want to go about doing it? Perhaps a test that sets up a DB connection and then creates 100s, or 1000s of futures all trying to prepare a statement?

I noticed that the existing test suite assumes a Postgres database is available, and discovers connection parameters through environment variables.

One straightforward approach would be to start many tasks, synchronize them with tokio::sync::Barrier, prepare the same statement in each cache, and collect the results. We could compare the Debug representation of each tokio_postgres::Statement that is returned, in order to see if the statements all have the same name. tokio-postgres assigns automatically-generated names to statements, starting with "s0", with increasing numbers, so this would indirectly tell us how many prepare commands were sent to the database. The downsides are that this would be racy (so you may want to repeat the whole exercise a few times with fresh pools) and that it depends on undocumented behavior of tokio-postgres.

Alternately, a heavyweight approach would be to write a mock Postgres server that speaks the backend's side of the wire protocol, build a pool that connects to it, and try to prepare a statement twice. We could delay the response to the first "Parse" command, and confirm no second "Parse" command arrives on another connection before some timeout. This would take more effort to write the infrastructure for (note that tokio-protocol wouldn't help much, as it is focused on frontend implementations), and if the implementation-specific behavior of tokio-postgres changes over time, it may be necessary to update the mock server to provide fake responses to a different set of messages.

@bikeshedder
Copy link
Collaborator

I rebased and addressed your comment in crates/deadpool-postgres/src/lib.rs. Testing for race conditions sounds like a good idea. Does the project have anything in place to do that? (...)

tbh. I don't really have a great idea for that either.

The project comes with a Dev Container with a ready-to-use PostgreSQL database. That's quite similar to how the CI runs the checks. I do think however that it's quite difficult to design a test for this with an actual PostgreSQL database being used.

I was wondering if the logic behind this statement cache optimization could be extracted into a testable synchronization primitive. This primitive could be tested without requiring a running PostgreSQL at all.

I'm also unsure what to do about the MSRV and "Check re-exported features" check failures. I don't believe my PR changes anything about the Rust version in use or crate features, so perhaps these checks are also failing on main?

You can ignore those. This always happens when the tokio-postgres crate adds a new feature. Regarding the MSRV check I'm very unhappy with the way this is implemented but breakage can also happen when dependencies get updated. That should not happen but the dependency resolution of Cargo still hasn't added a way to reliably implement MSRV checks. 😫

I will add the missing features in another PR and fix the MSRV check. The checks were not broken by your changes.

@bikeshedder
Copy link
Collaborator

The only place where the statement cache actually interacts with PostgreSQL are the prepare and prepare_typed methods. I was wondering if we could either replace those methods that take a generic callable that is used to prepare the statement (so it can be mocked in tests) or remove those methods in favor of a version that returns some kind of Guard object which can then be used to either wait for the statement to be ready or prepare if the semaphore was acquired.

@tgeoghegan
Copy link
Author

I will take a swing at the testable abstraction you describe. It might take me a little while to come back around to this PR, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-postgres Area: PostgreSQL support / deadpool-postgres enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants