Conversation

@garyschulte (Contributor) commented Jan 9, 2025

PR description

PR adds precompile caching for an MVP set of precompiles that are costly enough to benefit from it. A command-line arg is provided to disable caching (for gas-costing reasons); it is enabled by default in besu, and disabled by default in evmtool and its benchmark subcommand.

Changes:

  • add a static member and setter in AbstractPrecompiledContract used to control whether we want to cache results (see the sketch after this list)
  • add precompile-specific LRU caches with reasonable size limits in each MVP precompile
  • add a cli arg for precompile caching, defaulted to true
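
To make the first two items concrete, here is a minimal sketch of the pattern, assuming illustrative names rather than the exact Besu API: a static toggle on the abstract base class, plus a bounded LRU map factory each precompile can use for its own cache.

import java.util.LinkedHashMap;
import java.util.Map;

abstract class AbstractPrecompiledContractSketch {
  // Static toggle, settable from CLI wiring (hypothetical setter name).
  private static volatile boolean cacheEnabled = true;

  static void setPrecompileCaching(final boolean enabled) {
    cacheEnabled = enabled;
  }

  protected static boolean isCacheEnabled() {
    return cacheEnabled;
  }

  // Bounded, access-ordered LRU map: evicts the eldest entry past maxSize.
  // Note: LinkedHashMap is not thread-safe; a real implementation would
  // need synchronization or a concurrent cache library.
  protected static <K, V> Map<K, V> newLruCache(final int maxSize) {
    return new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
        return size() > maxSize;
      }
    };
  }
}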

MVP precompiles include:

  • altbn128/bn254 precompiles for add, mul and pairing
  • ecrecover precompile
  • blake2 precompile
  • kzg point precompile
  • bls precompiles

Feedback welcome on the design choices:

  • one cache per precompile contract (since each will have different input and output size characteristics)
  • cache entries are <hashCode, (input, result)> tuples so we can verify the input is truly identical rather than matching by hashCode alone (it is trivial to construct requests that have different inputs but the same Bytes hashCode); a sketch follows this list
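
A minimal, self-contained sketch of that second design choice (byte[] stands in for Tuweni Bytes, computePrecompile is a placeholder, and newLruCache comes from the sketch above): the key is the input's hashCode, and the stored input is compared byte-for-byte on a hit, so a hashCode collision can only cause a miss, never a wrong result.

import java.util.Arrays;
import java.util.Map;

final class CachingPrecompileSketch extends AbstractPrecompiledContractSketch {
  private record Entry(byte[] input, byte[] result) {}

  // One bounded cache per precompile instance, sized for its input profile.
  private final Map<Integer, Entry> cache = newLruCache(1_000);

  byte[] compute(final byte[] input) {
    final int key = Arrays.hashCode(input);
    final Entry hit = cache.get(key);
    if (hit != null && Arrays.equals(hit.input(), input)) {
      return hit.result(); // inputs are truly identical, not just equal hashes
    }
    final byte[] result = computePrecompile(input);
    cache.put(key, new Entry(input, result));
    return result;
  }

  // Placeholder for the precompile's real native/Java implementation.
  private byte[] computePrecompile(final byte[] input) {
    return new byte[0];
  }
}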

Parallel transaction execution should benefit from precompile caching when state conflicts are detected. Attached are preliminary results from the nethermind gas-benchmarks suite, which indicate that performance does not take a noticeable hit from cache checks and misses, while the caching itself is effective for repetitive/identical inputs.

Updated:
ecmul_new.pdf
ecrec_new.pdf
blake2f_new.pdf

Fixed Issue(s)

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

@garyschulte garyschulte force-pushed the feature/precompile-caching-part1 branch from 49dd4dc to e9155f3 on January 10, 2025 00:27
@garyschulte garyschulte changed the title Precompile caching part1 Precompile caching MVP Jan 10, 2025
@garyschulte garyschulte changed the title Precompile caching MVP Precompile Caching MVP Jan 10, 2025
@garyschulte garyschulte force-pushed the feature/precompile-caching-part1 branch from 0d95b1d to 67f912f on January 10, 2025 00:34
@ahamlat (Contributor) left a comment

I think it makes sense to have a cache per precompile, as we discussed. Also, you need to change the key to use a hashing function that has no collisions, as the hashCode method returns an int and can have collisions.

@garyschulte garyschulte force-pushed the feature/precompile-caching-part1 branch from d519ece to 73b29e1 on January 23, 2025 00:30
@github-actions (bot)

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the Stale label Feb 22, 2025
@github-actions (bot) commented Mar 9, 2025

This PR was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this Mar 9, 2025
@garyschulte garyschulte reopened this Mar 18, 2025
@garyschulte garyschulte force-pushed the feature/precompile-caching-part1 branch from 5de5681 to 93bb963 on March 18, 2025 14:29
@github-actions github-actions bot removed the Stale label Mar 19, 2025
@ahamlat (Contributor) left a comment

Could you review the way the hashcode is calculated for some precompiles (see below)? It is sometimes calculated twice, where we could store it and reuse it to cache the result. Also, I wonder why you didn't add a cache for KZGPointEvalPrecompiledContract.

@ahamlat (Contributor) commented Mar 19, 2025

Other than the small requested changes above, the PR is great. I think we can decouple it from parallel transaction execution, as it can help even when parallel transaction execution is not enabled.
In terms of performance, in addition to what @garyschulte shared, profiling two nodes running this PR against two nodes running version 25.2.0 showed that the cache works fine, especially for EcRecover, and it reduces precompile execution time, cutting the number of samples over a 300-second sampling period from ~18 to 1 in both cases.
A sample is collected every 11 ms. The profiling was done on the same blocks.

Without this PR: [profiling screenshot]

With this PR: [profiling screenshot]

@ahamlat (Contributor) commented Mar 19, 2025

An interesting idea from this implementation is that you created a cache where the eviction mechanism is based on hashcode collisions. It is like a hashmap where we keep only one node behind each bucket (hashcode index): the new value always replaces the existing one.
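
That single-slot idea can be sketched as a direct-mapped table (illustrative code, not Besu's actual implementation): the hashcode picks the slot, and storing unconditionally overwrites whatever lives there, so eviction falls out of index collisions.

import java.util.Arrays;

final class DirectMappedCacheSketch {
  private record Entry(byte[] input, byte[] result) {}

  private final Entry[] slots;

  DirectMappedCacheSketch(final int size) {
    this.slots = new Entry[size];
  }

  byte[] lookup(final byte[] input) {
    final int idx = Math.floorMod(Arrays.hashCode(input), slots.length);
    final Entry e = slots[idx];
    // Byte-for-byte check so a slot collision reads as a miss, not a hit.
    return (e != null && Arrays.equals(e.input(), input)) ? e.result() : null;
  }

  void store(final byte[] input, final byte[] result) {
    final int idx = Math.floorMod(Arrays.hashCode(input), slots.length);
    slots[idx] = new Entry(input, result); // replaces any prior occupant
  }
}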

@garyschulte (Author) commented

> Could you review the way the hashcode is calculated for some precompiles (see below)? It is sometimes calculated twice, where we could store it and reuse it to cache the result. Also, I wonder why you didn't add a cache for KZGPointEvalPrecompiledContract.

Will do, and will add similar caches for the Pectra BLS precompiles.

@garyschulte garyschulte force-pushed the feature/precompile-caching-part1 branch 2 times, most recently from 08fd69d to 352c9f3 on March 20, 2025 21:14
@garyschulte (Author) commented

OK, I found the issue with the false positives. It appears we are subsequently mutating the input bytes, so the input value in our precompile result tuple was getting mutated after we cached it; that was the source of the false positives.

What I have done for all precompiles is to copy the input into the precompile result tuple IF caching is enabled (sketched below). This seems to be the sweet spot for removing false positives, and there should be little to no impact on precompile performance or overhead when caching is disabled.

Since these changes, I have not seen any false positives:

[benchmark screenshot]
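
A minimal sketch of the guarded-copy fix, reusing the illustrative names from the earlier sketches (isCacheEnabled, computePrecompile, and a cache with the lookup/store shape shown above):

byte[] computeWithCache(final byte[] input) {
  if (!isCacheEnabled()) {
    return computePrecompile(input); // disabled: no copy, no lookup overhead
  }
  final byte[] cached = cache.lookup(input);
  if (cached != null) {
    return cached;
  }
  final byte[] result = computePrecompile(input);
  // Defensive copy: the cached tuple must not alias the caller's buffer.
  // Without it, a later in-place mutation of `input` silently rewrites the
  // stored input we verify against, producing the false positives above.
  cache.store(input.clone(), result);
  return result;
}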

If you would, please re-review at your leisure @ahamlat 🙏

@ahamlat (Contributor) left a comment

LGTM, just a few comments.
I will approve once I have checked the impact of copying the input.

@ahamlat (Contributor) left a comment

Could you update the benchmarks in the description with the results from the last implementation?
The CPU profiling shows an improvement on the nodes running this PR:

Control node 1 / 11 samples for MessageCallProcessor.executePrecompile: [CPU profile screenshot]

Control node 2 / 11 samples for MessageCallProcessor.executePrecompile: [CPU profile screenshot]

Node 1 running this PR / 4 samples for MessageCallProcessor.executePrecompile: [CPU profile screenshot]

Node 2 running this PR / 4 samples for MessageCallProcessor.executePrecompile: [CPU profile screenshot]

@garyschulte (Author) commented

Updated the pdf docs. Input copying doesn't seem to be a big problem in this case. The ecmul point-at-infinity optimization performs better without the cache, interestingly even when we run the optimization check before the cache check. But otherwise the current implementation looks to be better all around.

@garyschulte garyschulte force-pushed the feature/precompile-caching-part1 branch from 352c9f3 to fb18fd6 on March 21, 2025 20:21
@garyschulte (Author) commented Mar 21, 2025

Added BLS precompile caching as well. The gas-benchmarks suite doesn't have BLS precompile tests, but evmtool gives some pretty dramatic results:

➜  besu git:(feature/precompile-caching-part1) ✗ build/install/besu/bin/evmtool benchmark --native --use-precompile-cache bls12      
besu/v25.3-develop-f4df214/osx-aarch_64/corretto-java-22
Benchmarks for Bls12
Bls12 G1 Add    375 avg gas @    1.1 µs /   332.3 MGps
Bls12 G1 MSM   501,672 total gas @   13.2 µs /38,046.9 MGps
Bls12 MapFpToG1  5,500 avg gas @    0.5 µs /10,890.2 MGps
Bls12 G2 Add    600 avg gas @    0.9 µs /   697.4 MGps
Bls12 G2 MSM   991,080 total gas @   16.6 µs /59,835.1 MGps
Bls12 MapFp2G1 23,800 avg gas @    0.1 µs /184,270.0 MGps
Bls12 Pairing 2,209,700 total gas @   20.7 µs /106,567.6 MGps

versus

➜  besu git:(feature/precompile-caching-part1) ✗ build/install/besu/bin/evmtool benchmark --native --use-precompile-cache=false bls12
besu/v25.3-develop-f4df214/osx-aarch_64/corretto-java-22
Benchmarks for Bls12
Bls12 G1 Add    375 avg gas @    5.6 µs /    66.9 MGps
Bls12 G1 MSM   501,672 total gas @4,155.8 µs /   120.7 MGps
Bls12 MapFpToG1  5,500 avg gas @   47.9 µs /   114.7 MGps
Bls12 G2 Add    600 avg gas @    6.3 µs /    95.1 MGps
Bls12 G2 MSM   991,080 total gas @6,888.4 µs /   143.9 MGps
Bls12 MapFp2G1 23,800 avg gas @  226.9 µs /   104.9 MGps
Bls12 Pairing 2,209,700 total gas @22,388.5 µs /    98.7 MGps

@garyschulte garyschulte force-pushed the feature/precompile-caching-part1 branch from 995504b to edeeace on March 24, 2025 17:36
@garyschulte (Author) commented

> Also I don't see metrics for BLS12_G1ADD, BLS12_G1MULTIEXP, BLS12_G2ADD, BLS12_G2MULTIEXP, BLS12_MAP_FIELD_TO_CURVE, BLS12_PAIRING. I guess it is because these precompiles are not called, but it is worth double checking.

BLS doesn't go live until Pectra, thus no stats yet...

@garyschulte (Author) commented

> Sharing the metrics on the configured precompiles on Ethereum mainnet, after ~40 minutes of executions. We're using counters, so this is the hit ratio for the 40 minutes of execution. From the metrics, we can reconsider at least enabling the cache on KZGPointEval, as the hit ratio is 0. It was a suggestion from my side to enable caching on KZGPointEval, but the metrics show that it is not a good candidate for caching.

Over 4 days, I see a pretty low hit ratio:

nuc 8: 198 / 2665
nuc 14: 146 / 2594

How low of a ratio makes it not worth it?

@ahamlat (Contributor) commented Mar 25, 2025

> > Sharing the metrics on the configured precompiles on Ethereum mainnet, after ~40 minutes of executions. We're using counters, so this is the hit ratio for the 40 minutes of execution. From the metrics, we can reconsider at least enabling the cache on KZGPointEval, as the hit ratio is 0. It was a suggestion from my side to enable caching on KZGPointEval, but the metrics show that it is not a good candidate for caching.
>
> Over 4 days, I see a pretty low hit ratio:
>
> nuc 8: 198 / 2665
> nuc 14: 146 / 2594
>
> How low of a ratio makes it not worth it?

It depends on:

  • the cost of checking whether the entry is in the cache and adding it to the cache (i.e. the overhead of caching)
    vs
  • the cost of the precompile execution itself.

So in the case of the metrics you shared, the cache has an overhead on 2665 calls and could avoid the execution of only 198 precompile calls.
The overhead here is calculating the hashcode of the input byte array and checking whether the integer key (the hashcode result) is in the cache. There is also the overhead of adding the new execution result to the cache. So in this case, we're improving 7% of the calls and generating overhead on 93% of the calls.
This kind of cache can still be interesting if the execution of the precompile is slow; let me get more metrics on KZGPointEval.

@garyschulte garyschulte force-pushed the feature/precompile-caching-part1 branch from 1c0ae22 to 4e742ea on March 25, 2025 14:52
@ahamlat (Contributor) commented Mar 25, 2025

> It depends on:
>
>   • the cost of checking whether the entry is in the cache and adding it to the cache (i.e. the overhead of caching)
>     vs
>   • the cost of the precompile execution itself.
>
> So in the case of the metrics you shared, the cache has an overhead on 2665 calls and could avoid the execution of only 198 precompile calls. The overhead here is calculating the hashcode of the input byte array and checking whether the integer key (the hashcode result) is in the cache. There is also the overhead of adding the new execution result to the cache. So in this case, we're improving 7% of the calls and generating overhead on 93% of the calls. This kind of cache can still be interesting if the execution of the precompile is slow; let me get more metrics on KZGPointEval.

So the execution time of a KZGPointEval call is around 500 µs on my laptop, which is pretty slow. I will suggest a PR tomorrow to add it to the existing benchmarks. I think even with only a 7% hit ratio, we can keep it.
As a reference, EcRecover takes around 50 µs, around 10x faster.
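
As a back-of-envelope check of that trade-off, using the numbers in this thread (the ~7% hit ratio above and the ~500 µs execution time here; purely illustrative arithmetic):

// Caching pays off on average when hitRatio * tExec > tOverhead: the
// expected execution time saved must exceed the unconditional cost of
// hashing the input and probing (and occasionally populating) the cache.
double hitRatio = 198.0 / 2665.0;   // ~7.4%, from the mainnet counters above
double tExecMicros = 500.0;         // KZGPointEval execution time (~500 µs)
double overheadBudget = hitRatio * tExecMicros;
System.out.printf("break-even overhead: ~%.0f µs per call%n", overheadBudget);
// ~37 µs/call of headroom: hashing a 192-byte KZG input and one map probe
// cost far less, so even a ~7% hit ratio is worthwhile for this precompile,
// whereas a ~50 µs precompile at the same ratio leaves only ~4 µs of budget.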

@garyschulte garyschulte force-pushed the feature/precompile-caching-part1 branch from 4e742ea to aad1762 Compare March 31, 2025 20:25
@garyschulte garyschulte enabled auto-merge (squash) March 31, 2025 20:27
@garyschulte garyschulte merged commit 2440f6a into hyperledger:main Mar 31, 2025
43 checks passed