Conversation

@matt-o-how (Contributor) commented Sep 25, 2025:

This PR adds a costed and cached sha256tree operator. Cost is charged as if no caching were taking place, so the caching strategy can be improved in the future without breaking consensus.

Attached are the benchmarked performance graphs for the cost of a call, the cost per pair, and the cost per 32-byte chunk.

[Graphs: sha256tree-per-pair, sha256tree-per-byte, sha256tree-base]
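For context, the consensus-neutral caching works by storing, alongside each cached hash, the cost the uncached computation would have incurred, and re-charging that cost on every cache hit. A minimal sketch of the idea (the tree_hash_uncached helper is a hypothetical stand-in for the real cost-accounted traversal in src/treehash.rs, and the TreeCache type is quoted later in this review):

fn tree_hash(a: &Allocator, n: NodePtr, cache: &mut TreeCache, cost: &mut Cost) -> [u8; 32] {
    // cache hit: charge the same cost the uncached computation would have used
    if let Some((hash, memoized_cost)) = cache.get(n) {
        *cost += memoized_cost;
        return *hash;
    }
    let cost_before = *cost;
    // hypothetical helper standing in for the real, cost-accounted traversal
    let hash = tree_hash_uncached(a, n, cost);
    // memoize only subtrees that are referenced more than once
    if cache.should_memoize(n) {
        cache.insert(n, &hash, *cost - cost_before);
    }
    hash
}

This way the total cost is identical whether or not the cache fires, so the caching strategy can change later without a consensus change.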


coveralls-official bot commented Sep 25, 2025

Pull Request Test Coverage Report for Build 20073713856

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 86 of 335 (25.67%) changed or added relevant lines in 6 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-3.0%) to 87.278%

Changes Missing Coverage:

File                                   Covered Lines   Changed/Added Lines   %
src/chia_dialect.rs                    0               1                     0.0%
src/test_ops.rs                        0               1                     0.0%
src/treehash.rs                        81              109                   74.31%
tools/src/bin/sha256tree-benching.rs   0               219                   0.0%

Totals:
  Change from base Build 18973962912: -3.0%
  Covered Lines: 6380
  Relevant Lines: 7310

💛 - Coveralls

@matt-o-how marked this pull request as ready for review October 27, 2025 15:49

@arvidn (Contributor) left a comment:


There is a tool that establishes a reasonable cost for new operators, and it can be extended. We need some kind of benchmark to set the cost.


@arvidn (Contributor) left a comment:


do you feel confident that the cost benchmarks are good? specifically the cost per byte, cost per pair and cost per atom?


seed(1337)

SHA256TREE_BASE_COST = 0
Contributor:

If the correct value for this is 0, I think we should just remove it


@arvidn (Contributor) commented Nov 20, 2025:

New Raspberry Pi results:

opcode: sha256tree (65)
   time: per-32byte: 132.36ns
   cost: per-32byte: 851
   time: base: 69.57ns
   cost: base: 165
   time: per-node: 586.04ns
   cost: per-node: 3766
   intercept: 293355.28

MacBookPro M1:

opcode: sha256tree (65)
   time: per-32byte: 92.18ns
   cost: per-32byte: 592
   time: base: 31.28ns
   cost: base: 100
   time: per-node: 383.56ns
   cost: per-node: 2465
   intercept: 183875.97
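(For reference, the per-32-byte and per-node rows on both machines correspond to the same scale of roughly 6.4 cost units per nanosecond: 851 / 132.36 ≈ 592 / 92.18 ≈ 3766 / 586.04 ≈ 2465 / 383.56 ≈ 6.4. The base rows do not follow that ratio.)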

src/treehash.rs Outdated
Comment on lines 32 to 133
#[derive(Default)]
pub struct TreeCache {
    hashes: Vec<[u8; 32]>,
    // parallel vector holding the cost used to compute the corresponding hash
    costs: Vec<Cost>,
    // each entry is an index into hashes and costs, or one of 3 special values:
    // u16::MAX if the pair has not been visited
    // u16::MAX - 1 if the pair has been seen once
    // u16::MAX - 2 if the pair has been seen at least twice (this makes it a
    // candidate for memoization)
    pairs: Vec<u16>,
}

const NOT_VISITED: u16 = u16::MAX;
const SEEN_ONCE: u16 = u16::MAX - 1;
const SEEN_MULTIPLE: u16 = u16::MAX - 2;

impl TreeCache {
    /// Get cached hash and its associated cost (if present).
    pub fn get(&self, n: NodePtr) -> Option<(&[u8; 32], Cost)> {
        // We only cache pairs (for now)
        if !matches!(n.object_type(), ObjectType::Pair) {
            return None;
        }

        let idx = n.index() as usize;
        let slot = *self.pairs.get(idx)?;
        // values at or above SEEN_MULTIPLE are sentinels, not cache indices
        if slot >= SEEN_MULTIPLE {
            return None;
        }
        Some((&self.hashes[slot as usize], self.costs[slot as usize]))
    }

    /// Insert a cached hash with its associated cost. If the cache is full we
    /// ignore the insertion.
    pub fn insert(&mut self, n: NodePtr, hash: &[u8; 32], cost: Cost) {
        // If we've reached the max size, just ignore new cache items
        if self.hashes.len() == SEEN_MULTIPLE as usize {
            return;
        }

        if !matches!(n.object_type(), ObjectType::Pair) {
            return;
        }

        let idx = n.index() as usize;
        if idx >= self.pairs.len() {
            self.pairs.resize(idx + 1, NOT_VISITED);
        }

        let slot = self.hashes.len();
        self.hashes.push(*hash);
        self.costs.push(cost);
        self.pairs[idx] = slot as u16;
    }

    /// mark the node as being visited. Returns true if we need to
    /// traverse visitation down this node.
    fn visit(&mut self, n: NodePtr) -> bool {
        if !matches!(n.object_type(), ObjectType::Pair) {
            return false;
        }
        let idx = n.index() as usize;
        if idx >= self.pairs.len() {
            self.pairs.resize(idx + 1, NOT_VISITED);
        }
        // step NOT_VISITED -> SEEN_ONCE -> SEEN_MULTIPLE; SEEN_MULTIPLE and
        // cached slot indices (below SEEN_MULTIPLE) are left alone
        if self.pairs[idx] > SEEN_MULTIPLE {
            self.pairs[idx] -= 1;
        }
        self.pairs[idx] == SEEN_ONCE
    }

    pub fn should_memoize(&mut self, n: NodePtr) -> bool {
        if !matches!(n.object_type(), ObjectType::Pair) {
            return false;
        }
        let idx = n.index() as usize;
        if idx >= self.pairs.len() {
            false
        } else {
            self.pairs[idx] <= SEEN_MULTIPLE
        }
    }

    pub fn visit_tree(&mut self, a: &Allocator, node: NodePtr) {
        if !self.visit(node) {
            return;
        }
        let mut nodes = vec![node];
        while let Some(n) = nodes.pop() {
            let SExp::Pair(left, right) = a.sexp(n) else {
                continue;
            };
            if self.visit(left) {
                nodes.push(left);
            }
            if self.visit(right) {
                nodes.push(right);
            }
        }
    }
}
Contributor:

I think all code related to the cached version should be removed until we have a good solution. I suspect it will need to look very different from this

src/treehash.rs Outdated
enum TreeOp {
    SExp(NodePtr),
    Cons,
    ConsAddCacheCost(NodePtr, Cost),
Contributor:

including this one

src/treehash.rs Outdated
match op {
    TreeOp::SExp(node) => {
        // charge a call cost for processing this op
        cost += SHA256TREE_COST_PER_NODE;
Contributor:

here, "node" means both pair and atom, right? When you measure these costs in the benchmark, you just have base-cost, per-32-bytes-cost and per-pair cost, right? It's not clear to me how you establish the cost-per-node.

src/treehash.rs Outdated
            }
        }
        NodeVisitor::Pair(left, right) => {
            increment_bytes(65, &mut cost);
Contributor:

this also doesn't seem to match how you measure these costs in the benchmark.

src/treehash.rs Outdated
Comment on lines 13 to 15
const SHA256TREE_BASE_COST: Cost = 30;
const SHA256TREE_COST_PER_NODE: Cost = 3000;
const SHA256TREE_COST_PER_32_BYTES: Cost = 700;
Contributor:

It's not obvious to me how you establish these costs. They seem to represent different metrics than what you actually measure in the benchmark.

I think it would be good to add comments to these constants to describe what they represent.

}

// this adds 32 bytes at a time compared to per_byte which adds 5 at a time
fn time_per_byte_for_atom(a: &mut Allocator, op: &Operator, output: &mut dyn Write) -> (f64, f64) {
Contributor:

it makes sense that just looking at the slope here, you isolate the cost per 32 bytes. The constant factor is unknown as it includes base cost and cost per node.
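In other words, the model fitted here is time(n) ≈ c + n · time_per_32_bytes, so the regression slope isolates the per-32-byte term while the constant c lumps together the base cost and the per-node cost of the fixed structure around the atom.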

        samples.push(sample);
    }

    linear_regression_of(&samples).expect("linreg failed")
Contributor:

IIUC, the slope here represents 2 × cost-per-node + 3 × cost-per-32-bytes, since you charge for hashing 65 bytes (three padded 32-byte chunks) in the pair case of the tree hash function.

So, in order to isolate the cost-per-node, you need to subtract 3 × cost-per-32-bytes and then divide by two. Am I missing something?

Since this is a bit more complex than the other benchmarks, which isolate a single cost per test, it would be good to add some comments. It might also be good to reconsider how to apply cost. If you don't charge the cost for hashing 65 bytes for a pair, this slope becomes cost-per-pair + cost-per-atom. But then you'd need to isolate the cost per atom as well, separate from bytes.
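Written out, the claim above is:

    slope = 2 · cost_per_node + 3 · cost_per_32_bytes
    cost_per_node = (slope − 3 · cost_per_32_bytes) / 2

where the 3 comes from charging ceil(65 / 32) = 3 chunks for the 65-byte pair hash, and the 2 from each list step adding two nodes: one cons box and one atom.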

Comment on lines 86 to 89
    (f64, f64),
    (f64, f64), // time slopes
    (f64, f64),
    (f64, f64), // cost slopes
Contributor:

As far as I can tell, only the second value in each of these pairs is a slope; the first is the constant, where the line intersects x = 0.

println!("CLVM cost slope : {:.4}", atom_clvm_c.0);

println!("list results: ");
println!("Native time slope (ns): {:.4}", cons_nat_t.0);
Contributor:

I don't think this is a slope. But labelling these as "slope" isn't very helpful either; they are just cost-per-pair, or cost-per-32-bytes, or things like that, right?

And the constants don't mean anything at all, as far as I can tell.

Comment on lines +81 to +82
    bytes32_native_cost: f64,
    bytes32_clvm_cost: f64,
Contributor:

I think the cost should be specified as u64

Comment on lines +92 to +93
    f64,
    f64, // cost slopes
Contributor:

I think cost should be u64. Also, referring to these as "slope" isn't very helpful, I think. It kind of tells you that it's cost per something, but I think it would be much better to refer to these by what they actually represent. It's time per node and cost per node, right? But what are the two other values?

    let result_1 = node_to_bytes(a, red.1).expect("should work");
    let duration = start.elapsed().as_nanos() as f64;
    let duration = (duration - (3.0 * bytes32_native_time)) / 2.0;
    let cost = (cost as f64 - (3.0 * bytes32_native_cost)) / 2.0;
Contributor:

Are the 3.0 and 2.0 here constants to convert from time to cost? I think they should be named constants with a comment explaining how you arrived at them.
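For instance (the names are suggestions, not from the PR):

// each list step hashes one 65-byte pair, i.e. ceil(65 / 32) = 3 padded 32-byte chunks
const CHUNKS_PER_PAIR_HASH: f64 = 3.0;
// each list step adds two nodes: one cons box and one atom
const NODES_PER_LIST_STEP: f64 = 2.0;

let duration = (duration - CHUNKS_PER_PAIR_HASH * bytes32_native_time) / NODES_PER_LIST_STEP;
let cost = (cost as f64 - CHUNKS_PER_PAIR_HASH * bytes32_native_cost) / NODES_PER_LIST_STEP;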

Contributor:

I don't understand why you alter cost here. Shouldn't you report the actual cost reported from the CLVM interpreter?

Comment on lines +151 to +152
linear_regression_of(&samples_cost_native).unwrap().0,
linear_regression_of(&samples_cost_clvm).unwrap().0,
Contributor:

is there a good reason to collect both time and cost in parallel, and run regression analysis on both, independently?

I would expect all cost samples to be exactly proportional to the time samples, since they're just multiplied. You could just apply the multiplication at the end instead, and only run analysis on the timings.
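Since the fit is linear in the samples, scaling every sample by a constant scales both fitted parameters: if cost_i = k · time_i for all samples, then slope_cost = k · slope_time and intercept_cost = k · intercept_time, so one regression on the timings is enough.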

Contributor:

The primary objective of this program is to compare the cost of the native shatree operator against the cost of the CLVM implementation of shatree. I think it would be sufficient to run a few example trees (maybe simple ones) and print the costs for the two implementations.

I don't think you need to time anything, or run any regression analysis here. All that is already done in benchmark-clvm-cost.rs, right?


    let duration = start.elapsed().as_nanos() as f64;
    let duration = (duration - (3.0 * bytes32_clvm_time)) / 2.0;
    let cost = (cost as f64 - (3.0 * bytes32_clvm_cost)) / 2.0;
Contributor:

same here, I would expect this program to compare the cost of the native operator to the CLVM implementation. But here you alter the cost.

src/treehash.rs Outdated
  const SHA256TREE_BASE_COST: Cost = 30;
- const SHA256TREE_COST_PER_NODE: Cost = 3000;
+ // this is the cost per node, whether it is a cons box or an atom
+ const SHA256TREE_COST_PER_NODE: Cost = 2000;
Contributor:

this comment suggests that this cost is the base cost for both an atom and a pair. I believe the benchmark isn't measuring that cost, it's measuring the cost for a pair, only.

Author (@matt-o-how):

It was applying this cost for both pairs and atoms.

src/treehash.rs Outdated

  // the base cost is the cost of calling it to begin with
- const SHA256TREE_BASE_COST: Cost = 30;
+ const SHA256TREE_BASE_COST: Cost = 0;
Contributor:

I don't think this can be 0. (sha256tree ()) would have very low cost otherwise.

Author (@matt-o-how):

It is now set to the same base cost as sha256

    run_program(a, &dialect, call, a.nil(), 11000000000).unwrap();
    let duration = start.elapsed();
    let sample = (i as f64, duration.as_nanos() as f64);
    let duration_f64 = (duration.as_nanos() as f64 - (4.0 * time_per_byte32)) / 2.0;
Contributor:

I think this / 2.0 warrants a comment

// this function is used for calculating a theoretical cost per node
// we pass in the time it takes for a byte32 chunk and subtract 4*chunk_time
// we then divide by two to account for the fact that we are adding a nil atom each time the list grows too
// this is because atoms are nodes too
Contributor:

What's the point of this function? It doesn't look like it's used for anything. If you were to make it measure the cost of 1 pair and 1 NIL-atom, you could use the result to validate the model of only charging for the blocks of hashed bytes. But as it is right now, what do you do with the result?

Do you check to see that it's close to 0?

Author (@matt-o-how):

I suppose that for the purposes of this file it makes sense to remove this legacy costing function, since we no longer have a cost for the thing it was trying to measure. I have now removed it.

Contributor:

but now you've re-introduced a cost per node

/*
This file is for comparing the native sha256tree with the clvm implementation which previously existed.
The costs for the native implementation should be lower as it is not required to make allocations.
*/
Contributor:

It's not clear to me how the output of this program should be interpreted. We want it to tell whether the current cost for the native shatree operator is reasonable compared to the CLVM implementation.

How can we tell?

Why does the actual timing matter? It's just the cost that matters, isn't it?

Author (@matt-o-how):

I believe we want to output the timing as well, so we can check that the cost is close to the timing × cost_factor measurement. I have added a comment to this end.

Comment on lines +156 to +157
    let duration = (duration - (500.0 + i as f64) * (4.0 * bytes32_clvm_time)) / 2.0;
    let cost = (cost as f64 - (500.0 + i as f64) * (4.0 * bytes32_clvm_cost)) / 2.0;
Contributor:

Thinking some more about this: it doesn't seem safe to assume that the remaining time (after subtracting the 4 sha256 blocks) is evenly distributed between the node and the pair.

But I still don't understand why you're measuring this here. I expect this cost value to match whatever constant you've picked. So why "measure" it?

Contributor:

It makes me nervous to not have any measurements of trees with the sha256tree operator. The only thing you measure now is the time for hashing a single atom; you then assume that the cost of a pair is 3 blocks. It would be good to have a measurement confirming this assumption.
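A sketch of such a measurement: build complete binary trees of increasing depth and regress over the node count, which pins down the per-pair term independently of the list benchmark (this assumes the crate's Allocator API; the leaves deliberately share a single atom, which is fine for timing the uncached implementation):

fn build_tree(a: &mut Allocator, depth: u32) -> NodePtr {
    // all leaves share one 1-byte atom
    let leaf = a.new_atom(&[0x55]).expect("new_atom");
    let mut layer = vec![leaf; 1usize << depth];
    // repeatedly pair up adjacent nodes until a single root remains
    while layer.len() > 1 {
        let mut next = Vec::with_capacity(layer.len() / 2);
        for p in layer.chunks(2) {
            next.push(a.new_pair(p[0], p[1]).expect("new_pair"));
        }
        layer = next;
    }
    layer[0]
}

A tree of depth d has 2^d leaves and 2^d − 1 pairs, so timing (sha256tree tree) across several depths gives a per-node slope that can be checked against the model's predictions.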

    );

    // taken from benchmark-clvm-cost.rs
    let cost_scale = ((101094.0 / 39000.0) + (1343980.0 / 131000.0)) / 2.0;
Contributor:

This looked suspicious to me, partly because this benchmark is already measuring both the cost and the timing, so you already know what the scale factor is. Why do you need this constant?

When looking into it, I found the comment above it in benchmark-clvm-cost.rs. It says:

// this "magic" scaling depends on the computer you run the tests on.
// It's calibrated against the timing of point_add, which has a cost

I'm pretty sure that means you have to update it to match your computer in order for the final cost to make sense. It would be nice to automatically establish this scale, but that would perhaps be a bit of scope creep.
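For what it's worth, the constant evaluates to (101094 / 39000 + 1343980 / 131000) / 2 ≈ (2.59 + 10.26) / 2 ≈ 6.43, which matches the ≈6.4 cost-per-nanosecond ratio visible in the Raspberry Pi and M1 results above. That the two calibration terms differ by a factor of four underlines how machine-specific this scale is.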

Comment on lines +13 to +18
const SHA256TREE_BASE_COST: Cost = 87;
// this cost is applied for every node we traverse to
const SHA256TREE_NODE_COST: Cost = 500;
// this is the cost for every 32 bytes in a sha256 call
// it is set to the same as sha256
const SHA256TREE_COST_PER_32_BYTES: Cost = 64;
Contributor:

I think we have to be very careful and deliberate when picking these costs. We want to be absolutely certain we won't regret it. Right now I don't really understand how you arrive at these numbers. I'm hoping the PR description can explain how they were picked, along with the measurements used to decide them.

I especially think SHA256TREE_NODE_COST is on shaky ground right now, as there isn't a measurement for a list (or a tree) parameter.

Ideally we can demonstrate that the model we pick works for both lists and trees.

i.e.

(a . (b . (c . (d . (...)))))

as well as:

(((a . a) . (a . a)) . ((a . a) . (a . a)))

etc.
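A sketch of that cross-check, predicting the model's cost from node and chunk counts so it can be compared against measurements of both shapes (constants as quoted above; the node counting is spelled out in the comments):

// predicted cost under the proposed model
fn predicted_cost(nodes: u64, chunks: u64) -> u64 {
    const SHA256TREE_BASE_COST: u64 = 87;
    const SHA256TREE_NODE_COST: u64 = 500;
    const SHA256TREE_COST_PER_32_BYTES: u64 = 64;
    SHA256TREE_BASE_COST + nodes * SHA256TREE_NODE_COST + chunks * SHA256TREE_COST_PER_32_BYTES
}

// a proper list of n atoms, (a . (b . (... . ()))), has n pairs, n element
// atoms and one terminating nil: 2n + 1 nodes
// a complete binary tree over n leaf atoms has n - 1 pairs and n leaves:
// 2n - 1 nodes

If measured timings for both shapes sit on the line this function predicts, the model holds; if not, the per-node constant needs revisiting.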
