Add op_sha256tree #632
base: main
Conversation
Pull Request Test Coverage Report for Build 20073713856

Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

💛 - Coveralls
Force-pushed from 909adfe to 837026a
arvidn left a comment
There is a tool that can be extended to establish a reasonable cost for new operators as well. We need some kind of benchmark to set the cost.
Force-pushed from 2dcc4df to 52fb0ba
arvidn left a comment
do you feel confident that the cost benchmarks are good? specifically the cost per byte, cost per pair and cost per atom?
`tools/generate-sha256tree-tests.py` (Outdated)

```python
seed(1337)

SHA256TREE_BASE_COST = 0
```
If the correct value for this is 0, I think we should just remove it
new Raspberry Pi results:
MacBook Pro M1:
`src/treehash.rs` (Outdated)

```rust
#[derive(Default)]
pub struct TreeCache {
    hashes: Vec<[u8; 32]>,
    // parallel vector holding the cost used to compute the corresponding hash
    costs: Vec<Cost>,
    // each entry is an index into hashes and costs, or one of 3 special values:
    // u16::MAX if the pair has not been visited
    // u16::MAX - 1 if the pair has been seen once
    // u16::MAX - 2 if the pair has been seen at least twice (this makes it a
    // candidate for memoization)
    pairs: Vec<u16>,
}

const NOT_VISITED: u16 = u16::MAX;
const SEEN_ONCE: u16 = u16::MAX - 1;
const SEEN_MULTIPLE: u16 = u16::MAX - 2;

impl TreeCache {
    /// Get cached hash and its associated cost (if present).
    pub fn get(&self, n: NodePtr) -> Option<(&[u8; 32], Cost)> {
        // We only cache pairs (for now)
        if !matches!(n.object_type(), ObjectType::Pair) {
            return None;
        }

        let idx = n.index() as usize;
        let slot = *self.pairs.get(idx)?;
        if slot >= SEEN_MULTIPLE {
            return None;
        }
        Some((&self.hashes[slot as usize], self.costs[slot as usize]))
    }

    /// Insert a cached hash with its associated cost. If the cache is full we
    /// ignore the insertion.
    pub fn insert(&mut self, n: NodePtr, hash: &[u8; 32], cost: Cost) {
        // If we've reached the max size, just ignore new cache items
        if self.hashes.len() == SEEN_MULTIPLE as usize {
            return;
        }

        if !matches!(n.object_type(), ObjectType::Pair) {
            return;
        }

        let idx = n.index() as usize;
        if idx >= self.pairs.len() {
            self.pairs.resize(idx + 1, NOT_VISITED);
        }

        let slot = self.hashes.len();
        self.hashes.push(*hash);
        self.costs.push(cost);
        self.pairs[idx] = slot as u16;
    }

    /// mark the node as being visited. Returns true if we need to
    /// traverse visitation down this node.
    fn visit(&mut self, n: NodePtr) -> bool {
        if !matches!(n.object_type(), ObjectType::Pair) {
            return false;
        }
        let idx = n.index() as usize;
        if idx >= self.pairs.len() {
            self.pairs.resize(idx + 1, NOT_VISITED);
        }
        if self.pairs[idx] > SEEN_MULTIPLE {
            self.pairs[idx] -= 1;
        }
        self.pairs[idx] == SEEN_ONCE
    }

    pub fn should_memoize(&mut self, n: NodePtr) -> bool {
        if !matches!(n.object_type(), ObjectType::Pair) {
            return false;
        }
        let idx = n.index() as usize;
        if idx >= self.pairs.len() {
            false
        } else {
            self.pairs[idx] <= SEEN_MULTIPLE
        }
    }

    pub fn visit_tree(&mut self, a: &Allocator, node: NodePtr) {
        if !self.visit(node) {
            return;
        }
        let mut nodes = vec![node];
        while let Some(n) = nodes.pop() {
            let SExp::Pair(left, right) = a.sexp(n) else {
                continue;
            };
            if self.visit(left) {
                nodes.push(left);
            }
            if self.visit(right) {
                nodes.push(right);
            }
        }
    }
}
```
I think all code related to the cached version should be removed until we have a good solution. I suspect it will need to look very different from this
`src/treehash.rs` (Outdated)

```rust
enum TreeOp {
    SExp(NodePtr),
    Cons,
    ConsAddCacheCost(NodePtr, Cost),
}
```
including this one
`src/treehash.rs` (Outdated)

```rust
match op {
    TreeOp::SExp(node) => {
        // charge a call cost for processing this op
        cost += SHA256TREE_COST_PER_NODE;
```
here, "node" means both pair and atom, right? When you measure these costs in the benchmark, you just have base-cost, per-32-bytes-cost and per-pair cost, right? It's not clear to me how you establish the cost-per-node.
`src/treehash.rs` (Outdated)

```rust
    }
}
NodeVisitor::Pair(left, right) => {
    increment_bytes(65, &mut cost);
```
this also doesn't seem to match how you measure these costs in the benchmark.
`src/treehash.rs` (Outdated)

```rust
const SHA256TREE_BASE_COST: Cost = 30;
const SHA256TREE_COST_PER_NODE: Cost = 3000;
const SHA256TREE_COST_PER_32_BYTES: Cost = 700;
```
It's not obvious to me how you establish these costs. They seem to represent different metrics than what you actually measure in the benchmark.
I think it would be good to add comments to these constants to describe what they represent.
`tools/src/bin/benchmark-clvm-cost.rs` (Outdated)

```rust
}

// this adds 32 bytes at a time compared to per_byte which adds 5 at a time
fn time_per_byte_for_atom(a: &mut Allocator, op: &Operator, output: &mut dyn Write) -> (f64, f64) {
```
it makes sense that just looking at the slope here, you isolate the cost per 32 bytes. The constant factor is unknown as it includes base cost and cost per node.
`tools/src/bin/benchmark-clvm-cost.rs` (Outdated)

```rust
    samples.push(sample);
}

linear_regression_of(&samples).expect("linreg failed")
```
IIUC, the slope here represents 2 × cost-per-node + 3 × cost-per-32-bytes, since you charge for hashing 65 bytes in the pair case of the tree hash function.
So, in order to isolate the cost-per-node, you need to subtract 3 × cost-per-32-bytes and then divide by two. Am I missing something?
Since this is a bit more complex than the other benchmarks, that isolate a single cost per test, it would be good to add some comments. It might also be good to reconsider how to apply cost. If you don't charge the cost for hashing 65 bytes for a pair, this slope becomes cost-per-pair + cost-per-atom. But then you'd need to isolate the cost per atom as well, separate from bytes.
`tools/src/bin/sha256tree-benching.rs` (Outdated)

```rust
(f64, f64),
(f64, f64), // time slopes
(f64, f64),
(f64, f64), // cost slopes
```
as far as I can tell, only the second value in these pairs are slopes, the first is the constant, where the line intersects x=0.
`tools/src/bin/sha256tree-benching.rs` (Outdated)

```rust
println!("CLVM cost slope : {:.4}", atom_clvm_c.0);

println!("list results: ");
println!("Native time slope (ns): {:.4}", cons_nat_t.0);
```
I don't think this is a slope. But labelling these as "slope" isn't very helpful. they are just cost-per-pair, or cost-per-32-bytes or things like that, right?
And the constants don't mean anything at all, as far as I can tell.
```rust
bytes32_native_cost: f64,
bytes32_clvm_cost: f64,
```
I think the cost should be specified as u64
```rust
f64,
f64, // cost slopes
```
I think cost should be u64. Also, referring to these as "slope" isn't very helpful, I think. It kind of tells you that it's cost per something, but I think it would be much better to refer to these by what they actually represent. It's time per node and cost per node, right? But what are the two other values?
`tools/src/bin/sha256tree-benching.rs` (Outdated)

```rust
let result_1 = node_to_bytes(a, red.1).expect("should work");
let duration = start.elapsed().as_nanos() as f64;
let duration = (duration - (3.0 * bytes32_native_time)) / 2.0;
let cost = (cost as f64 - (3.0 * bytes32_native_cost)) / 2.0;
```
is 3.0/2.0 a constant to convert from time to cost? I think it should be a constant with a name and a comment explaining how you arrived at it.
I don't understand why you alter cost here. Shouldn't you report the actual cost reported from the CLVM interpreter?
```rust
linear_regression_of(&samples_cost_native).unwrap().0,
linear_regression_of(&samples_cost_clvm).unwrap().0,
```
is there a good reason to collect both time and cost in parallel, and run regression analysis on both, independently?
I would expect all cost samples to be exactly proportional to the time samples, since they're just multiplied. You could just apply the multiplication at the end instead, and only run analysis on the timings.
The primary objective of this program is to compare the cost of the native shatree operator against the cost of the CLVM implementation of shatree. I think it would be sufficient to run a few example trees (maybe simple ones) and print the costs for the two implementations.
I don't think you need to time anything, or run any regression analysis here. All that is already done in benchmark-clvm-cost.rs, right?
`tools/src/bin/sha256tree-benching.rs` (Outdated)

```rust
let duration = start.elapsed().as_nanos() as f64;
let duration = (duration - (3.0 * bytes32_clvm_time)) / 2.0;
let cost = (cost as f64 - (3.0 * bytes32_clvm_cost)) / 2.0;
```
same here, I would expect this program to compare the cost of the native operator to the CLVM implementation. But here you alter the cost.
`src/treehash.rs` (Outdated)

```diff
-const SHA256TREE_BASE_COST: Cost = 30;
-const SHA256TREE_COST_PER_NODE: Cost = 3000;
+// this is the cost per node, whether it is a cons box or an atom
+const SHA256TREE_COST_PER_NODE: Cost = 2000;
```
this comment suggests that this cost is the base cost for both an atom and a pair. I believe the benchmark isn't measuring that cost, it's measuring the cost for a pair, only.
It was applying this cost for both pairs and atoms.
`src/treehash.rs` (Outdated)

```diff
 // the base cost is the cost of calling it to begin with
-const SHA256TREE_BASE_COST: Cost = 30;
+const SHA256TREE_BASE_COST: Cost = 0;
```
I don't think this can be 0. (sha256tree ()) would have very low cost otherwise.
It is now set to the same base cost as sha256
`tools/src/bin/benchmark-clvm-cost.rs` (Outdated)

```rust
run_program(a, &dialect, call, a.nil(), 11000000000).unwrap();
let duration = start.elapsed();
let sample = (i as f64, duration.as_nanos() as f64);
let duration_f64 = (duration.as_nanos() as f64 - (4.0 * time_per_byte32)) / 2.0;
```
I think this / 2.0 warrants a comment
`tools/src/bin/benchmark-clvm-cost.rs` (Outdated)

```rust
// this function is used for calculating a theoretical cost per node
// we pass in the time it takes for a byte32 chunk and subtract 4*chunk_time
// we then divide by two to account for the fact that we are adding a nil atom
// each time the list grows too; this is because atoms are nodes too
```
What's the point of this function? It doesn't look like it's used for anything. If you were to make it measure the cost of 1 pair and 1 NIL-atom, you could use the result to validate the model of only charging for the blocks of hashed bytes. But as it is right now, what do you do with the result?
Do you check to see that it's close to 0?
I suppose for the pure purpose of this file that it makes sense to remove this legacy costing function as we no longer have a cost for the thing it is trying to measure. I have now removed it.
but now you've re-introduced a cost per node
```rust
/*
This file is for comparing the native sha256tree with the clvm implementation which previously existed.
The costs for the native implementation should be lower as it is not required to make allocations.
*/
```
It's not clear to me how the output of this program should be interpreted. We want it to tell whether the current cost for the native shatree operator is reasonable compared to the CLVM implementation.
How can we tell?
Why does the actual timing matter? It's just the cost that matters, isn't it?
I believe we want to output the timing as well, so we can check that the Cost is also close to the timing * cost_factor measurement. I have added a comment to this end.
```rust
let duration = (duration - (500.0 + i as f64) * (4.0 * bytes32_clvm_time)) / 2.0;
let cost = (cost as f64 - (500.0 + i as f64) * (4.0 * bytes32_clvm_cost)) / 2.0;
```
thinking some more about this; It doesn't seem safe to assume that the remaining time (after subtracting the 4 sha256 blocks) is evenly distributed between the node and the pair.
But I still don't understand why you're measuring this here. I expect this cost value to match whatever constant you've picked. So why "measure" it?
it makes me nervous to not have any measurements of trees for the sha256tree operator. The only thing you measure now is the time for hashing a single atom, and then assume that the cost of a pair is 3 blocks. It would be good to have a measurement confirming this assumption.
```rust
);

// taken from benchmark-clvm-cost.rs
let cost_scale = ((101094.0 / 39000.0) + (1343980.0 / 131000.0)) / 2.0;
```
This looked suspicious to me. Partly because this benchmark is already measuring the cost and the timing, so you already know what the scale factor is, why do you need this constant?
When looking into it, I found the comment above it in benchmark-clvm-cost.rs. It says:
// this "magic" scaling depends on the computer you run the tests on.
// It's calibrated against the timing of point_add, which has a cost
I'm pretty sure that means you have to update it to match your computer in order for the final cost to make sense. It would be nice to automatically establish this scale, but that would perhaps be a bit of a scope creep.
```rust
const SHA256TREE_BASE_COST: Cost = 87;
// this cost is applied for every node we traverse to
const SHA256TREE_NODE_COST: Cost = 500;
// this is the cost for every 32 bytes in a sha256 call
// it is set to the same as sha256
const SHA256TREE_COST_PER_32_BYTES: Cost = 64;
```
I think we have to be very careful and deliberate when picking these costs. We want to be absolutely certain we won't regret it. Right now I don't really understand how you arrive at these numbers. I'm hoping the PR description can explain how they are picked along with the measurements used to decide this.
I especially think SHA256TREE_NODE_COST is on shaky grounds right now, as there isn't a measurement for a list (or a tree) parameter.
Ideally we can demonstrate that the model we pick works for both lists and a trees.
i.e.
(a . (b . (c . (d . (...)))))
as well as:
(((a . a) . (a . a)) . ((a . a) . (a . a)))
etc.
Co-authored-by: Arvid Norberg <[email protected]>
This PR adds a costed and cached sha256tree which accounts for cost as if it isn't caching, so that further improvements may be made in the future without breaking consensus.
Attached are the benchmarked performance graphs for cost of call, cost per pair, and cost per 32-byte chunk.