-
-
Notifications
You must be signed in to change notification settings - Fork 34k
buffer: optimize Buffer.of with bufferPool #60372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Review requested:
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #60372 +/- ##
==========================================
+ Coverage 88.52% 88.54% +0.02%
==========================================
Files 703 703
Lines 208222 208234 +12
Branches 40131 40145 +14
==========================================
+ Hits 184320 184391 +71
+ Misses 15905 15840 -65
- Partials 7997 8003 +6
🚀 New features to boost your workflow:
|
mcollina
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
Benchmark GHA: https://github.com/aduh95/node/actions/runs/18809554526 Results (improvements across the board) |
|
@aduh95 the CI didn't pick up the benchmarks, I think the category is |
|
I feel like adding pooling to places which previously returned non-pooled buffers should be semver-major to be on the safe side Too much code in the ecosystem assuming non-pooled buffers and accessing |
|
Which could improve perf in some usecases significantly (~3x on allocations), and would be much more noticeable than But it would definitely be a semver-major And the only difference from |
|
Previously we have added pooling as semver minor (#31615), but I have no problem with semver major.
I think Buffer.alloc docs say it's a new buffer instance explicitly and it's one of the differences with unsafeAlloc, but maybe we still can? However it's less obvious. |
#31615 did not add pooling, see the diff there --
chalker@Nikitas-MacBook-Air % nvm use 4
Now using node v4.9.1 (npm v2.15.11)
chalker@Nikitas-MacBook-Air % node
> Buffer.from('x').buffer.byteLength
8192
> Buffer.from([1]).buffer.byteLength
8192 |
|
I'll let the collaborators decide – both are fine for me |
|
e.g. this will definitely break: https://github.com/website-local/website-scrap-engine/blob/6433593ee96fe2a386092984111a7c64599c9709/test/util.spec.ts#L80 but other places might use it indirectly, I have seen a lot of code assuming non-pooled uint8arrays, including in popular deps I vote semver-major 🙃 |
|
The commit message should also be changed to be more explicit, it's adding the buffer pool to @mcollina Do you think this change should be a topic for tsc due to the impact it might have be? |
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are more things this could break, including our own APIs.
At the very least, documentation should be updated for this to land
Lines 694 to 697 in dec0213
| Do not pass a pooled {Buffer} object instance in to this method. | |
| Pooled `Buffer` objects are created using `Buffer.allocUnsafe()`, | |
| or `Buffer.from()`, or are often returned by various `node:fs` module | |
| callbacks. These types of `Buffer`s use a shared underlying |
E.g., but not sure if limited to:
Lines 827 to 831 in dec0213
| The `Buffer` module pre-allocates an internal `Buffer` instance of | |
| size [`Buffer.poolSize`][] that is used as a pool for the fast allocation of new | |
| `Buffer` instances created using [`Buffer.allocUnsafe()`][], [`Buffer.from(array)`][], | |
| [`Buffer.from(string)`][], and [`Buffer.concat()`][] only when `size` is less than | |
| `Buffer.poolSize >>> 1` (floor of [`Buffer.poolSize`][] divided by two). |
Lines 5463 to 5465 in dec0213
| `Buffer` instances returned by [`Buffer.allocUnsafe()`][], [`Buffer.from(string)`][], | |
| [`Buffer.concat()`][] and [`Buffer.from(array)`][] _may_ be allocated off a shared | |
| internal memory pool if `size` is less than or equal to half [`Buffer.poolSize`][]. |
|
Did you find something in Node.js code that would break? Or are you saying it just could? As you can see in the docs and the code most Buffer methods may already use it. I understand it's semver major and I'm OK with that, but for me one of the reasons to use Buffer methods to create a new Uint8Array is the buffer pool. Otherwise I'd just use Uint8Array static methods, etc. But yeah good idea to update the docs including .of as well. |
|
@gurgunday the first doc citation in my previous comment starts with
I trusted that means that it would break, but I did not recheck. |
|
Docs are added I think this can land as semver major This also gives people a good reason to use |
|
More data (not against landing this, just more data): 0 is main, 1 is this branch rebased on top of main, and last one is still non-pooled const of = (...items) => {
- const newObj = createUnsafeBuffer(items.length);
- for (let k = 0; k < items.length; k++)
+ const len = items.length;
+ const newObj = len <= 64 ? new FastBuffer(len) : createUnsafeBuffer(len);
+ for (let k = 0; k < len; k++)
newObj[k] = items[k];
return newObj;
};note: I increased n 10x for better consistency and added more small sizes (as that's on what Buffer.of is likely used) pooling improves the perf over that, but that patch is backportable and non semver-major more than half of the perf issue here was that Also note: it's important to not optimize |
|
Hm, I wonder if just always From top to bottom: const of = (...items) => {
- const newObj = createUnsafeBuffer(items.length);
- for (let k = 0; k < items.length; k++)
+ const len = items.length;
+ const newObj = new FastBuffer(len);
+ for (let k = 0; k < len; k++)
newObj[k] = items[k];
Perhaps it would make sense to land that in 24/25, and this in 26 on top. |
|
Is createUnsafeBuffer zero-filling in your benchmarks? |
|
@gurgunday no, this is rebased on main (also, I suggest rebasing this PR on main to get the non-zero-filling change in) |
|
I checked usage. |
|
@gurgunday I filed #60503. I also modified your benchmark to denoise it a little, but I didn't commit it to not conflict with this PR in benchmarks. |
|
Rebased, here are the latest results: branch: buffers/buffer-of.js
buffers/buffer-of.js n=500000 len=0: 23,176,224.64213939
buffers/buffer-of.js n=500000 len=1: 28,260,428.21105159
buffers/buffer-of.js n=500000 len=8: 22,288,220.118461892
buffers/buffer-of.js n=500000 len=64: 8,651,554.224031374
buffers/buffer-of.js n=500000 len=256: 2,675,041.3293885393
buffers/buffer-of.js n=500000 len=1024: 761,538.1119104564main: buffers/buffer-of.js
buffers/buffer-of.js n=500000 len=0: 23,458,759.5007976
buffers/buffer-of.js n=500000 len=1: 24,023,591.166525528
buffers/buffer-of.js n=500000 len=8: 19,374,432.13539411
buffers/buffer-of.js n=500000 len=64: 8,169,940.247344614
buffers/buffer-of.js n=500000 len=256: 1,423,056.0698322074
buffers/buffer-of.js n=500000 len=1024: 665,914.9969122255After looking at the codebase again, I think it's safe to merge as semver major; as previously stated, it gives a good reason to prefer |
|
most used sizes are likely 0-8, and improvement on 1-8 seems significant (15-20%) according to #60372 (comment) unless someone demonstrates that this somehow does not bring any measurable real-world perf improvement, i think this should go in |
|
@anonrig maybe PTAL? :) For context: this will be semver-major but a really good perf improvement and in alignment with other Buffer methods like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this needs to overload TypedArray.of()? I would like that we don't unnecessarily diverge Buffer from Uint8Array any further. It's already messy enough as it is. This might as well be a Buffer-specific static method name that doesn't confict with TypedArray definitions.
Yes, to not call the deprecated Buffer constructor Edit: Well, it's up to you in the end but I think it wouldn't be messy if we make it clear that the difference between Buffer and Uint8Array methods is the pool, so users have a clear reason to use them. It's not like we will ever remove the pool from |
You got me lost there. Please explain. Why can't a faster version of |
|
Oh apologies I misread your comment, I thought you were questioning why we're overwriting the method in the first place, which is to avoid deprecated Buffer constructor warnings
I'm not opposed to this approach, I think it would even be better long-term to have a complete separation between pooled and non-pooled methods, but then we would have the problem of If we can't though, I think continuing in the steps of making Buffer static methods return pooled instances is also clearer than the inconsistency we have right now |
|
I'm not down with deprecating Buffer.from. I am merely objecting to making the functional gap between Buffer and Uint8Array-inherited methods even bigger. |
Continues the work of utilizing the bufferPool
After:
Before: