Node.js Version
v18-v22
NPM Version
v10.8.2
Operating System
Linux zacknewsham-xps 6.8.0-48-generic #48~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 7 11:24:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Subsystem
buffer, string_decoder, v8
Description
While investigating why my application has high baseline memory usage, I found something I can't explain: strings seem to cost more than 10x as much memory per character as the equivalent Buffer. Some overhead is expected (roughly 2x, given the UTF-16 nature of JS strings), but not on this scale.
A secondary question is why the setup time of a String->String map is so much slower (8x) than a map that takes that string and converts it to a buffer before storing.
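To make the "~2x" expectation concrete, here is a minimal sketch that compares the encoded size of the same text as UTF-8 and as UTF-16. The `sample` value is a hypothetical stand-in for one of the 1000-character values; the numbers are encoding sizes only and say nothing about how V8 actually lays a string out in memory.

```javascript
// Hypothetical sample: 1000 ASCII characters, like the values in the reproduction.
const sample = "A".repeat(1000);

// Size of the same text under two encodings.
// ASCII text takes 1 byte per character in UTF-8 and 2 bytes in UTF-16.
const utf8Bytes = Buffer.byteLength(sample, "utf8");
const utf16Bytes = Buffer.byteLength(sample, "utf16le");

console.log({
  chars: sample.length,
  utf8Bytes,
  utf16Bytes,
  ratio: utf16Bytes / utf8Bytes,
});
```

If encoding were the only difference, the string map would be expected to cost about twice the buffer map, not more than ten times as much.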
Below is a minimal reproduction. The commented-out lines in test() let you toggle between the string->string map and the string->buffer map. I run it with --expose-gc just to get a valid heap snapshot at the end. The total string size stored is (17 + 1000) * 100,000 characters, so the absolute minimum memory usage would be around 100 MB (a trivial C++ implementation of the same takes 114 MB).
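The lower bound quoted above can be worked through as follows. This is a small sketch that assumes one byte per character (the alphabet is pure ASCII) and ignores all per-object and map overhead; the variable names are illustrative only.

```javascript
// Payload size of the test data, assuming 1 byte per ASCII character
// and no per-string or per-entry overhead.
const entryCount = 100_000;
const keyChars = 17;
const valueChars = 1000;

const payloadBytes = (keyChars + valueChars) * entryCount;

console.log("payload bytes:", payloadBytes);
console.log("payload MB (10^6 bytes):", payloadBytes / 1e6);
console.log("payload MiB (2^20 bytes):", (payloadBytes / 1024 / 1024).toFixed(1));
```

The reproduction divides by 1024 twice, so its printed figures are in MiB; either way the payload is on the order of 100 MB.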
When running with the string->string map, the memory cost is around 3.2 GB and the setup time (to populate the map) is ~10.5s; when running as a string->buffer map, the memory cost is 280 MB and the setup time is ~1.3s. The "time" difference reported is completely explicable (it is the cost of decoding the buffer back into a string on every get).
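The figures above are resident set size, which lumps everything together. Since Buffer contents are accounted for outside the V8 heap (under `external` and `arrayBuffers`) while strings count towards `heapUsed`, a per-field breakdown may make the two runs easier to compare. The helper below is a hypothetical sketch, not part of the reproduction; it only uses fields that `process.memoryUsage()` returns.

```javascript
// Hypothetical helper: report every process.memoryUsage() field in MiB.
function memoryReportMiB() {
  const usage = process.memoryUsage();
  const report = {};
  for (const [field, bytes] of Object.entries(usage)) {
    // Round to one decimal place to keep the output readable.
    report[field] = Number((bytes / 1024 / 1024).toFixed(1));
  }
  return report;
}

// Prints rss, heapTotal, heapUsed, external and arrayBuffers in MiB.
console.log(memoryReportMiB());
```

Calling this in place of the single `rss` log after `gc()` would show whether the growth sits in the V8 heap or in buffer memory.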
Minimal Reproduction
import { setTimeout } from "timers/promises";

const characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// random alphanumeric strings of a specific length
function makeid(length) {
  let result = '';
  const charactersLength = characters.length;
  let counter = 0;
  while (counter < length) {
    result += characters.charAt(Math.floor(Math.random() * charactersLength));
    counter += 1;
  }
  return result;
}

// test setup - 100,000 keys 17 chars long, 1 million lookups, values are 1000 chars long
const keyCount = 100000;
const iterations = 1_000_000;
const keyLength = 17;
const valueLength = 1000;
const keys = new Array(keyCount).fill(0).map(() => makeid(keyLength));

function testMap(map) {
  // setup: populate the map with one random value per key
  const startSetup = performance.now();
  keys.forEach(key => map.set(key, makeid(valueLength)));
  const endSetup = performance.now();

  // lookups: read random keys back out of the map
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    const key = keys[Math.floor(Math.random() * keys.length)];
    const value = map.get(key);
    // v8 optimisation busting - without this the loop is 4x faster due to optimising out the get call
    globalThis.value = value;
  }
  const end = performance.now();
  return { time: end - start, setup: endSetup - startSetup };
}

// a naive implementation that keeps the API the same but converts values into buffers
class ConvertToBufferMap extends Map {
  set(key, value) {
    // return the map, as Map.prototype.set does
    return super.set(key, Buffer.from(value, "utf-8"));
  }
  get(key) {
    return super.get(key)?.toString("utf-8");
  }
}

async function test() {
  // const map = new Map();
  // console.log("map", testMap(map));
  const bufferMap = new ConvertToBufferMap();
  console.log("bufferMap", testMap(bufferMap));
  // gc() is only defined when node is started with --expose-gc
  gc();
  console.log("Memory usage:", process.memoryUsage().rss / 1024 / 1024);
  // pause to go get a heap snapshot or whatever
  await setTimeout(100000);
}

test();
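One hypothesis worth ruling out is that the cost depends on how the value strings are built rather than on strings as such: `makeid` produces each value through 1000 `+=` operations, and V8 can represent the result of a concatenation as a tree of pieces rather than one contiguous string. This is an assumption, not something the measurements here establish. The sketch below is a hypothetical drop-in variant, `makeidFromBuffer`, that fills a Buffer with random character codes and decodes it once, so no string concatenation is involved.

```javascript
const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

// Hypothetical variant of makeid(): same alphabet and length, but the
// string is decoded from a Buffer in one step instead of being built
// with repeated `+=`.
function makeidFromBuffer(length) {
  const bytes = Buffer.allocUnsafe(length);
  for (let i = 0; i < length; i++) {
    // Every byte is overwritten, so allocUnsafe's stale contents never leak.
    bytes[i] = alphabet.charCodeAt(Math.floor(Math.random() * alphabet.length));
  }
  // All codes are ASCII, so latin1 decoding maps each byte to one character.
  return bytes.toString("latin1");
}

console.log(makeidFromBuffer(17));
```

Swapping this in for `makeid(valueLength)` inside `testMap` and re-running the plain `Map` case would show whether the 3.2 GB figure changes when the values are not built by concatenation.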
Output
bufferMap { time: 705.9530600000003, setup: 1303.258812 }
Memory usage: 279.30078125
map { time: 83.8109829999994, setup: 10450.127824000001 }
Memory usage: 3195.6953125
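For scale, the two readings above can be converted into a rough per-entry cost. The sketch below uses only the numbers printed in the output and the 100,000 entries of the test; because RSS also includes the baseline footprint of the process and the keys array, the per-entry figures are upper bounds.

```javascript
// RSS readings in MiB, copied from the output above.
const stringMapRssMiB = 3195.6953125;
const bufferMapRssMiB = 279.30078125;
const entries = 100_000;

// Rough upper bound on the cost of one map entry, in KiB.
const perEntryStringKiB = (stringMapRssMiB * 1024) / entries;
const perEntryBufferKiB = (bufferMapRssMiB * 1024) / entries;

console.log("string map, KiB per entry:", perEntryStringKiB.toFixed(1));
console.log("buffer map, KiB per entry:", perEntryBufferKiB.toFixed(1));
console.log("ratio:", (stringMapRssMiB / bufferMapRssMiB).toFixed(1));
```

Each entry carries about 1 KiB of payload (17 + 1000 characters), so the buffer map lands within a small multiple of the payload while the string map is roughly 30 times larger than it.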
Before You Submit
- I have looked for issues that already exist before submitting this
- My issue follows the guidelines in the README file, and follows the 'How to ask a good question' guide at https://stackoverflow.com/help/how-to-ask