Skip to content

refactor: Consolidate code based on Buffer Presence #394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Sep 18, 2020

Conversation

nbbeeken
Copy link
Contributor

Update code to rely on Buffer class presence, removes redundant code to handle supporting typed arrays internally

NODE-2722

Description

Something of note, TypedArrays are now serialized as BSON Binary. I have a code snippet below that breaks apart the details for the changes, This seems like a bug, but I'm looking for feedback, I think a primary consideration is to think about query-ability, although sometimes plain binary persistence is desired while some other meta data is best kept for querying.

  • Should TypedArrays instead be serialized to BSON arrays with the closest matching type, examples:
    • Float32Arrays become a BSON array of BSON Double.
    • Int32Arrays become BSON array of BSON Int32 meanwhile Uint32Arrays would become BSON Array of int64s
  • Or should TypedArrays be considered binary blobs and kept as a sequence of bytes

I went with the direction of relying on Buffer being present (it's not bundled into BSON). I can however take this work in the opposite direction and eliminate the usage of Buffer in favor of JS TypedArrays and ArrayBuffer. The cost of removing Buffer is roping in either dependencies or vendor-ing code to support the features of Buffer we rely on, for example base64 translations or utf8 parsing. (In order to read an write utf8 text from TypedArrays we would need a TextEncoder / TextDecoder polyfill, for older versions of node)

const BSON = require('PATH_TO_OLD_BSON');
const newBSON = require('PATH_TO_NEW_BSON');

// CURRENT

var b = BSON.serialize({ floats: new Float32Array([12.34, 34, 21.23, 3]) })
console.log(b.toString('hex'))
console.log(BSON.deserialize(b))
/*
36000000        // size
03              // Embedded Document (not array! would be 0x04)
666c6f61747300  // "floats\0"
   29000000           // size (embedded)
   01 3000            // type double and key '0\0'
   0000008014ae2840
   10 3100            // type int32 and key '1\0'
   22000000
   01 3200            // type double and key '2\0'
   00000040e13a3540
   10 3300            // type int32 and key '3\0'
   03000000
   00                 // doc null (embedded)
00              // doc null

{
  floats: { '0': 12.34000015258789, '1': 34, '2': 21.229999542236328, '3': 3 }
}
*/

// NEW

var b = newBSON.serialize({ floats: new Float32Array([12.34, 34, 21.23, 3]) })
console.log(b.toString('hex'))
console.log(newBSON.deserialize(b))
/*

22000000                              // size
05                                    // Binary type
666c6f61747300                        // "floats\0"
   10000000                           // 16 bytes
   00                                 // subtype 0
   a4704541000008420ad7a94100004040   // Bytes
00                                    // doc null

{
  floats: Binary {
    sub_type: 0,
    buffer: <Buffer a4 70 45 41 00 00 08 42 0a d7 a9 41 00 00 40 40>,
    position: 16
  }
}
*/

Another note: It's not easy to round trip the current output, so we should at least try and make this code produce an array I believe:
for example, Float32Array.from({ '0': 12.34000015258789, '1': 34, '2': 21.229999542236328, '3': 3 }) doesn't work, you'd have to manually iterate the keys.

@nbbeeken nbbeeken changed the title refactor(Buffer): ♻️ Consolidate code based on Buffer Presence refactor(Buffer): Consolidate code based on Buffer Presence Sep 15, 2020
0
];
// prettier-ignore
var bytes = [26, 1, 0, 0, 7, 95, 105, 100, 0, 161, 190, 98, 75, 118, 169, 3, 0, 0, 3, 0, 0, 4, 97, 114, 114, 97, 121, 0, 26, 0, 0, 0, 16, 48, 0, 1, 0, 0, 0, 16, 49, 0, 2, 0, 0, 0, 16, 50, 0, 3, 0, 0, 0, 0, 2, 115, 116, 114, 105, 110, 103, 0, 6, 0, 0, 0, 104, 101, 108, 108, 111, 0, 3, 104, 97, 115, 104, 0, 19, 0, 0, 0, 16, 97, 0, 1, 0, 0, 0, 16, 98, 0, 2, 0, 0, 0, 0, 9, 100, 97, 116, 101, 0, 161, 190, 98, 75, 0, 0, 0, 0, 7, 111, 105, 100, 0, 161, 190, 98, 75, 90, 217, 18, 0, 0, 1, 0, 0, 5, 98, 105, 110, 97, 114, 121, 0, 7, 0, 0, 0, 2, 3, 0, 0, 0, 49, 50, 51, 16, 105, 110, 116, 0, 42, 0, 0, 0, 1, 102, 108, 111, 97, 116, 0, 223, 224, 11, 147, 169, 170, 64, 64, 11, 114, 101, 103, 101, 120, 112, 0, 102, 111, 111, 98, 97, 114, 0, 105, 0, 8, 98, 111, 111, 108, 101, 97, 110, 0, 1, 15, 119, 104, 101, 114, 101, 0, 25, 0, 0, 0, 12, 0, 0, 0, 116, 104, 105, 115, 46, 120, 32, 61, 61, 32, 51, 0, 5, 0, 0, 0, 0, 3, 100, 98, 114, 101, 102, 0, 37, 0, 0, 0, 2, 36, 114, 101, 102, 0, 5, 0, 0, 0, 116, 101, 115, 116, 0, 7, 36, 105, 100, 0, 161, 190, 98, 75, 2, 180, 1, 0, 0, 2, 0, 0, 0, 10, 110, 117, 108, 108, 0, 0];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

320 line file -> 38 line file 😌 I felt this was just the right thing to do

// API-extractor makes use of the declarations, npm script should be cleaning these up
"declaration": true,
"declarationMap": true
"declarationMap": true,
"types": []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid any types being relied upon, namely the @types/node I want there to always have to be an import {Buffer} from 'buffer' where needed, and not rely on global namespace types.

@@ -20,10 +20,11 @@
// Generate separate source maps files with sourceContent included
"sourceMap": true,
"inlineSourceMap": false,
"inlineSources": true,
"inlineSources": false,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To sync with the driver, src will be shipped, no need to inline sources.

!(typeof buffer === 'string') &&
!Buffer.isBuffer(buffer) &&
!(buffer instanceof Uint8Array) &&
!ArrayBuffer.isView(buffer) &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayBuffer.isView() returns true for any typed array class, this includes Buffer since it is a subclass of Uint8Array, as well a DataView. ensureBuffer now handles bytes-ifying any typed array / view

Copy link
Contributor

@reggi reggi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯++

src/binary.ts Outdated
this.position = 0;
} else if (typeof buffer === 'string') {
// string
this.buffer = writeStringToArray(buffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we use Buffer.from for everything? It takes array|string|arrayBuffer etc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's anything that could be made more simple here, there's a slight difference to each case.

Somewhat related the ensureBuffer function is used at the top level deserialize functions so I've updated those types:

src/objectid.ts Outdated
} else {
time = this.generationTime;
}
const time = this.id.readUInt32BE(0, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are these false parameters coming from? I don't see them in the node docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops legacy node pre 1.0 arguments, removed.

@@ -368,7 +369,7 @@ function serializeBuffer(
index = index + numberOfWrittenBytes;
buffer[index++] = 0;
// Get size of the buffer (current write point)
const size = value.length;
const size = value.byteLength;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this just a bug? There is a difference between these two values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that any TypedArrays land in this path of serialization it's important to use the byteLength. Float32Array([1.2, 3.4]) has length 2 but byteLength 8. If it were only int8 types arriving here then length and byteLength would be the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that any TypedArrays land in this path

Just in terms of typing right? TypedArrays have always been able to take this path. Does this mean we've been serializing them incorrectly all along? Might be a case for making more of a breaking change if it never worked correctly in the first place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a modification in the serialization that led to this (because the type annotations followed the change it was having me change this to byteLength) this has be fixed

@@ -1,3 +1,14 @@
import { Buffer } from 'buffer';
export type BufferEncoding =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really not provided by the node or buffer types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's there but its not exported, maybe we should just accept string, it would mean not having to ever change this if there was some new format. (waiting on utf32 to take off 😆)

Update code to rely on Buffer class presence, removes redundant code to handle supporting typed arrays internally

TypedArrays are serialized as BSON Binary
If any typed array other than Uint8 is passed as an object value it will be serialized in the same form that JSON supports. Which is a plain object of keys which are stringified numbers that map to each value in the array
@nbbeeken nbbeeken force-pushed the NODE-2722/arraybuffer branch from cccc000 to cb85187 Compare September 18, 2020 15:17
@nbbeeken nbbeeken requested a review from mbroadst September 18, 2020 16:48
Copy link
Member

@mbroadst mbroadst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 🚀

@nbbeeken nbbeeken changed the title refactor(Buffer): Consolidate code based on Buffer Presence refactor: Consolidate code based on Buffer Presence Sep 18, 2020
@nbbeeken nbbeeken merged commit f55eeed into master Sep 18, 2020
@nbbeeken nbbeeken deleted the NODE-2722/arraybuffer branch September 18, 2020 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants