refactor: Consolidate code based on Buffer Presence #394

nbbeeken · 2020-09-15T21:11:09Z

Update code to rely on the presence of the Buffer class, and remove redundant code that handled supporting typed arrays internally.

Description

Something of note: TypedArrays are now serialized as BSON Binary. The code snippet below breaks down the details of the change. This seems like a bug, but I'm looking for feedback. I think a primary consideration is queryability: sometimes plain binary persistence is desired, while other metadata is best kept in a queryable form.

Should TypedArrays instead be serialized as BSON arrays of the closest matching type? For example:
- Float32Arrays would become a BSON array of BSON Doubles.
- Int32Arrays would become a BSON array of BSON Int32s, while Uint32Arrays would become a BSON array of BSON Int64s (uint32 values can exceed the int32 range).
Or should TypedArrays be treated as binary blobs and kept as a sequence of bytes?
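To make the first option concrete, here is a minimal sketch of a per-type mapping. It assumes the mapping proposed above; the entries for Int8/Uint8/Uint8Clamped/Int16/Uint16 and the BigInt arrays are my own extrapolation, and `closestBsonElementType` is a hypothetical helper, not a bson API. The numeric codes are the BSON element type bytes for double (0x01), int32 (0x10) and int64 (0x12).

```javascript
// Hypothetical helper: which BSON element type each TypedArray's values would
// use if the array were serialized as a BSON array instead of Binary.
const BSON_DOUBLE = 0x01;
const BSON_INT32 = 0x10;
const BSON_INT64 = 0x12;

function closestBsonElementType(typedArray) {
  if (typedArray instanceof Float32Array || typedArray instanceof Float64Array) {
    return BSON_DOUBLE; // float32 and float64 values are exactly representable as doubles
  }
  if (
    typedArray instanceof Int8Array ||
    typedArray instanceof Uint8Array || // also matches Buffer; listed for completeness only
    typedArray instanceof Uint8ClampedArray || // not a Uint8Array subclass, so checked separately
    typedArray instanceof Int16Array ||
    typedArray instanceof Uint16Array ||
    typedArray instanceof Int32Array
  ) {
    return BSON_INT32; // every value fits in a signed 32-bit integer
  }
  if (typedArray instanceof Uint32Array || typedArray instanceof BigInt64Array) {
    return BSON_INT64; // uint32 values can exceed 2^31 - 1, so int32 is too small
  }
  // BigUint64Array values can exceed int64's range: no lossless numeric type exists.
  throw new TypeError(`No lossless BSON element type for ${typedArray.constructor.name}`);
}

console.log(closestBsonElementType(new Uint32Array([4294967295])) === BSON_INT64); // true
```

One wrinkle this exposes: BigUint64Array has no lossless BSON numeric type, so under this option it would still need to fall back to Binary or throw.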

I went in the direction of relying on Buffer being present (it is not bundled into BSON). However, I can take this work in the opposite direction and eliminate Buffer in favor of JS TypedArrays and ArrayBuffer. The cost of removing Buffer is pulling in dependencies or vendoring code to support the Buffer features we rely on, such as base64 conversion and utf8 parsing. (To read and write utf8 text with TypedArrays, older versions of Node would need a TextEncoder / TextDecoder polyfill.)
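For a sense of what replacing Buffer would involve for utf8, here is a small comparison, assuming a runtime where TextEncoder / TextDecoder are globals (Node 11+; Node 8.3 to 10 exposes them on the `util` module instead). Both paths produce the same bytes; base64 would still need vendored code or a dependency, as noted above.

```javascript
// utf8 via Buffer versus the WHATWG TextEncoder/TextDecoder that would
// replace it if Buffer were dropped. Assumes TextEncoder/TextDecoder are globals.
const text = 'héllo ✓';

const viaBuffer = Buffer.from(text, 'utf8');
const viaEncoder = new TextEncoder().encode(text); // a Uint8Array; always utf8

// Buffer.compare accepts plain Uint8Arrays as well as Buffers.
console.log(Buffer.compare(viaBuffer, viaEncoder) === 0); // true: identical bytes

const decoded = new TextDecoder('utf-8').decode(viaEncoder);
console.log(decoded === viaBuffer.toString('utf8')); // true
```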

const BSON = require('PATH_TO_OLD_BSON');
const newBSON = require('PATH_TO_NEW_BSON');

// CURRENT

var b = BSON.serialize({ floats: new Float32Array([12.34, 34, 21.23, 3]) })
console.log(b.toString('hex'))
console.log(BSON.deserialize(b))
/*
36000000        // size
03              // Embedded Document (not array! would be 0x04)
666c6f61747300  // "floats\0"
   29000000           // size (embedded)
   01 3000            // type double and key '0\0'
   0000008014ae2840
   10 3100            // type int32 and key '1\0'
   22000000
   01 3200            // type double and key '2\0'
   00000040e13a3540
   10 3300            // type int32 and key '3\0'
   03000000
   00                 // document terminator (embedded)
00              // document terminator

{
  floats: { '0': 12.34000015258789, '1': 34, '2': 21.229999542236328, '3': 3 }
}
*/

// NEW

var b = newBSON.serialize({ floats: new Float32Array([12.34, 34, 21.23, 3]) })
console.log(b.toString('hex'))
console.log(newBSON.deserialize(b))
/*

22000000                              // size
05                                    // Binary type
666c6f61747300                        // "floats\0"
   10000000                           // 16 bytes
   00                                 // subtype 0
   a4704541000008420ad7a94100004040   // Bytes
00                                    // document terminator

{
  floats: Binary {
    sub_type: 0,
    buffer: <Buffer a4 70 45 41 00 00 08 42 0a d7 a9 41 00 00 40 40>,
    position: 16
  }
}
*/
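To double-check the annotations in the NEW dump, the bytes can be walked by hand with Buffer's read methods. This is a minimal sketch that hard-codes the hex from the output above; it only handles this one-element document, not BSON in general.

```javascript
// Re-parse the 34-byte NEW document following the annotated layout.
const doc = Buffer.from(
  '22000000' + '05' + '666c6f61747300' + '10000000' + '00' +
    'a4704541000008420ad7a94100004040' + '00',
  'hex'
);

let offset = 0;
const docSize = doc.readInt32LE(offset); // total document size, little-endian int32
offset += 4;
const elementType = doc[offset]; // 0x05 = Binary
offset += 1;
const keyEnd = doc.indexOf(0, offset); // keys are null-terminated cstrings
const key = doc.toString('utf8', offset, keyEnd);
offset = keyEnd + 1;
const binLength = doc.readInt32LE(offset); // payload length in bytes
offset += 4;
const subtype = doc[offset]; // 0x00 = generic binary
offset += 1;

// The payload looks like the Float32Array's raw memory; the dump was produced
// on a little-endian machine, so readFloatLE decodes it.
const floats = [];
for (let i = 0; i < binLength; i += 4) {
  floats.push(doc.readFloatLE(offset + i));
}
offset += binLength;

console.log({ docSize, elementType, key, binLength, subtype, floats, terminator: doc[offset] });
```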

Another note: the current output is not easy to round-trip, so I believe we should at least try to make this code produce an array.
For example, Float32Array.from({ '0': 12.34000015258789, '1': 34, '2': 21.229999542236328, '3': 3 }) doesn't work (the object has no length and isn't iterable, so the result is an empty array); you'd have to iterate the keys manually.
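The round-trip problem in a few lines: `Float32Array.from()` treats the deserialized object as an array-like with no `length`, so it silently returns an empty array rather than throwing. The helper below is hypothetical and assumes the keys are exactly the contiguous indices '0' through 'n-1'.

```javascript
// The old serializer's output for new Float32Array([12.34, 34, 21.23, 3]).
const oldForm = { '0': 12.34000015258789, '1': 34, '2': 21.229999542236328, '3': 3 };

// Not iterable and no `length`, so it is read as an array-like of length 0.
console.log(Float32Array.from(oldForm).length); // 0

// Hypothetical helper: rebuild the array by iterating the index keys manually.
function float32ArrayFromIndexedObject(obj) {
  const n = Object.keys(obj).length;
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    out[i] = obj[String(i)];
  }
  return out;
}

const restored = float32ArrayFromIndexedObject(oldForm);
console.log(restored.length); // 4
```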

nbbeeken · 2020-09-15T21:14:32Z

test/node/promote_values_test.js

-      0
-    ];
+    // prettier-ignore
+    var bytes = [26, 1, 0, 0, 7, 95, 105, 100, 0, 161, 190, 98, 75, 118, 169, 3, 0, 0, 3, 0, 0, 4, 97, 114, 114, 97, 121, 0, 26, 0, 0, 0, 16, 48, 0, 1, 0, 0, 0, 16, 49, 0, 2, 0, 0, 0, 16, 50, 0, 3, 0, 0, 0, 0, 2, 115, 116, 114, 105, 110, 103, 0, 6, 0, 0, 0, 104, 101, 108, 108, 111, 0, 3, 104, 97, 115, 104, 0, 19, 0, 0, 0, 16, 97, 0, 1, 0, 0, 0, 16, 98, 0, 2, 0, 0, 0, 0, 9, 100, 97, 116, 101, 0, 161, 190, 98, 75, 0, 0, 0, 0, 7, 111, 105, 100, 0, 161, 190, 98, 75, 90, 217, 18, 0, 0, 1, 0, 0, 5, 98, 105, 110, 97, 114, 121, 0, 7, 0, 0, 0, 2, 3, 0, 0, 0, 49, 50, 51, 16, 105, 110, 116, 0, 42, 0, 0, 0, 1, 102, 108, 111, 97, 116, 0, 223, 224, 11, 147, 169, 170, 64, 64, 11, 114, 101, 103, 101, 120, 112, 0, 102, 111, 111, 98, 97, 114, 0, 105, 0, 8, 98, 111, 111, 108, 101, 97, 110, 0, 1, 15, 119, 104, 101, 114, 101, 0, 25, 0, 0, 0, 12, 0, 0, 0, 116, 104, 105, 115, 46, 120, 32, 61, 61, 32, 51, 0, 5, 0, 0, 0, 0, 3, 100, 98, 114, 101, 102, 0, 37, 0, 0, 0, 2, 36, 114, 101, 102, 0, 5, 0, 0, 0, 116, 101, 115, 116, 0, 7, 36, 105, 100, 0, 161, 190, 98, 75, 2, 180, 1, 0, 0, 2, 0, 0, 0, 10, 110, 117, 108, 108, 0, 0];


320-line file -> 38-line file 😌 It felt like the right thing to do.

nbbeeken · 2020-09-15T21:16:17Z

tsconfig.json

    // API-extractor makes use of the declarations, npm script should be cleaning these up
    "declaration": true,
-    "declarationMap": true
+    "declarationMap": true,
+    "types": []


This prevents any ambient types, namely @types/node, from being relied upon. I want every file that needs Buffer to explicitly import { Buffer } from 'buffer' rather than rely on global namespace types.
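For context: with `"types": []`, TypeScript stops auto-including every visible `@types/*` package, so Node's ambient `Buffer` global is no longer in scope for type checking. At runtime in Node nothing changes, since the `buffer` module exports the same class as the global, as this quick check (plain Node, CommonJS) shows:

```javascript
// In Node, the explicit import and the global refer to the same class, so
// requiring an explicit import only affects type checking, not behavior.
const { Buffer: ImportedBuffer } = require('buffer');

console.log(ImportedBuffer === globalThis.Buffer); // true in Node
```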

nbbeeken · 2020-09-15T21:16:53Z

tsconfig.json

@@ -20,10 +20,11 @@
    // Generate separate source maps files with sourceContent included
    "sourceMap": true,
    "inlineSourceMap": false,
-    "inlineSources": true,
+    "inlineSources": false,


To stay in sync with the driver: src will be shipped, so there is no need to inline sources.

nbbeeken · 2020-09-15T21:18:47Z

src/binary.ts

      !(typeof buffer === 'string') &&
-      !Buffer.isBuffer(buffer) &&
-      !(buffer instanceof Uint8Array) &&
+      !ArrayBuffer.isView(buffer) &&


ArrayBuffer.isView() returns true for any typed array instance, which includes Buffer (a subclass of Uint8Array), as well as for DataView. ensureBuffer now handles converting any typed array or view into bytes.
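A quick illustration of which values `ArrayBuffer.isView()` accepts, plus a sketch of turning any view into a Buffer over exactly the bytes it covers. `viewToBuffer` is a hypothetical stand-in, not the actual `ensureBuffer` in src.

```javascript
// true for every TypedArray (Buffer included) and DataView; false otherwise.
console.log(ArrayBuffer.isView(Buffer.alloc(2))); // true
console.log(ArrayBuffer.isView(new Float32Array(2))); // true
console.log(ArrayBuffer.isView(new DataView(new ArrayBuffer(2)))); // true
console.log(ArrayBuffer.isView(new ArrayBuffer(2))); // false
console.log(ArrayBuffer.isView([1, 2])); // false

// Hypothetical helper: wrap exactly the bytes a view covers, without copying.
// Using byteOffset/byteLength (not length) keeps multi-byte element types intact.
function viewToBuffer(view) {
  if (!ArrayBuffer.isView(view)) {
    throw new TypeError('expected a TypedArray or DataView');
  }
  return Buffer.from(view.buffer, view.byteOffset, view.byteLength);
}

console.log(viewToBuffer(new Float32Array([1, 2])).length); // 8
```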

reggi

💯++

src/binary.ts

mbroadst · 2020-09-16T18:32:37Z

src/binary.ts

+      this.position = 0;
+    } else if (typeof buffer === 'string') {
+      // string
+      this.buffer = writeStringToArray(buffer);


Can't we use Buffer.from for everything? It takes an array, a string, an ArrayBuffer, etc.

I don't think this can be simplified further; there's a slight difference in each case.

Somewhat related: the ensureBuffer function is used in the top-level deserialize functions, so I've updated those types:

bson.ts:202

bson.ts:245
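On the question of using `Buffer.from` for everything: its overloads do behave differently depending on the input, which is one way to read "a slight difference in each case". A small sketch of those differences (standard Node Buffer behavior, not bson code):

```javascript
// Buffer.from(ArrayBuffer) shares memory; Buffer.from(Uint8Array) copies it.
const ab = new ArrayBuffer(4);
const sharesMemory = Buffer.from(ab); // a view over `ab`, no copy
const copiesBytes = Buffer.from(new Uint8Array(ab)); // an independent copy
new Uint8Array(ab)[0] = 0xff;
console.log(sharesMemory[0], copiesBytes[0]); // 255 0

// A non-Uint8 TypedArray is treated like a plain array of numbers: each
// element value is truncated to one byte instead of copying the raw bytes.
console.log([...Buffer.from(new Float32Array([1.5, 300]))]); // [ 1, 44 ]

// A string is encoded (utf8 by default).
console.log([...Buffer.from('hi')]); // [ 104, 105 ]
```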

src/binary.ts

mbroadst · 2020-09-16T19:22:45Z

src/objectid.ts

-    } else {
-      time = this.generationTime;
-    }
+    const time = this.id.readUInt32BE(0, false);


Where are these false parameters coming from? I don't see them in the Node docs.

Oops, those are the legacy noAssert arguments from older Node versions (removed in Node 10). Removed.
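For reference, the line in question reads the ObjectId's creation time: its first 4 bytes are a big-endian count of seconds since the Unix epoch. A minimal sketch; `objectIdTimestamp` and the id bytes below are made up for illustration.

```javascript
// Hypothetical helper: decode the timestamp portion of a 12-byte ObjectId.
// readUInt32BE takes only the offset; there is no trailing noAssert flag.
function objectIdTimestamp(idBytes) {
  const seconds = idBytes.readUInt32BE(0);
  return new Date(seconds * 1000);
}

// Made-up id: 0x5f612cfd seconds followed by 8 zero bytes.
const id = Buffer.from('5f612cfd0000000000000000', 'hex');
console.log(id.readUInt32BE(0) === 0x5f612cfd); // true
console.log(objectIdTimestamp(id).toISOString()); // 2020-09-15T21:07:09.000Z
```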

src/parser/deserializer.ts

mbroadst · 2020-09-16T19:27:20Z

src/parser/serializer.ts

@@ -368,7 +369,7 @@ function serializeBuffer(
  index = index + numberOfWrittenBytes;
  buffer[index++] = 0;
  // Get size of the buffer (current write point)
-  const size = value.length;
+  const size = value.byteLength;


Was this just a bug? There is a difference between these two values.

Now that any TypedArray can land in this serialization path, it's important to use byteLength: new Float32Array([1.2, 3.4]) has length 2 but byteLength 8. If only 8-bit arrays arrived here, length and byteLength would be the same.
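The length vs byteLength distinction, concretely (plain TypedArray behavior, independent of bson):

```javascript
// length counts elements; byteLength counts bytes. They agree only for 8-bit arrays.
const floats = new Float32Array([1.2, 3.4]);
console.log(floats.length, floats.byteLength); // 2 8

const bytes = new Uint8Array([1, 2]);
console.log(bytes.length, bytes.byteLength); // 2 2

// Sizing a raw byte copy by `length` would drop 6 of the float view's 8 bytes.
const raw = Buffer.from(floats.buffer, floats.byteOffset, floats.byteLength);
console.log(raw.length); // 8
```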

Now that any TypedArrays land in this path

Just in terms of typing, right? TypedArrays have always been able to take this path. Does this mean we've been serializing them incorrectly all along? That might be a case for making more of a breaking change, if it never worked correctly in the first place.

A modification in the serialization led to this: the type annotations followed that change and were pushing me to use byteLength. This has been fixed.

mbroadst · 2020-09-16T19:30:13Z

src/parser/utils.ts

@@ -1,3 +1,14 @@
+import { Buffer } from 'buffer';
+export type BufferEncoding =


Is this really not provided by the node or buffer types?

It's there, but it's not exported. Maybe we should just accept string; then we'd never have to change this if a new encoding came along. (Waiting on utf32 to take off 😆)
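If the parameter were widened to `string`, one option would be to validate at runtime with `Buffer.isEncoding()`, which Node provides. A sketch with a hypothetical `decodeBytes` helper (not part of the bson API):

```javascript
// Hypothetical helper: accept any string as the encoding, but reject names
// Node's Buffer does not support before calling toString().
function decodeBytes(buf, encoding = 'utf8') {
  if (!Buffer.isEncoding(encoding)) {
    throw new TypeError(`Unknown encoding: ${encoding}`);
  }
  return buf.toString(encoding);
}

console.log(decodeBytes(Buffer.from('hello'), 'hex')); // 68656c6c6f
console.log(Buffer.isEncoding('utf32')); // false
```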

src/parser/utils.ts

src/binary.ts

src/fnv1a.ts


mbroadst

🎉 🚀

nbbeeken requested review from reggi, emadum and mbroadst September 15, 2020 21:11

nbbeeken changed the title ~~refactor(Buffer): ♻️ Consolidate code based on Buffer Presence~~ refactor(Buffer): Consolidate code based on Buffer Presence Sep 15, 2020

nbbeeken commented Sep 15, 2020

reggi approved these changes Sep 16, 2020

mbroadst suggested changes Sep 16, 2020

reggi mentioned this pull request Sep 17, 2020

NODE-2724/types-against-driver #395

Merged

mbroadst reviewed Sep 17, 2020

src/binary.ts Outdated

src/fnv1a.ts Outdated

nbbeeken added 7 commits September 18, 2020 11:15

refactor(Buffer): ♻️ Consolidate code based on Buffer Presence

6e77012

Update code to rely on Buffer class presence, removes redundant code to handle supporting typed arrays internally

TypedArrays are serialized as BSON Binary

fix: 🎨 Address comments

dfef14e

fix: cleanup unused var

9a38ebf

fix: 🐛 Fix Binary EJSON interface

379df56

feat: 🎨 Add more flexible typing to top level API

ca0b6da

fix: 🔥 Remove support for any typed array

e12930d

If any typed array other than Uint8 is passed as an object value, it will be serialized in the same form that JSON supports: a plain object whose keys are stringified numbers mapping to each value in the array.
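The "same form that JSON supports" can be checked directly: TypedArrays are not Arrays, so `JSON.stringify()` emits an object keyed by stringified indices.

```javascript
// TypedArrays are not Arrays, so JSON serializes their index properties as object keys.
const f32 = new Float32Array([1, 2.5]);
console.log(Array.isArray(f32)); // false
console.log(JSON.stringify(f32)); // {"0":1,"1":2.5}
console.log(JSON.stringify([1, 2.5])); // [1,2.5]
```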

test: 🐛 Test for object with stringified numbers as keys for typed arrays

cb85187

nbbeeken force-pushed the NODE-2722/arraybuffer branch from cccc000 to cb85187 September 18, 2020 15:17

nbbeeken added 3 commits September 18, 2020 11:29

fix legacy buffer read/write calls

8b6b902

Revert ArrayBuffer serialization changes

2833da1

Fix arraybuffer test

2637c77

nbbeeken requested a review from mbroadst September 18, 2020 16:48

mbroadst approved these changes Sep 18, 2020

nbbeeken changed the title ~~refactor(Buffer): Consolidate code based on Buffer Presence~~ refactor: Consolidate code based on Buffer Presence Sep 18, 2020

nbbeeken merged commit f55eeed into master Sep 18, 2020

nbbeeken deleted the NODE-2722/arraybuffer branch September 18, 2020 20:59

nbbeeken mentioned this pull request Feb 1, 2024

fix(NODE-5873): objectId symbol property not defined on instances from cross cjs and mjs #643

Merged
