
Documentation for copy from/to #230


Closed
porsager opened this issue Sep 19, 2021 · 13 comments · Fixed by #259
@porsager
Owner

If anyone would like to contribute some documentation for the copy from/to streams features added, I'd be very grateful.

Maybe you wanna take a look at it @uasan ?
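For anyone picking this up, here is a rough sketch of the kind of example such documentation could show. The `.writable()`/`.readable()` method names on a `sql` copy query are assumptions based on the feature as merged (check against the actual API), and the encoder helper is hypothetical, only illustrating COPY's text format:

```javascript
// Hypothetical helper: encode one record as a line of PostgreSQL's COPY
// text format (tab-separated columns, \N marking NULL).
function toCopyRow(values) {
  return values.map(v => v === null ? '\\N' : String(v)).join('\t') + '\n'
}

// Assumed usage against a live database (method names unverified):
//
//   const sql = postgres({ /* connection options */ })
//
//   // COPY ... FROM STDIN: stream rows into a table
//   const writable = await sql`copy users (name, age) from stdin`.writable()
//   writable.write(toCopyRow(['Alice', 42]))
//   writable.end()
//
//   // COPY ... TO STDOUT: stream a table out
//   const readable = await sql`copy users to stdout`.readable()
//   readable.pipe(process.stdout)
```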

@porsager porsager added the documentation label Sep 19, 2021
@porsager porsager added this to the v2 milestone Sep 19, 2021
@uasan

uasan commented Sep 19, 2021

Maybe you wanna take a look at it @uasan ?

I could take a look, but we have started developing our own Postgres client.

Features of our client:

  1. Binary protocol, with support for all types via ParameterDescription and RowDescription
  2. Async request flow: we do not wait for ReadyForQuery before sending the next request
  3. WebStreams based on promises instead of callback-based Node.js streams

By the way, thanks to the second point, our results in the benchmark are about 2x better :)

@porsager
Owner Author

Ah interesting.

Anything you're gonna share at some point? Always interesting to see other takes on something like this.

  1. Cool. I've also explored adding support for the binary protocol, so would be nice to hear your experiences doing that.
  2. Yeah that's how Postgres.js does it
  3. Care to elaborate? Not sure I understand.

@uasan

uasan commented Sep 19, 2021

  1. The binary protocol gives better typing for nested structures: arrays and records store the oids of their element types, so you do not need to parse a string. It also needs fewer bytes for some data types, for example uuid is 16 bytes instead of 36 in text form, and timestamp is 8 bytes instead of 24
  2. When I tested, for example, await Promise.all([sql`SELECT pg_sleep(3)`, sql`SELECT 2`]), the second request was only written to the socket after 3 seconds, when the response to the first request arrived, but you can write to the socket without waiting for responses to previous requests
  3. For COPY and other APIs I use https://nodejs.org/api/webstreams.html instead of https://nodejs.org/api/stream.html
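To illustrate the size difference from point 1, here is a standalone sketch (not either client's actual code) decoding binary-format uuid and timestamp wire values with plain Node.js Buffers:

```javascript
// Decode a binary-format uuid (16 bytes on the wire) into its
// 36-character text form.
function decodeUuid(buf) {
  const hex = buf.toString('hex')
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20)].join('-')
}

// Decode a binary-format timestamp: an int64 of microseconds since the
// PostgreSQL epoch 2000-01-01, 8 bytes instead of a ~24-byte text string.
const PG_EPOCH_MS = Date.UTC(2000, 0, 1)
function decodeTimestamp(buf) {
  const micros = buf.readBigInt64BE(0)
  return new Date(PG_EPOCH_MS + Number(micros / 1000n))
}
```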

@porsager
Owner Author

  1. Neat, and you haven't bumped into anything unexpected yet?
  2. How did you check that the 2nd was only sent after 3 seconds?? Would love to see a reproducible case of that happening
  3. Ah for COPY, I see..
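As a minimal illustration of point 3, a promise-based WebStream of rows can be built and consumed like this in Node 18+, where ReadableStream is a global (the names here are illustrative, not either client's API):

```javascript
// Build a WHATWG ReadableStream that yields one row object per pull.
function rowStream(rows) {
  let i = 0
  return new ReadableStream({
    pull(controller) {
      i < rows.length ? controller.enqueue(rows[i++]) : controller.close()
    }
  })
}

// WHATWG streams are async-iterable in Node, so consumption is just
// for await, no 'data'/'end' callbacks as with Node.js streams.
async function collect(stream) {
  const out = []
  for await (const chunk of stream) out.push(chunk)
  return out
}
```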

@uasan

uasan commented Sep 19, 2021

How did you check that the 2nd was only sent after 3 seconds?? Would love to see a reproducible case of that happening

Just add a console.log() in your source code where the write to the socket happens. Maybe I was mistaken, but you should check whether this is a bug; after fixing it, your results in the benchmark will improve
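A database isn't needed to reason about the claim; here is a toy model (purely illustrative, not either client's code) of what pipelined writes look like on the wire:

```javascript
// Toy model: 'send' writes to the wire immediately and resolves when the
// simulated server responds after latencyMs.
const wire = []
function send(name, latencyMs) {
  wire.push('sent:' + name)
  return new Promise(resolve => setTimeout(() => {
    wire.push('recv:' + name)
    resolve()
  }, latencyMs))
}

// Pipelined: both queries reach the wire before either response comes
// back. A client that waited for ReadyForQuery between queries would
// instead interleave sent/recv pairs.
async function demo() {
  await Promise.all([send('pg_sleep', 30), send('select 2', 5)])
  return wire
}
```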

@uasan

uasan commented Sep 19, 2021

Neat, and you haven't bumped into anything unexpected yet?

Yes. One thing: it will not work to give up the text format entirely, because there are data types that do not have a binary format.

@porsager
Owner Author

Everything looks good from here, but if you can repro, be sure to let me know!

If you're seeing a 2x improvement compared to Postgres.js I'd love to know!

Right, I thought there was some downside which kept me from diving deeper back then. I suppose it could be neat as an opt-in for those cases where the types work and the performance benefit is worth it.

@uasan

uasan commented Sep 19, 2021

Everything looks good from here, but if you can repro, be sure to let me know!

Strange, because that was the only explanation I had for such a difference in our speed.
If I can open the source, I'll let you know.

@uasan

uasan commented Sep 25, 2021

Hello @porsager, I have published an alpha version of my Postgres client:
https://github.com/uasan/postgres

I wrote a benchmark script; it utilizes the processor at 100%, and the number of RPS is displayed in the console.

node ./src/tests/bench.js

To benchmark your client, you need to pass the parameter postgres to this script:

node ./src/tests/bench.js postgres

It will show the number of RPS of your client; the difference is more than 2x.
I am interested to understand the reason for this difference. If you have any questions, write an issue in my repo.

There is also a test script for binary encoding/decoding of parameters, which you may find interesting:

node ./src/tests/types.js

@porsager
Owner Author

porsager commented Sep 26, 2021

Hey @uasan .. That's awesome! Good job!

I see a 35% improvement, and not 100% as you mention, when I activate prepared statements for unsafe (you need to pass { prepare: true } in the options). Prepared statements are only enabled by default when using sql``.

That's still really cool though — I suppose it's due to using the binary protocol instead of only text? What do you think?

@uasan

uasan commented Sep 26, 2021

(you need to pass { prepare: true })

Ok, I didn't know :)

That's still really cool though — I suppose it's due to using the binary protocol instead of only text? What do you think?

The binary protocol gives a speedup on large responses; in this benchmark the response size is very small, so I think the difference is due to more optimal I/O and reuse of buffers.
But I have only just started optimizing, all the fun is still ahead.
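As a rough illustration of the buffer-reuse idea mentioned above (not the actual client code), a write buffer can be allocated once, grown geometrically, and reset between messages instead of allocating a fresh Buffer per write:

```javascript
// Minimal reusable write buffer: one underlying allocation, reused
// across messages; grows only when a write would overflow it.
class WriteBuffer {
  constructor(size = 1024) { this.buf = Buffer.alloc(size); this.len = 0 }
  reserve(n) {
    if (this.len + n <= this.buf.length) return
    const next = Buffer.alloc(Math.max(this.buf.length * 2, this.len + n))
    this.buf.copy(next, 0, 0, this.len)
    this.buf = next
  }
  writeInt32(v) { this.reserve(4); this.buf.writeInt32BE(v, this.len); this.len += 4 }
  // Return a view of the written bytes and reset for the next message.
  flush() { const out = this.buf.subarray(0, this.len); this.len = 0; return out }
}
```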

@uasan

uasan commented Sep 26, 2021

If performance.eventLoopUtilization().utilization shows 100% utilization, it means that instead of sending and receiving new requests to Postgres, JS is executing many synchronous operations: that is the bottleneck.
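A quick way to see this metric in action (plain Node >= 16, where perf_hooks' performance object is available as a global):

```javascript
// Snapshot event-loop utilization, burn CPU synchronously, then take the
// delta relative to the snapshot.
const before = performance.eventLoopUtilization()
const end = Date.now() + 50
while (Date.now() < end) {}  // 50 ms of pure synchronous work

// Passing the earlier snapshot yields the delta since that point.
const elu = performance.eventLoopUtilization(before)
// During a synchronous burst the loop accrues no idle time, so the delta
// utilization approaches 1: the loop is busy computing, not waiting on I/O.
```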

@porsager
Owner Author

porsager commented Sep 27, 2021

You're right, it's not because of binary. I dug a bit deeper yesterday, and I actually found 2 things I was doing wrong.

The first was that I sent Flush before Sync after Execute. Simply fixing that gave about 15%.
Then I was also unnecessarily calling Describe for each query, so that was another 10%..
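For context: Flush ('H') and Sync ('S') are both parameterless frontend messages in the extended query protocol, each framed as one type byte plus an int32 length that counts itself. The sketch below only illustrates that framing, so the cost of the redundant message is one extra 5-byte frame (plus backend work) per query:

```javascript
// Frame a parameterless frontend message: one ASCII type byte, then an
// int32 length that includes the length field itself (so always 4 here).
function frame(type) {
  const buf = Buffer.alloc(5)
  buf[0] = type.charCodeAt(0)
  buf.writeInt32BE(4, 1)
  return buf
}

const sync = frame('S')   // ends the pipeline unit; backend answers ReadyForQuery
const flush = frame('H')  // asks the backend to flush pending output mid-unit
```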

It's gonna be nice to get these fixes in, so thanks a lot for bringing this to my attention!

@porsager porsager mentioned this issue Jan 11, 2022
Merged