add mutable configuration to optimize ReadMessage performance #843

lesismal · 2023-07-22T11:12:12Z

Summary of Changes

add mutable configuration for Upgrader and Conn
support reusing the same buffer to read a message and then improve ReadMessage performance
new configuration field is set to false by default to compatible with old versions

Explain
The common processes for the ws message is:

Read a message;
Unmarshal the message to another variable(struct or other types), usually we are still in the same goroutine as step 1;
Use the variable.

The current version, ReadMessage calls ReadAll, which uses new buffer to read data every time.
But we usually drop the buffer after step 2, and it wastes too much.
ReadAll's append is also slow when the message is bigger than 512:
https://github.com/golang/go/blob/master/src/io/io.go#L702
https://github.com/golang/go/blob/master/src/io/io.go#L715

PS: Make sure your PR includes/updates tests! If you need help with this part, just ask!

package tests passed.
I also add benchmarks for the new feature and got better performance compared with old usage:

pkg: github.com/gorilla/websocket
cpu: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
BenchmarkReadMessageReadAll-8   	  811484	      1473 ns/op	    2896 B/op	       6 allocs/op
BenchmarkReadMessageMutable-8   	 2054629	       581.2 ns/op	      80 B/op	       3 allocs/op
PASS
ok  	github.com/gorilla/websocket	3.008s

lesismal · 2023-07-22T11:53:55Z

Most of the users use ReadMessage directly, including some well-designed and well-known frameworks such as melody:
https://github.com/olahol/melody/blob/master/session.go#L129

And for most of the users, it's a little complex to optimize the ReadMessage by using NextReader.
It will be much more convenient if provide this configuration.

FZambia · 2023-07-22T12:39:40Z

Hello!

Does this mean that every connection will bring an additional constant memory overhead equal to the largest message read from the connection? Given that memory consumption of Gorilla WebSocket lib is one of the most difficult things to solve by itself at this point it's hard to justify additional overheads IMO.

As you pointed by using NextReader it seems already possible to achieve a similar effect and avoid attaching additional slice of bytes to the connection.

lesismal · 2023-07-22T12:53:26Z

Does this mean that every connection will bring an additional constant memory overhead equal to the largest message read from the connection?

Yes. But performs much better.
Here I write some benchmark to compare different frameworks:
https://github.com/lesismal/go-websocket-benchmark

In the beginning, I use ReadMessage directly for gorilla, and it performs very slowly. Maybe that's why nhooyr says in the doc:

1.75x faster WebSocket masking implementation in pure Go
Gorilla's implementation is slower and uses [unsafe](https://golang.org/pkg/unsafe/).

As you pointed by using NextReader it seems already possible to achieve a similar effect and avoid attaching additional slice of bytes to the connection.

Yes, NextReader can do it. But as I said:

And for most of the users, it's a little complex to optimize the ReadMessage by using NextReader.
It will be much more convenient if provide this configuration.

@FZambia

lesismal · 2023-07-22T13:01:00Z

Given that memory consumption of Gorilla WebSocket lib is one of the most difficult things to solve by itself at this point it's hard to justify additional overheads IMO.

@FZambia
As for the memory cost problem, all std-based frameworks are not easy to solve the problem.
But can try my lib using nonblocking mod: https://github.com/lesismal/nbio

Here's also a simple benchmark for 1m-websocket-connections on low hardware such as server running on 4 cpu cores and costing only 2g mem, for more details please refer to the code and script:
https://github.com/lesismal/go-websocket-benchmark
https://github.com/lesismal/go-websocket-benchmark/blob/main/script/1m_conns_benchmark.sh

lesismal · 2023-07-22T14:17:53Z

If worry about holding large buffer, can easily reset the buffer just before each calling for ReadMessage:

// Of course we can get the buffer from some pool
conn.SetMutableBuffer(make([]byte, InitReadBufferSize))
for {
	if len(conn.MutableBuffer()) > MaxReadBufferSize {
		// Maybe put the old conn.MutableBuffer() to some pool first

		// Of course we can get the buffer from some pool
		conn.SetMutableBuffer(make([]byte, InitReadBufferSize))
	}
	mt, msg, err := conn.ReadMessage()
	if err {
		//...
		return
	}
}

That's still much easier than using NextReader.

ghost · 2023-07-22T15:17:02Z

This comment proposes an alternative to the PR. The benefits of this alternative compared to the PR are listed below. If buffer reuse is a problem that needs to be solved, then this alternative should be considered.

Here's how to reuse a buffer using the package as is:

var buf bytes.Buffer
for {
	t, r, err := conn.NextReader()
	if err != nil {
		return
	}
	buf.Reset()
	_, err  = buf.ReadFrom(r)
	if err != nil {
		return
	}
	fmt.Printf("type=%d, message=%s\n", t, buf.Bytes())
}

This code is easy to write and understand, but bundling it up in a helper method might be nice:

var buf bytes.Buffer
for {
    t, err := conn.ReadMessageTo(&buf)
    if err != nil { return }
    fmt.Printf("type=%d, message=%s\n", t, buf.Bytes())
}

The helper method is:

// ReadMessageTo is a helper method that resets the buffer argument and 
// copies the next message to the buffer.
func (c *Conn) ReadMessageTo(buf *bytes.Buffer) (messageType int, err error) { 
	buf.Reset()
	t, r, err := conn.NextReader()
	if err != nil {
		return t, err
	}
	_, err  = buf.ReadFrom(r)
	return t, err
}

This proposal has the following benefits over the PR:

No "action at a distance" where a change to the configuration changes the functionality of a method.
The feature is exposed in a single standalone method. The PR adds four methods and an Upgrader field.

ReadMessage, ReadJSON, and my proposed ReadMessageTo should be helper functions instead of helper methods. Because we already have ReadMessage and ReadJSON methods, ReadMessageTo should be a method for consistency.

Here's how to limit the size of the buffer:

buf := bytes.NewBuffer(make([]byte, initBufferSize)
for {
    t, err := conn.ReadMessageTo(&buf)
    if err != nil { return }
    fmt.Printf("type=%d, message=%s\n", t, buf.Bytes())
    if buf.Len() > MaxReadBufferSize {
      buf = bytes.NewBuffer(make([]byte, initBufferSize)
    }
}

lesismal · 2023-07-22T16:03:05Z

@mylo88

bytes.Buffer version implements similar mem usage but cost a little more CPU logic than raw []byte operations.

If users need to call the new method ReadMessageTo, I don't think it's more convenient than this pr, and not get a []byte directly but need to use buf.Bytes(), that seems changed more than old version and this pr.

coreydaley · 2023-07-22T16:06:17Z

Thank you for the pull request! We will put it in the queue to review as soon as we have completed the project transition at the end of July!

ghost · 2023-07-22T16:48:54Z

bytes.Buffer version implements similar mem usage but cost a little more CPU logic than raw []byte operations.

The performance of bytes.Buffer vs []byte is not a concern. Any extra runtime cost for using a bytes.Buffer is negligible compared to the cost of a single read to the network stack.

I don't think it's more convenient than this pr

The number of lines that a developer writes to enable buffer reuse in the PR is shorter than what I propose, but lines of code is not all that should be considered. The overall complexity if an API should also be considered. The developer cost of understanding an API can outweigh the cost of writing a couple of extra lines of code.

The PR adds a configuration option and a method that interact to fundamentally change how ReadMessage works. Action at a distance is complex.

The PR adds significantly to the API surface area (four methods and one new configuration field). Even if these elements are easy to understand in isolation, the increased surface area adds to the cognitive load required to understand the package. (The reason that I said that ReadJSON, ReadMessage and ReadMessageTo should be standalone helper functions instead of methods is that this design reduces the API surface area for Conn. The reduced surface area makes Conn easier to understand.)

When doing research this comment, I noticed that the PR does not add a Mutable field to Dialer. If the lack of symmetry between client and server is not intentional, then the omission shows how action at a distance can be complex.

lesismal · 2023-07-22T17:18:02Z

The number of lines that a developer writes to enable buffer reuse in the PR is shorter than what I propose, but lines of code is not all that should be considered.

Since they implement similar things, why not use a short version?

The overall complexity if an API should also be considered. The developer cost of understanding an API can outweigh the cost of writing a couple of extra lines of code.
The PR adds significantly to the API surface area (four methods and one new configuration field). Even if these elements are easy to understand in isolation, the increased surface area adds to the cognitive load required to understand the package. (The reason that I said that ReadJSON, ReadMessage and ReadMessageTo should be standalone helper functions instead of methods is that this design reduces the API surface area for Conn. The reduced surface area makes Conn easier to understand.)

I do agree with this point, and that's also the reason why do most developers use ReadMessage rather than NextReader, and also is the reason why we'd better don't assume most of the users would use helper-method like ReadMessageTo.
Coz most of the users haven't even understood NextReader since there has been a ReadMessage.

In other frameworks, or just Upgrader itself, the configurations has been a way that users used to.
Another popular way is Options/WithOptions, also works but seems more complex and ugly than configurations.

I think there's no need to explain too much about this PR's code, but just 1 or 2 simple examples will be fine to show how to optimize performance and mem cost.

The PR adds a configuration option and a method that interact to fundamentally change how ReadMessage works. Action at a distance is complex.

The configuration is easy if users read the comments, and no need to change other logic code if not mind the large buffer problem.

I think it's not correct to say that interact to fundamentally change how ReadMessage works, the default configuration keep compatible with old versions. If users did nothing to their code but just upgrade gorilla/websocket, nothing changed(ignoring the if-else judgment and two variables' mem cost).

When doing research this comment, I noticed that the PR does not add a Mutable field to Dialer. If the lack of symmetry between client and server is not intentional, then the omission shows how action at a distance can be complex.

i didn't intend to add the SetMutable methods, but just because Dialer.Mutable seems weird, so I add SetMutable for users to set it after Dial. Mutable is added as a getter/setter twin.
If you guys think Dialer.Mutable is acceptable, then we do it.

ghost · 2023-07-22T20:13:15Z

There are at least two reasons why developers use ReadMessage:

The application requires an immutable copy of the message.
The application does not require the more complex NextReader code because the performance cost of allocating a buffer in ReadMessage is not an issue.

I agree that the addition of the mutable switch does not break existing applications. My complaint is with the complexity exposed to the application developer. The switch fundamentally change the expectations for the return value from ReadMessage and this action at a distance is inherently complex.

I am not disputing that the API in this PR can sufficiently documented. My complaint is that the API is complex compared to an alternative and that the API surface area hurts the overall usability of the package.

Dialer and Upgrader are the respective client and server configurations for a connection. It's weird to add a field to one of these types when the configuration can apply to both client and server connections.

If this PR does proceed, then the term "Mutable" in the exported API should be replaced by "ReuseReadMessageBuffer" or something similar. The term "Mutable" does not carry enough context.

Is there evidence that a developer identified the buffer allocation in ReadMessage as a performance problem, but that developer was incapable of writing the code in my first comment? Without evidence, no change should be made to the API including my proposal. Additions to an API should carry their weight.

lesismal · 2023-07-23T02:24:54Z

Let's see gofiber:

app := fiber.New(fiber.Config{
	Immutable: true, // default is 
})

And Immutable is false by default:
gofiber/fiber#2242

If be afraid of complexity, it will be hard to go ahead.

Is there evidence that a developer identified the buffer allocation in ReadMessage as a performance problem, but that developer was incapable of writing the code in my first comment? Without evidence, no change should be made to the API including my proposal. Additions to an API should carry their weight.

For most of the users, performance is not a problem even if they use py or php.
Some of the users want higher performance but haven't aware that ReadMessage is slow.

The switch fundamentally change the expectations for the return value from ReadMessage and this action at a distance is inherently complex.

Plz refer to The common processes for the ws message is in my first comments.

lesismal · 2023-07-23T02:34:29Z

Not only melody, but also other well-known frameworks which have lots of stars, should need higher performance but still use ReadMessage directly:
https://github.com/OpenIMSDK/Open-IM-Server/blob/main/internal/msggateway/long_conn.go#L85
https://github.com/OpenIMSDK/Open-IM-Server/blob/main/internal/msggateway/client.go#L125
https://github.com/OpenIMSDK/Open-IM-Server/blob/main/internal/msggateway/client.go#L138
https://github.com/OpenIMSDK/Open-IM-Server/blob/main/internal/msggateway/client.go#L166

Do you know why they still use ReadMessage? Because most of the users are even not aware that ReadMessage is slow, including me(for a long time since the first time I use gorilla).

How to process ws message is just like I said, it's the most popular way:

FZambia · 2023-07-23T05:26:38Z

I agree that this PR adds a lot to the API surface while the only way to avoid unexpected memory growth is to use all those getters/setters. Simply setting mutable option to true is the hidden bomb which potentially may result into more issues. I can easily imagine 2x memory usage growth due to using the option.

Proper usage of proposed API without memory issue requires too much knowledge about internals and all these getter/setter manipulations with per-connection buffer are unobvious. Having this type of API in Gorilla WebSocket may eventually be addressed as a problem in the library.

At the same time this package already provides API to achieve the desired performance speedup using NextReader. So I think whether it's better to concentrate on documenting it instead of translating code like this to the end users. I am sure developers who can do all these manipulations with per-connection buffer can improve performance with NextReader if the performance improvement is required.

lesismal · 2023-07-23T06:16:56Z

As long as more users can get benefits, it doesn't matter whether the pr is merged or not.
Maybe just showing some prominent examples in the documentation will work.

I'll leave this pr opening for discussion until you guys make the final decision, then please feel free to close it.

ghost · 2024-02-15T07:20:37Z

@jaitaiwan This PR should be closed. The comment by FZambia is a good summary of the problems with the PR.

lesismal added 2 commits July 22, 2023 18:28

add mutable buffer configuration for ReadMessage

588479a

add benchmark for mutable configuration

df01b34

lesismal mentioned this pull request Jul 22, 2023

optimize reading performance olahol/melody#87

Open

reset mutableBuf before first read

ad4727f

lesismal mentioned this pull request Jul 23, 2023

Optimize websocket Conn usage openimsdk/open-im-server#635

Closed

2 tasks

lesismal closed this Feb 15, 2024

add mutable configuration to optimize ReadMessage performance #843

add mutable configuration to optimize ReadMessage performance #843

Uh oh!

Conversation

lesismal commented Jul 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lesismal commented Jul 22, 2023

Uh oh!

FZambia commented Jul 22, 2023

Uh oh!

lesismal commented Jul 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lesismal commented Jul 22, 2023

Uh oh!

lesismal commented Jul 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented Jul 22, 2023 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lesismal commented Jul 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coreydaley commented Jul 22, 2023

Uh oh!

ghost commented Jul 22, 2023

Uh oh!

lesismal commented Jul 22, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented Jul 22, 2023

Uh oh!

lesismal commented Jul 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lesismal commented Jul 23, 2023

Uh oh!

FZambia commented Jul 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lesismal commented Jul 23, 2023

Uh oh!

ghost commented Feb 15, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

lesismal commented Jul 22, 2023 •

edited

Loading

lesismal commented Jul 22, 2023 •

edited

Loading

lesismal commented Jul 22, 2023 •

edited

Loading

ghost commented Jul 22, 2023 •

edited by ghost

Loading

lesismal commented Jul 22, 2023 •

edited

Loading

lesismal commented Jul 22, 2023 •

edited

Loading

lesismal commented Jul 23, 2023 •

edited

Loading

FZambia commented Jul 23, 2023 •

edited

Loading