net: conn.Read() methods seem to thrash CPU, please return on insufficient data #27315
What are you really trying to do? Why are you using SetReadDeadline? How many concurrent connections do you need to support? Have you considered not using SetReadDeadline, but instead using an ordinary Read and sending the results on a channel? Then some other goroutine can use timers and read from the channel as it pleases.
Thanks for the response. Answers below.

> Why are you using SetReadDeadline? How many concurrent connections do you need to support? Have you considered not using SetReadDeadline, but instead using an ordinary Read and sending the results on a channel? Then some other goroutine can use timers and read from the channel as it pleases.

I don't mind essentially forking the functionality from conn out into a new specialized UDP reader, but I believe my extreme scenario is showing an area where there is low-hanging fruit for optimization. Many people out there are likely spinning up a goroutine to process a stream of data from a port. The obvious and simple way to accomplish this is with a read loop continually hitting the port with blocking conn.Read() calls, and this uses quite a bit of CPU. In my case, where I can allow the buffer to fill for a millisecond and then process repeatedly, having a read call that returns immediately when there is insufficient data simplifies the timing logic that would otherwise be needed to determine that the buffer is empty and polling is occurring.
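For concreteness, a minimal sketch of that read-loop pattern (the function names, buffer size, and error handling here are illustrative, not from the original report):

```go
package reader

import (
	"log"
	"net"
)

// readLoop is the "obvious and simple" pattern: block in conn.Read until
// each packet arrives. Easy to write, but the goroutine parks and wakes
// once per packet, which is where the cost shows up at a 40 us interval.
func readLoop(conn *net.UDPConn) {
	pkt := make([]byte, 1500)
	for {
		n, err := conn.Read(pkt) // blocks until a packet is available
		if err != nil {
			log.Println("read:", err)
			return
		}
		process(pkt[:n]) // per-packet parsing
	}
}

func process(p []byte) { /* parsing logic here */ }
```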
Thanks for the detailed reply. I suspect that for your use case you will be better off using syscall.Read directly.
My design when using a non-polled read is more transparent and CPU friendly. I'll look into syscall.Read, though quite a few of the internal helper functions are not exported, so my initial thought is that this will require a bit of code to get working. I haven't looked in detail yet, though. Thanks for the quick replies. Though I am developing on and targeting Linux, we are keeping cross compilation as an option. In addition to simplifying timer logic across operating systems and platforms, a non-polled read might also reduce the large ntoskrnl.exe burden that we see on Windows - up to 5% per stream. I haven't tested this yet. If you think others might benefit and I can help, let me know.
I personally would be extremely reluctant to add a non-polled Read to the net package. Looking back at your initial report, I noticed this:

> when conn.Read() is called prior to a packet arriving the CPU spins on the read until data arrives, or until the ReadDeadline is hit
That is not accurate. The code does not spin.
I didn't mean to imply that SetReadDeadline causes the CPU to spin, but rather that requesting a Read when a packet is not yet available does. The SetReadDeadline only happens rarely; it does not use any noticeable CPU unless set on every read (in which case it does), but I see no reason to set it every time unless using it to set a deadline for slowly arriving messages or a single response. I appreciate the push to use syscall.Read; though initially I thought it would be a bit painful, there are only a couple of lines of code that required a change. There aren't a ton of examples, so I'm pasting the gist below.
```go
func (t *FTDV) run2(conn *net.UDPConn) {
	defer conn.Close()
	file, err := conn.File()
	if err != nil {
		// handle error here
		return
	}
	fd := int(file.Fd())
	syscall.SetNonblock(fd, true) // otherwise it is a blocking read
	pkt := make([]byte, ftdvUDPPktSize)
	// ...
	for {
		n, err := syscall.Read(fd, pkt)
		if err == syscall.EAGAIN {
			// n is -1 with "resource temporarily unavailable" as the error:
			// the buffer is empty, so relinquish, publish metrics, etc.
			continue
		}
		if err != nil {
			// handle other errors here
			continue
		}
		if n <= 0 {
			continue
		}
		// parsing logic here
		// at first I expected to see the UDP packet header, but this wasn't the case.
		// performs the same as adding a return to the core conn.Read() function, thus
		// non-blocking used there as well?
	}
}
```

Though the logic is a little different between the two read methods, I believe most of the CPU performance increase comes from avoiding the polled read. Both variants sleep for 1 ms each time the buffer is emptied; they just detect the empty buffer differently. With the polled read, since each packet arrives at a precise 40 us interval but the processing time is much faster than that, every 25 packets the time to process is compared against 350 us, and if it took longer the read loop sleeps for 1 ms. In the case of syscall.Read, whenever n <= 0 it sleeps for 1 ms.

syscall.Read - 7.5% of a core, 8 cores

You're probably right regarding the integrated poller, etc. For my scenario a non-blocking read would be used, and having it exist within the fabric of the internals code would also be appreciated. This would be a Read that returns when there is no data available. There is no noticeable performance delta between syscall.Read and conn.Read with the continue replaced by a "return n, err". A non-blocking read would also make it possible to use a pseudo real-time scheduler with a hot thread per work queue, though the strongest argument for it is the reduction in CPU usage. There are quite a few projects in Go that stand up read processes on ports. Thanks again, and tell me if I can help with anything.
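A sketch of that batching logic for the polled (blocking) read path, using the constants from the discussion (25 packets, 350 us, 1 ms); the structure itself is illustrative:

```go
package reader

import (
	"net"
	"time"
)

const (
	batchSize = 25                     // packets between timing checks
	threshold = 350 * time.Microsecond // pure processing is far faster than this
	nap       = 1 * time.Millisecond   // lets ~25 packets accumulate at 40 us each
)

// pollLoop reads in batches of 25 packets. If a batch took longer than
// 350 us, the reads must have spent time blocked on an empty socket buffer,
// so it sleeps for 1 ms to let the buffer refill and the next batch drain
// without blocking.
func pollLoop(conn *net.UDPConn, pkt []byte) error {
	for {
		start := time.Now()
		for i := 0; i < batchSize; i++ {
			n, err := conn.Read(pkt)
			if err != nil {
				return err
			}
			_ = pkt[:n] // parsing logic here
		}
		if time.Since(start) > threshold {
			time.Sleep(nap)
		}
	}
}
```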
The CPU doesn't spin there either, though. The goroutine sleeps waiting in the poller. Glad that syscall.Read is working for you.
Hopefully we can delete comments if my ignorance becomes too obvious. Looking at the fd_unix.go file (line 133), in the case where no data is available (potentially just when the connection is busy... not sure on this), the logic for (err != nil), (err == syscall.EAGAIN && fd.pd.pollable()), (err == nil) is taken over and over again. There does not appear to be any spot here where the process would sleep. When calling syscall.Read() directly without syscall.SetNonblock(), the call only returns with data. When blocking is set to false, it starts to return without data and with an error. I guess the for loop in FD.Read() seems like the poller, since this is where syscall.Read is iterated over, but I don't see a place where a sleep happens.
In the syscall.EAGAIN case, the code calls fd.pd.waitRead, which parks the goroutine until the runtime poller reports that the descriptor is readable. The loop does not spin calling syscall.Read repeatedly.
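A simplified paraphrase of the loop in question (from internal/poll/fd_unix.go, Go 1.11 era; details elided, not the verbatim source) shows where that park happens:

```go
// Simplified paraphrase of FD.Read in internal/poll/fd_unix.go; not exact.
for {
	n, err := syscall.Read(fd.Sysfd, p)
	if err != nil {
		n = 0
		if err == syscall.EAGAIN && fd.pd.pollable() {
			// Parks this goroutine in the runtime netpoller (epoll on
			// Linux) until the fd is readable. No CPU is burned here.
			if err = fd.pd.waitRead(fd.isFile); err == nil {
				continue
			}
		}
	}
	err = fd.eofError(n, err)
	return n, err
}
```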
Yes, I see now. If it were possible in the future to see an indication of whether waitRead is being hit, I could use that rather than syscall.Read as an indication that the buffers are empty. Alternatively, some direct call that returns the current conn buffer level would also serve. Thanks for the help.
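As an aside: on Linux, the "current buffer level" query mentioned here can be approximated today with the FIONREAD ioctl via golang.org/x/sys/unix. A sketch, assuming the raw fd obtained via conn.File() as in the gist above:

```go
package reader

import "golang.org/x/sys/unix"

// bytesBuffered reports how many bytes are sitting in the socket's receive
// queue; zero means a blocking Read would have to wait. Linux-specific; fd
// is a raw descriptor such as int(file.Fd()) from the gist above.
func bytesBuffered(fd int) (int, error) {
	return unix.IoctlGetInt(fd, unix.FIONREAD)
}
```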
Update on syscall.Read in Go 1.11. Using the example code above, the syscall.Reads will start failing after about a minute with "resource temporarily unavailable". A band-aid is to occasionally re-request the fd and set it to non-blocking again, but this kind of starts to seem like magic. Do you know why the fd needs to be re-associated with the variable? Are syscalls not touching it in a way that the GC knows about?
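One guess (an assumption, not a confirmed diagnosis of this report): conn.File() returns a duplicate *os.File whose finalizer closes its descriptor once the *os.File becomes unreachable, and the gist above keeps only the integer fd alive after file.Fd(). If that is the cause, keeping the *os.File reachable should avoid the re-association dance:

```go
package reader

import (
	"os"
	"runtime"
	"syscall"
)

// runPinned is the gist's loop with the *os.File kept reachable. If a GC
// finalizer closing the duplicated descriptor is what breaks the reads,
// runtime.KeepAlive pins file (and therefore fd) for the life of the loop.
func runPinned(file *os.File, pkt []byte) {
	fd := int(file.Fd())
	syscall.SetNonblock(fd, true)
	for {
		n, err := syscall.Read(fd, pkt)
		if err == syscall.EAGAIN {
			continue // buffer empty: relinquish/sleep as before
		}
		if err != nil || n == 0 {
			break
		}
		// parsing logic here
	}
	runtime.KeepAlive(file) // file cannot be finalized before this point
}
```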
I want a non-blocking version of Read. In my case, I want to check whether a TCP connection is closed or not before sending a packet on the connection. I tried to implement this with SetReadDeadline. There are two problems:
There is another approach: use a dedicated goroutine for reading.
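A sketch of that approach (names and buffer size illustrative): one goroutine owns the blocking Read and hands packets over a channel, so the consumer gets a non-blocking check in pure Go via select with a default case.

```go
package reader

import "net"

// readToChan blocks in Read on its own goroutine and delivers each packet
// on out. Closing the channel signals a read error or EOF.
func readToChan(conn net.Conn, out chan<- []byte) {
	defer close(out)
	for {
		buf := make([]byte, 1500) // fresh buffer per packet, since it is handed off
		n, err := conn.Read(buf)
		if err != nil {
			return
		}
		out <- buf[:n]
	}
}
```

The consumer can then poll without blocking:

```go
select {
case pkt, ok := <-packets:
	_ = pkt // process; ok == false means the reader stopped
default:
	// no data buffered right now
}
```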
Please open a new issue; perhaps a proposal might be better.
Looks like it's related to #15735.
To clarify - my request is not related to #15735 and, in my opinion, the net.Conn methods already allow for the #15735 request. What I want is non-blocking reads, or the ability to see on the .Read return whether it had to wait. As of now this is hidden. Having this simplifies cases where a programmer does not want to continue with repeated blocking reads when the port buffer is empty and awaiting more data. This can be accomplished through timing logic if one has a fixed-interval stream of data, but the logic can be tricky, as it depends on system burden and the hardware being used. It is not possible to easily write the logic for non-periodic streams. A non-blocking read tells you specifically whether there is data available or not. Another benefit realized with syscall.Read is a reduction in CPU, but this is secondary to the CPU reduction for non-blocking reads, which would also be realized if the net.Conn library supported them.

The CPU benefit I get from using non-blocking reads is a reduction from ~6% to 0.5%. By writing a return-on-no-data into net.Conn there is a similar reduction, though not quite as far, as there is still additional wrapper code around the syscall.Read calls: 6% down to 1% using Go. The additional drop from 1.0% down to 0.5% would not have been worth the time to switch to syscall.Read calls for me.
I doubt #15735 as it stands now allows checking whether any data in the socket is available to read, though that is the main goal of that issue.
If #15735 desires immediate return of whether data is available, then sure. Go does allow for understanding whether data is available or not, just not with a non-blocking return. It is the non-blocking functionality that I desire, primarily for CPU usage reasons. Not needing to handle the non-blocking sleep logic would be even better. Another solution for higher-performance applications might be to just supply a return value specifying whether conn.Read blocked.
I'm sorry, I don't understand what this issue is about now. As @mikioh suggested, if you have a problem using syscall.Read, please open a new issue describing that problem. Thanks.
Problem: CPU is high when processing high-bandwidth streamed data (40 us packet interval with 1500 byte packets). One method to lower CPU is to batch-process buffered data with intermediate sleeps, e.g. sleep for 1 ms, process accumulated data until the buffer is drained, sleep for 1 ms... Non-blocking reads simplify the sleep condition analysis, and Go does not (outside of syscall.Read) support non-blocking reads.

Ian's proposed solution: use syscall.Read for "non-blocking" reads.

Potential problems with this solution:

1. This may only be possible on Linux, as my method of using a file descriptor throws a Windows build error. We decided not to support Windows in the high-performance use case.
2. It seems we need to re-associate the file descriptor periodically to avoid losing the connection. This is only applicable to use cases where the connection is left open for streaming, as ours is. Re-association does not require much CPU, so we re-associate whenever the non-blocking read fails.

Additional benefits of this solution: syscall.Read also short-circuits abstraction layers of the net.Conn infrastructure, and thus is slightly more CPU efficient than enabling non-blocking I/O in the net.Conn packages would be.

Future: If Go were to add a non-blocking read feature to net.Conn, we would consider using it. It would enable cross-OS support, avoid the need to grab file descriptor handles, remove the syscall import, and re-enable cross-OS compilation.

Understand that we are no longer making changes to this portion of our application and are satisfied with the syscall.Read solution. Thanks.
Thanks for the note. We have no current plans for adding non-blocking I/O to the net package.
I have created a new issue: #28607.
It is likely that I missed something, but in implementing a network stream collector which sits and spins on arriving UDP messages (arriving every 40 microseconds), I see quite heavy CPU utilization.
This is using conn.SetReadDeadline with conn.Read. Note that the conn.SetReadDeadline is pulled out to a larger interval of every 25-50 packets rather than every packet. This same interval is used to insert Sleep() calls that allow the port to buffer packets for increased CPU efficiency. Somewhat complicated, but the end CPU utilization was around 1% on Linux, down from utilizing most of a core.
Much of this is because when conn.Read() is called prior to a packet arriving the CPU spins on the read until data arrives, or until the ReadDeadline is hit. This would be much simpler with a method that returns on insufficient data (perhaps conn.TryRead()), or surfacing a conn.Length().
To this end I changed a line in the fd_unix.go file on my local install to achieve a conn.Read method that simply returns when data is unavailable. I understand that this would break backward compatibility, so I am not suggesting a change to conn.Read(), but rather the addition of a method which allows for a non-blocking read call, as this avoids burdening the kernel with repeated polling and in my experience results in fairly substantial CPU gains.
An additional benefit is that on Windows (assuming a similar addition is possible there), where Sleep timeouts and timer precision are both much less granular, this would help even more.
I believe this addition will aid all streaming protocols and applications which want to maximize the number of streams and connections they are processing.
Inside fd_unix.go:
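(The snippet itself did not survive in this copy of the issue. As an illustration only, not the reporter's actual diff: in the FD.Read loop paraphrased earlier in the thread, a change of the kind described would return from the EAGAIN branch instead of parking in the poller.)

```go
// Illustrative reconstruction, not the original snippet: inside the FD.Read
// loop in internal/poll/fd_unix.go, return on EAGAIN instead of blocking.
if err == syscall.EAGAIN && fd.pd.pollable() {
	return 0, err // changed: report "no data" instead of fd.pd.waitRead(...)
}
```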
Am I missing some call which achieves a non-blocking read? Thanks for your time in reading this.