Skip to content

Jetty starts consuming CPU that remains high even without any traffic #6973

@dimas

Description

@dimas

Jetty version(s)
9.4.41.v20210516
9.4.43.v20210629

Java version/vendor (use: java -version)

$ java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)

OS type/version
Ubuntu 18.04

Description
We had an issue with some of the production servers spiking with CPU use.
We do not know what triggered it. When affected servers are removed from load balancer and all traffic to them ceases, the CPU usage still remains high.

About 5 out of 8 servers got affected over a stretch of several hours. The servers belong to different clusters and run different application code on them. The common theme is that they all are Spring apps with embedded Jetty, are on Amazon EC2 and exposed to the internet via load balancers. They are in different AWS accounts, different load balancers and their load pattern is not correlated in any way. Just the AWS region is the same. We saw 9.4.41.v20210516 and 9.4.43.v20210629 affected. Restarting the app (with Jetty) fixes CPU usage.

We tried profiling the running app and building flamegraphs and saw that all the CPU is being eaten by a couple of "qtp" threads with stacktraces similar to:

"qtp708090483-69573" #69573 prio=5 os_prio=0 tid=0x00007f6b1e429000 nid=0x311f in Object.wait() [0x00007f6ac36c8000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.eclipse.jetty.server.HttpInput.blockForContent(HttpInput.java:584)
        at org.eclipse.jetty.server.HttpInput$1.blockForContent(HttpInput.java:1164)
        at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:330)
        - locked <0x00000000c54d9790> (a java.util.ArrayDeque)
        at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:270)
        at org.eclipse.jetty.util.UrlEncoded.decodeUtf8To(UrlEncoded.java:485)
        - locked <0x00000000c54dea40> (a org.eclipse.jetty.util.MultiMap)
        at org.eclipse.jetty.util.UrlEncoded.decodeTo(UrlEncoded.java:572)
        at org.eclipse.jetty.server.Request.extractFormParameters(Request.java:582)
        at org.eclipse.jetty.server.Request.extractContentParameters(Request.java:531)
        at org.eclipse.jetty.server.Request.getParameters(Request.java:435)
        at org.eclipse.jetty.server.Request.getParameterNames(Request.java:1093)
        at ch.qos.logback.access.spi.AccessEvent.buildRequestParameterMap(AccessEvent.java:338)
        at ch.qos.logback.access.spi.AccessEvent.getRequestParameterMap(AccessEvent.java:351)
        at ch.qos.logback.access.spi.AccessEvent.prepareForDeferredProcessing(AccessEvent.java:584)
        #
        # code from https://jira.qos.ch/browse/LOGBACK-1486 here
        #
        at ch.qos.logback.core.AsyncAppenderBase.append(AsyncAppenderBase.java:160)
        at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
        at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
        at ch.qos.logback.access.jetty.RequestLogImpl.log(RequestLogImpl.java:137)
        at org.eclipse.jetty.server.HttpChannel.onCompleted(HttpChannel.java:812)
        at org.eclipse.jetty.server.HttpChannel.onBadMessage(HttpChannel.java:872)
        at org.eclipse.jetty.server.HttpChannelOverHttp.badMessage(HttpChannelOverHttp.java:283)
        at org.eclipse.jetty.http.HttpParser.badMessage(HttpParser.java:1647)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1629)
        at org.eclipse.jetty.server.HttpConnection.parseRequestBuffer(HttpConnection.java:376)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:265)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
        at java.lang.Thread.run(Thread.java:748)

Clearly we are at the end of request trying to log it (with logback-access) and while preparing request for deferred processing it tries to read some of the parameters from it and it all ends up in HttpInput.read().

Which in turn blocks for content. However judging from how much CPU is used, our suspicion is that no actual sleeping happens, the thread just wakes up immediately, re-checks conditions, goes back to waiting and the story repeats.

I do not know anything about Jetty internals but given this code https://github.com/eclipse/jetty.project/blob/jetty-9.4.41.v20210516/jetty-server/src/main/java/org/eclipse/jetty/server/HttpInput.java#L1161-L1166 just calls input.blockForContent() and always returns true (unless throws), then this loop https://github.com/eclipse/jetty.project/blob/jetty-9.4.41.v20210516/jetty-server/src/main/java/org/eclipse/jetty/server/HttpInput.java#L314-L340 reduces to:

            while (true)
            {
                Content item = nextContent();
                if (item != null)
                {
                    l = get(item, b, off, len);
                    if (LOG.isDebugEnabled())
                        LOG.debug("{} read {} from {}", this, l, item);

                    // Consume any following poison pills
                    if (item.isEmpty())
                        nextInterceptedContent();
                    break;
                }

                // No content, so should we block?
                input.blockForContent();
            }

So in theory as long as nextContent() returns null, it just spins input.blockForContent().

We tried verifying it by enabling debug logging with

<logger name="org.eclipse.jetty.server.HttpInput" level="DEBUG"/>

and it resulted in about 10Mb/sec of logs:

-----------------------------------------------------------------------------------------------------------------------------------------------
|       @timestamp        | level |       thread       |                                       message                                        |
|-------------------------|-------|--------------------|--------------------------------------------------------------------------------------|
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-71877 | HttpInputOverHTTP@1202b981[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
| 2021-10-08 14:54:35.000 | DEBUG | qtp708090483-69573 | HttpInputOverHTTP@72895282[c=0,q=0,[0]=null,s=STREAM] blocking for content timeout=0 |
-----------------------------------------------------------------------------------------------------------------------------------------------

So it does indeed look like read() just spins blockForContent() endlessly which for some reason wakes up every time but there are no data available (or read() would log something on this line https://github.com/eclipse/jetty.project/blob/jetty-9.4.41.v20210516/jetty-server/src/main/java/org/eclipse/jetty/server/HttpInput.java#L321 )

Remember, there is no any load on that server at that moment, no established HTTP connections, just couple of some old requests stuck in this state.

How to reproduce?
Unfortunately I have no idea what triggers it. We just saw several servers ended up in this state. Last time we saw it before today was several months ago. Given all affected servers are on AWS eu-west-1 and behind load balancers, it could be that some region-wide issue with AWS load balancers resulted in some incorrect TCP connection closure or something like that which in turn triggered some edge condition in Jetty.

It also can be the same issue as #5435 although I am not 100% sure

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugFor general bugs on Jetty side

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions