Skip to content

[JENKINS-75675]: Refactor classloader logic in order to reduce memory consumption #10659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dukhlov
Copy link

@dukhlov dukhlov commented May 19, 2025

See JENKINS-75675.

Background

Our Jenkins installation relies heavily on Groovy pipelines, shared libraries, and similar mechanisms. Groovy uses dynamic invocation via MetaClass, CallSite, etc. During introspection, it attempts to load many classes that don’t exist, simply to determine whether a given token is a class, a variable, or something else.
My investigation shows that this behavior leads to memory leakage.

Problem

The base ClassLoader implementation in Java (which all other class loaders inherit from) supports two modes:

parallelCapable (currently used by all Jenkins core class loaders, including WebAppClassLoader and PlatformClassLoader)
non-parallel (legacy mode)
Parallel-capable class loaders create and retain a lock object per class name, indefinitely. So, if we have 1,000 class loading misses, every parallel-capable class loader in the hierarchy keeps 1,000 unused lock objects forever.

In the case of Jenkins's UberClassLoader, a typical setup with ~200 plugins results in around 500 class loaders. In our Jenkins instance, about 2,000 classes are loaded successfully, while we observe over 200,000 class loading misses.

As a result, the class loaders retain:

500 class loaders × 200,000 lock objects each
Plus the internal ConcurrentHashMap.Node objects used to store them
This results in roughly 2 GB of unnecessary memory consumption.

Solution

  • Introduce base DelegatingClassLoader implementation which overrides base ClassLoaders.loadClass to remove not needed base locking mechanism (it don't load class itself just delegates loading to another class loader and locking will be done here)
  • extend DelegatingClassLoader instead of ClassLoader if possible.
  • override getClassLoadingLock() for UrlClassLoader2 for use GCed locking object which won't be kept forever
  • Introduce FilteringClassLoader which can check if class exists (using getResource() call) and don't use loadClass() call not to create locking Object at all
  • wrap base WebAppClassLoader with FilteringClassLoader to avoid not needed lock files creation

Testing done

This PR tested by running 2-hour long proprietary automation tests based on jenkins scripted pipeline jobs
class loading logic is broadly used, so it definitely covered by this testing

Also I made jcmd GC.class_histogram. Amount of java.lang.Object and java.util.concurrent.ConcurrentHashMap$Node instances was decreased dramatically

Before fix:
Screenshot 2025-05-19 at 06 26 27

After fix:
Screenshot 2025-05-26 at 19 59 19

Proposed changelog entries

  • Reduce Plugin's ClassLoader memory consumption

Proposed changelog category

/label internal

Proposed upgrade guidelines

N/A

Submitter checklist

  • The Jira issue, if it exists, is well-described.
  • The changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developers, depending on the change) and are in the imperative mood (see examples). Fill in the Proposed upgrade guidelines section only if there are breaking changes or changes that may require extra steps from users during upgrade.
  • There is automated testing or an explanation as to why this change has no tests.
  • New public classes, fields, and methods are annotated with @Restricted or have @since TODO Javadocs, as appropriate.
  • New deprecations are annotated with @Deprecated(since = "TODO") or @Deprecated(forRemoval = true, since = "TODO"), if applicable.
  • New or substantially changed JavaScript is not defined inline and does not call eval to ease future introduction of Content Security Policy (CSP) directives (see documentation).
  • For dependency updates, there are links to external changelogs and, if possible, full differentials.
  • For new APIs and extension points, there is a link to at least one consumer.

Desired reviewers

@jenkinsci/core-pr-reviewers

Before the changes are marked as ready-for-merge:

Maintainer checklist

  • There are at least two (2) approvals for the pull request and no outstanding requests for change.
  • Conversations in the pull request are over, or it is explicit that a reviewer is not blocking the change.
  • Changelog entries in the pull request title and/or Proposed changelog entries are accurate, human-readable, and in the imperative mood.
  • Proper changelog labels are set so that the changelog can be generated automatically.
  • If the change needs additional upgrade steps from users, the upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the pull request title (see example).
  • If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).

Copy link

welcome bot commented May 19, 2025

Yay, your first pull request towards Jenkins core was created successfully! Thank you so much!

A contributor will provide feedback soon. Meanwhile, you can join the chats and community forums to connect with other Jenkins users, developers, and maintainers.

@krisstern krisstern requested a review from a team May 19, 2025 15:39
@basil
Copy link
Member

basil commented May 19, 2025

Would this not be better reported/patched against OpenJDK's java.net.URLClassLoader? While we could work around the issue as in this PR, I'm not sure we want to maintain custom class loading logic like this in the long term. We previously spent a long time maintaining a fork of Ant's class loader, which had various issues, so I would definitely prefer to use unmodified upstream OpenJDK code if possible.

@dukhlov
Copy link
Author

dukhlov commented May 19, 2025

@basil, thank you for the quick review. I understand your concern, but I don’t believe OpenJDK is the right place for this optimization.

This isn’t an issue with URLClassLoader itself — it’s designed to work correctly in typical Java applications, where an extra 10MB per class loader is a reasonable overhead in exchange for a simpler and faster implementation. The problem becomes significant in pluggable systems that use separate class loading. So, in my view, this should be addressed within the Jenkins plugin framework.

I’ve implemented a 10-line optimization that is easy to maintain or revert if it causes any issues. This is feasible because I used Guava’s MapMaker. Guava cannot be used in the OpenJDK codebase, however.

If you consider the ~5–10MB additional memory overhead per plugin acceptable, then let’s just reject the PR.

In my production Jenkins cluster, the heap usage is around 4GB when idle — 2GB of which is consumed by 50 billion unnecessary lock objects. It works fine, but the purist in me feels that this is not how it should be.

@dukhlov
Copy link
Author

dukhlov commented May 19, 2025

Alternative: It is possible to make jenkins.util.URLClassLoader2 not capable for parallel class loading by removing

static {
   registerAsParallelCapable();
}

In this case, class loading will become serial — URLClassLoader will use a single lock instead of a dedicated lock per class name.
However, I'm not fully aware of the complete set of use cases, so I can't say for sure whether we actually need parallel class loading within the scope of a single plugin or not.

@basil
Copy link
Member

basil commented May 19, 2025

This isn’t an issue with URLClassLoader itself — it’s designed to work correctly in typical Java applications, where an extra 10MB per class loader is a reasonable overhead in exchange for a simpler and faster implementation. The problem becomes significant in pluggable systems that use separate class loading.

I don't see anything in the URLClassLoader Javadoc that would confirm this viewpoint. The opposite assertion (that it ought to be efficient in pluggable systems) could be made without contradicting the Javadoc.

I hate wasting memory and I am not against this PR as a short-term solution, but only after we have a discussion with the OpenJDK folks about whether or not this is an upstream problem and, if so, what the long-term plan would be.

@dukhlov
Copy link
Author

dukhlov commented May 19, 2025

I don't see anything in the URLClassLoader Javadoc that would confirm this viewpoint. The opposite assertion (that it ought to be efficient in pluggable systems) could be made without contradicting the Javadoc.

Right, this is just my understanding. I’ve reviewed the existing standard ClassLoader implementation and I don’t have strong arguments to ask the OpenJDK team to change a standard mechanism that already meets the needs of 99% of Java applications — just to benefit Jenkins and perhaps a few other (~10) Java-based pluggable platforms.

That said, talking to the OpenJDK folks definitely makes sense if there’s a feasible path forward.

Copy link
Member

@basil basil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@basil basil added the needs-comments-resolved Comments in the PR need to be addressed before it can be considered for merging label May 19, 2025
Copy link
Member

@jtnord jtnord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry forgot to submit this earlier.

I Agree with Basil, this appears to be something for the JDK.
Anything here should be a temporary workaround until we could pickup a JDK with a fix (should the JDK team agree this is an issue)

@@ -1,7 +1,9 @@
package jenkins.util;

import com.google.common.collect.MapMaker;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are trying to reduce the use of guava in core not increase it :(

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood

Comment on lines 74 to 80
return super.getClassLoadingLock(className);
return parallelLockWeakValuesMap.computeIfAbsent(className, k -> new Object());
Copy link
Member

@jtnord jtnord May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why the JDK does not simply return an interned String and needs to keep the lock in a map.
For sure interning will put everything in the string pool, but if there are no references then the a String in the string pool should be eligible to be GC'ed (normally a Sting literal in a class would cause a reference for as long as the class was loaded, meaning it would never actually be GCed, but for explicitly interned Strings this is different)

e.g

  return className.intern();

There may be some valid reasons here - interning is not free and could cause a slowdown on classloading.
Or maybe this is just a hangover from older JDK versions before the string pool was moved to the heap.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

className.intern() is jvm unique for class name. they need per classloader uniqueness

Copy link
Member

@jtnord jtnord May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HomerSimpsonLesSimpsonGIF

of course.

but the trick/hack could still work with (System.identityHashCode(this) + "-" + className).intern() but is probably going to become a bit of a hotspot for general purpose classloading. (- being an illegal character in a classname)

Copy link
Author

@dukhlov dukhlov May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc:

* <p> Class loaders that support concurrent loading of classes are known as
 * <em>{@linkplain #isRegisteredAsParallelCapable() parallel capable}</em> class
 * loaders and are required to register themselves at their class initialization
 * time by invoking the {@link
 * #registerAsParallelCapable ClassLoader.registerAsParallelCapable}
 * method. Note that the {@code ClassLoader} class is registered as parallel
 * capable by default. However, its subclasses still need to register themselves
 * if they are parallel capable.
 * In environments in which the delegation model is not strictly
 * hierarchical, class loaders need to be parallel capable, otherwise class
 * loading can lead to deadlocks because the loader lock is held for the
 * duration of the class loading process (see {@link #loadClass
 * loadClass} methods).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They need per class loader + class name unique lock to avoid possible deadlocks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyone know if we need the parallel capable class loader?

If I remember correctly, enabling parallel loading on at least some of the Jenkins class loaders resulted in a significant performance improvement. So I went ahead and enabled it everywhere. If disabling it on some of the classloaders results in a big improvement in memory usage without a significant performance regression, I would be OK with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to find out by calling parentClassLoader.getResource(className.replace('.','/') + ".class") to do exactly that

This sounds like it should work, and would be preferable to using third-party libraries from our perspective. We try very hard not to add new third-party libraries to Jenkins core for a variety of reasons (including the fact that they become part of the public API we expose to Jenkins plugins).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jenkins/core/src/main/java/hudson/PluginManager.java

Lines 2416 to 2418 in 608d83c

if (name.startsWith("SimpleTemplateScript")) { // cf. groovy.text.SimpleTemplateEngine
throw new ClassNotFoundException("ignoring " + name);
}
already contains exclusions for Groovy garbage. I wonder if we need to expand what is in that list to include the classes you discovered?

groovy.tmp.* can be added.

It might be possible to skip all classes that don't have a specified package (is this feasible?).

Skipping java.util.jenkins.model.params$IS_INTERNAL_REGISTER and similar entries is complicated.

This happens because using params.IS_INTERNAL_REGISTER in my pipeline causes the Groovy compiler to try all possible combinations of default packages: java.lang, java.util, java.net, groovy.lang, groovy.util, hudson.model, jenkins.model, etc. This behavior depends on the configuration of the GroovyCompiler. I don't think jenkins-core is the right place to hardcode this list.

I'm not sure how to properly skip such entries without using ClassGraph, but perhaps we can live with them. In my setup, there are around 150k of these entries, but the number is limited and does not grow over time, unlike groovy.tmp.*.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like it should work, and would be preferable to using third-party libraries from our perspective. We try very hard not to add new third-party libraries to Jenkins core for a variety of reasons (including the fact that they become part of the public API we expose to Jenkins plugins).

Ok will redo the PR using this approach

@dukhlov dukhlov force-pushed the feature/JENKINS-75675 branch 4 times, most recently from 25e29b6 to b9045b6 Compare May 24, 2025 20:48
@dukhlov dukhlov changed the title JENKINS-75675: Use weak references for URLClassLoader2's lock storage JENKINS-75675: Refactor classloader logic in order to reduce memory consumption May 24, 2025
@dukhlov dukhlov force-pushed the feature/JENKINS-75675 branch 14 times, most recently from fc1fa28 to feb64f2 Compare May 26, 2025 03:34
@dukhlov dukhlov force-pushed the feature/JENKINS-75675 branch 5 times, most recently from 3609275 to 9d97d2c Compare May 28, 2025 14:27
@dukhlov dukhlov force-pushed the feature/JENKINS-75675 branch from 9d97d2c to 695d8ad Compare June 10, 2025 20:37
@dukhlov
Copy link
Author

dukhlov commented Jun 13, 2025

@basil, @timja , @jtnord, @qerub, I would like to understand whether this patch aligns with the core developers' vision for the direction of the project. Should I change something, or would it be better to stop and close the PR if I'm heading in the wrong direction?

@timja
Copy link
Member

timja commented Jun 13, 2025

@basil, @timja , @jtnord, @qerub, I would like to understand whether this patch aligns with the core developers' vision for the direction of the project. Should I change something, or would it be better to stop and close the PR if I'm heading in the wrong direction?

It definitely looks useful and it does look good to me.

@basil
Copy link
Member

basil commented Jun 14, 2025

@dukhlov I have been travelling this week and have not had enough time to work on this PR. I want to get this shipped, but this PR is still a bit too big and difficult for me to understand. Even after asking for explanations, it seems that this PR trying to solve several problems at once. I would prefer if it were split into two PRs, one being the minimal changes needed to directly address the memory usage issue, and another PR with any other cleanups. Describing the change(s) with the template here would also help me process this more quickly. If these things aren't done, that isn't necessarily a blocker for this work. I will still get to it eventually, but the lack of the above would place more work on me and it would take longer.

@dukhlov dukhlov force-pushed the feature/JENKINS-75675 branch 3 times, most recently from 2db3d51 to 7a232ea Compare June 14, 2025 09:47
@dukhlov
Copy link
Author

dukhlov commented Jun 14, 2025

@dukhlov I have been travelling this week and have not had enough time to work on this PR. I want to get this shipped, but this PR is still a bit too big and difficult for me to understand. Even after asking for explanations, it seems that this PR trying to solve several problems at once. I would prefer if it were split into two PRs, one being the minimal changes needed to directly address the memory usage issue, and another PR with any other cleanups. Describing the change(s) with the template here would also help me process this more quickly. If these things aren't done, that isn't necessarily a blocker for this work. I will still get to it eventually, but the lack of the above would place more work on me and it would take longer.

Reverted changes in UberClassloader. They mostly related to improving performance (thanks to including webapp classes into cache) and not related to decreasing memory consumption

@dukhlov dukhlov force-pushed the feature/JENKINS-75675 branch 3 times, most recently from 0c5d7f1 to 08d7509 Compare June 14, 2025 19:37
@basil basil self-assigned this Jun 14, 2025
@dukhlov dukhlov force-pushed the feature/JENKINS-75675 branch 3 times, most recently from ea731d0 to 268823e Compare June 19, 2025 20:01
@dukhlov
Copy link
Author

dukhlov commented Jun 19, 2025

@basil, I pushed my temp branch to this pr's repo a few days ago by mistake. I am sorry about that. Now fixed and ready for review

@dukhlov dukhlov changed the title JENKINS-75675: Refactor classloader logic in order to reduce memory consumption [JENKINS-75675]: Refactor classloader logic in order to reduce memory consumption Jun 21, 2025
@dukhlov dukhlov force-pushed the feature/JENKINS-75675 branch 3 times, most recently from 4c4cb68 to c61a84a Compare July 8, 2025 20:48
…onsumption

Override URLClassloader2.getClassLoadingLock to use lock objects that can be GCed when not needed,
Wrap WebAppClassLoader with CheckingExistenceClassLoader to avoid missing loadClass() calls not to crate unnecessary persistent lock object,
Refactor delegating classloaders not to use locking:
 * DelegatingClassloader - non-locking base implementation and use it for classloaders
   which delegate functionality to other classloaders,
 * CheckingExistenceClassLoader - check that class exists before loadClass() calls.
@dukhlov dukhlov force-pushed the feature/JENKINS-75675 branch from c61a84a to 0976e4e Compare July 9, 2025 22:26
@dukhlov
Copy link
Author

dukhlov commented Jul 17, 2025

@basil, It looks like the patch review might take longer than I initially expected. Could you roughly let me know how long it might take? That would help me plan my upcoming work.

@basil
Copy link
Member

basil commented Jul 17, 2025

I am working on some performance improvements for job loading, but this is next on my TODO list after that. Hopefully by next week.

@dukhlov
Copy link
Author

dukhlov commented Jul 17, 2025

I am working on some performance improvements for job loading, but this is next on my TODO list after that. Hopefully by next week.

Ah, great, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal needs-comments-resolved Comments in the PR need to be addressed before it can be considered for merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants