
NHibernate AsyncReaderWriterLock stalls under load #2862


Closed
willg1983 opened this issue Jul 19, 2021 · 21 comments

@willg1983

We've been investigating our process stalling under load, multiple crash dumps show a similar set of stack traces to this:
[screenshot of the stack traces omitted]

After some testing I put together a crude harness over NHibernate.Util.AsyncReaderWriterLock which reproduces the problem: run this on .NET Framework 4.8 and the unit test doesn't complete after an hour; switch to .NET Core 3.0 and it completes in under a minute.

[TestMethod]
public Task LockHammer53()
{
    var asyncReaderWriterLock = new AsyncReaderWriterLock();

    return Task.WhenAll(Enumerable.Range(1, 100).Select(x => Task.Run(() =>
    {
        for (var i = 0; i < 1000000; i++)
        {
            using (asyncReaderWriterLock.WriteLock())
            {
            }
        }
    })));
}

It looks like it might be related to this bug in .NET:
dotnet/runtime#28717
It was fixed in .NET Core 3.0 but remains unfixed in .NET Framework, and it looks like there are no plans to fix it there.

Interestingly, StackExchange.Redis was heavily impacted by this and worked around the issue by switching from SemaphoreSlim to MutexSlim:
StackExchange/StackExchange.Redis@6873941
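
For reference (not part of the original report), a minimal sketch of the MutexSlim pattern, based on the Pipelines.Sockets.Unofficial API as it is exercised in the benchmark later in this thread; the class and method names and the 1-second timeout are arbitrary choices for illustration, and the LockToken.Success check reflects my understanding of that library's token type:

using System;
using System.Threading.Tasks;
using Pipelines.Sockets.Unofficial.Threading;

class MutexSlimSketch
{
    // MutexSlim takes a default timeout in milliseconds.
    private readonly MutexSlim _mutex = new MutexSlim(1000);

    public void DoSync()
    {
        // TryWait blocks until the lock is acquired or the timeout elapses.
        using (var token = _mutex.TryWait())
        {
            if (!token.Success)
                throw new TimeoutException("Could not acquire the lock.");
            // critical section
        }
    }

    public async Task DoAsync()
    {
        // TryWaitAsync awaits the lock instead of blocking a thread.
        using (var token = await _mutex.TryWaitAsync())
        {
            if (!token.Success)
                throw new TimeoutException("Could not acquire the lock.");
            // critical section
        }
    }
}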

So I would propose either:
a) Reverting the locking to the design in 5.2, or
b) Rewriting the AsyncReaderWriterLock to avoid the bug in SemaphoreSlim.

@fredericDelaporte
Member

So this would be a regression of #2147.

Reverting to the previous design means that applications using both synchronous and asynchronous code would have bugs, because the previous design handled locking separately for the two cases: async operations did not lock out synchronous operations, and synchronous ones did not lock out async ones.
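
A minimal illustration of that problem (hypothetical code, not the actual NHibernate implementation): when the sync path guards with a Monitor and the async path guards with a separate SemaphoreSlim, the two paths never exclude each other, so mixed sync/async callers can enter the critical section at the same time.

using System.Threading;
using System.Threading.Tasks;

class SplitLockSketch
{
    private readonly object _syncGate = new object();
    private readonly SemaphoreSlim _asyncGate = new SemaphoreSlim(1, 1);

    public void SyncWork()
    {
        lock (_syncGate) // excludes only other SyncWork callers
        {
            // critical section
        }
    }

    public async Task AsyncWork()
    {
        await _asyncGate.WaitAsync(); // excludes only other AsyncWork callers
        try
        {
            // critical section: may run concurrently with SyncWork,
            // because the two paths use unrelated primitives
        }
        finally
        {
            _asyncGate.Release();
        }
    }
}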

The MutexSlim used to replace the SemaphoreSlim in StackExchange.Redis appears to be a utility from a library with a broader purpose. It looks unlikely to me that NHibernate would take a dependency on it.

Until NHibernate ceases supporting anything older than .NET 5, this means we need to implement the AsyncReaderWriterLock with something other than the SemaphoreSlim class. Should we just incorporate the MutexSlim into our own code base?

According to this blog, SemaphoreSlim might be fixed in some .NET Framework service release (but this was written in 2019 and does not seem to have happened).

@bahusoid
Member

might be fixed in some .Net Framework service release

Nope. The fix was considered a perf improvement: dotnet/coreclr#22686 (comment)

@fredericDelaporte
Member

Maybe they are somehow right to qualify it as a mere performance issue: I launched the test and then forgot about it. When I came back, it had succeeded.
[screenshot of the passing test run omitted]

@fredericDelaporte
Member

@maca88, any opinion on this subject, including on:

Until NHibernate ceases supporting anything older than .Net 5, this means we need to implement the AsyncReaderWriterLock with something else than the SemaphoreSlim class. Should we just incorporate the MutexSlim in our own code base?

?

@maca88
Contributor

maca88 commented Aug 14, 2021

I did some testing and the performance difference between .NET Framework and .NET Core is huge for the Task Parallel Library. The fix for the mentioned dotnet bug is just a small optimization and is not really relevant in this case. Here is what I tested:

[Test]
public async Task LockHammer53()
{
  var asyncReaderWriterLock = new AsyncReaderWriterLock();
  var nitoAsyncReaderWriterLock = new Nito.AsyncEx.AsyncReaderWriterLock();
  var semaphore = new SemaphoreSlim(1, 1);
  var asyncLock = new AsyncLock();
  var mutex = new Pipelines.Sockets.Unofficial.Threading.MutexSlim(int.MaxValue);
  var sw = new Stopwatch();
  sw.Start();

  await Task.WhenAll(Enumerable.Range(1, 100).Select(x => Task.Run(() =>
  {
    for (var i = 0; i < 10000; i++)
    {
      // lock
      // .NET 461 => approx. 15ms
      // .NET Core => approx. 85 ms
      //lock (sw)
      //{
      //}

      // NHibernate AsyncLock
      // .NET 461 => approx. 7000ms
      // .NET Core => approx. 60 ms
      //using (asyncLock.Lock())
      //{
      //}

      // NHibernate AsyncReaderWriterLock
      // .NET 461 => approx. 20000ms
      // .NET Core => approx. 160 ms
      //using (asyncReaderWriterLock.WriteLock())
      //{
      //}

      // Nito.AsyncEx.AsyncReaderWriterLock
      // .NET 461 => approx. 80000ms
      // .NET Core => approx. 7000 ms
      //using (nitoAsyncReaderWriterLock.WriterLock())
      //{
      //}

      // SemaphoreSlim
      // .NET 461 => approx. 6000ms
      // .NET Core => approx. 60 ms
      //semaphore.Wait();
      //semaphore.Release();

      // Pipelines.Sockets.Unofficial.MutexSlim
      // .NET 461 => approx. 5000ms
      // .NET Core => approx. 5000ms
      //using (mutex.TryWait())
      //{
      //}
    }
  })));

  Console.WriteLine(sw.ElapsedMilliseconds);
}

Every class was tested separately on .NET Framework and .NET Core in Release mode. As you can see in the comments, the difference between SemaphoreSlim and MutexSlim on .NET Framework is not that significant, so switching to MutexSlim would not change much. So the option "Rewrite the AsyncReaderWriterLock to avoid the bug in SemaphoreSlim" won't help us.

@willg1983 If I understand correctly, with NHibernate version 5.2.x or lower you didn't experience any stalling under load? Do you use both async and sync NHibernate methods in the application (e.g. ISession.Save and ISession.SaveAsync)?

@maca88, any opinion on this subject, including on:

As you mentioned, reverting to the old locking is not a good idea, as it may not work correctly when mixing async and sync methods. Another option would be to introduce a configuration setting that locks by using AsyncLock instead, which is faster, at least when there are only write operations.

@bahusoid
Member

bahusoid commented Aug 14, 2021

@maca88 If I got it right, dotnet/runtime#28717 is about thread pool starvation. So aside from slow execution of particular jobs, it slows down the whole process. So it looks related to me. Or are you saying the modified code in that fix is not involved here?

@willg1983
Author

@maca88 We moved from NHibernate 4.1.x straight to 5.3.x. In production our system would stop responding around the same time each day, when user load was at its peak. After several weeks and crash dumps we narrowed it down and reverted to 5.2.x, and we haven't seen the problem in production since. Our codebase is older and only uses NHibernate's sync methods.

@maca88
Contributor

maca88 commented Aug 14, 2021

Or are you saying the modified code in fix is not involved here?

My point is that the performance improvement they applied for the mentioned issue is minimal, and even if it were ported to .NET Framework, our issue would still remain.

@maca88 we moved from nHibernate 4.1.x straight to 5.3.x

Ok, now I understand. After benchmarking the lock statement (I updated the above test), the reason for the stalling becomes obvious. Prior to NHibernate 5.3, when using only the sync NHibernate API, the locking was done by the MethodImpl attribute with the MethodImplOptions.Synchronized argument, which basically wraps the whole method in a lock statement (a sketch of that pattern follows the list below). On .NET Framework, based on the above benchmark, the lock statement is at least 300 times faster than the fastest alternative that supports both sync and async locking (MutexSlim). Based on that, the final solution would need to use the lock statement for .NET Framework, which leaves us with a few options:

  • Revert to the old locking mechanism for .NET Framework
  • Do nothing and just add a possible breaking change
  • Introduce a new setting that would use the lock statement for sync methods and possibly throw a NotSupportedException for async methods in order to avoid bugs
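
As a sketch of the pre-5.3 pattern described above (hypothetical class and method names, not the actual ReadWriteCache code): MethodImplOptions.Synchronized makes the runtime take a Monitor around the whole instance method, which is what made the sync-only path so cheap on .NET Framework; the explicit lock form below is roughly equivalent.

using System.Runtime.CompilerServices;

class CacheSketch
{
    private readonly object _gate = new object();

    // Pre-5.3 style: the runtime wraps the whole instance method in a
    // Monitor acquisition (effectively lock (this)).
    [MethodImpl(MethodImplOptions.Synchronized)]
    public object GetSynchronized(object key)
    {
        return LookUp(key);
    }

    // Roughly equivalent explicit form, locking on a private gate
    // instead of the instance itself.
    public object GetWithLock(object key)
    {
        lock (_gate)
        {
            return LookUp(key);
        }
    }

    private object LookUp(object key) => null; // placeholder for the cache lookup
}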

@bahusoid
Member

Introduce a new setting that would use the lock statement for sync methods and possibly throw a NotSupportedException for async methods in order to avoid bugs

If we go with adding a setting, I would add a new setting with the ability to provide your own locker class.


@bahusoid bahusoid reopened this Sep 16, 2021
@willg1983
Author

Given the speed of lock() on both .NET (Core) and .NET Framework, would it be feasible to use that type of lock for both? Other types of lock might be slightly faster on .NET Core (although other CPUs/systems may give different results), but not by much, and the alternatives are a lot slower on .NET Framework.
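
For context on why the lock statement alone cannot cover the async paths (this is general C#, not something stated in the comment above): await is not allowed inside a lock body, so any code path that awaits while holding the cache lock still needs an awaitable primitive such as SemaphoreSlim or MutexSlim. The class below is a hypothetical sketch.

using System;
using System.Threading;
using System.Threading.Tasks;

class WhyLockIsNotEnough
{
    private readonly object _gate = new object();
    private readonly SemaphoreSlim _asyncGate = new SemaphoreSlim(1, 1);

    public async Task<object> GetAsync(Func<Task<object>> load)
    {
        // lock (_gate)
        // {
        //     return await load(); // error CS1996: cannot await in the body of a lock statement
        // }

        // An awaitable primitive is required instead:
        await _asyncGate.WaitAsync();
        try
        {
            return await load();
        }
        finally
        {
            _asyncGate.Release();
        }
    }
}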


@fredericDelaporte
Member

Fixed by #2944.

@Will-Bill

Will-Bill commented Feb 21, 2022

@fredericDelaporte

Not sure this is fixed for async code.

We are still getting:

NHibernate.Util.AsyncReaderWriterLock.ReadLock
NHibernate.Cache.ReadWriteCache.Get
NHibernate.Event.Default.DefaultLoadEventListener.LoadFromSecondLevelCache
NHibernate.Event.Default.DefaultLoadEventListener.DoLoad
NHibernate.Event.Default.DefaultLoadEventListener.Load
NHibernate.Event.Default.DefaultLoadEventListener.OnLoad
NHibernate.Impl.SessionImpl.FireLoad
NHibernate.Impl.SessionImpl.ImmediateLoad
NHibernate.Proxy.AbstractLazyInitializer.Initialize

The 'fix' is to plug in the old locker, but that doesn't work in async scenarios.

@willg1983 have you any workarounds?

@bahusoid
Member

bahusoid commented Feb 21, 2022

Not sure this is fixed for async code.

We don't have a built-in fix for the async scenario on the full .NET Framework.

But you can now implement your own locker (ICacheReadWriteLockFactory and ICacheLock) and specify it in cache.read_write_lock_factory, maybe using the suggested MutexSlim from Pipelines.Sockets.Unofficial.

@bahusoid
Member

bahusoid commented Feb 22, 2022

Maybe using suggested MutexSlim

A basic implementation could be something like https://gist.github.com/bahusoid/dd91502076316bf40c597adc8a090486#file-mutexslimcachelock-cs

To use it:

Configuration cfg = ...
cfg.SetProperty(Environment.CacheReadWriteLockFactory, typeof(MutexSlimCacheFactory).AssemblyQualifiedName);

Note: added just for reference; I didn't really test it and have no idea if it really improves performance.

@Will-Bill

Will-Bill commented Feb 22, 2022

Thanks for this, really appreciate it. The timeout for obtaining a lock should be shortish, I guess? E.g. 1-2 seconds?

Edit: In some rudimentary tests with NBomber, MutexSlim locks faster than the default lock mechanism.

@willg1983
Author

@willg1983 have you any workarounds?

No, we reverted to 5.2 and have avoided updating to later versions.

@JiiimmyyN

JiiimmyyN commented Jan 19, 2024

@willg1983 have you any workarounds?

No, we reverted to 5.2 and have avoided updating to later versions.

Did you ever upgrade from 5.2? We are currently running 5.3.20 and are having issues similar to what you've described.

Edit: Turns out our problem was not related to this issue in the end, even though it showed similar symptoms.
The problem for us was a rarely hit code path that never closed its connection, so when the app ran for a long time we would hit the MaxPoolSize, which in turn led to a long chain of issues.

If anyone else lands here and is as stumped as we were, I recommend running "dotnet counters monitor npgql - " and checking whether "busy connections" slowly increases over time (https://www.npgsql.org/doc/diagnostics/metrics.html)
