
NHibernate AsyncReaderWriterLock stalls under load #2862


Closed
willg1983 opened this issue Jul 19, 2021 · 21 comments

@willg1983

We've been investigating our process stalling under load, multiple crash dumps show a similar set of stack traces to this:
[screenshot of the stack traces omitted]

After some testing I put together a crude harness over NHibernate.Util.AsyncReaderWriterLock which reproduces the problem: run this on .NET Framework 4.8 and the unit test doesn't complete after an hour; switch to .NET Core 3.0 and it completes in under a minute.

[TestMethod]
public Task LockHammer53()
{
    var asyncReaderWriterLock = new AsyncReaderWriterLock();

    return Task.WhenAll(Enumerable.Range(1, 100).Select(x => Task.Run(() =>
    {
        for (var i = 0; i < 1000000; i++)
        {
            using (asyncReaderWriterLock.WriteLock())
            {
            }
        }
    })));
}

It looks like it might be related to this bug in .NET:
dotnet/runtime#28717
It was fixed in .NET Core 3.0 but remains unfixed in .NET Framework, and it looks like there are no plans to fix it there.

Interestingly, StackExchange.Redis was heavily impacted by this and worked around the issue by switching from SemaphoreSlim to MutexSlim:
StackExchange/StackExchange.Redis@6873941
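
For reference (not part of the original report), a minimal sketch of the MutexSlim pattern, based on the Pipelines.Sockets.Unofficial API as it is exercised in the benchmark later in this thread; the class and method names and the 1-second timeout are arbitrary choices for illustration, and the LockToken.Success check reflects my understanding of that library's token type:

using System;
using System.Threading.Tasks;
using Pipelines.Sockets.Unofficial.Threading;

class MutexSlimSketch
{
    // MutexSlim takes a default timeout in milliseconds.
    private readonly MutexSlim _mutex = new MutexSlim(1000);

    public void DoSync()
    {
        // TryWait blocks until the lock is acquired or the timeout elapses.
        using (var token = _mutex.TryWait())
        {
            if (!token.Success)
                throw new TimeoutException("Could not acquire the lock.");
            // critical section
        }
    }

    public async Task DoAsync()
    {
        // TryWaitAsync awaits the lock instead of blocking a thread.
        using (var token = await _mutex.TryWaitAsync())
        {
            if (!token.Success)
                throw new TimeoutException("Could not acquire the lock.");
            // critical section
        }
    }
}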

So I would propose either:
a) Reverting the locking to the design in 5.2, or
b) Rewriting the AsyncReaderWriterLock to avoid the bug in SemaphoreSlim.

@fredericDelaporte
Member

So this would be a regression of #2147.

Reverting to the previous design means that applications using both synchronous and asynchronous code would have bugs, because the previous design handled locking separately for the two cases: async operations did not lock out synchronous operations, and synchronous ones did not lock out async ones.
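
A minimal illustration of that problem (hypothetical code, not the actual NHibernate implementation): when the sync path guards with a Monitor and the async path guards with a separate SemaphoreSlim, the two paths never exclude each other, so mixed sync/async callers can enter the critical section at the same time.

using System.Threading;
using System.Threading.Tasks;

class SplitLockSketch
{
    private readonly object _syncGate = new object();
    private readonly SemaphoreSlim _asyncGate = new SemaphoreSlim(1, 1);

    public void SyncWork()
    {
        lock (_syncGate) // excludes only other SyncWork callers
        {
            // critical section
        }
    }

    public async Task AsyncWork()
    {
        await _asyncGate.WaitAsync(); // excludes only other AsyncWork callers
        try
        {
            // critical section: may run concurrently with SyncWork,
            // because the two paths use unrelated primitives
        }
        finally
        {
            _asyncGate.Release();
        }
    }
}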

The MutexSlim used to replace the SemaphoreSlim in StackExchange.Redis appears to be a utility from a library with a broader purpose. It looks unlikely to me that NHibernate would take a dependency on it.

Until NHibernate ceases supporting anything older than .NET 5, this means we need to implement the AsyncReaderWriterLock with something other than the SemaphoreSlim class. Should we just incorporate the MutexSlim into our own code base?

According to this blog, SemaphoreSlim might be fixed in some .NET Framework service release (but this was written in 2019 and does not seem to have happened).

@bahusoid
Member

might be fixed in some .Net Framework service release

Nope. The fix was considered a perf improvement: dotnet/coreclr#22686 (comment)

@fredericDelaporte
Member

Maybe they are somehow right to qualify it as a mere performance issue: I launched the test and then forgot about it. When I came back, it had succeeded.
[screenshot of the passing test run omitted]

@fredericDelaporte
Member

@maca88, any opinion on this subject, including on:

Until NHibernate ceases supporting anything older than .Net 5, this means we need to implement the AsyncReaderWriterLock with something else than the SemaphoreSlim class. Should we just incorporate the MutexSlim in our own code base?

?

@maca88
Contributor

maca88 commented Aug 14, 2021

I did some testing and the performance difference between .NET Framework and .NET Core is huge for the Task Parallel Library. The fix for the mentioned dotnet bug is just a small optimization and is not really relevant in this case. Here is what I tested:

[Test]
public async Task LockHammer53()
{
  var asyncReaderWriterLock = new AsyncReaderWriterLock();
  var nitoAsyncReaderWriterLock = new Nito.AsyncEx.AsyncReaderWriterLock();
  var semaphore = new SemaphoreSlim(1, 1);
  var asyncLock = new AsyncLock();
  var mutex = new Pipelines.Sockets.Unofficial.Threading.MutexSlim(int.MaxValue);
  var sw = new Stopwatch();
  sw.Start();

  await Task.WhenAll(Enumerable.Range(1, 100).Select(x => Task.Run(() =>
  {
    for (var i = 0; i < 10000; i++)
    {
      // lock
      // .NET 461 => approx. 15ms
      // .NET Core => approx. 85 ms
      //lock (sw)
      //{
      //}

      // NHibernate AsyncLock
      // .NET 461 => approx. 7000ms
      // .NET Core => approx. 60 ms
      //using (asyncLock.Lock())
      //{
      //}

      // NHibernate AsyncReaderWriterLock
      // .NET 461 => approx. 20000ms
      // .NET Core => approx. 160 ms
      //using (asyncReaderWriterLock.WriteLock())
      //{
      //}

      // Nito.AsyncEx.AsyncReaderWriterLock
      // .NET 461 => approx. 80000ms
      // .NET Core => approx. 7000 ms
      //using (nitoAsyncReaderWriterLock.WriterLock())
      //{
      //}

      // SemaphoreSlim
      // .NET 461 => approx. 6000ms
      // .NET Core => approx. 60 ms
      //semaphore.Wait();
      //semaphore.Release();

      // Pipelines.Sockets.Unofficial.MutexSlim
      // .NET 461 => approx. 5000ms
      // .NET Core => approx. 5000ms
      //using (mutex.TryWait())
      //{
      //}
    }
  })));

  Console.WriteLine(sw.ElapsedMilliseconds);
}

Every class was tested separately on .NET Framework and .NET Core in Release mode. As you can see in the comments, the difference between SemaphoreSlim and MutexSlim on .NET Framework is not that significant, so switching to MutexSlim would not change much. So the option "Rewrite the AsyncReaderWriterLock to avoid the bug in SemaphoreSlim" won't help us.

@willg1983 If I understand correctly, with NHibernate version 5.2.x or lower you didn't experience any stalling under load? Do you use both async and sync NHibernate methods in the application (e.g. ISession.Save and ISession.SaveAsync)?

@maca88, any opinion on this subject, including on:

As you mentioned, reverting to the old locking is not a good idea, as it may not work correctly when mixing async and sync methods. Another option would be to introduce a configuration setting that locks by using AsyncLock instead, which is faster, at least when there are only write operations.

@bahusoid
Member

bahusoid commented Aug 14, 2021

@maca88 If I got it right, dotnet/runtime#28717 is about thread pool starvation. So aside from slow execution of particular jobs, it slows down the whole process. So it looks related to me. Or are you saying the modified code in that fix is not involved here?

@willg1983
Author

@maca88 We moved from NHibernate 4.1.x straight to 5.3.x. In production our system would stop responding around the same time each day, when user load was at its peak. After several weeks and crash dumps we narrowed it down and reverted to 5.2.x, and we haven't seen the problem in production since. Our codebase is older and only uses NHibernate's sync methods.

@maca88
Contributor

maca88 commented Aug 14, 2021

Or are you saying the modified code in fix is not involved here?

My point is that the performance improvement they applied for the mentioned issue is minimal, and even if it were ported to .NET Framework, our issue would still remain.

@maca88 we moved from nHibernate 4.1.x straight to 5.3.x

Ok, now I understand. After benchmarking the lock statement (I updated the above test), the reason for the stalling becomes obvious. Prior to NHibernate 5.3, when using only the sync NHibernate API, the locking was done by the MethodImpl attribute with the MethodImplOptions.Synchronized argument, which basically wraps the whole method in a lock statement (a sketch of that pattern follows the list below). On .NET Framework, based on the above benchmark, the lock statement is at least 300 times faster than the fastest alternative that supports both sync and async locking (MutexSlim). Based on that, the final solution would need to use the lock statement for .NET Framework, which leaves us with a few options:

  • Revert to the old locking mechanism for .NET Framework
  • Do nothing and just add a possible breaking change
  • Introduce a new setting that would use the lock statement for sync methods and possibly throw a NotSupportedException for async methods in order to avoid bugs
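
As a sketch of the pre-5.3 pattern described above (hypothetical class and method names, not the actual ReadWriteCache code): MethodImplOptions.Synchronized makes the runtime take a Monitor around the whole instance method, which is what made the sync-only path so cheap on .NET Framework; the explicit lock form below is roughly equivalent.

using System.Runtime.CompilerServices;

class CacheSketch
{
    private readonly object _gate = new object();

    // Pre-5.3 style: the runtime wraps the whole instance method in a
    // Monitor acquisition (effectively lock (this)).
    [MethodImpl(MethodImplOptions.Synchronized)]
    public object GetSynchronized(object key)
    {
        return LookUp(key);
    }

    // Roughly equivalent explicit form, locking on a private gate
    // instead of the instance itself.
    public object GetWithLock(object key)
    {
        lock (_gate)
        {
            return LookUp(key);
        }
    }

    private object LookUp(object key) => null; // placeholder for the cache lookup
}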

@bahusoid
Member

Introduce a new setting that would use the lock statement for sync methods and possibly throw a NotSupportedException for async methods in order to avoid bugs

If we go with adding a setting, I would add a new setting with the ability to provide your own locker class.


@bahusoid bahusoid reopened this Sep 16, 2021
@willg1983
Author

Given the speed of lock() on both .NET (Core) and .NET Framework, would it be feasible to use that type of lock for both? Other types of lock might be slightly faster on .NET Core (although other CPUs/systems may give different results), but not by much, and the alternatives are a lot slower on .NET Framework.
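
For context on why the lock statement alone cannot cover the async paths (this is general C#, not something stated in the comment above): await is not allowed inside a lock body, so any code path that awaits while holding the cache lock still needs an awaitable primitive such as SemaphoreSlim or MutexSlim. The class below is a hypothetical sketch.

using System;
using System.Threading;
using System.Threading.Tasks;

class WhyLockIsNotEnough
{
    private readonly object _gate = new object();
    private readonly SemaphoreSlim _asyncGate = new SemaphoreSlim(1, 1);

    public async Task<object> GetAsync(Func<Task<object>> load)
    {
        // lock (_gate)
        // {
        //     return await load(); // error CS1996: cannot await in the body of a lock statement
        // }

        // An awaitable primitive is required instead:
        await _asyncGate.WaitAsync();
        try
        {
            return await load();
        }
        finally
        {
            _asyncGate.Release();
        }
    }
}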


@fredericDelaporte
Member

Fixed by #2944.

@Will-Bill

Will-Bill commented Feb 21, 2022

@fredericDelaporte

Not sure this is fixed for async code.

We are still getting:

NHibernate.Util.AsyncReaderWriterLock.ReadLock
NHibernate.Cache.ReadWriteCache.Get
NHibernate.Event.Default.DefaultLoadEventListener.LoadFromSecondLevelCache
NHibernate.Event.Default.DefaultLoadEventListener.DoLoad
NHibernate.Event.Default.DefaultLoadEventListener.Load
NHibernate.Event.Default.DefaultLoadEventListener.OnLoad
NHibernate.Impl.SessionImpl.FireLoad
NHibernate.Impl.SessionImpl.ImmediateLoad
NHibernate.Proxy.AbstractLazyInitializer.Initialize

The 'fix' is to plug in the old locker, but that doesn't work in async scenarios.

@willg1983 have you any workarounds?

@bahusoid
Member

bahusoid commented Feb 21, 2022

Not sure this is fixed for async code.

We don't have a built-in fix for the async scenario on the full .NET Framework.

But you can now implement your own locker (ICacheReadWriteLockFactory and ICacheLock) and specify it in cache.read_write_lock_factory, maybe using the suggested MutexSlim from Pipelines.Sockets.Unofficial.

@bahusoid
Member

bahusoid commented Feb 22, 2022

Maybe using suggested MutexSlim

A basic implementation could be something like https://gist.github.com/bahusoid/dd91502076316bf40c597adc8a090486#file-mutexslimcachelock-cs

To use it:

Configuration cfg = ...
cfg.SetProperty(Environment.CacheReadWriteLockFactory, typeof(MutexSlimCacheFactory).AssemblyQualifiedName);

Note: added just for reference; I didn't really test it and have no idea if it really improves performance.

@Will-Bill

Will-Bill commented Feb 22, 2022

Thanks for this, really appreciate it. The timeout for obtaining a lock should be shortish, I guess? E.g. 1-2 seconds?

Edit: In some rudimentary tests with NBomber, MutexSlim locks faster than the default lock mechanism.

@willg1983
Author

@willg1983 have you any workarounds?

No, we reverted to 5.2 and have avoided updating to later versions.

@JiiimmyyN

JiiimmyyN commented Jan 19, 2024

@willg1983 have you any workarounds?

No, we reverted to 5.2 and have avoided updating to later versions.

Did you ever upgrade from 5.2? We are currently running 5.3.20 and are having issues similar to what you've described.

Edit: Turns out our problem was not related to this issue in the end, even though it showed similar symptoms.
The problem for us was a rarely hit code path that never closed its connection, so when the app ran for a long time we would hit the MaxPoolSize, which in turn led to a long chain of issues.

If anyone else lands here and is as stumped as we were, I recommend running "dotnet counters monitor npgql - " and checking whether "busy connections" slowly increases over time (https://www.npgsql.org/doc/diagnostics/metrics.html)
