-
Notifications
You must be signed in to change notification settings - Fork 934
NH-4011 - transaction scope failures #627
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NH-4011 - transaction scope failures #627
Conversation
27454a0
to
3ebb8cd
Compare
Update: the delay is a general trouble not specific to Oracle, due to MSDTC asynchronism. MSDTC let the scope disposal leave before the database has actually finished committing. Read more on later comments. |
3ebb8cd
to
e8cdcf5
Compare
It looks like Sql Server has issues with connection being released during transaction scope, when enlisting a new one in prepare phase. Removing the commit having done that change. |
0225ccd
to
5760e16
Compare
|
99f96f5
to
2158f4a
Compare
39c52a6
to
d52d74a
Compare
src/NHibernate.Test/TestDialect.cs
Outdated
@@ -33,15 +33,6 @@ public TestDialect(Dialect.Dialect dialect) | |||
public virtual bool SupportsOperatorSome { get { return true; } } | |||
public virtual bool SupportsLocate { get { return true; } } | |||
|
|||
public virtual bool SupportsDistributedTransactions { get { return true; } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to dialect informational metadata.
src/NHibernate.Test/TestDialect.cs
Outdated
/// the database is locked when one transaction is run, so running a second transaction | ||
/// will cause a "database is locked" error message. | ||
/// </summary> | ||
public virtual bool SupportsConcurrentTransactions { get { return true; } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to dialect properties, required for new ConnectionManager
implementation.
@@ -12,144 +12,131 @@ namespace NHibernate.AdoNet | |||
/// Manages the database connection and transaction for an <see cref="ISession" />. | |||
/// </summary> | |||
/// <remarks> | |||
/// This class corresponds to LogicalConnectionImplementor and JdbcCoordinator in Hibernate, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having done significant changes anyway, I have cleaned-up the class.
/// </remarks> | ||
[Serializable] | ||
public class ConnectionManager : ISerializable, IDeserializationCallback | ||
{ | ||
private static readonly IInternalLogger log = LoggerProvider.LoggerFor(typeof(ConnectionManager)); | ||
|
||
public interface Callback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused interface. Seems a forgotten old trash.
/// Does nothing in case an explicit transaction ongoing. | ||
/// </summary> | ||
/// <param name="transaction">The transaction in which the connection should be enlisted.</param> | ||
public void EnlistIfRequired(System.Transactions.Transaction transaction) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously NHibernate was relying solely on connection auto-enlistment. This could not work with OnClose
connection release mode, and this was sometimes failing due to some race condition causing the connection to have not been un-enlisted soon enough from the transaction end. (In addition to not have been released and re-acquired.)
Enlisting explicitly avoids the "late un-enlistment" trouble.
/// Distributed transactions usually imply the use of<see cref="TransactionScope"/>, but using | ||
/// <c>TransactionScope</c> does not imply the transaction will be distributed. | ||
/// </remarks> | ||
public virtual bool SupportsDistributedTransactions => true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks in most case more the data-provider "fault" when not supported than the database itself. Maybe should it be moved to driver.
4451a41
to
532ee83
Compare
04eeab6
to
72a9db9
Compare
Jim Carley, from Microsoft, has given some interesting explanations about MSDTC behavior here, upon request of the Npgsql data provider GitHub owner. It notably confirms that the second phase of the 2 phases commit can occurs after the scope disposal has left, because the transaction is considered committed as soon as MSDTC has recorded the vote of all resources. Second phases notifications are sent after that point in the commit case. SqlClient has probably some internal locking logic preventing its consumers to be exposed to that, but most other data provider, including Oracle ODAC, lack it. |
Things get muddier... As said in the Npgsql issue, I have reproduced the "delayed transaction commit" issue with SQL Server too. That is not ADO.Net data provider which was protecting it from this issue, but its old way of handling "read committed" isolation level. Activating the read-committed snapshot mode causes SQL Server to also have that delayed commit issue, where the data committed from a distributed transaction may not be there right after the scope disposal. A So indeed, that is a general rule, after a distributed scope disposal, tests cannot assume they will be able to read the committed data... Now enlistment ressources having knowledge of this 2PC occurring concurrently to code after scope disposal have very likely some locking mechanism similar to the one I have added to protect against concurrent usages leading to failures (protecting session for NHibernate, connections for data providers). Npgsql clearly lacks this but its maintainer seems to be willing to implement some. (I have submitted a test case demonstrating a crash due to that concurrency, in its test suit.) The current synchronization mechanism I have put for NHibernate seems to do the job for the commit case (avoiding crashes, but not avoiding the stale data issue). It does not do it yet for rollback cases where second phase occurs directly without a first phase, I have to adjust that. |
using NUnit.Framework; | ||
|
||
namespace NHibernate.Test.SystemTransactions | ||
{ | ||
[TestFixture] | ||
public class TransactionFixture : TestCase | ||
public class TransactionFixture : TransactionFixtureBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those tests were a bit minimalist. Replaced by those from distributed case, striped from distributed promotion.
/// </item> | ||
/// </list> | ||
/// </remarks> | ||
public override bool SupportsSystemTransactions => false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firebird .Net data provider is messy to the point of forbidding us to test even non distributed case.
0fcfd6c
to
aea3577
Compare
// Note that this fails with ConnectionReleaseMode.OnClose and SqlServer: | ||
// System.Data.SqlClient.SqlException : Microsoft Distributed Transaction Coordinator (MS DTC) has stopped this transaction. | ||
using (var s = OpenSession()) | ||
//using (var s = Sfi.WithOptions().ConnectionReleaseMode(ConnectionReleaseMode.OnClose).OpenSession()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yuck, one more trouble. I do not see anyway to fix that, excepted not using the old OnClose
connection release mode. Since documentation advises against using it, I think we are fine not fixing that trouble.
1011410
to
276d494
Compare
6f65c59
to
1c327d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this review, I point the changes I have made for rebasing on async feature.
There are some troubles in test generation causing missing async test counterparts. Either we wait for changes in the generator which would solve them, or we should change the way we generate async test.
Tools/packages.config
Outdated
@@ -7,5 +7,5 @@ | |||
<package id="NUnit.Extension.NUnitV2ResultWriter" version="3.6.0" targetFramework="net461" /> | |||
<package id="NUnit.Extension.TeamCityEventListener" version="1.0.2" targetFramework="net461" /> | |||
<package id="NUnit.Extension.VSProjectLoader" version="3.6.0" targetFramework="net461" /> | |||
<package id="CSharpAsyncGenerator.CommandLine" version="0.3.6" targetFramework="net461" /> | |||
<package id="CSharpAsyncGenerator.CommandLine" version="0.3.7" targetFramework="net461" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -109,6 +109,12 @@ | |||
typeConversion: | |||
- conversion: Ignore | |||
name: EnumerableImpl | |||
- conversion: Ignore | |||
name: SystemTransactionContext | |||
containingTypeName: AdoNetWithSystemTransactionFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently ignoring only serves one purpose, avoiding bloat async code generation. The transaction context is mostly used for handling callbacks from system transaction which are not async. But the generator was generating async private methods for some private members used by those callbacks.
It may have another purpose: the SystemTransactionContext.Wait
perform a ManualResetEvent.Wait
call which has an async counterpart but is currently not detected by the async generator (configuration?). If it were detected, this ignore configuration here for the whole class would anyway avoid it to be taken into account.
I think it should not be taken into account because this Wait
occurs on restricted conditions: only when using scopes, furthermore distributed. Generating async for it would generate an async version of AbstractSessionImpl.CheckAndUpdateSessionStatus
which would in turn generates async versions of many additional session members, while their calls would be still fully synchronous in usual cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ManualResetEvent.Wait
call which has an async counterpart but is currently not detected by the async generator
There is no WaitAsync
method for the ManualResetEvent or I am missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, previously it was a SemaphoreSlim
instead of a ManualResetEventSlim
. The semaphore has async methods, but indeed the reset does not, it just have some overloads taking cancellation token.
containingTypeName: AdoNetWithSystemTransactionFactory | ||
- conversion: Ignore | ||
name: DependentContext | ||
containingTypeName: AdoNetWithSystemTransactionFactory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some glitches not ignoring DependentContext
nested class too, which anyway does not hold anything to convert to async (unless we change stance for the previous one). Sometime, the class was fully copied by Async Generator to the async partial of AdoNetWithSystemTransactionFactory
, causing a compilation conflict with the sync partial.
This was not always occurring and I have not find a way to reproduce it with some test code, so I have not raised an issue on Async Generator. @maca88 may still be interested in knowing that there is that trouble though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strange, I tried to generate the code multiple times without ignoring those two nested types and the DependentContext
was never generated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen the trouble only when SystemTransactionContext
is ignored but not DependentContext
. In such case it sometimes takes it into account in a buggy way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now I was able to reproduce and fix the problem.
/// Contains generated async methods | ||
/// </content> | ||
|
||
public partial class PeVerifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a strange async generator addition for which I do not have an explanation. It "disappears" if I exclude DeadLockHelper
from async generation, although this other class does not relates to those PeVerifier
and PeVerifyFixture
. Anyway, they do not seem to harm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also strange, I excluded the DeadlockHelper
but the PeVerifier
and PeVerifyFixture
were generated. In order to exclude the StreamReader.ReadToEnd
method add the following setting to the test project (must be inside analyzation
setting):
ignoreSearchForAsyncCounterparts:
- name: ReadToEnd
with this setting, the PeVerifier
and PeVerifyFixture
won't be generated as they were generated only because of the async version of the StreamReader.ReadToEnd
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main cause why PeVerifier
is generated is because of this line. As ComponentTest
is a fixture all its methods bodies will get scanned for async counterparts. When the ReadToEnd
is found the generator checks all its usages and there is where PeVerifier
is found.
enlistment.Done(); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are lacking generation of child classes which does just override settings like UseConnectionOnSystemTransactionPrepare
. Due to the async test class being generated as separated types instead of partials, this causes the tests corresponding to those different settings to not be asynchronously done.
Either we need the generator to generates async test class for descendant classes too (#43), or we need to switch generation back to the partial way. (Either for all then fixing broken fixtures or turning only failing ones to new types, or only for cases having such inheritance troubles.)
I would rather avoid a solution which would imply ad-hoc configuration for supporting this scheme, since it may let us forget some cases which would then lack async tests again.
{ | ||
try | ||
{ | ||
using (var scope = new TransactionScope(TransactionScopeOption.RequiresNew)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial version was TransactionScopeOption.Requires
instead of TransactionScopeOption.RequiresNew
. Changed for async test sake: with TransactionScopeAsyncFlowOption.Enabled
, the transaction from the test thread was propagated to the created thread, breaking the test logic.
// | ||
// Notify partner that I have finished my work | ||
// | ||
myLock.Release(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now put in a finally
for avoiding deadlocking the test in case code preceding the release had failed, like it was occurring due to the unexpected transaction leak to second thread.
if (!partnerLock.Wait(120000)) | ||
{ | ||
throw new InvalidOperationException("Wait for partner has taken more than two minutes"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now added a timeout for the wait in order to avoid deadlocking for one hour teamcity runner again with this test, if it were to bug again.
…ing non distributed tests
…mentation choices.
…whenever possible.
…users. * More xml documentation. * More virtual members. * More pre-conditions.
1c327d7
to
75d2c9f
Compare
Rebased, regen, no more missing async tests. |
NH-2176NH-4011 - That is gone to the point of fixing it as much as possible.Since the branch used in #410 is public and in the official NHibernate repository, rebasing it is debatable.
So for going on with my tests, I have switched on a branch in my own repository.
Rebased, custom factory dropped, slightly squashed.
DTC failure tests cleaned up and completed.
PostgreSQL reactivated for DTC in order to check what is going on with it.