Skip to content

NH-4011 - transaction scope failures #627

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Sep 7, 2017

Conversation

fredericDelaporte
Copy link
Member

@fredericDelaporte fredericDelaporte commented May 18, 2017

NH-2176 NH-4011 - That is gone to the point of fixing it as much as possible.

Since the branch used in #410 is public and in the official NHibernate repository, rebasing it is debatable.

So for going on with my tests, I have switched on a branch in my own repository.

Rebased, custom factory dropped, slightly squashed.

DTC failure tests cleaned up and completed.

PostgreSQL reactivated for DTC in order to check what is going on with it.

@fredericDelaporte fredericDelaporte force-pushed the NH-2176 branch 3 times, most recently from 27454a0 to 3ebb8cd Compare May 20, 2017 14:34
@fredericDelaporte
Copy link
Member Author

fredericDelaporte commented May 20, 2017

null enlistment does not looks required for any tested database. But Oracle Managed seems to be flaky, it looks like sometime it needs some "delay" between two transactions for the next one to read what the previous one has done. I have seen that locally on two different cases, one re-using the connection and the other re-acquiring it from pool. (Using Oracle 11 XE, ODAC 12 r4, and lot of fiddling for installing this db and configuring Nhibernate for it. I do not know much about Oracle.)

Update: the delay is a general trouble not specific to Oracle, due to MSDTC asynchronism. MSDTC let the scope disposal leave before the database has actually finished committing. Read more on later comments.

@fredericDelaporte
Copy link
Member Author

It looks like Sql Server has issues with connection being released during transaction scope, when enlisting a new one in prepare phase. Removing the commit having done that change.

@fredericDelaporte fredericDelaporte force-pushed the NH-2176 branch 2 times, most recently from 0225ccd to 5760e16 Compare May 21, 2017 18:55
@fredericDelaporte
Copy link
Member Author

fredericDelaporte commented May 21, 2017

CanUseSessionOutsideOfScopeAfterScopeWithFlush demonstrates enlistment with null is indeed required at least for sql-server in some cases. Going to add that back.

@@ -33,15 +33,6 @@ public TestDialect(Dialect.Dialect dialect)
public virtual bool SupportsOperatorSome { get { return true; } }
public virtual bool SupportsLocate { get { return true; } }

public virtual bool SupportsDistributedTransactions { get { return true; } }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to dialect informational metadata.

/// the database is locked when one transaction is run, so running a second transaction
/// will cause a "database is locked" error message.
/// </summary>
public virtual bool SupportsConcurrentTransactions { get { return true; } }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to dialect properties, required for new ConnectionManager implementation.

@@ -12,144 +12,131 @@ namespace NHibernate.AdoNet
/// Manages the database connection and transaction for an <see cref="ISession" />.
/// </summary>
/// <remarks>
/// This class corresponds to LogicalConnectionImplementor and JdbcCoordinator in Hibernate,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having done significant changes anyway, I have cleaned-up the class.

/// </remarks>
[Serializable]
public class ConnectionManager : ISerializable, IDeserializationCallback
{
private static readonly IInternalLogger log = LoggerProvider.LoggerFor(typeof(ConnectionManager));

public interface Callback
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused interface. Seems a forgotten old trash.

/// Does nothing in case an explicit transaction ongoing.
/// </summary>
/// <param name="transaction">The transaction in which the connection should be enlisted.</param>
public void EnlistIfRequired(System.Transactions.Transaction transaction)
Copy link
Member Author

@fredericDelaporte fredericDelaporte May 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously NHibernate was relying solely on connection auto-enlistment. This could not work with OnClose connection release mode, and this was sometimes failing due to some race condition causing the connection to have not been un-enlisted soon enough from the transaction end. (In addition to not have been released and re-acquired.)
Enlisting explicitly avoids the "late un-enlistment" trouble.

/// Distributed transactions usually imply the use of<see cref="TransactionScope"/>, but using
/// <c>TransactionScope</c> does not imply the transaction will be distributed.
/// </remarks>
public virtual bool SupportsDistributedTransactions => true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks in most case more the data-provider "fault" when not supported than the database itself. Maybe should it be moved to driver.

@fredericDelaporte
Copy link
Member Author

fredericDelaporte commented Jun 15, 2017

Jim Carley, from Microsoft, has given some interesting explanations about MSDTC behavior here, upon request of the Npgsql data provider GitHub owner. It notably confirms that the second phase of the 2 phases commit can occurs after the scope disposal has left, because the transaction is considered committed as soon as MSDTC has recorded the vote of all resources. Second phases notifications are sent after that point in the commit case.

SqlClient has probably some internal locking logic preventing its consumers to be exposed to that, but most other data provider, including Oracle ODAC, lack it.

@fredericDelaporte
Copy link
Member Author

fredericDelaporte commented Jun 16, 2017

Things get muddier... As said in the Npgsql issue, I have reproduced the "delayed transaction commit" issue with SQL Server too. That is not ADO.Net data provider which was protecting it from this issue, but its old way of handling "read committed" isolation level. Activating the read-committed snapshot mode causes SQL Server to also have that delayed commit issue, where the data committed from a distributed transaction may not be there right after the scope disposal. A Thread.Sleep then allows to query it a bit later...

So indeed, that is a general rule, after a distributed scope disposal, tests cannot assume they will be able to read the committed data...

Now enlistment ressources having knowledge of this 2PC occurring concurrently to code after scope disposal have very likely some locking mechanism similar to the one I have added to protect against concurrent usages leading to failures (protecting session for NHibernate, connections for data providers). Npgsql clearly lacks this but its maintainer seems to be willing to implement some. (I have submitted a test case demonstrating a crash due to that concurrency, in its test suit.)

The current synchronization mechanism I have put for NHibernate seems to do the job for the commit case (avoiding crashes, but not avoiding the stale data issue). It does not do it yet for rollback cases where second phase occurs directly without a first phase, I have to adjust that.

using NUnit.Framework;

namespace NHibernate.Test.SystemTransactions
{
[TestFixture]
public class TransactionFixture : TestCase
public class TransactionFixture : TransactionFixtureBase
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those tests were a bit minimalist. Replaced by those from distributed case, striped from distributed promotion.

/// </item>
/// </list>
/// </remarks>
public override bool SupportsSystemTransactions => false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firebird .Net data provider is messy to the point of forbidding us to test even non distributed case.

// Note that this fails with ConnectionReleaseMode.OnClose and SqlServer:
// System.Data.SqlClient.SqlException : Microsoft Distributed Transaction Coordinator (MS DTC) has stopped this transaction.
using (var s = OpenSession())
//using (var s = Sfi.WithOptions().ConnectionReleaseMode(ConnectionReleaseMode.OnClose).OpenSession())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yuck, one more trouble. I do not see anyway to fix that, excepted not using the old OnClose connection release mode. Since documentation advises against using it, I think we are fine not fixing that trouble.

@fredericDelaporte fredericDelaporte force-pushed the NH-2176 branch 2 times, most recently from 1011410 to 276d494 Compare June 18, 2017 21:32
Copy link
Member Author

@fredericDelaporte fredericDelaporte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this review, I point the changes I have made for rebasing on async feature.

There are some troubles in test generation causing missing async test counterparts. Either we wait for changes in the generator which would solve them, or we should change the way we generate async test.

@@ -7,5 +7,5 @@
<package id="NUnit.Extension.NUnitV2ResultWriter" version="3.6.0" targetFramework="net461" />
<package id="NUnit.Extension.TeamCityEventListener" version="1.0.2" targetFramework="net461" />
<package id="NUnit.Extension.VSProjectLoader" version="3.6.0" targetFramework="net461" />
<package id="CSharpAsyncGenerator.CommandLine" version="0.3.6" targetFramework="net461" />
<package id="CSharpAsyncGenerator.CommandLine" version="0.3.7" targetFramework="net461" />
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Required because some of this PR code was triggering bugs (1, 2) in Async Generator.

@@ -109,6 +109,12 @@
typeConversion:
- conversion: Ignore
name: EnumerableImpl
- conversion: Ignore
name: SystemTransactionContext
containingTypeName: AdoNetWithSystemTransactionFactory
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently ignoring only serves one purpose, avoiding bloat async code generation. The transaction context is mostly used for handling callbacks from system transaction which are not async. But the generator was generating async private methods for some private members used by those callbacks.

It may have another purpose: the SystemTransactionContext.Wait perform a ManualResetEvent.Wait call which has an async counterpart but is currently not detected by the async generator (configuration?). If it were detected, this ignore configuration here for the whole class would anyway avoid it to be taken into account.

I think it should not be taken into account because this Wait occurs on restricted conditions: only when using scopes, furthermore distributed. Generating async for it would generate an async version of AbstractSessionImpl.CheckAndUpdateSessionStatus which would in turn generates async versions of many additional session members, while their calls would be still fully synchronous in usual cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ManualResetEvent.Wait call which has an async counterpart but is currently not detected by the async generator

There is no WaitAsync method for the ManualResetEvent or I am missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, previously it was a SemaphoreSlim instead of a ManualResetEventSlim. The semaphore has async methods, but indeed the reset does not, it just have some overloads taking cancellation token.

containingTypeName: AdoNetWithSystemTransactionFactory
- conversion: Ignore
name: DependentContext
containingTypeName: AdoNetWithSystemTransactionFactory
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some glitches not ignoring DependentContext nested class too, which anyway does not hold anything to convert to async (unless we change stance for the previous one). Sometime, the class was fully copied by Async Generator to the async partial of AdoNetWithSystemTransactionFactory, causing a compilation conflict with the sync partial.
This was not always occurring and I have not find a way to reproduce it with some test code, so I have not raised an issue on Async Generator. @maca88 may still be interested in knowing that there is that trouble though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange, I tried to generate the code multiple times without ignoring those two nested types and the DependentContext was never generated

Copy link
Member Author

@fredericDelaporte fredericDelaporte Sep 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen the trouble only when SystemTransactionContext is ignored but not DependentContext. In such case it sometimes takes it into account in a buggy way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now I was able to reproduce and fix the problem.

/// Contains generated async methods
/// </content>

public partial class PeVerifier
Copy link
Member Author

@fredericDelaporte fredericDelaporte Sep 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a strange async generator addition for which I do not have an explanation. It "disappears" if I exclude DeadLockHelper from async generation, although this other class does not relates to those PeVerifier and PeVerifyFixture. Anyway, they do not seem to harm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also strange, I excluded the DeadlockHelper but the PeVerifier and PeVerifyFixture were generated. In order to exclude the StreamReader.ReadToEnd method add the following setting to the test project (must be inside analyzation setting):

ignoreSearchForAsyncCounterparts:
    - name: ReadToEnd

with this setting, the PeVerifier and PeVerifyFixture won't be generated as they were generated only because of the async version of the StreamReader.ReadToEnd method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main cause why PeVerifier is generated is because of this line. As ComponentTest is a fixture all its methods bodies will get scanned for async counterparts. When the ReadToEnd is found the generator checks all its usages and there is where PeVerifier is found.

enlistment.Done();
}
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are lacking generation of child classes which does just override settings like UseConnectionOnSystemTransactionPrepare. Due to the async test class being generated as separated types instead of partials, this causes the tests corresponding to those different settings to not be asynchronously done.

Either we need the generator to generates async test class for descendant classes too (#43), or we need to switch generation back to the partial way. (Either for all then fixing broken fixtures or turning only failing ones to new types, or only for cases having such inheritance troubles.)
I would rather avoid a solution which would imply ad-hoc configuration for supporting this scheme, since it may let us forget some cases which would then lack async tests again.

{
try
{
using (var scope = new TransactionScope(TransactionScopeOption.RequiresNew))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial version was TransactionScopeOption.Requires instead of TransactionScopeOption.RequiresNew. Changed for async test sake: with TransactionScopeAsyncFlowOption.Enabled, the transaction from the test thread was propagated to the created thread, breaking the test logic.

//
// Notify partner that I have finished my work
//
myLock.Release();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now put in a finally for avoiding deadlocking the test in case code preceding the release had failed, like it was occurring due to the unexpected transaction leak to second thread.

if (!partnerLock.Wait(120000))
{
throw new InvalidOperationException("Wait for partner has taken more than two minutes");
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now added a timeout for the wait in order to avoid deadlocking for one hour teamcity runner again with this test, if it were to bug again.

@fredericDelaporte
Copy link
Member Author

Rebased, regen, no more missing async tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants