Durable backend for Distributed Data collections #2490
Conversation
Branch updated from 6b90a2b to 36dc66c.
Branch updated from 1735539 to b7689fe.
using Akka.Serialization;
using LightningDB;

namespace Akka.DistributedData.LightningDB
The JVM version uses the LMDB backend for storage by default. I think it's better to move that dependency to a separate package, however.
At this point I've already fixed over 5 different bugs while trying to make this work. UPDATE: it turned out that the bug lies in the original JVM implementation. I've already filed an issue on their tracker.
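On the packaging point above: the JVM implementation keeps the durable store pluggable behind a configuration setting, which is what makes shipping the LMDB implementation as a separate package practical. A rough sketch of what that could look like here; the store-actor-class key and the type name are assumptions mirroring the JVM scheme, not settings confirmed by this PR:

using Akka.Configuration;

// Assumed HOCON shape, mirroring akka.cluster.distributed-data.durable.store-actor-class
// from the JVM. Pointing it at a type that lives in a separate
// Akka.DistributedData.LightningDB package would keep the LightningDB dependency
// out of the core Akka.DistributedData assembly.
var durableStoreConfig = ConfigurationFactory.ParseString(@"
    akka.cluster.distributed-data.durable {
        store-actor-class = ""Akka.DistributedData.LightningDB.LmdbDurableStore, Akka.DistributedData.LightningDB""
    }
");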
        var n = i;
        var keydn = new GCounterKey("D" + n);
        _replicator.Tell(Dsl.Update(keydn, GCounter.Empty, WriteLocal.Instance, x => x.Increment(_cluster, n)));
        ExpectMsg(new UpdateSuccess(keydn, null));
    }
}, _config.First);
I'm not sure if this is 100% reproducible in every case, but it looks like this pattern hits a bug in MNTK:
EnterBarrier("after-1");
RunOn(() => {
// this one gets called
}, firstRole, secondRole);
RunOn(() => {
// this one never gets called
}, firstRole);
EnterBarrier("after-2");
Branch updated from a64ca44 to 93663cb.
@Horusiath what will it take to get this PR into a state where we can include the bug fixes in 1.2?
Branch updated from 96cfb81 to cb535bc.
EnterBarrier("passThrough-third");

RunOn(() =>
{
    _replicator.Tell(Dsl.Get(KeyE, _readMajority));
-   var c155 = ExpectMsg<Replicator.GetSuccess>(g => Equals(g.Key, KeyE)).Get(KeyE);
+   var c155 = ExpectMsg<GetSuccess>(g => Equals(g.Key, KeyE)).Get(KeyE);
This is where the spec fails. Basically, what we've done up to this point is:
- Establish a 3-node cluster with a replicator instance on each node.
- Perform some updates on each node.
- Blackhole the 3rd node from the rest of the cluster, making it unreachable.
- Perform more CRDT updates on each node.
- Restore the connection (pass-through) between the 3rd node and the rest of the cluster.
- Once the 3rd node is reachable again, the replicators should exchange updates as part of a Get request with read majority and finally converge, but an exception occurs instead, causing node disassociation.
What I've managed to find is that after the 3rd node is marked as reachable again - when all nodes are in the Up state - the replicators try to send messages using Context.ActorSelection(Context.Parent.Path.ToStringWithAddress(address)) (Context.Parent is the replicator instance here). But even when I've confirmed that the node under address is acknowledged as up, the message never reaches its target. ResolveOne also throws an exception in this case. I believe this may be a problem in the remoting/cluster layer after an unreachable node becomes reachable again.
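For reference, a minimal sketch of that resolution pattern (illustration only, with hypothetical names; this is not the actual Replicator code):

using System;
using Akka.Actor;

// Hypothetical child of the replicator (e.g. a read/write aggregator) used only to
// illustrate the peer-resolution pattern described above.
public class PeerResolver : UntypedActor
{
    private readonly Address _remoteAddress;

    public PeerResolver(Address remoteAddress)
    {
        _remoteAddress = remoteAddress;
    }

    protected override void OnReceive(object message)
    {
        // Context.Parent is the replicator; rewriting its path with the remote node's
        // address should select the peer replicator living on that node.
        var peer = Context.ActorSelection(
            Context.Parent.Path.ToStringWithAddress(_remoteAddress));

        // Per the report above, once the formerly unreachable node re-joins,
        // this Tell never arrives, and ResolveOne faults with an exception as well.
        peer.Tell(message);
        var resolveTask = peer.ResolveOne(TimeSpan.FromSeconds(3));
    }
}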
/cc @Aaronontheweb
I have the same problem in SurviveNetworkInstabilitySpec
This PR introduces a durable, persistent backend for ddata. It allows users to specify a list of keys for which CRDTs should not only be gossiped among cluster nodes but also persisted using a durable store. Just like in the JVM case, the default implementation here uses LMDB (through the LightningDB driver on .NET).
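For illustration, a hedged sketch of what enabling the durable store could look like from the user's side; the HOCON keys below follow the JVM counterpart (akka.cluster.distributed-data.durable.*) and are assumptions, since the final setting names may differ once this PR lands:

using Akka.Actor;
using Akka.Configuration;

// Assumed setting names, mirroring the JVM's akka.cluster.distributed-data.durable.* section.
var config = ConfigurationFactory.ParseString(@"
    akka.cluster.distributed-data.durable {
        keys = [""durable-*""]   # only entries whose keys match these patterns are persisted
        lmdb.dir = ""ddata""     # directory used by the LMDB-backed durable store
    }
");

using (var system = ActorSystem.Create("ClusterSys", config))
{
    // CRDT updates for keys matching durable-* would survive a full node restart,
    // while all other keys keep the purely in-memory, gossip-only behavior.
}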
TODO list - my goal here is to make all multi-node tests for ddata pass before this PR gets merged: