Durable backend for Distributed Data collections #2490


Merged — 23 commits merged into akkadotnet:dev on Apr 7, 2017

Conversation

@Horusiath Horusiath commented Jan 30, 2017

This PR introduces a durable, persistent backend for ddata. It allows users to specify a list of keys whose CRDTs should not only be gossiped among cluster nodes, but also persisted using a durable store. Just like in the JVM case, the default implementation uses LMDB (through the LightningDB driver for .NET).

TODO list - my goal here is to make all multinode tests for ddata pass before this PR gets merged:

  • ReplicatorSpec
  • ReplicatorPrunningSpec
  • ReplicatorChaosSpec
  • JepsenInspiredInsertSpec
  • DurablePrunningSpec
  • DurableDataSpec

@Horusiath Horusiath added the WIP label Jan 30, 2017
@Horusiath Horusiath force-pushed the ddata-durable branch 2 times, most recently from 1735539 to b7689fe on March 3, 2017 at 22:17
using Akka.Serialization;
using LightningDB;

namespace Akka.DistributedData.LightningDB
@Horusiath (author) commented:

The JVM version uses an LMDB backend for storage by default. I think it's better to move that dependency into a separate package, however.

@Horusiath commented Mar 6, 2017:

At this point I've already fixed more than 5 different bugs while trying to make ReplicatorSpec pass. Some of them are critical. Right now I'm struggling to find the next one: for some reason it looks like not all keys get replicated back to the reconnecting node (scenario: disconnect → update while disconnected → reconnect and wait for replicas to converge). The good news is that it's always the same key that seems to be missing during replication (12 of 30).

UPDATE: it turned out that the bug lies in the original JVM implementation. I've already reported it on their tracker.

var n = i;
var keydn = new GCounterKey("D" + n);
_replicator.Tell(Dsl.Update(keydn, GCounter.Empty, WriteLocal.Instance, x => x.Increment(_cluster, n)));
ExpectMsg(new UpdateSuccess(keydn, null));
}
}, _config.First);
@Horusiath (author) commented:

I'm not sure if this is 100% reproducible in every case, but it looks like this pattern triggers a bug in the MNTK (multi-node test kit):

EnterBarrier("after-1");

RunOn(() => {
    // this one gets called
}, firstRole, secondRole);

RunOn(() => {
    // this one never gets called
}, firstRole);

EnterBarrier("after-2");

@Horusiath Horusiath force-pushed the ddata-durable branch 2 times, most recently from a64ca44 to 93663cb on March 9, 2017 at 06:12
@Aaronontheweb (Member) commented:
@Horusiath what will it take to get this PR into a state where we can include the bug fixes in 1.2?

@Horusiath Horusiath changed the title from "[WIP] Durable backend for Distributed Data collections" to "Durable backend for Distributed Data collections" on Apr 6, 2017

EnterBarrier("passThrough-third");

RunOn(() =>
{
_replicator.Tell(Dsl.Get(KeyE, _readMajority));
var c155 = ExpectMsg<GetSuccess>(g => Equals(g.Key, KeyE)).Get(KeyE);
@Horusiath (author) commented:
This is where the spec fails. Basically, what we've done up to this point is:

  1. Establish a 3-node cluster with a replicator instance on each node.
  2. Perform some updates on each node.
  3. Blackhole the 3rd node from the rest of the cluster, making it unreachable.
  4. Perform more CRDT updates on each node.
  5. Let traffic pass through to the previously unreachable 3rd node again.
  6. While the 3rd node is up, the replicators should exchange updates as part of a Get request with read majority and finally converge, but an exception occurs instead, causing node disassociation.

What I've managed to find is that upon marking the 3rd node as reachable again - when all nodes are in the Up state - replicators try to send messages using Context.ActorSelection(Context.Parent.Path.ToStringWithAddress(address)) (Context.Parent is the replicator instance here). But even after I confirmed that the node under that address is acknowledged as up, the message never reaches its target. ResolveOne also throws an exception in this case. I believe this may be a problem in the remoting/cluster layer after an unreachable node becomes reachable again.
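
The failing resolution path described above can be sketched roughly like this (a hedged illustration, not code from the PR; `address` and `message` are assumed local variables, and the snippet requires the Akka NuGet package):

```csharp
// Build a selection for the peer replicator by rewriting this actor's
// path against the remote node's address, as the replicator does.
var selection = Context.ActorSelection(
    Context.Parent.Path.ToStringWithAddress(address));

// Fire-and-forget send: per the report above, after the node transitions
// unreachable -> reachable this message never arrives.
selection.Tell(message);

// Explicit resolution fails too: ResolveOne completes with
// ActorNotFoundException when the selection cannot be resolved in time.
try
{
    IActorRef peer = await selection.ResolveOne(TimeSpan.FromSeconds(3));
}
catch (ActorNotFoundException)
{
    // Resolution failed even though the node is marked Up,
    // pointing at the remoting/cluster layer rather than ddata itself.
}
```

If both the plain Tell and the explicit ResolveOne fail while cluster membership reports the node as Up, that is consistent with the suspicion that the problem sits below Distributed Data, in remoting.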

/cc @Aaronontheweb

Another contributor commented:
I have the same problem in SurviveNetworkInstabilitySpec.

@Aaronontheweb Aaronontheweb merged commit 43f2a6f into akkadotnet:dev Apr 7, 2017
@Aaronontheweb Aaronontheweb added this to the 1.2.0 milestone Apr 7, 2017