[CBR-471] Fix Restoration Performance Regression#3786
Conversation
1a89260 to
96965af
Compare
dcoutts
left a comment
There was a problem hiding this comment.
Looks good to me, though I've not fully checked everything. Some suggestions below.
| wId :: WalletId | ||
| wId = WalletIdHdRnd rootId | ||
|
|
||
| -- Process the restoration of the block with the given 'HeaderHash'. |
There was a problem hiding this comment.
This comment would now be out of date right? If I followed correctly, it's restore starting from the block with the given hash.
| lift (nextHistoricalHash currentHash) >>= \case | ||
| Nothing -> throwM (RestorationFinishUnreachable tgtHash currentHash) | ||
| Just hh' -> put (hh', False) | ||
| lift $ getPrefilteredBlockOrThrow genesisHash currentHash |
There was a problem hiding this comment.
The use of a state monad here seems a bit odd to me. This is just keeping the currentHash as the running value right? I think I'd just write this as a local recursive action with the currentHash as a local arg.
There was a problem hiding this comment.
I've tried both approach actually and found the StateMonad the most readable of the two. The "edge" case makes the recursive-function approach a bit ugly :/
|
tried running the Nix scipt myself and first off, got a few locale related errors: but it did seem to sync: but then during wallet restore I got: |
96965af to
257f610
Compare
|
@erikd Yes indeed, my little patch yesterday just made it worse 😅 ... Now fixed properly. |
257f610 to
6fdffdd
Compare
|
|
||
| -- Flush to Acid-State & SQLite | ||
| k <- getSecurityParameter (wallet ^. walletNode) | ||
| let (blocks, txMetas, slotIds) = unzip3 (catMaybes updates) |
There was a problem hiding this comment.
I think I'd rewrite this code slightly; after discussing this code with @KtorZ , the catMaybes here has a dual purpose: any EBBs will be Nothing, and once we reach the final block halfway the batch, all the elements after that final block will also be Nothing. I find the style a bit confusing, and would probably prefer an explicit recursion style with an accumulator. That said, I don't think it's wrong what's here; but I think it does at least need a comment.
adinapoli-iohk
left a comment
There was a problem hiding this comment.
LGTM -- I don't have anything major to add on top of what Edsko & Duncan already said. Nice work!
| curl --silent --cacert ${stateDir}/tls/client/ca.crt --cert ${stateDir}/tls/client/client.pem -H "Content-Type: application/json" https://${wallet.walletListen}/api/v1/wallets \ | ||
| -d '{ | ||
| "operation": "restore", | ||
| "backupPhrase": ["session", "ring", "phone", "arrange", "notice", "gap", "media", "olympic", "water", "road", "spider", "rate"], |
There was a problem hiding this comment.
Hopefully people won't try to use this one for production wallets 😉
(I always feel like we should have some mechanism for preventing mainnet to accept any of the mnemonics we use in tests etc).
There was a problem hiding this comment.
Haha, I thought the same things 😶 ... I hope it's hidden enough in the nix entrails that no one will dig this out... right ? 😨
| Nothing -> withNode $ getFirstGenesisBlockHash genesisHash | ||
| Just sh -> nextHistoricalHash sh >>= maybe (throwM $ RestorationSuccessorNotFound sh) pure | ||
| restore genesisHash startingPoint NoTimingData | ||
| restore genesisHash startingPoint 1000 |
There was a problem hiding this comment.
Minor one -- might be nice to explicitly have a where clause for this "magic number", to clarify its purpose.
getBatch genesisHash hh batchSize = go [] batchSize hh
where
go !updates !n !currentHash | n <= 0 =
return (updates, currentHash)
go !updates _ !currentHash | currentHash == tgtHash =
getPrefilteredBlockOrThrow genesisHash currentHash >>= \case
Nothing -> go updates 0 currentHash
Just u -> go (updates ++ [u]) 0 currentHash
go !updates !n !currentHash = do
nextHash <- nextHistoricalHash currentHash >>= \case
Nothing -> throwM (RestorationFinishUnreachable tgtHash currentHash)
Just hh' -> return hh'
getPrefilteredBlockOrThrow genesisHash currentHash >>= \case
Nothing -> go updates (n - 1) nextHash
Just u -> go (updates ++ [u]) (n - 1) nextHashCurrently trying out this. I found a nice way to put this. A bit afraid about the boundary case so I'll go through a full restoration once again and see :) |
6fdffdd to
9cb1a58
Compare
|
Addressed comments above; went for the algorithm described above with accumulators and recursion instead of the |
We use to perform a db transaction per block which turns out to slow down the process quite a lot. Instead, we can apply k (.e.g 1000) updates inside one transaction and move on to the next batch. There's a bit of fiddling to do to get things working, but it turns out to be pretty efficient. The main trick is located in the new `applyHistoricalBlocks` which applies update in sequence, stopping at the first error encountered.
We should have cought the performance regression earlier. We can't realistically perform restoration as part of our unit or main integration tests because of the time it takes. However, it is perfectly realistic to have it ran nightly, and monitor its output to be inside accepted boundaries.
9cb1a58 to
80c4b5b
Compare
rvl
left a comment
There was a problem hiding this comment.
Looks good. Bash is bash.
The build timeouts are set in .buildkite/nightly.yml.
|
Integration tests still failing for an unrelated reason: Merging anyway 👍 |
…ration-perf-regression [CBR-471] Fix Restoration Performance Regression
…hk/KtorZ/CBR-471/fix-restoration-perf-regression [CBR-471] Fix Restoration Performance Regression
Description
I am still testing this (I had an issue with the very last block application causing the checkpoints to diverge in a non expected way). I believe it's now fixed but I have to wait full restoration.edit: Now tested, everything seems fine (see comment below)
We use to perform a db transaction per block which turns out to slow down the process quite a lot. Instead, we can apply k (.e.g 1000) updates inside one transaction and move on to the next batch. There's a bit of fiddling to do to get things working, but it turns out to be pretty efficient (~600 block/s on my setup, vs 80 block/s without the patch).
The main trick is located in the new
applyHistoricalBlockswhich applies update in sequence, stopping at the first error encountered.Besides, we should have cought the performance regression earlier. We can't realistically perform restoration as part of our unit or main integration tests because of the time it takes. However, it is perfectly realistic to have it ran nightly, and monitor its output to be inside accepted boundaries.
Linked issue
[CBR-471]
Type of change
Developer checklist
Testing checklist
QA Steps
... then go for a long coffee ☕
Screenshots (if available)
Note: There was a typo in the script when I ran it causing it not to terminate properly once synced. I've verified the log file and no errors related to sync (some lookup network failures but nothing unusual).
Plus, checking the wallet manually at the end:
{ "data" : [ { "id" : "Ae2tdPwUPEZ1UMga1bnFFD7FoFN9ZFoPLCWLaGcqNnfovcrcT3oMSuAz8DD", "spendingPasswordLastUpdate" : "2018-10-24T20:16:50.106258", "hasSpendingPassword" : false, "type" : "regular", "syncState" : { "data" : null, "tag" : "synced" }, "createdAt" : "2018-10-24T20:16:50.106258", "balance" : 0, "name" : "Acceptance Wallet", "assuranceLevel" : "normal" } ], "meta" : { "pagination" : { "totalEntries" : 1, "totalPages" : 1, "page" : 1, "perPage" : 10 } }, "status" : "success" }