Skip to content
This repository was archived by the owner on Jan 21, 2022. It is now read-only.

Feature/persistent#179

Closed
mhhf wants to merge 35 commits intoConsenSys-archive:masterfrom
nexusdev:feature/persistent
Closed

Feature/persistent#179
mhhf wants to merge 35 commits intoConsenSys-archive:masterfrom
nexusdev:feature/persistent

Conversation

@mhhf
Copy link
Copy Markdown

@mhhf mhhf commented Oct 10, 2016

There are a lot of this left to do, but this is a first draft in case anyone want take a look.
Unfortunately I struggle with a non deterministic test failure which might be caused my memdown and a lot bugs look familiar to #81.
However, all tests work fine in isolation.

  • this PR depends on a modified version of ethereumjs-blockchain:
    • the modification enables to submit a block on arbitrary block height without having its parent
      In the long run the ethereumjs-blockchain dependency can be omitted with including db storage directly into blockchain_double.js

@tcoulter
Copy link
Copy Markdown
Contributor

Awesome @mhhf, thanks for your work on this.

Quick heads up: I'm going to be away from the 13th to the 31st, which means this PR likely won't get merged until after I'm back. I'm sorry for the delay. If you think you can make this production ready by Wednesday, happy to jump on it, however, and do what I can to get it merged.

Regardless of timing, TestRPC 3.0 came out last week. Do you think you can remerge with master when you get a chance? Will be awesome to test this out.

Thanks!

@mhhf
Copy link
Copy Markdown
Author

mhhf commented Oct 10, 2016

I can make a sprint in the next 2 days: finalizing all features, merging the current master and clean everything up, however I don't think I will solve the problems caused by #81 and solving them is required for a production ready version :(

@tcoulter
Copy link
Copy Markdown
Contributor

tcoulter commented Oct 10, 2016

We should hammer down on that bug then. I suppose this gives us more time. :)

@mhhf
Copy link
Copy Markdown
Author

mhhf commented Oct 15, 2016

I solved one source of non-deterministic bugs which also might be related to #81 -
ethereumjs uses levelup/memdown combo which they always initialize with at path "":
In case there are many db's (e.g. multiple caching, multiple tries, multiple blockchains) they interfere with each other - e.g. if cache.flush is called some important data for some different actor might get removed.
another weird behavior with memdown is that it has difficulties to store mixed type key values:
merkle-patricia-tree just stores Buffer values as keys, however if the db is used to store strings or ints as keys as well memdown might not find them, because it has difficulties to compare them which was also leading to errors.
However, after removing those two bug sources, there is still one non deterministic bug left which causes some tests to fail, in special across different tests with opening the db and closing it.
But as for now, all tests work in isolation

@mhhf
Copy link
Copy Markdown
Author

mhhf commented Nov 16, 2016

from geth 1.5 upwards there is a new eth_getRawTransaction RPC API, which solves a major problem in state reconstruction. With this in place, I will remove the temporary hack of storing fake transactions. Also @tcoulter do you think we can pick up the work on this PR any time soon?

@tcoulter
Copy link
Copy Markdown
Contributor

@mhhf 👍 I'll have to look into that new request.

About this PR: apologies. I'm spread super thin right now. Perhaps this is something @johnmcdowall might want to look into? (warning: low level testrpc arch stuff)

Regardless, I'll see what I can do.

@mhhf
Copy link
Copy Markdown
Author

mhhf commented Nov 26, 2016

@tcoulter I hear you :) also have very limited resources right now. Did you had a change look on the code? Spend some more time to finishing it as far as I could and here is the outcome:

TDLR: Code is working although messy, need to sync up with you on how to proceed.

Long version:
Although its working, the current architecture wasn't designed to work with persistency (e.g. interlinked object instances are passed everywhere instead of json).
Also I cannot say I fully understood the big-architecture-picture of testrpc and really struggled to debug certain aspect, which required me to refactor major parts of the code, therefore I might have put some significant technical debt inside and with this might also have broken some parts, not covered by tests.

  1. I noticed that event/ log syncing is pretty slow with the current approach of chain-forking: if a filter is installed from genesis to latest block, testrpc tries to inspect each block for logs, and because they aren't there it syncs each block from the fallback blockchain. This is verry slow. What I currently do is forwarding the event filter to the forked-blockchain if the filter lies in its range, otherwise construct the result as usually. This might not be the best strategy, but its way faster then the previous one. However, coming up with some alternative strategy for log synching/ event filtering might be a better approach here.

  2. I made extensive usage of ethereumjs-blockchain which I had to modify in order to support chain forking and address ownership faking. The capabilities of ethereumjs-blockchain are similar to the native blockchain_double.js so currently the blockchain logic is distributed between both. I suggest either to outsource much logic from blockhain_double.js into ethereumjs-blockchain. Or omitting the usage of it and putting all logic into blockchain_double.js. Some feedback would be great on this problem.

  3. Its great that eth_getRawTransaction is implemented as a rpc command, but I think I'll propose to keep the current 'hacky' tx/fake.js implementation, since this allows us to pass wrongly signed transactions while still being able to store them persistently. In order to achieve this I had to make some very ugly etherumjs-tx hacks: see statemanager.js:20

  4. I'm using testrpc as a dependency, therefore I didn't approaches the cli. An important definition would be having a flag for a persistent storage e.g.: --db=/usr/root/db/. Currently in-memory storage is used by default.

  5. I could resolve the most, but there is still some non deterministic bugs which mostly are caused by dbs interference and asynchronously opening/ closing the db while the provider-engine is still working see this which was solved by putting setTimeout 100 on closing the server/provider.

Overall: Do you want to schedule a call and go over those points/ plan a merge?

"eth_call": "before",
"eth_getStorageAt": "before",
"eth_getLogs": "before"
"eth_getStorageAt": "before"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed eth_getLogs from here. I remember it being needed to speed things up, if not provide the correct functionality. Are you sure it's no longer required? (I'm basically looking for surety; open to removing it if it's no longer required.)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had major trouble with it while working with logs on forked blockchain as some internal logic was triggering something twice. I don't remember fully what the side behavior was but after spending quite some time on debugging provider-engine (which is kind of hard) i just removed it here, and everything worked fine.

Comment thread lib/utils/forkedblockchain.js Outdated
var Blockchain = require('ethereumjs-blockchain');
var Log = require("./log.js");
var Receipt = require("./receipt.js");
// var Receipt = require("./receipt.js");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this isn't needed. Remove if it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

block.header.stateRoot = utils.toBuffer(json.stateRoot);
block.header.difficulty = utils.toBuffer('0x00');
block.header.gasLimit = utils.toBuffer(json.gasLimit);
block.header.number = utils.toBuffer(json.number + 1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything here we can pull out and put in BlockchainDouble.createBlock() for consistency? Looks like difficulty might be the only one. For my edification, what was the reason to move away from createBlock() and instead use new Block()?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BlockchainDouble.createBlock() here is never called, instead BlockchainDouble builds on top of parentBlock, which we want to prevent here as we forking of with a new Block. super.createBlock() is needed for later in the ForkedBlockchain as well, so I cannot override this.
I think the best way is to keeping it here.

});

var receipt = new Receipt(result.tx, result.block, logs, receipt_json.cumulativeGasUsed, receipt_json.contractAddress);
self.web3.eth.getTransactionReceipt(hash, callback);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blows my mind that we can get rid of all this! The tests pass. Good catch!


ForkedBlockchain.prototype.getRelativeBlockNumber = function(number) {
number = this.getEffectiveBlockNumber(number);
return number - to.number(this.fork_block_number) - 1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, there was a reason I did this calculation, but I don't remember now exactly. If getRelativeBlockNumber and getEffectiveBlockNumber are synonymous, shouldn't we remove one or the other?

In my head, relative meant relative to our own block store: i.e., 0-based index in memory, which is needed to know if this is our block or the forked block. effective meant to translate things like "latest" to a real number. Have you merged the two?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should totally remove one. I was just sloppy on this.

var relative = this.getRelativeBlockNumber(number);

if (relative < 0) {
if (relative <= parseInt(this.fork_block_number)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this might answer my above question, as the answer might be "you don't need getRelativeBlockNumber anymore".

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this :)
I don't dig to deep into the relative vs. absolute block number and your code logic, just made enough changes to satisfy my need and make the tests pass. This part and the part above needs more consideration in oder for a clean refactoring.

Comment thread lib/utils/receipt.js
var current = block.transactions[i];
if (current.hash() == tx.hash()) {
this.transactionIndex = index;
this.transactionIndex = i;
Copy link
Copy Markdown
Contributor

@tcoulter tcoulter Dec 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, how did that ever work...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh - consider some gentle linting

Comment thread lib/utils/txhelper.js Outdated
gasPrice: utils.toBuffer("0x" + json.gasPrice.toString(16)),
data: utils.toBuffer(json.input)
});
tx._hash = json.hash;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't generally like variables prefixed with _ that are set outside of the class file/object file itself. Can you tell me why this one is here?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just followed the etheruemjs-tx style _from variable here which is faked in the same way. Again that is part of the ugly ethereumjs-tx hack.

Comment thread package.json Outdated
"solc": "0.4.6",
"web3": "~0.16.0",
"web3-provider-engine": "~8.1.0",
"web3-provider-engine": "git+https://github.com/nexusdev/provider-engine.git",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't rely on these github repositories for a final release. Let's see if we can get your changes merged upstream in provider-engine and ethereumjs-blockchain merged upstream.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the very least you can target a specific commit
git+https://github.com/nexusdev/provider-engine.git#commitHash

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merged and published as web3-provider-engine@8.1.15 👍

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kumavis, will include it here!
@tcoulter this was never intended to be part of the commit. But I needed a change for proper operation. Will change this!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tcoulter I updated the provider engine to the newest version (~8.1.16), However, this lead to the mining and mining_inverval tests failing.
The errors always happens at line like this: https://github.com/ethereumjs/testrpc/blob/master/test/mining.js#L300
I suppose the old provider engine had the feature of throwing an error in case a throwing transaction was send, now its successfully getting included into the new block (I assume the user still pays for a failed gas).

In my fork I skipped tests, which fails out of this reason.

Comment thread package.json
"eslint-plugin-standard": "~1.3.3",
"mocha": "~2.2.5"
"mocha": "~2.2.5",
"solc": "~0.3.0-1",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably update solc here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread test/accounts.js Outdated

assert(accounts[0].toLowerCase(), expected_address.toLowerCase());
done();
provider.close(done);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we might be able to get around this which would remove the need for any for changes in many of these test files.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 works without

@tcoulter
Copy link
Copy Markdown
Contributor

tcoulter commented Dec 9, 2016

Alright, added a bunch of comments (I should have probably started a review; sorry for all the emails). Thanks again @mhhf!

@@ -73,8 +88,12 @@ ForkedBlockchain.prototype.initialize = function(accounts, callback) {
self.web3.eth.getBlock(blockNumber, function(err, json) {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discovered a weirdness in web3 - if typeof blockNumber === "string" blockNumber is treaded as hash and a eth_getBlockByHash method call is generated which leads to a crash if block number 0 is requested which gets translated to 0x0. Only in the case its an number, its treaded as the blockNumber.
I fixed it in my fork by introducing this line:

if(typeof blockNumber == "string" && blockNumber[1] === "x" && blockNumber.length < 42) blockNumber = parseInt(blockNumber)

However, it doesn't feel clean. Also there is a chance for this happening somewhere else - I haven't scanned the code for other occurrences of this (just noticed it here). Is this line good enough or should I submit a new Issue for this?

@mhhf
Copy link
Copy Markdown
Author

mhhf commented Dec 23, 2016

@tcoulter I submitted a bunch of new corrections, so you can go and have a look on them.
The biggest chunk I'm fighting with is ethereumjs-blockchain: Since my initial fork a lot of things were refactored, including breaking changes (such as having constrains on when save the block under its blockNumber, making querying arbitrary blocks by their number impossible) also I found the current state quite buggy (e.g. they forget to mark the blockchain as initialized after the initialization function finishes). I spend more then a full working day on trying to stabilize current ethereumjs-blockchain state and refactor it into testrpc and failed, with an unknown amount of work to go. But I've started talking about needed changes and bugs.
I'd suggest working with the current mod-blockchain fork until ethereumjs-blockchain is considered stable and all necessary features (such as flag for disable validator for forked blocks) are included.

@mhhf
Copy link
Copy Markdown
Author

mhhf commented Feb 7, 2017

Should I close this?

@tcoulter
Copy link
Copy Markdown
Contributor

tcoulter commented Feb 7, 2017

Absolutely not! I really appreciate your work here, and I apologize for the length of time this has taken me to complete. Merging this is the TestRPC's highest priority, actually. We currently have people using it and noting memory issues which we need to explore. They may be unrelated to this change, but I need to sit down with it and check it out. I expect to do this in the next two weeks.

As an aside, I'm spread really thin. This has 100% been blocked by me and I take all the blame for it. I've sent you a message on gitter and would love to chat more.

@tcoulter
Copy link
Copy Markdown
Contributor

tcoulter commented Mar 9, 2017

@mhhf Update:

I did some digging into this, and it looks like a) ethereumjs-blockchain adds restrictions that are hard to get around, and b) it's likely etheruemjs-blockchain is going to be rewritten. The general word is, "Ya, it isn't used that much." My impression of the conversation was that it wouldn't be a good idea to use it.

So instead, I used your branch as inspiration and rewrote database persistence in a different way, saving all the data in a custom fashion. You can see the PR here.

The pros to this branch:

  • The underlying data structures don't change. Data that were stored in arrays, like the blocks array, are still kept in array-like data structures.

Cons:

  • All the data read/writes are now asynchronous, hence most of these changes.

I'll probably be moving forward with the persistence branch within the next couple of days. All the tests pass except my poorly written performance test (I should have guessed it wouldn't pass). Feel free to put it through the ringer and let me know what you think.

@tcoulter
Copy link
Copy Markdown
Contributor

tcoulter commented Mar 9, 2017

Closing this. Thanks for all your work @mhhf. Merged persistence into master.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants