Implement multiple store backends #127
Conversation
The first additional backend is BoltDB. A `store` flag has been added to commands, which accepts the name of a store (as of this commit, valid values are `boltdb` and `rocksdb`). Stores register their availability via `init()`, which means that new stores should be imported for side-effects wherever they may be required directly (predominantly in commands or tests). Store implementations can be included via build tags.

BREAKING: The default store has been set to `boltdb`, primarily because this store is native Go, and can be built against even when CGO is not available. To produce builds that include RocksDB support, the `rocksdb` tag _must_ be supplied, i.e.:

```
go install -tags rocksdb
```

This initial implementation does not include a functional `merge` or `dlist` command for boltdb.

I saw the comment on #71 regarding the reasoning for choosing RocksDB (it should be noted that while BoltDB only allows a single writer, it does support multiple readers), but the ability to build without CGO can be preferable for portability, and the important performance characteristics (read vs. write performance, for instance) will differ from project to project, so choice is also desirable.

I have not done significant testing or updated the documentation yet, but discussion would be welcomed. The tests/benches should be updated with Go 1.7, as multiple backends are particularly suited to sub-tests/-benchmarks.
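As a rough sketch of the registration flow described above, something like the following registry could sit behind the `store` flag. The `Store` interface, the `Register`/`Open` helpers, and the package and import names here are placeholders for illustration, not necessarily what this PR implements:

```go
// Package store: a minimal sketch of a pluggable KV-store registry.
package store

import "fmt"

// Store is the minimal surface a backend would need to implement.
type Store interface {
	Get(key []byte) ([]byte, error)
	Set(key, value []byte) error
	Close() error
}

// backends maps a store name (the value of the `store` flag) to a
// constructor for that backend.
var backends = map[string]func(path string) (Store, error){}

// Register is called from each backend's init(), so importing a backend
// package for side-effects is enough to make it selectable.
func Register(name string, factory func(path string) (Store, error)) {
	backends[name] = factory
}

// Open returns the backend selected by the `store` flag.
func Open(name, path string) (Store, error) {
	factory, ok := backends[name]
	if !ok {
		return nil, fmt.Errorf("unknown store backend %q", name)
	}
	return factory(path)
}
```

A backend gated behind a build tag would then register itself from its own `init()`:

```go
// +build rocksdb

// Sketch of a backend file that is only compiled when the `rocksdb`
// build tag is supplied; its init() makes the backend selectable by name.
package rocksdbstore

import (
	"errors"

	"github.com/dgraph-io/dgraph/store" // hypothetical import path for the registry above
)

func init() {
	store.Register("rocksdb", func(path string) (store.Store, error) {
		// Opening RocksDB via the cgo bindings is omitted in this sketch.
		return nil, errors.New("not implemented in this sketch")
	})
}
```

Commands would then blank-import the backends they want compiled in and pass the flag value to `Open`.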
Multiple backend stores are not something we intend to support. There's an expectation of tight coupling between Dgraph and whatever we use to store data on disk, which we've decided to be RocksDB, given the reasoning that you've rightly pointed out. I don't buy the argument that each project is different so read/write concurrency doesn't matter. To begin with, our bulk loader is so efficient purely because of the write concurrency that BoltDB doesn't provide. Also, in a live Dgraph instance, we periodically merge dirty posting lists, a process that currently goes unnoticed performance-wise but would be very negatively impacted if reads and writes became mutually exclusive. If we really wanted to kill concurrency, we could have just used Python.

Starting with v0.4, which we're aiming to release this week, we'll be making it easier for people to use Dgraph by providing pre-built binaries, which can be readily run on Linux or Mac. That should tackle the main issue you're trying to solve here, which is ease of installation.

Sorry, I can't accept this pull request.
That could be a valid reason to reject this PR (maintenance complexity, though the surface area is quite small, had you looked at the code), but not for the reasons you've outlined.
Why? That's certainly not evident from the code except in a couple of the helper utilities. A KV store is pretty much a KV store, except in edge-cases or performance-related areas.
What reasoning is that? BoltDB provides good read throughput, so for read-heavy applications, it would possibly be an improvement.
That's not what I said. I said that projects differ in their requirements for read or write throughput. For some projects, read throughput is vastly more important than write throughput.
I guess that's good, if a project cares about that functionality.
Read operations in BoltDB occur concurrently, writes are serialized. Do you have any benchmarks to back up your assertions here, or are you just making stuff up?
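For reference, this is how that split looks against the `github.com/boltdb/bolt` API (a minimal, self-contained example rather than Dgraph code): read-only `View` transactions run concurrently with each other and with a writer, while `Update` transactions are serialized by the database.

```go
package main

import (
	"log"

	"github.com/boltdb/bolt"
)

func main() {
	db, err := bolt.Open("example.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Writes go through Update: BoltDB serializes these, so only one
	// read-write transaction is in flight at a time.
	err = db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("postings"))
		if err != nil {
			return err
		}
		return b.Put([]byte("key"), []byte("value"))
	})
	if err != nil {
		log.Fatal(err)
	}

	// Reads go through View: any number of these read-only transactions
	// can run concurrently, including alongside the single writer.
	err = db.View(func(tx *bolt.Tx) error {
		v := tx.Bucket([]byte("postings")).Get([]byte("key"))
		log.Printf("value: %s", v)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```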
Sick burn... or something.
Actually, I was looking at embedding Dgraph in a cross-platform/-arch project, and did not want to carry a dependency on a C library.
Yeah, looks like Dgraph is not for us. I have to say that based on the poor attitude in your response here, I think I may be better off not contributing to this project anyway. I was going to clean up a bunch of other stuff in the code-base after this merge, but I guess you can keep it.
Sorry you feel that way. But it's understandable, given you already put effort into something without talking to us about it. The best first step here would have been to bring up your intentions on discuss.dgraph.io, so we could have warned you about this up front, before you spent the effort. Cgo is bad, I understand that you want to stay away from it. But RocksDB doesn't have a Go equivalent, and BoltDB is surely not it.
My disappointment is not so much at the wasted effort (it was only about an hour, and I specifically mentioned in the PR that I was happy to discuss this), but in the dismissive and condescending tone of your response - this is not how you engage with potential community members. I notice you can't be bothered elaborating on your position with a response to my queries either.
So, you understand that cgo is problematic, but don't want to do anything about it? Once you have multiple backends, it's easy to add something that satisfies more use-cases, or is objectively better - by whatever metric the user measures 'better' - and the user gets to choose the characteristics that are important to them. In a description on your discuss board, you even specify that you are simply using RocksDB as a layer to read/write data to disk, so I don't know why you'd be so attached to it, especially without data to support your performance claims.
I don't think my tone was condescending. Choosing a library which allows concurrent reads and writes over a library which makes reads and writes mutually exclusive is a pretty important design decision, particularly for something that's going to be the basis of Dgraph. I'm not sure what a benchmark is going to prove here; it's basic computer science. On the other hand, you haven't given any logic for your PR, other than that you don't "want" to include any Cgo.
@pdf Thanks for your PR and effort. I am sorry that you feel disappointed by the decision not to accept it, which is completely understandable given the effort you put into this PR. Whether or not to support multiple backends is an important design decision, and there are successful projects with both ideologies -- e.g. Postgres and MySQL. At this point, Dgraph is very young, and we have decided to focus on a tightly coupled backend, as apart from engineering choices (which I am not going to get into, as I am not the most qualified when it comes to the Go ecosystem) it allows us to utilise the limited available bandwidth more effectively. You can find more about the roadmap of Dgraph at #1 and discuss major ideas at discuss.dgraph.io. Once again, we are thankful for your contribution, and sorry for your disappointment. I do, however, hope you continue to contribute.
@mohitranka thanks, this is a sensible response. The implementation to support multiple backends is certainly fairly trivial though, as you can see here.

To add some actual data WRT performance, below are the Get/Set benchmarks that were already in the codebase, using 8 cores, for each of the backends. By default you're using async writes on RocksDB, which may not be safe. I should also note that the RocksDB write benchmarks would not complete with less than ~12GB of RAM free, otherwise triggering the OOM killer - something is definitely not right in RocksDB-land (and I obviously can't use the Go memory profiling tools, because cgo).

**RocksDB vs BoltDB (default configuration: RocksDB async, BoltDB sync)**

**RocksDB vs BoltDB (RocksDB async, BoltDB async)**

Frequent growth of the BoltDB backing store can hurt performance, but the growth size can be increased to reduce this churn:

**RocksDB vs BoltDB (RocksDB async, BoltDB async+AllocSize)**

And finally, the safe configuration for both options, using sync writes, without any tuning:

**RocksDB vs BoltDB (RocksDB sync, BoltDB sync)**
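For anyone wanting to reproduce this kind of comparison, a rough sketch of how the Get/Set benchmarks could be structured as Go 1.7 sub-benchmarks follows; the `Store` interface and the in-memory stand-in are placeholders for the real boltdb/rocksdb stores, not the benchmark code referred to above:

```go
package store_test

import (
	"fmt"
	"testing"
)

// Store is the minimal KV surface being compared; illustrative only.
type Store interface {
	Get(key []byte) ([]byte, error)
	Set(key, value []byte) error
}

// memStore is a trivial in-memory stand-in so this sketch runs on its
// own; in a real comparison each backend name would open boltdb or
// rocksdb instead.
type memStore struct{ m map[string][]byte }

func (s *memStore) Get(k []byte) ([]byte, error) { return s.m[string(k)], nil }
func (s *memStore) Set(k, v []byte) error        { s.m[string(k)] = v; return nil }

func openBackend(name string) Store { return &memStore{m: map[string][]byte{}} }

// BenchmarkSet runs the same workload once per backend as a Go 1.7
// sub-benchmark, so results appear side by side in `go test -bench`.
func BenchmarkSet(b *testing.B) {
	for _, name := range []string{"boltdb", "rocksdb"} {
		b.Run(name, func(b *testing.B) {
			s := openBackend(name)
			value := make([]byte, 128)
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				key := []byte(fmt.Sprintf("key-%010d", i))
				if err := s.Set(key, value); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```

Running `go test -bench . -benchmem` then reports ns/op and allocations for each backend side by side.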
@pdf Moving the discussion to Dgraph's Discourse.