This lib is currently incomplete, although it is not far off being worthy of publishing.
This lib stands to replace both pelias/dbclient and the older pelias/esclient modules.
The key points of differentiation from other streaming elasticsearch indexers are:
- batching via the bulk API
- retry failed batches
- flood control: backpressure propagates upstream so the source slows down when elasticsearch cannot keep up (most important)
Other libraries are not well suited to large datasets containing complex properties (such as country-sized polygons), which take some time to process on the Java side. As a result, naive indexers fill up the elasticsearch bulk indexing threadpool, which causes those batches to be rejected and data to be lost.
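To make the batching and flood control idea concrete, here is a minimal sketch of a writable stream that holds its write callback until a bulk request has been acknowledged. It is not this lib's actual code: `createBatchedWriter` and `sendBulk` are hypothetical names, and `sendBulk(docs)` is assumed to return a Promise that settles once elasticsearch has answered.

```javascript
const { Writable } = require('stream');

// Hypothetical factory, for illustration only.
// `sendBulk(docs)` is an assumed helper returning a Promise that settles
// once the bulk request for `docs` has been acknowledged.
function createBatchedWriter(sendBulk, batchSize = 500) {
  let batch = [];

  // Send whatever is buffered; call `done` only after the request settles.
  const flush = (done) => {
    if (batch.length === 0) return done();
    const docs = batch;
    batch = [];
    sendBulk(docs).then(() => done(), done);
  };

  return new Writable({
    objectMode: true,
    highWaterMark: batchSize,
    write(doc, _encoding, done) {
      batch.push(doc);
      // While the batch is still filling, accept the next doc immediately.
      if (batch.length < batchSize) return done();
      // Batch is full: hold the callback until the bulk request settles.
      // Meanwhile incoming writes queue up, write() starts returning false,
      // and pipe() pauses the source.
      flush(done);
    },
    final(done) {
      // Flush the last partial batch so nothing is lost at end of input.
      flush(done);
    }
  });
}
```

Piping a source into this stream (`source.pipe(createBatchedWriter(sendBulk, 500))`) means the source is paused while a bulk request is in flight, rather than queueing ever more requests against the bulk threadpool. This sketch allows only one request in flight at a time; a concurrency setting would relax that.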
What's left to do:
- Write readme and explain how concurrency, retries and the cli work
- Rethink and test the concurrency control mechanism to achieve optimum load
- Refactor some of the code to emit events
- Write a stats module which captures Transaction events and emits stat digests.
Module Goals:
☑ batched writes
☑ adjustable batch size
☑ partially retry failed batches
☑ backpressure (flood control)
☑ concurrency setting, better highwatermark
☐ actionable error reporting
☑ elasticsearch client injectable
☑ well tested via unit tests & in production
☑ bin file, input streams from cli with id, type mapper
☑ minimal dependencies, dependency injection
☑ usable outside pelias project & not strictly tied to pelias config
☑ ensure no data loss due to ES errors or failure to flush batches
☐ healthcheck via threadpool status
☐ compatibility with different nodejs stream versions
☑ better logging - via winston
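As an illustration of the "partially retry failed batches" goal, the sketch below picks out only the docs whose bulk action failed, so that the rest of the batch is not resent. It relies on the bulk API response having an `errors` flag and an `items` array in the same order as the request. The function name `selectRetryableDocs` is hypothetical, and treating only 429 (rejected by the threadpool) and 5xx statuses as retryable is an assumption, not necessarily what this lib does.

```javascript
// Hypothetical helper, for illustration only.
// `docs` are the documents sent in one bulk request (one action per doc),
// `response` is the parsed bulk API response for that request.
function selectRetryableDocs(docs, response) {
  // `errors` is false when every action in the batch succeeded.
  if (!response.errors) return [];

  return docs.filter((_doc, i) => {
    const item = response.items[i];
    // Each item is keyed by its action type, e.g. { index: { status, error } }.
    const result = item[Object.keys(item)[0]];
    // Assumption: 429 (rejected) and 5xx are worth retrying; other 4xx
    // errors such as mapping failures would fail again, so they are skipped.
    return result.status === 429 || result.status >= 500;
  });
}
```

The returned docs can be pushed back into the next batch, while non-retryable failures are better surfaced through error reporting than retried.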
Issues with dbclient:
☑ badly named, doesn't describe purpose
☑ not abstracted from pelias
☑ strict dependency on other pelias modules
☑ not generally useful to 3rd parties
☑ difficult for 3rd party developers to contribute
☑ untidy code
☑ not fully unit tested
☐ not well documented
Duplication across modules (causing confusion):
- https://github.com/geopipes/elasticsearch-backend
- https://github.com/pelias/esclient
- https://github.com/pelias/dbclient
Dependants:
- dat-elasticsearch-upload
- pelias-geonames
- pelias-openaddresses
- pelias-openstreetmap
Similar projects / implementations:
https://github.com/hmalphettes/elasticsearch-streams
https://www.npmjs.com/package/elasticstream
https://github.com/simianhacker/bunyan-elasticsearch/blob/master/index.js
running unit tests
$> npm test
running integration tests
$> npm run integration