Support Sharding

**Sharding concept information**

Sharding has the potential for large performance improvements, because the overhead of creating the federated query across the shards and merging the results is significantly lower than the cost of executing the query on a single server.
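
To illustrate the scatter/gather idea, here is a minimal sketch, not an actual Blaze API. It assumes each shard can be modelled as an in-memory list of patient records and that every patient lives in exactly one shard, so partial counts can simply be summed. The names `count_on_shard` and `federated_count` are hypothetical; in a real setup each call would be a FHIR request against one Blaze server.

```python
from concurrent.futures import ThreadPoolExecutor


def count_on_shard(shard, predicate):
    """Count matching patients on one shard (stand-in for a remote query)."""
    return sum(1 for patient in shard if predicate(patient))


def federated_count(shards, predicate):
    """Run the same query on all shards in parallel and merge the results."""
    # max(1, ...) avoids a ValueError when the shard list is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partial_counts = list(
            pool.map(lambda shard: count_on_shard(shard, predicate), shards)
        )
    # The merge step is a cheap sum compared to the per-shard query work.
    return sum(partial_counts)


if __name__ == "__main__":
    demo_shards = [[{"age": 30}, {"age": 70}], [{"age": 65}], []]
    print(federated_count(demo_shards, lambda p: p["age"] >= 65))
```

The merge is only this simple for counts over disjoint patient sets; result lists that include duplicated resources would need de-duplication.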

Given the potential size of the data, we would start with shards of 250,000 patients each, with every shard running on 8 cores and 64 GB of RAM (see also: [Tuning Guide](https://github.com/samply/blaze/blob/master/docs/tuning-guide.md)).
=> For 2 million patients this would result in 8 shards with a total cost of 64 cores and 512 GB of RAM.
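
The sizing arithmetic above can be written down as a small helper for trying other patient counts. This is only a sketch: the constants come from the figures in this issue, while the function name `shard_plan` and the choice to round partially filled shards up are assumptions.

```python
import math

# Starting values proposed in this issue.
PATIENTS_PER_SHARD = 250_000
CORES_PER_SHARD = 8
RAM_GB_PER_SHARD = 64


def shard_plan(total_patients):
    """Return shard count and total resources for a given patient count."""
    # Assumption: a partially filled shard still needs a full server.
    shards = math.ceil(total_patients / PATIENTS_PER_SHARD)
    return {
        "shards": shards,
        "cores": shards * CORES_PER_SHARD,
        "ram_gb": shards * RAM_GB_PER_SHARD,
    }


if __name__ == "__main__":
    print(shard_plan(2_000_000))
```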

In a first step, we will investigate sharding based on tooling around the standard Blaze server, see:
https://github.com/medizininformatik-initiative/fdpg-plus/issues/13

In a next step, sharding should be implemented in Blaze directly, allowing users of Blaze to use a sharded installation in the same way as a non-sharded installation.

Sharding should be based on patient compartments. Resources that are used across multiple patient compartments should be duplicated to each shard.
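
One possible reading of compartment-based placement is sketched below. It is not a Blaze implementation. It assumes patients are assigned to shards by a stable hash of the patient ID, and that a shared resource is copied to every shard holding one of the patients it is used by. All names (`shard_for_patient`, `shards_for_resource`, `distribute`) and the `(resource_id, patient_ids)` input shape are hypothetical.

```python
import hashlib


def shard_for_patient(patient_id, num_shards):
    """Map a patient ID to a shard index using a stable hash."""
    # hashlib is used because Python's built-in hash() of str varies per run.
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_for_resource(patient_ids, num_shards):
    """All shards whose patient compartments contain the resource."""
    return {shard_for_patient(pid, num_shards) for pid in patient_ids}


def distribute(resources, num_shards):
    """Place each (resource_id, patient_ids) pair on its target shards.

    A resource used by patients on several shards is duplicated,
    but it is stored at most once per shard.
    """
    shards = [[] for _ in range(num_shards)]
    for resource_id, patient_ids in resources:
        for index in sorted(shards_for_resource(patient_ids, num_shards)):
            shards[index].append(resource_id)
    return shards


if __name__ == "__main__":
    demo = [("obs-1", ["p1"]), ("med-1", ["p1", "p2", "p3", "p4"])]
    for index, content in enumerate(distribute(demo, 4)):
        print(index, content)
```

Resources that belong to no patient compartment at all are not handled here; how to place them is one of the open concept questions.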

As a first step, the concepts surrounding this type of sharding should be developed, and the implications for the FHIR API (for example, parallel paging across shards) should be investigated.
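
To make the paging question more concrete, here is a rough sketch of one way merged pages could be built. It is not a proposed design. It assumes every shard returns its results already sorted by the same key (here plain resource IDs), and that a resource duplicated to several shards must appear only once in the merged result. The function name `merged_pages` is hypothetical.

```python
import heapq
from itertools import islice


def merged_pages(shard_results, page_size):
    """Yield pages built from several individually sorted shard results."""
    # heapq.merge lazily interleaves the sorted inputs into one sorted stream.
    merged = heapq.merge(*shard_results)

    def unique(items):
        # Duplicated resources are adjacent in a sorted stream, so comparing
        # with the previous item is enough to drop them.
        previous = object()
        for item in items:
            if item != previous:
                yield item
            previous = item

    stream = unique(merged)
    while True:
        page = list(islice(stream, page_size))
        if not page:
            return
        yield page


if __name__ == "__main__":
    shards = [["a", "c", "e"], ["b", "c", "d"], []]
    for page in merged_pages(shards, 2):
        print(page)
```

A real implementation would additionally need a paging token that records a position per shard, so that a follow-up request can resume on every shard independently.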




Support Sharding #1758
