Support Sharding #1758

@juliangruendner

Sharding concept information

Sharding offers potentially large performance improvements because the overhead of creating the federated query across the shards and merging the results is significantly lower than the cost of executing the query on a server.

Given the potential size of the data, we would start with shards of 250,000 patients each, with every shard running on 8 cores and 64 GB of RAM (see also: Tuning Guide).
=> For 2 million patients, this results in 8 shards with a total of 64 cores and 512 GB of RAM.
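
The sizing arithmetic above can be reproduced with a small sketch. The function name `shard_plan` and its parameters are hypothetical and only mirror the numbers given in this issue; they are not part of Blaze.

```python
def shard_plan(total_patients, patients_per_shard=250_000,
               cores_per_shard=8, ram_gb_per_shard=64):
    """Estimate shard count and total resources (illustrative only)."""
    # Ceiling division: a partially filled shard still needs a full node.
    shards = -(-total_patients // patients_per_shard)
    return {
        "shards": shards,
        "cores": shards * cores_per_shard,
        "ram_gb": shards * ram_gb_per_shard,
    }


plan = shard_plan(2_000_000)
print(plan)  # {'shards': 8, 'cores': 64, 'ram_gb': 512}
```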

As a first step, we will investigate sharding using tooling built around the standard Blaze server; see:
medizininformatik-initiative/fdpg-plus#13

As a next step, sharding should be implemented in Blaze directly, so that users can work with a sharded installation in the same way as with a non-sharded one.

Sharding should be based on patient compartments. Resources that are used across multiple patient compartments should be duplicated to each shard.
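
One possible way to picture compartment-based placement is the sketch below. It makes several assumptions that this issue does not specify: patients are assigned to shards by a stable hash of the patient ID, a resource belonging to several compartments is copied to every shard holding one of those compartments, and a resource outside any patient compartment is copied to all shards. The function names are hypothetical.

```python
import hashlib


def shard_of_patient(patient_id: str, num_shards: int) -> int:
    """Map a patient ID to a shard via a stable hash (assumed scheme)."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def target_shards(compartment_patient_ids, num_shards: int) -> set:
    """Return the shards that should store a copy of a resource.

    compartment_patient_ids: IDs of all patients whose compartment
    contains the resource (empty if it belongs to no compartment).
    """
    if not compartment_patient_ids:
        # Assumption: compartment-less resources are replicated everywhere.
        return set(range(num_shards))
    return {shard_of_patient(pid, num_shards) for pid in compartment_patient_ids}


# A resource in one compartment lands on exactly one shard;
# a resource shared by two compartments lands on at most two.
single = target_shards(["patient-1"], 8)
shared = target_shards(["patient-1", "patient-2"], 8)
```

A hash-based assignment keeps placement deterministic without a lookup table, but rebalancing when the number of shards changes would need a separate strategy.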

To begin with, the concepts surrounding this type of sharding should be developed, and its implications for the FHIR API (for example, parallel paging across shards) should be investigated.
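
To illustrate what parallel paging across shards might involve, here is a minimal sketch. It assumes each shard can stream its matching resources sorted by `id` and that the federating layer pages with a simple "last seen id" cursor; neither assumption is taken from Blaze, and `federated_page` is a hypothetical name. Because shared resources may be duplicated across shards, the merge also drops repeated IDs.

```python
import heapq


def federated_page(shard_streams, page_size, after_id=None):
    """Merge per-shard streams (each sorted by id) into one page.

    Returns (page, cursor); pass the cursor as after_id for the next page.
    """
    merged = heapq.merge(*shard_streams, key=lambda r: r["id"])
    page, last_id = [], after_id
    for resource in merged:
        rid = resource["id"]
        # Skip items already returned and duplicates held by several shards.
        if last_id is not None and rid <= last_id:
            continue
        page.append(resource)
        last_id = rid
        if len(page) == page_size:
            break
    return page, last_id


shard_a = [{"id": "a"}, {"id": "c"}, {"id": "x"}]
shard_b = [{"id": "b"}, {"id": "c"}, {"id": "d"}]  # "c" is duplicated

page1, cursor = federated_page([iter(shard_a), iter(shard_b)], 3)
page2, cursor2 = federated_page([iter(shard_a), iter(shard_b)], 3, after_id=cursor)
```

In this toy version every page re-reads the shard streams from the start; a real implementation would presumably forward the cursor to each shard so that only the next slice is fetched.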

Metadata

Labels

epic: A large body of work that can be broken down into a number of smaller issues.
performance: Performance improvement
