Performance Sharding (Simple) #13

@juliangruendner

Sharding concept information

Sharding offers the potential for large performance improvements, because the overhead of creating the federated query across all shards and merging the results is significantly lower than the cost of executing the query on a server.

Given the potential size of the data, we would start with shards of 250,000 patients each, where each shard has 8 cores and 64 GB of RAM (see also: Tuning Guide).

=> For 2 million patients, this would result in 8 shards with a total cost of 64 cores and 512 GB of RAM.
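
The sizing above can be checked with a few lines of arithmetic. This is only a sketch: the function name `shard_plan` and its parameter names are made up for illustration, and the defaults are the per-shard figures stated in this issue.

```python
def shard_plan(total_patients, patients_per_shard=250_000,
               cores_per_shard=8, ram_gb_per_shard=64):
    """Estimate shard count and total resources (hypothetical helper)."""
    # Ceiling division: a partially filled shard still needs a full server.
    shards = -(-total_patients // patients_per_shard)
    return {
        "shards": shards,
        "cores": shards * cores_per_shard,
        "ram_gb": shards * ram_gb_per_shard,
    }


print(shard_plan(2_000_000))
# {'shards': 8, 'cores': 64, 'ram_gb': 512}
```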

Sharding facades - a first prototype

A first solution will be to add support for sharding by patient ID across our tooling, independent of the FHIR server used.

As a first step, this will require the following prototypes to be built:

Loading transaction federation for loading of multiple FHIR Servers
Sharding by Patient ID
Note that resources that are not patient-specific (i.e. that do not reference a patient directly, e.g. Medication) should simply be copied to each shard.
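
One possible way to route resources during loading is sketched below. It rests on assumptions that are not part of this issue: the shard is chosen by hashing the patient ID (any stable mapping would do), and the patient is found only via the `subject` or `patient` reference elements, which covers many but not all FHIR resource types. All function names are hypothetical.

```python
import hashlib
from typing import List, Optional


def shard_for_patient(patient_id: str, shard_count: int) -> int:
    """Map a patient ID to a shard index (assumption: stable hash, modulo)."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def patient_id_of(resource: dict) -> Optional[str]:
    """Return the patient ID a resource belongs to, or None if there is none."""
    if resource.get("resourceType") == "Patient":
        return resource.get("id")
    # Simplification: only look at the two most common reference elements.
    for field in ("subject", "patient"):
        value = resource.get(field)
        if isinstance(value, dict):
            reference = value.get("reference", "")
            if reference.startswith("Patient/"):
                return reference.split("/", 1)[1]
    return None


def target_shards(resource: dict, shard_count: int) -> List[int]:
    """Patient-specific resources go to one shard, all others to every shard."""
    patient_id = patient_id_of(resource)
    if patient_id is None:
        return list(range(shard_count))
    return [shard_for_patient(patient_id, shard_count)]


patient = {"resourceType": "Patient", "id": "p1"}
observation = {"resourceType": "Observation", "subject": {"reference": "Patient/p1"}}
medication = {"resourceType": "Medication", "id": "m1"}

# A patient and the resources referencing it land on the same shard.
print(target_shards(patient, 8) == target_shards(observation, 8))
# Medication is copied to all 8 shards.
print(len(target_shards(medication, 8)))
```

Hashing keeps the routing stateless, but changing the number of shards would move most patients; a lookup table or consistent hashing would be alternatives if resharding matters.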

Federated CQL query facade (first version: feasibility only)
Library and Measure resources are created on each shard (the transaction endpoint will already do this)
The $evaluate-measure operation will call $evaluate-measure on every shard, collect and merge all MeasureReports, and return the merged MeasureReport
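
The merge step could look roughly like the following sketch. It assumes that every shard evaluated the same Measure, so that groups line up by position and populations can be matched by their code, and it only sums the population counts; measure scores and stratifiers are left untouched and would need their own handling. The function names are hypothetical.

```python
import copy


def population_key(population: dict) -> str:
    """Identify a population by its first coding, e.g. 'initial-population'."""
    return population["code"]["coding"][0]["code"]


def merge_measure_reports(reports: list) -> dict:
    """Sum population counts of per-shard MeasureReports (simplified sketch)."""
    if not reports:
        raise ValueError("need at least one MeasureReport")
    # Start from a copy of the first report so the inputs stay unchanged.
    merged = copy.deepcopy(reports[0])
    for report in reports[1:]:
        for merged_group, group in zip(merged.get("group", []),
                                       report.get("group", [])):
            counts = {population_key(p): p.get("count", 0)
                      for p in group.get("population", [])}
            for population in merged_group.get("population", []):
                key = population_key(population)
                population["count"] = population.get("count", 0) + counts.get(key, 0)
    return merged


def example_report(count: int) -> dict:
    """Build a minimal MeasureReport with one group and one population."""
    return {
        "resourceType": "MeasureReport",
        "group": [{"population": [{
            "code": {"coding": [{"code": "initial-population"}]},
            "count": count,
        }]}],
    }


merged = merge_measure_reports([example_report(3), example_report(5)])
print(merged["group"][0]["population"][0]["count"])
# 8
```

Summing is valid here because each patient is stored on exactly one shard, so no patient is counted twice.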

federated FLARE query
one FLARE per shard
a federation component that executes the query on each FLARE and sums the results
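
A minimal sketch of such a federation component is shown below. It does not use the real FLARE API: each shard is represented by a hypothetical callable that takes the query and returns that shard's patient count, which in practice would wrap an HTTP call to the FLARE instance of that shard.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def federated_count(query: dict, shard_clients: Sequence[Callable[[dict], int]]) -> int:
    """Run the query on every shard in parallel and sum the counts."""
    if not shard_clients:
        return 0
    with ThreadPoolExecutor(max_workers=len(shard_clients)) as pool:
        # pool.map keeps shard order and re-raises the first shard error.
        counts = list(pool.map(lambda run: run(query), shard_clients))
    return sum(counts)


# Stand-ins for three FLARE instances returning fixed counts.
fake_shards = [lambda q: 120, lambda q: 80, lambda q: 0]
print(federated_count({"example": "query"}, fake_shards))
# 200
```

Running the shards in parallel means the total latency is roughly that of the slowest shard plus the small cost of summing, which is the effect the sharding concept relies on.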
