Conversation

kfaraz (Contributor) commented on May 12, 2025

Description

#17935 enables use of HeapMemorySegmentMetadataCache on the Coordinator,
but it cannot be used in conjunction with a centralized datasource schema (i.e. SegmentSchemaCache).
This patch allows both features to be used together on the Coordinator.

Main Changes

  • Make SegmentSchemaCache a dependency of HeapMemorySegmentMetadataCache
  • Bind SegmentMetadataCache and SegmentSchemaCache in MetadataManagerModule
  • Add NoopSegmentSchemaCache to be used on the Overlord
  • Poll schemas in HeapMemorySegmentMetadataCache and update SegmentSchemaCache (see the sketch after this list)
  • Update the used_status_last_updated column of a segment record when its schema fingerprint is updated
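
A minimal sketch of the relationship described in the last two bullets. This is purely illustrative: the interface, class, and method names below are simplified hypothetical stand-ins, not the actual Druid API.

import java.util.Map;

// Hypothetical, simplified stand-ins for the real Druid classes; only the
// dependency between the metadata cache and the schema cache is the point here.
interface SchemaCacheSketch
{
  void resetSchemas(Map<String, String> fingerprintToPayload);
}

class MetadataCacheSketch
{
  private final SchemaCacheSketch schemaCache;

  MetadataCacheSketch(SchemaCacheSketch schemaCache)
  {
    this.schemaCache = schemaCache;
  }

  void syncWithMetadataStore()
  {
    // 1. Poll segment and pending segment records (elided).
    // 2. Poll schema records and push them into the schema cache so that the
    //    Coordinator can serve the centralized datasource schema.
    final Map<String, String> fingerprintToPayload = pollSchemasFromMetadataStore();
    schemaCache.resetSchemas(fingerprintToPayload);
  }

  private Map<String, String> pollSchemasFromMetadataStore()
  {
    return Map.of(); // stand-in for the actual SQL poll
  }
}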

Fix a race condition

Add a sync buffer duration of 10 seconds to HeapMemorySegmentMetadataCache

  • Handles a race condition between sync and insert to cache (caught in CompactionTaskRunTest)
  • Prevents removal of entries from cache if they have a last updated time just before sync start
    and were added to the cache just after sync start
  • This means that non-leader nodes will continue to consider a segment as used if it was marked unused
    within 10 seconds of any other update done to it (created, marked used, schema info added)
  • A 10-second buffer is more than enough for this, and the cache is already performing as expected in
    several production clusters (a minimal sketch of this check follows the list)
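
A minimal sketch of the buffering rule described above; the constant and method names are hypothetical, and the real check in HeapMemorySegmentMetadataCache may be structured differently.

import org.joda.time.DateTime;
import org.joda.time.Duration;

class SyncBufferSketch
{
  // Entries updated within this window before the sync start are never removed,
  // so a cache insert that races with an ongoing sync cannot be wiped out.
  private static final Duration SYNC_BUFFER = Duration.standardSeconds(10);

  /**
   * Returns true only if a cache entry that was not seen in the latest poll of
   * the metadata store is old enough to be removed safely.
   */
  static boolean canRemoveFromCache(DateTime entryLastUpdatedTime, DateTime syncStartTime)
  {
    return entryLastUpdatedTime.isBefore(syncStartTime.minus(SYNC_BUFFER));
  }
}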

Guice changes

  • Add MetadataManagerModule, used only by the Coordinator and Overlord, to bind the metadata managers
  • Restrict SQLMetadataStorageDruidModule to bind only the SQL-connector-related classes

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

updateUsedSegmentPayloadsInCache(datasourceToSummary);
retrieveAllPendingSegments(datasourceToSummary);
updatePendingSegmentsInCache(datasourceToSummary, syncStartTime);
retrieveAllSegmentSchemas(datasourceToSummary);
Contributor

Should add something to the method-level javadoc of "stuff that happens every sync".

Contributor Author

Do you mean the javadoc of each of these methods should say whether they are invoked in every sync or only the first sync? Or some other additional info too?

Contributor

I meant that in the javadoc for this method itself, there's a section titled "The following actions are performed in every sync". It doesn't currently mention the schema syncing.

Contributor Author

Ah, right, will do 👍🏻

final String sql = StringUtils.format(
"SELECT fingerprint, payload FROM %s WHERE version = %s",
tablesConfig.getSegmentSchemasTable(), CentralizedDatasourceSchemaConfig.SCHEMA_VERSION
);
Contributor

I think this code is fetching the entire set of segment schemas on every call to syncWithMetadataStore. Is this going to be OK, performance-wise? It seems expensive.

Contributor Author

The current implementation in SqlSegmentsMetadataManager polls all the schemas in every sync too.
But since the schemas table already has a used_status_last_updated as well as a created_time column,
we can try to do delta syncs in a fashion similar to the segments table.

Thanks for the suggestion, I will update the PR accordingly.
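
For illustration, a delta query could mirror the full query shown above and filter on used_status_last_updated. This is only a sketch; the exact column names, parameter binding, and quoting in the final PR may differ.

// Sketch only: mirrors the snippet above; the WHERE clause on
// used_status_last_updated and the :minUpdateTime binding are illustrative.
final String deltaSql = StringUtils.format(
    "SELECT fingerprint, payload FROM %s"
    + " WHERE version = %s AND used_status_last_updated > :minUpdateTime",
    tablesConfig.getSegmentSchemasTable(),
    CentralizedDatasourceSchemaConfig.SCHEMA_VERSION
);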

if (syncFinishTime.get() == null) {
retrieveUsedSegmentSchemasUpdatedAfter(DateTimes.COMPARE_DATE_AS_STRING_MIN, datasourceToSummary);
} else {
retrieveUsedSegmentSchemasUpdatedAfter(syncStartTime, datasourceToSummary);
Contributor

There should be some slack in this, to allow for the fact that clocks may not be perfectly synced across servers, and various factors (such as retries on insert) can cause records with timestamps in the past to appear. An hour should be more than enough.

Contributor Author

Makes sense.

There is another bug here anyway: I should have been using the start time of the previous sync rather than the current sync.
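
A small sketch combining the two points above (the hour of slack and the previous sync's start time); the names here are hypothetical.

import org.joda.time.DateTime;
import org.joda.time.Period;

class DeltaSyncCutoffSketch
{
  // One hour of slack, as suggested above, to tolerate clock skew between
  // servers and records that land with slightly older timestamps.
  private static final Period SYNC_SLACK = Period.hours(1);

  /**
   * Cutoff for a delta sync: schemas updated after this instant are re-fetched.
   * Uses the start time of the previous sync, not the current one, so records
   * written while the previous sync was in progress are not missed.
   */
  static DateTime deltaSyncCutoff(DateTime previousSyncStartTime)
  {
    return previousSyncStartTime.minus(SYNC_SLACK);
  }
}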

Contributor Author

(such as retries on insert)

Minor clarification on this point:
For the most part, in IndexerSQLMetadataStorageCoordinator, I have tried to ensure that each transaction retry uses a fresh timestamp, since retries can go on for a while.
But as you point out, there can still be cases where past records appear.

P.S.: I think we should be able to do something similar for fetching used segment IDs too.
Currently, we fetch all of them, but we could potentially fetch only the recently updated ones,
thus improving the delta sync time even further.
We would just need an index on used + used_status_last_updated in the druid_segments table
(the same index that the schemas table already has).
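
For illustration only, such an index might look like the following; the column names follow the discussion above, while the index name and exact DDL syntax are assumptions, not part of the PR.

// Hypothetical DDL, shown here as a plain SQL string.
final String createIndexSql =
    "CREATE INDEX idx_druid_segments_used_status_last_updated"
    + " ON druid_segments(used, used_status_last_updated)";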

this.cacheMode = config.get().getCacheUsageMode();
this.pollDuration = config.get().getPollDuration().toStandardDuration();
this.tablesConfig = tablesConfig.get();
this.useSchemaCache = schemaConfig.get().isEnabled() && nodeRoles.contains(NodeRole.COORDINATOR);
Contributor

Small point, but I just wanted to comment. This sort of logic is an anti-pattern in Guice usage. In an ideal world, decisions about which features to enable on which servers should live in the Guice modules rather than in the main code. Separating things that way makes the main code more composable and testable.

Contributor Author

Thanks for calling it out! Felt really hacky to me too.

Let me see how I can clean it up.
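
A hypothetical sketch of the pattern the reviewer is describing: the node-role / feature-flag decision lives in the module that gets installed, while the cache classes themselves stay unconditional. The types below are simplified stand-ins, not the actual Druid bindings.

import com.google.inject.AbstractModule;

// Simplified stand-ins: only the binding pattern matters here.
interface SchemaCacheBinding {}
class RealSchemaCacheImpl implements SchemaCacheBinding {}
class NoopSchemaCacheImpl implements SchemaCacheBinding {}

class SchemaCacheModuleSketch extends AbstractModule
{
  private final boolean coordinatorWithCentralizedSchema;

  SchemaCacheModuleSketch(boolean coordinatorWithCentralizedSchema)
  {
    this.coordinatorWithCentralizedSchema = coordinatorWithCentralizedSchema;
  }

  @Override
  protected void configure()
  {
    // Decide once, at module-installation time, which implementation is bound;
    // the main code then simply injects SchemaCacheBinding.
    if (coordinatorWithCentralizedSchema) {
      bind(SchemaCacheBinding.class).to(RealSchemaCacheImpl.class);
    } else {
      bind(SchemaCacheBinding.class).to(NoopSchemaCacheImpl.class);
    }
  }
}

The main code never asks "am I the Coordinator?"; it only depends on the bound interface, which keeps it composable and easy to test with either implementation.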

gianm (Contributor) left a comment

LGTM with the delta sync changes. Up to you if you want to adjust the Guice stuff.

kfaraz (Contributor Author) commented on May 16, 2025

Thanks for the review, @gianm .
I have updated the Guice bindings too. Waiting for CI to finish.

kfaraz (Contributor Author) commented on May 18, 2025

@gianm , I have had to fix up the delta sync logic since there were some bugs with the previous approach.

The sync for schemas now resembles the logic employed for used segments. It works as follows:

  • Full sync
    • Fetch all schemas marked as "used" in the metadata store
  • Delta sync
    • Fetch only the fingerprints (and not payloads) for all schemas marked as "used" in the metadata store
    • Remove entries from the cache if they are no longer present in the metadata store
    • Fetch payloads for the fingerprints that are not already present in the cache (see the sketch after this list)
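
A simplified sketch of that flow, using hypothetical types; the real code lives in HeapMemorySegmentMetadataCache and differs in detail.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SchemaSyncSketch
{
  private final Map<String, String> cachedFingerprintToPayload = new HashMap<>();

  /** Full sync: fetch every used schema, payloads included. */
  void fullSync(Map<String, String> usedSchemasInStore)
  {
    cachedFingerprintToPayload.clear();
    cachedFingerprintToPayload.putAll(usedSchemasInStore);
  }

  /** Delta sync: fetch only fingerprints, then payloads just for the new ones. */
  void deltaSync(Set<String> usedFingerprintsInStore)
  {
    // Drop cache entries whose fingerprints are no longer used in the store.
    cachedFingerprintToPayload.keySet().retainAll(usedFingerprintsInStore);

    // Fetch payloads only for fingerprints not already present in the cache.
    final Set<String> missing = new HashSet<>(usedFingerprintsInStore);
    missing.removeAll(cachedFingerprintToPayload.keySet());
    cachedFingerprintToPayload.putAll(fetchPayloads(missing));
  }

  private Map<String, String> fetchPayloads(Set<String> fingerprints)
  {
    return Map.of(); // stand-in for the actual SQL query
  }
}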

kfaraz merged commit 8be787a into apache:master on May 19, 2025 (74 checks passed).
kfaraz deleted the poll_schemas_in_metadata_cache branch on June 8, 2025.
capistrant added this to the 34.0.0 milestone on Jul 22, 2025.