Implement the lazy_load_members room state filter parameter #2970
and fix include_other_types thinko
Another thing that may need supporting is the context API if I'm not mistaken. Does this work as-is with Riot or should I hold off on deploying this to my test environment?
i haven’t tested it against riot yet, but in theory it should work. good call on /context. i would be very interested in a “this sped up initial sync from X to Y, and reduced the json from P to Q” stat.
richvdh
left a comment
looks plausible to me
when type filter includes wildcards on state_key
now updated to try to always add in the necessary member events for a chunk of timeline (although sytest is reporting 500s so clearly needs more work)
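A rough sketch of what "the necessary member events for a chunk of timeline" could mean, assuming events are plain dicts with `type`, `sender` and `state_key` fields. The helper name `members_needed_for_timeline` is hypothetical and is not taken from the PR; the real code works on state groups in storage rather than in-memory lists.

```python
def members_needed_for_timeline(timeline_events, member_events):
    """Keep only the m.room.member events for users who sent something
    in this chunk of timeline. Hypothetical helper, for illustration only."""
    senders = {ev["sender"] for ev in timeline_events}
    return [
        ev
        for ev in member_events
        if ev["type"] == "m.room.member" and ev["state_key"] in senders
    ]


timeline = [
    {"type": "m.room.message", "sender": "@alice:example.org"},
    {"type": "m.room.message", "sender": "@bob:example.org"},
]
members = [
    {"type": "m.room.member", "state_key": "@alice:example.org"},
    {"type": "m.room.member", "state_key": "@bob:example.org"},
    {"type": "m.room.member", "state_key": "@carol:example.org"},
]

# carol sent nothing in this chunk, so her member event is left out
needed = members_needed_for_timeline(timeline, members)
```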
So I just put this live with a custom sync worker against my Before (matrix-org-hotfixes branch)
After (matthew/filter_members branch)
So, it looks like (with this impl at least) we're seeing a roughly 2-3x improvement on various metrics. (more datapoints from @turt2live at https://gist.github.com/turt2live/a689cdf3cb0f2ddf3c93aa20f2440c16)
It looks like this isn't sending the senders of events on incremental syncs. It's not the end of the world yet (as it doesn't actually break anything as far as I can tell), it just looks bad in Riot. (I also realize this isn't near complete yet - just lodging the bug now for consideration)
To counteract the behaviour currently being demonstrated in matrix-org/synapse#2970
yeah, incremental syncs are borked. my next step here is to write some sytests to try to get some level of confidence it actually works properly.
@richvdh ptal for a final time, i hope
richvdh
left a comment
As we are proving with 15 rounds of back-and-forth here, this stuff is finicky and hard to get right by inspection. Please add some tests to check the holes I am identifying.
synapse/storage/state.py
    if (
        state_key is None or
        filtered_types is not None and typ not in filtered_types
lucky for you, `and` has higher precedence than `or`. I had to go and look it up though. Parens please.
done
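For anyone else who has to look it up: a small check that the unparenthesised condition already groups as `a or (b and c)`, and that the other grouping gives a different answer. The three function names are made up for this illustration; only the condition itself comes from the diff.

```python
from itertools import product


def without_parens(state_key, filtered_types, typ):
    # condition as originally written in the diff
    return state_key is None or filtered_types is not None and typ not in filtered_types


def with_parens(state_key, filtered_types, typ):
    # the same condition with the grouping spelt out
    return state_key is None or (
        filtered_types is not None and typ not in filtered_types
    )


def other_grouping(state_key, filtered_types, typ):
    # what a reader might wrongly assume the original means
    return (state_key is None or filtered_types is not None) and typ not in filtered_types


cases = list(product(
    [None, "@alice:example.org"],        # state_key
    [None, [], ["m.room.member"]],       # filtered_types
    ["m.room.member", "m.room.name"],    # typ
))

# `and` binds tighter than `or`, so these two agree on every input
same_everywhere = all(without_parens(*c) == with_parens(*c) for c in cases)

# ...whereas the other grouping disagrees, e.g. for a wildcard state_key
example = (None, ["m.room.member"], "m.room.member")
differs = without_parens(*example) != other_grouping(*example)
```

So the parens change nothing at runtime; they only save the next reader the lookup.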
        return True
    return False

    got_all = is_all or not missing_types
can't we just write is_all or (not missing_types and filtered_types is not None) rather than special-casing over a particular bug elsewhere in the algorithm?
i find is_all or (not missing_types and filtered_types is not None) incredibly cryptic to reason about, to the extent i'm failing to convince myself it's even right.
For instance, if this is called with types=[] and filtered_types=[whatever], then is_all could well be false (if the cache is empty), and missing_types would be falsey, and filtered_types would not be None... so got_all would be true, which is the wrong answer.
Which is why I very deliberately spelt out the special case we're handling here where types=[], so we can't trust missing_types, which feels a lot easier to reason about.
ok, but the fact that you are special-casing the empty list makes me suspect that there are bugs elsewhere.
and sorry, it should have been is_all or (not missing_types and filtered_types is None). Maybe it should be:
    got_all = is_all
    if not got_all:
        # the cache is incomplete. We may still have got all the results we need, if
        # we don't have any wildcards in the match list.
        if not missing_types and filtered_types is None:
            got_all = True
okay.
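To make the disagreement in this thread concrete, here is a sketch that evaluates both spellings on the counterexample described above (types=[], filtered_types=[whatever], empty cache). The function names are hypothetical, and `missing_types` is assumed to be a set; the rest of the cache lookup is not modelled.

```python
def got_all_first_suggestion(is_all, missing_types, filtered_types):
    # the expression as first suggested (with "is not None")
    return is_all or (not missing_types and filtered_types is not None)


def got_all_corrected(is_all, missing_types, filtered_types):
    # the corrected form from the follow-up comment
    got_all = is_all
    if not got_all:
        # the cache is incomplete; we can only be sure we have everything
        # if nothing is missing and there is no filtered_types wildcard
        if not missing_types and filtered_types is None:
            got_all = True
    return got_all


# counterexample: empty cache, types=[] so nothing is recorded as missing,
# but filtered_types is set, so other types may still need fetching
is_all = False
missing_types = set()
filtered_types = ["m.room.member"]

first = got_all_first_suggestion(is_all, missing_types, filtered_types)
corrected = got_all_corrected(is_all, missing_types, filtered_types)
```

With these inputs the first spelling claims the cache had everything, while the corrected one does not.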
    else:
        got_all = is_all or not missing_types

    return {
worth noting that missing_types isn't actually used in the result. Suggest removing it; I think it might open some clearer options in how to implement this function
it feels like a useful thing for the function to be returning tbh, even if it isn't used, but given the filtered_types stuff means that we can't easily enumerate the types which are missing from the result, i've removed it.
    defer.returnValue({row["event_id"]: row["state_group"] for row in rows})

-   def _get_some_state_from_cache(self, group, types):
+   def _get_some_state_from_cache(self, group, types, filtered_types=None):
I still don't think this is right; I don't think e22700c actually fixed the case I mentioned.
Prove me wrong with a test!
i strongly suspect i'd have missed it in a UT too.
Well, can you add one now, please.
@richvdh ptal.
richvdh
left a comment
57 commits later... \o/
for the love of god please don't forget to squash when merging.
    if not got_all:
        # the cache is incomplete. We may still have got all the results we need, if
        # we don't have any wildcards in the match list.
        if not missing_types and filtered_types is None:
and now that we aren't returning missing_types, I wonder why we are bothering to build a set rather than just using a boolean. But let's just land the damn thing.
As per #synapse-dev:
This is a first cut at filtering out room_members from sync responses unless they're actually needed to render the timeline (as proposed at https://docs.google.com/document/d/11yn-mAkYll10RJpN0mkYEVqraTbU3U4eQx9MNrzqX1U/edit#)
My hope is to get this merged so that client developers can experiment with lazy_loading and see how much it speeds up their clients, and to check how badly clients handle members trickling in on demand.
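For client developers who want to experiment, a minimal sketch of what opting in might look like. The placement of `lazy_load_members` under `room.state` is inferred from the PR title; the `/_matrix/client/r0/sync` path and passing the filter as inline JSON in the `filter` query parameter are assumptions about the client-server API, not something this PR spells out.

```python
import json
from urllib.parse import parse_qs, urlencode

# assumed filter shape: lazy_load_members as a room state filter parameter
lazy_filter = {"room": {"state": {"lazy_load_members": True}}}

# the filter is serialised to JSON and sent as a query parameter (assumption)
query = urlencode({"filter": json.dumps(lazy_filter, separators=(",", ":"))})
sync_path = "/_matrix/client/r0/sync?" + query

# round-trip check: what the server would see after URL-decoding
decoded = json.loads(parse_qs(query)["filter"][0])
```

Comparing response size and sync time with and without the filter is the sort of before/after stat asked for earlier in the thread.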
Must-have todo:
Lower priority: