Collaborator

@sm-sayedi sm-sayedi commented Nov 15, 2025

Fixes: #1256
Fixes: #2104

Screen recordings

First batch - Empty

Before

Empty.first.batch.-.Before.mov

After

Empty.first.batch.-.After.mov

First batch - Less than a screenful

Before

First.batch.with.few.messages.-.Before.mov

After

First.batch.with.few.messages.-.After.mov

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch 5 times, most recently from f920082 to a53cbfa November 20, 2025 19:20
@sm-sayedi sm-sayedi marked this pull request as ready for review November 20, 2025 19:23
@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from a53cbfa to ef1fcc3 November 20, 2025 19:50
@sm-sayedi sm-sayedi added the maintainer review PR ready for review by Zulip maintainers label Nov 20, 2025
@sm-sayedi sm-sayedi requested a review from chrisbobbe November 20, 2025 19:51
@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from ef1fcc3 to 127f690 November 20, 2025 20:02
Collaborator

@chrisbobbe chrisbobbe left a comment

Thanks! Here's a review of the first three commits:

542996e msglist [nfc]: Mention fetchNewer in the MessageListView dartdoc
ce56456 msglist [nfc]: Add a new point to _MessageSequence.messages dartdoc
3c10bed msglist [nfc]: Remove one nested try block in _fetchMore

and a partial review of the fourth:

03d61ca msglist: Fetch newer messages despite previous muted batch

For that fourth commit, can you say briefly in the commit message why it's not a complete fix for the issue, to help orient the reader to what comes next?


processResult(result);
} catch (e) {
hasFetchError = true;
Collaborator

msglist [nfc]: Remove one nested try block in `_fetchMore`

The nested `try` block doesn't seem to be making any difference,
so good to remove.

This isn't NFC: now, hasFetchError is about what happens when data is fetched and processed (with processResult), not just about what happens when data is fetched. That doesn't seem desirable, just from looking at its name.
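
To make the distinction concrete, here is a minimal sketch. It is not the real `_fetchMore`: the `Fetcher` class and its `fetch` and `processResult` callbacks are hypothetical stand-ins. In the nested form only a failed request sets `hasFetchError`; in the flattened form a failure while processing the result sets it too.

```dart
/// Hypothetical stand-in for the view-model; not the real `_fetchMore`.
class Fetcher {
  Fetcher({required this.fetch, required this.processResult});

  /// Stand-in for the network request.
  final Future<List<int>> Function() fetch;

  /// Stand-in for handling the fetched data.
  final void Function(List<int>) processResult;

  bool hasFetchError = false;

  /// Nested form: only a failure of [fetch] counts as a fetch error.
  Future<void> fetchMoreNested() async {
    hasFetchError = false;
    try {
      final List<int> result;
      try {
        result = await fetch();
      } catch (e) {
        hasFetchError = true;
        rethrow;
      }
      processResult(result);
    } catch (e) {
      // Swallowed only to keep the sketch short.
    }
  }

  /// Flattened form: a failure in [processResult] also sets the flag.
  Future<void> fetchMoreFlat() async {
    hasFetchError = false;
    try {
      final result = await fetch();
      processResult(result);
    } catch (e) {
      hasFetchError = true;
    }
  }
}
```

With a `fetch` that succeeds and a `processResult` that throws, `fetchMoreNested` leaves `hasFetchError` false while `fetchMoreFlat` sets it to true, which is the behavior change described above.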

Collaborator Author

Indeed, thanks for the catch. Dropped the commit.

Comment on lines 148 to 158
/// The ID of the oldest known message so far in this narrow.
///
/// This will be `null` if no messages of this narrow are fetched yet.
/// Having a non-null value for this doesn't always mean [haveOldest] is `true`.
///
/// The related message may not appear in [messages] because it
/// is muted in one way or another.
int? get oldMessageId => _oldMessageId;
Collaborator

The ID of the oldest known message so far in this narrow.

There isn't necessarily a message with this ID in the narrow, though, right? In a quick skim, I don't see event-handling code to update _oldMessageId when the corresponding message is deleted or moved out of the narrow. We should avoid saying the message is in the narrow when it might not be.

Collaborator

We should avoid saying the message is in the narrow when it might not be.

Some nits in this revision:

  • The dartdoc summary line still says the inaccurate thing.
  • "A non-null value for this doesn't always mean [haveOldest] is true." → This is written to prevent a misunderstanding that I think isn't very likely; does that seem right? Maybe instead of this line, we can clarify more directly by making this getter's name more explicit, like oldestFetchedMessageId (which might be helpful in any case).
  • I'd like to remove "that's fine for what this is used for" because it can't be verified except by reading the code that consumes it. (An alternative would be to restrict its allowed usages by adding some words to the dartdoc.)
  • A few wording nits for clarity; see below.
Suggested change
- /// The ID of the oldest known message so far in this narrow.
- ///
- /// This will be `null` if no messages of this narrow are fetched yet.
- /// Having a non-null value for this doesn't always mean [haveOldest] is `true`.
- ///
- /// The related message may not appear in [messages] because it
- /// is muted in one way or another.
- int? get oldMessageId => _oldMessageId;
+ /// The ID of the oldest message fetched so far in this narrow.
+ ///
+ /// This is used as the anchor for fetching the next batch of older messages
+ /// and will be `null` if no messages of this narrow have been fetched yet.
+ ///
+ /// A message with this ID might not appear in [messages]:
+ /// - The message may be in a muted conversation.
+ /// - The message may have been moved or deleted after it was fetched.
+ int? get oldestFetchedMessageId => _oldestFetchedMessageId;

(Similarly for the corresponding newest-fetched-message code.)

Member

@gnprice gnprice left a comment

Thanks!

Similar to #1951 (review), here's some high-level comments before I go on vacation.

Comment on lines 1219 to 1221
// TODO perhaps offer mark-as-read even when not done fetching?
MarkAsReadWidget(narrow: widget.narrow),
if (model.messages.isNotEmpty)
MarkAsReadWidget(narrow: widget.narrow),
Member

What's the relationship of this change to the main changes happening in this commit? What's the user-visible change in behavior it causes?

Collaborator Author

@sm-sayedi sm-sayedi Nov 26, 2025

So with the new changes in the commit, if the initial newest batch is all muted (model.messages.isEmpty), and then when older messages are being fetched, the "Mark as read" widget will be shown along with a progress indicator and no messages, so I thought it may be a good idea to hide "Mark as read" until there are visible messages populated.

Before (this one-line change): Screenshot 2025-11-26 at 7 30 45 AM
After (this one-line change): Screenshot 2025-11-26 at 7 32 13 AM

Member

Hmm, I see.

What does that situation look like before this branch / before even the first batch is fetched? I believe we show a different progress indicator (centered), and don't show the mark-read button.

Can we have that behavior continue when we've had some fetch requests finish, but haven't yet actually found any messages? I think I'd like to think of that state as equivalent to when the initial fetch is still going.

IOW I'd like to think of it as "still working on fetching messages", like in the doc comment on MessageListView.fetched (in main):

  /// Whether [messages] and [items] represent the results of a fetch.
  ///
  /// This allows the UI to distinguish "still working on fetching messages"
  /// from "there are in fact no messages here".
  bool get fetched => switch (_status) {
    FetchingStatus.unstarted || FetchingStatus.fetchInitial => false,
    _ => true,
  };

Collaborator Author

Yeah, I think that will be an improvement. It will also avoid the abrupt change from a centered progress indicator to a bottom-aligned indicator.

Comment on lines 229 to 231
FetchingInitialStatus _fetchInitialStatus = .unstarted;
FetchingMoreStatus _fetchOlderStatus = .unstarted;
FetchingMoreStatus _fetchNewerStatus = .unstarted;
Member

Another benefit of this change is that if there's a backoff in one
direction, it will not affect the other one.

Hmm — that sounds like a problem, not a benefit. 🙂 The fetches are all going to the same server, so if one type of fetch has trouble that suggests we should hold off on our next requests, then the same need applies to requests at the other end of the list.

(In fact ideally we'd be sharing backoff information across all requests on a given account. See also #946. But these are potentially some of the more frequent requests, so it's good that we at least share it among these.)
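
As a rough illustration of sharing backoff between the two directions, here is a minimal sketch. It does not use Zulip's `BackoffMachine`; `SharedBackoff`, `FetchCoordinator`, `FetchDirection`, and the delay values are all hypothetical names and numbers chosen for the example. A failure reported by either direction delays the next request in both.

```dart
/// Hypothetical exponential backoff state; not Zulip's `BackoffMachine`.
class SharedBackoff {
  SharedBackoff({
    this.base = const Duration(milliseconds: 100),
    this.max = const Duration(seconds: 10),
  });

  final Duration base;
  final Duration max;
  int _failures = 0;

  /// How long to wait before the next request: zero after a success,
  /// doubling with each consecutive failure, capped at [max].
  Duration get nextDelay {
    if (_failures == 0) return Duration.zero;
    final delay = base * (1 << (_failures - 1));
    return delay > max ? max : delay;
  }

  void recordFailure() {
    // Cap the counter so the shift above stays small.
    if (_failures < 16) _failures++;
  }

  void recordSuccess() {
    _failures = 0;
  }
}

enum FetchDirection { older, newer }

/// Hypothetical coordinator: both directions consult the same backoff state,
/// because both kinds of request go to the same server.
class FetchCoordinator {
  final SharedBackoff _backoff = SharedBackoff();

  /// The delay is deliberately independent of [direction].
  Duration delayBefore(FetchDirection direction) => _backoff.nextDelay;

  void recordResult(FetchDirection direction, {required bool succeeded}) {
    if (succeeded) {
      _backoff.recordSuccess();
    } else {
      _backoff.recordFailure();
    }
  }
}
```

In this sketch, a failed fetch-older request makes `delayBefore(FetchDirection.newer)` non-zero as well, and a success in either direction resets the delay for both.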

Member

More broadly:

But imagine if the number of messages in the initial batch occupies less
than a screenful, and then `fetchOlder` returns no messages or a few
messages that combined with the initial batch messages are still less
than a screenful; in that case, there will be no change in the scroll
metrics to call `fetchNewer`.

This change feels like the wrong layer for solving this problem.

When the fetchOlder returns, that will notify the view-model's listeners, right? What if we have the _MessageListState react to that by looking to see if more fetching is needed, and calling fetchOlder/fetchNewer if so?

IOW, we could have its _modelChanged method call the same logic that's at the end of _handleScrollMetrics. Probably that means pulling that part out as its own smaller helper method.
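
A minimal sketch of that shape, in plain Dart without Flutter, might look like the following. Everything here is a hypothetical stand-in: `ModelSketch` and `MessageListStateSketch` are not the real classes, `_fetchMoreIfNeeded` is an assumed name for the extracted helper, and the threshold value is arbitrary. Only the `busyFetchingMore` guard is taken from the discussion in this thread.

```dart
/// Hypothetical stand-in for the view-model; only what the sketch needs.
class ModelSketch {
  bool haveOldest = false;
  bool haveNewest = false;
  bool busyFetchingMore = false;

  /// Records which requests were actually started.
  final List<String> requests = [];

  void Function()? listener;

  void fetchOlder() {
    if (haveOldest || busyFetchingMore) return;
    busyFetchingMore = true;
    requests.add('older');
  }

  void fetchNewer() {
    if (haveNewest || busyFetchingMore) return;
    busyFetchingMore = true;
    requests.add('newer');
  }

  /// Simulates a request completing, then notifies the listener.
  void completeFetch({bool reachedOldest = false, bool reachedNewest = false}) {
    busyFetchingMore = false;
    haveOldest = haveOldest || reachedOldest;
    haveNewest = haveNewest || reachedNewest;
    listener?.call();
  }
}

/// Hypothetical stand-in for the widget state.
class MessageListStateSketch {
  MessageListStateSketch(this.model) {
    model.listener = modelChanged;
  }

  final ModelSketch model;

  // Arbitrary value for the sketch.
  static const double fetchThreshold = 3000;

  double? _extentBefore;
  double? _extentAfter;

  void handleScrollMetrics({
    required double extentBefore,
    required double extentAfter,
  }) {
    _extentBefore = extentBefore;
    _extentAfter = extentAfter;
    _fetchMoreIfNeeded();
  }

  /// Reacts to model updates with the same check the scroll listener uses.
  void modelChanged() => _fetchMoreIfNeeded();

  void _fetchMoreIfNeeded() {
    final before = _extentBefore;
    final after = _extentAfter;
    if (before == null || after == null) return;
    if (before < fetchThreshold) model.fetchOlder();
    if (after < fetchThreshold) model.fetchNewer();
  }
}
```

With a short list, the first metrics update starts only the fetch-older request, since fetch-newer is dropped while the model is busy. When that request completes without changing the scroll metrics, the model notification alone is enough to start fetch-newer. The sketch ignores frame timing, so it says nothing about the double-fetch glitch.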

Comment on lines 967 to 968
} while (visibleMessageCount < kMessageListFetchBatchSize / 2
&& this.generation == generation);
Member

If we have the widget state react in the way I suggested in my last comment just now, does that also let us skip these loops? We'd have the state retain responsibility for taking the initiative to call fetchOlder and fetchNewer when potentially needed.

Collaborator Author

Yeah, we can do that; removed them in the new revision.

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from 127f690 to c43eaad January 9, 2026 20:28
@sm-sayedi sm-sayedi changed the title from "msglist: Fetch newer messages despite previous muted batch" to "msglist: Fetch more messages despite previous muted batch" Jan 9, 2026
@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch 2 times, most recently from 958fe11 to 8c76491 January 9, 2026 21:24
@sm-sayedi
Collaborator Author

Thanks @chrisbobbe and @gnprice for the previous reviews. Pushed a new revision, PTAL.

@sm-sayedi sm-sayedi requested a review from chrisbobbe January 9, 2026 21:29
check(messageListItemCount(tester)).equals(501);
});

testWidgets('observe double-fetch glitch', (tester) async {
Collaborator Author

This test case is now very similar to the test case above ("basic") because of the new double-fetch glitch in _modelChanged. Should we remove this one then?

check(messageListItemCount(tester)).equals(2 + 2 + 98);
});

testWidgets('mid-history and fetch-older with too few messages, fetch-newer request is made', (tester) async {
Collaborator Author

@sm-sayedi sm-sayedi Jan 9, 2026

This test is for handling an edge case, the handling of which required a lot of changes in the first revision of this PR, before Greg suggested an alternative, straightforward method of solving that problem. I explained that edge case in the description of a commit message, which is not included now.

This change is necessary when there is a need to fetch more messages
in both directions, older and newer, and when fetching in
one direction avoids fetching in another direction at the same time,
because of the `if (busyFetchingMore) return` line in
both `fetchOlder` and `fetchNewer`.

This scenario happens when a conversation is opened at its first unread,
such that the number of messages in the initial batch is so low (because
they're muted in one way or another) that it's already past the certain
point where the scroll metrics listener in the widget code triggers both
`fetchOlder` and `fetchNewer`. In 2025-11, that code first calls
`fetchOlder` then `fetchNewer`, and for the reason mentioned above,
`fetchNewer` will not fetch any new messages. But that's fine, because
as soon as older messages from `fetchOlder` arrive, there will be
a change in the scroll metrics, so `fetchNewer` will be called again,
fetching new messages.

But imagine if the number of messages in the initial batch occupies less
than a screenful, and then `fetchOlder` returns no messages or a few
messages that combined with the initial batch messages are still less
than a screenful; in that case, there will be no change in the scroll
metrics to call `fetchNewer`.

With the three fetch request types being separated, especially the two
request types for older and newer messages, each direction can fetch
more messages independently without interfering with one another.

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from 8c76491 to d0f5c23 January 23, 2026 04:08
Collaborator

@chrisbobbe chrisbobbe left a comment

Thanks! Comments below, from reading everything except the tests.

Comment on lines 1036 to 1054
- // TODO: This ends up firing a second time shortly after we fetch a batch.
- // The result is that each time we decide to fetch a batch, we end up
- // fetching two batches in quick succession. This is basically harmless
- // but makes things a bit more complicated to reason about.
- // The cause seems to be that this gets called again with maxScrollExtent
- // still not yet updated to account for the newly-added messages.
+ // This ends up firing a second time shortly after we fetch a batch while
+ // there is a fling going on.
+ // The result is that each time we decide to fetch a batch, we end up
+ // fetching two batches in quick succession. This is basically harmless
+ // but makes things a bit more complicated to reason about.
+ // The cause is that this gets called again with minScrollExtent
+ // still not yet updated to account for the newly-added messages.
+ // This relates to how [SchedulerBinding] executes different tasks when
+ // producing a new frame, like first executing transient callbacks
+ // (typically ticking animations) followed by persistent callbacks
+ // (typically the build/layout/paint pipeline), and so on.
+ // So when there is a new message batch received, the related widgets are
+ // marked dirty for the next frame. With the ongoing fling, the underlying
+ // animation registers transient callback(s) for [ScrollPosition.setPixels]
+ // to be executed in the transient callbacks phase at the start of
+ // the frame. It will then notify its listeners, eventually calling
+ // `_scrollChanged` and in turn current method with old minScrollExtent,
+ // causing the second batch fetch. Then in the persistent callbacks phase,
+ // minScrollExtent will be updated, effective in the next frame.
Collaborator

msglist [nfc]: Explain the reasoning behind the double-fetch glitch

Ideally we'd fix the glitch, right? (The old comment is a TODO, saying "This is basically harmless but makes things a bit more complicated to reason about.")

I haven't digested the new comment yet (I'm kind of getting lost in it), but can we use the details you've learned to fix the glitch? In that case I think a new issue is a better place to put them, and we can just point the existing TODO to that issue (and fix anything that's wrong; I see you've done s/maxScrollExtent/minScrollExtent/ which might have been intentional).

Collaborator Author

Filed #2104.

return findScrollView(tester).controller;
}

int? messageListItemCount(WidgetTester tester) =>
Collaborator

msglist test [nfc]: Move a few helpers to the top, for reusability

These are: itemCount, findPlaceholder, and findLoadingIndicator.

Also, rename itemCount to messageListItemCount, to make it easier what
it is about in the upcoming tests.

Commit-message nit: I think there are some missing words in that last sentence?

Collaborator Author

@sm-sayedi sm-sayedi Jan 27, 2026

I’m not sure about the missing words — the phrasing may have been unclear. Reworded it a little; hope that’s better now 🙂

Also, rename itemCount to messageListItemCount, to make it clearer
what it refers to in the upcoming tests.

Comment on lines 847 to 848
/// This makes sure there is at least one non-muted message fetched; if any.
/// It may do so my repeatedly calling [fetchOlder] and [fetchNewer].
Collaborator

Suggested change
- /// This makes sure there is at least one non-muted message fetched; if any.
- /// It may do so my repeatedly calling [fetchOlder] and [fetchNewer].
+ /// If the results don't include at least one non-muted message,
+ /// this will call [fetchOlder] and/or [fetchNewer]
+ /// until one is found or the narrow's oldest and newest messages are reached.

Comment on lines 130 to 132
/// This also may or may not represent all the message history that
/// conceptually belongs in this narrow because some messages might be
/// muted in one way or another and they may not appear in the message list.
Collaborator

How about:

Suggested change
- /// This also may or may not represent all the message history that
- /// conceptually belongs in this narrow because some messages might be
- /// muted in one way or another and they may not appear in the message list.
+ /// Also, messages may be excluded if they are in muted conversations.

Comment on lines +906 to +911
while (messages.isEmpty && !(haveOldest && haveNewest)) {
  await fetchOlder(partOfInitialFetch: true);
  if (messages.isNotEmpty) break;
  await fetchNewer(partOfInitialFetch: true);
}
Collaborator

We probably also want to exit the loop if fetchNewer finds some non-muted messages.

Collaborator Author

I think that’s already covered by the loop condition — it breaks as soon as messages aren’t empty 🙂 But if that’s good for explicitness, I’ll be happy to add it.
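
To check that reasoning, here is a small simulation of the loop's control flow. It is only a sketch: `InitialFetchSketch` is a hypothetical class, each list entry stands for the number of non-muted messages in one successive batch, and `visibleCount` plays the role of `messages.length`.

```dart
/// Hypothetical simulation of the initial-fetch loop; not the real model.
class InitialFetchSketch {
  InitialFetchSketch({required List<int> older, required List<int> newer})
      : _older = older,
        _newer = newer;

  /// Number of non-muted messages in each successive batch, per direction.
  final List<int> _older;
  final List<int> _newer;
  int _olderIndex = 0, _newerIndex = 0;

  /// Plays the role of `messages.length`.
  int visibleCount = 0;

  /// Records which requests were actually made.
  final List<String> calls = [];

  bool get haveOldest => _olderIndex >= _older.length;
  bool get haveNewest => _newerIndex >= _newer.length;

  Future<void> fetchOlder() async {
    if (haveOldest) return;
    calls.add('older');
    visibleCount += _older[_olderIndex++];
  }

  Future<void> fetchNewer() async {
    if (haveNewest) return;
    calls.add('newer');
    visibleCount += _newer[_newerIndex++];
  }

  Future<void> run() async {
    while (visibleCount == 0 && !(haveOldest && haveNewest)) {
      await fetchOlder();
      if (visibleCount > 0) break;
      await fetchNewer();
      // No explicit break here: the loop condition is evaluated next
      // and is false once visibleCount is positive.
    }
  }
}
```

When the only non-muted messages arrive from `fetchNewer`, the loop still stops after that call, because the `while` condition is re-evaluated immediately afterwards. An explicit `break` after `fetchNewer` would be equivalent and purely a readability choice.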

Comment on lines -1001 to +1050
if (this.generation == generation) {
if (this.generation == generation && !partOfInitialFetch) {
Collaborator

In the initial-fetch case, if one of these fetch-older/fetch-newer requests fails, I think we still want to use BackoffMachine to delay the next attempt, right?

Collaborator Author

When writing this, I was thinking about using backoff, but then looking at #945 (PR #1050) where the backoff logic was introduced, the motivation there was to avoid the retry storm caused by a failed fetchOlder (or fetchNewer) request in the widgets code, specifically by the listener for the scroll state. In this (initial-fetch) case though, there is no code path (at least as of now) to cause a retry storm. The while loop will break promptly whenever one of the fetch-older or fetch-newer requests fails. So I think handling of this is similar to the case when the initial-fetch request fails. How about adding a TODO pointing to #2085?

Comment on lines 1055 to 1057
// This relates to `_modelChanged` (and in turn the current method) being
// called right away as a listener of the model, before the next frame
// being drawn with the newly-added messages, updating minScrollExtent.
Collaborator

Interesting. If we want this condition check to run after a frame is drawn with the newly-added messages, how about putting this model.fetchOlder call in a SchedulerBinding.instance.addPostFrameCallback? I wonder if that might actually be the fix for both double-fetch glitches?

Collaborator Author

Yeah, addPostFrameCallback can avoid these glitches. Fix included in the new revision.

@sm-sayedi sm-sayedi force-pushed the 1256-fetch-older-fails branch from 64c0a6d to fb2b549 January 27, 2026 21:17
@sm-sayedi sm-sayedi requested a review from chrisbobbe January 27, 2026 21:24
@sm-sayedi
Collaborator Author

Thanks Chris for the review. Pushed new changes, PTAL.

Labels

maintainer review PR ready for review by Zulip maintainers

Development

Successfully merging this pull request may close these issues:

msglist: Fix the double-fetch glitch
msglist: Fetch-older is defeated by a run of 100+ muted messages

3 participants