Mute list fallback to cache if timed out#5269

Open
DarlCat wants to merge 8 commits intosecondlife:developfrom
DarlCat:bugfix/blocklist-fail-recover

Contributor

@DarlCat DarlCat commented Jan 14, 2026

Description

This PR aims to reduce the user impact of unsent or dropped simulator mute list messages during login. Currently, when this occurs, the mute list is left empty and the viewer does not fall back to the cache even if one is present. This PR changes that behavior in the following ways:

  • fall back to the cached mute list on request timeout or transfer failure so the list never stays empty if we can help it
  • retry mute list requests on region change with a short cool-down to avoid being too noisy, if still in timeout window
  • cancel pending timeout once the list loads to stop unnecessary potential fallback behaviors
  • push an update to the blocked list people floater when a cached or deferred load populates the list
  • notify the user of a forced cache fallback because recent changes may be missing

Related Issues

https://feedback.secondlife.com/bug-reports/p/block-list-empties


Checklist

Please ensure the following before requesting review:

  • I have provided a clear title and detailed description for this pull request.
  • If useful, I have included media such as screenshots and video to show off my changes.
  • The PR is linked to a relevant issue with sufficient context.
  • I have tested the changes locally and verified they work as intended.
  • All new and existing tests pass.
  • Code follows the project's style guidelines.
  • Documentation has been updated if needed.
  • Any dependent changes have been merged and published in downstream modules
  • I have reviewed the contributing guidelines.

Additional Notes

I've tested this locally with success; however, I am not able to test all platforms. Bugs based on packet loss are a challenge to reproduce, so my testing was based on artificially induced drops. I do not think that this is a perfect solution, as the existing comments in code suggest this would ideally be moved to a capability, but this is intended to keep people from too much hardship until that more comprehensive effort can be prioritized.

Very much open to feedback and willing to make reasonable requested changes.

@github-actions github-actions bot added the c/cpp label Jan 14, 2026
return;
}
s_notified = true;
LLNotificationsUtil::add("MuteListFallbackCache");
Contributor

@akleshchev akleshchev Jan 14, 2026

I don't think this should be shown to a user. At least in some cases it's a bug that needs to be fixed server side; in others the viewer should re-request.

If the viewer times out, gets an empty list, or hits an error getting the list, you can probably do something about it via the MuteCRC field. But I agree that if something went wrong we should at least get the data from cache.

Or read cache first (issues with this approach if outdated?), mark as 'no send', apply server response on top...

P.S. Related: #4267

Contributor Author

I don't think this should be shown to a user.

I added a notification because I felt that missing recent, uncached mutes (ones the server knows about but we do not) could be highly disruptive to the user. In a perfect world they should never see the notification, because the replies to MuteListRequest are expected to arrive, but the user complaints I have seen describe a great deal of distress over harassment that the mute list should prevent.

The notification will absolutely be removed if it's decided to not be acceptable. I felt it was appropriate at the time given the potential impact of missing mutes.

If the viewer times out, gets an empty list, or hits an error getting the list, you can probably do something about it via the MuteCRC field. But I agree that if something went wrong we should at least get the data from cache.

That is my thought, because for some unknown reason these messages are being lost or remain unsent at random for some users.

Or read cache first (issues with this approach if outdated?), mark as 'no send', apply server response on top...

The UpdateMuteListEntry and RemoveMuteListEntry messages both refer to mutes by their ID or name, so as far as I can tell there should be no negative impact on the server mute list if the user moves forward with a session of adding/removing mutes based on their cached copy. The client would just be missing the local record of what is on the server.

What I weighed when deciding not to load the cached data first and then layer the server list on top once (if?) we get it was the merging logic. If we were to load from cache first and later learn of a newer server version, it is not guaranteed to be reconcilable with what we have in our cache.

To merge the server list with the cached list, I would have to make assumptions about which list holds the correct state of a mute that is present in one list but missing or different in the other. Those assumptions could result in unintended re-mutes or unintended unmutes.


I was very anxious to repeatedly request a mute list from the region because I do not want to generate undue load with loops. There are two timeouts/cooldowns in place within my PR:

  1. The existing conceptual 30-second timeout. Once it expires, I rely entirely on the cached mute list for the rest of the session, ignore whatever the server may eventually send (to avoid merging or recreating the mute list), and stop any effort to re-request an update.
  2. The 5-second cooldown between MuteListRequest dispatch attempts, which can be triggered by region change. This prevents asking every region an agent passes through for a mute list that the agent may not stay around long enough to receive.
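
The way these two timers interact could be sketched roughly as follows. This is a minimal illustration, not the PR's code: `MuteListRequestPolicy` and its members are hypothetical names, time is passed in as plain seconds rather than read from the viewer's timer, and only the 30-second and 5-second values come from the description above.

```cpp
#include <cassert>

// Hypothetical names; only the 30 s and 5 s values come from the discussion.
constexpr double REQUEST_TIMEOUT_SECONDS = 30.0; // after this, rely on cache for the session
constexpr double RETRY_COOLDOWN_SECONDS = 5.0;   // minimum gap between request attempts

class MuteListRequestPolicy
{
public:
    // Record the initial request sent at login.
    void start(double now)
    {
        mStarted = true;
        mFirstRequestTime = now;
        mLastRequestTime = now;
    }

    // True once the overall timeout window has elapsed.
    bool timedOut(double now) const
    {
        return mStarted && (now - mFirstRequestTime) >= REQUEST_TIMEOUT_SECONDS;
    }

    // Called on region change. Returns true if a new request should be sent:
    // only inside the timeout window and only after the cooldown has passed.
    bool tryRetry(double now)
    {
        assert(mStarted && "start() must be called first");
        if (timedOut(now)) return false;
        if (now - mLastRequestTime < RETRY_COOLDOWN_SECONDS) return false;
        mLastRequestTime = now;
        return true;
    }

private:
    bool mStarted = false;
    double mFirstRequestTime = 0.0;
    double mLastRequestTime = 0.0;
};
```

With a login request at t = 0, a region change at t = 2 is skipped, one at t = 6 resends, and anything at or after t = 30 is ignored.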

Contributor

@akleshchev akleshchev Jan 14, 2026

because for some unknown reason these messages are being lost or remain unsent at random for some users.

I will create a server ticket for that. I know of at least one case with a repro where the server isn't responding when it should.

Contributor

I was very anxious to repeatedly request a mute list from the region because I do not want to generate undue load with loops.

Makes sense. But region changes can end up re-requesting indefinitely either way. Better to add some kind of retry limit there.

Contributor

because for some unknown reason these messages are being lost

SendReliable supports callbacks. Might be possible to refine this by detecting send failures to log better and to rerequest on failures.

Contributor

Copilot AI left a comment

Pull request overview

This pull request implements fallback mechanisms for mute list loading to improve reliability when simulator messages are lost or delayed during login. The changes add timeout-based cache fallback, retry logic on region changes, and better UI updates when cached data is used.

Changes:

  • Added timeout mechanism (30 seconds) to fall back to cached mute list if server request doesn't respond
  • Implemented retry logic on region changes with 5-second cooldown to avoid request spam
  • Added user notification when falling back to cached data
  • Enhanced blocked list UI to refresh when cached or deferred data loads

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file

File | Description
indra/newview/skins/default/xui/en/notifications.xml | Added new notification to inform users when cached mute list is used due to server communication issues
indra/newview/llmutelist.h | Added timer management, region change callback, and helper methods to support timeout and retry mechanisms
indra/newview/llmutelist.cpp | Implemented timeout handling, cache fallback logic, region change retry, timer cleanup, and refactored cache loading
indra/newview/llblocklist.h | Updated method signatures with override keywords for better code quality
indra/newview/llblocklist.cpp | Implemented onChange() to detect and handle transitions from empty to populated mute list

@akleshchev akleshchev linked an issue Jan 20, 2026 that may be closed by this pull request
@akleshchev
Contributor

@Geenz What do you think about this one?
I think the notification is not needed, but a fallback for when the server fails to respond or caps fail is needed as a mitigation.

P.S. Created a server ticket, because that's where the problem comes from at least in some cases https://github.com/secondlife/server/issues/2305

@DarlCat
Contributor Author

DarlCat commented Jan 20, 2026

I plan to be making an update to this PR that addresses feedback given, so unless the exact state it's in now becomes satisfactory it may be best set to draft.

There's excellent feedback here, and I have already come around to the view that a notification isn't ideal for this when we can fail safe and never alert or scare the user.

I'm working on an implementation for this PR that expands the state machine design already in place to integrate graceful retry logic, and use of the cache as a starting point rather than an eventual fallback. This approach makes a notification unnecessary in my opinion.

@akleshchev
Contributor

akleshchev commented Jan 21, 2026

Thank you! Moved to draft.

@akleshchev akleshchev marked this pull request as draft January 21, 2026 17:06
@github-actions

github-actions bot commented Mar 5, 2026

This pull request is stale because it has been open 30 days with no activity. Remove stale label or comment or it will be closed in 7 days

@github-actions github-actions bot added the stale label Mar 5, 2026
@akleshchev akleshchev removed the stale label Mar 5, 2026
…mehow isn't received

Signed-off-by: Darl <me@darl.cat>
@DarlCat DarlCat force-pushed the bugfix/blocklist-fail-recover branch from 1f18378 to 6f224ca Compare April 1, 2026 16:41
@DarlCat
Contributor Author

DarlCat commented Apr 1, 2026

Sat down after a break away with a fresh mindset and blank slate, feeling a lot better about the approach and behaviors this time. Thanks for keeping this around!

@DarlCat DarlCat marked this pull request as ready for review April 1, 2026 16:44
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.


Comment on lines 95 to 97
LLMuteList::getInstance()->setLoaded(LLMuteList::MLS_SERVER_EMPTY);
return true;
}

Copilot AI Apr 1, 2026

The "emptymutelist" dispatch now only marks the list as loaded (source=server-empty) but does not clear any existing entries. If a cache fallback was loaded earlier (e.g., after a request timeout) and the server message arrives late, this will incorrectly keep stale cached mutes even though the server is explicitly saying the list is empty. Consider clearing mMutes/mLegacyMutes (and possibly writing an empty cache) before calling setLoaded(MLS_SERVER_EMPTY).

Suggested change
- LLMuteList::getInstance()->setLoaded(LLMuteList::MLS_SERVER_EMPTY);
- return true;
- }
+ LLMuteList* mute_list = LLMuteList::getInstance();
+ if (mute_list)
+ {
+ // The server explicitly reports an empty mute list: clear any
+ // locally cached mutes (including legacy mutes) and ensure the
+ // cache reflects an empty list before marking as loaded.
+ mute_list->clear();
+ mute_list->setLoaded(LLMuteList::MLS_SERVER_EMPTY);
+ }
+ return true;
+ }

Contributor

Makes sense. There is a case where the user has already made some changes this session, but it is probably not worth accounting for.

Contributor Author

Agreed that it's not worth accounting for. I operate on the belief that add()/remove() calls made in a degraded state are still received and accounted for by the simulator, which avoids maintaining the kind of clean/dirty transactional mess I had thought up before.
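
The behavior agreed on in this thread, where a late "empty mute list" message from the server overrides entries that were loaded from the cache fallback, might look roughly like the sketch below. It is a simplified model under assumptions: `MuteListModel`, `loadFromCache()` and `onServerEmpty()` are hypothetical names, and mutes are reduced to plain strings.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class LoadSource { None, Server, ServerEmpty, FallbackCache };

// Hypothetical, simplified stand-in for the mute list.
class MuteListModel
{
public:
    // Timeout path: populate the list from the on-disk cache.
    void loadFromCache(const std::set<std::string>& cached)
    {
        mMutes = cached;
        mSource = LoadSource::FallbackCache;
    }

    // The server says the list is empty. The server is the source of truth,
    // so drop anything that came from the cache before marking as loaded.
    void onServerEmpty()
    {
        mMutes.clear();
        mSource = LoadSource::ServerEmpty;
    }

    bool isMuted(const std::string& name) const { return mMutes.count(name) > 0; }
    LoadSource source() const { return mSource; }

private:
    std::set<std::string> mMutes;
    LoadSource mSource = LoadSource::None;
};

// Usage: a cached mute disappears once the late server message arrives.
inline void exampleLateServerEmpty()
{
    MuteListModel list;
    list.loadFromCache({"Alice"});
    assert(list.isMuted("Alice"));
    list.onServerEmpty();
    assert(!list.isMuted("Alice"));
}
```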

DarlCat added 3 commits April 2, 2026 10:26
… in a cache fallback status.

This yields to the simulator's responsibility as source-of-truth.

e.g. Bob has Alice blocked for a while across all his devices
1. Bob unblocked Alice on his laptop
2. Bob logs in on his desktop with a cached mutelist
3. LLDispatchEmptyMuteList fires
4. LLMuteList becomes eventually-correct, reflecting most recent signaled user intent

Signed-off-by: Darl <me@darl.cat>
getLoadFailed -> updateLoadState
- No longer labeled as a plain getter, but instead as a state machine advancement point
- This name reflects its role in advancing the state according to design parameters when called from the idle loop
- Call site in LLIMProcessing::requestOfflineMessages simplified by internalizing our readiness checks

isFailed
- Reintroduced const to match isLoaded for determining state

Signed-off-by: Darl <me@darl.cat>
… set state to guard against a possible cache write to disk to be safe

Signed-off-by: Darl <me@darl.cat>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


Comment on lines +837 to +838
mLoadState = ML_LOADED;
mLoadSource = MLS_FALLBACK_CACHE;

Copilot AI Apr 2, 2026

In the gDisconnected path, requestFromServer() marks the mute list as ML_LOADED with source=MLS_FALLBACK_CACHE, but it never actually loads the cached mute list. This can leave the session with an empty mute list even when a cache file exists, which contradicts the PR goal of avoiding an empty list. Consider calling tryLoadCacheFallback(agent_id, "disconnected") (or otherwise loading the cache) instead of forcing ML_LOADED here.

Suggested change
- mLoadState = ML_LOADED;
- mLoadSource = MLS_FALLBACK_CACHE;
+ tryLoadCacheFallback(agent_id, "disconnected");

Contributor Author

By setting the load state to loaded and the source to cache, we put the list in a state that will not change again (because it is loaded) and that won't be written to disk from cache(). I believe that calling tryLoadCacheFallback() in this code path is unnecessary: we will no longer be handling messages from a server, so the block list no longer needs to be in a reliable state.

Yes, this is a "don't, and say we did" situation, but it's a safe lie in this case.
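
The "safe lie" can be illustrated with a small sketch. It assumes, as the reply states, that a list whose source is the cache fallback is never written back to disk. The type and member names are hypothetical, and the sketch only treats a server-delivered list as writable, which is an assumption rather than something confirmed here.

```cpp
#include <cassert>

enum class LoadState { Requested, Loaded, Failed };
enum class LoadSource { None, Server, FallbackCache };

// Hypothetical stand-in for the mute list's load bookkeeping.
struct MuteListSketch
{
    LoadState state = LoadState::Requested;
    LoadSource source = LoadSource::None;
    int disk_writes = 0; // counts writes to an imaginary cache file

    // Disconnected path: declare the list final without loading anything.
    void markDisconnected()
    {
        state = LoadState::Loaded;
        source = LoadSource::FallbackCache;
    }

    // Normal path: the server delivered the list.
    void onServerList()
    {
        state = LoadState::Loaded;
        source = LoadSource::Server;
    }

    // Persist only a list the server delivered; returns true if a write happened.
    bool cache()
    {
        if (state != LoadState::Loaded || source != LoadSource::Server) return false;
        ++disk_writes;
        return true;
    }
};

// Usage: after a disconnect the (possibly empty) list never reaches disk.
inline void exampleDisconnected()
{
    MuteListSketch list;
    list.markDisconnected();
    const bool wrote = list.cache();
    assert(!wrote);
    assert(list.disk_writes == 0);
}
```

Because the state is already "loaded", nothing later in the session tries to reload it, and because the source is the fallback, the existing cache file on disk is left untouched.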

Comment on lines 217 to +233
@@ -221,9 +225,81 @@ bool LLMuteList::getLoadFailed() const
  constexpr F64 WAIT_SECONDS = 30;
  if (mRequestStartTime + WAIT_SECONDS < LLTimer::getTotalSeconds())
  {
- return true;
+ LL_WARNS() << "Mute list request timed out; trying cache fallback once" << LL_ENDL;
+ tryLoadCacheFallback(gAgent.getID(), "request timeout");
+ return isLoaded() || isFailed();
  }
  }
  return false;

Copilot AI Apr 2, 2026

The PR title/description mentions retrying mute list requests on region change with a short cooldown, but updateLoadState() currently only performs a one-time cache fallback after a fixed timeout and does not trigger any resend/retry behavior on region change. If retries on region transition are still intended, the retry hook/timer logic seems to be missing; otherwise consider updating the PR title/description to avoid implying behavior that isn't present.
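
The one-time fallback described in this comment can be sketched as a small state machine. This is an illustration under assumptions, not the code in the diff: `LoadTracker` is hypothetical, the `cache_loader` callback stands in for `tryLoadCacheFallback()`, and time is passed in explicitly instead of coming from `LLTimer::getTotalSeconds()`.

```cpp
#include <cassert>
#include <functional>
#include <utility>

enum class LoadState { NotRequested, Requested, Loaded, Failed };

constexpr double WAIT_SECONDS = 30.0; // same value as in the diff

class LoadTracker
{
public:
    // cache_loader returns true if a usable cached list was loaded.
    explicit LoadTracker(std::function<bool()> cache_loader)
        : mCacheLoader(std::move(cache_loader)) {}

    void request(double now)
    {
        mState = LoadState::Requested;
        mRequestStartTime = now;
    }

    void onServerList() { mState = LoadState::Loaded; }

    bool isLoaded() const { return mState == LoadState::Loaded; }
    bool isFailed() const { return mState == LoadState::Failed; }

    // Called from the idle loop. Returns true once the list has settled,
    // either loaded (from server or cache) or failed.
    bool updateLoadState(double now)
    {
        if (isLoaded() || isFailed()) return true;
        if (mState == LoadState::Requested && mRequestStartTime + WAIT_SECONDS < now)
        {
            // Timed out: try the cache exactly once, then stay in that state.
            mState = mCacheLoader() ? LoadState::Loaded : LoadState::Failed;
        }
        return isLoaded() || isFailed();
    }

private:
    std::function<bool()> mCacheLoader;
    LoadState mState = LoadState::NotRequested;
    double mRequestStartTime = 0.0;
};

// Usage: nothing happens at 10 s; after 30 s the cache is used.
inline void exampleTimeoutFallback()
{
    LoadTracker tracker([] { return true; }); // pretend a cache file exists
    tracker.request(0.0);
    const bool settled_early = tracker.updateLoadState(10.0);
    const bool settled_late = tracker.updateLoadState(31.0);
    assert(!settled_early);
    assert(settled_late);
    assert(tracker.isLoaded());
}
```

Note that nothing in this sketch resends the request, which matches the observation that the retry behavior is absent from this revision.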

Contributor Author

Restarting the state machine on region change is still on the table. If maintainers think it is still worthwhile, I'd be happy to restore it to the concept. I omitted it from this reimplementation to lighten the load/diff and leave the real fix to the server team, as noted in the existing comments in code.

I should've changed the PR title to reflect that, and will. I forget I can do that 😆

Contributor

Is there a chance that a re-request on a region change will fix the issue? From my observation, if the server did not respond, it stays that way.

Contributor Author

Yes there is actually a chance a server request retry on region change could yield a successful request response. I have observed this to actually work a while back, during the few times I was able to reproduce this issue for testing.

This would add a form of coverage for the currently unaddressed case of a fresh login without a cache being unlucky and also not getting a successful mute list response. I do agree, though: if the login simulator did not respond, it is not likely to do so if retried. I think that to be fruitful this would only be attempted on a region change.

I recognize that at this point it's not a lot of extra work to wire back up a basic callback on region change to send another request to the new region, but I don't see the point of doing it every time. Only one extra request should do because in the reports of this that I've seen, this issue is often most prominent when logging into a busy/crowded region.

If the login simulator request fails, AND the next region also fails, it's likely there's something else going on and we're not going to get anything out of repeated attempts past that.

@DarlCat DarlCat changed the title from "Mute list fallback to cache + retry on region change if still pending" to "Mute list fallback to cache if timed out" Apr 2, 2026
void LLMuteList::setLoaded(EMuteListSource source)
{
mLoadState = ML_LOADED;
mLoadSource = source;
Contributor

@akleshchev akleshchev Apr 3, 2026

Please add a warning log if we are setting ML_LOADED when it was already in that state. It would be good for diagnosis to see a potential 'overwrite' in the logs.
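
One way to surface such an overwrite is sketched below. It is a self-contained approximation: `LoadStatus` is a hypothetical class, the enum values are simplified, and `std::cerr` stands in for the viewer's `LL_WARNS()` logging.

```cpp
#include <cassert>
#include <iostream>

enum class LoadState { Requested, Loaded };
enum class LoadSource { None, Server, ServerEmpty, FallbackCache };

// Hypothetical holder for the load state and its source.
class LoadStatus
{
public:
    // Returns true if the list was already loaded, i.e. this call
    // overwrote an earlier result. std::cerr stands in for LL_WARNS().
    bool setLoaded(LoadSource source)
    {
        const bool overwrite = (mState == LoadState::Loaded);
        if (overwrite)
        {
            std::cerr << "Mute list already loaded from source "
                      << static_cast<int>(mSource)
                      << "; overwriting with source "
                      << static_cast<int>(source) << '\n';
        }
        mState = LoadState::Loaded;
        mSource = source;
        return overwrite;
    }

    LoadSource source() const { return mSource; }

private:
    LoadState mState = LoadState::Requested;
    LoadSource mSource = LoadSource::None;
};

// Usage: the cache fallback loads first, then a late server list arrives.
inline void exampleOverwriteWarning()
{
    LoadStatus status;
    const bool first = status.setLoaded(LoadSource::FallbackCache);
    const bool second = status.setLoaded(LoadSource::Server);
    assert(!first);  // nothing to overwrite yet
    assert(second);  // the warning is logged on this call
}
```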

{
LLMuteList::getInstance()->setLoaded(LLMuteList::MLS_SERVER_EMPTY);
LLMuteList* mute_list = LLMuteList::getInstance();
mute_list->clearCachedMutes();
Contributor

Please add a comment about the purpose of the 'clear': that we received a list from the server, might have something from cache, but the server takes priority?

DarlCat added 4 commits April 3, 2026 21:30
This allows the UI to update if already initialized before fallback or load occurs

Signed-off-by: Darl <me@darl.cat>
…er strings to denote loading or failed states

Signed-off-by: Darl <me@darl.cat>
@DarlCat
Contributor Author

DarlCat commented Apr 4, 2026

After the last few comments I got to work addressing the feedback, and during testing I determined that how opaque the UI currently is about the state of the mute list sucks from a user perspective. I made a very small change to the UI to avoid confusing users who may notice the mute list in a loading or failed state.

❗ The UI improvements did require an adjustment to the observer configuration in LLBlockList, which I am not the most confident about, but in my testing it functions perfectly so 🤷‍♂️ it can't be that bad. Please give that extra attention in review though so I can improve it if possible. If you'd rather handle UI changes or improvements for this internally, I'm happy to drop the related changes from the PR.

This latest batch of changes also introduces the mentioned single retry on region change, guarded both by a bool check and the disconnect of the region change callback itself upon completion. Don't need it anymore anyway. Happy to drop this commit as well if it's determined you don't want this, just putting it on the table for consideration 😄
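
The single retry described here, guarded by a bool and by disconnecting the region-change callback, might be wired roughly like this. The sketch is self-contained and every name in it is hypothetical: `RegionChangeSignal` is a one-slot stand-in for whatever signal the viewer exposes, and incrementing a counter stands in for sending a `MuteListRequest`.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Minimal one-slot stand-in for a region-change signal.
class RegionChangeSignal
{
public:
    void connect(std::function<void()> slot) { mSlot = std::move(slot); }
    void disconnect() { mSlot = nullptr; }
    bool connected() const { return static_cast<bool>(mSlot); }
    void fire()
    {
        if (!mSlot) return;
        auto slot = mSlot; // copy so the slot can safely disconnect itself
        slot();
    }

private:
    std::function<void()> mSlot;
};

class SingleRetry
{
public:
    explicit SingleRetry(RegionChangeSignal& signal) : mSignal(signal)
    {
        mSignal.connect([this] { onRegionChanged(); });
    }
    ~SingleRetry() { mSignal.disconnect(); }
    SingleRetry(const SingleRetry&) = delete; // the slot captures `this`
    SingleRetry& operator=(const SingleRetry&) = delete;

    void markLoaded() { mLoaded = true; }
    int requestsSent() const { return mRequestsSent; }

private:
    void onRegionChanged()
    {
        mSignal.disconnect();            // one shot: never called again
        if (mLoaded || mRetried) return; // bool guard as a second line of defense
        mRetried = true;
        ++mRequestsSent;                 // stands in for sending a MuteListRequest
    }

    RegionChangeSignal& mSignal;
    bool mLoaded = false;
    bool mRetried = false;
    int mRequestsSent = 0;
};

// Usage: two region changes produce exactly one extra request.
inline void exampleSingleRetry()
{
    RegionChangeSignal region_changed;
    SingleRetry retry(region_changed);
    region_changed.fire();
    region_changed.fire();
    assert(retry.requestsSent() == 1);
    assert(!region_changed.connected());
}
```

Disconnecting inside the callback makes the bool guard redundant in the common case, but keeping both protects against the callback being reconnected elsewhere.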


Before screenshots based on develop prior to my commits:
Is this a loading mutelist? Is it actually empty? Who knows!
Screenshot 2026-04-04 at 9 07 45 AM
Screenshot 2026-04-04 at 9 07 53 AM

After screenshots:
I couldn't help myself. If someone looks at this after logging in and is unlucky enough to be waiting on the timeout, this should put them at ease that something is working in the background to load the data.
Screenshot 2026-04-03 at 10 30 42 PM

Presents the list once it is loaded in some form. There is no user-facing indication of whether it came from cache or is authoritative from the server, to avoid unnecessary concern or panic, and especially unnecessary support cases.
Screenshot 2026-04-03 at 10 31 02 PM

Also adds text for an empty mute list, which I thought was weird to lack since an empty list is common to have.
Screenshot 2026-04-03 at 10 35 02 PM

In the unlikely event that the cache isn't working, AND the simulator messages get lost somewhere we do show a failed to load state. Think it might be good in this case to make the user aware so y'all can get logs via support / canny and let the new logging in the last batch of changes do some work to trace the issue down.
Screenshot 2026-04-04 at 9 00 33 AM

I only know English, but can confirm that it accepts xui translations
Screenshot 2026-04-03 at 10 40 13 PM

Have a good weekend!

Development

Successfully merging this pull request may close these issues.

Block list empties

3 participants