Mute list fallback to cache if timed out #5269
DarlCat wants to merge 8 commits into secondlife:develop from
Conversation
indra/newview/llmutelist.cpp
Outdated
```cpp
    return;
}
s_notified = true;
LLNotificationsUtil::add("MuteListFallbackCache");
```
I don't think this should be shown to a user. At least in some cases it's a bug that needs to be fixed server side; in others the viewer should re-request.
If the viewer times out, gets an empty list, or errors while getting the list, you can probably do something about it via the MuteCRC field. But I agree that if something goes wrong we should at least get the data from cache.
Or read the cache first (are there issues with this approach if it's outdated?), mark it as 'no send', and apply the server response on top...
P.S. Related: #4267
I don't think this should be shown to a user.
I added a notification because I felt that potentially missing recent, uncached mutes (ones the server is aware of but we are not) could be highly disruptive to the user. In a perfect world they should never see the notification, because the replies to MuteListRequest are expected to arrive, but the user complaints I have seen about this describe a great deal of distress over harassment the mute list should prevent.
The notification will absolutely be removed if it's decided not to be acceptable. I felt it was appropriate at the time, given the potential impact of missing mutes.
If the viewer times out, gets an empty list, or errors while getting the list, you can probably do something about it via the MuteCRC field. But I agree that if something goes wrong we should at least get the data from cache.
That is my thought, because for some unknown reason these messages are being lost or remain unsent at random for some users.
Or read the cache first (are there issues with this approach if it's outdated?), mark it as 'no send', and apply the server response on top...
The UpdateMuteListEntry and RemoveMuteListEntry messages both refer to mutes by their ID or name, so as far as I can tell there should be no negative impact on the server mute list if the user moves forward with a session of adding/removing mutes based on their cached copy. The client would just be missing the local record of what is on the server.
What I considered when deciding not to load the cached data first and layer the server data on top once (if?) we get it was the merging logic. If we load from cache first and later learn of a newer server version, it is not guaranteed to be reconcilable with what we have in our cache.
For merging the server list with the cached list, the assumptions I would have to make could result in unintended re-mutes or unintended unmutes, depending on which list we decide holds the correct state for a mute that is present in one list but absent or different in the other.
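To illustrate the ambiguity: for an entry present in only one of the two lists, any fixed merge policy guesses wrong in some scenario. A small sketch (the names are illustrative, not viewer code):

```cpp
#include <cassert>
#include <set>
#include <string>

// Two naive merge policies for a cached vs. server mute list.
// They disagree on any entry present in exactly one of the lists.
using MuteSet = std::set<std::string>;

// Policy 1: union. Keeps a mute the user removed on another device,
// an unintended re-mute if the server copy is authoritative.
MuteSet mergeUnion(const MuteSet& cached, const MuteSet& server)
{
    MuteSet out = cached;
    out.insert(server.begin(), server.end());
    return out;
}

// Policy 2: server wins. Drops a mute the server response simply
// hasn't caught up on, an unintended unmute if the cache was right.
MuteSet mergeServerWins(const MuteSet& /*cached*/, const MuteSet& server)
{
    return server;
}
```

Neither policy can tell a deliberate unmute apart from a stale server list, which is the reconcilability problem described above.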
I was very anxious about repeatedly requesting a mute list from the region, because I do not want to generate undue load with loops. There are two timeouts/cooldowns in place within my PR:
- The existing conceptual 30 second timeout, after which, for the rest of the session, I rely entirely on the cached mute list: I stop caring about whatever the server may eventually send back (to avoid merging or recreating the mute list) and stop any effort to re-request an update
- The 5 second cooldown between MuteListRequest dispatch attempts, which can be triggered by region change. This prevents asking every region an agent passes through for a mute list they may not stay around long enough to receive
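The two gates above could be sketched like this; the two constants come from the description, while the struct and member names are illustrative rather than the viewer's actual members:

```cpp
#include <cassert>

// Illustrative sketch of the two request gates described above.
constexpr double REQUEST_TIMEOUT_SECONDS = 30.0;  // then fall back to cache
constexpr double REQUEST_COOLDOWN_SECONDS = 5.0;  // between dispatch attempts

struct MuteListRequestGate
{
    double mRequestStartTime = -1.0;  // time of the first request this session
    double mLastRequestTime = -1.0;   // time of the most recent dispatch

    // May we dispatch (another) MuteListRequest at time `now`?
    bool canRequest(double now) const
    {
        return mLastRequestTime < 0.0
            || now - mLastRequestTime >= REQUEST_COOLDOWN_SECONDS;
    }

    // Has the overall wait expired, meaning we rely on the cache
    // for the rest of the session?
    bool timedOut(double now) const
    {
        return mRequestStartTime >= 0.0
            && now - mRequestStartTime >= REQUEST_TIMEOUT_SECONDS;
    }

    void onRequestSent(double now)
    {
        if (mRequestStartTime < 0.0)
        {
            mRequestStartTime = now;
        }
        mLastRequestTime = now;
    }
};
```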
because for some unknown reason these messages are being lost or remain unsent at random for some users.
I will create a server ticket for that. I know of at least one case, with a repro, where the server isn't responding but should.
I was very anxious about repeatedly requesting a mute list from the region, because I do not want to generate undue load with loops.
Makes sense. But region changes can end up requesting indefinitely either way. Better to add some kind of retry limit there.
because for some unknown reason these messages are being lost
SendReliable supports callbacks. It might be possible to refine this by detecting send failures, both to log better and to re-request on failure.
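The idea could be shaped roughly like this. These are purely hypothetical names, not the viewer's actual messaging API, and a simulated failure stands in for a real dropped packet:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical sketch: attach a failure callback to a reliable send so
// a drop can be logged and the request re-issued a bounded number of times.
struct ReliableSendSketch
{
    int mResendsIssued = 0;

    // Stand-in for a reliable send; here we always simulate a failure
    // so the callback path is exercised.
    void send(const std::string& msgName,
              const std::function<void(const std::string&)>& onFailure)
    {
        onFailure(msgName);
    }

    void requestMuteList()
    {
        send("MuteListRequest", [this](const std::string& /*msg*/)
        {
            // Log the failure and re-request at most once, per the
            // suggestion above; a real implementation would re-send here.
            if (mResendsIssued == 0)
            {
                ++mResendsIssued;
            }
        });
    }
};
```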
Pull request overview
This pull request implements fallback mechanisms for mute list loading to improve reliability when simulator messages are lost or delayed during login. The changes add timeout-based cache fallback, retry logic on region changes, and better UI updates when cached data is used.
Changes:
- Added timeout mechanism (30 seconds) to fall back to cached mute list if server request doesn't respond
- Implemented retry logic on region changes with 5-second cooldown to avoid request spam
- Added user notification when falling back to cached data
- Enhanced blocked list UI to refresh when cached or deferred data loads
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| indra/newview/skins/default/xui/en/notifications.xml | Added new notification to inform users when cached mute list is used due to server communication issues |
| indra/newview/llmutelist.h | Added timer management, region change callback, and helper methods to support timeout and retry mechanisms |
| indra/newview/llmutelist.cpp | Implemented timeout handling, cache fallback logic, region change retry, timer cleanup, and refactored cache loading |
| indra/newview/llblocklist.h | Updated method signatures with override keywords for better code quality |
| indra/newview/llblocklist.cpp | Implemented onChange() to detect and handle transitions from empty to populated mute list |
@Geenz What do you think about this one? P.S. Created a server ticket, because that's where the problem comes from, at least in some cases: https://github.com/secondlife/server/issues/2305
I plan to make an update to this PR that addresses the feedback given, so unless its exact current state becomes satisfactory, it may be best set to draft. There's excellent feedback here, and my mind has already been changed: a notification isn't ideal for this when we can fail safe and never alert or scare the user. I'm working on an implementation for this PR that expands the state machine design already in place to integrate graceful retry logic and to use the cache as a starting point rather than an eventual fallback. This approach makes a notification unnecessary, in my opinion.
Thank you! Moved to draft.
This pull request is stale because it has been open 30 days with no activity. Remove the stale label or comment, or it will be closed in 7 days.
…mehow isn't received
Signed-off-by: Darl <me@darl.cat>
Force-pushed from 1f18378 to 6f224ca
Sat down after a break with a fresh mindset and a blank slate; I'm feeling a lot better about the approach and behaviors this time. Thanks for keeping this around!
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
indra/newview/llmutelist.cpp
Outdated
```cpp
    LLMuteList::getInstance()->setLoaded(LLMuteList::MLS_SERVER_EMPTY);
    return true;
}
```
The "emptymutelist" dispatch now only marks the list as loaded (source=server-empty) but does not clear any existing entries. If a cache fallback was loaded earlier (e.g., after a request timeout) and the server message arrives late, this will incorrectly keep stale cached mutes even though the server is explicitly saying the list is empty. Consider clearing mMutes/mLegacyMutes (and possibly writing an empty cache) before calling setLoaded(MLS_SERVER_EMPTY).
Suggested change:

```diff
-    LLMuteList::getInstance()->setLoaded(LLMuteList::MLS_SERVER_EMPTY);
-    return true;
-}
+    LLMuteList* mute_list = LLMuteList::getInstance();
+    if (mute_list)
+    {
+        // The server explicitly reports an empty mute list: clear any
+        // locally cached mutes (including legacy mutes) and ensure the
+        // cache reflects an empty list before marking as loaded.
+        mute_list->clear();
+        mute_list->setLoaded(LLMuteList::MLS_SERVER_EMPTY);
+    }
+    return true;
+}
```
Makes sense. There is a case where the user has already made some changes this session, but it's probably not worth accounting for.
Agreed that it's not worth accounting for. I operate on the belief that add()/remove() calls made in a degraded state are still received and accounted for by the simulator, in lieu of maintaining some kind of clean/dirty transactional mess like I had thought up previously.
… in a cache fallback status. This yields to the simulator's responsibility as source of truth. E.g., Bob has had Alice blocked for a while across all his devices:
1. Bob unblocked Alice on his laptop
2. Bob logs in on his desktop with a cached mute list
3. LLDispatchEmptyMuteList fires
4. LLMuteList becomes eventually correct, reflecting the most recent signaled user intent

Signed-off-by: Darl <me@darl.cat>
getLoadFailed -> updateLoadState
- No longer labeled as a plain getter, but instead as a state machine advancement point
- This name reflects its role in advancing the state according to design parameters when called from the idle loop
- Call site in LLIMProcessing::requestOfflineMessages simplified by internalizing our readiness checks

isFailed
- Reintroduced const to match isLoaded for determining state

Signed-off-by: Darl <me@darl.cat>
… set state to guard against a possible cache write to disk, to be safe
Signed-off-by: Darl <me@darl.cat>
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
```cpp
    mLoadState = ML_LOADED;
    mLoadSource = MLS_FALLBACK_CACHE;
```
In the gDisconnected path, requestFromServer() marks the mute list as ML_LOADED with source=MLS_FALLBACK_CACHE, but it never actually loads the cached mute list. This can leave the session with an empty mute list even when a cache file exists, which contradicts the PR goal of avoiding an empty list. Consider calling tryLoadCacheFallback(agent_id, "disconnected") (or otherwise loading the cache) instead of forcing ML_LOADED here.
Suggested change:

```diff
-    mLoadState = ML_LOADED;
-    mLoadSource = MLS_FALLBACK_CACHE;
+    tryLoadCacheFallback(agent_id, "disconnected");
```
By setting the load state to loaded and the source to cache, we put the list into a state that will not change again (because it is loaded) and that won't be written to disk from cache(). I believe calling tryLoadCacheFallback() in this code path is unnecessary, since we will no longer be handling messages from a server, so the block list no longer needs to be in a reliable state.
Yes, this is a "don't, and say we did" situation, but it's a safe lie in this case.
```diff
@@ -221,9 +225,81 @@ bool LLMuteList::getLoadFailed() const
         constexpr F64 WAIT_SECONDS = 30;
         if (mRequestStartTime + WAIT_SECONDS < LLTimer::getTotalSeconds())
         {
-            return true;
+            LL_WARNS() << "Mute list request timed out; trying cache fallback once" << LL_ENDL;
+            tryLoadCacheFallback(gAgent.getID(), "request timeout");
+            return isLoaded() || isFailed();
         }
     }
     return false;
```
The PR title/description mentions retrying mute list requests on region change with a short cooldown, but updateLoadState() currently only performs a one-time cache fallback after a fixed timeout and does not trigger any resend/retry behavior on region change. If retries on region transition are still intended, the retry hook/timer logic seems to be missing; otherwise consider updating the PR title/description to avoid implying behavior that isn't present.
Restarting the state machine on region change is still on the table; if maintainers think it's still worthwhile, I'd be happy to restore that part of the concept. I omitted it from this reimplementation to lighten the load/diff and to leave the real fix to the server team, as noted in the existing comments in the code.
I should have changed the PR title to reflect that, and I will. I forget I can do that 😆
Is there a chance a re-request on a region change will fix the issue? From my observation, it seemed like if the server did not respond, it would stay that way.
Yes, there is actually a chance that a server request retry on region change could yield a successful response. I have observed this working a while back, during the few times I was able to reproduce this issue for testing.
This would add coverage for the currently unaddressed case of a fresh login without a cache that is unlucky enough to also not get a successful mute list response. I do agree, though: if the login simulator did not respond, it is not likely to do so if retried. I think to be fruitful this would only be attempted on a region change.
I recognize that at this point it's not a lot of extra work to wire back up a basic callback on region change that sends another request to the new region, but I don't see the point of doing it every time. One extra request should do, because in the reports of this that I've seen, the issue is most prominent when logging into a busy/crowded region.
If the login simulator request fails AND the next region also fails, it's likely something else is going on, and we're not going to get anything out of repeated attempts past that.
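The "one extra request on the next region" policy discussed above could be sketched like this (names are illustrative, not actual viewer members):

```cpp
#include <cassert>

// Illustrative sketch: permit a single MuteListRequest retry on region
// change, and only while no mute list has been loaded yet.
struct RegionRetryPolicy
{
    bool mLoaded = false;         // a server (or cache) mute list arrived
    int  mRegionRetriesUsed = 0;  // retries already spent on region changes

    static constexpr int MAX_REGION_RETRIES = 1;

    // Called from a region-change callback: should we re-send
    // MuteListRequest to the new region?
    bool shouldRetryOnRegionChange()
    {
        if (mLoaded || mRegionRetriesUsed >= MAX_REGION_RETRIES)
        {
            return false;
        }
        ++mRegionRetriesUsed;
        return true;
    }
};
```

With MAX_REGION_RETRIES at 1, a failed login request plus a failed retry on the next region means no further attempts, matching the reasoning above.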
```cpp
void LLMuteList::setLoaded(EMuteListSource source)
{
    mLoadState = ML_LOADED;
    mLoadSource = source;
```
Please add a warning if we are setting ML_LOADED when the list is already in that state. It would be good for diagnosis to see a potential 'overwrite' in the logs.
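A minimal sketch of the requested guard. A counter stands in for the LL_WARNS log line so the block stays self-contained; the enum values mirror the ones quoted above, but everything else is illustrative:

```cpp
#include <cassert>

enum ELoadState { ML_UNLOADED, ML_LOADING, ML_LOADED };
enum EMuteListSource { MLS_NONE, MLS_SERVER, MLS_SERVER_EMPTY, MLS_FALLBACK_CACHE };

struct MuteListStateSketch
{
    ELoadState mLoadState = ML_UNLOADED;
    EMuteListSource mLoadSource = MLS_NONE;
    int mOverwriteWarnings = 0;  // stands in for an LL_WARNS() line

    void setLoaded(EMuteListSource source)
    {
        if (mLoadState == ML_LOADED)
        {
            // Requested diagnostic: surface a second setLoaded() call,
            // which would otherwise silently overwrite the source.
            ++mOverwriteWarnings;
        }
        mLoadState = ML_LOADED;
        mLoadSource = source;
    }
};
```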
indra/newview/llmutelist.cpp
Outdated
```diff
 {
-    LLMuteList::getInstance()->setLoaded(LLMuteList::MLS_SERVER_EMPTY);
+    LLMuteList* mute_list = LLMuteList::getInstance();
+    mute_list->clearCachedMutes();
```
Please add a comment about the purpose of the 'clear': that we received the list from the server, we might have something from cache, but the server takes priority.
Signed-off-by: Darl <me@darl.cat>
This allows the UI to update if already initialized before fallback or load occurs
Signed-off-by: Darl <me@darl.cat>
…er strings to denote loading or failed states
Signed-off-by: Darl <me@darl.cat>
…change
Signed-off-by: Darl <me@darl.cat>
Description
This PR aims to reduce the user impact of unsent or dropped simulator messages regarding the mute list during login. The current behavior when this occurs is an empty mute list, with no fallback to the cache even when one is present. This PR updates this behavior in the following ways:
Related Issues
https://feedback.secondlife.com/bug-reports/p/block-list-empties
Checklist
Please ensure the following before requesting review:
Additional Notes
I've tested this locally with success; however, I am not able to test all platforms. Bugs based on packet loss are a challenge to reproduce, so my testing was based on artificially induced drops. I do not think this is a perfect solution; as the existing comments in the code suggest, this would ideally be moved to a capability. But this is intended to keep people from too much hardship until that more comprehensive effort can be prioritized.
Very much open to feedback and willing to make reasonable requested changes.