
fix(spellcheck): some chatters and emotes not ignored #6780

Merged
pajlada merged 12 commits into Chatterino:master from 4rneee:fix/spellcheck
Jan 31, 2026

Conversation

@4rneee
Contributor

@4rneee 4rneee commented Jan 27, 2026

Previously, some emotes and usernames that didn't match wordRegex were not properly ignored by the spellchecker. Fixes #6776

Now, instead of iterating over the words that match wordRegex, we first iterate over all non-whitespace strings, filter out ignored words such as emotes, usernames and links using isIgnoredWord, and only apply wordRegex to the words that were not ignored.

This approach is a bit ugly because of the nested while loops, but I haven't found an alternative yet.

isIgnoredWord now also ignores the channel name when the "Always include broadcaster in user completions" setting is enabled, so that autocompletion and spellchecking behave consistently.
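The two-pass structure described above can be sketched roughly as follows. This is a minimal, standalone C++ sketch using only the standard library, not Chatterino's actual code (which uses Qt's QRegularExpression). IgnoreList, Splitter and spellCheckedWords are hypothetical names, the link check is deliberately crude, and the inner pass re-checks each word, as later comments in this thread describe.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for isIgnoredWord(channel, word). The real function
// looks up emotes, chatters and links; here they are just a set of strings.
struct IgnoreList
{
    std::set<std::string> emotesAndChatters;
    std::string broadcaster;
    // Models the "Always include broadcaster in user completions" setting.
    bool includeBroadcaster = false;

    bool isIgnored(const std::string &word) const
    {
        // Crude link check, for illustration only.
        if (word.rfind("http://", 0) == 0 || word.rfind("https://", 0) == 0)
        {
            return true;
        }
        if (includeBroadcaster && word == broadcaster)
        {
            return true;
        }
        return emotesAndChatters.count(word) > 0;
    }
};

// Splits one token into candidate words (the PR uses wordRegex for this).
using Splitter = std::function<std::vector<std::string>(const std::string &)>;

std::vector<std::string> spellCheckedWords(const std::string &text,
                                           const IgnoreList &ignore,
                                           const Splitter &splitWords)
{
    std::vector<std::string> result;
    std::istringstream in(text);
    std::string token;
    // Outer loop: every whitespace-separated token, like matching \S+.
    while (in >> token)
    {
        if (ignore.isIgnored(token))
        {
            continue;  // the whole token is an emote, chatter or link
        }
        // Inner loop: words inside the token, each checked again.
        for (const auto &word : splitWords(token))
        {
            if (!ignore.isIgnored(word))
            {
                result.push_back(word);
            }
        }
    }
    return result;
}
```

For example, with Kappa and user1 registered as emote/chatter, forsen as the broadcaster with includeBroadcaster enabled, and a splitter that breaks tokens at commas, the text "hello Kappa,world user1 forsen https://x.y typo" yields hello, world and typo. Passing the splitter in as a parameter keeps the ignore logic independent of whichever word regex is eventually chosen.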

@4rneee 4rneee marked this pull request as ready for review January 28, 2026 18:19
@8thony
Contributor

8thony commented Jan 29, 2026

If you add a dot, comma, or slash right after the name/emote, it does underline red.

And like you mentioned, "test/man" could also be fixed, right?

QRegularExpression outerRegex(R"(\S+)");
// maybe change to R"(\p{L}+)" to allow for more/any separators in words (would fix #6762)

@4rneee
Contributor Author

4rneee commented Jan 29, 2026

If you add a dot, comma, or slash right after the name/emote, it does underline red.

Right, I didn't catch that. At first I wanted to argue that it's fine for emotes because they won't be rendered, but apparently that's not the case for Twitch emotes. And usernames should definitely be ignored.

And like you mentioned, "test/man" could also be fixed, right?

Not sure how true that claim was. My thinking was that if we filter out all special cases (usernames, links, emotes, ...) first, wordRegex could then be changed to fix this. The one I proposed in the comment just matches any run of Unicode letters, so even for 1576word/-#@asdf?.$ it would only match "word" and "asdf", but I'm not sure that's something we would want.
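To make the behaviour of the \p{L}+ idea concrete, here is a rough C++ approximation of a global match of that pattern: it returns every maximal run of letters, no matter what surrounds it. The function name is hypothetical, and std::isalpha in the default "C" locale only recognises ASCII letters, whereas \p{L} covers all Unicode letters.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// ASCII-only approximation of iterating over all matches of \p{L}+.
std::vector<std::string> letterRunsAnywhere(std::string_view token)
{
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < token.size())
    {
        if (!std::isalpha(static_cast<unsigned char>(token[i])))
        {
            ++i;  // skip digits, punctuation and symbols
            continue;
        }
        std::size_t start = i;
        while (i < token.size() &&
               std::isalpha(static_cast<unsigned char>(token[i])))
        {
            ++i;
        }
        words.emplace_back(token.substr(start, i - start));
    }
    return words;
}
```

letterRunsAnywhere("1576word/-#@asdf?.$") returns word and asdf, as described above. The flip side is that letterRunsAnywhere("user1,user2") returns user and user, i.e. partial username matches that the spellchecker would then flag.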

Comment threads (3) on src/widgets/splits/InputHighlighter.cpp, all outdated
@4rneee
Contributor Author

4rneee commented Jan 29, 2026

I changed wordRegex to (?<=^|(?!_)\p{P})\p{L}+(?=$|(?!_)\p{P}). You can try it here: https://regex101.com/r/OK1L36/1. It now matches runs of Unicode letters delimited by punctuation other than '_' (or by the start/end of the token).

Now, a token is ignored if it's an emote, chatter or link. If it is not ignored, the regex is applied and the resulting words are checked again to see whether they are emotes or chatters. I think this is necessary to handle cases like "user1,user2". However, because the new wordRegex does not match words containing numbers or underscores, usernames and emotes can't be partially matched (which is also why I had to exclude '_' from the regex).

I think there might still be a very niche edge case: a third-party emote with punctuation in its name, such as some/emote. If someone then typed 'some/emote,otherEmote', some/emote would not be recognized as an emote, but I'd argue that's fine.
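The new wordRegex only accepts a run of letters when both of its neighbours are the token edge or punctuation other than '_'. A manual-loop rendering of that rule might look like the sketch below. It is an ASCII-only approximation: isAsciiPunct hand-lists the ASCII characters in Unicode category P (so '$', '+', '<', '=', '>', '^', '`', '|' and '~', which are symbols, do not count), and all function names are hypothetical rather than taken from the PR.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// ASCII characters in Unicode general category P (punctuation).
bool isAsciiPunct(char c)
{
    constexpr std::string_view punct = "!\"#%&'()*,-./:;?@[\\]_{}";
    return punct.find(c) != std::string_view::npos;
}

// Mirrors (?!_)\p{P}: any punctuation except '_' may delimit a word.
bool isDelimiter(char c)
{
    return c != '_' && isAsciiPunct(c);
}

bool isAsciiLetter(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// ASCII-only manual loop equivalent to iterating over matches of
//   (?<=^|(?!_)\p{P})\p{L}+(?=$|(?!_)\p{P})
// A letter run is kept only if both neighbours are a token edge or a
// delimiter, so runs touching digits or '_' (e.g. "user1") are dropped.
std::vector<std::string> punctuationDelimitedWords(std::string_view token)
{
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < token.size())
    {
        if (!isAsciiLetter(token[i]))
        {
            ++i;
            continue;
        }
        std::size_t start = i;
        while (i < token.size() && isAsciiLetter(token[i]))
        {
            ++i;
        }
        bool okBefore = start == 0 || isDelimiter(token[start - 1]);
        bool okAfter = i == token.size() || isDelimiter(token[i]);
        if (okBefore && okAfter)
        {
            words.emplace_back(token.substr(start, i - start));
        }
    }
    return words;
}
```

With this rule, user1,user2 and user_name produce no words at all, test/man produces test and man, and 1576word/-#@asdf?.$ now yields only asdf, because word touches a digit.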

@8thony
Copy link
Copy Markdown
Contributor

8thony commented Jan 30, 2026

Much better now.

However, because the new wordRegex does not match words containing numbers or underscores

If I understand correctly, that means that every word with a number or an underscore in it is ignored even if it's not a real chatter/emote.

And like you mentioned, "test/man" could also be fixed, right?

Not sure how true that claim was.

It seems like this issue is also fixed.

@4rneee
Contributor Author

4rneee commented Jan 30, 2026

If I understand correctly, that means that every word with a number or an underscore in it is ignored even if it's not a real chatter/emote.

Yes. In my experience, Firefox does this too for numbers. Excluding underscores was necessary to avoid partially matching usernames.

And like you mentioned, "test/man" could also be fixed, right?

Not sure how true that claim was.

It seems like this issue is also fixed.

Yes, with the new regex that's fixed too.

@4rneee
Copy link
Copy Markdown
Contributor Author

4rneee commented Jan 31, 2026

Which of the InputHighlighterTest.getSpellCheckedWords tests from #6779 can I change to match my new implementation? My implementation fixes the ones marked as FIXME but now fails word?word and word-word.

Since I changed the wordRegex all of InputHighlight.wordRegex would probably need to be rewritten.

@pajlada
Member

pajlada commented Jan 31, 2026

Which of the InputHighlighterTest.getSpellCheckedWords tests from #6779 can I change to match my new implementation? My implementation fixes the ones marked as FIXME but now fails word?word and word-word.

Since I changed the wordRegex all of InputHighlight.wordRegex would probably need to be rewritten.

Yeah, go for it if this aims to improve things.
The tests are great because they document our shortcomings for future fixes.

we can do this because usernames either match the wordregex or are
ignored due to containing numbers or underscores
{.input = "word?word", .words = {"word?word"}},
{.input = "word-word", .words = {"word-word"}},
{.input = "word?word", .words = {"word", "word"}},
{.input = "word-word", .words = {"word", "word"}},
Member


This seems incorrect. If you can fix this in this PR, that would be nice.

Contributor Author


If I change the regex to (?<=^|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd})\p{Pd}?((?:\p{L}\p{Pd}?)*\p{L})\p{Pd}?(?=$|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd}) (https://regex101.com/r/OK1L36/3), this would work, but it is pretty unreadable.

A patch with this regex would be:

diff --git a/src/widgets/splits/InputHighlighter.cpp b/src/widgets/splits/InputHighlighter.cpp
index 7e350277..b1ab035a 100644
--- a/src/widgets/splits/InputHighlighter.cpp
+++ b/src/widgets/splits/InputHighlighter.cpp
@@ -101,7 +101,7 @@ namespace inputhighlight::detail {
 QRegularExpression wordRegex()
 {
     static QRegularExpression regex{
-        R"((?<=^|(?!_)\p{P})\p{L}+(?=$|(?!_)\p{P}))",
+        R"((?<=^|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd})\p{Pd}?((?:\p{L}\p{Pd}?)*\p{L})\p{Pd}?(?=$|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd}))",
         QRegularExpression::PatternOption::UseUnicodePropertiesOption,
     };
     return regex;
@@ -182,7 +182,7 @@ void InputHighlighter::visitWords(
         while (wordIt.hasNext())
         {
             auto wordMatch = wordIt.next();
-            auto word = wordMatch.captured();
+            auto word = wordMatch.captured(1);
 
             if (!isIgnoredWord(channel, word))
             {
diff --git a/tests/src/InputHighlighter.cpp b/tests/src/InputHighlighter.cpp
index 4fb00fb5..e6e38af7 100644
--- a/tests/src/InputHighlighter.cpp
+++ b/tests/src/InputHighlighter.cpp
@@ -186,7 +186,7 @@ TEST_F(InputHighlighterTest, getSpellCheckedWords)
         {.input = "   word word  ", .words = {"word", "word"}},
         {.input = "word?", .words = {"word"}},
         {.input = "word?word", .words = {"word", "word"}},
-        {.input = "word-word", .words = {"word", "word"}},
+        {.input = "word-word", .words = {"word-word"}},
         {
             .input = "channel emotes 7TVEmote a BTTVEmote b FFZEmote c",
             .words = {"channel", "emotes", "a", "b", "c"},
@@ -265,6 +265,8 @@ TEST(InputHighlight, wordRegex)
          .words = {u"abc", u"foo", u"bar", u"baz"}},
         {.input = u"1234567,word/a123", .words = {u"word"}},
         {.input = u"'quotes\"", .words = {u"quotes"}},
+        {.input = u"word-word-", .words = {u"word-word"}},
+        {.input = u"-word--word", .words = {u"word", u"word"}},
     };
 
     auto re = inputhighlight::detail::wordRegex();
@@ -275,7 +277,7 @@ TEST(InputHighlight, wordRegex)
         auto match = re.globalMatchView(c.input);
         while (match.hasNext())
         {
-            got.emplace_back(match.next().capturedView());
+            got.emplace_back(match.next().capturedView(1));
         }
         ASSERT_EQ(got, c.words) << "index=" << i;
     }

Contributor


If we need to be this complex, it's probably better to write a manual loop similar to what Firefox does in the referenced function (hopefully that would be more readable).

IMO it doesn't have to be in this PR: we can keep the current state, make sure it's properly tested, and then replace it with a manual loop.
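As a rough idea of what such a manual loop could look like, here is a hypothetical ASCII-only C++ sketch that keeps single hyphens inside words, splits at "--" and trims stray leading or trailing hyphens, while still dropping letter runs that touch digits or '_'. It is not modelled on Firefox's function (which is not shown in this thread), the names are made up for illustration, and although it is designed to reproduce the expected outputs in the patch above, it is not claimed to match the long regex in every edge case.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// ASCII characters in Unicode general category P (punctuation).
bool isAsciiPunct(char c)
{
    constexpr std::string_view punct = "!\"#%&'()*,-./:;?@[\\]_{}";
    return punct.find(c) != std::string_view::npos;
}

// Punctuation other than '_' may delimit a word.
bool isDelimiter(char c)
{
    return c != '_' && isAsciiPunct(c);
}

bool isAsciiLetter(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

std::vector<std::string> hyphenAwareWords(std::string_view token)
{
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < token.size())
    {
        if (!isAsciiLetter(token[i]) && token[i] != '-')
        {
            ++i;
            continue;
        }
        // Collect a maximal run of letters and hyphens.
        std::size_t start = i;
        while (i < token.size() && (isAsciiLetter(token[i]) || token[i] == '-'))
        {
            ++i;
        }
        bool okBefore = start == 0 || isDelimiter(token[start - 1]);
        bool okAfter = i == token.size() || isDelimiter(token[i]);
        if (!okBefore || !okAfter)
        {
            continue;  // touches a digit, '_' or symbol, e.g. a username
        }
        // Split the run at every "--" and trim hyphens from each piece.
        std::string_view run = token.substr(start, i - start);
        std::size_t pos = 0;
        while (pos <= run.size())
        {
            std::size_t next = run.find("--", pos);
            if (next == std::string_view::npos)
            {
                next = run.size();
            }
            std::string_view piece = run.substr(pos, next - pos);
            while (!piece.empty() && piece.front() == '-')
            {
                piece.remove_prefix(1);
            }
            while (!piece.empty() && piece.back() == '-')
            {
                piece.remove_suffix(1);
            }
            if (!piece.empty())
            {
                words.emplace_back(piece);
            }
            pos = next + 2;
        }
    }
    return words;
}
```

For instance, hyphenAwareWords("word-word-") returns word-word, hyphenAwareWords("-word--word") returns word and word, and hyphenAwareWords("1234567,word/a123") returns only word. Each rule is a separate, named step, which is the readability gain suggested above over a single lookaround-heavy pattern.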

Comment thread tests/src/InputHighlighter.cpp
@pajlada pajlada merged commit 8f660b6 into Chatterino:master Jan 31, 2026
19 checks passed
@4rneee 4rneee deleted the fix/spellcheck branch April 4, 2026 12:09


Development

Successfully merging this pull request may close these issues.

Spellcheck not consistently ignoring usernames

4 participants