fix(spellcheck): some chatters and emotes not ignored by 4rneee · Pull Request #6780 · Chatterino/chatterino2

4rneee · 2026-01-27T23:12:55Z

Previously, some emotes or usernames that didn't match the wordRegex were not properly ignored. Fixes #6776

Now, instead of iterating over the words that match wordRegex, we first iterate over all non-whitespace strings, filter out ignored words (emotes, usernames, and links) using isIgnoredWord, and only apply wordRegex to the non-ignored words.
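A rough, self-contained sketch of this two-pass approach in plain C++ (no Qt), for illustration only. Assumptions: `isIgnoredWord` here is a hypothetical stand-in that just looks the token up in a set (the real function takes the channel and also handles links), and the inner loop approximates a Unicode letter regex with ASCII `std::isalpha` runs instead of using `wordRegex`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for isIgnoredWord: the token is ignored if it is a
// known emote, username or link (here simply: present in `ignored`).
bool isIgnoredWord(const std::unordered_set<std::string> &ignored,
                   std::string_view token)
{
    return ignored.count(std::string(token)) != 0;
}

bool isSpace(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

bool isAlpha(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Returns the words that would be passed to the spellchecker.
std::vector<std::string> spellCheckedWords(
    std::string_view text, const std::unordered_set<std::string> &ignored)
{
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < text.size())
    {
        // Outer loop: one non-whitespace token at a time (like \S+).
        if (isSpace(text[i]))
        {
            ++i;
            continue;
        }
        std::size_t start = i;
        while (i < text.size() && !isSpace(text[i]))
        {
            ++i;
        }
        std::string_view token = text.substr(start, i - start);

        // Emotes, usernames and links are skipped as whole tokens.
        if (isIgnoredWord(ignored, token))
        {
            continue;
        }

        // Inner loop: only non-ignored tokens are split into words, here
        // runs of ASCII letters (an approximation of a \p{L}+ regex).
        std::size_t j = 0;
        while (j < token.size())
        {
            if (!isAlpha(token[j]))
            {
                ++j;
                continue;
            }
            std::size_t wordStart = j;
            while (j < token.size() && isAlpha(token[j]))
            {
                ++j;
            }
            words.emplace_back(token.substr(wordStart, j - wordStart));
        }
    }
    return words;
}
```

With `Kappa` in the ignored set, `hello Kappa wrld` yields `hello` and `wrld`, and `1576word/-#@asdf?.$` yields `word` and `asdf`.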

This approach is a bit ugly because of the nested while loops, but I haven't found an alternative yet.

isIgnoredWord now also ignores the channel name if the "Always include broadcaster in user completions" setting is enabled, for consistency between autocompletion and spellchecking.
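One possible shape for that check, sketched with hypothetical types: `ChannelInfo` and its fields are assumptions standing in for the real channel and settings objects, and the `http://`/`https://` prefix test is a crude placeholder for proper link detection. In this sketch, chatter names are compared case-insensitively and emotes case-sensitively.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical, simplified view of the state the check needs.
struct ChannelInfo {
    std::string name;                          // broadcaster login, lowercase
    std::unordered_set<std::string> chatters;  // lowercase logins
    std::unordered_set<std::string> emotes;    // case-sensitive emote names
    bool alwaysIncludeBroadcaster = false;     // mirrors the user setting
};

std::string toLower(std::string_view s)
{
    std::string out(s);
    std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return out;
}

bool isIgnoredWord(const ChannelInfo &channel, std::string_view word)
{
    // Crude link check (assumption; real link detection is more involved).
    if (word.substr(0, 7) == "http://" || word.substr(0, 8) == "https://")
    {
        return true;
    }
    if (channel.emotes.count(std::string(word)) != 0)
    {
        return true;
    }
    std::string lower = toLower(word);
    if (channel.chatters.count(lower) != 0)
    {
        return true;
    }
    // Same rule as user completion: the broadcaster only counts when the
    // "Always include broadcaster in user completions" setting is on.
    return channel.alwaysIncludeBroadcaster && lower == channel.name;
}
```

Keeping the broadcaster rule behind the same flag as completion means a word is never suggested as a username in one place and flagged as a typo in the other.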

still needs cleanup

8thony · 2026-01-29T16:04:57Z

If you add a dot, comma, or slash right after the name/emote, it does underline red.

And like you mentioned, "test/man" could also be fixed, right?

chatterino2/src/widgets/splits/InputHighlighter.cpp

Lines 111 to 112 in 093aa99

    
```cpp
QRegularExpression outerRegex(R"(\S+)");
// maybe change to R"(\p{L}+)" to allow for more/any separators in words (would fix #6762)
```

4rneee · 2026-01-29T17:15:14Z

> If you add a dot, comma, or slash right after the name/emote, it does underline red.

Right, I didn't catch that. I first wanted to argue that it's fine for emotes because they won't be rendered, but apparently that's not the case for Twitch emotes. And usernames should definitely be ignored.

> And like you mentioned, "test/man" could also be fixed, right?

Not sure how true that claim was. I thought that if we filter out all special cases (usernames, links, emotes, ...) first, then wordRegex could be changed to fix this. The one I proposed in the comment just matches any Unicode letters, so even for `1576word/-#@asdf?.$` it would only match "word" and "asdf", but I'm not even sure that's something we would want.

4rneee · 2026-01-29T20:29:28Z

I changed the wordRegex to `(?<=^|(?!_)\p{P})\p{L}+(?=$|(?!_)\p{P})`. You can try it here: https://regex101.com/r/OK1L36/1. This now matches runs of Unicode letters separated by punctuation.

Now a token is ignored if it's an emote, chatter, or link. If it is not ignored, the regex is applied and each resulting word is checked again to see whether it is an emote or chatter. I think this is necessary to handle cases like "user1,user2". However, because the new wordRegex does not match words containing numbers or underscores, we don't have the problem where usernames or emotes are only partially matched (which is also why I had to exclude '_' from the regex).

I think there might still be a very niche edge case where a third-party emote has punctuation in its name, e.g. some/emote. If someone then types 'some/emote,otherEmote', some/emote would not be recognized as an emote, but I'd argue that's fine.
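Read literally, the regex above keeps a maximal run of letters only if the character before it and the character after it are each either the edge of the token or punctuation other than `_`. The sketch below spells that rule out for ASCII input and adds the per-word re-check. Assumptions: `isPunctNoUnderscore` hardcodes the ASCII characters that Unicode classifies as `\p{P}` (so symbols such as `$`, `+` and `~` are not separators), and the `ignored` set is a hypothetical stand-in for the emote/chatter lookup.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// ASCII characters in Unicode's \p{P}, minus '_' (which is \p{Pc}).
// Symbols such as '$', '+' or '~' are \p{S}, not \p{P}, so they are absent.
bool isPunctNoUnderscore(char c)
{
    static constexpr std::string_view kPunct = "!\"#%&'()*,-./:;?@[\\]{}";
    return kPunct.find(c) != std::string_view::npos;
}

bool isLetter(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Mirrors (?<=^|(?!_)\p{P})\p{L}+(?=$|(?!_)\p{P}) for ASCII input.
std::vector<std::string_view> matchWords(std::string_view token)
{
    std::vector<std::string_view> words;
    std::size_t i = 0;
    while (i < token.size())
    {
        if (!isLetter(token[i]))
        {
            ++i;
            continue;
        }
        std::size_t start = i;
        while (i < token.size() && isLetter(token[i]))
        {
            ++i;
        }
        bool leftOk = start == 0 || isPunctNoUnderscore(token[start - 1]);
        bool rightOk = i == token.size() || isPunctNoUnderscore(token[i]);
        if (leftOk && rightOk)
        {
            words.push_back(token.substr(start, i - start));
        }
    }
    return words;
}

// Words extracted from a non-ignored token are checked again, so that
// e.g. "alice,bob" does not flag the two (hypothetical) chatter names.
std::vector<std::string> wordsToCheck(
    std::string_view token, const std::unordered_set<std::string> &ignored)
{
    std::vector<std::string> out;
    if (ignored.count(std::string(token)) != 0)
    {
        return out;  // whole token is an emote, chatter or link
    }
    for (std::string_view w : matchWords(token))
    {
        if (ignored.count(std::string(w)) == 0)
        {
            out.emplace_back(w);
        }
    }
    return out;
}
```

For example, `user1,user2` produces no words at all, and with `some/emote` and `otherEmote` in the ignored set, `some/emote,otherEmote` still yields `some` and `emote`, which is the niche edge case mentioned above.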

8thony · 2026-01-30T15:37:33Z

Much better now.

> However, because the new wordRegex does not match words containing numbers or underscores

If I understand correctly, that means that every word with a number or an underscore in it is ignored even if it's not a real chatter/emote.

> And like you mentioned, "test/man" could also be fixed, right?

> Not sure how true that claim was.

It seems like this issue is also fixed.

4rneee · 2026-01-30T16:30:25Z

> If I understand correctly, that means that every word with a number or an underscore in it is ignored even if it's not a real chatter/emote.

Yes. In my experience, Firefox does this too for numbers. The underscores had to be excluded so that usernames aren't partially matched.

> And like you mentioned, "test/man" could also be fixed, right?

> Not sure how true that claim was.

> It seems like this issue is also fixed.

Yes, with the new regex that's fixed too.

4rneee · 2026-01-31T10:07:00Z

Which of the InputHighlighterTest.getSpellCheckedWords tests from #6779 can I change to match my new implementation? My implementation fixes the ones marked as FIXME but now fails word?word and word-word.

Since I changed the wordRegex all of InputHighlight.wordRegex would probably need to be rewritten.

pajlada · 2026-01-31T10:17:26Z

> Which of the InputHighlighterTest.getSpellCheckedWords tests from #6779 can I change to match my new implementation? My implementation fixes the ones marked as FIXME but now fails word?word and word-word.

> Since I changed the wordRegex all of InputHighlight.wordRegex would probably need to be rewritten.

Yeah, go for it if this aims to improve things.
The tests are great because they document our shortcomings for future fixes.


pajlada · 2026-01-31T11:15:33Z

```diff
-        {.input = "word?word", .words = {"word?word"}},
-        {.input = "word-word", .words = {"word-word"}},
+        {.input = "word?word", .words = {"word", "word"}},
+        {.input = "word-word", .words = {"word", "word"}},
```


This seems incorrect; if you can try to fix this in this PR, that would be nice.

If I change the regex to `(?<=^|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd})\p{Pd}?((?:\p{L}\p{Pd}?)*\p{L})\p{Pd}?(?=$|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd})` (https://regex101.com/r/OK1L36/3), this would work, but it is pretty unreadable.

A patch with this regex would be:

```diff
diff --git a/src/widgets/splits/InputHighlighter.cpp b/src/widgets/splits/InputHighlighter.cpp
index 7e350277..b1ab035a 100644
--- a/src/widgets/splits/InputHighlighter.cpp
+++ b/src/widgets/splits/InputHighlighter.cpp
@@ -101,7 +101,7 @@ namespace inputhighlight::detail {
 QRegularExpression wordRegex()
 {
     static QRegularExpression regex{
-        R"((?<=^|(?!_)\p{P})\p{L}+(?=$|(?!_)\p{P}))",
+        R"((?<=^|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd})\p{Pd}?((?:\p{L}\p{Pd}?)*\p{L})\p{Pd}?(?=$|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd}))",
         QRegularExpression::PatternOption::UseUnicodePropertiesOption,
     };
     return regex;
@@ -182,7 +182,7 @@ void InputHighlighter::visitWords(
     while (wordIt.hasNext())
     {
         auto wordMatch = wordIt.next();
-        auto word = wordMatch.captured();
+        auto word = wordMatch.captured(1);
 
         if (!isIgnoredWord(channel, word))
         {
diff --git a/tests/src/InputHighlighter.cpp b/tests/src/InputHighlighter.cpp
index 4fb00fb5..e6e38af7 100644
--- a/tests/src/InputHighlighter.cpp
+++ b/tests/src/InputHighlighter.cpp
@@ -186,7 +186,7 @@ TEST_F(InputHighlighterTest, getSpellCheckedWords)
         {.input = " word word ", .words = {"word", "word"}},
         {.input = "word?", .words = {"word"}},
         {.input = "word?word", .words = {"word", "word"}},
-        {.input = "word-word", .words = {"word", "word"}},
+        {.input = "word-word", .words = {"word-word"}},
         {
             .input = "channel emotes 7TVEmote a BTTVEmote b FFZEmote c",
             .words = {"channel", "emotes", "a", "b", "c"},
@@ -265,6 +265,8 @@ TEST(InputHighlight, wordRegex)
          .words = {u"abc", u"foo", u"bar", u"baz"}},
         {.input = u"1234567,word/a123", .words = {u"word"}},
         {.input = u"'quotes\"", .words = {u"quotes"}},
+        {.input = u"word-word-", .words = {u"word-word"}},
+        {.input = u"-word--word", .words = {u"word", u"word"}},
     };
 
     auto re = inputhighlight::detail::wordRegex();
@@ -275,7 +277,7 @@ TEST(InputHighlight, wordRegex)
         auto match = re.globalMatchView(c.input);
         while (match.hasNext())
         {
-            got.emplace_back(match.next().capturedView());
+            got.emplace_back(match.next().capturedView(1));
         }
         ASSERT_EQ(got, c.words) << "index=" << i;
     }
```

If we need to be this complex, it's probably better to write a manual loop similar to what Firefox does in the referenced function (in the hope that it's more readable).

IMO this doesn't have to be in this PR: we can keep the current state, make sure that it's properly tested, and then replace it with a manual loop.
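A manual loop in that spirit might look like the following. It is not Firefox's implementation and is not claimed to match the dash-aware regex in every edge case; it is an ASCII-only sketch under these assumed rules: a word is letters joined by single `-`; two or more dashes separate words; a single leading or trailing dash is dropped; and a word touching a digit, `_`, or a symbol is skipped. It reproduces the expectations from the proposed tests (`word-word`, `word-word-`, `-word--word`, `word?word`, `1234567,word/a123`).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// ASCII \p{P} characters except '_' and '-' (assumed separator set).
bool isSeparator(char c)
{
    static constexpr std::string_view kSeparators = "!\"#%&'()*,./:;?@[\\]{}";
    return kSeparators.find(c) != std::string_view::npos;
}

bool isLetterOrDash(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '-';
}

std::vector<std::string> splitWords(std::string_view text)
{
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < text.size())
    {
        if (!isLetterOrDash(text[i]))
        {
            ++i;
            continue;
        }
        // Maximal run of letters and dashes: [start, end).
        std::size_t start = i;
        while (i < text.size() && isLetterOrDash(text[i]))
        {
            ++i;
        }
        std::size_t end = i;
        bool leftOk = start == 0 || isSeparator(text[start - 1]);
        bool rightOk = end == text.size() || isSeparator(text[end]);

        // Split the run at every sequence of two or more dashes.
        std::vector<std::string_view> pieces;
        std::size_t pieceStart = start;
        std::size_t j = start;
        while (j < end)
        {
            if (text[j] != '-')
            {
                ++j;
                continue;
            }
            std::size_t dashEnd = j;
            while (dashEnd < end && text[dashEnd] == '-')
            {
                ++dashEnd;
            }
            if (dashEnd - j >= 2)
            {
                pieces.push_back(text.substr(pieceStart, j - pieceStart));
                pieceStart = dashEnd;
            }
            j = dashEnd;
        }
        pieces.push_back(text.substr(pieceStart, end - pieceStart));

        for (std::size_t k = 0; k < pieces.size(); ++k)
        {
            // Only the outermost pieces touch the surrounding characters.
            if ((k == 0 && !leftOk) || (k + 1 == pieces.size() && !rightOk))
            {
                continue;
            }
            std::string_view p = pieces[k];
            if (!p.empty() && p.front() == '-')
            {
                p.remove_prefix(1);  // drop a single leading dash
            }
            if (!p.empty() && p.back() == '-')
            {
                p.remove_suffix(1);  // drop a single trailing dash
            }
            if (!p.empty())
            {
                words.emplace_back(p);
            }
        }
    }
    return words;
}
```

Each rule lives in its own branch here, which is arguably easier to review than the nested lookbehind/lookahead alternations.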

4rneee added 4 commits January 28, 2026 19:07

proof of concept

093aa99

changelog and formatting

8752091

still needs cleanup

variable names

f6fe1d4

remove debug prints

573a745

4rneee force-pushed the fix/spellcheck branch from a10d6f9 to 573a745 January 28, 2026 18:08

4rneee marked this pull request as ready for review January 28, 2026 18:19

Nerixyz reviewed Jan 29, 2026


Comment thread src/widgets/splits/InputHighlighter.cpp Outdated

Comment thread src/widgets/splits/InputHighlighter.cpp Outdated

Comment thread src/widgets/splits/InputHighlighter.cpp Outdated

4rneee added 2 commits January 29, 2026 20:05

always use globalMatchView

31534d5

change wordRegex and double check chatters and emotes

c17eb11

4rneee and others added 2 commits January 30, 2026 17:46

Merge branch 'master' into fix/spellcheck

87c46cb

Merge branch 'master' into fix/spellcheck

e02ad24

4rneee added 2 commits January 31, 2026 11:56

update testcases

e81d9b8

don't check if token is chatter and remove @chatter handling

ad1da4c

we can do this because usernames either match the wordregex or are ignored due to containing numbers or underscores

pajlada reviewed Jan 31, 2026


Nerixyz reviewed Jan 31, 2026


Comment thread tests/src/InputHighlighter.cpp

4rneee added 2 commits January 31, 2026 13:28

add testcase for links

03291eb

add comment for dash testcase

15a396e

pajlada approved these changes Jan 31, 2026


pajlada merged commit 8f660b6 into Chatterino:master Jan 31, 2026
19 checks passed

This was referenced Jan 31, 2026

Words separated by some punctuation characters are spellchecked as a single word #6762

Closed

Spellchecking: Hyphenated compound words not recognized as one word #6789

Closed

4rneee deleted the fix/spellcheck branch April 4, 2026 12:09
