fix(spellcheck): some chatters and emotes not ignored#6780
fix(spellcheck): some chatters and emotes not ignored#6780pajlada merged 12 commits intoChatterino:masterfrom
Conversation
still needs cleanup
a10d6f9 to
573a745
Compare
|
If you add a dot, comma, or slash right after the name/emote, it does underline red. And like you mentioned, "test/man" could also be fixed, right? chatterino2/src/widgets/splits/InputHighlighter.cpp Lines 111 to 112 in 093aa99 |
Right, I didn't catch that. I first wanted to argue that it's fine for emotes because they won't be rendered but apparently that's not the case for twitch emotes. And for usernames it should definitely be ignored.
Not sure how true that claim was. I thought that if we filter out all special cases (usernames, links, emotes, ...) before, then the |
|
I changed the Now, a token is ignored if it's an emote, chatter or link. If it is not ignored, the regex is applied and words are checked again if they are emotes or chatters. I think this is necessary to handle cases like "user1,user2". However, because the new I think there might still be the very niche edgecase where a third party emote has punctuation in it's name |
|
Much better now.
If I understand correctly, that means that every word with a number or an underscore in it is ignored even if it's not a real chatter/emote.
It seems like this issue is also fixed. |
Yes. From my experience firefox does this too for numbers. The underscores were necessary to not partially match usernames.
Yes, with the new regex thats fixed too. |
|
Which of the Since I changed the |
Yeah, go for it if this aims to improve things. |
we can do this becasue usernames either match the wordregex or are ignored due to containing numbers or underscores
| {.input = "word?word", .words = {"word?word"}}, | ||
| {.input = "word-word", .words = {"word-word"}}, | ||
| {.input = "word?word", .words = {"word", "word"}}, | ||
| {.input = "word-word", .words = {"word", "word"}}, |
There was a problem hiding this comment.
This seems incorrect, if you can try to fix this in this PR that would be nice
There was a problem hiding this comment.
If I change the regex to (?<=^|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd})\p{Pd}?((?:\p{L}\p{Pd}?)*\p{L})\p{Pd}?(?=$|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd}) (https://regex101.com/r/OK1L36/3) this would work but is pretty unreadable.
A patch with this regex would be:
diff --git a/src/widgets/splits/InputHighlighter.cpp b/src/widgets/splits/InputHighlighter.cpp
index 7e350277..b1ab035a 100644
--- a/src/widgets/splits/InputHighlighter.cpp
+++ b/src/widgets/splits/InputHighlighter.cpp
@@ -101,7 +101,7 @@ namespace inputhighlight::detail {
QRegularExpression wordRegex()
{
static QRegularExpression regex{
- R"((?<=^|(?!_)\p{P})\p{L}+(?=$|(?!_)\p{P}))",
+ R"((?<=^|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd})\p{Pd}?((?:\p{L}\p{Pd}?)*\p{L})\p{Pd}?(?=$|(?!_|\p{Pd})\p{P}|\p{Pd}\p{Pd}))",
QRegularExpression::PatternOption::UseUnicodePropertiesOption,
};
return regex;
@@ -182,7 +182,7 @@ void InputHighlighter::visitWords(
while (wordIt.hasNext())
{
auto wordMatch = wordIt.next();
- auto word = wordMatch.captured();
+ auto word = wordMatch.captured(1);
if (!isIgnoredWord(channel, word))
{
diff --git a/tests/src/InputHighlighter.cpp b/tests/src/InputHighlighter.cpp
index 4fb00fb5..e6e38af7 100644
--- a/tests/src/InputHighlighter.cpp
+++ b/tests/src/InputHighlighter.cpp
@@ -186,7 +186,7 @@ TEST_F(InputHighlighterTest, getSpellCheckedWords)
{.input = " word word ", .words = {"word", "word"}},
{.input = "word?", .words = {"word"}},
{.input = "word?word", .words = {"word", "word"}},
- {.input = "word-word", .words = {"word", "word"}},
+ {.input = "word-word", .words = {"word-word"}},
{
.input = "channel emotes 7TVEmote a BTTVEmote b FFZEmote c",
.words = {"channel", "emotes", "a", "b", "c"},
@@ -265,6 +265,8 @@ TEST(InputHighlight, wordRegex)
.words = {u"abc", u"foo", u"bar", u"baz"}},
{.input = u"1234567,word/a123", .words = {u"word"}},
{.input = u"'quotes\"", .words = {u"quotes"}},
+ {.input = u"word-word-", .words = {u"word-word"}},
+ {.input = u"-word--word", .words = {u"word", u"word"}},
};
auto re = inputhighlight::detail::wordRegex();
@@ -275,7 +277,7 @@ TEST(InputHighlight, wordRegex)
auto match = re.globalMatchView(c.input);
while (match.hasNext())
{
- got.emplace_back(match.next().capturedView());
+ got.emplace_back(match.next().capturedView(1));
}
ASSERT_EQ(got, c.words) << "index=" << i;
}
There was a problem hiding this comment.
If we need to be this complex, it's probably better to write a manual loop similar to what Firefox does in the referenced function (in hopes that's more readable).
imo, doesn't have to be in this PR: We can keep the current state, make sure that it's properly tested, and then replace it with a manual loop.
Previously some emotes or usernames that didn't match the
wordRegexwere not properly ignored. Fixes #6776Now instead of iterating over the words that match
wordRegexwe first iterate over all non-whitespace strings and filter out ignored words such as emotes, usernames and links usingisIgnoredWordand only applywordRegexfor non-ignored words.This approach is a bit ugly because of the nested while loops but I didn't find an alternative yet.
isIgnoredWordnow also ignores the channel name if theAlways include broadcaster in user completionssetting is set to have consistency between autocompletion and spellchecking.