Skip to content

Typeahead remove diacritics #5292

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 16, 2022
Merged

Conversation

Fingel
Copy link
Contributor

@Fingel Fingel commented Mar 11, 2022

When invoking the typeahead filter for users it is desirable to be able
to see results for users with diacritics in their names even if the
filter does not contain the matching diacritics. This PR leverages
@zulip/shared typeahead module in order to strip diacritics out of the
filtering process, unless the user explicitly uses them in the filter.

Fixes: #3710

@chrisbobbe
Copy link
Contributor

Hmm, looks like the CI failure is a Flow error:

Error ------------------------------------------------------------------------------------- src/users/userHelpers.js:4:1

Importing from an untyped module makes it `any` and is not safe! Did you mean to add `// @flow` to the top of
`@zulip/shared/js/typeahead`? [untyped-import]

   4| import * as typeahead from '@zulip/shared/js/typeahead';
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^



Found 1 error

If it's reasonably easy, I think we'd want to add a js/typeahead.js.flow to @zulip/shared, like we have an internal_url.js.flow and typing_status.js.flow.

Otherwise, I think we'd add a // $FlowFixMe[untyped-import] line above the import.

Greg, is that right?

@Fingel
Copy link
Contributor Author

Fingel commented Mar 11, 2022

Hmm, looks like the CI failure is a Flow error:

Error ------------------------------------------------------------------------------------- src/users/userHelpers.js:4:1

Importing from an untyped module makes it `any` and is not safe! Did you mean to add `// @flow` to the top of
`@zulip/shared/js/typeahead`? [untyped-import]

   4| import * as typeahead from '@zulip/shared/js/typeahead';
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^



Found 1 error

If it's reasonably easy, I think we'd want to add a js/typeahead.js.flow to @zulip/shared, like we have an internal_url.js.flow and typing_status.js.flow.

Otherwise, I think we'd add a // $FlowFixMe[untyped-import] line above the import.

Greg, is that right?

This is correct, there will be a PR for zulip/zulip with a .flow file for typeahead.js shortly, I'll make this a draft PR for now.

@Fingel Fingel marked this pull request as draft March 11, 2022 20:46
Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Fingel! This will be a good change. Comments below.

user2, // name contains in 'ma'
user11, // have priority because of 'ma' contains in name
user4, // email contains 'ma'
user10, // have priority because of initials condition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this comment needs updating

@@ -1,6 +1,7 @@
/* @flow strict-local */
// $FlowFixMe[untyped-import]
import uniqby from 'lodash.uniqby';
import * as typeahead from '@zulip/shared/js/typeahead';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build fails because this doesn't have types.

Have you sent a zulip/zulip PR with the .js.flow file? Then once merged I can make a shared release, and this PR can use that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Probably makes sense to add types for all the exports in that file while you're at it, though on the call yesterday we only did the one.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/zulip/zulip/pull/21397/files

Hmm, that didn't cause GitHub to show a link from that PR back to this one!

That is… a trick I may use in the future 😉 . Sometimes those back-links are unwanted.

Here, though, I think it's helpful when looking at that PR to see its context from this one. So here's a link that will cause a back-link: zulip/zulip#21397

const expectedUsers = [user4, user5];
expect(filterUserByInitials(users, 'ma', selfUser.user_id)).toEqual(expectedUsers);
test('returns users whose name contains diacritics but otherwise starts with filter', () => {
const with_diacritis = eg.makeUser({ name: 'Frödö', email: '[email protected]' });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: spelling, "diacritics" (here and another test)

Comment on lines 324 to 342
});
test('returns users whose full_name has diacritics but otherwise contains filter', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: blank line between these

I don't have a great bright-line rule I can articulate for when a blank line is called for in general. But when there's two multi-line blocks next to each other, like these two test cases, those almost always call for a blank line to separate them.

The Google style guide describes rules I agree with:
https://google.github.io/styleguide/jsguide.html#formatting-vertical-whitespace
Applying those literally to this case, it's a question of whether these tests belong together as a "logical grouping". But also, the tests are a lot like separate class method definitions themselves, and they benefit from a space for much the same reasons.

Comment on lines 262 to 264
const with_diacritis = eg.makeUser({ name: 'Frödö', email: '[email protected]' });
const without_diacritics = eg.makeUser({ name: 'Frodo', email: '[email protected]' });
const non_matching_user = eg.makeUser({ name: 'Zalix', email: '[email protected]' });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: camelCase for these names -- they're locals that don't come from the server API

const with_diacritis = eg.makeUser({ name: 'Frödö', email: '[email protected]' });
const without_diacritics = eg.makeUser({ name: 'Frodo', email: '[email protected]' });
const non_matching_user = eg.makeUser({ name: 'Zalix', email: '[email protected]' });
const selfUser = eg.makeUser({ name: 'Bob', email: '[email protected]' });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name and email of this self-user aren't playing any role in the test -- it'd come out the same regardless of what they were.

In general, we try to minimize the amount of detail that a test specifies beyond what's relevant to it. That's helpful for reading and understanding the tests, and also just for making it less tedious to write the tests. So when there is such detail, we try to find ways to lean on the eg module to provide it instead.

Here, this could just be eg.makeUser(). Then a step further: instead of saying const selfUser at all, we could use eg.selfUser in the spot where this is used below.

An alternate direction would be to make the details of the self-user relevant: if they had a name that did match, then we'd be exercising the user.user_id !== ownUserId in the function. But I think actually best here is to leave that out -- that's covered already in a separate test, the one just above this.

Comment on lines 90 to 93
): $ReadOnlyArray<AutocompleteOption> => {
const loweredFilter = filter.toLowerCase();
const isAscii = /^[a-z]+$/.test(loweredFilter);

export const filterUserByInitials = (
users: $ReadOnlyArray<AutocompleteOption>,
filter: string = '',
ownUserId: UserId,
): $ReadOnlyArray<AutocompleteOption> =>
users.filter(
user =>
user.user_id !== ownUserId
&& user.full_name
.replace(/(\s|[a-z])/g, '')
.toLowerCase()
.startsWith(filter.toLowerCase()),
);
return users.filter(user => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this is slightly cleaner without the blank line.

Partly that's because this function is so short -- as that Google style item puts it, use "sparingly" within a function body, to create logical groupings. This short function doesn't need separate logical groupings.

It's also helpful in visually parsing this file in general, as this is one of a string of short functions -- it's helpful that each of them is one stanza.

.startsWith(filter.toLowerCase()),
);
return users.filter(user => {
const full_name = isAscii ? typeahead.remove_diacritics(user.full_name) : user.full_name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have a test to cover the "else" case of this too -- the fact that if you do type a diacritic, it won't match a name that has that letter without the diacritic; or has a different diacritic; or on some other letter where the query is just plain ASCII, does have a diacritic.

@gnprice
Copy link
Member

gnprice commented Mar 12, 2022

Merged zulip/zulip#21397 , and released shared-0.0.9 with that change. So this PR should be able to use that now.

@Fingel Fingel force-pushed the typeahead-remove-diacritics branch from 0b3802e to 7bb3486 Compare March 14, 2022 22:28
@Fingel Fingel marked this pull request as ready for review March 14, 2022 22:39
@Fingel
Copy link
Contributor Author

Fingel commented Mar 14, 2022

OK, I think this should be ready for review, for realsies this time. Sorry about that!

Copy link
Member

@gnprice gnprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Fingel for the revisions! Just a couple of small comments remaining.

Comment on lines -128 to +127
const user10 = eg.makeUser({ name: 'Mobile App', email: '[email protected]' }); // satisfy initials and email condition
const user10 = eg.makeUser({ name: 'Mobile App', email: '[email protected]' }); // satisfy email condition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this and the remaining changes in this test should go in the first commit, together with the change in the code they describe

The commands I'd use for making this sort of revision -- pulling some changes forward from a later commit in a branch to an earlier one -- would be:

  • git rebase -i; say to edit the first commit
  • when the rebase reaches that commit, use git checkout -p to select the later changes to add.
    • For example, git checkout -p main if your local branch is main; or git checkout -p typeahead-remove-diacritics if you're using the same name locally as you have here.
    • In general you could also add the filename, or a directory containing it: git checkout -p main -- src/users/__tests__/. In this case there aren't a lot of other files to skip past, so you might not bother.
    • To get your bearings before doing the git checkout -p, you can use git diff --stat -p -R main. That will show you all the changes you'll get to choose from if you do git checkout -p main.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just 2 or 3 changes in the comments, so I just edited the first commit manually and then the rest of the rebase was clean, but thanks for the info. I could definitely see it getting more complicated with larger changesets.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool.

I just edited the first commit manually and then the rest of the rebase was clean

Yeah, I forgot to mention: that's also how the git checkout -p approach works, except for the word "manually".

That is, it's just a way of editing what you have in the tree. The rebase doesn't know anything special about whether you edited with git checkout -p vs. with your editor or anything else.

But then when you continue the rebase and it goes to apply the later commit, when some of the hunks in that diff are already present, it doesn't complain, and instead just makes a commit using the remaining hunks.

Comment on lines 271 to 283
const withDiacritics = eg.makeUser({ name: 'Frödö', email: '[email protected]' });
const withoutDiacritics = eg.makeUser({ name: 'Frodo', email: '[email protected]' });
const nonMatchingUser = eg.makeUser({ name: 'Zalix', email: '[email protected]' });
const users = deepFreeze([withDiacritics, withoutDiacritics, nonMatchingUser]);
const expectedUsers = [withDiacritics];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's exercise a couple more cases as mentioned here: #5292 (comment)

  • (done) if you do type a diacritic, it won't match a name that has that letter without the diacritic.
  • if you do type a diacritic, it won't match a name that has that letter with a different diacritic.
  • if you do type a diacritic, it won't match a name that, on some other letter where the query is just plain ASCII, does have a diacritic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe these cases should be covered now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This one isn't, though:

  • if you do type a diacritic, it won't match a name that, on some other letter where the query is just plain ASCII, does have a diacritic.

For example, searching for "Frödo" won't find "Frödö". This is true even though the individual pieces of the query would match: "Fröd" (without the last "o") would match, because it matches exactly, and "do" would match (because it matches the "dö" part with diacritics removed.)

Let's also test the corresponding cases on filterUserThatContains as well as filterUserStartWith.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I think I follow you, check the latest revision.

@Fingel Fingel force-pushed the typeahead-remove-diacritics branch 3 times, most recently from 5b20a46 to 35872ea Compare March 16, 2022 17:44
Fingel added 3 commits March 16, 2022 10:48
This commit removes the unintuitive behavior of filtering users in a
typeahead by their initials; for example 'az' would match Abba Zee. This is more
likely to cause confusion than to be of real practical value.
This release added flow types for the shared `typeahead` module.
When invoking the typeahead filter for users it is desirable to be able
to see results for users with diacritics in their names even if the
filter does not contain the matching diacritics. This PR leverages
@zulip/shared typeahead module in order to strip diacritics out of the
filtering process, unless the user explicitly uses them in the filter.

Fixes: zulip#3710
@gnprice gnprice force-pushed the typeahead-remove-diacritics branch from 35872ea to d73e764 Compare March 16, 2022 17:49
@gnprice gnprice merged commit d73e764 into zulip:main Mar 16, 2022
@gnprice
Copy link
Member

gnprice commented Mar 16, 2022

Thanks @Fingel for the revisions! This looks good -- merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Search people's names modulo diacritics, in autocomplete
3 participants