Skip to content

Hash completion items to properly match them during /resolve #18653

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 11, 2024

Conversation

SomeoneToIgnore
Copy link
Contributor

Closes #18547
Closes #18640

Follows #18547 (comment) proposal and adds completion items hashing, using all their data possible, that does not change between the edits.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 9, 2024
Copy link
Contributor Author

@SomeoneToIgnore SomeoneToIgnore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is how relatively fragile this is, if another field is added or some fields start to alter between text edits, but otherwise it seems to work on VSCode and Helix during my manual testing.

cc @Veykril as it would be good to use this for some while to see if that sporadic error is gone with this build.

@@ -346,8 +346,7 @@ pub enum CompletionItemKind {
impl_from!(SymbolKind for CompletionItemKind);

impl CompletionItemKind {
#[cfg(test)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be quite an odd change, but it was quite tempting to reuse existing &str producer for this large enum.

As an alternative, I can copy this into the module with the hashing function, are there better ideas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just make CompletionItemKind and SymbolKind Hash?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, tenthash doesn't accept hash things, only byte slices...

@@ -826,7 +826,8 @@ pub struct CompletionResolveData {
pub imports: Vec<CompletionImport>,
pub version: Option<i32>,
pub trigger_character: Option<char>,
pub completion_item_index: usize,
pub for_ref: bool,
pub hash: [u8; 20],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can reduce the JSON size with https://github.com/cessen/tenthash/blob/159da57b120bf2abcdd42532d17a7ae2957f98b8/tenthash-rust/tests/test_vectors.rs#L17 but not sure how appropriate this conversion is, any ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do something like base64, that encoding is just a hex string which is pretty standard to use for hashes but not necessairily the most information dense (compared to base64)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, had applied Base64 and it does trim the character number by half:

Before:
"hash":[215,92,116,28,211,148,173,32,58,55,147,15,75,30,169,165,93,8,60,63]

After:
"hash":"11x0HNOUrSA6N5MPSx6ppV0IPD8="

Seems like a good thing to do.

@@ -346,8 +346,7 @@ pub enum CompletionItemKind {
impl_from!(SymbolKind for CompletionItemKind);

impl CompletionItemKind {
#[cfg(test)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just make CompletionItemKind and SymbolKind Hash?

@@ -346,8 +346,7 @@ pub enum CompletionItemKind {
impl_from!(SymbolKind for CompletionItemKind);

impl CompletionItemKind {
#[cfg(test)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, tenthash doesn't accept hash things, only byte slices...

Copy link
Contributor

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a much more robust approach, thanks for the fast turnaround!


let Some(corresponding_completion) = completions.into_iter().find(|completion_item| {
let hash = completion_item_hash(completion_item, resolve_data.for_ref);
hash == resolve_data.hash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think may be a good idea to first check if the labels are identical and only compute and compare the hash if they are:

  • this will reduce computational requirements a bit since hash is only computed for any realistic candidates
  • as a bonus it makes collisons even more unlikely

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea.

It's a bit tricky because LSP completion items get their labels changed compared to the original ones, but luckily only by adding a suffix, so we can still reduce the computations with a starts_with check.

@@ -826,7 +826,8 @@ pub struct CompletionResolveData {
pub imports: Vec<CompletionImport>,
pub version: Option<i32>,
pub trigger_character: Option<char>,
pub completion_item_index: usize,
pub for_ref: bool,
pub hash: [u8; 20],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do something like base64, that encoding is just a hex string which is pretty standard to use for hashes but not necessairily the most information dense (compared to base64)

* Exclude documentation field from hashing
* Do less cloning during initial completion list generation
* Use Base64 to minify the hash representation in the JSON data
* Do hash checks only for items with similar labels
@Veykril Veykril added this pull request to the merge queue Dec 11, 2024
Merged via the queue into rust-lang:master with commit 087cb62 Dec 11, 2024
9 checks passed
@SomeoneToIgnore SomeoneToIgnore deleted the hash-completions branch December 11, 2024 08:17
@tgross35
Copy link
Contributor

tgross35 commented Dec 11, 2024

Would it be possible to do a sync soonish? I don't think either of the fixes have made it to r-l/rust so nightly is still buggy.

Edit: thanks Inicola, done in rust-lang/rust#134170

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request completionItem/resolve failed Autocomplete suggestion changes to something random once selected
5 participants