Skip to content

fix: 🐛 fix conversion of client_position into offset_at_position#567

Merged
tombh merged 2 commits into
openlawlibrary:mainfrom
wolfskaempf:push-mwtoonmtxzlq
Jul 16, 2025
Merged

fix: 🐛 fix conversion of client_position into offset_at_position#567
tombh merged 2 commits into
openlawlibrary:mainfrom
wolfskaempf:push-mwtoonmtxzlq

Conversation

@wolfskaempf

@wolfskaempf wolfskaempf commented Jul 15, 2025

Copy link
Copy Markdown
Contributor

Description

offset_at_position was incorrectly using code units instead of code points, leading to incorrect offset calculations for UTF-[8|16].

client_num_units returns code units, but the correct calculation needs to use code points, as returned by len().

This bug affects UTF-16 only in cases where code points exceeding the basic multilingual plane are present in lines preceding the given position.

Adapt SAMPLE_STRING to cover this case.

Introduce a test that checks all existent client positions' offset calculations in SAMPLE_STRING using an actually encoded string for comparison.

Code review checklist (for code reviewer to complete)

  • Pull request represents a single change (i.e. not fixing disparate/unrelated things in a single PR)
  • Title summarizes what is changing
  • Commit messages are meaningful (see this for details)
  • Tests have been included and/or updated, as appropriate
  • Docstrings have been included and/or updated, as appropriate
  • Standalone docs have been updated accordingly

Automated linters

You can run the lints that are run on CI locally with:

uv run --all-extras poe lint

@wolfskaempf

wolfskaempf commented Jul 15, 2025

Copy link
Copy Markdown
Contributor Author

WIP: The test needs to be improved, as it currently also passes for the previous implementation of offset_at_position (this process may reveal further implementation considerations).

Edit: No implementation considerations were found, only the test case needed to be adapted.

@wolfskaempf wolfskaempf force-pushed the push-mwtoonmtxzlq branch 3 times, most recently from bb7138c to cc7408f Compare July 16, 2025 11:36
@wolfskaempf wolfskaempf marked this pull request as ready for review July 16, 2025 11:40
`offset_at_position` was incorrectly using code units instead of code
points, leading to incorrect offset calculations for UTF-[8|16].

`client_num_units` returns code units, but the correct calculation needs
to use code points, as returned by `len()`.

This bug affects UTF-16 only in cases where code points exceeding
the basic multilingual plane are present in lines preceding the given
position.

Adapt SAMPLE_STRING to cover this case.

Introduce a test that checks all existent client positions' offset
calculations in SAMPLE_STRING using an actually encoded string for
comparison.

Co-authored-by: Linus Heckemann <git@sphalerite.org>
@wolfskaempf

Copy link
Copy Markdown
Contributor Author

The version I just uploaded is ready for review :)

@tombh

tombh commented Jul 16, 2025

Copy link
Copy Markdown
Collaborator

It looks good to me! I'd also like to get @lheckemann's opinion on this too seeing as they were working on it so recently.

@lheckemann

lheckemann commented Jul 16, 2025

Copy link
Copy Markdown
Contributor

@wolfskaempf and I worked on this together, and found this after finding that my change wasn't enough to get the language server to work with helix :)

@tombh

tombh commented Jul 16, 2025

Copy link
Copy Markdown
Collaborator

Oh! That's an approval then ✅🤓

@tombh tombh self-requested a review July 16, 2025 15:18
@tombh tombh merged commit a374b24 into openlawlibrary:main Jul 16, 2025
18 checks passed
@tombh tombh mentioned this pull request Jul 16, 2025
@tombh

tombh commented Jul 18, 2025

Copy link
Copy Markdown
Collaborator

Released in https://pypi.org/project/pygls/2.0.0a6 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants