Fix text not serializing correctly when having utf8 boundaries #170

Closed
wants to merge 1 commit into from

Conversation

bnchi
Contributor

@bnchi bnchi commented Mar 9, 2025

Fixes #169

@bnchi bnchi changed the title from "Fix text not serializing correctly when having utf8 boundries" to "Fix text not serializing correctly when having utf8 boundaries" on Mar 9, 2025
@johnlepikhin

Hi there,

Any chance this could be merged?

@bnchi
Contributor Author

bnchi commented Apr 18, 2025

I'm just waiting on the maintainers to review and merge this.

@wooorm
Owner

wooorm commented Apr 18, 2025

I missed/forgot about this. Will get to it soon!

@ChristianMurphy ChristianMurphy requested a review from Copilot April 18, 2025 16:06
@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes an issue where text was not serialized correctly when encountering UTF-8 boundaries. It adds a new test to verify proper handling of UTF-8 characters and updates state management to correctly handle multi-byte character slicing.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| mdast_util_to_markdown/tests/roundtrip.rs | Added a new test ensuring correct serialization of UTF-8 text. |
| mdast_util_to_markdown/src/state.rs | Updated character slicing to correctly extract the last UTF-8 character. |
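
The kind of slicing change described in the overview can be sketched roughly as follows. This is a minimal illustration only, not the code in `state.rs`: the helper names `last_char_bytewise` and `last_char` are hypothetical, and the sketch assumes the problem was taking the final byte of a string where the final character was wanted.

```rust
// Hypothetical helpers for illustration; not taken from `state.rs`.

/// Byte-oriented approach: treat the final byte as the final character.
/// `str::get` returns `None` when the offset is not a char boundary;
/// direct indexing with `&s[s.len() - 1..]` would panic in that case.
fn last_char_bytewise(s: &str) -> Option<&str> {
    let start = s.len().checked_sub(1)?;
    s.get(start..)
}

/// Character-oriented approach: step back from the end by one whole `char`,
/// however many bytes it occupies in UTF-8.
fn last_char(s: &str) -> Option<char> {
    s.chars().next_back()
}
```

For ASCII input both helpers agree, which may be why this class of bug tends to surface only once non-ASCII text reaches the serializer. For a string ending in a multi-byte character such as `я`, the byte-oriented version yields `None` while the character-oriented one returns the full character.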

Comment on lines +408 to +409
"should support utf8 in boundries when serializing"
);

Copilot AI Apr 18, 2025

Spelling error: please change 'boundries' to 'boundaries' for clarity.

Suggested change
"should support utf8 in boundries when serializing"
);
"should support utf8 in boundaries when serializing"

@johnlepikhin

I just found another case that the suggested fix does not resolve:

    let doc = "я_𝄞";
    assert_eq!(to(&from(doc, &Default::default()).unwrap()).unwrap(), doc);
---- roundtrip stdout ----
thread 'roundtrip' panicked at mdast_util_to_markdown/src/state.rs:540:51:
byte index 7 is not a char boundary; it is inside '𝄞' (bytes 4..8) of `
я_𝄞
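
To make the panic message easier to read: in `doc` on its own, `𝄞` occupies bytes 3..7, whereas the message reports bytes 4..8. That suggests the string being sliced carries one extra leading byte, which would match the line break shown after the backtick in the output. This is an inference from the log, not something confirmed in the thread. The sketch below lists where characters start and shows one way to clamp an offset to a valid boundary; `char_starts` and `floor_boundary` are hypothetical helpers, not functions from the crate.

```rust
// Hypothetical helpers for illustration; not part of `mdast_util_to_markdown`.

/// Byte offsets at which each character of `s` begins.
fn char_starts(s: &str) -> Vec<usize> {
    s.char_indices().map(|(offset, _)| offset).collect()
}

/// Largest char boundary that is less than or equal to `index`.
/// Offset 0 is always a boundary, so the loop terminates.
fn floor_boundary(s: &str, index: usize) -> usize {
    let mut i = index.min(s.len());
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}
```

With the assumed input `"\nя_𝄞"`, characters start at offsets 0, 1, 3 and 4, so offset 7 falls inside `𝄞` and slicing there panics; `floor_boundary` would move it back to 4.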

@wooorm wooorm closed this in add76bd Apr 23, 2025
wooorm added a commit that referenced this pull request Apr 23, 2025
@wooorm
Owner

wooorm commented Apr 23, 2025

Found them, thanks! :)

Successfully merging this pull request may close these issues.

mdast_util_to_markdown panics while trying to format UTF text with trailing spaces
3 participants