Skip to content

'wrap_comments' do not support Chinese or Japanese comments. #2797

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
garyhai opened this issue Jun 20, 2018 · 2 comments
Closed

'wrap_comments' do not support Chinese or Japanese comments. #2797

garyhai opened this issue Jun 20, 2018 · 2 comments
Labels
a-comments only-with-option requires a non-default option value to reproduce

Comments

@garyhai
Copy link

garyhai commented Jun 20, 2018

When set 'wap_comments = true', it works fine for English comments. But when write comments in Chinese or Japanese, the long lines more than comment_width do not be wrapped.

@nrc nrc added a-comments only-with-option requires a non-default option value to reproduce labels Jun 20, 2018
@wada314
Copy link
Contributor

wada314 commented Jun 22, 2018

I believe this is kinda already recognized problem but I just write this for my note.
TL;DR: This is very difficult problem.

https://github.com/rust-lang-nursery/rustfmt/blob/87edd75ecf26c9084969f431bb5e363693a8a4ca/src/string.rs#L46-L148
https://github.com/rust-lang-nursery/rustfmt/blob/87edd75ecf26c9084969f431bb5e363693a8a4ca/src/utils.rs#L329-L350

Currently rustfmt's line wrapping is not considering much about East-Asian scripts like Japanese, Chinese.
I just took a quick glance but I can point out at least 2 problems:

  1. The styleguide is not saying anything about this but I think neither str::len() nor Chars::count() should be considered as "line length". The string "ゆゆ式" 's str::len() is 9 and Chars::count() is 3 but in normal monospace glyph it occupies 6 width. Unicode has a TR for this: UAX#11 (rust impl). Though this spec is still leaving us to decide which "context" we should use between CJK and non-CJK contexts. Maybe we can add this setting into rustfmt.toml?

  2. Current line-breaking algorithm is not considering into non-space-separating languages (e.g. CJK). The current implementation is partially using Unicode TR [UAX#29] (rust impl) but it's not something working for CJK. I also feel like we would rather use [UAX#14] to decide the place to insert newline, but unfortunately we don't have official implementation under the official unicode-rs directory today because the TR is not trivial.

So... maybe we should start from implementing [UAX#14]? (just kidding)

@nrc
Copy link
Member

nrc commented Jun 24, 2018

Thanks for the useful info! I'm sorry this still doesn't work. This is covered by #6, so closing this issue, but I do want to fix this.

@nrc nrc closed this as completed Jun 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a-comments only-with-option requires a non-default option value to reproduce
Projects
None yet
Development

No branches or pull requests

3 participants