[BUG] Error parsing UTF-8 character literal that is not a hex character #1142

Closed
bluetarpmedia opened this issue Jun 24, 2024 · 4 comments

Describe the bug
cppfront reports an error when parsing a UTF-8 character literal (u8 prefix) whose character is not a hexadecimal digit (0-9, a-f).

To Reproduce
Run cppfront on this code:

main: () -> int = {

    a:= u8'a';  // ok
    b:= u8'b';  // ok
    c:= u8'c';  // ok
    d:= u8'd';  // ok
    e:= u8'e';  // ok
    f:= u8'f';  // ok
    g:= u8'g';  // error: line ended before character literal was terminated

    return 0;
}

@bluetarpmedia bluetarpmedia added the bug Something isn't working label Jun 24, 2024
@sookach
Contributor

sookach commented Jun 25, 2024

Pardon my ignorance, but isn't u8 an unsigned 8-bit integer type, not a UTF-8 character literal prefix?

@bluetarpmedia
Contributor Author

Yeah, Cpp2 has the type u8 (which lowers to cpp2::u8), but C++17 also introduced the UTF-8 character literal prefix, so you can write u8'a'.

https://en.cppreference.com/w/cpp/language/character_literal

From my reading of the lexer, Cpp2 does support it:

else if (peek1 == '8' && peek2 == next) { return 3; } // u8"
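
To make the distinction discussed above concrete, here is a minimal C++17 sketch showing the two meanings of u8 side by side. The namespace `sketch` and its alias are stand-ins invented for illustration; the alias is assumed to be an unsigned 8-bit integer, as described in the thread, and is not copied from cppfront's headers.

```cpp
#include <cstdint>

// Hypothetical stand-in for the alias that Cpp2's u8 type lowers to
// (assumption: an unsigned 8-bit integer).
namespace sketch { using u8 = std::uint8_t; }

// Encoding prefix: u8 written directly against the quote is lexed as part of
// a single character-literal token, so no type name is involved here.
constexpr auto letter = u8'a';   // type is char in C++17, char8_t in C++20

// Type name: here u8 is an ordinary identifier naming the integer alias.
constexpr sketch::u8 number = 97;

static_assert(sizeof(letter) == 1, "a UTF-8 character literal is one code unit");
static_assert(static_cast<int>(letter) == 0x61, "'a' is 0x61 in UTF-8");
static_assert(number == 97, "the alias holds a plain integer value");
```

The two uses do not collide in C++ because the lexer takes the longest possible token: `u8'a'` is one literal, while `u8 x` starts with an identifier.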

@hsutter
Owner

hsutter commented Jun 27, 2024

Thanks! I'll take a look.

I hadn't noticed that the literal prefix and the unsigned type alias used the same name. Interesting!

@hsutter
Owner

hsutter commented Jul 12, 2024

Thanks, I found the problem. It turns out I need to also check the encoding prefixes when doing the load.h brace-match to find the end of the Cpp2 definition, which also needs to be aware of literals (in case braces we should ignore are hiding inside a literal). Fixing...
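
For readers following along, the sketch below shows one heuristic that would reproduce the reported symptom, and how checking encoding prefixes avoids it. It is an assumption made for illustration, not cppfront's actual load.h code: the names `quote_rule`, `follows_encoding_prefix`, and `ends_inside_char_literal` are hypothetical. The assumed naive rule treats a `'` that follows a hexadecimal digit as a digit separator. Under that rule the 8 in u8 makes the opening quote look like a separator, and the closing quote is also skipped only when the character is a hex digit (a to f), so u8'g' appears to open a literal that never closes.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical switch between the assumed buggy rule and the prefix-aware rule.
enum class quote_rule { naive, prefix_aware };

bool is_hex_digit(char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
}

bool is_identifier_char(char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') || c == '_';
}

// True if the quote at line[quote_pos] directly follows u8, u, U, or L,
// and that prefix is not the tail of a longer identifier or number.
bool follows_encoding_prefix(std::string_view line, std::size_t quote_pos) {
    const std::string_view prefixes[] = {"u8", "u", "U", "L"};
    for (std::string_view p : prefixes) {
        if (quote_pos < p.size()) { continue; }
        std::size_t start = quote_pos - p.size();
        bool starts_token = start == 0 || !is_identifier_char(line[start - 1]);
        if (starts_token && line.substr(start, p.size()) == p) { return true; }
    }
    return false;
}

// Returns true if the line ends while a character literal is still open.
bool ends_inside_char_literal(std::string_view line, quote_rule rule) {
    bool in_literal = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (in_literal) {
            if (c == '\\') { ++i; }                      // skip the escaped character
            else if (c == '\'') { in_literal = false; }  // closing quote
            continue;
        }
        if (c != '\'') { continue; }
        // Assumed heuristic: a quote right after a hex digit is a digit separator.
        bool after_hex = i > 0 && is_hex_digit(line[i - 1]);
        bool is_prefix = rule == quote_rule::prefix_aware
                      && follows_encoding_prefix(line, i);
        if (!after_hex || is_prefix) { in_literal = true; }
    }
    return in_literal;
}

// Small self-check mirroring the lines in the bug report.
void run_examples() {
    assert(!ends_inside_char_literal("a:= u8'a';", quote_rule::naive));         // looks fine
    assert( ends_inside_char_literal("g:= u8'g';", quote_rule::naive));         // symptom
    assert(!ends_inside_char_literal("g:= u8'g';", quote_rule::prefix_aware));  // fixed
    assert(!ends_inside_char_literal("n:= 0xFF'FF;", quote_rule::prefix_aware));
}
```

Calling `run_examples()` from a `main` function exercises the cases above. Note that under the naive rule the a to f lines only pass by accident, because both quotes are skipped as separators, which is why a brace hidden inside such a literal could still be miscounted.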

@hsutter hsutter self-assigned this Jul 12, 2024