Problem with String::from_utf8 #54845

Closed
@AljoschaMeyer

The byte sequence [34, 228, 166, 164, 110, 237, 166, 164, 44, 34] ("䦤n���,", quotes are part of the string itself) is considered valid UTF-8 by ECMAScript (or at least by Node.js and Firefox), but not by the Rust standard library.

Not knowing enough about Unicode and UTF-8, I'm just assuming that Rust is doing this incorrectly, since both V8 and SpiderMonkey accept it as valid UTF-8.

JSON.parse('"䦤n���,"') in JavaScript returns a string, whereas in Rust:

println!("{:?}", String::from_utf8(vec![34u8, 228, 166, 164, 110, 237, 166, 164, 44, 34]));

> Err(FromUtf8Error { bytes: [34, 228, 166, 164, 110, 237, 166, 164, 44, 34], error: Utf8Error { valid_up_to: 5, error_len: Some(1) } })
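To see which bytes the error points at, here is a minimal sketch (not part of the original report) that decodes the three bytes starting at `valid_up_to` as if they were an ordinary 3-byte sequence. The helpers `raw_three_byte_code_point` and `is_surrogate` are hypothetical names introduced only for illustration. Under that reading, bytes 237, 166, 164 would encode U+D9A4, which falls in the surrogate range U+D800..=U+DFFF; RFC 3629 excludes that range from UTF-8, which would explain why `String::from_utf8` rejects the input at offset 5.

```rust
/// Hypothetical helper: compute the code point that a 3-byte UTF-8-style
/// sequence would represent, without checking whether it is allowed.
fn raw_three_byte_code_point(b0: u8, b1: u8, b2: u8) -> u32 {
    ((b0 as u32 & 0x0F) << 12) | ((b1 as u32 & 0x3F) << 6) | (b2 as u32 & 0x3F)
}

/// Hypothetical helper: true if `cp` lies in the UTF-16 surrogate range,
/// which well-formed UTF-8 must not encode.
fn is_surrogate(cp: u32) -> bool {
    (0xD800..=0xDFFF).contains(&cp)
}

fn main() {
    let bytes = vec![34u8, 228, 166, 164, 110, 237, 166, 164, 44, 34];

    // Strict decoding fails; ask the error where the valid prefix ends.
    let err = String::from_utf8(bytes.clone()).unwrap_err();
    let at = err.utf8_error().valid_up_to();

    // Interpret the next three bytes as a 3-byte sequence anyway.
    let cp = raw_three_byte_code_point(bytes[at], bytes[at + 1], bytes[at + 2]);
    println!("offset {}: U+{:04X}, surrogate: {}", at, cp, is_surrogate(cp));

    // Lossy decoding substitutes U+FFFD for the invalid bytes instead of failing.
    println!("{:?}", String::from_utf8_lossy(&bytes));
}
```

If the goal is to accept such input anyway, `String::from_utf8_lossy` is one option, at the cost of losing the original bytes.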

rustc --version --verbose

binary: rustc
commit-hash: de3d640f59c4fa4a09faf2a8d6b0a812aaa6d6cb
commit-date: 2018-10-01
host: x86_64-unknown-linux-gnu
release: 1.31.0-nightly
LLVM version: 8.0
