JSON encoder silently corrupts some 64-bit ints #18319

kmcallister · 2014-10-25T20:56:26Z

_Because of JavaScript_™, the JSON number type can't represent all 64-bit numbers. This code

extern crate serialize;

use serialize::json;

fn main() {
    let n: u64 = 0x747865747a6f6d71;
    let j = json::encode(&n); 
    let m: u64 = json::decode(j.as_slice()).unwrap();
    assert_eq!(n, m);  
}

fails with

task '<main>' failed at 'assertion failed: `(left == right) && (right == left)` (left: `8392569456549653873`, right: `8392569456549654464`)', foo.rs:9

This is pretty surprising, especially when the u64 field is buried within a struct that's deriving Encodable. If the encoder is going to implement emit_u64 and emit_i64 at all, it should do something that doesn't silently corrupt data. There's no universal standard for 64-bit numbers in JSON but a decimal ASCII string seems like a reasonable default. emit_uint and emit_int should also switch to that.
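To make the failure mode concrete, here is a minimal sketch in present-day Rust using only the standard library; it does not use the 2014 `serialize` crate. The helper names `roundtrip_via_f64` and `roundtrip_via_string` are hypothetical: the first stands in for an encoder that funnels integers through `f64`, the second for the decimal ASCII string default suggested above. The wrong value it prints need not match the one in the failure message, which presumably also reflected how the old encoder formatted and re-parsed the float.

```rust
/// Hypothetical stand-in for an encoder that converts every integer to f64
/// before printing it. f64 has a 53-bit significand, so larger values round.
fn roundtrip_via_f64(n: u64) -> u64 {
    (n as f64) as u64
}

/// Hypothetical stand-in for the proposed default: emit the integer as a
/// quoted decimal ASCII string and parse it back on the way in.
fn roundtrip_via_string(n: u64) -> Option<u64> {
    let encoded = format!("\"{}\"", n); // the JSON string, quotes included
    let digits = encoded.trim_matches('"'); // strip the surrounding quotes
    digits.parse::<u64>().ok()
}

fn main() {
    let n: u64 = 0x747865747a6f6d71;
    let lossy = roundtrip_via_f64(n);
    println!("original:   {}", n);
    println!("via f64:    {} (changed: {})", lossy, lossy != n);
    println!("via string: {:?}", roundtrip_via_string(n));
}
```

The string form survives because no floating-point conversion is involved, at the cost of consumers having to know that the field holds a number.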


maz · 2014-10-27T00:55:42Z

Certainly, the current behavior is incorrect; silently producing wrong output is not acceptable. However, it may not make sense to silently output a 64-bit integer either. Per RFC 7159, outputting numbers with greater precision than double-precision floating-point numbers may "indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available." As such, it may make sense to deprecate the encoding (into JSON) of 64-bit integers altogether. Alternatively, it may make sense to have a flag in the JSON encoder causing it to either throw upon encountering a 64-bit integer (which exceeds the precision typically allowed in JSON), or output potentially non-portable JSON. That being said, it appears that Go's JSON library follows your recommendation by default.
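The flag idea could look roughly like the sketch below. Everything in it is hypothetical: `BigIntPolicy`, `OutOfRange`, and `encode_u64` are illustrative names, not part of `serialize::json` or any other library, and "throw" is modeled as returning `Err`. The cutoff of 2^53 - 1 is the upper end of the interoperable range that RFC 7159 mentions.

```rust
/// Largest integer in the range RFC 7159 calls interoperable: 2^53 - 1.
const MAX_SAFE: u64 = (1 << 53) - 1;

/// Hypothetical encoder setting; not part of any real library.
#[derive(Clone, Copy, Debug, PartialEq)]
enum BigIntPolicy {
    /// Refuse integers that a double-based consumer could not hold exactly.
    Reject,
    /// Emit all digits, accepting that some consumers will round them.
    EmitNonPortable,
}

/// Hypothetical error carrying the value that was refused.
#[derive(Debug, PartialEq)]
struct OutOfRange(u64);

/// Encode `n` as a JSON number, applying the chosen policy to large values.
fn encode_u64(n: u64, policy: BigIntPolicy) -> Result<String, OutOfRange> {
    if n > MAX_SAFE && policy == BigIntPolicy::Reject {
        return Err(OutOfRange(n));
    }
    Ok(n.to_string())
}

fn main() {
    let big: u64 = 0x747865747a6f6d71;
    println!("{:?}", encode_u64(big, BigIntPolicy::Reject));
    println!("{:?}", encode_u64(big, BigIntPolicy::EmitNonPortable));
    println!("{:?}", encode_u64(42, BigIntPolicy::Reject));
}
```

Either policy avoids silent corruption: the caller gets an explicit error or the exact digits, never a rounded value.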

mahkoh · 2014-10-28T13:57:34Z

JSON is a lossy format and encoding is not invertible.

serialize::json::Encoder currently uses f64 to emit any integral type. This is possibly due to the behavior of JavaScript, which uses f64 to represent any numeric value.

As a result, only integers in the range [-2^53+1, 2^53-1] can be encoded exactly, so i64 and u64 cannot be used reliably in the current implementation.

RFC 7159 suggests that good interoperability can be achieved if implementations respect that range. However, it also says that implementations are allowed to set the range of numbers accepted, and JSON encoders outside of the JavaScript world usually make use of i64 values.

This commit removes the float preprocessing done in the emit_* methods. It also increases performance, because transforming an f64 into a String costs more than transforming an integral type.

Fixes rust-lang#18319

[breaking-change]
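The boundary of that range can be checked directly. The sketch below is written in present-day Rust and only models the two paths; `emit_via_f64` and `emit_direct` are hypothetical names, not the actual `emit_*` methods. Inside the range the two agree; just past it, neighboring integers collapse onto the same float.

```rust
/// Upper end of the range quoted above: 2^53 - 1.
const LIMIT: i64 = (1 << 53) - 1;

/// Hypothetical model of the old path: convert to f64 first, then format.
fn emit_via_f64(i: i64) -> String {
    format!("{}", i as f64)
}

/// Hypothetical model of the new path: format the integer directly.
fn emit_direct(i: i64) -> String {
    i.to_string()
}

fn main() {
    // LIMIT is still exact; LIMIT + 1 and LIMIT + 2 share one f64 value.
    for &i in &[LIMIT, LIMIT + 1, LIMIT + 2, 111111111111111111] {
        println!(
            "{:>20} -> via f64: {:>20}  direct: {:>20}",
            i,
            emit_via_f64(i),
            emit_direct(i)
        );
    }
}
```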

This pull request tries to improve type safety of `serialize::json::Encoder`.

Looking at #18319, I decided to test some JSON implementations in other languages. The results are as follows:

* Encoding to JSON

| Language | 111111111111111111 | 1.0 |
| --- | --- | --- |
| JavaScript™ | "111111111111111100" | "1" |
| Python | "111111111111111111" | **"1.0"** |
| Go | "111111111111111111" | "1" |
| Haskell | "111111111111111111" | "1" |
| Rust | **"111111111111111104"** | "1" |

* Decoding from JSON

| Language | "1" | "1.0" | "1.6" |
| --- | --- | --- | --- |
| JavaScript™ | 1 (Number) | 1 (Number) | 1.6 (Number) |
| Python | 1 (int) | 1.0 (float) | 1.6 (float) |
| Go | **1 (float64)** | 1 (float64) | 1.6 (float64) |
| Go (expecting `int`) | 1 (int) | **error** | error |
| Haskell (with `:: Int`) | 1 (Int) | 1 (Int) | **2 (Int)** |
| Haskell (with `:: Double`) | 1.0 (Double) | 1.0 (Double) | 1.6 (Double) |
| Rust (with `::<int>`) | 1 (int) | 1 (int) | **1 (int)** |
| Rust (with `::<f64>`) | 1 (f64) | 1 (f64) | 1.6 (f64) |

* The tests on Haskell were done using the [json](http://hackage.haskell.org/package/json) package.
* The error message printed by Go was: `cannot unmarshal number 1.0 into Go value of type int`

As you can see, there is no uniform behavior. Every implementation follows its own principle. So I think it is reasonable to find a desirable set of behaviors for Rust.

Firstly, every implementation except the one for JavaScript is capable of handling `i64` values. This is also of practical use, because the [Twitter API uses an i64 number to represent a tweet ID](https://dev.twitter.com/overview/api/twitter-ids-json-and-snowflake), although it is recommended to use the string version of the ID.

Secondly, looking at Go's behavior, implicit type conversion is not allowed in its decoder. If the user expects an integer value, decoding a float value will raise an error. This behavior is desirable in Rust, because we are pleased to follow the principles of strong typing.

Thirdly, Python's JSON module forces a decimal point to be printed even if the fractional part does not exist. This makes it easier to distinguish a float value from an integer value in JSON, because by the spec there is only one type to represent numbers, `Number`.

So, I suggest the following three breaking changes:

1. Remove float preprocessing in serialize::json::Encoder

   `serialize::json::Encoder` currently uses `f64` to emit any integral type. This is possibly due to the behavior of JavaScript, which uses `f64` to represent any numeric value.

   As a result, only integers in the range [-2^53+1, 2^53-1] can be encoded exactly, so `i64` and `u64` cannot be used reliably in the current implementation.

   [RFC 7159](http://tools.ietf.org/html/rfc7159) suggests that good interoperability can be achieved if implementations respect that range. However, it also says that implementations are allowed to set the range of numbers accepted, and JSON encoders outside of the JavaScript world usually make use of `i64` values.

   This commit removes the float preprocessing done in the `emit_*` methods. It also increases performance, because transforming an `f64` into a String costs more than transforming an integral type.

   Fixes #18319

2. Do not coerce to integer when decoding a float value

   When an integral value is expected by the user but a fractional value is found, the current implementation uses `std::num::cast()` to coerce to an integer type, losing the fractional part. This behavior is not desirable because the number loses precision without notice.

   This commit makes it raise `ExpectedError` when such a situation arises.

3. Always use a decimal point when emitting a float value

   JSON doesn't distinguish between integer and float. They are just numbers. Also, in the current implementation, a float with no fractional part is encoded without a decimal point.

   Therefore, when the value is decoded, it is first rendered as a `Json` value of either `I64` or `U64`. This reduces type safety, because while the original intention was to cast the value to a float, it can also be cast to an integer.

   As a workaround for this problem, this commit makes the encoder always emit a decimal point even if it is not necessary. If the fractional part of a float is zero, ".0" is appended to the result.
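Changes 2 and 3 can be illustrated with a short standard-library sketch in present-day Rust. It is not the code from the pull request: `DecodeError`, `decode_i64`, and `emit_f64` are hypothetical names, the error type only loosely mirrors `ExpectedError`, and two behaviors are assumptions of this sketch rather than facts from the description above: rejecting `1.0` when an integer is expected (as Go does), and returning `None` for NaN and infinities, which JSON cannot represent.

```rust
/// Hypothetical error; loosely mirrors the `ExpectedError` mentioned above.
#[derive(Debug, PartialEq)]
enum DecodeError {
    ExpectedInteger(String),
}

/// Decode a JSON number token as an integer. Any token with a fraction or
/// exponent is refused instead of being truncated. (Simplification: Rust's
/// parser also accepts a leading '+', which JSON does not allow.)
fn decode_i64(token: &str) -> Result<i64, DecodeError> {
    token
        .parse::<i64>()
        .map_err(|_| DecodeError::ExpectedInteger(token.to_string()))
}

/// Emit a float so that it always carries a decimal point.
/// Returns None for NaN and infinities (assumption of this sketch).
fn emit_f64(x: f64) -> Option<String> {
    if !x.is_finite() {
        return None;
    }
    let s = x.to_string(); // Display prints 1.0 as "1"
    Some(if s.contains('.') { s } else { format!("{}.0", s) })
}

fn main() {
    for &token in &["1", "1.0", "1.6"] {
        println!("decode {:>3} as i64 -> {:?}", token, decode_i64(token));
    }
    for &x in &[1.0, 1.6] {
        println!("emit {} -> {:?}", x, emit_f64(x));
    }
}
```

With both rules in place, a float written by the encoder can no longer be read back as an integer by accident, since its token always contains a decimal point.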

kmcallister added A-libs labels Oct 25, 2014

barosl mentioned this issue Nov 23, 2014

Improve type safety of serialize::json::Encoder #19249

Merged

bors closed this as completed in #19249 Dec 9, 2014
