[Security] Silent coercion of Unicode non-ASCII digit variants in nextInt()/nextLong() bypasses application-level input validation #2994

@the5orcerer

Attachments

GsonFullWidthHonestTest.java
GsonTypeTest.java
manifest-2.13.2.txt
test-output-2.13.1.txt
test-output-2.13.2.txt

Summary

Gson silently accepts Unicode non-ASCII digit variants (Full-Width, Arabic-Indic, Bengali,
and other Unicode decimal digit blocks) when deserializing JSON values into int and long
fields. This occurs without any exception, warning, or error — even under Strictness.STRICT
mode. The behavior creates a dangerous gap between what application-layer validators observe
in the raw JSON string and what Gson ultimately produces as an integer value.

Affected Versions

  • Gson 2.13.1
  • Gson 2.13.2 (latest, verified via Bundle-Version: 2.13.2 in MANIFEST.MF)
  • Likely all prior versions

Root Cause

Two behaviors combine unsafely:

  1. JsonReader.nextInt() / nextLong() accept quoted string tokens for numeric fields
    and pass the raw string directly to Integer.parseInt() / Long.parseLong() without
    any ASCII pre-validation.

  2. Integer.parseInt() internally calls Character.digit(c, 10), which recognizes
    any Unicode character designated as a decimal digit — not just ASCII 0-9. This is
    Java spec behavior, but Gson does not guard against it.

This means the string "１２３４５６" (Full-Width digits U+FF11–U+FF16) is silently parsed
as the integer 123456.

Notably, double / float fields are not affected because Double.parseDouble() uses
FloatingDecimal internally, which is ASCII-strict and throws NumberFormatException. This
divergence confirms the exact code path and demonstrates this is a Gson-level gap, not a
Java platform issue.

Strictness.STRICT mode does not prevent this. The coercion happens downstream of all
strictness checks, inside parseInt().
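
The divergence between the integer and floating-point parsers can be reproduced on the JDK alone, without Gson on the classpath. The following is a minimal sketch; the class and helper names (ParseDivergenceDemo, tryParseInt, tryParseDouble) are illustrative and are not part of Gson or of the attached tests.

```java
public class ParseDivergenceDemo {

    /** Returns the parsed int, or null if Integer.parseInt rejects the input. */
    static Integer tryParseInt(String s) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Returns the parsed double, or null if Double.parseDouble rejects the input. */
    static Double tryParseDouble(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Full-width 1, 2, 3 written as escapes so the source file stays ASCII.
        String fullWidth = "\uFF11\uFF12\uFF13";

        System.out.println(Character.digit(fullWidth.charAt(0), 10)); // prints 1
        System.out.println(tryParseInt(fullWidth));                   // prints 123
        System.out.println(tryParseDouble(fullWidth));                // prints null
    }
}
```

The integer path accepts the string because Character.digit() maps any Unicode decimal digit to its value, while the floating-point path rejects it.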

Confirmed Vulnerable Input Examples

JSON Input                 Field Type   Parsed Value   Unicode Block
-------------------------  ----------   ------------   ---------------------------------
{"value": "１２３４５６"}    int          123456         Full-Width Digits (U+FF10–FF19)
{"value": "١٢٣"}           int          123            Arabic-Indic Digits (U+0660–0669)
{"value": "১২৩"}           int          123            Bengali Digits (U+09E6–09EF)
{"value": "1２3"}          int          123            Mixed ASCII + Full-Width
{"value": １２３４５６}      int          123456         Unquoted Full-Width (LENIENT)
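
Inputs like the ones in the table can be generated for any decimal digit block by offsetting from the block's zero code point, which may help when writing regression tests. This is a small sketch; DigitVariants and toBlock are hypothetical names, and the three zero code points are the block starts listed in the table.

```java
public class DigitVariants {

    /**
     * Maps each ASCII digit in asciiDigits to the digit of equal value in the
     * Unicode block whose zero digit is zeroCodePoint.
     */
    static String toBlock(String asciiDigits, int zeroCodePoint) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < asciiDigits.length(); i++) {
            char c = asciiDigits.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("not an ASCII digit at index " + i);
            }
            sb.appendCodePoint(zeroCodePoint + (c - '0'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] zeros = {0xFF10, 0x0660, 0x09E6}; // Full-Width, Arabic-Indic, Bengali
        for (int zero : zeros) {
            String variant = toBlock("123", zero);
            // Integer.parseInt maps every variant back to the same value, 123.
            System.out.printf("zero=U+%04X parsed=%d%n", zero, Integer.parseInt(variant));
        }
    }
}
```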

Proof of Concept

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.Strictness;

public class GsonUnicodeDigitPOC {

    static class Payload {
        int value;
    }

    // A named static class is used for the double comparison because Gson
    // does not deserialize anonymous classes.
    static class DoublePayload {
        double value;
    }

    public static void main(String[] args) {
        Gson gson = new Gson();

        // Full-Width digits (U+FF10-FF19), written as escapes so the source stays ASCII
        String json1 = "{\"value\": \"\uFF11\uFF12\uFF13\uFF14\uFF15\uFF16\"}";
        Payload r1 = gson.fromJson(json1, Payload.class);
        System.out.println("Full-Width: " + r1.value);  // prints 123456

        // Arabic-Indic digits (U+0660-0669)
        String json2 = "{\"value\": \"١٢٣\"}";
        Payload r2 = gson.fromJson(json2, Payload.class);
        System.out.println("Arabic-Indic: " + r2.value);  // prints 123

        // Bengali digits (U+09E6-09EF)
        String json3 = "{\"value\": \"১২৩\"}";
        Payload r3 = gson.fromJson(json3, Payload.class);
        System.out.println("Bengali: " + r3.value);  // prints 123

        // Strictness.STRICT does NOT prevent this
        Gson strictGson = new GsonBuilder()
            .setStrictness(Strictness.STRICT)
            .create();
        String fullWidth123 = "{\"value\": \"\uFF11\uFF12\uFF13\"}";
        Payload r4 = strictGson.fromJson(fullWidth123, Payload.class);
        System.out.println("STRICT mode: " + r4.value);  // still prints 123

        // double is SAFE, for comparison: the same input is rejected
        try {
            new GsonBuilder().create().fromJson(fullWidth123, DoublePayload.class);
        } catch (Exception e) {
            System.out.println("double throws: " + e.getClass().getSimpleName()); // throws
        }
    }
}

Security Impact

Any application that validates the raw JSON string before passing it to Gson for
deserialization is vulnerable to a validation bypass. Common patterns at risk:

  • OTP / 2FA bypass: a Unicode-aware digit check on the raw string (Character.isDigit(),
    or \d with Pattern.UNICODE_CHARACTER_CLASS) passes, and Gson returns a valid integer;
    an ASCII-only regex such as [0-9]+ would reject the input
  • WAF / firewall bypass: the firewall sees a non-numeric Unicode string, the backend
    processes an integer
  • Financial amount limit bypass: input sanitation runs on the raw value, business logic
    runs on the parsed integer
  • Audit log evasion: raw logs contain Unicode characters, processed values are ASCII
    integers; the forensic trail is broken

The validate-then-parse pattern is extremely common in production Java applications, making
this a realistic and exploitable gap in libraries that process untrusted JSON.
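
The mismatch between a raw-text check and the parsed value can be sketched without Gson. In the example below, Integer.parseInt() stands in for the value Gson would produce; the class name, the filter methods, and the blocked value 999999 are all hypothetical and chosen only for illustration.

```java
public class ValidateThenParseDemo {

    /** Hypothetical raw-text filter: blocks one value by comparing the unparsed string. */
    static boolean rawFilterAllows(String raw) {
        return !raw.equals("999999");
    }

    /** The same rule applied to the parsed value instead of the raw text. */
    static boolean parsedFilterAllows(int value) {
        return value != 999999;
    }

    public static void main(String[] args) {
        // Six full-width nines (U+FF19); the raw text differs from the ASCII string.
        String variant = "\uFF19\uFF19\uFF19\uFF19\uFF19\uFF19";

        // Stand-in for the int that ends up in the deserialized field.
        int parsed = Integer.parseInt(variant);

        System.out.println(rawFilterAllows(variant));   // prints true: the raw-text rule is bypassed
        System.out.println(parsedFilterAllows(parsed)); // prints false: the parsed-value rule still fires
    }
}
```

Applying the rule to the parsed value rather than to the raw text is one way for an application to close the gap on its own side, independent of any change in Gson.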

Suggested Fix

Add an ASCII digit pre-validation step inside nextInt() and nextLong() before calling
parseInt() / parseLong():

private void validateAsciiDigits(String s) throws IOException {
    // Skip one leading minus sign; every remaining character must be ASCII 0-9.
    int start = (!s.isEmpty() && s.charAt(0) == '-') ? 1 : 0;
    for (int i = start; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c < '0' || c > '9') {
            throw new MalformedJsonException(
                "Expected ASCII digits but found U+"
                    + String.format("%04X", (int) c) + " at index " + i + locationString()
            );
        }
    }
}

This should be called on the extracted string before it is passed to parseInt() in
nextInt() and to parseLong() in nextLong().
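
To see how the proposed check behaves on edge cases, here is a standalone sketch of the same loop. It throws IllegalArgumentException as a stand-in for MalformedJsonException so that it runs without Gson; AsciiDigitCheck and passes are hypothetical names.

```java
public class AsciiDigitCheck {

    /**
     * Standalone version of the proposed check. Throws IllegalArgumentException
     * as a stand-in for MalformedJsonException.
     */
    static void validateAsciiDigits(String s) {
        int start = (!s.isEmpty() && s.charAt(0) == '-') ? 1 : 0;
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException(
                    String.format("found U+%04X at index %d", (int) c, i));
            }
        }
    }

    /** Returns true if the check accepts s, false if it throws. */
    static boolean passes(String s) {
        try {
            validateAsciiDigits(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(passes("-42"));                // prints true
        System.out.println(passes("\uFF11\uFF12\uFF13")); // prints false: full-width digits
        System.out.println(passes("+5"));                 // prints false, although Integer.parseInt("+5") is 5
        System.out.println(passes(""));                   // prints true: rejection is left to parseInt
        System.out.println(passes("-"));                  // prints true: rejection is left to parseInt
    }
}
```

The empty string and a lone minus sign pass the check, so it only works as a pre-filter: the later parseInt() / parseLong() call still has to reject those inputs.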

Environment

  • Gson: 2.13.2 (confirmed via MANIFEST.MF Bundle-Version: 2.13.2)
  • Java: OpenJDK 21
  • OS: Linux

Thanks,
