[Security] Silent coercion of Unicode non-ASCII digit variants in nextInt()/nextLong() bypasses application-level input validation #2994

## Attachments

[GsonFullWidthHonestTest.java](https://github.com/user-attachments/files/25977662/GsonFullWidthHonestTest.java)
[GsonTypeTest.java](https://github.com/user-attachments/files/25977660/GsonTypeTest.java)
[manifest-2.13.2.txt](https://github.com/user-attachments/files/25977664/manifest-2.13.2.txt)
[test-output-2.13.1.txt](https://github.com/user-attachments/files/25977661/test-output-2.13.1.txt)
[test-output-2.13.2.txt](https://github.com/user-attachments/files/25977663/test-output-2.13.2.txt)

## Summary

Gson silently accepts non-ASCII Unicode decimal digits (Full-Width, Arabic-Indic, Bengali,
and other Unicode decimal digit blocks) when deserializing JSON values into `int` and `long`
fields. No exception, warning, or error is raised, even under `Strictness.STRICT`. This
creates a gap between what an application-layer validator sees in the raw JSON text and the
integer value Gson ultimately produces.

## Affected Versions

- Gson 2.13.1
- Gson 2.13.2 (latest, verified via `Bundle-Version: 2.13.2` in MANIFEST.MF)
- Likely all prior versions

## Root Cause

Two behaviors combine unsafely:

1. **`JsonReader.nextInt()` / `nextLong()`** accept quoted string tokens for numeric fields
   and pass the raw string directly to `Integer.parseInt()` / `Long.parseLong()` without
   any ASCII pre-validation.

2. **`Integer.parseInt()`** internally calls `Character.digit(c, 10)`, which recognizes
   *any* Unicode character designated as a decimal digit, not just ASCII `0-9`. This is
   documented Java behavior, but Gson does not guard against it.

This means the string `"１２３４５６"` (U+FF11–U+FF16, Full-Width digits) is silently parsed
as the integer `123456`.

Notably, `double` / `float` fields are **not** affected: in OpenJDK, `Double.parseDouble()`
delegates to `FloatingDecimal`, which only accepts ASCII digits and throws
`NumberFormatException` otherwise. This divergence pinpoints the code path and indicates
that this is a Gson-level gap, not a Java platform issue.
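
The divergence between the integer and floating-point parsers can be reproduced with the JDK alone, without Gson. The following is a minimal sketch: the class name `UnicodeDigitParseDemo` and its helper method are illustrative only, and the `Double.parseDouble` behavior is what OpenJDK exhibits.

```java
final class UnicodeDigitParseDemo {

    /** Returns true if Double.parseDouble rejects the given text. */
    static boolean doubleParserRejects(String text) {
        try {
            Double.parseDouble(text);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String fullWidth = "\uFF11\uFF12\uFF13";   // Full-Width "１２３"
        String arabicIndic = "\u0661\u0662\u0663"; // Arabic-Indic "١٢٣"

        // Character.digit maps non-ASCII decimal digits to their numeric value.
        System.out.println(Character.digit('\uFF11', 10));
        // parseInt / parseLong build on Character.digit, so both strings parse.
        System.out.println(Integer.parseInt(fullWidth));
        System.out.println(Long.parseLong(arabicIndic));
        // The floating-point parser only understands ASCII digits.
        System.out.println(doubleParserRejects(fullWidth));
    }
}
```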

`Strictness.STRICT` mode does **not** prevent this. The coercion happens downstream of all 
strictness checks, inside `parseInt()`.

## Confirmed Vulnerable Input Examples

| JSON Input | Field Type | Parsed Value | Unicode Block |
|---|---|---|---|
| `{"value": "１２３４５６"}` | `int` | `123456` | Full-Width Digits (U+FF10–FF19) |
| `{"value": "١٢٣"}` | `int` | `123` | Arabic-Indic Digits (U+0660–0669) |
| `{"value": "১২৩"}` | `int` | `123` | Bengali Digits (U+09E6–09EF) |
| `{"value": "1２3"}` | `int` | `123` | Mixed ASCII + Full-Width |
| `{"value": １２３４５６}` | `int` | `123456` | Unquoted Full-Width (LENIENT) |
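
To double-check which code points a given payload actually contains, a small JDK-only helper can dump each character with its Unicode block and decimal value. This is an illustrative sketch; `DigitInspector` and `describe` are hypothetical names and not part of the attached tests.

```java
final class DigitInspector {

    /** Describes each character of the input: code point, Unicode block, radix-10 digit value. */
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(String.format("U+%04X %s digit=%d%n",
                (int) c, Character.UnicodeBlock.of(c), Character.digit(c, 10)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Mixed ASCII + Full-Width input from the table above.
        System.out.print(describe("1\uFF123"));
    }
}
```

A `digit` value other than `-1` on a character outside `0-9` marks exactly the characters that `parseInt()` would accept in radix 10.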

## Proof of Concept

```java
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.Strictness;

public class GsonUnicodeDigitPOC {

    static class Payload {
        int value;
    }

    // Named static class: Gson ignores anonymous and local classes
    // (fromJson returns null for them), so the comparison needs a real type.
    static class DoublePayload {
        double value;
    }

    public static void main(String[] args) {
        Gson gson = new Gson();

        // Full-Width digits (U+FF10-FF19)
        String json1 = "{\"value\": \"１２３４５６\"}";
        Payload r1 = gson.fromJson(json1, Payload.class);
        System.out.println("Full-Width: " + r1.value);  // prints 123456

        // Arabic-Indic digits (U+0660-0669)
        String json2 = "{\"value\": \"١٢٣\"}";
        Payload r2 = gson.fromJson(json2, Payload.class);
        System.out.println("Arabic-Indic: " + r2.value);  // prints 123

        // Bengali digits (U+09E6-09EF)
        String json3 = "{\"value\": \"১২৩\"}";
        Payload r3 = gson.fromJson(json3, Payload.class);
        System.out.println("Bengali: " + r3.value);  // prints 123

        // Strictness.STRICT does NOT prevent this
        Gson strictGson = new GsonBuilder()
            .setStrictness(Strictness.STRICT)
            .create();
        Payload r4 = strictGson.fromJson("{\"value\": \"１２３\"}", Payload.class);
        System.out.println("STRICT mode: " + r4.value);  // still prints 123

        // double fields are not affected; shown for comparison
        try {
            gson.fromJson("{\"value\": \"１２３\"}", DoublePayload.class);
            System.out.println("double: no exception (unexpected)");
        } catch (RuntimeException e) {
            System.out.println("double throws: " + e.getClass().getSimpleName());
        }
    }
}
```

## Security Impact

Any application that validates the raw JSON text before passing it to Gson for
deserialization may be exposed to a **validation bypass**, because the validator and Gson
disagree about what counts as a number. Common patterns at risk:

- **OTP / 2FA checks**: a filter built on the ASCII regex `[0-9]+` finds no digits in the
  raw string, yet Gson returns a valid integer
- **WAF / firewall bypass**: the firewall sees a non-numeric Unicode string, while the
  backend processes an integer
- **Financial amount limit bypass**: input sanitization runs on the raw value, while
  business logic runs on the parsed integer
- **Audit log evasion**: raw logs contain Unicode characters, while processed values are
  ASCII integers, so the forensic trail no longer matches

The validate-then-parse pattern is common in production Java applications, which makes
this a realistic gap for libraries that process untrusted JSON.
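
The validate-then-parse gap can be illustrated with a hypothetical raw-text check that looks for oversized amounts using an ASCII-only regex. Everything here is an assumption made for illustration: the class `ValidateThenParseGap`, the limit of 1000, and the `amount` field are invented, and `Integer.parseInt` stands in for the parsing step that Gson performs.

```java
import java.math.BigInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ValidateThenParseGap {

    private static final Pattern ASCII_NUMBER = Pattern.compile("[0-9]+");
    private static final BigInteger LIMIT = BigInteger.valueOf(1000);

    /** Hypothetical raw-text check: rejects input containing an ASCII number above LIMIT. */
    static boolean passesRawCheck(String rawJson) {
        Matcher m = ASCII_NUMBER.matcher(rawJson);
        while (m.find()) {
            if (new BigInteger(m.group()).compareTo(LIMIT) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String amount = "\uFF19\uFF19\uFF19\uFF19\uFF19\uFF19"; // Full-Width "９９９９９９"
        String rawJson = "{\"amount\": \"" + amount + "\"}";

        // The raw-text check finds no ASCII digits, so nothing exceeds the limit.
        System.out.println("raw check passes: " + passesRawCheck(rawJson));
        // Integer.parseInt stands in for the parsing step and yields a large value.
        System.out.println("parsed amount: " + Integer.parseInt(amount));
    }
}
```

The same check correctly rejects the ASCII form `"999999"`, which is what makes the mismatch easy to miss in testing.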

## Suggested Fix

Add an ASCII digit pre-validation step inside `nextInt()` and `nextLong()` before calling 
`parseInt()` / `parseLong()`:

```java
// Rejects any character other than ASCII '0'-'9' after an optional leading '-'.
// Note: as written, this also rejects a leading '+' and forms such as "1.0" or "1e2".
private void validateAsciiDigits(String s) throws IOException {
    int start = (!s.isEmpty() && s.charAt(0) == '-') ? 1 : 0;
    for (int i = start; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c < '0' || c > '9') {
            throw new MalformedJsonException(
                "Expected ASCII digits but found U+"
                    + String.format("%04X", (int) c) + " at index " + i + locationString());
        }
    }
}
```

This should be called on the extracted string **before** passing it to `parseInt()` in both 
`nextInt()` and `nextLong()`.
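
A narrower variant of this check, which flags only non-ASCII digits and leaves signs, decimal points, and exponents to the existing parsing logic, might look like the sketch below. It is written as a standalone class so it can be run on its own; `NonAsciiDigitCheck` and `indexOfNonAsciiDigit` are hypothetical names, not Gson APIs, and how such a check would be wired into `JsonReader` is left open.

```java
final class NonAsciiDigitCheck {

    /**
     * Returns the index of the first character that radix-10 parseInt would
     * accept as a digit although it is not ASCII '0'-'9', or -1 if there is none.
     */
    static int indexOfNonAsciiDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean asciiDigit = c >= '0' && c <= '9';
            if (!asciiDigit && Character.digit(c, 10) != -1) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfNonAsciiDigit("-42"));      // plain ASCII number
        System.out.println(indexOfNonAsciiDigit("1.0"));      // decimal form, no non-ASCII digit
        System.out.println(indexOfNonAsciiDigit("1\uFF123")); // mixed ASCII + Full-Width
    }
}
```

The trade-off is that this variant only targets the Unicode digit coercion; any other malformed input is still left for `parseInt()` / `parseLong()` to reject.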

## Environment

- Gson: 2.13.2 (confirmed via MANIFEST.MF `Bundle-Version: 2.13.2`)
- Java: OpenJDK 21
- OS: Linux


Thanks,
- rootplinix (Abu Hurayra)
- GitHub: @the5orcerer 