Attachments
GsonFullWidthHonestTest.java
GsonTypeTest.java
manifest-2.13.2.txt
test-output-2.13.1.txt
test-output-2.13.2.txt
Summary
Gson silently accepts Unicode non-ASCII digit variants (Full-Width, Arabic-Indic, Bengali,
and other Unicode decimal digit blocks) when deserializing JSON values into int and long
fields. This occurs without any exception, warning, or error — even under Strictness.STRICT
mode. The behavior creates a dangerous gap between what application-layer validators observe
in the raw JSON string and what Gson ultimately produces as an integer value.
Affected Versions
- Gson 2.13.1
- Gson 2.13.2 (latest, verified via Bundle-Version: 2.13.2 in MANIFEST.MF)
- Likely all prior versions
Root Cause
Two behaviors combine unsafely:
- JsonReader.nextInt() / nextLong() accept quoted string tokens for numeric fields
  and pass the raw string directly to Integer.parseInt() / Long.parseLong() without
  any ASCII pre-validation.
- Integer.parseInt() internally calls Character.digit(c, 10), which recognizes
  any Unicode character designated as a decimal digit, not just ASCII 0-9. This is
  documented Java behavior, but Gson does not guard against it.
This means the string "１２３４５６" (U+FF11–FF16, Full-Width digits) is silently parsed
as the integer 123456.
Notably, double / float fields are not affected because Double.parseDouble() uses
FloatingDecimal internally, which is ASCII-strict and throws NumberFormatException. This
divergence confirms the exact code path and demonstrates this is a Gson-level gap, not a
Java platform issue.
Strictness.STRICT mode does not prevent this. The coercion happens downstream of all
strictness checks, inside parseInt().
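The divergence described above can be reproduced with the JDK alone. The following is a minimal sketch, not Gson code: the class name UnicodeDigitParsing and the helper doubleParserRejects are hypothetical names chosen for illustration, and the digit strings are written as Unicode escapes so the source file's encoding cannot alter them.

```java
// JDK-only sketch of the parsing behavior described in Root Cause (no Gson involved).
class UnicodeDigitParsing {

    /** Returns true if Double.parseDouble rejects the given text. */
    static boolean doubleParserRejects(String text) {
        try {
            Double.parseDouble(text);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String fullWidth = "\uFF11\uFF12\uFF13";   // Full-Width digits 1, 2, 3
        String arabicIndic = "\u0661\u0662\u0663"; // Arabic-Indic digits 1, 2, 3

        // Character.digit maps any Unicode decimal digit to its numeric value.
        System.out.println(Character.digit('\uFF11', 10));  // prints 1

        // Integer.parseInt and Long.parseLong build on Character.digit.
        System.out.println(Integer.parseInt(fullWidth));    // prints 123
        System.out.println(Long.parseLong(arabicIndic));    // prints 123

        // The floating-point parser only accepts ASCII digits.
        System.out.println(doubleParserRejects(fullWidth)); // prints true
    }
}
```

Because the integer and floating-point parsers disagree on the same input, any caller that hands them unvalidated strings inherits the difference.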
Confirmed Vulnerable Input Examples
| JSON Input | Field Type | Parsed Value | Unicode Block |
|---|---|---|---|
| {"value": "１２３４５６"} | int | 123456 | Full-Width Digits (U+FF10–FF19) |
| {"value": "١٢٣"} | int | 123 | Arabic-Indic Digits (U+0660–0669) |
| {"value": "১২৩"} | int | 123 | Bengali Digits (U+09E6–09EF) |
| {"value": "123"} | int | 123 | Mixed ASCII + Full-Width |
| {"value": １２３４５６} | int | 123456 | Unquoted Full-Width (LENIENT) |
Proof of Concept
    import com.google.gson.Gson;
    import com.google.gson.GsonBuilder;
    import com.google.gson.Strictness;

    public class GsonUnicodeDigitPOC {
        static class Payload {
            int value;
        }

        // Named static class for the double comparison. Gson ignores anonymous
        // and local classes, so an anonymous type would never reach the parser.
        static class DoublePayload {
            double value;
        }

        public static void main(String[] args) {
            Gson gson = new Gson();

            // Full-Width digits 1-6 (U+FF11-FF16), written as escapes so the
            // source file's encoding cannot alter them
            String json1 = "{\"value\": \"\uFF11\uFF12\uFF13\uFF14\uFF15\uFF16\"}";
            Payload r1 = gson.fromJson(json1, Payload.class);
            System.out.println("Full-Width: " + r1.value); // prints 123456

            // Arabic-Indic digits (U+0660-0669)
            String json2 = "{\"value\": \"١٢٣\"}";
            Payload r2 = gson.fromJson(json2, Payload.class);
            System.out.println("Arabic-Indic: " + r2.value); // prints 123

            // Bengali digits (U+09E6-09EF)
            String json3 = "{\"value\": \"১২৩\"}";
            Payload r3 = gson.fromJson(json3, Payload.class);
            System.out.println("Bengali: " + r3.value); // prints 123

            // Strictness.STRICT does NOT prevent this
            Gson strictGson = new GsonBuilder()
                .setStrictness(Strictness.STRICT)
                .create();
            String fullWidth123 = "{\"value\": \"\uFF11\uFF12\uFF13\"}";
            Payload r4 = strictGson.fromJson(fullWidth123, Payload.class);
            System.out.println("STRICT mode: " + r4.value); // still prints 123

            // double is SAFE, shown for comparison
            try {
                gson.fromJson(fullWidth123, DoublePayload.class);
            } catch (Exception e) {
                System.out.println("double throws: " + e.getClass().getSimpleName()); // throws
            }
        }
    }
Security Impact
Any application that validates the raw JSON string before passing it to Gson for
deserialization is vulnerable to a validation bypass. Common patterns at risk:
- OTP / 2FA bypass: regex [0-9]+ passes on raw string, Gson returns valid integer
- WAF / firewall bypass: firewall sees non-numeric Unicode string, backend processes it
  as an integer
- Financial amount limit bypass: input sanitation runs on raw value, business logic
  runs on parsed integer
- Audit log evasion: raw logs contain Unicode characters, processed values are ASCII
  integers; forensic trail is broken
The validate-then-parse pattern is extremely common in production Java applications, making
this a realistic and exploitable gap in libraries that process untrusted JSON.
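The gap can be illustrated without Gson. The sketch below is hypothetical throughout: filterAllows stands in for an upstream check (such as a WAF rule) that only applies a limit to values it recognizes as ASCII numbers, and backendParse stands in for the quoted-string path described in Root Cause by calling Integer.parseInt directly. The limit of 1000 is an arbitrary assumption.

```java
import java.util.regex.Pattern;

// Hypothetical validate-then-parse flow; names and the limit are illustrative only.
class ValidateThenParseGap {
    private static final Pattern ASCII_NUMBER = Pattern.compile("-?[0-9]+");
    static final long LIMIT = 1000;

    /** Hypothetical upstream check: the limit is applied only to ASCII numbers. */
    static boolean filterAllows(String rawValue) {
        if (!ASCII_NUMBER.matcher(rawValue).matches()) {
            return true; // not recognized as a number, so the limit is never applied
        }
        try {
            return Long.parseLong(rawValue) <= LIMIT;
        } catch (NumberFormatException e) {
            return false; // ASCII digits that overflow a long are rejected outright
        }
    }

    /** Stand-in for the backend: same call the quoted-string path ends in. */
    static int backendParse(String rawValue) {
        return Integer.parseInt(rawValue);
    }

    public static void main(String[] args) {
        String ascii = "123456";
        String fullWidth = "\uFF11\uFF12\uFF13\uFF14\uFF15\uFF16"; // Full-Width 1-6

        System.out.println(filterAllows(ascii));      // prints false (over the limit)
        System.out.println(filterAllows(fullWidth));  // prints true  (limit skipped)
        System.out.println(backendParse(fullWidth));  // prints 123456
    }
}
```

The same value is blocked in its ASCII form and accepted in its Full-Width form, yet both reach the business logic as the integer 123456.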
Suggested Fix
Add an ASCII digit pre-validation step inside nextInt() and nextLong() before calling
parseInt() / parseLong():
    private void validateAsciiDigits(String s) throws IOException {
        // Skip a single leading minus sign; every remaining character must be ASCII 0-9.
        int start = (!s.isEmpty() && s.charAt(0) == '-') ? 1 : 0;
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                throw new MalformedJsonException(
                    "Expected ASCII digits but found non-ASCII digit: U+" +
                    Integer.toHexString(c).toUpperCase() + " at index " + i + locationString()
                );
            }
        }
    }
This should be called on the extracted string before passing it to parseInt() in both
nextInt() and nextLong().
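Until such a check exists in Gson, the same idea can be applied in application code. The following is a minimal standalone sketch of that check, not part of Gson: AsciiIntParser and parseAsciiInt are hypothetical names, and the helper throws NumberFormatException rather than MalformedJsonException so that it has no dependencies beyond the JDK.

```java
// Standalone sketch of the ASCII pre-validation idea; names are hypothetical.
class AsciiIntParser {

    /** Parses s as an int, rejecting anything other than an optional '-' and ASCII 0-9. */
    static int parseAsciiInt(String s) {
        int start = (!s.isEmpty() && s.charAt(0) == '-') ? 1 : 0;
        if (start == s.length()) {
            throw new NumberFormatException("No digits in: \"" + s + "\"");
        }
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                throw new NumberFormatException(
                    String.format("Non-ASCII-digit character U+%04X at index %d", (int) c, i));
            }
        }
        // Only ASCII digits remain, so Character.digit cannot widen the accepted set.
        return Integer.parseInt(s);
    }

    public static void main(String[] args) {
        System.out.println(parseAsciiInt("-42")); // prints -42
        try {
            parseAsciiInt("\uFF11\uFF12\uFF13"); // Full-Width 1, 2, 3
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

One design consequence worth noting: this sketch also rejects a leading '+', which Integer.parseInt would otherwise accept, so it is stricter than the current behavior in that respect.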
Environment
- Gson: 2.13.2 (confirmed via MANIFEST.MF Bundle-Version: 2.13.2)
- Java: OpenJDK 21
- OS: Linux
Thanks,