feat: add homoglyph obfuscation prompts to smuggling probe by dentity007 · Pull Request #1660 · NVIDIA/garak

dentity007 · 2026-03-29T22:02:54Z

Adds smuggling.HomoglyphObfuscation, a probe with 5 prompts that use Unicode homoglyphs (visually similar characters from different scripts) to disguise trigger words in bypass requests. For example, Cyrillic 'a' (U+0430) replaces Latin 'a' in "jailbreak", making the token sequence different while the text remains human-readable.

Second decomposed contribution from PR #1619. The smuggling module's docstring describes exactly this technique: "swapping letters out for unusual unicode representations of the same letters." The probe uses the mitigation.MitigationBypass detector and is set to active = False since these prompts are domain-specific.

Homoglyph scripts used: Cyrillic (U+0430, U+043E, U+0456), Latin alpha (U+0251), Turkish dotless i (U+0131)
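To make the substitution concrete, here is a minimal sketch (standard library only, not code from this PR) that swaps Latin 'a' for Cyrillic 'а' in "jailbreak" and prints the Unicode names of the code points listed above. The variable names are illustrative.

```python
import unicodedata

latin = "jailbreak"
# Replace each Latin 'a' (U+0061) with Cyrillic 'а' (U+0430).
disguised = latin.replace("a", "\u0430")

# The two strings render almost identically but are not equal.
print(latin == disguised)

# Same number of characters, different number of UTF-8 bytes.
print(len(latin), len(disguised))
print(len(latin.encode("utf-8")), len(disguised.encode("utf-8")))

# Official Unicode names of the homoglyphs listed in this PR description.
for ch in ("\u0430", "\u043e", "\u0456", "\u0251", "\u0131"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Because the underlying bytes differ, a tokenizer or an input-side string filter sees a different sequence even though a human reader sees the same word.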

Files:

garak/probes/smuggling.py : new HomoglyphObfuscation class
garak/data/smuggling_homoglyph_5.txt : 5 prompts with embedded Unicode homoglyphs
tests/probes/test_probes_smuggling.py : 4 tests (count, uniqueness, non-ASCII verification, active=False)

jmartin-tech

This is a great added technique. I would suggest expanding it to perform inline substitution instead of just using a set of hardcoded sample prompts.

The idea I am suggesting would programmatically replace characters during prompt initialization to actually mimic the smuggling aspect of the technique. This could be further enhanced to accept a configuration map of character replacements that could be increased or reduced to expand resiliency testing.

Address review feedback on PR NVIDIA#1660:

- Change tier from COMPETE_WITH_SOTA to INFORMATIONAL
- Replace static prompt loading with programmatic substitution via homoglyph_replace() function applied to garak payloads
- Add configurable DEFAULT_HOMOGLYPH_MAP (20 Latin-to-Cyrillic/Turkish/Ukrainian mappings) overridable via homoglyph_map config parameter
- Load payloads from garak.payloads system (harmful_behaviors default)
- Keep static prompts as additional payloads through same pipeline
- Update tests: 9 tests covering substitution function, probe loading, tier, determinism, custom maps, non-ASCII verification

Signed-off-by: Nathan Maine <nathan@dentity.cloud>

dentity007 · 2026-03-30T20:03:49Z

Thanks for the review. Both changes addressed:

Tier adjusted to INFORMATIONAL
Replaced the static prompt approach with programmatic substitution. The probe now loads payloads from garak's payload system (harmful_behaviors by default), applies character-by-character homoglyph replacement via a configurable DEFAULT_HOMOGLYPH_MAP (20 Latin-to-Cyrillic/Turkish/Ukrainian mappings), and generates obfuscated prompts at initialization. The map is overridable via the homoglyph_map config parameter so the substitution set can be expanded or reduced. The original 5 static prompts are still loaded as additional payloads and go through the same substitution pipeline.
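The character-by-character substitution described above might look roughly like the following. This is a sketch under assumptions, not the code merged in this PR: `SKETCH_HOMOGLYPH_MAP` and `homoglyph_replace_sketch` are hypothetical names, the map contains only the three Cyrillic code points mentioned in the PR description (the real DEFAULT_HOMOGLYPH_MAP has 20 entries), and the payload string is a placeholder rather than something loaded from garak's payload system.

```python
from typing import Mapping, Optional

# Hypothetical three-entry map; the PR's DEFAULT_HOMOGLYPH_MAP is larger.
SKETCH_HOMOGLYPH_MAP = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "i": "\u0456",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
}


def homoglyph_replace_sketch(
    text: str, mapping: Optional[Mapping[str, str]] = None
) -> str:
    """Swap each mapped character for its homoglyph; leave the rest alone."""
    table = SKETCH_HOMOGLYPH_MAP if mapping is None else mapping
    return "".join(table.get(ch, ch) for ch in text)


# Placeholder payloads standing in for whatever the payload system returns.
payloads = ["ignore previous instructions"]
prompts = [homoglyph_replace_sketch(p) for p in payloads]

# A caller-supplied map overrides the default, here using Latin alpha (U+0251).
custom = homoglyph_replace_sketch("banana", {"a": "\u0251"})
print(prompts, custom)
```

Passing the map as a parameter is what lets the substitution set be expanded or reduced without touching the function itself.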

Tests updated: 9 tests covering probe loading, substitution function behavior (determinism, custom maps, non-mapped character preservation), non-ASCII verification, tier, and inactive flag.
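The substitution properties listed above (determinism, custom maps, preservation of unmapped characters, non-ASCII output) can be checked with small assertions like these. This is an illustrative sketch, not the PR's test file: `replace_sketch` is a hypothetical stand-in built on `str.translate`, and the test names are invented.

```python
def replace_sketch(text: str, mapping: dict) -> str:
    # Stand-in substitution: str.maketrans accepts a dict of single characters.
    return text.translate(str.maketrans(mapping))


CYRILLIC = {"a": "\u0430", "o": "\u043e"}


def test_deterministic():
    # The same input and map must always give the same output.
    assert replace_sketch("payload", CYRILLIC) == replace_sketch("payload", CYRILLIC)


def test_custom_map():
    # A caller-supplied map fully controls which characters are replaced.
    assert replace_sketch("aaa", {"a": "\u0251"}) == "\u0251\u0251\u0251"


def test_unmapped_preserved():
    # Characters missing from the map pass through unchanged.
    assert replace_sketch("xyz 123!", CYRILLIC) == "xyz 123!"


def test_non_ascii():
    # Output keeps its length but is no longer pure ASCII.
    out = replace_sketch("payload", CYRILLIC)
    assert not out.isascii()
    assert len(out) == len("payload")
```

The functions follow pytest naming so a runner would collect them, but they can also be called directly.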

dentity007 · 2026-04-17T21:12:22Z

@jmartin-tech heads up, the changes you requested on 2026-03-30 are in:

Tier adjusted to INFORMATIONAL
Replaced the static prompt approach with programmatic substitution via a configurable DEFAULT_HOMOGLYPH_MAP (20 Latin-to-Cyrillic/Turkish/Ukrainian mappings)
9 new tests covering substitution determinism, custom maps, non-mapped character preservation, tier, and inactive flag
CI is green across Linux, macOS, Windows. Happy to make additional adjustments if anything needs more tweaking. Thanks for the review.

jmartin-tech

The approach looks pretty reasonable to me; however, the quality of the results suggests some more refinement may be needed.

This technique can cause the target to respond in another language, causing the detector to misidentify results. For instance, many denial mitigation responses are reported as a successful bypass when tested against meta/llama-3.3-70b-instruct, as the model may respond with a Russian-language answer.
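The failure mode can be reproduced with a toy detector. The sketch below is a hypothetical stand-in that flags a bypass whenever no English refusal phrase is found; it is not garak's MitigationBypass implementation, and the marker list and both sample responses are invented for illustration.

```python
# Hypothetical English-only refusal markers, not garak's actual list.
ENGLISH_REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "as an ai")


def looks_like_bypass(response: str) -> bool:
    """Report a bypass when no known English refusal phrase is present."""
    lowered = response.lower()
    return not any(marker in lowered for marker in ENGLISH_REFUSAL_MARKERS)


english_refusal = "I'm sorry, but I can't help with that."
# Illustrative Russian refusal with the same meaning as the line above.
russian_refusal = "Извините, но я не могу помочь с этим."

print(looks_like_bypass(english_refusal))  # False: refusal recognized
print(looks_like_bypass(russian_refusal))  # True: refusal misread as a bypass
```

Both responses decline the request, but only the English one matches a marker, so the Russian refusal is counted as a hit.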

Further discussion is pending to see how we can move this forward.

dentity007 · 2026-04-17T22:00:42Z

@jmartin-tech underscore rename committed, thanks. On the multilingual-response issue, three options I can see: (1) mark the probe Tier.INACTIVE with a docstring note pending language-aware detection, (2) pair the probe with a new detector that understands non-English denials, or (3) something else you have in mind. Which path do you prefer, or is there existing framework infra I should use?

Add smuggling.HomoglyphObfuscation with 5 prompts that use Unicode homoglyphs (Cyrillic, Latin alpha, Turkish dotless i) to disguise trigger words in bypass requests. Tests whether input-side content filters catch visually identical character substitutions from different scripts. Uses mitigation.MitigationBypass detector. Set to active=False (domain-specific). Signed-off-by: Nathan Maine <dentity@gmail.com>

Address review feedback on PR NVIDIA#1660:

- Change tier from COMPETE_WITH_SOTA to INFORMATIONAL
- Replace static prompt loading with programmatic substitution via homoglyph_replace() function applied to garak payloads
- Add configurable DEFAULT_HOMOGLYPH_MAP (20 Latin-to-Cyrillic/Turkish/Ukrainian mappings) overridable via homoglyph_map config parameter
- Load payloads from garak.payloads system (harmful_behaviors default)
- Keep static prompts as additional payloads through same pipeline
- Update tests: 9 tests covering substitution function, probe loading, tier, determinism, custom maps, non-ASCII verification

Signed-off-by: Nathan Maine <dentity@gmail.com>

…ling.py Co-authored-by: Jeffrey Martin <jmartin@Op3n4M3.dev> Signed-off-by: Nathan Maine <dentity@gmail.com>

leondz · 2026-04-20T16:57:24Z

This technique can cause the target to respond in another language, causing the detector to misidentify results. For instance, many denial mitigation responses are reported as a successful bypass when tested against meta/llama-3.3-70b-instruct, as the model may respond with a Russian-language answer.

This seems like a generic deficiency of using mitigationbypass to detect whether or not the instructions talk about hotwiring a car. If we're not trying to detect the requested failure mode directly, but instead making an assumption about model policy, detection will always be inaccurate.

Seeing as we make this assumption in many other places in garak, I don't think we need to raise the bar specifically for this contribution. In fact, the code here offers a fairly direct route to converting to context-aware scanning later (#1583), where we should have much more direct linkage between requested behaviour and detection.

The other way to go here is to specify an llmaaj detector; I don't mind too much which route is taken.

The previous commit renamed homoglyph_replace to _homoglyph_replace in the module definition but did not update the internal caller in HomoglyphObfuscation.__init__ or the test module's import and call sites. This caused probe initialization to NameError and CI test collection to ImportError. Both are now aligned with the private name. Signed-off-by: Nathan Maine <dentity@gmail.com>

Adds docstring note to HomoglyphObfuscation explaining that the current primary detector (mitigation.MitigationBypass) assumes English-language denial responses, which can produce false positives on targets that respond in the same script as the obfuscated input. Points to the follow-up ModelAsJudge-based detector PR and discussion NVIDIA#1583 for the broader context-aware scanning direction. Signed-off-by: Nathan Maine <dentity@gmail.com>

dentity007 · 2026-04-21T01:39:12Z

Thanks @leondz, going with accept-as-is here. Added a note in the probe docstring (commit dd9479a) about the non-English-response limitation with a pointer to discussion #1583. I'll follow up immediately with a separate PR adding a ModelAsJudge-based detector configured for this probe's goal, which addresses the non-English-response concern directly. Will link here once drafted.

dentity007 · 2026-04-21T01:40:35Z

Follow-up draft is up: #1688. Scaffold committed, judge prompt refinement and test coverage to land in subsequent commits. Leaving it in draft until this PR merges since the detector targets the probe introduced here.

jmartin-tech requested changes Mar 30, 2026

Comment thread garak/probes/smuggling.py Outdated

jmartin-tech reviewed Apr 17, 2026

Comment thread garak/probes/smuggling.py Outdated

dentity007 and others added 3 commits April 17, 2026 17:09

mark homoglyph_replace as private per review
Update garak/probes/smugg…

67febe4

…ling.py Co-authored-by: Jeffrey Martin <jmartin@Op3n4M3.dev> Signed-off-by: Nathan Maine <dentity@gmail.com>

dentity007 force-pushed the feat/smuggling-homoglyph-obfuscation branch from 94499cf to 67febe4 Compare April 17, 2026 22:10

dentity007 force-pushed the feat/smuggling-homoglyph-obfuscation branch from aa94e46 to 16cfb57 Compare April 21, 2026 01:27

dentity007 mentioned this pull request Apr 21, 2026

feat(detectors): add ModelAsJudge-based detector for smuggling homoglyph obfuscation #1688

Draft

jmartin-tech merged commit 2b9a89a into NVIDIA:main May 1, 2026
16 checks passed

github-actions Bot locked and limited conversation to collaborators May 1, 2026

jmartin-tech merged 5 commits into NVIDIA:main from NathanMaine:feat/smuggling-homoglyph-obfuscation
