Problem
We hit encoding failures on Windows where content ended up mixed between Windows-1252/cp1252 and UTF-8. The deeper problem is broader than one code path: Spec Kitty appears biased toward assuming UTF-8 across many reads/writes instead of first establishing what encoding incoming content is actually using.
This showed up in a Windows + Gemini workflow, but the issue is not Gemini-specific and not limited to one artifact. Charter content, mission artifacts, generated markdown, templates, and other persisted text can all be affected if the system decodes too early under a UTF-8 assumption.
Core issue
The system needs an explicit encoding contract, not scattered best-effort assumptions.
The repo already has some validation/sanitization utilities, but the lifecycle still appears to have gaps around:
- detecting the source encoding when content is first ingested or generated,
- recording the encoding decision or normalization decision as provenance/metadata,
- re-checking that contract at important lifecycle boundaries,
- failing clearly when content is mixed, ambiguous, or already corrupted,
- avoiding silent propagation of mis-decoded text into downstream prompts and artifacts.
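As a sketch of what a single detection chokepoint could look like (the function name and heuristic are illustrative, not existing Spec Kitty API):

```python
def detect_text_encoding(raw: bytes) -> str:
    """Decide an encoding once, at ingestion, instead of assuming UTF-8.

    Minimal heuristic sketch: strict UTF-8 first, then cp1252 if the
    bytes use the 0x80-0x9F range that is printable only in cp1252.
    """
    if raw.startswith(b"\xef\xbb\xbf"):  # UTF-8 BOM is unambiguous
        return "utf-8-sig"
    try:
        raw.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 0x80-0x9F holds cp1252's smart quotes, dashes, etc.; in latin-1
    # these are C1 control codes that almost never appear in real text.
    if any(0x80 <= b <= 0x9F for b in raw):
        return "cp1252"
    return "latin-1"  # fallback codec that never fails to decode
```

A production version would also flag mixed content (valid UTF-8 sequences interleaved with stray cp1252 bytes) instead of picking a single codec.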
Why this matters
If content is decoded under the wrong assumption once, corruption spreads. By the time a user notices garbled characters, the bad text may already be embedded in charter state, mission files, prompts, logs, or synced artifacts.
Windows-originated content makes this easier to trigger because cp1252 is still common in some editors, shells, copy/paste paths, and generated output. But the real bug is the product-level assumption that UTF-8 can be treated as the default truth without first detecting and recording the contract.
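The failure mode is easy to reproduce in either direction with plain Python (no Spec Kitty code involved), which shows why a wrong first decode is so destructive:

```python
# Direction 1: UTF-8 bytes mis-decoded as cp1252 -> silent mojibake.
original = "café naïve"
mojibake = original.encode("utf-8").decode("cp1252")
assert mojibake == "cafÃ© naÃ¯ve"  # no error raised; corruption propagates

# Direction 2: cp1252 bytes decoded as strict UTF-8 at least fail loudly.
try:
    "café".encode("cp1252").decode("utf-8")
except UnicodeDecodeError as exc:
    print(f"invalid UTF-8 at byte offset {exc.start}")  # offset 3 (the 0xE9 byte)
```

Direction 1 is the dangerous case: nothing raises, so the garbled text gets persisted as if it were correct, which is exactly how corruption spreads into downstream artifacts.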
Requested behavior
- Introduce a general encoding-detection chokepoint for externally sourced or newly ingested text content.
- Record the decided encoding or normalization result in provenance/metadata where the lifecycle depends on it.
- Re-validate that contract at critical boundaries such as charter load/compile and mission begin/start.
- If content is mixed or ambiguous, fail with a targeted diagnostic that says what was detected, where, and how to repair it.
- Normalize persisted markdown/text to UTF-8 only after the source encoding decision is known.
- Audit broad UTF-8 assumptions so the system does not silently mis-decode content before validation happens.
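One possible shape for the normalize-plus-provenance step described above (the sidecar file name and record fields are hypothetical, not an existing Spec Kitty format):

```python
import hashlib
import json
from pathlib import Path

def ingest_text(raw: bytes, source: str, provenance_dir: Path) -> str:
    """Normalize incoming bytes to UTF-8 only after deciding the source
    encoding, and persist that decision next to the artifact."""
    try:
        text = raw.decode("utf-8")
        detected = "utf-8"
    except UnicodeDecodeError:
        # Simplified fallback; a real implementation would detect and
        # reject mixed or ambiguous content here instead of assuming cp1252.
        text = raw.decode("cp1252")
        detected = "cp1252"

    record = {
        "source": source,
        "detected_encoding": detected,
        "normalized_to": "utf-8",
        "sha256_of_raw": hashlib.sha256(raw).hexdigest(),
    }
    provenance_dir.mkdir(parents=True, exist_ok=True)
    sidecar = provenance_dir / "encoding-provenance.json"
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return text
```

Recording a hash of the raw bytes alongside the decision lets a later audit verify exactly which input the encoding decision was made from.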
Important scope note
This is not just a charter bug.
Charter and mission-begin are important checkpoints because they are high-leverage lifecycle boundaries, but the underlying issue is more general: encoding assumptions are distributed across the system and need a canonical policy plus provenance.
Acceptance criteria
- Windows-originated cp1252 content is either safely normalized to UTF-8 or rejected with a precise diagnostic before corruption spreads.
- The system records what encoding contract or normalization decision it relied on for critical lifecycle inputs.
- Mission start and charter-related lifecycle steps do not silently consume mixed-encoding text.
- Users can inspect the recorded encoding decision later when debugging provenance.
- Broad UTF-8 assumptions are consolidated behind explicit detection/validation chokepoints rather than remaining ad hoc.
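A boundary re-check that would satisfy the "precise diagnostic" criterion might look like this (function name and message wording are illustrative):

```python
def require_utf8(raw: bytes, where: str) -> str:
    """Re-validate the encoding contract at a lifecycle boundary
    (e.g. charter load, mission begin) and fail with a targeted
    diagnostic rather than silently consuming mis-encoded text."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        context = raw[max(exc.start - 10, 0): exc.start + 10]
        raise ValueError(
            f"{where}: byte 0x{raw[exc.start]:02X} at offset {exc.start} "
            f"is not valid UTF-8 (context: {context!r}). "
            "Re-save the source as UTF-8, or re-ingest it through the "
            "encoding chokepoint so the contract is recorded."
        ) from exc
```

The diagnostic names the location (`where`), the offending byte and offset, and surrounding context, which is enough for a user to find and repair the input.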
Notes
Observed in a Windows + Gemini workflow, but the issue should be treated as a general encoding-mixup problem across Spec Kitty rather than a one-off Windows artifact or a single charter-path bug.