
Encoding mixups: stop assuming UTF-8 by default and record encoding decisions at lifecycle boundaries #644

@robertDouglass

Description


Problem

We hit encoding failures on Windows where content ended up mixed between Windows-1252 (cp1252) and UTF-8. The deeper problem is broader than one code path: Spec Kitty appears biased toward assuming UTF-8 across many reads and writes instead of first establishing which encoding the incoming content actually uses.

This showed up in a Windows + Gemini workflow, but the issue is not Gemini-specific and not limited to one artifact. Charter content, mission artifacts, generated markdown, templates, and other persisted text can all be affected if the system decodes too early under a UTF-8 assumption.

Core issue

The system needs an explicit encoding contract, not scattered best-effort assumptions.

Right now the repo already has some validation/sanitization utilities, but the lifecycle still appears to have gaps around:

  • detecting the source encoding when content is first ingested or generated,
  • recording the encoding decision or normalization decision as provenance/metadata,
  • re-checking that contract at important lifecycle boundaries,
  • failing clearly when content is mixed, ambiguous, or already corrupted,
  • avoiding silent propagation of mis-decoded text into downstream prompts and artifacts.
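A minimal sketch of what a detection chokepoint could look like in Python. This is illustrative only: `decode_ingested_text` is a hypothetical name, not an existing Spec Kitty API, and the fallback order (strict UTF-8, then cp1252, then fail loudly) is one reasonable policy, not the only one.

```python
def decode_ingested_text(raw: bytes) -> tuple[str, str]:
    """Decode raw bytes, returning (text, detected_encoding).

    Hypothetical chokepoint: try strict UTF-8 first (with BOM
    handling), fall back to cp1252, and raise instead of silently
    guessing on anything else.
    """
    if raw.startswith(b"\xef\xbb\xbf"):  # UTF-8 BOM
        return raw[3:].decode("utf-8"), "utf-8-sig"
    try:
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        return raw.decode("cp1252", errors="strict"), "cp1252"
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"undecodable content at byte {exc.start}"
        ) from exc
```

The key property is that every ingestion path funnels through one function whose decision is observable, rather than each call site doing its own `errors="replace"` decode.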

Why this matters

If content is decoded under the wrong assumption once, corruption spreads. By the time a user notices garbled characters, the bad text may already be embedded in charter state, mission files, prompts, logs, or synced artifacts.

Windows-originated content makes this easier to trigger because cp1252 is still common in some editors, shells, copy/paste paths, and generated output. But the real bug is the product-level assumption that UTF-8 can be treated as the default truth without first detecting and recording the contract.
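The asymmetry is worth spelling out, because it explains why the corruption is often silent. A stdlib-only demonstration:

```python
# cp1252 bytes decoded as UTF-8: hard failure (é is 0xE9, which is
# an invalid UTF-8 sequence here), so at least the error is visible.
raw_cp1252 = "café".encode("cp1252")
try:
    raw_cp1252.decode("utf-8")
except UnicodeDecodeError as exc:
    print(f"invalid UTF-8 at byte {exc.start}")

# UTF-8 bytes decoded as cp1252: no error at all, just silently
# corrupted text ("cafÃ©") that can propagate into downstream
# prompts, artifacts, and logs.
raw_utf8 = "café".encode("utf-8")
print(raw_utf8.decode("cp1252"))
```

The second direction is the dangerous one: nothing throws, so without an explicit contract the garbled text is persisted as if it were correct.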

Requested behavior

  1. Introduce a general encoding-detection chokepoint for externally sourced or newly ingested text content.
  2. Record the decided encoding or normalization result in provenance/metadata where the lifecycle depends on it.
  3. Re-validate that contract at critical boundaries such as charter load/compile and mission begin/start.
  4. If content is mixed or ambiguous, fail with a targeted diagnostic that says what was detected, where, and how to repair it.
  5. Normalize persisted markdown/text to UTF-8 only after the source encoding decision is known.
  6. Audit broad UTF-8 assumptions so the system does not silently mis-decode content before validation happens.
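Steps 2 and 3 could be sketched as a provenance record written at normalization time and re-checked at lifecycle boundaries. Everything here is hypothetical (the dataclass, field names, and digest-based check are assumptions for illustration, not Spec Kitty internals):

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class EncodingProvenance:
    source: str             # path or logical artifact name
    detected_encoding: str  # what the detection chokepoint decided
    normalized_to: str      # normalization target, e.g. "utf-8"
    sha256: str             # digest of the normalized bytes


def record_normalization(
    source: str, raw: bytes, detected: str
) -> tuple[str, EncodingProvenance]:
    """Decode with the detected encoding, normalize to UTF-8,
    and record the decision alongside a content digest."""
    text = raw.decode(detected)
    normalized = text.encode("utf-8")
    return text, EncodingProvenance(
        source=source,
        detected_encoding=detected,
        normalized_to="utf-8",
        sha256=hashlib.sha256(normalized).hexdigest(),
    )


def revalidate(raw: bytes, prov: EncodingProvenance) -> None:
    """Re-check the contract at a boundary (e.g. charter load):
    bytes must match the recorded digest and still decode cleanly."""
    if hashlib.sha256(raw).hexdigest() != prov.sha256:
        raise ValueError(
            f"{prov.source}: content changed since normalization"
        )
    raw.decode(prov.normalized_to)  # raises if no longer valid UTF-8
```

With a record like this persisted next to the artifact, a user debugging garbled output can see which encoding decision was made and where, instead of reverse-engineering it from the bytes.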

Important scope note

This is not just a charter bug.

Charter and mission-begin are important checkpoints because they are high-leverage lifecycle boundaries, but the underlying issue is more general: encoding assumptions are distributed across the system and need a canonical policy plus provenance.

Acceptance criteria

  • Windows-originated cp1252 content is either safely normalized to UTF-8 or rejected with a precise diagnostic before corruption spreads.
  • The system records what encoding contract or normalization decision it relied on for critical lifecycle inputs.
  • Mission start and charter-related lifecycle steps do not silently consume mixed-encoding text.
  • Users can inspect the recorded encoding decision later when debugging provenance.
  • Broad UTF-8 assumptions are reduced behind explicit detection/validation chokepoints rather than remaining ad hoc.
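For the "precise diagnostic" criterion, a possible shape is a per-line scan that reports where content stops being valid UTF-8 and suggests a repair, rather than failing opaquely. This is a sketch under the assumption that artifacts are line-oriented text; the function name and message format are invented for illustration:

```python
def diagnose_mixed_utf8(raw: bytes) -> list[str]:
    """Report each line that is not valid UTF-8, with the offending
    byte, surrounding context, and a repair hint."""
    problems = []
    for lineno, line in enumerate(raw.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            snippet = line[max(exc.start - 5, 0):exc.start + 5]
            problems.append(
                f"line {lineno}, byte {exc.start}: not UTF-8 "
                f"(0x{line[exc.start]:02x} near {snippet!r}); "
                "likely cp1252; re-save the file as UTF-8"
            )
    return problems
```

A diagnostic at this granularity tells the user what was detected, where, and how to repair it, which is exactly what a mixed-encoding failure needs.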

Notes

Observed in a Windows + Gemini workflow, but the issue should be treated as a general encoding-mixup problem across Spec Kitty rather than a one-off Windows artifact or a single charter-path bug.

Labels: bug