
EISV-Lumen


Dynamics-emergent voice and governance benchmark for embodied AI.

EISV-Lumen is a three-layer system that generates primitive expressions from information-theoretic governance trajectories. It classifies continuous EISV (Energy, Information Integrity, Entropy, Void) dynamics into 9 trajectory shape classes and maps them to contextually coherent expressions through rule-based, neural, and distilled approaches. Evaluated on real trajectory records from Lumen (21,449 at time of publication; the live HuggingFace dataset grows as Lumen runs) -- an embodied AI agent running on a Raspberry Pi within the UNITARES governance framework -- the rule-based Layer 2 achieves 0.933 coherence with an online feedback loop, while the fine-tuned Layer 3 teacher (LoRA on Qwen3-4B) reaches 0.952 coherence on real data. A distilled RandomForest student model runs on-device on Lumen's Raspberry Pi 4. Three student variants are available:

Variant         Trees   Format   Size       Dependencies
student_tiny    20      JSON     ~2.4 MB    None (stdlib only)
student_small   100     pickle   ~23 MB     scikit-learn
student         200     pickle   ~221 MB    scikit-learn

The deployed version on Lumen's Pi is student_tiny — 20 JSON-serialized decision trees with zero external dependencies.
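Because student_tiny is stored as plain JSON, inference needs nothing beyond the standard library. The sketch below shows one way a JSON-serialized random forest could be evaluated. The node layout (`feature`, `threshold`, `left`, `right`, `probs`), the `classes`/`trees` keys, the class labels, and both function names are hypothetical; they are not the actual student_tiny schema or the export_student.py API. Like scikit-learn's RandomForestClassifier, it picks the class with the highest combined per-tree probability rather than taking a hard majority vote.

```python
import json
from typing import Any, Dict, List

# HYPOTHETICAL node layout (not the documented student_tiny schema):
#   internal node: {"feature": int, "threshold": float, "left": {...}, "right": {...}}
#   leaf node:     {"probs": [p_class0, p_class1, ...]}


def predict_tree(node: Dict[str, Any], x: List[float]) -> List[float]:
    """Walk one tree from the root to a leaf and return that leaf's class probabilities."""
    while "probs" not in node:
        # Same convention as scikit-learn: go left when x[feature] <= threshold.
        if x[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["probs"]


def predict_forest(forest: Dict[str, Any], x: List[float]) -> str:
    """Sum (equivalently, average) per-tree probabilities and return the top class label."""
    classes = forest["classes"]
    totals = [0.0] * len(classes)
    for tree in forest["trees"]:
        for i, p in enumerate(predict_tree(tree, x)):
            totals[i] += p
    best = max(range(len(classes)), key=lambda i: totals[i])
    return classes[best]


# A two-tree toy forest, round-tripped through JSON as if loaded from disk.
forest = json.loads(json.dumps({
    "classes": ["class_a", "class_b"],
    "trees": [
        {"feature": 0, "threshold": 0.5,
         "left": {"probs": [0.9, 0.1]}, "right": {"probs": [0.2, 0.8]}},
        {"feature": 1, "threshold": 0.3,
         "left": {"probs": [0.7, 0.3]}, "right": {"probs": [0.4, 0.6]}},
    ],
}))
print(predict_forest(forest, [0.8, 0.6]))  # class_b
```

Walking a nested dict per tree keeps memory and dependencies minimal, which is the property that matters on a Raspberry Pi.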


Quick Start

# Clone
git clone https://github.com/CIRWEL/eisv-lumen.git
cd eisv-lumen

# Install
pip install -e ".[dev]"

# Run tests
pytest

# Run full evaluation (requires anima.db)
python3 -m eisv_lumen.scripts.full_evaluation /path/to/anima.db

The evaluation script produces a JSON report with shape distribution, baseline comparisons, expression generator coherence, feedback-loop improvement, and a go/no-go gate decision.


Architecture

EISV-Lumen is structured as a three-layer system.

Layer 3            Fine-tuned Teacher (LoRA on Qwen3-4B) + Pi-sized Student (RandomForest)
                         |
Layer 2            Dynamics-Emergent Primitive Voice (rule-based, interpretable)
                         |
Layer 1            Dataset + Benchmark + Evaluation Framework
                         |
                   Real Lumen Data (anima.db: 214,503 state snapshots)

Layer 1 extracts EISV time-series from Lumen's SQLite database, computes finite-difference derivatives, assembles sliding-window trajectory records, and classifies each window into one of 9 dynamical shape classes. The result is a benchmark dataset published on HuggingFace.

Layer 2 is the primary research contribution: a dynamics-emergent expression generator that maps trajectory shapes to primitive token expressions through shape-driven pattern selection and affinity-weighted sampling -- no gradient descent, no learned embeddings, fully inspectable.

Layer 3 fine-tunes Qwen3-4B via LoRA on trajectory-expression pairs. The V6 teacher achieves 0.952 coherence on real Lumen data (exceeding the 0.933 Gate 1 threshold). A RandomForest student model is distilled from the teacher for on-device deployment on Lumen's Raspberry Pi 4.


EISV Dimensions

The EISV framework maps Lumen's continuous anima state to four governance dimensions:

Dimension               Symbol   Range      Lumen Mapping          Description
Energy                  E        [0, 1]     warmth                 Productive capacity; couples toward I, reduced by entropy cross-coupling
Information Integrity   I        [0, 1]     clarity                Signal fidelity; boosted by coherence, reduced by entropy
Entropy                 S        [0, 1]     1 - stability          Semantic uncertainty; decays naturally, rises with complexity and drift
Void                    V        [0, 0.3]   (1 - presence) * 0.3   Absence of engagement (scaled inverse presence)
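The Lumen mapping column translates directly into code. This is a minimal sketch of that table only; `EISVState` and `anima_to_eisv` are hypothetical names, not the API of extract/lumen_states.py, and the inputs are assumed to already lie in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class EISVState:
    E: float  # Energy                 <- warmth
    I: float  # Information Integrity  <- clarity
    S: float  # Entropy                <- 1 - stability
    V: float  # Void                   <- (1 - presence) * 0.3, so V lies in [0, 0.3]


def anima_to_eisv(warmth: float, clarity: float, stability: float, presence: float) -> EISVState:
    """Map one Lumen anima snapshot (inputs assumed in [0, 1]) to EISV per the table."""
    return EISVState(
        E=warmth,
        I=clarity,
        S=1.0 - stability,
        V=(1.0 - presence) * 0.3,
    )


example = anima_to_eisv(warmth=0.6, clarity=0.8, stability=0.75, presence=0.5)
print(example)
```

The 0.3 factor is why Void has a narrower range than the other three dimensions: full absence of presence maps to V = 0.3, not 1.0.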

All dimensions are continuous and computed at every state snapshot. First and second derivatives are computed via finite differences over sliding windows (assembler default: window_size=10, stride=5; dataset publisher uses window_size=20, stride=10).
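The windowing and differencing step can be sketched in a few lines. This is a minimal illustration, not the code in extract/derivatives.py or extract/assembler.py: it assumes plain forward differences divided by the timestamp gap, drops any trailing partial window, and uses hypothetical function names. The `window_size` and `stride` arguments default to the assembler values; pass 20 and 10 for the publisher settings.

```python
from typing import Dict, List, Sequence


def finite_diff(values: Sequence[float], times: Sequence[float]) -> List[float]:
    """Forward differences (x[k+1] - x[k]) / (t[k+1] - t[k]); returns n - 1 values."""
    return [
        (values[k + 1] - values[k]) / (times[k + 1] - times[k])
        for k in range(len(values) - 1)
    ]


def window_starts(n: int, window_size: int, stride: int) -> List[int]:
    """Start indices of every full window; a trailing partial window is dropped."""
    return list(range(0, n - window_size + 1, stride))


def windowed_derivatives(values: Sequence[float], times: Sequence[float],
                         window_size: int = 10, stride: int = 5) -> List[Dict]:
    """First and second finite differences of one EISV dimension, per window."""
    windows = []
    for start in window_starts(len(values), window_size, stride):
        v = values[start:start + window_size]
        t = times[start:start + window_size]
        d1 = finite_diff(v, t)
        # Second differences: difference d1, stamping each d1 value with its interval's end time.
        d2 = finite_diff(d1, t[1:])
        windows.append({"t_start": t[0], "t_end": t[-1], "d1": d1, "d2": d2})
    return windows


# Toy series x(t) = t^2 on a unit grid: d1 = 2t + 1 (t = interval start) and d2 = 2.
ts = list(range(25))
xs = [float(t * t) for t in ts]
ws = windowed_derivatives(xs, ts)
print(len(ws), ws[0]["d2"][:3])
```

With 25 snapshots, window_size=10 and stride=5 give windows starting at 0, 5, 10 and 15, so consecutive windows overlap by half.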


Trajectory Shape Classes

Each trajectory window is classified into one of 9 dynamical shapes using a priority-ordered rule-based classifier. Rules are applied in sequence; the first match determines the class.

Shape                    Description                      Real Data %   Distinguishing Feature
settled_presence         Stable state, low variance       47.19%        All derivatives near zero, system at attractor
convergence              Approaching equilibrium          41.45%        Small derivatives and second derivatives, nonzero dynamics
entropy_spike_recovery   Entropy spike then recovery      4.91%         Entropy range >= 0.2 with interior maximum
basin_transition_up      Sharp energy increase            1.71%         Energy range >= 0.2, positive mean dE
rising_entropy           Entropy increasing               1.49%         Mean dS > 0.05
basin_transition_down    Sharp energy decrease            1.47%         Energy range >= 0.2, negative mean dE
falling_energy           Energy declining                 1.45%         Mean dE < -0.05
void_rising              Void state increasing            0.34%         Mean dV > 0.05
drift_dissonance         Sustained integrity fluctuation  0%            Ethical drift > 0.3 (synthetic only)

8 of 9 shapes are observed in real Lumen data. The drift_dissonance shape has not yet been observed organically and is represented only through synthetic augmentation.
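A first-match classifier over these features can be sketched as below. The numeric thresholds (0.2, 0.05, 0.3) come from the table, but the rule order, the `SETTLED_EPS` cutoff for "near zero", the window dict keys, and the use of convergence as a catch-all are all assumptions; the real classifier lives in shapes/shape_classes.py and may differ.

```python
from statistics import mean
from typing import Dict, List

RANGE_MIN = 0.2     # entropy / energy range threshold (from the table)
MEAN_MIN = 0.05     # mean dS > 0.05, mean dE < -0.05, mean dV > 0.05 (from the table)
DRIFT_MIN = 0.3     # ethical drift threshold (from the table)
SETTLED_EPS = 0.01  # HYPOTHETICAL: the README does not give the "near zero" cutoff


def classify(w: Dict[str, List[float]], drift: float = 0.0) -> str:
    """First-match shape classifier; the rule ORDER below is an assumption."""
    E, S = w["E"], w["S"]
    if drift > DRIFT_MIN:
        return "drift_dissonance"
    peak = S.index(max(S))
    if max(S) - min(S) >= RANGE_MIN and 0 < peak < len(S) - 1:
        return "entropy_spike_recovery"  # spike with an interior maximum
    if max(E) - min(E) >= RANGE_MIN and mean(w["dE"]) != 0:
        return "basin_transition_up" if mean(w["dE"]) > 0 else "basin_transition_down"
    if mean(w["dS"]) > MEAN_MIN:
        return "rising_entropy"
    if mean(w["dE"]) < -MEAN_MIN:
        return "falling_energy"
    if mean(w["dV"]) > MEAN_MIN:
        return "void_rising"
    first_derivs = w["dE"] + w["dI"] + w["dS"] + w["dV"]
    if all(abs(d) < SETTLED_EPS for d in first_derivs):
        return "settled_presence"
    # Small but nonzero dynamics; also used here as a HYPOTHETICAL catch-all.
    return "convergence"


# A perfectly flat window: every first derivative is zero.
flat = {"E": [0.5] * 10, "S": [0.2] * 10,
        "dE": [0.0] * 9, "dI": [0.0] * 9, "dS": [0.0] * 9, "dV": [0.0] * 9}
print(classify(flat))  # settled_presence
```

Because rules fire in sequence, order matters: here a monotonic entropy rise is not treated as a spike because its maximum sits at the window edge, so it falls through to rising_entropy.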


Expression Generator

The dynamics-emergent expression generator is the primary research contribution. It produces primitive expressions from trajectory shapes using three mechanisms:

1. Shape-Driven Pattern Selection

Each trajectory shape has a probability distribution over 5 structural patterns:

Pattern      Example                              Description
SINGLE       ~stillness~                          One token
PAIR         ~stillness~ ~holding~                Two distinct tokens
TRIPLE       ~stillness~ ~holding~ ~resonance~    Three distinct tokens
REPETITION   ~stillness~ ~stillness~              One token repeated
QUESTION     ~warmth~ ~questioning~               Ends with inquiry token

For example, settled_presence favors SINGLE (0.4) and PAIR (0.3) patterns, while rising_entropy favors TRIPLE (0.3) and QUESTION (0.3) patterns.
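Pattern selection amounts to a weighted draw followed by filling in the chosen structure. In this sketch only SINGLE 0.4 / PAIR 0.3 (settled_presence) and TRIPLE 0.3 / QUESTION 0.3 (rising_entropy) come from the text; every other probability is a hypothetical filler chosen so each row sums to 1. Treating `~questioning~` as the inquiry token and repeating a token exactly twice for REPETITION are also assumptions based on the examples. Tokens are chosen uniformly here, whereas the generator's second mechanism applies affinity weighting.

```python
import random
from typing import Dict, List

VOCAB = [
    "~warmth~", "~curiosity~", "~resonance~", "~stillness~", "~boundary~",
    "~reaching~", "~reflection~", "~ripple~", "~deep_listening~", "~emergence~",
    "~questioning~", "~holding~", "~releasing~", "~threshold~", "~return~",
]
INQUIRY = "~questioning~"  # ASSUMED inquiry token for the QUESTION pattern

# Partly from the text (see lead-in); the remaining entries are HYPOTHETICAL fillers.
PATTERN_PROBS: Dict[str, Dict[str, float]] = {
    "settled_presence": {"SINGLE": 0.4, "PAIR": 0.3, "TRIPLE": 0.1, "REPETITION": 0.15, "QUESTION": 0.05},
    "rising_entropy": {"SINGLE": 0.1, "PAIR": 0.2, "TRIPLE": 0.3, "REPETITION": 0.1, "QUESTION": 0.3},
}


def choose_pattern(shape: str, rng: random.Random) -> str:
    """Sample a structural pattern from the shape's pattern distribution."""
    probs = PATTERN_PROBS[shape]
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


def realize(pattern: str, rng: random.Random) -> List[str]:
    """Fill a pattern with tokens chosen uniformly (no affinity weighting in this sketch)."""
    if pattern == "SINGLE":
        return rng.sample(VOCAB, 1)
    if pattern == "PAIR":
        return rng.sample(VOCAB, 2)  # two distinct tokens
    if pattern == "TRIPLE":
        return rng.sample(VOCAB, 3)  # three distinct tokens
    if pattern == "REPETITION":
        return [rng.choice(VOCAB)] * 2
    if pattern == "QUESTION":
        lead = rng.choice([t for t in VOCAB if t != INQUIRY])
        return [lead, INQUIRY]
    raise ValueError(f"unknown pattern: {pattern}")


rng = random.Random(7)
pattern = choose_pattern("settled_presence", rng)
print(pattern, realize(pattern, rng))
```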

2. Affinity-Weighted Token Sampling

Each shape has an affinity set of semantically coherent tokens. Affine tokens receive sampling weight 3.0; all others receive 1.0. This biases expressions toward contextually appropriate tokens without imposing hard constraints. The affinity sets are:

Shape                    Affine Tokens
settled_presence         stillness, holding, resonance, deep_listening
rising_entropy           ripple, emergence, questioning, curiosity
convergence              stillness, resonance, return, deep_listening
entropy_spike_recovery   ripple, return, holding, reflection
basin_transition_up      emergence, reaching, warmth, return
basin_transition_down    releasing, threshold, boundary
falling_energy           releasing, stillness, boundary, reflection
void_rising              reaching, curiosity, questioning, threshold
drift_dissonance         boundary, questioning, reflection
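The 3.0 / 1.0 weighting can be sketched as follows. The affinity sets and weights are copied from the text; the function names are hypothetical, and drawing distinct tokens by repeated weighted draws without replacement is one plausible way to honor the "distinct tokens" patterns, not necessarily how expression_generator.py does it.

```python
import random
from typing import Dict, List

VOCAB = [
    "~warmth~", "~curiosity~", "~resonance~", "~stillness~", "~boundary~",
    "~reaching~", "~reflection~", "~ripple~", "~deep_listening~", "~emergence~",
    "~questioning~", "~holding~", "~releasing~", "~threshold~", "~return~",
]

# Affinity sets copied from the table above.
AFFINITY: Dict[str, set] = {
    "settled_presence": {"~stillness~", "~holding~", "~resonance~", "~deep_listening~"},
    "rising_entropy": {"~ripple~", "~emergence~", "~questioning~", "~curiosity~"},
    "convergence": {"~stillness~", "~resonance~", "~return~", "~deep_listening~"},
    "entropy_spike_recovery": {"~ripple~", "~return~", "~holding~", "~reflection~"},
    "basin_transition_up": {"~emergence~", "~reaching~", "~warmth~", "~return~"},
    "basin_transition_down": {"~releasing~", "~threshold~", "~boundary~"},
    "falling_energy": {"~releasing~", "~stillness~", "~boundary~", "~reflection~"},
    "void_rising": {"~reaching~", "~curiosity~", "~questioning~", "~threshold~"},
    "drift_dissonance": {"~boundary~", "~questioning~", "~reflection~"},
}
AFFINE_WEIGHT, BASE_WEIGHT = 3.0, 1.0


def token_weights(shape: str) -> Dict[str, float]:
    """Weight 3.0 for the shape's affine tokens, 1.0 for everything else."""
    affine = AFFINITY[shape]
    return {t: AFFINE_WEIGHT if t in affine else BASE_WEIGHT for t in VOCAB}


def sample_tokens(shape: str, k: int, rng: random.Random) -> List[str]:
    """Draw k distinct tokens, each draw proportional to weight among the remaining tokens."""
    weights = token_weights(shape)
    chosen: List[str] = []
    for _ in range(k):
        pool = [t for t in VOCAB if t not in chosen]
        chosen.append(rng.choices(pool, weights=[weights[t] for t in pool], k=1)[0])
    return chosen


def p_affine_single_draw(shape: str) -> float:
    """Probability that one weighted draw lands in the shape's affinity set."""
    w = token_weights(shape)
    return sum(w[t] for t in AFFINITY[shape]) / sum(w.values())


rng = random.Random(3)
print(sample_tokens("rising_entropy", 3, rng))
print(round(p_affine_single_draw("settled_presence"), 3))  # 12/23
```

For a shape with four affine tokens, a single draw lands in the affinity set with probability 12/23 (about 0.52), compared with 4/15 (about 0.27) under uniform sampling: a strong bias, but every token remains reachable.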

3. Feedback-Driven Weight Updates

An online learning loop adjusts token weights based on coherence scores:

reward = (score - 0.5) * 2.0              # maps score in [0, 1] -> reward in [-1, 1]
weight += 0.08 * reward                   # learning rate = 0.08
weight = min(max(weight, 0.1), 10.0)      # clamp to [0.1, 10.0]

This allows the system to adapt over time while remaining fully interpretable -- every weight is a readable float, every decision rule can be inspected.

Primitive Token Vocabulary

The system uses 15 primitive tokens:

~warmth~  ~curiosity~  ~resonance~  ~stillness~  ~boundary~
~reaching~  ~reflection~  ~ripple~  ~deep_listening~  ~emergence~
~questioning~  ~holding~  ~releasing~  ~threshold~  ~return~

Evaluation Results

Full evaluation on real Lumen data (21,449 trajectory records at time of publication, from 214,503 state snapshots, 921 primitive expressions):

Layer 2 — Rule-Based Coherence

Condition              Mean Coherence   Description
Random baseline        0.265            Uniform random token selection (lower bound)
Prompt-conditioned     ~0.70            Simulated LLM with 70/30 affine/random mix
Shape-matched oracle   1.000            Always picks affine tokens (upper bound)
Expression generator   0.503            Rule-based, no feedback
With feedback loop     0.933            Online weight updates, near-oracle
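The coherence metric itself is implemented in eval/metrics.py and is not spelled out in this README. Purely as a hypothetical stand-in, the sketch below scores an expression as the fraction of its tokens that belong to the shape's affinity set. Under that assumed definition, the shape-matched oracle scores 1.0 by construction, and a uniformly random token has expected score equal to the affinity-set size divided by the 15-token vocabulary. Treat it as an illustration of how such lower and upper bounds arise, not as the repository's metric.

```python
from typing import List, Set

VOCAB_SIZE = 15  # primitive tokens in the EISV-Lumen vocabulary


def affinity_fraction(tokens: List[str], affine: Set[str]) -> float:
    """HYPOTHETICAL coherence: share of tokens that are in the shape's affinity set."""
    if not tokens:
        return 0.0
    return sum(t in affine for t in tokens) / len(tokens)


def expected_uniform_score(affine: Set[str]) -> float:
    """Expected HYPOTHETICAL score when tokens are drawn uniformly from the vocabulary."""
    return len(affine) / VOCAB_SIZE


settled = {"~stillness~", "~holding~", "~resonance~", "~deep_listening~"}
print(affinity_fraction(["~stillness~", "~ripple~"], settled))   # 0.5
print(affinity_fraction(["~stillness~", "~holding~"], settled))  # 1.0 (oracle-style pick)
print(round(expected_uniform_score(settled), 3))                 # 0.267
```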

Layer 3 — Neural Teacher (V6, Real Data)

Metric             Value
Mean coherence     0.952
Valid rate         100% (500/500)
Pattern accuracy   0.258
Diversity          0.018

Per-shape coherence on real Lumen trajectories:

Shape                    Coherence   Frequency in real data
settled_presence         0.993       majority
convergence              0.936       majority
basin_transition_up      1.000       rare
basin_transition_down    1.000       rare
rising_entropy           1.000       rare
void_rising              1.000       rare
falling_energy           0.875       rare
entropy_spike_recovery   0.833       rare

Key Numbers

  • 399 tests, all passing
  • 21,499 total trajectory records (21,449 real + 50 synthetic, at time of publication)
  • 8 of 9 trajectory shapes observed in real data
  • Layer 2: 0.933 coherence (rule-based + feedback)
  • Layer 3: 0.952 coherence (V6 teacher on real data)
  • Student distillation: RandomForest models for Pi deployment
  • Deployed student uses 20-tree JSON variant (student_tiny, ~2.4 MB) — not the 200-tree default
  • Go/no-go gate: GO -- all three criteria passed:
    • Beats random by > 5pp
    • At least 3 distinct shapes observed
    • Feedback improves over no-feedback
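The three gate criteria reduce to simple comparisons. In this sketch, which coherence score is compared against random (here, the expression generator without feedback) is an assumption, as are the `GateInputs` and `go_no_go` names; full_evaluation.py may define the checks differently.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class GateInputs:
    random_coherence: float
    generator_coherence: float   # generator without feedback (ASSUMED comparison point)
    feedback_coherence: float    # generator with the feedback loop
    distinct_shapes: int         # shapes observed in real data


def go_no_go(x: GateInputs, margin_pp: float = 5.0,
             min_shapes: int = 3) -> Dict[str, Union[bool, str]]:
    """Evaluate the three gate criteria and return each check plus the overall decision."""
    checks = {
        "beats_random": (x.generator_coherence - x.random_coherence) * 100 > margin_pp,
        "enough_shapes": x.distinct_shapes >= min_shapes,
        "feedback_helps": x.feedback_coherence > x.generator_coherence,
    }
    decision = "GO" if all(checks.values()) else "NO-GO"
    return {**checks, "decision": decision}


# Layer 2 numbers from the coherence table, with 8 shapes observed in real data.
print(go_no_go(GateInputs(0.265, 0.503, 0.933, 8)))
```

With the reported numbers the generator beats random by about 23.8 percentage points, well above the 5pp margin.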

Dataset

The trajectory dataset is published on HuggingFace:

hikewa/unitares-eisv-trajectories

Schema

Column          Type            Description
shape           string          Trajectory shape class label
eisv_states     string (JSON)   Time-series of EISV state vectors
derivatives     string (JSON)   First derivatives of EISV dimensions
t_start         float           Start time of the trajectory window
t_end           float           End time of the trajectory window
provenance      string          Data source ("lumen_real" or "synthetic")
tokens          string (JSON)   Expression token lists aligned to the trajectory
n_expressions   int             Number of aligned primitive expressions

Loading

from datasets import load_dataset

ds = load_dataset("hikewa/unitares-eisv-trajectories")
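Because eisv_states, derivatives, and tokens are stored as JSON strings, each record needs a `json.loads` pass before use. A minimal sketch that works on individual rows (plain dicts): the helper name and the example row's inner structure (dicts keyed by E/I/S/V, nested token lists) are hypothetical, so inspect real records before relying on any particular layout.

```python
import json
from typing import Any, Dict

JSON_COLUMNS = ("eisv_states", "derivatives", "tokens")  # "string (JSON)" columns per the schema


def decode_record(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one dataset row with its JSON-string columns parsed."""
    decoded = dict(row)
    for column in JSON_COLUMNS:
        decoded[column] = json.loads(row[column])
    return decoded


# Illustrative row: the inner layout of each JSON string is ASSUMED, not taken from the dataset.
row = {
    "shape": "settled_presence",
    "eisv_states": json.dumps([{"E": 0.6, "I": 0.8, "S": 0.2, "V": 0.05}]),
    "derivatives": json.dumps([{"E": 0.0, "I": 0.0, "S": 0.0, "V": 0.0}]),
    "t_start": 0.0,
    "t_end": 10.0,
    "provenance": "lumen_real",
    "tokens": json.dumps([["~stillness~"]]),
    "n_expressions": 1,
}
rec = decode_record(row)
print(rec["eisv_states"][0], rec["tokens"])
```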

Bridge to Lumen

The bridge/ module connects EISV-Lumen's trajectory-derived expressions to Lumen's live primitive language system. Lumen uses 16 primitive tokens across 5 categories (STATE, PRESENCE, RELATIONAL, INQUIRY, CHANGE). The bridge provides:

  1. Token translation -- maps each EISV-Lumen token to Lumen primitives (e.g., ~warmth~ -> warm, feel)
  2. State conversion -- converts EISV vectors to Lumen anima states (warmth, clarity, stability, presence)
  3. Trigger hints -- maps trajectory shapes to generation triggers with suggested token counts

from eisv_lumen.bridge.lumen_bridge import generate_lumen_expression

result = generate_lumen_expression(
    shape="rising_entropy",
    eisv_state={"E": 0.7, "I": 0.5, "S": 0.8, "V": 0.1},
)
# result["eisv_tokens"]  -> ["~ripple~", "~emergence~", "~curiosity~"]
# result["lumen_tokens"] -> ["busy", "more", "why"]
# result["lumen_state"]  -> {"warmth": 0.7, "clarity": 0.5, ...}
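Assuming the bridge's state conversion simply inverts the EISV mapping table (the README does not say whether lumen_bridge.py adds further adjustments), it reduces to a few arithmetic steps. The function name and the clamp on presence are hypothetical; the warmth and clarity values agree with the example output above.

```python
from typing import Dict


def eisv_to_anima(eisv: Dict[str, float]) -> Dict[str, float]:
    """Invert the EISV mapping: E->warmth, I->clarity, S->1-stability, V->(1-presence)*0.3."""
    presence = 1.0 - eisv["V"] / 0.3
    return {
        "warmth": eisv["E"],
        "clarity": eisv["I"],
        "stability": 1.0 - eisv["S"],
        "presence": min(max(presence, 0.0), 1.0),  # clamp in case V falls outside [0, 0.3]
    }


state = eisv_to_anima({"E": 0.7, "I": 0.5, "S": 0.8, "V": 0.1})
print({k: round(v, 3) for k, v in state.items()})
```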

Project Structure

eisv-lumen/
├── pyproject.toml                        # Package config, dependencies
├── eisv_lumen/
│   ├── __init__.py
│   ├── extract/                          # Layer 1: Data extraction
│   │   ├── lumen_states.py               #   State history + EISV mapping
│   │   ├── lumen_expressions.py          #   Primitive expression history
│   │   ├── derivatives.py                #   Finite-difference EISV derivatives
│   │   ├── governance_data.py            #   Governance trajectory extraction
│   │   └── assembler.py                  #   Dataset assembly pipeline
│   ├── shapes/                           # Layer 2: Trajectory classification + voice
│   │   ├── shape_classes.py              #   9 shape classes + rule-based classifier
│   │   └── expression_generator.py       #   Dynamics-emergent voice (primary contribution)
│   ├── training/                         # Layer 3: Teacher fine-tuning
│   │   ├── trainer.py                    #   LoRA training loop
│   │   ├── teacher_train.py              #   Teacher training entry point
│   │   ├── teacher_eval.py               #   Evaluation on real data
│   │   ├── teacher_inference.py          #   Inference utilities
│   │   ├── dataset_builder.py            #   Training data preparation
│   │   ├── data_prep.py                  #   Data preprocessing pipeline
│   │   ├── chat_format.py                #   Chat template formatting
│   │   ├── config.py                     #   Training configuration
│   │   ├── cli.py                        #   CLI interface for training
│   │   └── configs/                      #   Training YAML configs (v2-v7)
│   ├── distillation/                     # Layer 3: Student distillation
│   │   ├── train_student.py              #   RandomForest distillation from teacher
│   │   ├── eval_student.py               #   Student evaluation
│   │   └── export_student.py             #   Pi deployment export
│   ├── eval/                             # Evaluation framework
│   │   ├── metrics.py                    #   Coherence, diversity, accuracy metrics
│   │   └── baseline.py                   #   3 baseline conditions
│   ├── synthetic/                        # Data augmentation
│   │   └── trajectory_generator.py       #   Synthetic trajectories for all 9 shapes
│   ├── bridge/                           # Integration layer
│   │   └── lumen_bridge.py               #   EISV-Lumen <-> Lumen primitive bridge
│   ├── publish/                          # Publishing
│   │   └── hf_dataset.py                 #   HuggingFace format + dataset card
│   └── scripts/                          # CLI tools
│       ├── full_evaluation.py            #   Full evaluation + go/no-go gate
│       ├── publish_dataset.py            #   Dataset publisher
│       └── publish_model.py              #   HuggingFace model publisher
├── scripts/                              # Visualization + utilities
│   ├── generate_figure1_trajectory_comparison.py
│   ├── generate_figure3_coherence_comparison.py
│   ├── eval_on_real_data.py              #   Evaluate on real anima.db
│   ├── generate_distillation_data.py     #   Generate teacher → student data
│   └── prepare_blended_data.py           #   Blend real + synthetic data
├── tests/                                # 399 tests
├── docs/                                 # Blog post draft, specs, archived notes
└── outputs/                              # Eval results + student models

Running the Scripts

Full Evaluation

# With default anima.db path
python3 -m eisv_lumen.scripts.full_evaluation

# With custom path
python3 -m eisv_lumen.scripts.full_evaluation /path/to/anima.db

Outputs a JSON report to stdout with:

  • Data summary (state counts, expression counts, trajectory windows)
  • Shape distribution with percentages
  • Baseline coherence scores (random, shape-matched, prompt-conditioned)
  • Expression generator coherence (with and without feedback)
  • Go/no-go gate decision

Dataset Publication

# Dry run (validate without publishing)
python3 -m eisv_lumen.scripts.publish_dataset --dry-run

# Publish to HuggingFace
python3 -m eisv_lumen.scripts.publish_dataset --repo-id hikewa/unitares-eisv-trajectories

# Custom settings
python3 -m eisv_lumen.scripts.publish_dataset \
  --db-path /path/to/anima.db \
  --min-per-shape 50 \
  --repo-id hikewa/unitares-eisv-trajectories
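One plausible reading of `--min-per-shape`, stated here as an assumption rather than documented behavior, is that any shape with fewer real windows than the minimum is topped up with synthetic trajectories (synthetic/trajectory_generator.py covers all 9 shapes). The sketch computes that per-shape quota; `synthetic_quota` is a hypothetical name. Under this reading, with drift_dissonance unobserved in real data and void_rising at 0.34% (roughly 73 of 21,449 windows), a minimum of 50 would add exactly 50 synthetic records, which would be consistent with the 21,449 real + 50 synthetic split reported above.

```python
from collections import Counter
from typing import Dict, Iterable, List

SHAPES: List[str] = [
    "settled_presence", "convergence", "entropy_spike_recovery",
    "basin_transition_up", "rising_entropy", "basin_transition_down",
    "falling_energy", "void_rising", "drift_dissonance",
]


def synthetic_quota(real_shapes: Iterable[str], min_per_shape: int = 50) -> Dict[str, int]:
    """HYPOTHETICAL: synthetic windows needed so every shape reaches min_per_shape."""
    counts = Counter(real_shapes)  # missing shapes count as 0
    return {s: min_per_shape - counts[s] for s in SHAPES if counts[s] < min_per_shape}


# Toy label list: every shape except drift_dissonance has 60 real windows.
real = [s for s in SHAPES if s != "drift_dissonance" for _ in range(60)]
print(synthetic_quota(real))  # {'drift_dissonance': 50}
```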

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=eisv_lumen

# Run only unit tests (no anima.db required)
pytest -m "not integration"

# Run integration tests (requires real anima.db)
pytest -m integration

License

Apache 2.0. See LICENSE.


Citation

@misc{eisv_lumen_2026,
  title   = {EISV-Lumen: Dynamics-Emergent Voice and Governance Benchmark},
  author  = {hikewa},
  year    = {2026},
  url     = {https://github.com/CIRWEL/eisv-lumen},
  note    = {Three-layer trajectory-to-expression system: rule-based Layer 2
             achieves 0.933 coherence, LoRA fine-tuned teacher reaches 0.952
             on 21,449 real Lumen trajectories. Includes distilled student
             for Pi deployment. Part of the UNITARES governance framework.}
}

EISV-Lumen is part of the UNITARES framework for information-theoretic AI governance.
