fix(state): address PR #19508 P1+P2 review feedback

Hermes Sovereign AgentCore · claude · Hermes Sovereign AgentCore · commit fe9d7adee3a4 · 2026-05-05T09:08:42.000Z
Mac 3-lens review (architect / code-review / security) flagged 3 P1
correctness bugs. Fixes:

P1.1 — Calendar-aware EMA weighting (hermes_state.py)
  Was: raw_weights = [alpha * (1-alpha)**(n-1-i) for i in range(n)]
  Now: alpha * (1-alpha)**age, where age = (today_utc - day).days.
  Days with no data contribute nothing (they are absent from the
  result), and gaps no longer compress older data toward today.
  Removes the index-based bias toward sparsely-running skills.
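
  A minimal sketch of the difference, not the code in hermes_state.py:
  the function names and the three-row fixture (data at ages 7, 1 and 0
  days) are assumptions made for illustration. It compares the weight
  the oldest row receives under each scheme at alpha=0.3.

```python
from datetime import date


def index_weights(n, alpha=0.3):
    """Old scheme: weight depends only on a row's position in the result set."""
    raw = [alpha * (1 - alpha) ** (n - 1 - i) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def calendar_weights(days, today, alpha=0.3):
    """New scheme: weight depends on the calendar age of each row's day."""
    raw = [alpha * (1 - alpha) ** max(0, (today - d).days) for d in days]
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else [0.0] * len(raw)


# Hypothetical fixture: rows exist for ages 7, 1 and 0 days (oldest first).
today = date(2026, 5, 5)
gappy_days = [date(2026, 4, 28), date(2026, 5, 4), date(2026, 5, 5)]

old = index_weights(len(gappy_days))
new = calendar_weights(gappy_days, today)
print(f"oldest row weight: index-based={old[0]:.3f}, calendar-aware={new[0]:.3f}")
```

  Under index-based weighting the 7-day-old row is treated as if it were
  2 days old; the calendar-aware version gives it a much smaller share.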

P1.2 — Volume-weighted cost / duration EMA (hermes_state.py + view)
  Was: ema_X = sum(w * avg_X), which let each day's average count
       equally per unit of weight, regardless of invocation count.
  Now: skill_stats_daily exposes total_duration_s / total_cost_usd
  alongside avg_*. EMA computed as
     sum(w_d * total_d) / sum(w_d * count_d)
  so a day with 100 fast calls correctly outweighs a day with 1 slow
  call within the same EMA term.
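
  The two formulas can be compared on a small synthetic case. This is a
  sketch under stated assumptions: the row dicts reuse the view's column
  names, but the helper names and the numbers (one 60 s call yesterday,
  100 one-second calls today) are invented for illustration and are not
  the smoke-test inputs.

```python
def per_day_average_ema(rows, weights):
    """Old scheme: blend per-day averages, ignoring how many calls each day had."""
    return sum(
        w * r["total_duration_s"] / r["invocation_count"]
        for w, r in zip(weights, rows)
    )


def volume_weighted_ema(rows, weights):
    """New scheme: sum(w_d * total_d) / sum(w_d * count_d), or 0.0 with no calls."""
    weighted_count = sum(w * r["invocation_count"] for w, r in zip(weights, rows))
    if weighted_count <= 0:
        return 0.0
    weighted_total = sum(w * r["total_duration_s"] for w, r in zip(weights, rows))
    return weighted_total / weighted_count


# Hypothetical rows, oldest first: age 1 day, then age 0.
rows = [
    {"invocation_count": 1, "total_duration_s": 60.0},     # one slow call
    {"invocation_count": 100, "total_duration_s": 100.0},  # 100 fast calls
]
raw = [0.3 * 0.7 ** 1, 0.3 * 0.7 ** 0]  # alpha=0.3 calendar weights
weights = [w / sum(raw) for w in raw]

old_ema = per_day_average_ema(rows, weights)
new_ema = volume_weighted_ema(rows, weights)
print(f"per-day-average EMA={old_ema:.2f}s, volume-weighted EMA={new_ema:.2f}s")
```

  The single slow call dominates the per-day-average result, while the
  volume-weighted result stays close to the typical per-call duration.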

P1.3 — Surface telemetry write failures (cron/scheduler.py)
  Was: except: logger.debug(...) — silent swallow at DEBUG; broken
       writer invisible until dashboard goes empty.
  Now: logger.warning(..., exc_info=True). Operator sees regressions;
  cron still cannot fail (warning never raises).
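
  The pattern, reduced to a standalone sketch: the logger name, the
  wrapper function and the failing writer below are hypothetical and
  exist only to show that logging at WARNING with exc_info=True attaches
  the traceback while the caller keeps running.

```python
import logging

logger = logging.getLogger("cron.scheduler.sketch")  # hypothetical logger name


def record_telemetry_safely(write, job_id):
    """Run a telemetry writer; log failures at WARNING and never re-raise."""
    try:
        write()
        return True
    except Exception as e:
        # exc_info=True attaches the active traceback to the log record.
        logger.warning(
            "Job '%s': failed to record skill invocations: %s",
            job_id, e, exc_info=True,
        )
        return False


def broken_writer():
    """Stand-in for a telemetry write that fails."""
    raise RuntimeError("database is locked")


ok = record_telemetry_safely(broken_writer, "job-123")
print("telemetry recorded:", ok)
```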

Plus P2.1 (multi-skill cron cost split — divides session cost evenly
when len(job["skills"]) > 1, preventing per-skill EMA cost-doubling),
P2.2 (DROP VIEW IF EXISTS before CREATE for migration idempotency),
and P3 nits (dead conditional on error_msg slice; docstring half-life
math corrected — α=0.3 ≈ 1.94d, not 5d; α≈0.129 for true 5-day).
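
The half-life figures and the even split can be checked with a short
sketch. The function names are assumptions for illustration; the
half-life formula is the one quoted in the corrected docstring, and the
split mirrors the diff's behaviour (float division for cost, floor
division for tokens).

```python
import math


def half_life_days(alpha):
    """Days until alpha * (1 - alpha) ** age falls to half its age-0 value."""
    return math.log(2) / math.log(1 / (1 - alpha))


def alpha_for_half_life(days):
    """Inverse of half_life_days: the alpha whose weights halve every `days` days."""
    return 1 - 2 ** (-1 / days)


def split_session_totals(cost_usd, tokens, n_skills):
    """Even split of one shared session's cost and token count across n skills."""
    n = max(1, n_skills)
    cost = None if cost_usd is None else cost_usd / n
    return cost, int(tokens) // n


print(f"alpha=0.3 half-life: {half_life_days(0.3):.2f} days")
print(f"alpha for 5-day half-life: {alpha_for_half_life(5):.3f}")
print("per-skill share of $0.09 and 1000 tokens over 3 skills:",
      split_session_totals(0.09, 1000, 3))
```

Because tokens use floor division, per-skill token counts can sum to
slightly less than the session total (1000 tokens over 3 skills records
333 each).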

Smoke-tested off-tree:
  skill-a-gappy (days 0,1,7) success_rate=0.954  ← gap correctly
                                                   down-weights day-7
  skill-b-tight (days 0,1,2) success_rate=1.000
  skill-c-vol  (1 slow + 100 fast) duration=1.69s cost=$0.0024/call
                                                   ← volume-weighted

Constraint: pure additive view change — DROP+CREATE on every connect
is safe because skill_stats_daily has no rowids/triggers depending on
its identity. Existing rows in skill_invocations untouched.
Confidence: high (3 scenarios validated)
Scope-risk: narrow (same file boundaries as original PR)
Not-tested: multi-skill cron live-fire (smoke uses synthetic data)
Machine: orion-terminal

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/cron/scheduler.py b/cron/scheduler.py
@@ -1363,7 +1363,8 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
     except Exception as e:
         error_msg = f"{type(e).__name__}: {str(e)}"
         logger.exception("Job '%s' failed: %s", job_name, error_msg)
-        _skill_outcome = (False, error_msg[:200] if error_msg else None)
+        # error_msg is always a non-empty f-string here; slice unconditionally.
+        _skill_outcome = (False, error_msg[:200])
 
         output = f"""# Cron Job: {job_name} (FAILED)
 
@@ -1402,17 +1403,39 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
             # sourced from the agent session row. Slash-command and
             # ad-hoc skill_view calls are not tracked here in v1.
             if _skill_outcome is not None and _invoked_at is not None:
-                _job_skills = job.get("skills") or []
+                _job_skills = [
+                    str(_s).strip() for _s in (job.get("skills") or [])
+                    if str(_s).strip()
+                ]
                 if _job_skills:
                     try:
                         _completed_at = _hermes_now().timestamp()
                         _sess = _session_db.get_session(_cron_session_id) or {}
                         _success_flag, _end_reason_val = _skill_outcome
                         _duration = _completed_at - _invoked_at
-                        for _skill_name in _job_skills:
-                            _sn = str(_skill_name).strip()
-                            if not _sn:
-                                continue
+                        # P2.1 — multi-skill cost split. When a cron loads
+                        # >1 skill, the underlying agent run is shared and
+                        # the session-row cost is therefore SHARED. Splitting
+                        # evenly avoids each skill's ema_cost_per_call double-
+                        # counting the same dollars. Single-skill crons (the
+                        # current analyzer pattern) get exclusive attribution
+                        # unchanged. v2 candidate: introduce an `attribution`
+                        # column to make this explicit per-row.
+                        _n_skills = max(1, len(_job_skills))
+                        _split = lambda v: (
+                            (v / _n_skills) if _n_skills > 1 and v is not None
+                            else v
+                        )
+                        _split_int = lambda v: (
+                            (int(v) // _n_skills) if _n_skills > 1
+                            else int(v)
+                        )
+                        _shared_cost = _split(_sess.get("estimated_cost_usd"))
+                        _shared_in = _split_int(_sess.get("input_tokens") or 0)
+                        _shared_out = _split_int(_sess.get("output_tokens") or 0)
+                        _shared_cr = _split_int(_sess.get("cache_read_tokens") or 0)
+                        _shared_cw = _split_int(_sess.get("cache_write_tokens") or 0)
+                        for _sn in _job_skills:
                             _session_db.record_skill_invocation(
                                 skill_name=_sn,
                                 invoked_at=_invoked_at,
@@ -1422,18 +1445,23 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
                                 duration_seconds=_duration,
                                 model=_sess.get("model"),
                                 provider=_sess.get("billing_provider"),
-                                input_tokens=int(_sess.get("input_tokens") or 0),
-                                output_tokens=int(_sess.get("output_tokens") or 0),
-                                cache_read_tokens=int(_sess.get("cache_read_tokens") or 0),
-                                cache_write_tokens=int(_sess.get("cache_write_tokens") or 0),
-                                estimated_cost_usd=_sess.get("estimated_cost_usd"),
+                                input_tokens=_shared_in,
+                                output_tokens=_shared_out,
+                                cache_read_tokens=_shared_cr,
+                                cache_write_tokens=_shared_cw,
+                                estimated_cost_usd=_shared_cost,
                                 success=_success_flag,
                                 end_reason=_end_reason_val,
                             )
                     except (Exception, KeyboardInterrupt) as e:
-                        logger.debug(
+                        # Build #87 telemetry foundation: a silent failure here
+                        # means the dashboard goes blank with no signal of
+                        # the regression. Surface at WARNING so the operator
+                        # sees it; keep the swallow so a broken writer can't
+                        # fail a cron run.
+                        logger.warning(
                             "Job '%s': failed to record skill invocations: %s",
-                            job_id, e,
+                            job_id, e, exc_info=True,
                         )
             try:
                 _session_db.end_session(_cron_session_id, "cron_complete")
diff --git a/hermes_state.py b/hermes_state.py
@@ -21,6 +21,7 @@
 import sqlite3
 import threading
 import time
+from datetime import datetime, timezone
 from pathlib import Path
 
 from agent.memory_manager import sanitize_context
@@ -124,7 +125,8 @@
 CREATE INDEX IF NOT EXISTS idx_skill_invocations_cron ON skill_invocations(cron_id, invoked_at DESC);
 CREATE INDEX IF NOT EXISTS idx_skill_invocations_session ON skill_invocations(session_id);
 
-CREATE VIEW IF NOT EXISTS skill_stats_daily AS
+DROP VIEW IF EXISTS skill_stats_daily;
+CREATE VIEW skill_stats_daily AS
 SELECT
     skill_name,
     model,
@@ -133,6 +135,7 @@
     COUNT(*) AS invocation_count,
     SUM(CASE WHEN success = 1 THEN 1 ELSE 0 END) AS success_count,
     SUM(CASE WHEN success = 0 THEN 1 ELSE 0 END) AS failure_count,
+    SUM(duration_seconds) AS total_duration_s,
     AVG(duration_seconds) AS avg_duration_s,
     SUM(estimated_cost_usd) AS total_cost_usd,
     AVG(estimated_cost_usd) AS avg_cost_usd,
@@ -788,10 +791,20 @@ def query_skill_ema(
         """Per-(skill_name, model) exponentially-weighted moving averages.
 
         Reads ``skill_stats_daily`` for the last *window_days* and applies
-        exponential weighting where the most recent day has weight ``alpha``,
-        the next has ``alpha * (1-alpha)``, and so on. The default
-        ``alpha=0.3`` gives roughly a 5-day half-life — recent behaviour
-        dominates without dropping older data on a hard window.
+        **calendar-aware** exponential weighting: each day's weight is
+        ``alpha * (1-alpha) ** age_in_days`` where ``age_in_days`` is the
+        actual UTC-day distance from today, NOT the row's index in the
+        result set. Days with no data correctly get zero weight (they're
+        absent from the result), and gaps don't compress older data.
+
+        Cost / duration EMAs are **volume-weighted**: a day with 100 fast
+        calls weighs more than a day with 1 slow call within the same EMA
+        term, computed as ``sum(w_d * total_d) / sum(w_d * count_d)``
+        across days. Avoids the per-day-average × per-day-weight bias that
+        favoured low-volume early-shadow days.
+
+        Half-life heuristic for α=0.3 is ~1.94 calendar days (``ln(2) /
+        ln(1/(1-α))``). Use α≈0.129 for a true 5-day half-life.
 
         Returns a list of dicts (one per (skill_name, model) bucket) with
         keys::
@@ -803,15 +816,18 @@ def query_skill_ema(
 
         Sorted by ``last_invoked_at`` descending so the dashboard surfaces
         currently-active skills first. Buckets with zero rows in the window
-        are silently omitted.
+        are silently omitted. Buckets where every weighted-count term is
+        zero (e.g. all rows have NULL ``invocation_count``) get NaN-safe
+        zeros for cost/duration EMAs.
         """
         if window_days <= 0 or alpha <= 0 or alpha >= 1:
             return []
         cutoff_ts = time.time() - window_days * 86400.0
         sql = (
             "SELECT skill_name, model, provider, day, "
             "invocation_count, success_count, failure_count, "
-            "avg_duration_s, avg_cost_usd, avg_quality_score, last_invoked_at "
+            "total_duration_s, total_cost_usd, "
+            "avg_quality_score, last_invoked_at "
             "FROM skill_stats_daily "
             "WHERE last_invoked_at >= ? "
             "ORDER BY skill_name, model, day"
@@ -825,11 +841,25 @@ def query_skill_ema(
             key = (r["skill_name"], r["model"])
             groups.setdefault(key, []).append(r)
 
+        # Calendar-aware day-age helper. Cron writes ``DATE(invoked_at,
+        # 'unixepoch')`` which is UTC by SQLite contract, so we compare
+        # against UTC today.
+        today_utc = datetime.now(timezone.utc).date()
+
+        def _day_age(day_str: str) -> int:
+            try:
+                d = datetime.strptime(day_str, "%Y-%m-%d").date()
+            except (TypeError, ValueError):
+                return 0
+            return max(0, (today_utc - d).days)
+
         out: List[Dict[str, Any]] = []
         for (skill_name, model), rs in groups.items():
             rs.sort(key=lambda r: r["day"])
             n = len(rs)
-            raw_weights = [alpha * (1 - alpha) ** (n - 1 - i) for i in range(n)]
+
+            # Calendar-aware geometric weights.
+            raw_weights = [alpha * (1 - alpha) ** _day_age(r["day"]) for r in rs]
             wsum = sum(raw_weights)
             weights = (
                 [w / wsum for w in raw_weights]
@@ -847,14 +877,27 @@ def query_skill_ema(
                 )
                 for w, r in zip(weights, rs)
             )
-            ema_duration = sum(
-                w * float(r["avg_duration_s"] or 0.0)
-                for w, r in zip(weights, rs)
-            )
-            ema_cost = sum(
-                w * float(r["avg_cost_usd"] or 0.0)
+
+            # Volume-weighted EMA: numerator is sum(w_d * total_d), denominator
+            # is sum(w_d * count_d). When count_d = 0 for every day, fall back
+            # to 0.0 (skill ran but every row was excluded — rare).
+            weighted_count_sum = sum(
+                w * int(r["invocation_count"] or 0)
                 for w, r in zip(weights, rs)
             )
+            if weighted_count_sum > 0:
+                ema_duration = sum(
+                    w * float(r["total_duration_s"] or 0.0)
+                    for w, r in zip(weights, rs)
+                ) / weighted_count_sum
+                ema_cost = sum(
+                    w * float(r["total_cost_usd"] or 0.0)
+                    for w, r in zip(weights, rs)
+                ) / weighted_count_sum
+            else:
+                ema_duration = 0.0
+                ema_cost = 0.0
+
             last_invoked = max(float(r["last_invoked_at"] or 0.0) for r in rs)
             provider_latest = rs[-1].get("provider")