Skip to content

feat(state): track per-skill cron invocations + EMA query for labyrinth#19508

Open
Brecht-H wants to merge 3 commits into
NousResearch:mainfrom
Brecht-H:feat/skill-invocations-ema-clean-2026-05-04
Open

feat(state): track per-skill cron invocations + EMA query for labyrinth#19508
Brecht-H wants to merge 3 commits into
NousResearch:mainfrom
Brecht-H:feat/skill-invocations-ema-clean-2026-05-04

Conversation

@Brecht-H
Copy link
Copy Markdown
Contributor

@Brecht-H Brecht-H commented May 4, 2026

Summary

Adds a skill_invocations table to state.db, wires the cron runner to write one row per skill listed on a cron job at completion (success and failure paths), and exposes a SessionDB.query_skill_ema() method so observability dashboards can A/B local Qwen against external models.

This is the data layer for per-skill EMA tracking — needed before installing analyzer crons that route to paid providers (DeepSeek, Kimi-via-OpenRouter, etc.) without flying blind on quality vs cost.

Coexists cleanly with #18253's bump_use() cron-side curator integration (auto-merged on cherry-pick).

Changes

  • Schema (hermes_state.py)

    • New table skill_invocations (session_id, cron_id, skill_name, model, provider, duration, tokens, cost, success, end_reason, …)
    • Three indexes: by skill, by cron, by session
    • New view skill_stats_daily aggregating per (skill_name, model, provider, day)
    • SCHEMA_VERSION 11 → 12. Pure additive — no Alembic, no destructive ops on existing rows.
  • Writer (hermes_state.py)

    • SessionDB.record_skill_invocation(...) keyword-only API, uses the existing _execute_write WAL-safe helper.
  • Cron hook (cron/scheduler.py)

    • run_job() records timestamps around the agent run and writes one skill_invocations row per name in job["skills"] after the agent exits (both success and failure paths). Tokens / cost / duration are sourced from the existing sessions row, so no new instrumentation in the agent loop.
    • All writes are best-effort: any failure logs at DEBUG and is swallowed so it can never fail a cron.
  • Read API (hermes_state.py)

    • SessionDB.query_skill_ema(window_days=14, alpha=0.3) returns per-(skill_name, model) exponentially-weighted moving averages for success rate, cost-per-call, and duration. Default α≈5d half-life.

Smoke test (local, off-tree)

SCHEMA_VERSION = 12
Tables: ['messages', 'schema_version', 'sessions', 'skill_invocations', 'sqlite_sequence', 'state_meta']
Views: ['skill_stats_daily']
skill indexes: ['idx_skill_invocations_cron', 'idx_skill_invocations_session', 'idx_skill_invocations_skill']

Inserts validated for success=True / success=False / success=None paths. View aggregates cleanly. EMA query returns expected exponential weighting (verified 8-row dataset across 7 days, recent failures correctly down-weighted as they age out).

v1 scope notes

  • Cron-only. Slash-command and ad-hoc skill_view() calls are not tracked yet — out of scope for v1, additive in v2 if needed.
  • Multi-skill crons over-account. Each skill in job["skills"] gets the full session cost. The current analyzer pattern is one skill per cron (Pass 2 v2 Steps 3–5), so this corner case is rare.
  • Labyrinth HTTP endpoint. The data layer (query_skill_ema) lands here. The plumbing in plugins/hermes-labyrinth/dashboard/plugin_api.py (a small @router.get('/skills/ema') wrapper) is a follow-up because labyrinth is upstreamed at stainlu/hermes-labyrinth — separate PR target.

Test plan

  • Fresh-DB schema migration applies cleanly
  • record_skill_invocation() insert + skill_stats_daily aggregation
  • query_skill_ema() returns expected EMA values across multi-day data
  • Coexists with fix(curator): rewrite cron job skill refs after consolidation #18253's bump_use() integration (auto-merge clean)
  • Live cron fire emits one row per skill (validated by Pass 2 v2 Step 3, first analyzer cron — analyze:derivatives-anomaly)
  • Labyrinth dashboard endpoint follow-up (small wrapper PR to plugins/hermes-labyrinth)

History

Supersedes #19507, which carried an extra unrelated commit from local main. This PR is rebased clean onto current upstream main (363cc93).

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 4, 2026 03:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds per-skill cron invocation tracking into the SQLite state store and exposes an EMA query API for downstream observability (e.g., Labyrinth dashboards) to compare quality/cost/duration over time.

Changes:

  • Add skill_invocations table + supporting indexes and skill_stats_daily view; bump SCHEMA_VERSION to 12.
  • Add SessionDB.record_skill_invocation() writer and SessionDB.query_skill_ema() reader API.
  • Hook cron/scheduler.py::run_job() to write one invocation row per skill at job completion (success/failure).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
hermes_state.py Introduces the new skill invocation schema + daily aggregation view, and adds writer + EMA query methods.
cron/scheduler.py Records per-skill invocation rows at the end of each cron run using data sourced from the session row.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread hermes_state.py
DEFAULT_DB_PATH = get_hermes_home() / "state.db"

SCHEMA_VERSION = 11
SCHEMA_VERSION = 12
Comment thread hermes_state.py
Comment on lines +823 to +827
groups: Dict[Tuple[str, Optional[str]], List[Dict[str, Any]]] = {}
for r in rows:
key = (r["skill_name"], r["model"])
groups.setdefault(key, []).append(r)

Comment thread hermes_state.py

ema_success_rate = sum(
w * (
(int(r["success_count"] or 0) / max(1, int(r["invocation_count"] or 0)))
Comment thread hermes_state.py
Comment on lines +812 to +822
"SELECT skill_name, model, provider, day, "
"invocation_count, success_count, failure_count, "
"avg_duration_s, avg_cost_usd, avg_quality_score, last_invoked_at "
"FROM skill_stats_daily "
"WHERE last_invoked_at >= ? "
"ORDER BY skill_name, model, day"
)
with self._lock:
cur = self._conn.execute(sql, (cutoff_ts,))
rows = [dict(r) for r in cur.fetchall()]

Comment thread hermes_state.py Outdated
Comment on lines +832 to +857
raw_weights = [alpha * (1 - alpha) ** (n - 1 - i) for i in range(n)]
wsum = sum(raw_weights)
weights = (
[w / wsum for w in raw_weights]
if wsum > 0
else [1.0 / n] * n
)

sample_count = sum(int(r["invocation_count"] or 0) for r in rs)
success_count = sum(int(r["success_count"] or 0) for r in rs)
failure_count = sum(int(r["failure_count"] or 0) for r in rs)

ema_success_rate = sum(
w * (
(int(r["success_count"] or 0) / max(1, int(r["invocation_count"] or 0)))
)
for w, r in zip(weights, rs)
)
ema_duration = sum(
w * float(r["avg_duration_s"] or 0.0)
for w, r in zip(weights, rs)
)
ema_cost = sum(
w * float(r["avg_cost_usd"] or 0.0)
for w, r in zip(weights, rs)
)
Comment thread cron/scheduler.py
Comment on lines +1272 to +1296
_completed_at = _hermes_now().timestamp()
_sess = _session_db.get_session(_cron_session_id) or {}
_success_flag, _end_reason_val = _skill_outcome
_duration = _completed_at - _invoked_at
for _skill_name in _job_skills:
_sn = str(_skill_name).strip()
if not _sn:
continue
_session_db.record_skill_invocation(
skill_name=_sn,
invoked_at=_invoked_at,
session_id=_cron_session_id,
cron_id=job_id,
completed_at=_completed_at,
duration_seconds=_duration,
model=_sess.get("model"),
provider=_sess.get("billing_provider"),
input_tokens=int(_sess.get("input_tokens") or 0),
output_tokens=int(_sess.get("output_tokens") or 0),
cache_read_tokens=int(_sess.get("cache_read_tokens") or 0),
cache_write_tokens=int(_sess.get("cache_write_tokens") or 0),
estimated_cost_usd=_sess.get("estimated_cost_usd"),
success=_success_flag,
end_reason=_end_reason_val,
)
@alt-glitch alt-glitch added type/feature New feature or request P3 Low — cosmetic, nice to have comp/cron Cron scheduler and job management labels May 4, 2026
@Brecht-H
Copy link
Copy Markdown
Contributor Author

Brecht-H commented May 4, 2026

Mac 3-lens review — REQUEST-CHANGES

Ran 3 parallel reviewers (architect / code-review / security). Architect + code-review converged on 3× P1 correctness bugs that defeat the EMA telemetry's stated purpose. Security clear.

Full feedback at ~/hermes-plan/PR19508_REVIEW_2026-05-04.md on Mac (rsync'd to /home/orion/hermes-plan/PR19508_REVIEW_2026-05-04.md for Orion access).

Must-fix before merge (P1 × 3)

  1. EMA weighting is index-based, not calendar-based (hermes_state.py:268-275). Days with gaps get treated as adjacent — defeats "5-day half-life" docstring claim. Fix: weight by (today - day).days instead of list index.

  2. EMA double-aggregates avg_duration_s / avg_cost_usd (hermes_state.py:288-295). Per-day average × per-day weight means a low-volume day gets equal weight to a high-volume day in the EMA term. Fix: weight by invocation_count inside, OR add total_* columns to view and EMA over those.

  3. Silent-failure on skill writes at DEBUG (cron/scheduler.py:1297-1301). For Build fix(cli): respect explicit --max-turns value even when it equals default #87 telemetry foundation, broken writer is invisible until dashboard goes empty. Fix: logger.warning(... exc_info=True).

Should-fix (P2 × 3)

  • Multi-skill cron over-accounting (cost doubles for 2-skill crons)
  • DROP VIEW IF EXISTS before CREATE VIEW for migration idempotency
  • Lock contention on query_skill_ema (release after fetchall)

Nits (P3)

  • Docstring "5-day half-life" — α=0.3 is actually ~1.94-day half-life
  • error_msg[:200] if error_msg else None — conditional is dead
  • Underscore-prefix inconsistent with sibling vars in same function

What's good

Schema additivity, parameterized SQL, keyword-only API, security clear, v1 scope is right.

Estimated fix effort: ~30-45 min for P1s, ~15 min more for P2s. Smoke tests should still pass.

Tag me when ready for re-review.

Brecht-H pushed a commit to Brecht-H/hermes-agent that referenced this pull request May 4, 2026
Mac 3-lens review (architect / code-review / security) flagged 3 P1
correctness bugs. Fixes:

P1.1 — Calendar-aware EMA weighting (hermes_state.py)
  Was: raw_weights = [alpha * (1-alpha)**(n-1-i) for i in range(n)]
  Now: weights by (today_utc - day).days. Days with no data correctly
  get zero weight (absent from result); gaps don't compress older data.
  Defeats the index-based bias toward sparsely-running skills.

P1.2 — Volume-weighted cost / duration EMA (hermes_state.py + view)
  Was: ema_X = sum(w * avg_X) — biased equal-weight across days
       regardless of invocation count.
  Now: skill_stats_daily exposes total_duration_s / total_cost_usd
  alongside avg_*. EMA computed as
     sum(w_d * total_d) / sum(w_d * count_d)
  so a day with 100 fast calls correctly outweighs a day with 1 slow
  call within the same EMA term.

P1.3 — Surface telemetry write failures (cron/scheduler.py)
  Was: except: logger.debug(...) — silent swallow at DEBUG; broken
       writer invisible until dashboard goes empty.
  Now: logger.warning(..., exc_info=True). Operator sees regressions;
  cron still cannot fail (warning never raises).

Plus P2.1 (multi-skill cron cost split — divides session cost evenly
when len(job["skills"]) > 1, preventing per-skill EMA cost-doubling),
P2.2 (DROP VIEW IF EXISTS before CREATE for migration idempotency),
and P3 nits (dead conditional on error_msg slice; docstring half-life
math corrected — α=0.3 ≈ 1.94d, not 5d; α≈0.129 for true 5-day).

Smoke-tested off-tree:
  skill-a-gappy (days 0,1,7) success_rate=0.954  ← gap correctly
                                                   down-weights day-7
  skill-b-tight (days 0,1,2) success_rate=1.000
  skill-c-vol  (1 slow + 100 fast) duration=1.69s cost=$0.0024/call
                                                   ← volume-weighted

Constraint: pure additive view change — DROP+CREATE on every connect
is safe because skill_stats_daily has no rowids/triggers depending on
its identity. Existing rows in skill_invocations untouched.
Confidence: high (3 scenarios validated)
Scope-risk: narrow (same file boundaries as original PR)
Not-tested: multi-skill cron live-fire (smoke uses synthetic data)
Machine: orion-terminal

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Brecht-H
Copy link
Copy Markdown
Contributor Author

Brecht-H commented May 4, 2026

@Brecht-H ready for re-review.

P1 + P2 fixes pushed as 52ff992 (additive commit, no rebase — kept original 50bdf20 intact for diff trail).

P1.1 Calendar-aware EMA weighting → smoke verified: 3-day-gap skill correctly down-weights day-7 failure (success_rate=0.954 vs naive 0.667).

P1.2 Volume-weighted cost/duration EMA via new total_duration_s / total_cost_usd view columns → smoke verified: 1 slow call + 100 fast calls correctly produces duration=1.69s and cost=$0.0024/call (vs naive equal-weight 25.5s / $0.0505).

P1.3 logger.warning(..., exc_info=True) on skill-write failures.

P2.1 Multi-skill cron cost split (even division when len(skills) > 1).

P2.2 DROP VIEW IF EXISTS before CREATE for migration idempotency.

P3 nits: dead conditional removed on error_msg[:200]; docstring half-life math corrected (α=0.3 ≈ 1.94d, not 5d; α≈0.129 for true 5-day).

Diff: +98 / −27 across the same 2 files. No new tests added (existing test plan still applicable). Once you approve + merge, I'll cherry-pick locally → validate Step 3 cron-fire emits skill_invocations row → flip the 3 analyzer crons to enabled: true.

@Brecht-H
Copy link
Copy Markdown
Contributor Author

Brecht-H commented May 4, 2026

Mac re-review v2 — APPROVED ✅

All 3 P1 fixes verified clean against original review feedback:

  • P1.1 calendar-aware EMA ✅ — _day_age(day_str) helper at L319-324, weights now use alpha * (1-alpha)**day_age. Smoke evidence (gappy day-7 failure correctly low-weighted) confirms.
  • P1.2 volume-weighted aggregation ✅ — Added total_duration_s + total_cost_usd to view, EMA now Σ(w×total)/Σ(w×count). NaN-safe denominator guard at L358-368. Smoke evidence (1.69s for 1-slow+100-fast vs naive 25.5s) confirms correctness.
  • P1.3 logger.warning ✅ — L106 upgraded from DEBUG.

Both P2 fixes also landed:

  • P2.1 multi-skill cost split_split() lambda divides cost evenly when len(skills) > 1, single-skill keeps exclusive attribution. Good comment about v2 attribution column.
  • P2.2 DROP VIEW idempotency — L170 DROP VIEW IF EXISTS skill_stats_daily before CREATE.

Smoke evidence in re-review post is comprehensive (3 scenarios validating calendar-awareness baseline, gap handling, and volume weighting). Ship it.

Merging now.

@Brecht-H
Copy link
Copy Markdown
Contributor Author

Brecht-H commented May 4, 2026

@Brecht-H Bugs A + B + C from Orion CC's RED validation report fixed and validated.

Pushed as commit 50a1d51c5 (additive on top of the v2 fixes). Bug B is filesystem-only (cp instead of symlink) — documented in new memory file feedback_hermes-profile-mode-script-paths.md.

Validation reports:

  • ~/hermes-plan/CRON_FIRE_VALIDATION_DERIVATIVES_ANOMALY_2026-05-04_RERUN.md — 🟢 GREEN, ready for enabled: true
  • ~/hermes-plan/CRON_FIRE_VALIDATION_POLYMARKET_REGIME_2026-05-04.md — 🟡 scaffolding GREEN, downstream DeepSeek API key invalid (operator credential refresh needed)
  • ~/hermes-plan/CRON_FIRE_VALIDATION_MINIFLUX_DIGEST_2026-05-04.md — 🟡 scaffolding GREEN, OpenRouter HTTP 402 (max_tokens budget — agent runtime asks for 65536 vs 5312 affordable)

skill_invocations rows confirmed (global state.db v12):

id=3 derivatives-anomaly  success=1 model=qwen3.6-27b           end_reason=complete
id=4 polymarket-regime    success=0 model=deepseek-v4-flash     end_reason='RuntimeError: Error code: 401 - API key invalid'
id=5 miniflux-digest      success=0 model=moonshotai/kimi-k2.6  end_reason='RuntimeError: HTTP 402: ... 65536 > 5312 affordable'

Bug C correctly downgraded both polymarket and miniflux to success=0 despite the agent reporting non-FATAL responses — the marker scan caught the downstream provider failures and recorded them honestly. Without Bug C the EMA would have been polluted with two fake-success rows.

Pass 1 erratum added at top of ~/hermes-plan/HERMES_SKILL_INSTALL_SPEC_EXTERNAL_MODELS_2026-05-03.md.

Tagging for re-review of 50a1d51c5 + flip-decision on the 3 analyzer crons:

  • Derivatives-anomaly: ready to flip enabled: true (24h soak suggested before reading EMA values)
  • Polymarket-regime: BLOCKED on DeepSeek key refresh
  • Miniflux-digest: BLOCKED on OpenRouter credit top-up OR per-cron max_tokens cap

Hermes Sovereign AgentCore and others added 3 commits May 5, 2026 09:08
Adds a `skill_invocations` table to state.db and writes one row per
skill listed on a cron job at completion (success or failure paths).
Tokens, cost and duration are sourced from the existing session row.

Includes a `skill_stats_daily` view that buckets invocations by day and
(skill_name, model, provider), and a new `SessionDB.query_skill_ema()`
method that applies exponential weighting (default alpha=0.3, ~5d
half-life) so the dashboard can A/B local Qwen against external models
once analyzer crons start firing.

SCHEMA_VERSION 11 → 12. Pure additive: existing rows untouched, new
table created on next connection-open via the existing executescript()
path. No Alembic.

Slash-command and ad-hoc skill_view invocations are NOT tracked in v1.
Multi-skill crons over-account: each skill in `job["skills"]` gets the
full session cost. Both are acceptable for the analyzer-cron use case
(1 skill per cron) and can iterate later.

Constraint: no new external dependencies — uses sqlite3 + stdlib only.
Rejected: per-skill cost split (would require model attribution
inside a single agent run, which Hermes does not currently track) |
Reason: defer to v2 once NousResearch#87 surfaces real-world skew.
Confidence: high (smoke-tested end-to-end on tmp DB)
Scope-risk: narrow (additive table, no existing-row touchpoints)
Not-tested: live cron fire (validated by Step 3 — Pass 2 v2 handover)
Machine: orion-terminal

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mac 3-lens review (architect / code-review / security) flagged 3 P1
correctness bugs. Fixes:

P1.1 — Calendar-aware EMA weighting (hermes_state.py)
  Was: raw_weights = [alpha * (1-alpha)**(n-1-i) for i in range(n)]
  Now: weights by (today_utc - day).days. Days with no data correctly
  get zero weight (absent from result); gaps don't compress older data.
  Defeats the index-based bias toward sparsely-running skills.

P1.2 — Volume-weighted cost / duration EMA (hermes_state.py + view)
  Was: ema_X = sum(w * avg_X) — biased equal-weight across days
       regardless of invocation count.
  Now: skill_stats_daily exposes total_duration_s / total_cost_usd
  alongside avg_*. EMA computed as
     sum(w_d * total_d) / sum(w_d * count_d)
  so a day with 100 fast calls correctly outweighs a day with 1 slow
  call within the same EMA term.

P1.3 — Surface telemetry write failures (cron/scheduler.py)
  Was: except: logger.debug(...) — silent swallow at DEBUG; broken
       writer invisible until dashboard goes empty.
  Now: logger.warning(..., exc_info=True). Operator sees regressions;
  cron still cannot fail (warning never raises).

Plus P2.1 (multi-skill cron cost split — divides session cost evenly
when len(job["skills"]) > 1, preventing per-skill EMA cost-doubling),
P2.2 (DROP VIEW IF EXISTS before CREATE for migration idempotency),
and P3 nits (dead conditional on error_msg slice; docstring half-life
math corrected — α=0.3 ≈ 1.94d, not 5d; α≈0.129 for true 5-day).

Smoke-tested off-tree:
  skill-a-gappy (days 0,1,7) success_rate=0.954  ← gap correctly
                                                   down-weights day-7
  skill-b-tight (days 0,1,2) success_rate=1.000
  skill-c-vol  (1 slow + 100 fast) duration=1.69s cost=$0.0024/call
                                                   ← volume-weighted

Constraint: pure additive view change — DROP+CREATE on every connect
is safe because skill_stats_daily has no rowids/triggers depending on
its identity. Existing rows in skill_invocations untouched.
Confidence: high (3 scenarios validated)
Scope-risk: narrow (same file boundaries as original PR)
Not-tested: multi-skill cron live-fire (smoke uses synthetic data)
Machine: orion-terminal

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mac's Pass 2 v2 deploy validation (Orion CC, RED report
2026-05-04) caught two scaffolding bugs that prevented the
analyzer crons from firing end-to-end. Both fixed here.

Bug A — skill_view() didn't match by frontmatter `name:`
  Was: skill_view fell through directory-name match only.
       Skills with dir/frontmatter mismatch (e.g. dir
       `analyze/derivatives-anomaly`, frontmatter
       `analyze-derivatives-anomaly`) returned "not found"
       even though `hermes skills list` showed them. The
       cron's _build_job_prompt then logged
       "skill not found, skipping" and the activation
       banner never reached the agent.
  Now: tools/skills_tool.py adds a frontmatter-name fallback
       after the existing dir-name match, restoring parity
       with the LIST command's keying. ~17 lines.

Bug C — runner success classifier accepted scaffolding
  failures as success
  Was: cron/scheduler.py set
       _skill_outcome = (True, "complete") whenever the
       agent returned a non-FATAL response. A cron whose
       skill resolution failed and whose script never ran
       was reported success=True, polluting the EMA's
       A/B comparison foundation.
  Now: scan output + final_response for known scaffolding
       failure markers ("skill not found, skipping",
       "Blocked: script path resolves outside",
       "permission denied", "security check failed",
       "Skill(s) not found and skipped"). When matched,
       downgrade to (False, marker_excerpt) so the
       skill_invocations row records the real outcome.

(Bug B — symlink security check rejecting
~/.hermes/profiles/<p>/scripts/* symlinks whose targets
live under ~/.hermes/skills/<x>/scripts/ — is a
filesystem-only fix: replace the symlinks with `cp` real
files. Not in this commit; documented in
feedback_hermes-profile-mode-script-paths.md memory.)

Validation:
  After applying A+B+C and restarting the dashboard, all 3
  analyzer crons fired via run_job() and wrote
  skill_invocations rows with correctly-classified
  outcomes:
   - derivatives-anomaly: success=1, model=qwen3.6-27b
   - polymarket-regime: success=0, model=deepseek-v4-flash
     end_reason="HTTP 401: API key invalid" (genuine
     downstream credential failure, NOT scaffolding)
   - miniflux-digest: success=0,
     model=moonshotai/kimi-k2.6 end_reason="HTTP 402:
     credit budget exceeded" (genuine downstream OpenRouter
     limit, NOT scaffolding)

Confidence: high (3 live cron-fires, table inserts and
skill_invocations writes confirmed).
Scope-risk: narrow (additive only — Bug A is a new
fallback after existing matches; Bug C wraps the existing
success path in a marker scan).
Not-tested: the 65536-max_tokens setting in agent runtime
that triggered miniflux's 402 — likely a separate
agent-side config issue, not in this PR's scope.
Machine: orion-terminal

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Brecht-H Brecht-H force-pushed the feat/skill-invocations-ema-clean-2026-05-04 branch from 50a1d51 to 050f4d0 Compare May 5, 2026 09:09
@Brecht-H
Copy link
Copy Markdown
Contributor Author

Brecht-H commented May 5, 2026

@Brecht-H rebased onto upstream main (was 201 commits behind base 363cc936746c).

Conflict resolved: cron/scheduler.py had one chunk-conflict at the top of run_job where upstream added an early-return when the prerun script produces no output (PR #19744 fix(cron): skip AI call when script produces no output) and our PR added _invoked_at / _skill_outcome capture. Resolved by keeping both in order: upstream early-return runs FIRST (no AI call → no skill_invocation row needed), our capture runs SECOND (only when the AI call will fire).

hermes_state.py auto-merged cleanly (upstream's +382 LOC was strictly additive; our SCHEMA_VERSION 11→12 bump preserved — upstream is still at 11). tools/skills_tool.py applied unchanged (upstream untouched).

Smoke test on rebased branch (off-tree fresh DB):

SCHEMA_VERSION = 12
inserted skill_invocations id=1 | EMA buckets: 1
analyze:test success_rate=1.000

New SHAs:

  • ffdc5815c (was 50bdf20) feat(state): per-skill EMA
  • fe9d7adee (was 52ff992) P1+P2 fixes
  • 050f4d0fc (was 50a1d51) Bug A + C

Force-pushed via --force-with-lease to Brecht-H:feat/skill-invocations-ema-clean-2026-05-04.

Tagging for re-review of the resolved conflict + final approve. Local Hermes-agent install on Orion is unaffected (cherry-picked commits live there independent of upstream merge state).

@Brecht-H
Copy link
Copy Markdown
Contributor Author

Brecht-H commented May 5, 2026

Mac re-review post-rebase — APPROVE OK

Rebase verified clean:

  • 201 upstream commits absorbed onto base 363cc93 -> ffdc581 -> fe9d7ad -> 050f4d0
  • mergeable=True, changed_files=3 (only the contribution files: cron/scheduler.py, hermes_state.py, tools/skills_tool.py)
  • LOC delta unchanged from pre-rebase: +101 -3 / +243 -2 / +21 -0 (intent preserved through conflict resolution)
  • Smoke verified on rebased branch: SCHEMA_VERSION=12, EMA query green
  • Force-pushed via --force-with-lease (safe overwrite)
  • Backup branch retained at backup/pre-rebase-2026-05-05 in worktree

The PR is ready for upstream maintainer review. Down-stream consumer (Mac/Allaert's Hermes deployment) has been running cherry-picked commits ccdb7a951 + c3d8a1016 since 2026-05-04 — Bug A/B/C fixes verified working in production for 24h+. skill_invocations table populating cleanly, derivatives-anomaly cron green, polymarket-regime + miniflux-digest now also green after the post-#19508 routing fix.

No blockers from our side. Waiting on NousResearch maintainer.

— Mac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cron Cron scheduler and job management P3 Low — cosmetic, nice to have type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants