Automation and Validation

Salvador Banderas Rovira edited this page Mar 15, 2026 · 8 revisions

Automation keeps the PKM consistent while preserving manual writing.

Validation and schema rules are defined in Schema Contract.

Automation behavior is configured in:

  • schema/automation.json
  • schema/automation.example.json (copy/adapt as a starting point)

Script Responsibilities

scripts/automation/run_all.py

  • Run the full automation pipeline in the correct order.
  • Stop immediately if any step fails.

Run it from repository root:

python scripts/automation/run_all.py

Portable launcher (recommended):

bash scripts/runtime/pkm_python.sh scripts/automation/run_all.py
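
The fail-fast behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of run_all.py: the STEPS list and the run_pipeline helper are hypothetical names, and the step order is taken from the explicit sequence listed under Pre-Commit Flow.

```python
import subprocess
import sys

# Hypothetical step list; the order mirrors the explicit sequence on this page.
STEPS = [
    "scripts/automation/generate_pages.py",
    "scripts/automation/build_indexes.py",
    "scripts/quality/validate.py",
]


def run_pipeline(steps, runner=subprocess.run):
    """Run each step in order; return the first non-zero exit code, or 0."""
    for script in steps:
        result = runner([sys.executable, script])
        if result.returncode != 0:
            # Stop immediately: later steps never see a half-finished state.
            print(f"step failed: {script} (exit {result.returncode})", file=sys.stderr)
            return result.returncode
    return 0
```

Passing the runner as a parameter is only a convenience for testing the control flow without launching real processes; a caller would typically end with sys.exit(run_pipeline(STEPS)).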

scripts/automation/generate_pages.py

  • Create missing baseline entity notes from CSV entity tables.
  • Render in-note generated blocks in place.
  • Preserve all non-generated note content.

Supported generated directives:

  • header
  • list:<table_name>
  • table:<table_name>

Run it from repository root:

python scripts/automation/generate_pages.py

scripts/automation/build_indexes.py

  • Build index pages from indexes config in schema/automation.json.
  • Auto-build default entity indexes for every entity table found in data/*.csv.
    • Output pattern: notes/indexes/all_<entity_table>.md
    • Format: Markdown table without ID columns; the name column links to the entity note, and *_id values are resolved to linked *_name values
  • The same table rendering rules apply to explicit entity_table indexes declared in schema/automation.json.
  • Keep output deterministic (stable ordering).
  • Remove configured outputs when source tables are missing or empty (default behavior).
  • If an auto index output name collides with an explicit schema/automation.json index output, the explicit config wins and the auto index is skipped.

Run it from repository root:

python scripts/automation/build_indexes.py
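
The output pattern and the collision rule above could be expressed along these lines. This is a sketch under assumptions: plan_index_outputs is a hypothetical helper, and the real script may represent explicit indexes differently.

```python
def plan_index_outputs(entity_tables, explicit_outputs):
    """Return auto index outputs to build, skipping any claimed by explicit config."""
    planned = []
    for table in sorted(entity_tables):  # sorted so the plan is deterministic
        output = f"notes/indexes/all_{table}.md"
        if output in explicit_outputs:
            continue  # the explicit schema/automation.json index wins
        planned.append(output)
    return planned
```

For example, with entity tables programs and people (the latter is a made-up table name) and an explicit index that already writes notes/indexes/all_people.md, only notes/indexes/all_programs.md would be auto-generated.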

Example index outputs when matching config and data exist:

  • notes/indexes/all_programs.md
  • notes/indexes/program_mentors.md
  • notes/indexes/program_mentees.md
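
To make the table format concrete, here is a simplified sketch of how an entity index might be rendered. The function name, the program_id column name, the ../programs link directory, and sorting by ID are all assumptions for illustration; resolving *_id columns to linked names is left out to keep the example short.

```python
def render_index(rows, id_column, note_dir):
    """Render rows as a Markdown table, hiding the ID column and linking the name."""
    if not rows:
        return ""  # the caller decides whether to remove the output file
    columns = [col for col in rows[0] if col != id_column]
    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    # Sort by ID so repeated runs produce byte-identical output.
    for row in sorted(rows, key=lambda r: r[id_column]):
        cells = []
        for col in columns:
            value = row[col]
            if col == "name":
                value = f"[{value}]({note_dir}/{row[id_column]}.md)"
            cells.append(value)
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines) + "\n"
```

Because the rows are sorted before rendering, the input order of the CSV does not affect the generated page, which is what keeps the second pipeline run free of diffs.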

scripts/quality/validate.py (recommended)

  • Verify foreign-key IDs exist.
  • Verify wiki links target valid IDs.
  • Verify display columns (if used) match referenced names.
  • Verify required columns exist.
  • Verify generated block markers are structurally valid.

Validation should report:

  • errors: fail the CI run or commit
  • warnings: pass, but are flagged for review

Run it from repository root:

python scripts/quality/validate.py
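
As an illustration of the first check and of the error/warning split, a foreign-key check might look roughly like this. The helper names and the message format are hypothetical; the actual validate.py may structure its checks differently.

```python
def check_foreign_keys(rows, fk_column, valid_ids):
    """Return one error message per row whose foreign key is not a known ID."""
    errors = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the CSV header
        value = row.get(fk_column, "")
        if value and value not in valid_ids:
            errors.append(f"line {line_no}: {fk_column}={value!r} does not exist")
    return errors


def exit_code(errors, warnings):
    """Errors fail the run; warnings alone still pass."""
    return 1 if errors else 0
```

Empty foreign-key cells are skipped here on the assumption that optional references are allowed; a stricter schema would treat them as errors too.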

Generated Sections Policy

Never rewrite manual prose outside explicit generated blocks.

Use directive-bearing markers:

<!-- GENERATED START: header -->
# Entity Name
<!-- GENERATED END -->

## Related Items
<!-- GENERATED START: list:programs -->
- [prog_career_mentorship](../programs/prog_career_mentorship.md)
<!-- GENERATED END -->

## Related Items Table
<!-- GENERATED START: table:programs -->
| id | name |
| --- | --- |
| [prog_career_mentorship](../programs/prog_career_mentorship.md) | Career Mentorship |
<!-- GENERATED END -->

Only the content between markers is script-managed.
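
One way to honor this policy is to rewrite only the text captured between a START and an END marker and leave every other character untouched. The sketch below assumes the marker syntax shown above; BLOCK_RE and render_blocks are hypothetical names, and the render callback stands in for whatever produces the header, list, or table content.

```python
import re

# Matches one generated block and captures its directive, e.g. "list:programs".
BLOCK_RE = re.compile(
    r"(<!-- GENERATED START: (?P<directive>[^>]+?) -->\n)"
    r".*?"
    r"(<!-- GENERATED END -->)",
    re.DOTALL,
)


def render_blocks(note_text, render):
    """Replace the body of every generated block with render(directive)."""
    def replace(match):
        body = render(match.group("directive").strip())
        if body and not body.endswith("\n"):
            body += "\n"
        # Keep both markers exactly as written; only the body changes.
        return match.group(1) + body + match.group(3)

    return BLOCK_RE.sub(replace, note_text)
```

Running the function twice with the same render callback returns the same text, which is the property the CI idempotency guard relies on.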

Generated links use relative Markdown paths so they resolve in local Markdown preview and on GitHub.

If older notes still contain wiki links, migrate them with:

python scripts/automation/migrate_wikilinks.py

Pre-Commit Flow (Recommended)

Preferred one-command flow:

bash scripts/runtime/pkm_python.sh scripts/automation/run_all.py

Equivalent explicit sequence:

python scripts/automation/generate_pages.py
python scripts/automation/build_indexes.py
python scripts/quality/validate.py

If validation fails, block the commit until the reported errors are fixed.

Use versioned hooks in this repository:

bash scripts/setup/install_hooks.sh

This configures:

  • core.hooksPath=.githooks
  • .githooks/pre-commit to run scripts/runtime/pkm_python.sh scripts/automation/run_all.py

CI Flow (Recommended)

Run the same validation and generation checks in CI to guarantee reproducibility for contributors.

This repository includes a GitHub Actions workflow:

  • .github/workflows/pkm-check.yml

It sets up Python 3.11 and runs the pipeline twice. The second run must produce no tracked file changes.

Pipeline command:

bash scripts/runtime/pkm_python.sh scripts/automation/run_all.py

Idempotency guard:

git diff --exit-code
git diff --cached --exit-code
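
Both commands exit with status 1 when they find differences and 0 otherwise, so the guard reduces to checking two exit codes. A hypothetical Python equivalent, shown only to make the logic explicit (the workflow itself runs the shell commands above):

```python
import subprocess

GUARD_COMMANDS = [
    ["git", "diff", "--exit-code"],              # unstaged changes to tracked files
    ["git", "diff", "--cached", "--exit-code"],  # staged changes
]


def tree_is_clean(runner=subprocess.run):
    """Return True only if neither guard command reports a difference."""
    for cmd in GUARD_COMMANDS:
        if runner(cmd).returncode != 0:
            return False
    return True
```

Note that git diff does not report untracked files, which matches the stated requirement that the second run produce no tracked file changes.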

Runtime Settings

Runtime behavior can be configured in .env:

PKM_CONDA_ENV=
PKM_PYTHON_BIN=python3

  • PKM_CONDA_ENV empty: the helper uses the system Python
  • PKM_CONDA_ENV set: the helper checks that the environment is active or exists, then runs with conda run -n <env>
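
The selection logic can be pictured as a small function that turns the two settings into a command line. This is a sketch, not the launcher itself (which is a shell script): build_command is a hypothetical name, the python3 fallback for an unset PKM_PYTHON_BIN is an assumption, and so is the interpreter name passed after conda run -n <env>.

```python
def build_command(env, script):
    """Return the argv the launcher would plausibly run for the given settings."""
    python_bin = env.get("PKM_PYTHON_BIN") or "python3"  # assumed default
    conda_env = env.get("PKM_CONDA_ENV", "")
    if conda_env:
        # Assumption: inside the conda env the interpreter is invoked as "python".
        return ["conda", "run", "-n", conda_env, "python", script]
    return [python_bin, script]
```

For instance, an empty PKM_CONDA_ENV with PKM_PYTHON_BIN=python3 yields python3 scripts/automation/run_all.py.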

Config Notes

schema/automation.json is intentionally empty in the starter template.

  • Baseline note generation still works from entity tables in data/
  • No indexes configured: explicit index generation is skipped, but the default auto entity indexes are still generated

Add index rules there when you want explicit index pages beyond the default auto entity indexes.

Optional Public Export

Create a filtered public snapshot:

python scripts/export/export_public_snapshot.py

Configure redaction IDs in:

  • export/private_ids.txt
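
Conceptually, the export drops every record whose ID is listed as private. The sketch below assumes one ID per line in export/private_ids.txt, with blank lines and lines starting with # ignored; that file format and both helper names are assumptions, not documented behavior of export_public_snapshot.py.

```python
def load_private_ids(text):
    """Parse one ID per line, ignoring blanks and '#' comments (assumed format)."""
    ids = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line)
    return ids


def filter_rows(rows, id_column, private_ids):
    """Keep only the rows whose ID is not marked private."""
    return [row for row in rows if row[id_column] not in private_ids]
```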
