Automation and Validation

Automation keeps the PKM consistent while preserving manual writing.

Validation and schema rules are defined in Schema Contract.

Automation behavior is configured in:

schema/automation.json
schema/automation.example.json (copy/adapt as a starting point)

Script Responsibilities

`scripts/automation/run_all.py`

Run the full automation pipeline in the correct order.
Stop immediately if any step fails.

Run it from repository root:

python scripts/automation/run_all.py

Portable launcher (recommended):

bash scripts/runtime/pkm_python.sh scripts/automation/run_all.py
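
The "run in order, stop on first failure" behavior described above can be sketched as follows. This is a minimal illustration, not the actual contents of run_all.py; the function name `run_pipeline` is hypothetical, and the step order in the usage note is taken from the explicit sequence in the Pre-Commit Flow section.

```python
import subprocess
import sys
from typing import Sequence


def run_pipeline(steps: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    for cmd in steps:
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            # Fail fast: later steps are not started once one step fails.
            print(f"step failed: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    return 0
```

A caller could pass `[[sys.executable, "scripts/automation/generate_pages.py"], [sys.executable, "scripts/automation/build_indexes.py"], [sys.executable, "scripts/quality/validate.py"]]` and hand the return value to `sys.exit`.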

`scripts/automation/generate_pages.py`

Create missing baseline entity notes from CSV entity tables.
Render in-note generated blocks in place.
Preserve all non-generated note content.

Supported generated directives:

header
list:<table_name>
table:<table_name>
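
To illustrate how the three directive forms above might be told apart, here is a small parsing sketch. The `Directive` type and `parse_directive` function are hypothetical names introduced for this example; the real script may parse directives differently.

```python
from typing import NamedTuple, Optional


class Directive(NamedTuple):
    kind: str              # "header", "list", or "table"
    table: Optional[str]   # table name for list/table, None for header


def parse_directive(text: str) -> Directive:
    """Parse 'header', 'list:<table_name>', or 'table:<table_name>'."""
    text = text.strip()
    if text == "header":
        return Directive("header", None)
    kind, sep, table = text.partition(":")
    if kind in ("list", "table") and sep and table:
        return Directive(kind, table)
    # Anything else is rejected rather than silently ignored.
    raise ValueError(f"unsupported directive: {text!r}")
```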

Run it from repository root:

python scripts/automation/generate_pages.py

`scripts/automation/build_indexes.py`

Build index pages from indexes config in schema/automation.json.
Auto-build default entity indexes for every entity table found in data/*.csv.
- Output pattern: notes/indexes/all_<entity_table>.md
- Format: Markdown table without ID columns; the name column links to the entity note, and *_id values are resolved to linked *_name values
The same table rendering rules apply to explicit entity_table indexes declared in schema/automation.json.
Keep output deterministic (stable ordering).
Remove configured outputs when source tables are missing or empty (default behavior).
If an auto index output name collides with an explicit schema/automation.json index output, the explicit config wins and the auto index is skipped.
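
The output naming, stable ordering, and collision rule above can be sketched as below. This is an assumption-laden illustration: `plan_auto_indexes` is a hypothetical helper, and it only plans output paths rather than rendering any index.

```python
from typing import Iterable, List


def plan_auto_indexes(entity_tables: Iterable[str],
                      explicit_outputs: Iterable[str]) -> List[str]:
    """Return auto index output paths, skipping any that an explicit index claims."""
    explicit = set(explicit_outputs)
    planned = []
    # Sorting the table names keeps the plan deterministic across runs.
    for table in sorted(entity_tables):
        output = f"notes/indexes/all_{table}.md"
        if output in explicit:
            continue  # explicit config wins; the auto index is skipped
        planned.append(output)
    return planned
```

For example, with entity tables `programs` and `mentors` and an explicit index that already writes notes/indexes/all_programs.md, only the mentors auto index would be planned.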

Run it from repository root:

python scripts/automation/build_indexes.py

Example index outputs when matching config and data exist:

notes/indexes/all_programs.md
notes/indexes/program_mentors.md
notes/indexes/program_mentees.md

`scripts/quality/validate.py` (recommended)

Verify foreign-key IDs exist.
Verify wiki links target valid IDs.
Verify display columns (if used) match referenced names.
Verify required columns exist.
Verify generated block markers are structurally valid.

Validation should report:

errors: fail CI/commit
warnings: pass with review recommendations
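
As a rough sketch of the foreign-key check and the error/warning split, consider the following. The function names and the message format are hypothetical, and rows are passed as dictionaries (as `csv.DictReader` would yield them) so the example does not depend on any real file layout.

```python
from typing import Dict, Iterable, List, Set


def find_missing_foreign_keys(rows: Iterable[Dict[str, str]],
                              column: str,
                              valid_ids: Set[str]) -> List[str]:
    """Return one error message per row whose foreign-key value is unknown."""
    errors = []
    # Data rows start at line 2 because line 1 of a CSV file is the header.
    for line_no, row in enumerate(rows, start=2):
        value = (row.get(column) or "").strip()
        if value and value not in valid_ids:
            errors.append(f"row {line_no}: {column}={value!r} not found")
    return errors


def exit_code(errors: List[str], warnings: List[str]) -> int:
    """Errors fail the run; warnings alone still pass."""
    return 1 if errors else 0
```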

Run it from repository root:

python scripts/quality/validate.py

Generated Sections Policy

Never rewrite manual prose outside explicit generated blocks.

Use directive-bearing markers:

<!-- GENERATED START: header -->
# Entity Name
<!-- GENERATED END -->

## Related Items
<!-- GENERATED START: list:programs -->
- [prog_career_mentorship](../programs/prog_career_mentorship.md)
<!-- GENERATED END -->

## Related Items Table
<!-- GENERATED START: table:programs -->
| id | name |
| --- | --- |
| [prog_career_mentorship](../programs/prog_career_mentorship.md) | Career Mentorship |
<!-- GENERATED END -->

Only the content between markers is script-managed.
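
One way to honor this policy is to replace only the text between each marker pair and copy everything else through unchanged. The sketch below assumes the marker syntax shown above; `render_blocks` and its `render` callback are hypothetical names, not the actual generate_pages.py API.

```python
import re
from typing import Callable

# Matches one generated block: start marker, managed body, end marker.
BLOCK_RE = re.compile(
    r"(?P<start><!-- GENERATED START: (?P<directive>[^>]+?) -->\n)"
    r"(?P<body>.*?)"
    r"(?P<end><!-- GENERATED END -->)",
    re.DOTALL,
)


def render_blocks(note: str, render: Callable[[str], str]) -> str:
    """Re-render every generated block in place; leave manual prose untouched."""
    def _replace(match: "re.Match[str]") -> str:
        body = render(match.group("directive").strip())
        if body and not body.endswith("\n"):
            body += "\n"
        # Markers are kept verbatim so the next run finds the same block.
        return match.group("start") + body + match.group("end")

    return BLOCK_RE.sub(_replace, note)
```

Because the markers survive each run, applying the function twice with the same `render` callback yields the same text, which is the property the CI idempotency guard checks.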

Generated links use relative Markdown paths so they resolve in local Markdown preview and on GitHub.

If older notes still contain wiki links, migrate them with:

python scripts/automation/migrate_wikilinks.py

Pre-Commit Flow (Recommended)

Preferred one-command flow:

bash scripts/runtime/pkm_python.sh scripts/automation/run_all.py

Equivalent explicit sequence:

python scripts/automation/generate_pages.py
python scripts/automation/build_indexes.py
python scripts/quality/validate.py

If validation fails, block the commit until the errors are fixed.

Use versioned hooks in this repository:

bash scripts/setup/install_hooks.sh

This configures:

core.hooksPath=.githooks
.githooks/pre-commit to run scripts/runtime/pkm_python.sh scripts/automation/run_all.py

CI Flow (Recommended)

Run the same validation and generation checks in CI to guarantee reproducibility for contributors.

This repository includes a GitHub Actions workflow:

.github/workflows/pkm-check.yml

It sets up Python 3.11 and runs the pipeline twice. The second run must produce no tracked file changes.

Pipeline command:

bash scripts/runtime/pkm_python.sh scripts/automation/run_all.py

Idempotency guard:

git diff --exit-code
git diff --cached --exit-code
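
The workflow itself relies on git diff for this guard. As a git-independent way to check the same property locally, one could hash the tree before and after a second pipeline run. This is a substitute technique for illustration, not what the workflow does; `snapshot` and `changed_files` are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return paths that were added, removed, or modified between snapshots."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

Taking a snapshot of notes/ after the first run and again after the second, an empty `changed_files` result would indicate the pipeline is idempotent for that directory.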

Runtime Settings

Runtime behavior can be configured in .env:

PKM_CONDA_ENV=
PKM_PYTHON_BIN=python3

PKM_CONDA_ENV empty: the helper uses the system Python.
PKM_CONDA_ENV set: the helper checks that the environment is active or exists, then runs with conda run -n <env>.
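
The selection logic above can be sketched in Python as follows. The real helper is a shell script, so this is only an illustration. Two points are assumptions: that PKM_PYTHON_BIN names the interpreter used when no conda environment is set, and that the interpreter inside the conda environment is invoked as `python`. The function name `build_command` is hypothetical.

```python
from typing import List, Mapping


def build_command(env: Mapping[str, str], script: str) -> List[str]:
    """Choose how to launch a script based on the .env style settings."""
    conda_env = env.get("PKM_CONDA_ENV", "").strip()
    if conda_env:
        # Assumption: inside the conda environment the interpreter is "python".
        return ["conda", "run", "-n", conda_env, "python", script]
    # Assumption: PKM_PYTHON_BIN selects the system interpreter (default python3).
    python_bin = env.get("PKM_PYTHON_BIN", "").strip() or "python3"
    return [python_bin, script]
```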

Config Notes

schema/automation.json is intentionally empty in the starter template.

Baseline note generation still works from the entity tables in data/.
With no indexes configured, explicit index generation is skipped, but default auto entity indexes are still generated.

Add index rules to schema/automation.json when you want explicitly configured index pages.

Optional Public Export

Create a filtered public snapshot:

python scripts/export/export_public_snapshot.py

Configure redaction IDs in:

export/private_ids.txt
