
Create TestRunner.Report with markdown generation #198

@andreasronge

Description


Summary

Create TestRunner.Report module with shared markdown report generation functions for both JSON and Lisp test runners. This enables consistent report output format across both DSLs and eliminates code duplication.

Context

Architecture reference: Test Runner Refactoring Plan - see "Module Responsibilities" section for PtcDemo.TestRunner.Report

Dependencies: None (Phase 1 tasks can be done in parallel)

Related issues: #196 (TestRunner.Base - completed), Epic #195

Current State

Report generation is currently implemented in LispTestRunner (lines 679-798):

  • write_report/2 - writes report content to file
  • generate_report/1 - builds full markdown report from summary
  • generate_results_table/1 - creates markdown table of all test results
  • generate_failed_details/1 - formats detailed failure information
  • generate_all_programs_section/1 - lists all programs attempted per test
  • generate_test_programs/1 - helper for single test program listing
  • format_timestamp/1 - formats DateTime for report header

The JSON TestRunner has no report generation capability.

Note: LispTestRunner also has private implementations of format_cost/1, format_duration/1, truncate/2, and format_attempt_result/1 (lines 652-675) that duplicate the functions in TestRunner.Base. The Report module should use the Base functions rather than reimplementing these.

Acceptance Criteria

  • PtcDemo.TestRunner.Report module created at demo/lib/ptc_demo/test_runner/report.ex
  • write_report/3 accepts path, summary, and DSL name to write report file
  • generate_report/2 accepts summary and DSL name, returns markdown string
  • generate_results_table/1 generates consistent table format from results
  • generate_failed_details/1 formats failure details section
  • generate_all_programs_section/1 generates all programs section
  • format_timestamp/1 formats DateTime as "YYYY-MM-DD HH:MM:SS UTC"
  • Uses Base.format_cost/1, Base.format_duration/1, Base.truncate/2, and Base.format_attempt_result/1 - no duplication
  • Report title is parameterized by DSL name (e.g., "PTC-JSON Test Report" or "PTC-Lisp Test Report")
  • Code compiles without warnings
  • Module has @moduledoc and @doc with examples
  • All public functions have @spec type specifications
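
Taken together, the criteria above suggest a module skeleton along these lines. This is a sketch only: the table and section builders are omitted, and `generate_report/2` here emits just the header (the real bodies come from the code extracted from LispTestRunner).

```elixir
defmodule PtcDemo.TestRunner.Report do
  @moduledoc """
  Shared markdown report generation for the JSON and Lisp test runners.
  """

  # alias PtcDemo.TestRunner.Base  # used by the omitted table/section builders

  @doc "Writes the generated report for `dsl_name` to `path`."
  @spec write_report(Path.t(), map(), String.t()) :: :ok | {:error, term()}
  def write_report(path, summary, dsl_name) do
    File.write(path, generate_report(summary, dsl_name))
  end

  @doc "Builds the full markdown report from a summary map."
  @spec generate_report(map(), String.t()) :: String.t()
  def generate_report(summary, dsl_name) do
    # Title is parameterized by DSL name, per the acceptance criteria.
    # The real implementation appends the results table, failure details,
    # and all-programs sections here.
    "# PTC-#{dsl_name} Test Report\n\nGenerated: #{format_timestamp(summary.timestamp)}\n"
  end

  @doc "Formats a DateTime as `YYYY-MM-DD HH:MM:SS UTC`."
  @spec format_timestamp(DateTime.t()) :: String.t()
  def format_timestamp(%DateTime{} = dt) do
    Calendar.strftime(dt, "%Y-%m-%d %H:%M:%S UTC")
  end
end
```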

Implementation Hints

Files to create:

  • demo/lib/ptc_demo/test_runner/report.ex - new module with report generation

Code to extract from LispTestRunner:

  • Lines 679-798 in demo/lib/ptc_demo/lisp_test_runner.ex
  • Change write_report/2 to write_report/3 with DSL name parameter
  • Change generate_report/1 to generate_report/2 with DSL name parameter
  • Report title should use the DSL name: `"# PTC-#{dsl_name} Test Report"`
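
The arity change is mechanical: the new DSL-name parameter only affects the title line. A hypothetical sketch of just that piece:

```elixir
defmodule TitleSketch do
  # Hypothetical helper: the only behavior that changes with the new
  # parameter is interpolating the DSL name into the report title.
  def title(dsl_name), do: "# PTC-#{dsl_name} Test Report"
end

TitleSketch.title("JSON")  # "# PTC-JSON Test Report"
TitleSketch.title("Lisp")  # "# PTC-Lisp Test Report"
```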

Patterns to follow:

  • Follow TestRunner.Base module structure (see demo/lib/ptc_demo/test_runner/base.ex)
  • Use @moduledoc and @doc with examples
  • Add @spec for all public functions

Reuse from TestRunner.Base:

```elixir
alias PtcDemo.TestRunner.Base

# Use these instead of reimplementing:
Base.format_cost/1            # cost formatting
Base.format_duration/1        # duration formatting
Base.truncate/2               # string truncation
Base.format_attempt_result/1  # program result formatting
```

Summary map structure expected (from Base.build_summary/5):

```elixir
%{
  passed: integer(),
  failed: integer(),
  total: integer(),
  total_attempts: integer(),
  duration_ms: integer(),
  model: String.t(),
  data_mode: atom(),
  results: [result_map()],
  stats: %{total_tokens: integer(), total_cost: float()},
  timestamp: DateTime.t()
}
```
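
For instance, the report's pass/fail header line can be derived directly from these fields. A minimal sketch (the `"**Result:** ..."` wording is an assumption, not the confirmed report format):

```elixir
defmodule SummaryHeader do
  # Builds an "N/M passed" summary line from the summary map fields above.
  def pass_line(%{passed: passed, total: total}) do
    "**Result:** #{passed}/#{total} passed"
  end
end

SummaryHeader.pass_line(%{passed: 1, failed: 1, total: 2})
# "**Result:** 1/2 passed"
```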

Result map structure expected (each item in results list):

```elixir
%{
  index: integer(),
  query: String.t(),
  passed: boolean(),
  attempts: integer(),
  program: String.t() | nil,            # final successful program
  all_programs: [{String.t(), any()}],  # all attempted {program, result} pairs
  error: String.t() | nil,              # error message if failed
  description: String.t(),
  constraint: tuple()
}
```
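
One way `generate_results_table/1` might turn each result map into a markdown table row. This is a sketch under assumed column choices; the real implementation also runs the query and program through `Base.truncate/2` before interpolation:

```elixir
defmodule RowSketch do
  # One markdown table row per result; "-" stands in for a missing
  # program (the fallback named under Edge Cases below).
  def row(result) do
    status = if result.passed, do: "PASS", else: "FAIL"
    program = Map.get(result, :program) || "-"
    "| #{result.index} | #{result.query} | #{status} | #{result.attempts} | #{program} |"
  end
end

RowSketch.row(%{index: 1, query: "Count items", passed: true, attempts: 1,
                program: "(count items)"})
# "| 1 | Count items | PASS | 1 | (count items) |"
```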

Edge Cases

  • Empty results list: Should generate valid report with "0/0 passed" and empty tables
  • All tests passed: Should omit "Failed Tests" section (currently returns empty string)
  • Results without :all_programs key: Graceful handling with "(no programs)" fallback (already implemented in source)
  • Results without :program key: Use "-" as fallback (already implemented in source)
  • DSL name capitalization: Accept as-is (caller provides "JSON" or "Lisp")
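
The missing-key fallbacks can be handled with `Map.get/3` defaults. A sketch with a hypothetical helper name (the `"- \`...\`"` bullet format is an assumption):

```elixir
defmodule FallbackSketch do
  # "(no programs)" when :all_programs is absent or empty,
  # mirroring the fallback already implemented in LispTestRunner.
  def programs_line(result) do
    case Map.get(result, :all_programs, []) do
      [] -> "(no programs)"
      programs -> Enum.map_join(programs, "\n", fn {prog, _res} -> "- `#{prog}`" end)
    end
  end
end

FallbackSketch.programs_line(%{})
# "(no programs)"
FallbackSketch.programs_line(%{all_programs: [{"(count items)", 5}]})
# "- `(count items)`"
```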

Test Plan

Unit tests: Deferred to Phase 2, task 2.3 (separate issue)

Manual verification:

  1. Module compiles without warnings: `mix compile --warnings-as-errors`
  2. Functions are callable: verify in `iex -S mix` that the functions exist and accept the correct arities
  3. Generate a sample report:

```elixir
alias PtcDemo.TestRunner.Report

summary = %{
  passed: 1, failed: 1, total: 2,
  total_attempts: 3, duration_ms: 5000,
  model: "test-model", data_mode: :schema,
  stats: %{total_tokens: 100, total_cost: 0.01},
  timestamp: DateTime.utc_now(),
  results: [
    %{index: 1, query: "Count items", passed: true, attempts: 1,
      program: "(count items)", all_programs: [{"(count items)", 5}],
      description: "Should count", constraint: {:eq, 5}},
    %{index: 2, query: "Sum values", passed: false, attempts: 2,
      program: nil, all_programs: [{"(sum x)", {:error, "undefined"}}],
      error: "Expected 10, got 5", description: "Should sum",
      constraint: {:eq, 10}}
  ]
}

IO.puts(Report.generate_report(summary, "Test"))
```

Out of Scope

  • Refactoring LispTestRunner to use this module (Phase 2, task 2.1)
  • Unit tests for Report module (Phase 2, task 2.3)
  • HTML or other output formats
  • Custom report templates

Documentation Updates

None - internal demo module, no public API docs required

Metadata

Labels: claude-approved (Maintainer-approved for Claude automation), enhancement (New feature or request), ready-for-implementation (Issue is approved and ready to implement)