ci: enhance extension test coverage for removal, permissions, and silent failures #18

## Problem Statement

Current extension testing has critical gaps in verification of cleanup operations, permission error detection, and silent failure handling. Tests verify that commands complete successfully (exit code 0) but don't validate that the expected state changes actually occurred.

### Critical Issues

1. **Incomplete `remove()` verification**: Tests only check exit codes, not actual cleanup
2. **No permission error detection**: "Permission denied" errors can be silent
3. **Weak silent failure detection**: Errors in stderr can pass if exit code is 0
4. **Limited API compliance coverage**: Only 9 of 20+ extensions tested
5. **No `upgrade()` testing**: Extension API v2.0 function not tested
6. **No state verification**: Tests verify commands ran, not results

### Evidence

**Current `remove()` testing** (`.github/workflows/api-compliance.yml:159-189`):
```yaml
- name: Test uninstall() function
  run: |
    extension-manager uninstall ${{ matrix.extension }} > /tmp/uninstall.log 2>&1
    if [ $? -eq 0 ]; then
      echo "✓ Uninstall completed successfully"
    fi
```
- ❌ No verification that files were deleted
- ❌ No verification that the environment was restored
- ❌ No verification that directories were cleaned up

**Silent failure example** (`.github/scripts/extension-tests/test-idempotency.sh:40-47`):
```bash
if grep -qi "error\|failed" /tmp/configure2.log; then
  print_warning "Warnings or errors detected on second run"
  # Don't fail on warnings, just report them
fi
```
- ❌ Allows errors to pass if exit code is 0
- ❌ No stderr pattern analysis for specific error types
- ❌ No permission validation before operations

---

## Current Test Coverage Analysis

### What We Test ✅

| Aspect | Coverage | Location |
|--------|----------|----------|
| Exit codes | Comprehensive | `lib/assertions.sh:149-171` |
| Command existence | Yes | `verify-commands.sh` |
| `validate()` function | Yes | `test-api-compliance.sh` |
| `status()` function | Yes | `test-api-compliance.sh` |
| Basic idempotency | Yes | `test-idempotency.sh` |
| Tool functionality | Partial | `test-key-functionality.sh` |

### What We Don't Test ❌

| Gap | Severity | Impact |
|-----|----------|--------|
| `remove()` state verification | CRITICAL | Orphaned files/configs after uninstall |
| Permission error detection | CRITICAL | Silent installation failures |
| `upgrade()` function | CRITICAL | API v2.0 non-compliance |
| Environment cleanup | HIGH | PATH/aliases not restored |
| Tool-specific cleanup | HIGH | Caches/configs persist after removal |
| Dependency cleanup | MEDIUM | Orphaned dependencies |
| Rollback/recovery | MEDIUM | No recovery from partial failures |
| Cross-extension coverage | HIGH | Only 9/20+ extensions tested |

---

## Proposed Solution

### Architecture: Three-Tier Testing

```
.github/scripts/extension-tests/
├── lib/                           # Shared utilities
│   ├── test-helpers.sh            # Existing
│   ├── assertions.sh              # Existing
│   ├── error-detection.sh         # NEW: Generic error pattern matching
│   └── cleanup-verification.sh    # NEW: Generic cleanup verification
│
├── API Compliance (Generic)       # Contract verification
│   ├── test-api-compliance.sh     # ENHANCE: Add cleanup/permission tests
│   ├── test-upgrade-api.sh        # NEW: Test upgrade() API compliance
│   └── test-permissions.sh        # NEW: Generic permission testing
│
└── extensions/                    # NEW: Per-extension specific tests
    ├── nodejs-cleanup.sh
    ├── python-cleanup.sh
    ├── rust-cleanup.sh
    ├── golang-cleanup.sh
    └── ... (one per extension)
```

### Tier 1: API Compliance Tests (Generic Contract)

**Purpose:** Verify all extensions implement the Extension API correctly

**Tests:**
- ✅ All 7 API functions exist (`prerequisites`, `install`, `configure`, `validate`, `status`, `remove`, `upgrade`)
- ✅ Functions return appropriate exit codes
- ✅ Generic cleanup patterns (extensions clean up files they created)
- ✅ Generic error handling (permission errors are caught and reported)
- ✅ API-level idempotency (calling functions multiple times is safe)
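
For the API-level idempotency item, which also covers the `upgrade()` idempotency test in Phase 2, one possible shape is a helper that runs the same command twice and compares a caller-supplied state snapshot after each run. This is a sketch: `assert_idempotent` and the "snapshot function" convention are hypothetical, not existing helpers in `lib/`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in lib/): run a command twice and require that both
# runs succeed and that the observable state is identical after each run.
#   $1    name of a function that prints a state snapshot to stdout
#   $2..  the command to run, e.g. extension-manager upgrade nodejs
assert_idempotent() {
    local snapshot_fn=$1
    shift
    local first second

    "$@" >/dev/null 2>&1 || { echo "ERROR: first run failed: $*" >&2; return 1; }
    first=$("$snapshot_fn")

    "$@" >/dev/null 2>&1 || { echo "ERROR: second run failed: $*" >&2; return 1; }
    second=$("$snapshot_fn")

    if [ "$first" != "$second" ]; then
        echo "ERROR: state changed between runs of: $*" >&2
        diff <(printf '%s\n' "$first") <(printf '%s\n' "$second") >&2
        return 1
    fi
}
```

A caller might define `snapshot_node() { ls /opt/mise/installs/node 2>/dev/null; }` and then run `assert_idempotent snapshot_node extension-manager upgrade nodejs`. Keeping the snapshot as a function lets each extension decide what "state" means without changing the helper.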

**Example:**
```bash
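# test-api-compliance.sh (enhanced)
test_remove_function_compliance() {
    # API contract verification
    assert_function_exists "remove"

    # Environment snapshot to compare against. Prefer ENV_BASELINE (hypothetical,
    # captured with `bash -lc env | sort` before install): a pre-removal snapshot
    # only proves that removal added nothing. uninstall runs in a child process,
    # so snapshot a fresh login shell rather than this shell's own environment.
    local env_before stderr_file rc
    env_before=${ENV_BASELINE:-$(bash -lc env | sort)}
    stderr_file="/tmp/uninstall_stderr_$$.log"

    # Execute removal, keeping stderr in its own file for pattern analysis
    extension-manager uninstall "$EXTENSION" 2>"$stderr_file" && rc=0 || rc=$?
    if [ "$rc" -ne 0 ]; then
        print_error "uninstall of $EXTENSION exited with $rc"
        return 1
    fi

    # Verify generic cleanup (see lib/error-detection.sh, lib/cleanup-verification.sh)
    assert_no_stderr_errors "$stderr_file" || return 1
    verify_clean_environment "$env_before" || return 1
    assert_extension_directory_removed "/workspace/scripts/lib/$EXTENSION" || return 1
    verify_no_orphaned_processes "$EXTENSION" || return 1
}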
```
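
The first Tier 1 item (all seven API functions exist) can be checked without running any of them, by sourcing the extension script in a subshell and asking bash which functions it defines. A sketch, assuming each extension is a single bash file that defines the API functions at top level and that sourcing it has no side effects beyond those definitions; `assert_api_functions_defined` is a hypothetical name.

```bash
#!/usr/bin/env bash
# The seven Extension API v2.0 functions listed above.
API_FUNCTIONS=(prerequisites install configure validate status remove upgrade)

# Hypothetical helper: fail and list every API function missing from a script.
#   $1  path to the extension's bash file (assumed layout)
assert_api_functions_defined() {
    local ext_file=$1 missing
    [ -f "$ext_file" ] || { echo "ERROR: no such file: $ext_file" >&2; return 1; }

    # Source in a subshell so the extension's definitions do not leak into the
    # test shell. declare -F only matches functions, so /usr/bin/install and
    # similar commands do not count as a defined install().
    missing=$(
        # shellcheck source=/dev/null
        source "$ext_file" >/dev/null 2>&1 || true
        for fn in "${API_FUNCTIONS[@]}"; do
            declare -F "$fn" >/dev/null || echo "$fn"
        done
    )

    if [ -n "$missing" ]; then
        echo "ERROR: $ext_file is missing API functions: ${missing//$'\n'/ }" >&2
        return 1
    fi
}
```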

### Tier 2: Per-Extension Tests (Specific Verification)

**Purpose:** Verify specific extensions work correctly with their tools

**Tests:**
- ✅ Extension-specific tools installed/removed correctly
- ✅ Specific files/directories exist after install
- ✅ Specific files/directories removed after uninstall
- ✅ Tool-specific commands work
- ✅ Extension-specific environment variables
- ✅ Tool-specific upgrade behavior

**Example:**
```bash
# extensions/nodejs-cleanup.sh
test_nodejs_specific_cleanup() {
    # Pre-removal verification
    assert_command_exists "node"
    assert_command_exists "npm"
    assert_directory_exists "/opt/mise/installs/node"
    
    # Execute removal
    extension-manager uninstall nodejs || return 1
    
    # Post-removal verification
    assert_command_not_exists "node"
    assert_command_not_exists "npm"
    assert_directory_not_exists "/opt/mise/installs/node"
    assert_file_not_contains "$HOME/.bashrc" "mise activate node"
    assert_npm_cache_cleaned
    assert_mise_tool_removed "node"
}
```
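
The Tier 2 example relies on `assert_command_not_exists`, `assert_directory_not_exists`, and `assert_file_not_contains`, which are not among the utilities defined in this proposal. If `lib/assertions.sh` does not already provide them, minimal versions could look like the following; the names come from the example, the bodies are an assumption.

```bash
#!/usr/bin/env bash
# Minimal sketches of the Tier 2 assertions used in the nodejs example.
# Names come from the example above; the bodies are assumptions.

assert_command_not_exists() {
    local cmd=$1
    hash -r  # drop bash's cached command locations so removed binaries are not reported
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "ERROR: command still on PATH: $cmd ($(command -v "$cmd"))" >&2
        return 1
    fi
}

# Fails if anything (file, directory, or symlink target) still exists at the path.
assert_directory_not_exists() {
    local dir=$1
    if [ -e "$dir" ]; then
        echo "ERROR: path still exists: $dir" >&2
        return 1
    fi
}

# Fails if the file contains the literal string; a missing file counts as clean.
assert_file_not_contains() {
    local file=$1 needle=$2
    [ -f "$file" ] || return 0
    if grep -qF -- "$needle" "$file"; then
        echo "ERROR: $file still contains: $needle" >&2
        grep -nF -- "$needle" "$file" >&2
        return 1
    fi
}
```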

### Tier 3: Shared Utilities (Reusable Functions)

**Purpose:** Provide reusable detection and verification functions

**New utilities:**

**`lib/error-detection.sh`:**
```bash
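# Returns 1 if the log contains a permission error, 0 otherwise.
check_permission_errors() {
    local log_file=$1
    if grep -Eqi 'permission denied|access denied|insufficient permissions' "$log_file"; then
        return 1
    fi
    return 0
}

# Returns 1 on the first generic error pattern found in the stderr log.
check_stderr_patterns() {
    local stderr_file=$1
    local patterns=("error" "failed" "permission denied" "not found" "cannot")
    local pattern
    for pattern in "${patterns[@]}"; do
        if grep -qi -- "$pattern" "$stderr_file"; then
            print_error "Detected '$pattern' in stderr"
            return 1
        fi
    done
    return 0
}

# Fails if the given stderr log is non-empty and contains an error pattern.
# Callers must redirect the command's stderr to this file; the default path is
# only used when no argument is passed.
assert_no_stderr_errors() {
    local stderr_file=${1:-/tmp/cmd_stderr_$$.log}
    if [ -s "$stderr_file" ]; then
        check_stderr_patterns "$stderr_file" || return 1
    fi
    return 0
}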
```
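
Phase 1 also lists `detect_silent_failures()`, which has no body yet. One way to define it, as a sketch: run the command with stderr captured to its own file, then fail if the exit code is non-zero, or if it is 0 but stderr matches an error pattern. The pattern list mirrors `check_stderr_patterns()`; the calling convention (command passed as arguments) is an assumption.

```bash
#!/usr/bin/env bash
# Sketch of detect_silent_failures(): run a command and fail on a non-zero exit
# OR on error patterns in stderr when the exit code was 0.
#   $@  the command to run (calling convention is an assumption)
detect_silent_failures() {
    local pattern='error|failed|permission denied|not found|cannot'
    local stderr_file rc status=0
    stderr_file=$(mktemp) || return 1

    # '&& rc=0 || rc=$?' keeps this safe to call under 'set -e'.
    "$@" 2>"$stderr_file" && rc=0 || rc=$?

    if [ "$rc" -ne 0 ]; then
        echo "ERROR: '$*' exited with $rc" >&2
        cat "$stderr_file" >&2
        status=$rc
    elif grep -Eqi "$pattern" "$stderr_file"; then
        # Exit code 0, but stderr reports a problem: the silent-failure case.
        echo "ERROR: '$*' exited 0 but stderr contains error patterns:" >&2
        grep -Ei "$pattern" "$stderr_file" >&2
        status=1
    fi

    rm -f "$stderr_file"
    return "$status"
}
```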

**`lib/cleanup-verification.sh`:**
```bash
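verify_clean_environment() {
    local env_before=$1
    # Extension scripts run in child processes and cannot change this shell's
    # environment, so by default compare against a fresh login shell.
    local env_after
    env_after=${2:-$(bash -lc env | sort)}

    # Lines present only in the "after" snapshot are new or changed variables
    # (comm requires both inputs to be sorted).
    local new_vars
    new_vars=$(comm -13 <(echo "$env_before") <(echo "$env_after"))
    if [ -n "$new_vars" ]; then
        print_error "New or changed environment variables detected: $new_vars"
        return 1
    fi
}

verify_path_restored() {
    local original_path=$1
    # As above, read PATH from a fresh login shell unless one is passed in
    local current_path
    current_path=${2:-$(bash -lc 'printf %s "$PATH"')}
    if [ "$current_path" != "$original_path" ]; then
        print_error "PATH not restored after removal"
        return 1
    fi
}

verify_no_orphaned_processes() {
    local extension_name=$1
    local pid orphans=()
    # pgrep never reports itself; also skip this test shell and its parent,
    # whose command lines may contain the extension name.
    for pid in $(pgrep -f -- "$extension_name" || true); do
        if [ "$pid" != "$$" ] && [ "$pid" != "$PPID" ]; then
            orphans+=("$pid")
        fi
    done
    if [ "${#orphans[@]}" -gt 0 ]; then
        print_error "Orphaned processes found for $extension_name: ${orphans[*]}"
        return 1
    fi
}

assert_extension_directory_removed() {
    local ext_dir=$1
    if [ -d "$ext_dir" ]; then
        print_error "Extension directory still exists: $ext_dir"
        return 1
    fi
}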
```
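
`assert_files_removed()` and `assert_directories_removed()` appear in Phase 1 but are not defined above. A sketch that accepts any number of paths and reports every leftover rather than stopping at the first, so one CI run shows the full extent of an incomplete removal. The variadic signature is an assumption.

```bash
#!/usr/bin/env bash
# Sketches of the two Phase 1 helpers; both take any number of paths
# (variadic signature is an assumption).

assert_files_removed() {
    local leftover=() f
    for f in "$@"; do
        # -L also catches dangling symlinks, which -e alone reports as absent
        if [ -e "$f" ] || [ -L "$f" ]; then
            leftover+=("$f")
        fi
    done
    if [ "${#leftover[@]}" -gt 0 ]; then
        printf 'ERROR: file not removed: %s\n' "${leftover[@]}" >&2
        return 1
    fi
}

assert_directories_removed() {
    local leftover=() d
    for d in "$@"; do
        if [ -d "$d" ]; then
            leftover+=("$d")
        fi
    done
    if [ "${#leftover[@]}" -gt 0 ]; then
        printf 'ERROR: directory not removed: %s\n' "${leftover[@]}" >&2
        return 1
    fi
}
```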

---

## Implementation Plan

### Phase 1: Shared Utilities (Week 1)
- [ ] Create `lib/error-detection.sh`
  - [ ] `check_permission_errors()`
  - [ ] `check_stderr_patterns()`
  - [ ] `assert_no_stderr_errors()`
  - [ ] `detect_silent_failures()`
- [ ] Create `lib/cleanup-verification.sh`
  - [ ] `verify_clean_environment()`
  - [ ] `verify_path_restored()`
  - [ ] `verify_no_orphaned_processes()`
  - [ ] `assert_extension_directory_removed()`
  - [ ] `assert_files_removed()`
  - [ ] `assert_directories_removed()`

### Phase 2: Enhanced API Compliance (Week 2)
- [ ] Enhance `test-api-compliance.sh`
  - [ ] Add `test_remove_function_compliance()`
  - [ ] Add pre/post state capture
  - [ ] Add generic cleanup verification
  - [ ] Add permission error detection
  - [ ] Integrate new shared utilities
- [ ] Create `test-upgrade-api.sh`
  - [ ] Test `upgrade()` function exists
  - [ ] Test upgrade idempotency
  - [ ] Test upgrade error handling
  - [ ] Test upgrade rollback
- [ ] Create `test-permissions.sh`
  - [ ] Test behavior with limited permissions
  - [ ] Test permission pre-flight checks
  - [ ] Test permission error reporting
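
For the permission pre-flight item, a sketch that checks every target path before an operation starts and reports all unwritable ones at once. Two assumptions: the caller knows which paths an extension will write to (for example `/opt/mise` or `$HOME/.bashrc`), and a path that does not exist yet is judged by its nearest existing ancestor. `check_write_permissions` is a hypothetical name. Note that `[ -w ]` succeeds for root on writable filesystems, so limited-permission tests need to run as a non-root user.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: verify each path (or, if it does not exist yet,
# its nearest existing ancestor) is writable before install/remove touches it.
check_write_permissions() {
    local path probe denied=()
    for path in "$@"; do
        probe=$path
        # Walk up to the nearest existing ancestor for paths not created yet.
        while [ ! -e "$probe" ] && [ "$probe" != "/" ]; do
            probe=$(dirname "$probe")
        done
        if [ ! -w "$probe" ]; then
            denied+=("$path (checked $probe)")
        fi
    done
    if [ "${#denied[@]}" -gt 0 ]; then
        printf 'ERROR: permission denied (pre-flight): %s\n' "${denied[@]}" >&2
        return 1
    fi
}
```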

### Phase 3: Per-Extension Tests (Week 3-4)
- [ ] Create `extensions/` directory
- [ ] Create extension-specific cleanup tests:
  - [ ] `extensions/nodejs-cleanup.sh`
  - [ ] `extensions/python-cleanup.sh`
  - [ ] `extensions/rust-cleanup.sh`
  - [ ] `extensions/golang-cleanup.sh`
  - [ ] `extensions/ruby-cleanup.sh`
  - [ ] `extensions/docker-cleanup.sh`
  - [ ] `extensions/php-cleanup.sh`
  - [ ] `extensions/jvm-cleanup.sh`
  - [ ] `extensions/dotnet-cleanup.sh`
  - [ ] `extensions/tmux-workspace-cleanup.sh`
  - [ ] (Add all 20+ extensions)

### Phase 4: Workflow Integration (Week 5)
- [ ] Update `.github/workflows/api-compliance.yml`
  - [ ] Add new test scripts
  - [ ] Expand extension matrix to all 20+ extensions
  - [ ] Add upgrade testing
  - [ ] Add permission testing
- [ ] Create `.github/workflows/per-extension-cleanup.yml`
  - [ ] Run per-extension cleanup tests
  - [ ] Matrix strategy for all extensions
- [ ] Update existing workflows
  - [ ] Enhance error detection in `extension-tests.yml`
  - [ ] Integrate permission checks in `integration.yml`

### Phase 5: Silent Failure Hardening (Week 6)
- [ ] Update `test-idempotency.sh`
  - [ ] Make stderr analysis mandatory
  - [ ] Fail tests on error patterns in stderr
  - [ ] Add state verification instead of trusting exit codes alone
- [ ] Update all test scripts
  - [ ] Replace permissive error handling with strict checks
  - [ ] Add `set -o pipefail` to all scripts
  - [ ] Integrate `check_stderr_patterns()` everywhere
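
The `set -o pipefail` item matters because test scripts commonly pipe output through `tee` into a log. Without it, a pipeline's exit status is that of its last command, so a failing uninstall piped into `tee` still looks successful. A minimal demonstration, with `false` standing in for a failing `extension-manager` call:

```bash
#!/usr/bin/env bash
# Prints the exit status of a failing command piped into tee,
# with pipefail switched on ("on") or off (anything else).
pipeline_status() {
    (
        if [ "$1" = on ]; then set -o pipefail; else set +o pipefail; fi
        rc=0
        false | tee /dev/null || rc=$?
        echo "$rc"
    )
}

pipeline_status off   # prints 0: tee's status hides the failure
pipeline_status on    # prints 1: pipefail propagates false's status
```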

---

## Acceptance Criteria

### API Compliance Tests
- [ ] All 7 Extension API functions tested (including `upgrade()`)
- [ ] Generic cleanup verification implemented
- [ ] Permission error detection active
- [ ] Silent failure detection enhanced
- [ ] All 20+ extensions in test matrix

### Per-Extension Tests
- [ ] Cleanup test for each extension
- [ ] Extension-specific state verification
- [ ] Tool-specific cleanup validated
- [ ] Tests verify actual state changes, not just exit codes

### Shared Utilities
- [ ] Error detection utilities created
- [ ] Cleanup verification utilities created
- [ ] All test scripts use shared utilities
- [ ] Comprehensive documentation

### Quality Metrics
- [ ] Zero false negatives: every permission error and every silent failure (error output with exit code 0) is caught
- [ ] Zero false positives: known-benign warnings do not fail tests
- [ ] 100% extension coverage in API compliance tests
- [ ] All cleanup operations verified

### Documentation
- [ ] Update `.github/scripts/extension-tests/README.md`
- [ ] Document new utilities in `lib/`
- [ ] Add per-extension test guide
- [ ] Update CLAUDE.md with new testing approach

---

## Testing Strategy

### Test the Tests
- [ ] Verify permission error detection catches real permission issues
- [ ] Verify cleanup verification catches incomplete removals
- [ ] Verify silent failure detection catches errors with exit 0
- [ ] Test against known failure scenarios
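
Testing the tests can be as simple as feeding each detector a known-bad and a known-good sample and asserting that it flags only the bad one. A sketch for the permission detector: `check_permission_errors` is repeated here so the block runs on its own, `expect_flagged`/`expect_clean` are hypothetical names, and the sample log lines are illustrative, not captured from real failures.

```bash
#!/usr/bin/env bash
# Copy of check_permission_errors() so this block is self-contained:
# returns 1 when the log contains a permission error, 0 otherwise.
check_permission_errors() {
    if grep -Eqi 'permission denied|access denied|insufficient permissions' "$1"; then
        return 1
    fi
    return 0
}

# Write TEXT to a temp file and return CHECKER's status on it.
run_checker() {
    local checker=$1 text=$2 sample rc=0
    sample=$(mktemp)
    printf '%s\n' "$text" >"$sample"
    "$checker" "$sample" || rc=$?
    rm -f "$sample"
    return "$rc"
}

# Hypothetical harness: CHECKER must flag (non-zero) or accept (zero) TEXT.
expect_flagged() {
    if run_checker "$1" "$2"; then
        echo "FAIL: $1 missed: $2" >&2
        return 1
    fi
    echo "PASS: $1 flagged: $2"
}

expect_clean() {
    if ! run_checker "$1" "$2"; then
        echo "FAIL: $1 false positive: $2" >&2
        return 1
    fi
    echo "PASS: $1 accepted: $2"
}

# Illustrative samples only
expect_flagged check_permission_errors "mkdir: cannot create directory '/opt/mise': Permission denied"
expect_flagged check_permission_errors "Access denied for user"
expect_clean   check_permission_errors "Installed node successfully"
```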

### Validation
- [ ] Run enhanced tests against all extensions
- [ ] Verify no regressions in existing tests
- [ ] Validate tests catch real issues in extensions
- [ ] Performance impact acceptable (<10% increase in CI time)

---

## Risk Assessment

| Risk | Severity | Mitigation |
|------|----------|------------|
| False positives in error detection | MEDIUM | Careful pattern matching, whitelist known warnings |
| Increased CI time | LOW | Parallel execution, optimize checks |
| Breaking existing tests | MEDIUM | Incremental rollout, feature flags |
| Maintenance burden | LOW | Shared utilities reduce duplication |
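
The first mitigation (whitelist known warnings) could be implemented by stripping allowlisted lines before pattern matching, so a known-benign message does not fail the run while every other error still does. A sketch; the allowlist file format (one extended regex per line, blank lines and `#` comments ignored) and the `check_stderr_with_allowlist` name are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical variant of check_stderr_patterns() with an allowlist.
#   $1  stderr log to inspect
#   $2  allowlist file (assumed format): one extended regex per line;
#       blank lines and '#' comments are ignored
check_stderr_with_allowlist() {
    local stderr_file=$1 allowlist=$2
    local pattern='error|failed|permission denied|not found|cannot'
    local patterns_file remaining
    patterns_file=$(mktemp) || return 1

    # Drop comments and blank lines: an empty pattern would match every line.
    grep -Ev '^[[:space:]]*(#|$)' "$allowlist" >"$patterns_file" 2>/dev/null || true

    # Remove allowlisted lines; an empty pattern file removes nothing.
    remaining=$(grep -Ev -f "$patterns_file" "$stderr_file" || true)
    rm -f "$patterns_file"

    if printf '%s\n' "$remaining" | grep -Eqi "$pattern"; then
        echo "ERROR: unexpected error output:" >&2
        printf '%s\n' "$remaining" | grep -Ei "$pattern" >&2
        return 1
    fi
}
```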

---

## References

- Extension API Specification: `EXTENSION_API_V2.md`
- Current test infrastructure: `.github/scripts/extension-tests/`
- API compliance workflow: `.github/workflows/api-compliance.yml`
- Extension manager: `docker/lib/extension-manager.sh`
