
Enhancement: Improve finalizer removal diagnostics and provide safer override mechanism for HCP cleanup timeouts #1852

@kaovilai

Summary

Following PR #1848, we need to enhance the finalizer removal mechanism to provide better diagnostics and a safer approach when HCP cleanup times out during E2E tests.

Context

PR #1848 introduced NukeHostedCluster(), which blindly removes all finalizers from the HostedCluster when deletion times out. While this unblocks E2E tests, it bypasses critical cleanup logic managed by various HyperShift components.

Current Issues

The current implementation removes all finalizers without understanding:

  • Why cleanup is failing or taking too long
  • Which specific finalizer is blocking deletion
  • What resources might be left behind

Proposed Enhancements

  1. Enhanced Diagnostics Before Forceful Removal

    • Log which finalizers are still present
    • Query and log the status of resources each finalizer is protecting
    • Attempt to identify the specific blocker
  2. Graduated Finalizer Removal

    • Instead of removing all finalizers at once, remove them individually
    • Log what each finalizer was protecting before removal
    • Allow configuration of which finalizers can be safely force-removed
  3. Timeout Configuration

    • Make cleanup timeout configurable per finalizer type
    • Different finalizers may need different grace periods (a configuration sketch follows this list)
  4. Post-Removal Report

    • Generate a report of potentially orphaned resources
    • Include cloud provider resources that may incur costs
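
For the timeout configuration point above, a minimal sketch of what per-finalizer grace periods could look like. The map contents, the defaultCleanupTimeout value, and the cleanupTimeoutFor helper are illustrative assumptions, not existing osde2e or HyperShift code:

import "time"

// Illustrative defaults only; real values would be tuned per finalizer type.
var defaultCleanupTimeout = 15 * time.Minute

var finalizerTimeouts = map[string]time.Duration{
    "hypershift.openshift.io/finalizer":                        30 * time.Minute,
    "hypershift.openshift.io/karpenter-finalizer":              20 * time.Minute,
    "hypershift.openshift.io/control-plane-operator-finalizer": 20 * time.Minute,
}

// cleanupTimeoutFor returns how long to wait for a finalizer's cleanup
// before considering forceful removal.
func cleanupTimeoutFor(finalizer string) time.Duration {
    if d, ok := finalizerTimeouts[finalizer]; ok {
        return d
    }
    return defaultCleanupTimeout
}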

Implementation Suggestions

// Example enhancement to NukeHostedCluster
func NukeHostedCluster(h *helper.H, hc *hyperv1.HostedCluster) error {
    // First, diagnose why deletion is blocked
    diagnostics := diagnoseFinalizerBlockage(h, hc)
    h.Logger.Info("Finalizer diagnostics", "report", diagnostics)

    // Finalizers known to be safe to force-remove without leaking resources
    safeFinalizers := map[string]bool{
        "openshift.io/destroy-cluster": true,
        // Add other known safe finalizers
    }

    // Remove finalizers individually, logging the potential impact of any
    // finalizer that is not on the known-safe list
    for _, finalizer := range hc.GetFinalizers() {
        if !safeFinalizers[finalizer] {
            h.Logger.Warn("Force removing finalizer",
                "finalizer", finalizer,
                "potentialImpact", getFinalizerImpact(finalizer))
        }
        // Remove the individual finalizer and persist the change
        // (see the removal sketch below)
    }

    // Generate orphaned resources report
    report := generateOrphanedResourcesReport(h, hc)
    h.Logger.Error("Potential orphaned resources after forced cleanup", "report", report)
    return nil
}
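
The per-finalizer removal step could be backed by controller-runtime's controllerutil helpers. A minimal sketch, assuming a controller-runtime client is available to the test harness (how it is obtained from helper.H is not shown, and the hyperv1 import path should match whatever the test suite already uses):

import (
    "context"

    // Adjust the API package path/version to match the test suite's existing import.
    hyperv1 "github.com/openshift/hypershift/api/hypershift/v1beta1"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// removeFinalizer force-removes a single finalizer from the HostedCluster and
// persists the change. It is a no-op if the finalizer is not present.
func removeFinalizer(ctx context.Context, c client.Client, hc *hyperv1.HostedCluster, finalizer string) error {
    if !controllerutil.RemoveFinalizer(hc, finalizer) {
        return nil // finalizer was not on the object
    }
    return c.Update(ctx, hc)
}

In practice each removal would likely be wrapped in a conflict-retry loop (for example client-go's retry.RetryOnConflict) to tolerate concurrent updates from the HyperShift operators.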

Identified Finalizers and Their Risks

Based on analysis of the HyperShift codebase, force-removing these finalizers carries the following risks:

  • hypershift.openshift.io/finalizer: main cleanup orchestration is skipped; cloud resources may be left behind
  • hypershift.io/aws-oidc-discovery: AWS OIDC discovery documents remain
  • hypershift.openshift.io/karpenter-finalizer: running EC2 instances may be orphaned
  • hypershift.openshift.io/control-plane-operator-finalizer: AWS PrivateLink endpoints remain
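
The getFinalizerImpact helper referenced in the implementation suggestion does not exist yet; one possible starting point is a lookup table that mirrors this list:

// getFinalizerImpact maps known HyperShift finalizers to the resources that
// may be orphaned if the finalizer is force-removed. Unknown finalizers get a
// generic warning.
func getFinalizerImpact(finalizer string) string {
    impacts := map[string]string{
        "hypershift.openshift.io/finalizer":                        "main cleanup orchestration skipped; cloud resources may be left behind",
        "hypershift.io/aws-oidc-discovery":                         "AWS OIDC discovery documents remain",
        "hypershift.openshift.io/karpenter-finalizer":              "running EC2 instances may be orphaned",
        "hypershift.openshift.io/control-plane-operator-finalizer": "AWS PrivateLink endpoints remain",
    }
    if impact, ok := impacts[finalizer]; ok {
        return impact
    }
    return "unknown finalizer; impact of forced removal not characterized"
}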

Expected Benefits

  1. Better understanding of cleanup failures
  2. Reduced risk of orphaned resources
  3. Improved debugging capabilities for E2E test failures
  4. Cost savings by identifying orphaned cloud resources

Related Issues/PRs

  • PR #1848: introduced NukeHostedCluster() to force-remove finalizers on cleanup timeout
