@zzstoatzz
Collaborator

Closes #19317

Summary

Fixes automation actions that fail silently when background services run in read-only containers (common in rootless/secure deployments).

Root Cause

Background services (actions service) use OrchestrationClient for in-process API calls. OrchestrationClient was creating a full FastAPI app with UI support, which attempts to create the UI static directory. In read-only containers this raises PermissionError, so actions fail and their messages go to the dead-letter queue (DLQ) without any error being logged.

Changes

1. Skip UI creation for OrchestrationClient (Primary Fix)

File: src/prefect/server/api/clients.py

# Before
api_app = create_app()

# After  
api_app = create_app(ephemeral=True)
  • Background services use in-memory ASGI transport and don't need the UI
  • ephemeral=True skips UI directory creation
  • Prevents PermissionError in read-only containers
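
The control flow behind this change can be illustrated with a minimal, self-contained sketch. It is not Prefect's actual `create_app` implementation: `build_app`, `prepare_ui_directory`, and the dict standing in for the app are hypothetical names that only model the behavior described above, where the UI directory is touched only when the app is not ephemeral.

```python
from pathlib import Path


def prepare_ui_directory(static_dir: Path) -> None:
    """Hypothetical stand-in for UI setup; requires a writable filesystem."""
    static_dir.mkdir(parents=True, exist_ok=True)


def build_app(static_dir: Path, ephemeral: bool = False) -> dict:
    """Hypothetical stand-in for an app factory (not Prefect's create_app)."""
    app = {"routes": ["/api"], "ui": False}
    if not ephemeral:
        # Only a full server prepares UI assets on disk.
        prepare_ui_directory(static_dir)
        app["ui"] = True
    return app
```

With `ephemeral=True` the sketch never calls `mkdir`, so a read-only filesystem cannot raise `PermissionError` during app construction.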

2. Add error logging for action failures

File: src/prefect/server/events/actions.py

  • Catch and log unexpected exceptions in action consumer
  • Provides visibility when actions fail and messages go to DLQ
  • Exceptions are still propagated to messaging system for proper retry handling
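
A log-then-re-raise handler of this kind might look roughly like the sketch below. The names `run_action` and `handle_message` and the message shape are assumptions made for illustration; the actual consumer in src/prefect/server/events/actions.py is not shown in this PR description.

```python
import asyncio
import logging

logger = logging.getLogger("actions_sketch")


async def run_action(payload: dict) -> None:
    """Hypothetical action that fails the way the read-only case did."""
    raise PermissionError(f"cannot prepare UI directory for {payload['id']}")


async def handle_message(payload: dict) -> None:
    """Log unexpected failures, then re-raise so the messaging layer still sees them."""
    try:
        await run_action(payload)
    except Exception:
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception("Action failed for message %s", payload.get("id"))
        raise  # propagate unchanged; retry and DLQ handling stay with the broker
```

Re-raising with a bare `raise` keeps the original exception and traceback intact, so retry behavior is unchanged and the only new effect is the log entry.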

Testing

Unit Test Added

test_orchestration_client_works_with_readonly_ui_directory in tests/server/api/test_clients.py

  • Simulates read-only filesystem scenario
  • Fails before fix with PermissionError
  • Passes after fix - client created successfully
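
One way to simulate a read-only filesystem in a unit test is to patch directory creation so it always raises. The sketch below is not the test added in this PR; `make_client_app` is a hypothetical stand-in for the in-process app a client builds, and only `unittest.mock.patch` and `os.makedirs` are real library calls.

```python
import os
from unittest import mock


def make_client_app(static_dir: str, ephemeral: bool) -> str:
    """Hypothetical stand-in for the app a client builds in-process."""
    if not ephemeral:
        os.makedirs(static_dir, exist_ok=True)  # fails on a read-only filesystem
    return "app"


def test_client_works_with_readonly_ui_directory() -> None:
    # Simulate a read-only filesystem: every attempt to create a directory fails.
    denied = PermissionError("Read-only file system")
    with mock.patch("os.makedirs", side_effect=denied) as makedirs:
        # Behavior after the fix: the ephemeral app never touches the directory.
        assert make_client_app("/ui", ephemeral=True) == "app"
        makedirs.assert_not_called()

        # Behavior before the fix: building the full app raises.
        try:
            make_client_app("/ui", ephemeral=False)
        except PermissionError:
            pass
        else:
            raise AssertionError("expected PermissionError")
```

Patching `os.makedirs` keeps the test independent of the host's real filesystem permissions, which matters because tests run as root can often write to directories marked read-only.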

Manual Testing

  • Created HA Docker Compose setup with read-only background services
  • Reproduced original issue (actions in DLQ, no logs)
  • Verified fix resolves the issue
  • All 19 client tests pass

Impact

Fixes:

  • Automations now work in secure/rootless container deployments
  • Action failures are no longer silent: errors are logged with full context

No breaking changes:

  • Existing deployments unaffected
  • ephemeral=True is already used for ephemeral servers
  • Only changes behavior for OrchestrationClient (internal use only)

Related

  • User reported workaround: PREFECT_UI_STATIC_DIRECTORY=/writable/dir
  • This fix makes the workaround unnecessary

🤖 Generated with Claude Code

zzstoatzz and others added 2 commits October 31, 2025 13:54
Fixes #19317

## Changes

1. **Skip UI creation for OrchestrationClient** (primary fix)
   - `OrchestrationClient` now passes `ephemeral=True` to `create_app()`
   - Background services don't need the UI (they use in-memory ASGI transport)
   - Prevents PermissionError in read-only/rootless containers

2. **Add error logging for action failures**
   - Log unexpected exceptions in action consumer before they're retried
   - Provides visibility when actions fail and go to DLQ
   - Exceptions are still propagated to messaging system for proper retry handling

## Impact

- Automations now work in secure/rootless container deployments
- Action failures are no longer silent - errors are logged with full context
- No behavior change for existing deployments

## Testing

- Added regression test simulating read-only UI directory
- Verified with simple Python reproduction
- All existing client tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@github-actions github-actions bot added the bug Something isn't working label Oct 31, 2025
codspeed-hq bot commented Oct 31, 2025

CodSpeed Performance Report

Merging #19319 will not alter performance

Comparing fix-19317-automation-readonly-failure (181ab54) with main (7d86c93)

Summary

✅ 2 untouched


Development

Successfully merging this pull request may close these issues.

Automations 'run-deployment' action silently fails in HA when UI static dir is read-only
