Closed
Labels: SHOULD, P2 (Important but not vital; high-value items that are not crucial for the immediate release), performance (Performance related items)
Offload CPU-bound crypto (Argon2/Fernet) to threadpool
Problem
Argon2 password hashing and Fernet encryption operations are called synchronously in async request handlers, blocking the event loop for 100-500ms per operation. This prevents the worker from processing any other requests during that time.
Measured impact:
- Login endpoint P95 latency: 370ms (vs 50ms baseline for non-blocking endpoints)
- Login endpoint P99 latency: 600ms
- Argon2 verify: ~130ms per call
- Argon2 hash: ~160ms per call
Root Cause
The argon2-cffi library and Fernet encryption use CPU-intensive operations that block the Python event loop when called directly from async handlers. These need to be offloaded to a threadpool using asyncio.to_thread().
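The blocking effect described above can be reproduced without the real crypto libraries. The following is a minimal sketch, assuming a hypothetical `fake_hash` that sleeps for 150ms in place of an Argon2 call; a heartbeat task records the longest stretch during which the event loop could not run. `time.sleep` releases the GIL, so it is only a fair stand-in to the extent that the real calls also release it (argon2-cffi calls into C through CFFI, which is expected to do so, but that is worth confirming with a profile).

```python
import asyncio
import time


def fake_hash(duration: float = 0.15) -> str:
    """Hypothetical stand-in for a slow Argon2 call; sleeps instead of hashing."""
    time.sleep(duration)
    return "hashed"


async def max_heartbeat_gap(stop: asyncio.Event, interval: float = 0.01) -> float:
    """Tick every `interval` seconds and return the longest gap between ticks."""
    worst = 0.0
    last = time.perf_counter()
    while not stop.is_set():
        await asyncio.sleep(interval)
        now = time.perf_counter()
        worst = max(worst, now - last)
        last = now
    return worst


async def measure(offload: bool) -> float:
    """Run one slow call and report how long the loop went without ticking."""
    stop = asyncio.Event()
    heartbeat = asyncio.create_task(max_heartbeat_gap(stop))
    await asyncio.sleep(0.05)  # let the heartbeat start ticking
    if offload:
        await asyncio.to_thread(fake_hash)  # loop keeps running meanwhile
    else:
        fake_hash()  # runs on the event loop thread and stalls it
    stop.set()
    return await heartbeat


if __name__ == "__main__":
    print("direct call, worst gap (s):", asyncio.run(measure(offload=False)))
    print("to_thread,   worst gap (s):", asyncio.run(measure(offload=True)))
```

With the direct call, the worst gap is at least as long as the slow call itself; with `asyncio.to_thread()`, the heartbeat keeps ticking close to its normal interval.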
Affected Files and Call Sites
Password Hashing (Argon2id) - 10 call sites
| File | Line | Function | Call | Impact |
|---|---|---|---|---|
| services/email_auth_service.py | 319 | create_user | hash_password() | Medium |
| services/email_auth_service.py | 410 | authenticate_user | verify_password() | High |
| services/email_auth_service.py | 481 | change_password | verify_password() | Medium |
| services/email_auth_service.py | 487 | change_password | hash_password() | Medium |
| services/email_auth_service.py | 550 | ensure_platform_admin | verify_password() | Low |
| services/email_auth_service.py | 551 | ensure_platform_admin | hash_password() | Low |
| services/email_auth_service.py | 689 | create_users_bulk | hash_password() | Medium |
| routers/email_auth.py | 244 | login | verify_password() | High |
| routers/email_auth.py | 626 | update_email_user | hash_password() | Low |
| admin.py | 3001 | admin_ui | verify_password() | Low |
Encryption Service (Argon2id KDF + Fernet) - 27 call sites
| File | Line | Function | Call | Impact |
|---|---|---|---|---|
| services/oauth_manager.py | 225 | _client_credentials_flow | decrypt_secret() | High |
| services/oauth_manager.py | 316 | _authorization_code_flow | decrypt_secret() | High |
| services/oauth_manager.py | 433 | _refresh_token_flow | decrypt_secret() | High |
| services/oauth_manager.py | 1009 | validate_token | decrypt_secret() | High |
| services/token_storage_service.py | 100 | store_token | encrypt_secret() | Medium |
| services/token_storage_service.py | 102 | store_token | encrypt_secret() | Medium |
| services/token_storage_service.py | 164 | get_access_token | decrypt_secret() | High |
| services/token_storage_service.py | 202 | refresh_token | decrypt_secret() | Medium |
| services/token_storage_service.py | 212 | refresh_token | decrypt_secret() | Medium |
| services/token_storage_service.py | 235 | refresh_token | encrypt_secret() | Medium |
| services/token_storage_service.py | 236 | refresh_token | encrypt_secret() | Medium |
| services/sso_service.py | 75 | _encrypt_secret | encrypt_secret() | Low |
| services/sso_service.py | 86 | _decrypt_secret | decrypt_secret() | Low |
| services/dcr_service.py | 174 | register_client | encrypt_secret() | Low |
| services/dcr_service.py | 177 | register_client | encrypt_secret() | Low |
| services/dcr_service.py | 264 | read_client | decrypt_secret() | Low |
| services/dcr_service.py | 281 | update_client | encrypt_secret() | Low |
| services/dcr_service.py | 317 | update_client | decrypt_secret() | Low |
| routers/oauth_router.py | 119 | token | decrypt_secret() | High |
| admin.py | 7683 | admin_create_tool | encrypt_secret() | Low |
| admin.py | 7720 | admin_create_tool | encrypt_secret() | Low |
| admin.py | 8024 | admin_update_tool | encrypt_secret() | Low |
| admin.py | 8061 | admin_update_tool | encrypt_secret() | Low |
| admin.py | 11589 | admin_create_resource | encrypt_secret() | Low |
| admin.py | 11626 | admin_create_resource | encrypt_secret() | Low |
| admin.py | 11908 | admin_update_resource | encrypt_secret() | Low |
| admin.py | 11945 | admin_update_resource | encrypt_secret() | Low |
Proposed Solution
Step 1: Add async wrappers to service classes
```python
# In mcpgateway/services/argon2_service.py
import asyncio

# Methods to add to the existing password service class
async def hash_password_async(self, password: str) -> str:
    """Async wrapper that offloads hashing to threadpool."""
    return await asyncio.to_thread(self.hash_password, password)

async def verify_password_async(self, password: str, hash_value: str) -> bool:
    """Async wrapper that offloads verification to threadpool."""
    return await asyncio.to_thread(self.verify_password, password, hash_value)
```

```python
# In mcpgateway/services/encryption_service.py
import asyncio
from typing import Optional

# Methods to add to the existing encryption service class
async def encrypt_secret_async(self, plaintext: str) -> str:
    """Async wrapper that offloads encryption to threadpool."""
    return await asyncio.to_thread(self.encrypt_secret, plaintext)

async def decrypt_secret_async(self, bundle_json: str) -> Optional[str]:
    """Async wrapper that offloads decryption to threadpool."""
    return await asyncio.to_thread(self.decrypt_secret, bundle_json)
```

Step 2: Update all call sites to use async wrappers
Example change in email_auth_service.py:
```python
# Before
is_valid = self.password_service.verify_password(password, user.password_hash)

# After
is_valid = await self.password_service.verify_password_async(password, user.password_hash)
```

Files to Modify
- mcpgateway/services/argon2_service.py - Add async wrappers
- mcpgateway/services/encryption_service.py - Add async wrappers
- mcpgateway/services/email_auth_service.py - 7 call sites
- mcpgateway/services/oauth_manager.py - 4 call sites
- mcpgateway/services/token_storage_service.py - 7 call sites
- mcpgateway/services/sso_service.py - 2 call sites
- mcpgateway/services/dcr_service.py - 5 call sites
- mcpgateway/routers/email_auth.py - 2 call sites
- mcpgateway/routers/oauth_router.py - 1 call site
- mcpgateway/admin.py - 9 call sites
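Steps 1 and 2 can be tried end to end in isolation before touching the real services. The sketch below is illustrative only: `DemoPasswordService` and `authenticate` are hypothetical names, and PBKDF2 from the standard library stands in for Argon2id so the example runs without argon2-cffi. Only the wrapper pattern (`asyncio.to_thread` around the existing sync method, awaited at the call site) mirrors the proposal.

```python
import asyncio
import hashlib
import hmac
import os


class DemoPasswordService:
    """Hypothetical stand-in for the password service; PBKDF2 replaces Argon2id."""

    ITERATIONS = 50_000

    def hash_password(self, password: str) -> str:
        # Sync, CPU-bound hashing, as in the existing service.
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, self.ITERATIONS)
        return f"{salt.hex()}${digest.hex()}"

    def verify_password(self, password: str, hash_value: str) -> bool:
        salt_hex, digest_hex = hash_value.split("$")
        digest = hashlib.pbkdf2_hmac(
            "sha256", password.encode(), bytes.fromhex(salt_hex), self.ITERATIONS
        )
        return hmac.compare_digest(digest.hex(), digest_hex)

    # Step 1: async wrappers that delegate to the sync methods in a worker thread.
    async def hash_password_async(self, password: str) -> str:
        return await asyncio.to_thread(self.hash_password, password)

    async def verify_password_async(self, password: str, hash_value: str) -> bool:
        return await asyncio.to_thread(self.verify_password, password, hash_value)


async def authenticate(service: DemoPasswordService, password: str, stored_hash: str) -> bool:
    """Step 2: the call site awaits the wrapper instead of calling the sync method."""
    return await service.verify_password_async(password, stored_hash)


async def demo() -> tuple[bool, bool]:
    service = DemoPasswordService()
    stored = await service.hash_password_async("s3cret")
    good = await authenticate(service, "s3cret", stored)
    bad = await authenticate(service, "wrong", stored)
    return good, bad


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping the sync methods and adding `_async` variants next to them means non-async callers (scripts, migrations) continue to work unchanged while request handlers move over one call site at a time.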
Testing
Run load test with password authentication enabled:
```bash
PASSWORD_AUTH_EMAIL=admin@example.com \
PASSWORD_AUTH_PASSWORD=changeme \
make load-test-ui
```

Success criteria:
- Non-auth endpoint P99 latency should NOT degrade during auth-heavy load
- Login latency will remain ~370-600ms (Argon2 time is unavoidable, but now non-blocking)
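Alongside the load test, a cheap unit-level check can confirm that a wrapped call really leaves the event loop thread. This is a sketch under assumptions: `record_thread` is a hypothetical stand-in for a sync crypto method, and in a real test one would likely patch the service's sync method with something similar and call the `_async` wrapper.

```python
import asyncio
import threading


def record_thread() -> int:
    """Stand-in for a sync crypto call; returns the id of the thread it ran on."""
    return threading.get_ident()


async def runs_off_loop_thread() -> bool:
    """Return True when the offloaded call executed on a different thread."""
    loop_thread = threading.get_ident()  # thread running the event loop
    worker_thread = await asyncio.to_thread(record_thread)
    return worker_thread != loop_thread


if __name__ == "__main__":
    print(asyncio.run(runs_off_loop_thread()))
```

This does not replace the latency criteria above; it only guards against a call site that was accidentally left on the synchronous path.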
Priority
High for deployments using password authentication or OAuth flows.
Medium for JWT-only deployments.