feat(sdk): add retry-with-backoff for transient localization failures#2105
Conversation
📝 WalkthroughWalkthroughThe PR adds configurable exponential backoff retry logic to the SDK engine for transient localization failures. New ChangesLocalization Retry Resilience
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/sdk/src/index.spec.ts (1)
605-624: ⚡ Quick winAdd a regression test for abort occurring during retry backoff.
Current coverage checks already-aborted signals, but not cancellation that happens after the first failed attempt while waiting to retry. That case would guard the new backoff sleep path directly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/sdk/src/index.spec.ts` around lines 605 - 624, Add a regression test that simulates abort during retry backoff by exercising LingoDotDevEngine.localizeObject: configure engine with retryDelayMs > 0, have the first fetch attempt fail (mockFetch rejects), start the localization call, then trigger controller.abort() while the engine is awaiting the backoff sleep (use fake timers to advance time appropriately); assert the call rejects with an "aborted" error and that mockFetch was not called a second time. This targets the retry/backoff sleep path to ensure cancellation during backoff stops further retries.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/sdk/src/index.ts`:
- Around line 97-112: The sleep function can miss an abort that happens between
the early signal.aborted check and addEventListener; fix it by registering the
onAbort listener before checking signal.aborted, then immediately check
signal.aborted and if set call onAbort (or reject) to avoid the race; ensure you
still clear the timeout and remove the listener in both the timer callback and
the abort handler for proper cleanup (references: sleep, onAbort, timer).
---
Nitpick comments:
In `@packages/sdk/src/index.spec.ts`:
- Around line 605-624: Add a regression test that simulates abort during retry
backoff by exercising LingoDotDevEngine.localizeObject: configure engine with
retryDelayMs > 0, have the first fetch attempt fail (mockFetch rejects), start
the localization call, then trigger controller.abort() while the engine is
awaiting the backoff sleep (use fake timers to advance time appropriately);
assert the call rejects with an "aborted" error and that mockFetch was not
called a second time. This targets the retry/backoff sleep path to ensure
cancellation during backoff stops further retries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4f23073a-203d-4da3-aeff-f8659b26ecf3
📒 Files selected for processing (3)
.changeset/yummy-snails-return.mdpackages/sdk/src/index.spec.tspackages/sdk/src/index.ts
| private static sleep(ms: number, signal?: AbortSignal): Promise<void> { | ||
| return new Promise((resolve, reject) => { | ||
| if (signal?.aborted) { | ||
| reject(new Error("Operation was aborted")); | ||
| return; | ||
| } | ||
| const onAbort = () => { | ||
| clearTimeout(timer); | ||
| reject(new Error("Operation was aborted")); | ||
| }; | ||
| const timer = setTimeout(() => { | ||
| signal?.removeEventListener("abort", onAbort); | ||
| resolve(); | ||
| }, ms); | ||
| signal?.addEventListener("abort", onAbort, { once: true }); | ||
| }); |
There was a problem hiding this comment.
Fix abort-listener registration race in sleep.
An abort between the early signal.aborted check and addEventListener can be missed, causing delayed cancellation until the timeout elapses.
💡 Suggested patch
private static sleep(ms: number, signal?: AbortSignal): Promise<void> {
return new Promise((resolve, reject) => {
if (signal?.aborted) {
reject(new Error("Operation was aborted"));
return;
}
- const onAbort = () => {
- clearTimeout(timer);
- reject(new Error("Operation was aborted"));
- };
- const timer = setTimeout(() => {
- signal?.removeEventListener("abort", onAbort);
- resolve();
- }, ms);
- signal?.addEventListener("abort", onAbort, { once: true });
+ let timer: ReturnType<typeof setTimeout> | undefined;
+ const onAbort = () => {
+ if (timer) clearTimeout(timer);
+ signal?.removeEventListener("abort", onAbort);
+ reject(new Error("Operation was aborted"));
+ };
+ signal?.addEventListener("abort", onAbort, { once: true });
+ if (signal?.aborted) {
+ onAbort();
+ return;
+ }
+ timer = setTimeout(() => {
+ signal?.removeEventListener("abort", onAbort);
+ resolve();
+ }, ms);
});
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/sdk/src/index.ts` around lines 97 - 112, The sleep function can miss
an abort that happens between the early signal.aborted check and
addEventListener; fix it by registering the onAbort listener before checking
signal.aborted, then immediately check signal.aborted and if set call onAbort
(or reject) to avoid the race; ensure you still clear the timeout and remove the
listener in both the timer callback and the abort handler for proper cleanup
(references: sleep, onAbort, timer).
Summary
Add automatic retry-with-backoff for transient localization failures in the SDK, so a 5xx response or network blip no longer fails the whole request.
Changes
fetchWithRetrytoLingoDotDevEngine, used bylocalizeChunk: retries on 5xx responses and network errors with exponential backoff + full jitter.maxRetries(default3) andretryDelayMs(default500);maxRetries: 0disables retries.>= 500); 4xx responses and aborted requests (signal.aborted) are never retried.Testing
Business logic tests added:
maxRetrieson persistent 5xx (asserts initial attempt + N retries)Invalid request)fetch) and recoversmaxRetries: 0performs a single attempt with no retriesVisuals
Required for UI/UX changes:
N/A
Checklist
Closes N/A
Summary by CodeRabbit
New Features
maxRetries(default: 3) andretryDelayMs(default: 500ms) to control retry behavior.Tests