feature: add support for ErrorVisibilityTimeout and job retention #573


Merged
46 commits merged on Nov 10, 2024

Conversation

nickdnk
Member

@nickdnk nickdnk commented Oct 26, 2024

Reason for This PR

Replacement PR for #571.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the MIT license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]

  • All commits in this PR are signed (git commit -s).
  • The reason for this PR is clearly provided (issue no. or explanation).
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • Any user-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

Summary by CodeRabbit

  • New Features

    • Introduced new configuration files for SQS with enhanced settings and reduced complexity.
    • Added a new function to delete specified SQS queues in the helper package.
    • Introduced a new job handling script that responds with the approximate receive count.
  • Bug Fixes

    • Updated error handling and assertions in various tests for improved clarity and accuracy.
  • Chores

    • Removed deprecated parameters and simplified existing configurations across multiple YAML files.
    • Updated dependency versions in the Go module for improved stability and performance.

Removed unused PHP test files and cleaned up the remaining ones
Added options error_visibility_timeout and retain_failed_jobs to SQS driver
Locked opentelemetry docker image to a version that works with tests
Created common method for Nack to deduplicate code
Adjusted durability test from GreaterThan to Equal
Fixed wrong order of expected vs. actual param in some tests
Fix yaml newline
Reword some comments
Adjust error message
change if-else to switch
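Taken together, the commits above introduce `retain_failed_jobs` and `error_visibility_timeout` and deduplicate the Nack path. The behaviour can be sketched as follows; this is a minimal illustration, not the driver's actual code — the `queueAPI` interface, `nack` function, and `fakeQueue` are hypothetical stand-ins for the aws-sdk-go-v2 client the real driver uses:

```go
package main

import "fmt"

// queueAPI is a hypothetical stand-in for the two SQS client calls the
// Nack path needs.
type queueAPI interface {
	DeleteMessage(receiptHandle string) error
	ChangeMessageVisibility(receiptHandle string, timeoutSec int32) error
}

// nack sketches the deduplicated Nack logic: with retain_failed_jobs the
// message stays on the SQS queue (optionally made visible again sooner via
// error_visibility_timeout, incrementing ApproximateReceiveCount on the next
// receive); without it, the message is deleted from the queue.
func nack(q queueAPI, receipt string, retainFailed bool, errVisibility int32) error {
	switch {
	case retainFailed && errVisibility > 0:
		// Shorten the visibility timeout so the failed job reappears sooner.
		return q.ChangeMessageVisibility(receipt, errVisibility)
	case retainFailed:
		// Do nothing: the job becomes visible again after the queue's
		// default visibility timeout.
		return nil
	default:
		return q.DeleteMessage(receipt)
	}
}

// fakeQueue records which call was made, for demonstration purposes.
type fakeQueue struct {
	deleted bool
	changed bool
	timeout int32
}

func (f *fakeQueue) DeleteMessage(string) error { f.deleted = true; return nil }
func (f *fakeQueue) ChangeMessageVisibility(_ string, t int32) error {
	f.changed, f.timeout = true, t
	return nil
}

func main() {
	retained := &fakeQueue{}
	_ = nack(retained, "receipt-1", true, 5)
	fmt.Println("retained:", retained.changed, retained.timeout) // retained: true 5

	dropped := &fakeQueue{}
	_ = nack(dropped, "receipt-2", false, 0)
	fmt.Println("deleted:", dropped.deleted) // deleted: true
}
```

The switch mirrors the "change if-else to switch" commit: three mutually exclusive outcomes, each mapped to at most one SQS API call.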
@nickdnk nickdnk added bug Something isn't working enhancement New feature or request go Pull requests that update Go code labels Oct 26, 2024
@nickdnk nickdnk self-assigned this Oct 26, 2024

coderabbitai bot commented Oct 26, 2024

Walkthrough

The pull request introduces several changes across multiple files, primarily focusing on the SQS workflow configurations, schema definitions, and job handling logic. Key modifications include the addition of concurrency settings, environment variables, and adjustments to job parameters in various YAML configuration files. The schema.json file has been updated to remove specific properties and clarify requirements. Additionally, several test files have been enhanced with cleanup procedures and updated assertions to improve resource management and accuracy. New helper functions and configuration files have also been introduced to support the updated functionality.

Changes

File Path Change Summary
.github/workflows/linux.yml Added branch restrictions for push and pull_request events; introduced concurrency settings; added environment variable RR_SQS_TEST_ACCOUNT_ID.
.github/workflows/linux_durability.yml Updated branch filters for push and pull_request events; added concurrency settings.
schema.json Removed consume_all property; updated descriptions for key and secret properties.
sqsjobs/config.go Enhanced Config struct with new fields; added validation logic for new fields.
sqsjobs/driver.go Added new fields to Driver struct; updated initialization logic for new fields.
sqsjobs/item.go Enhanced Options struct; refactored acknowledgment methods.
sqsjobs/listener.go Updated ReceiveMessageInput parameters; refined error handling logic.
tests/configs/.rr-sqs-attr.yaml Simplified SQS settings; removed several parameters; updated prefetch and queue configurations.
tests/configs/.rr-sqs-auto-ack.yaml Introduced new configuration file with settings for SQS auto acknowledgment.
tests/configs/.rr-sqs-declare.yaml Simplified server and SQS settings; removed multiple parameters.
tests/configs/.rr-sqs-jobs-err.yaml Removed relay and relay_timeout; simplified SQS settings.
tests/helpers/helpers.go Added DeleteQueues function for deleting specified SQS queues.
tests/jobs_sqs_durability_test.go Updated assertions with custom error messages.
tests/jobs_sqs_fifo_test.go Added cleanup functions; updated configuration paths and test assertions.
tests/jobs_sqs_test.go Enhanced cleanup procedures; adjusted test configurations and assertions; added new test function.
tests/php_test_files/jobs/jobs_bad_resp.php Streamlined error handling logic; removed unnecessary comments.
tests/php_test_files/jobs/jobs_respond_with_count.php Introduced new file for handling job consumption and response count.

Poem

In the fields where queues do play,
A rabbit hops, brightening the day.
With settings new and paths refined,
Our SQS flows are now aligned.
So let us cheer, with hops and bounds,
For cleaner code and joyful sounds! 🐇✨



rustatian and others added 9 commits October 26, 2024 18:11
Signed-off-by: Valery Piashchynski <[email protected]>
Signed-off-by: Valery Piashchynski <[email protected]>
Signed-off-by: Valery Piashchynski <[email protected]>
…ined in configuration

Include prefetch as the param to ReceiveMessage
Ensure prefetch is within allowed limits
Signed-off-by: Valery Piashchynski <[email protected]>
@nickdnk
Member Author

nickdnk commented Oct 28, 2024

This must be merged before roadrunner-server/jobs#135 or tests in main repo will fail schema validation.

@rustatian
Member

This must be merged before roadrunner-server/jobs#135 or tests in main repo will fail schema validation.

Done, thanks @nickdnk 👍

@nickdnk nickdnk marked this pull request as ready for review October 29, 2024 00:52

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

🧹 Outside diff range and nitpick comments (21)
tests/configs/.rr-sqs-stat.yaml (1)

Line range hint 1-15: Document the removed configuration parameters.

Several configuration parameters have been removed (relay, relay_timeout, num_pollers, pipeline_size, allocate_timeout, destroy_timeout). These changes might impact existing deployments.

Consider:

  1. Adding a migration guide for users updating from previous versions
  2. Documenting any new default values or alternative configuration methods
  3. Updating the changelog to reflect these breaking changes
tests/configs/.rr-sqs-declare_fifo.yaml (1)

Documentation update needed for configuration changes

The verification reveals that the configuration changes are significant and not properly documented:

  1. The README.md only contains a link to external documentation
  2. The old parameters (relay, relay_timeout, num_pollers, pipeline_size) have been removed from the configuration
  3. The new configuration structure in sqsjobs/config.go shows a completely different set of parameters focusing on AWS SQS-specific settings

Action items:

  • Update the external documentation at docs.roadrunner.dev
  • Add a migration guide in the repository's README.md
  • Document the new configuration parameters:
    • visibility_timeout
    • error_visibility_timeout
    • retain_failed_jobs
    • wait_time_seconds
    • prefetch
    • message_group_id
    • and other AWS SQS-specific settings
🔗 Analysis chain

Line range hint 1-17: Documentation update needed for configuration changes.

Several configuration parameters have been removed (relay, relay_timeout, num_pollers, pipeline_size, etc.). Please ensure these changes are reflected in the documentation and CHANGELOG.md, explaining:

  1. Which parameters were removed and why
  2. How to migrate from the old configuration format
  3. Where the removed settings should be configured now (if applicable)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if changes are documented
echo "Checking for documentation updates..."
rg -i "configuration|settings|parameters" CHANGELOG.md
# Look for any remaining uses of old parameters
echo "Checking for old parameter usage in other config files..."
fd -e yaml -e yml | xargs rg -i "relay_timeout|pipeline_size|allocate_timeout|destroy_timeout"

Length of output: 416


Script:

#!/bin/bash
# Check for documentation files in different locations
echo "Searching for documentation files..."
fd -t f -e md

# Check for configuration examples and docs
echo "Searching for configuration examples..."
fd -e yaml -e yml

# Look for old parameter references in code
echo "Checking for old parameter references in code..."
rg -i "relay_timeout|pipeline_size|allocate_timeout|destroy_timeout" -g "!tests/*"

# Check for documentation in code comments
echo "Checking for documentation in code comments..."
rg -i "configuration|settings|parameters" -g "*.go" -g "*.md"

Length of output: 1565


Script:

#!/bin/bash
# Check the content of README.md for configuration documentation
echo "Checking README.md content..."
cat README.md

# Check the old vs new configuration structure
echo "Checking old vs new configuration examples..."
cat tests/configs/.rr-sqs-declare_fifo.yaml
echo "---"
cat tests/env/docker-compose-sqs-local.yaml

# Check for configuration-related code changes
echo "Checking configuration struct..."
cat sqsjobs/config.go

Length of output: 9001

tests/php_test_files/jobs/jobs_respond_with_count.php (1)

1-1: Add PHPDoc documentation for test clarity.

The test file would benefit from clear documentation explaining its purpose and expected behavior.

Add this documentation at the start of the file:

 <?php
+/**
+ * Test file for verifying SQS message receive count functionality.
+ * 
+ * This test simulates a failing consumer that tracks how many times
+ * a message has been received through the ApproximateReceiveCount header.
+ * Each message is intentionally nack'd to trigger redelivery.
+ */
tests/php_test_files/jobs/jobs_bad_resp.php (1)

14-15: Consider enhancing error reporting with additional context.

While the current error reporting is functional, consider including more context in the error message to aid debugging.

-        $rr->error((string)$e);
+        $rr->error(sprintf('[Bad Response Test] %s: %s', get_class($e), $e->getMessage()));
tests/configs/.rr-sqs-init.yaml (1)

24-26: Document the performance implications of configuration changes.

The changes look good and appear to optimize resource usage:

  • Reduced prefetch (1000 → 10) prevents memory overload
  • Increased wait_time_seconds (0 → 10) reduces API calls through long-polling

Consider adding a comment in the file or updating documentation to explain:

  1. Why these specific values were chosen
  2. The performance implications of these settings
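One way to address this suggestion is an annotated pipeline block; the sketch below is illustrative (the pipeline name and `queue` value are placeholders, while `prefetch` and `wait_time_seconds` are the options discussed above):

```yaml
pipelines:
  test-init:
    driver: sqs
    config:
      prefetch: 10          # small in-flight cap; 1000 was oversized for tests
      wait_time_seconds: 10 # long polling: fewer empty ReceiveMessage calls
      queue: default
```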
tests/configs/.rr-sqs-attr.yaml (1)

24-27: Improved SQS configuration parameters

The changes to the pipeline configuration show good improvements:

  • Reduced prefetch (10) is more appropriate for testing and prevents memory pressure
  • Increased wait_time_seconds (10) implements long polling, reducing API calls and costs
  • Queue naming correctly follows FIFO requirements with proper attributes

For production environments, consider tuning these values based on:

  • Message processing time
  • Desired throughput
  • Cost considerations (API calls)
tests/configs/.rr-sqs-init-br.yaml (1)

9-9: Document credential management approach

The empty SQS configuration block suggests credentials are managed externally (e.g., through environment variables or IAM roles). Consider adding a comment or documentation to clarify how credentials should be configured in different environments.

tests/configs/.rr-sqs-pq.yaml (1)

25-27: Improved SQS configuration parameters.

The changes to the pipeline configuration show good practices:

  • Reduced prefetch value is more suitable for testing and prevents memory overhead
  • Increased wait_time_seconds enables long polling, which reduces API calls and associated costs

Consider documenting these values in comments, explaining:

  • Why prefetch=10 is suitable for testing
  • The benefits of wait_time_seconds=10 for long polling
tests/configs/.rr-sqs-init_fifo.yaml (2)

36-37: Consider unique MessageGroupId per pipeline

Both pipelines share the same message_group_id: 'RR'. While this works for testing, in production scenarios, consider using distinct MessageGroupIds if the pipelines process different types of messages or require separate ordering contexts.


Line range hint 1-43: Consider documenting SQS configuration requirements

The configuration demonstrates good practices for SQS FIFO queues, but consider adding a README or comments explaining:

  • Required AWS permissions and credential setup
  • FIFO queue specific requirements (MessageGroupId, .fifo suffix)
  • Rationale for configuration values (prefetch, wait_time_seconds)

This would help other developers understand the test setup and requirements.

tests/configs/.rr-sqs-init_fifo_auto_ack.yaml (1)

17-18: Consider increasing worker count for optimal throughput.

With a prefetch count of 10 messages per pipeline and 2 FIFO queues, 2 workers might become a bottleneck. Consider increasing num_workers to match or exceed the total prefetch capacity (20 messages) for optimal throughput.

tests/configs/.rr-sqs-durability-redial.yaml (2)

Line range hint 9-13: LGTM! Consider enhancing the comment.

The addition of SQS credentials for testing is appropriate. However, the comment could be more descriptive.

-sqs: # credentials required for redial, or AWS SDK complains about missing credentials on init
+sqs: # Test credentials required for redial functionality. AWS SDK requires credentials during initialization,
+     # even when using a local endpoint for testing.

40-41: Document the configuration differences between pipelines.

Both pipelines have similar configurations except for the VisibilityTimeout attribute. Consider adding a comment explaining why test-2 pipeline doesn't need the VisibilityTimeout setting, to help other developers understand the test scenarios.

tests/configs/.rr-sqs-init_fifo-prefetch.yaml (1)

38-43: Consider explicit VisibilityTimeout for consistency.

While test-1 explicitly sets VisibilityTimeout to 10s, test-2 doesn't specify this attribute. For consistency and predictable behavior across test pipelines, consider either:

  1. Adding explicit VisibilityTimeout to test-2
  2. Documenting why test-2 uses default timeout
tests/jobs_sqs_durability_test.go (2)

Line range hint 1-127: Consider improving test reliability and documentation.

While the test logic is sound, there are a few suggestions to enhance its robustness and maintainability:

  1. Consider replacing fixed sleep durations with polling/waiting for specific conditions to make the test more reliable.
  2. Add comments explaining the test strategy and what each phase is testing.
  3. Consider implementing proper queue cleanup after the test.

Here's a suggested improvement for the documentation:

func TestDurabilitySQS(t *testing.T) {
+	// This test verifies the system's ability to handle network disruptions when working with SQS:
+	// 1. Push messages during network failure (should be queued locally)
+	// 2. Restore network connection
+	// 3. Verify all messages are delivered and processed
+	// 4. Push more messages after recovery
+	// Expected: All messages should be processed exactly once

And for the sleep-based waits:

-	time.Sleep(time.Second * 5)
+	// Helper function to wait for condition with timeout
+	waitForCondition := func(condition func() bool, timeout time.Duration) error {
+		start := time.Now()
+		for {
+			if condition() {
+				return nil
+			}
+			if time.Since(start) > timeout {
+				return fmt.Errorf("condition not met after %v", timeout)
+			}
+			time.Sleep(100 * time.Millisecond)
+		}
+	}
+
+	// Wait for proxy to be fully enabled and connection restored
+	err = waitForCondition(func() bool {
+		// Check if connection is established
+		return true // implement actual condition
+	}, 5*time.Second)
+	require.NoError(t, err)

SQS queues are not properly cleaned up after the test

The test uses SQS queues sqs-durability-redial-1 and sqs-durability-redial-2, but only calls DestroyPipelines which destroys the RPC pipelines. It doesn't call helpers.DeleteQueues() to clean up the actual SQS queues, unlike other SQS tests in the codebase. This can lead to resource leaks.

Add the following cleanup after wg.Wait():

helpers.DeleteQueues(t, "sqs-durability-redial-1", "sqs-durability-redial-2")
🔗 Analysis chain

Line range hint 1-127: Verify queue cleanup implementation.

The test should ensure proper cleanup of SQS queues after completion to prevent resource leaks.

Let's check if there's a queue cleanup implementation:


Let me check the test file to see if it's using the queue cleanup functionality:


Let me check what DestroyPipelines does and if it handles queue cleanup:


Let me check the configuration file to understand the queue names being used:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for queue cleanup implementation in test helpers

# Test: Look for queue cleanup related code
rg -A 5 "DeleteQueue|cleanup.*sqs|destroy.*queue" tests/

Length of output: 5531


Script:

#!/bin/bash
# Check if the test file uses DeleteQueues or any cleanup mechanism
rg -A 5 "test-1|test-2" tests/jobs_sqs_durability_test.go

Length of output: 797


Script:

#!/bin/bash
# Check DestroyPipelines implementation
rg -A 10 "func DestroyPipelines" tests/helpers/

Length of output: 711


Script:

#!/bin/bash
# Check the SQS configuration
cat tests/configs/.rr-sqs-durability-redial.yaml

Length of output: 988

tests/jobs_sqs_fifo_test.go (1)

521-528: Consider using WaitGroup for goroutine synchronization

While the concurrent push operations are good for testing, using a WaitGroup would ensure all 15 goroutines complete before proceeding with the test cleanup.

Here's how you could modify this:

+ wg := &sync.WaitGroup{}
+ wg.Add(15)
 for i := 0; i < 15; i++ {
   go func() {
+    defer wg.Done()
     t.Run("PushPipelineFifo", helpers.PushToPipe("test-1", false, "127.0.0.1:6232"))
     t.Run("PushPipelineFifo", helpers.PushToPipe("test-2", false, "127.0.0.1:6232"))
   }()
 }
+ wg.Wait()
 time.Sleep(time.Second * 60)
schema.json (2)

Line range hint 38-46: LGTM! Consider enhancing documentation for error handling configuration.

The new properties error_visibility_timeout and retain_failed_jobs are well-structured and their relationship is clearly defined. The schema correctly enforces that error_visibility_timeout is only meaningful when retain_failed_jobs is true.

Consider adding examples in the description to illustrate common use cases, such as implementing a dead-letter queue pattern with these settings.
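As an example of the kind of documentation suggested here, a pipeline that retains failed jobs and surfaces them again after 30 seconds could look roughly like the following; the pipeline name, queue name, and timeout values are illustrative, while the option names come from this PR's schema changes:

```yaml
pipelines:
  failed-job-retry:
    driver: sqs
    config:
      queue: orders
      visibility_timeout: 120
      retain_failed_jobs: true       # failed jobs stay on the SQS queue
      error_visibility_timeout: 30   # ...and reappear after 30s instead of 120s
```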

🧰 Tools
🪛 Gitleaks

213-213: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 checkov

[HIGH] 213-214: AWS Access Key

(CKV_SECRET_2)


Line range hint 210-223: Security: Revise credential examples to use placeholder format.

While the current examples use clearly marked placeholders (e.g., "EXAMPLE"), it's recommended to use a more generic format that cannot be mistaken for real credentials.

Replace the credential examples with this format:

-            "ASIAIOSFODNN7EXAMPLE"
+            "<YOUR_AWS_ACCESS_KEY_ID>"
-            "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
+            "<YOUR_AWS_SECRET_ACCESS_KEY>"
🧰 Tools
🪛 Gitleaks

213-213: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 checkov

[HIGH] 213-214: AWS Access Key

(CKV_SECRET_2)

sqsjobs/config.go (1)

183-189: Include 'SessionToken' in test environment variable checks

In the test mode environment setup, the SessionToken is used to establish AWS sessions that require temporary credentials. However, the code currently does not check if RR_SQS_TEST_SESSION_TOKEN is set. If a session token is necessary for your tests, consider adding this check to ensure all required credentials are present.

Apply this diff to include the session token in the checks:

 		os.Getenv("RR_SQS_TEST_SECRET") == "" ||
 		os.Getenv("RR_SQS_TEST_ENDPOINT") == "" ||
-		os.Getenv("RR_SQS_TEST_ACCOUNT_ID") == "" {
+		os.Getenv("RR_SQS_TEST_ACCOUNT_ID") == "" ||
+		os.Getenv("RR_SQS_TEST_SESSION_TOKEN") == "" {
 			panic("security check: test mode is enabled, but not all sqs environment parameters are set")
 		}
 		c.Region = os.Getenv("RR_SQS_TEST_REGION")
 		c.Key = os.Getenv("RR_SQS_TEST_KEY")
 		c.Secret = os.Getenv("RR_SQS_TEST_SECRET")
 		c.Endpoint = os.Getenv("RR_SQS_TEST_ENDPOINT")
+		c.SessionToken = os.Getenv("RR_SQS_TEST_SESSION_TOKEN")
 	}
sqsjobs/item.go (1)

21-21: Suggestion: Clarify the alias for the standard errors package

While aliasing the standard errors package as stderr can help differentiate it from the github.com/roadrunner-server/errors package, consider using a more descriptive alias like stdErrors for better readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between d77acfb and db70b44.

⛔ Files ignored due to path filters (2)
  • go.work.sum is excluded by !**/*.sum
  • tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (42)
  • .github/workflows/linux.yml (2 hunks)
  • .github/workflows/linux_durability.yml (1 hunks)
  • schema.json (1 hunks)
  • sqsjobs/config.go (4 hunks)
  • sqsjobs/driver.go (3 hunks)
  • sqsjobs/item.go (8 hunks)
  • sqsjobs/listener.go (2 hunks)
  • tests/configs/.rr-sqs-attr.yaml (1 hunks)
  • tests/configs/.rr-sqs-auto-ack.yaml (1 hunks)
  • tests/configs/.rr-sqs-declare.yaml (1 hunks)
  • tests/configs/.rr-sqs-declare_fifo.yaml (1 hunks)
  • tests/configs/.rr-sqs-durability-redial.yaml (3 hunks)
  • tests/configs/.rr-sqs-init-br.yaml (1 hunks)
  • tests/configs/.rr-sqs-init-br_fifo.yaml (1 hunks)
  • tests/configs/.rr-sqs-init.yaml (1 hunks)
  • tests/configs/.rr-sqs-init_fifo-prefetch.yaml (1 hunks)
  • tests/configs/.rr-sqs-init_fifo.yaml (1 hunks)
  • tests/configs/.rr-sqs-init_fifo_auto_ack.yaml (1 hunks)
  • tests/configs/.rr-sqs-jobs-err.yaml (1 hunks)
  • tests/configs/.rr-sqs-jobs-err_fifo.yaml (1 hunks)
  • tests/configs/.rr-sqs-otel.yaml (1 hunks)
  • tests/configs/.rr-sqs-pq.yaml (1 hunks)
  • tests/configs/.rr-sqs-raw.yaml (1 hunks)
  • tests/configs/.rr-sqs-read-approximate-count.yaml (1 hunks)
  • tests/configs/.rr-sqs-stat.yaml (1 hunks)
  • tests/env/otel-collector-config.yml (1 hunks)
  • tests/go.mod (1 hunks)
  • tests/helpers/helpers.go (2 hunks)
  • tests/jobs_sqs_durability_test.go (1 hunks)
  • tests/jobs_sqs_fifo_test.go (9 hunks)
  • tests/jobs_sqs_test.go (19 hunks)
  • tests/php_test_files/jobs/jobs_bad_resp.php (1 hunks)
  • tests/php_test_files/jobs/jobs_create_memory.php (0 hunks)
  • tests/php_test_files/jobs/jobs_err.php (1 hunks)
  • tests/php_test_files/jobs/jobs_ok.php (1 hunks)
  • tests/php_test_files/jobs/jobs_ok_pq.php (1 hunks)
  • tests/php_test_files/jobs/jobs_ok_queue_name_exist.php (0 hunks)
  • tests/php_test_files/jobs/jobs_ok_sleep1.php (1 hunks)
  • tests/php_test_files/jobs/jobs_ok_slow.php (0 hunks)
  • tests/php_test_files/jobs/jobs_ok_slow_rand.php (0 hunks)
  • tests/php_test_files/jobs/jobs_respond_with_count.php (1 hunks)
  • tests/php_test_files/jobs/jobs_send.php (0 hunks)
💤 Files with no reviewable changes (5)
  • tests/php_test_files/jobs/jobs_create_memory.php
  • tests/php_test_files/jobs/jobs_ok_queue_name_exist.php
  • tests/php_test_files/jobs/jobs_ok_slow.php
  • tests/php_test_files/jobs/jobs_ok_slow_rand.php
  • tests/php_test_files/jobs/jobs_send.php
🧰 Additional context used
🪛 Gitleaks
schema.json

213-213: Identified a pattern that may indicate AWS credentials, risking unauthorized cloud resource access and data breaches on AWS platforms.

(aws-access-token)

🪛 checkov
schema.json

[HIGH] 213-214: AWS Access Key

(CKV_SECRET_2)

🔇 Additional comments (75)
tests/php_test_files/jobs/jobs_ok.php (1)

9-9: Verify the intentional removal of error handling around task acknowledgment.

The error handling for task acknowledgment has been removed. While this aligns with the simplified approach mentioned in the PR summary, we should verify if this is the intended behavior for this test case.

Consider adding a comment explaining that this is a "happy path" test case without error handling, to make the test's purpose clear:

 while ($task = $consumer->waitTask()) {
+    // Happy path test case - direct acknowledgment without error handling
     $task->ack();
 }
✅ Verification successful

Removal of error handling is consistent with test file patterns

Looking at the test files, we can see that:

  • Multiple "happy path" test files (jobs_ok.php, jobs_ok_pq.php, jobs_ok_sleep1.php) use direct ack() calls without error handling
  • Error handling with try-catch is only present in specific error test cases (jobs_bad_resp.php)
  • Error scenarios are tested separately in dedicated files (jobs_err.php, jobs_respond_with_count.php with nack)

This indicates that the removal of error handling is intentional and follows the established pattern where:

  • Happy path tests use direct acknowledgment
  • Error scenarios are covered in separate, dedicated test files
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if other test files follow similar pattern of removing error handling
# and if there's any documentation explaining this change

# Look for similar job test files and their error handling patterns
echo "Checking error handling patterns in other job test files:"
rg -A 5 -B 5 "->ack\(\)" "tests/php_test_files/jobs/"

# Look for documentation about error handling approach
echo -e "\nChecking for documentation about error handling:"
rg -l "error.*handling|exception.*handling" "docs/" "README.md"

Length of output: 804


Script:

#!/bin/bash
# Let's try a different approach to search for ack() calls and error handling patterns
# Using simpler pattern without special characters

# Look for ack calls in job test files
echo "Checking ack patterns in job test files:"
rg "ack\(" "tests/php_test_files/jobs/"

# Look for try-catch blocks in the same directory
echo -e "\nChecking error handling patterns:"
rg "try|catch" "tests/php_test_files/jobs/"

# Check if there are any other test files that might show the pattern
echo -e "\nListing all files in jobs test directory:"
ls -la tests/php_test_files/jobs/

Length of output: 1345

tests/configs/.rr-sqs-declare.yaml (3)

Line range hint 1-17: Configuration structure looks good.

The configuration maintains a clean structure with appropriate debug logging and worker pool settings for testing purposes.


Line range hint 1-17: Verify impact of removed configuration parameters.

The configuration has been significantly simplified with several parameters removed. We should verify that:

  1. The removed timeouts don't affect error handling
  2. The removal of pipeline and worker configurations doesn't impact test reliability
#!/bin/bash
# Description: Check for any remaining references to removed parameters

# Test 1: Check for any remaining references to removed parameters
echo "Checking for references to removed parameters..."
rg -l "relay_timeout|pipeline_size|allocate_timeout|destroy_timeout" tests/

# Test 2: Check test files for potential timing issues
echo "Checking test files for timing assertions..."
rg "sleep|usleep|timeout" tests/

9-9: Verify AWS credentials handling with empty SQS configuration.

The empty SQS configuration suggests reliance on environment variables or AWS default credentials. While this is a more secure approach than hardcoding credentials, we should ensure:

  1. The test environment is properly configured
  2. Documentation reflects the credential requirements
✅ Verification successful

Let me gather more information about the SQS configuration and credentials handling.


Empty SQS configuration is valid - credentials handled via environment variables

The empty SQS configuration is intentional and correct. The test environment properly handles AWS credentials through environment variables:

  • Tests use RR_SQS_TEST_KEY environment variable with credentials.NewStaticCredentialsProvider
  • Credential handling is consistently implemented across test files
  • The simplified configuration aligns with security best practices by avoiding hardcoded credentials
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if documentation or other test files mention AWS credential requirements

# Test 1: Search for AWS credential documentation
echo "Searching for AWS credential documentation..."
rg -l "AWS" --type md

# Test 2: Check for environment variable usage in test files
echo "Checking for AWS credential environment variables in tests..."
rg "AWS_" --type php tests/

Length of output: 1264


Script:

#!/bin/bash
# Check for SQS configuration and credential handling

# Look for SQS configuration patterns in test files
echo "Checking SQS configuration patterns in tests..."
rg "NewSQS|sqs\.Config" tests/

# Look for credential handling in test setup
echo "Checking test setup for credential handling..."
rg "credentials|\.rr-sqs" tests/

# Check for environment variable setup in test files
echo "Looking for environment setup in test files..."
fd "bootstrap" --type f --exec rg "environ|getenv|AWS" {}

Length of output: 2400

tests/configs/.rr-sqs-stat.yaml (1)

9-9: Verify the empty SQS configuration behavior.

The SQS configuration has been simplified to an empty object. This suggests that credentials and settings might be expected from environment variables or other sources.

Could you confirm:

  1. How users should now configure SQS credentials?
  2. If this change is documented in the README or configuration guide?
✅ Verification successful

Empty SQS configuration is intentional and follows AWS best practices

The empty SQS configuration {} is correct and intentional. The code shows that:

  • When no configuration is provided, the system automatically assumes it's running in an AWS environment and uses default credentials from the environment
  • This is explicitly logged: "No sqs plugin configuration section was found; assuming we're in AWS environment and want to use default values"
  • For local testing, the system uses environment variables (RR_SQS_TEST_*) to override AWS credentials
  • This approach aligns with AWS best practices of using environment-based credentials and IAM roles
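As a concrete illustration, a local test run might export these variables before starting the suite. The variable names come from the test helpers mentioned above; the values are LocalStack-style placeholders, not real AWS credentials:

```shell
# Illustrative only: names are taken from the test helpers, values are
# placeholders for a local (LocalStack-style) SQS endpoint.
export RR_SQS_TEST_KEY="test-key"
export RR_SQS_TEST_SECRET="test-secret"
export RR_SQS_TEST_REGION="us-east-1"
export RR_SQS_TEST_ENDPOINT="http://127.0.0.1:4566"
export RR_SQS_TEST_ACCOUNT_ID="000000000000"
echo "SQS test endpoint: ${RR_SQS_TEST_ENDPOINT}"
```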
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if documentation has been updated to reflect these changes
rg -l "sqs.*credentials|sqs.*configuration" docs/ README.md

Length of output: 107


Script:

#!/bin/bash
# Let's first check if there are any documentation files in the repository
fd -t f "README|CONFIGURATION|DOCS|\.md$" -i

# Then let's check for any SQS-related configuration examples
rg -i "sqs.*credentials|sqs.*configuration|aws.*credentials|aws.*configuration"

# Also check for any test files that might show the configuration usage
fd -e yaml -e yml | xargs rg "sqs:"

Length of output: 1702


Script:

#!/bin/bash
# Let's check the schema.json for SQS configuration structure
cat schema.json | jq -r '.properties.sqs'

# Let's also check for any environment variable documentation or usage
rg -A 3 "RR_SQS|AWS_" 

# Check the driver code for default behavior
rg -A 5 "assumeAWSEnv" sqsjobs/

Length of output: 3392

tests/configs/.rr-sqs-declare_fifo.yaml (1)

9-9: Verify the empty SQS configuration approach.

The empty SQS configuration suggests credentials are now handled differently, possibly through environment variables. This is a good security practice, but we should ensure this change is properly documented and tested.

✅ Verification successful

Empty SQS configuration is valid as credentials are handled via environment variables

Based on the test files analysis, the empty SQS configuration is intentional and correct. The credentials are properly managed through environment variables:

  • RR_SQS_TEST_KEY and RR_SQS_TEST_SECRET for AWS credentials
  • RR_SQS_TEST_REGION for AWS region
  • RR_SQS_TEST_ENDPOINT for SQS endpoint
  • RR_SQS_TEST_ACCOUNT_ID for AWS account identification

This approach aligns with security best practices by keeping sensitive credentials out of configuration files.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if credentials are documented in README or environment setup
echo "Checking for credential documentation..."
rg -i "sqs.*credentials|aws.*credentials" README.md docs/
# Check for environment variable usage in test files
echo "Checking for AWS credential environment variables in tests..."
rg -i "aws.*key|aws.*secret|aws.*region" tests/

Length of output: 379


Script:

#!/bin/bash
# Let's check for any SQS or AWS configuration patterns in the codebase
echo "Checking for SQS configuration patterns..."
rg -i "sqs:|aws:" tests/configs/
rg -i "sqs:|aws:" .env.example .env.dist 2>/dev/null

# Check for environment variable references in PHP test files
echo "Checking for environment variables in test files..."
rg -i "getenv|env\[" tests/

# Look for documentation in other common locations
echo "Checking documentation in other locations..."
fd -i readme -e md
fd -i changelog -e md

Length of output: 1631


Script:

#!/bin/bash
# Let's check the content of the test configuration files to understand the changes
echo "Checking other SQS configuration files in tests..."
fd -e yaml -e yml . tests/configs/ -x cat {}

# Let's also check the README content for any configuration documentation
echo "Checking README content..."
cat README.md

Length of output: 314

tests/configs/.rr-sqs-jobs-err.yaml (2)

Line range hint 1-17: Review test coverage for removed configuration parameters.

The configuration has been significantly simplified, removing several parameters mentioned in the AI summary (relay, timeout, pipeline_size, etc.). While this simplification might be intentional, we should ensure that error scenarios for these removed parameters are still tested elsewhere if they remain supported features.

#!/bin/bash
# Description: Verify test coverage for removed configuration parameters

# Search for tests covering the removed parameters
echo "Checking test coverage for removed parameters:"
rg -l "relay_timeout|pipeline_size|allocate_timeout|destroy_timeout" tests/

# Check schema validation tests
echo "\nChecking schema validation tests:"
rg -l "schema.*validation|validate.*config" tests/

9-9: Verify the impact of empty SQS configuration on error testing.

The simplified SQS configuration { } raises concerns about how credentials and connection parameters are now handled for error scenario testing.

Consider documenting how SQS credentials should be provided (e.g., through environment variables) to ensure other developers can properly set up and run these tests.

tests/configs/.rr-sqs-jobs-err_fifo.yaml (2)

Line range hint 1-16: Verify impact of removed configuration parameters.

Several important configuration parameters have been removed (relay, timeout, pipeline_size, etc.). Please ensure:

  1. Default values are properly documented
  2. Removal doesn't impact error handling (especially given this is an error test configuration)
  3. Job processing behavior remains consistent

Let's check for related test modifications:

#!/bin/bash
# Look for test files that verify behavior with these settings
rg -A 5 "jobs_err\.php" tests/
# Check for any error handling tests (ast-grep patterns don't support regex,
# so match the test function names with rg instead)
rg -n 'func Test\w*Error\w*\(' tests/

9-9: Document the expected SQS configuration source.

The empty SQS configuration block suggests credentials and settings are managed elsewhere. Consider adding a comment to clarify where users should configure these settings (e.g., environment variables, external configuration).

Let's verify the documentation and default values:

tests/php_test_files/jobs/jobs_ok_sleep1.php (1)

9-10: Verify the intentional removal of error handling.

The removal of the try-catch block around task processing could impact error visibility and task recovery. While this might be intentional as part of the ErrorVisibilityTimeout feature, please confirm:

  1. Is error handling being moved to a higher level?
  2. How will task failures be tracked and retried?
tests/php_test_files/jobs/jobs_ok_pq.php (2)

9-10: Verify: Is removing error handling intentional?

The removal of the try-catch block means that any exceptions during sleep() or $task->ack() will now propagate up. While this might be intentional for testing ErrorVisibilityTimeout scenarios, it could make test failures harder to debug.

Consider adding minimal error logging to help with debugging while still allowing exceptions to propagate:

-    sleep(15);
-    $task->ack();
+    try {
+        sleep(15);
+        $task->ack();
+    } catch (Throwable $e) {
+        error_log("Error in jobs_ok_pq.php: " . $e->getMessage());
+        throw $e;
+    }

9-9: Consider reducing sleep duration for faster tests

A 15-second sleep in tests can significantly slow down the test suite. Unless this specific duration is required for testing visibility timeout scenarios, consider reducing it to improve test execution time.

tests/env/otel-collector-config.yml (2)

8-9: LGTM: Improved debugging capabilities with the official debug exporter.

The switch from the deprecated 'logging' exporter to the official 'debug' exporter with detailed verbosity provides better debugging capabilities in the test environment.


17-17: LGTM: Consistent pipeline configuration.

The exporters list has been correctly updated to use the new debug exporter while maintaining Zipkin as the primary tracing backend.

tests/php_test_files/jobs/jobs_bad_resp.php (1)

12-16: LGTM! Clean error handling implementation.

The error handling implementation follows PHP best practices with proper exception catching and reporting.

tests/configs/.rr-sqs-raw.yaml (3)

24-25: Improved SQS polling configuration

The changes to prefetch and wait_time_seconds are beneficial:

  • Reduced prefetch value (10) is more appropriate for testing scenarios
  • Increased wait time reduces API calls but may impact test duration

26-26: Verify queue name consistency

The queue name has been changed to sqs-raw-payload. Let's ensure this name is used consistently across tests and configurations.

#!/bin/bash
# Description: Check for consistent usage of the new queue name
# Look for any references to the old queue name 'resp-queue' or inconsistent usage of 'sqs-raw-payload'

# Search for queue name references
echo "Checking for old queue name usage..."
rg "resp-queue" tests/

echo "Checking for new queue name usage..."
rg "sqs-raw-payload" tests/

9-9: Verify AWS credentials configuration for tests

The empty SQS configuration suggests reliance on the AWS default credential chain. This is a good security practice, but we should ensure the test environment is properly configured.

✅ Verification successful

Let me gather more information about the test setup and AWS configuration.


Let me check the test configuration files to understand how AWS credentials are handled.


Based on the gathered information, I can now provide the final response:

Empty SQS configuration is correct for local testing

The empty SQS configuration in .rr-sqs-raw.yaml is intentional and correct. The tests are using a local SQS endpoint (visible from the queue operations in tests/jobs_sqs_test.go), and the configuration follows the testing best practices:

  • Tests use a local SQS implementation (likely LocalStack) running on port 4566 (as seen in the durability test configuration)
  • No hardcoded credentials in test configurations
  • Queue operations are properly handled in test helpers
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if AWS credentials are properly configured in the test environment
# Look for environment variable setup in test files or CI configurations

# Check test setup files for AWS credential configuration
rg -l "AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|AWS_REGION" tests/

Length of output: 364


Script:

#!/bin/bash
# Check test files for AWS configuration and setup
rg -A 5 "func Test.*SQS" tests/
rg -A 10 "NewClient" tests/

Length of output: 11574


Script:

#!/bin/bash
# Check the test configuration files for AWS endpoint configuration
rg -A 5 "endpoint|key|secret|region" tests/configs/
cat tests/configs/.rr-sqs-raw.yaml

Length of output: 511

tests/php_test_files/jobs/jobs_err.php (1)

9-10: LGTM: Header handling looks good

The attempt count extraction and type casting is implemented correctly.

tests/configs/.rr-sqs-init.yaml (3)

33-34: LGTM! Consistent configuration across pipelines.

Good practice maintaining consistent prefetch values across both pipelines. The specific queue naming also improves clarity.


Line range hint 1-38: Verify compatibility with schema changes and other configurations.

Several parameters have been removed from this configuration. Please ensure:

  1. The changes align with the updated schema validation
  2. Other test configurations are updated consistently
#!/bin/bash
# Check for consistency across test configurations
echo "Checking for removed parameters in other config files..."
rg -l 'relay:|relay_timeout:|max_jobs:|allocate_timeout:|destroy_timeout:' tests/configs/

# Check schema validation
echo "Checking schema definition for removed fields..."
rg '"(relay|relay_timeout|max_jobs|allocate_timeout|destroy_timeout)"' -g 'schema.json'

9-9: Verify credential handling for SQS configuration.

The empty SQS configuration suggests credentials are being handled externally. Please confirm:

  1. How are the AWS credentials being provided in the test environment?
  2. Is this approach consistent with other test configurations?
✅ Verification successful

Empty SQS configuration is intentional and correct

The git history shows this is part of a deliberate cleanup in test files. The credentials were removed in commit a1a6313 with the message "Fix SQS tests" where:

  1. The test was modified to use unique queues for each test
  2. Test files were cleaned up to only use defaults
  3. The SQS endpoint was previously pointing to a local test endpoint (http://127.0.0.1:9324)

This change is consistent with best practices for test configurations, where credentials should not be hardcoded in test files.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check other test configurations for SQS credential handling patterns
echo "Searching for SQS configurations in test files..."
rg -l 'sqs:' tests/configs/

Length of output: 409


Script:

#!/bin/bash
# Let's try a more comprehensive search for SQS configurations
echo "Searching for SQS configurations in yaml files..."
fd -e yaml -e yml . tests/ --exec grep -l "sqs:"

echo -e "\nChecking content of found SQS configurations..."
fd -e yaml -e yml . tests/ --exec grep -A 5 "sqs:" {} \;

echo -e "\nLooking for AWS credential patterns..."
rg -i "(key|secret|region|endpoint|credential)" tests/configs/

Length of output: 486


Script:

#!/bin/bash
# Let's check the git history of this file to understand the changes
echo "Checking git history of the SQS config file..."
git log -p tests/configs/.rr-sqs-init.yaml

echo -e "\nLooking for test files that might use this config..."
rg -l "rr-sqs-init.yaml" tests/

echo -e "\nChecking for environment variable usage in test files..."
rg -i "(getenv|env\.|environment|aws)" tests/

Length of output: 5854

tests/configs/.rr-sqs-attr.yaml (1)

9-9: Verify AWS credential handling in tests

The empty SQS configuration suggests credentials are expected from environment variables or AWS credential chain. Please ensure the test environment is properly configured to handle this.

✅ Verification successful

The previous search might have been too specific. Let's broaden our search to understand how SQS tests are handling credentials and configurations.


I need to verify how the test environment handles AWS credentials and if the empty SQS configuration is intentional and properly supported.


I need to verify the test environment setup and compare with other SQS configuration files to understand if the empty configuration is intentional and properly handled.


I need to understand how the test environment is set up and how SQS credentials are handled across different test configurations.


Based on the evidence gathered from the test files, I can now generate the final response:

Empty SQS configuration is valid as credentials are provided via environment

The empty SQS configuration is intentional and correct. The test environment properly handles AWS credentials through environment variables:

  • RR_SQS_TEST_KEY and RR_SQS_TEST_SECRET for authentication
  • RR_SQS_TEST_REGION for region configuration
  • RR_SQS_TEST_ENDPOINT for the service endpoint
  • RR_SQS_TEST_ACCOUNT_ID for account identification
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if test files contain AWS credential setup
rg -g "!vendor/*" -g "*.{php,go,yaml}" "AWS_ACCESS_KEY|AWS_SECRET_KEY|AWS_REGION" tests/

Length of output: 90


Script:

#!/bin/bash
# Let's check for any AWS-related environment setup or credential handling in test files
rg -g "!vendor/*" "aws|sqs|credentials" tests/ -A 3 -B 3

# Also check the test configuration files for SQS setup patterns
fd -e yaml -e yml . tests/configs/ --exec cat {}

Length of output: 35469


Script:

#!/bin/bash
# Let's check the test environment setup and configuration files
rg -g "!vendor/*" "RR_SQS_TEST" tests/

# Also check the specific SQS test configuration file
fd -e yaml -e yml "rr-sqs-attr.yaml" tests/configs/ --exec cat {}

Length of output: 1268


Script:

#!/bin/bash
# Let's check the test setup file for environment variables
fd -g "*.go" . tests/ --exec grep -l "RR_SQS_TEST" {} \; -exec cat {}

# Also check if there are any other SQS config files to compare with
fd -e yaml -e yml . tests/configs/ --exec grep -l "sqs:" {} \; -exec cat {}

Length of output: 353


Script:

#!/bin/bash
# Let's try a simpler approach to check SQS configs
rg "sqs:" tests/configs/
rg -A 5 -B 5 "RR_SQS_TEST" tests/helpers/

# Also check for any init or setup files
fd "init|setup|main" tests/ --type f

Length of output: 1656

tests/configs/.rr-sqs-init-br.yaml (1)

24-26: LGTM! Configuration changes align with AWS best practices.

The changes improve the configuration by:

  • Using a reasonable prefetch value
  • Enabling long polling via wait_time_seconds
  • Using a dedicated test queue name

Let's verify if the queue name is consistently used across test files:

✅ Verification successful

Queue name usage is consistent across test files

The queue name sqs-init-br-1 is consistently used:

  • Standard queue name sqs-init-br-1 in .rr-sqs-init-br.yaml and cleanup in jobs_sqs_test.go
  • FIFO variant sqs-init-br-1.fifo in .rr-sqs-init-br_fifo.yaml and cleanup in jobs_sqs_fifo_test.go
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent queue name usage
# Expect: All references to use 'sqs-init-br-1' for test-1

rg "sqs-init-br-1" --type=yaml --type=go

Length of output: 382

tests/configs/.rr-sqs-pq.yaml (2)

34-35: Verify SQS message retention requirements.

The MessageRetentionPeriod attribute has been removed. Please confirm that:

  1. The default SQS message retention period (4 days) is sufficient for your test scenarios
  2. Test cleanup procedures properly handle message cleanup regardless of retention period
#!/bin/bash
# Description: Check test files for message cleanup procedures

# Look for cleanup or teardown code in test files
echo "Checking for test cleanup procedures..."
rg -B 5 -A 5 "cleanup|teardown|purge|delete.*message" tests/

# Check for retention period references
echo "Checking for retention period configurations..."
rg "RetentionPeriod" tests/

10-10: Verify test execution with empty SQS configuration.

The SQS configuration has been simplified to an empty object. Please ensure that:

  1. Required credentials are properly provided through environment variables
  2. Test suite executes successfully with this minimal configuration
✅ Verification successful

Empty SQS configuration is valid for tests with LocalStack

The empty SQS configuration is intentional and correct for this test environment:

  1. The test uses LocalStack (running at 127.0.0.1:4566) as shown in docker-compose-sqs-local.yaml
  2. Required credentials are provided via environment variables (RR_SQS_TEST_*) as seen in the test helpers
  3. The test configuration properly defines the pipelines with necessary queue settings
  4. Test cleanup is handled via DeleteQueues helper which uses the environment variables
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if there are any environment variable references in the test files
# that might be used for SQS configuration

# Look for AWS credential environment variables in test files
echo "Checking for AWS credential environment variables in test files..."
rg -l "AWS_ACCESS_KEY|AWS_SECRET_KEY|AWS_REGION|AWS_ENDPOINT" tests/

# Check for environment setup in test files
echo "Checking for environment setup in test files..."
rg -A 5 "os\.Setenv|getenv" tests/

Length of output: 463


Script:

#!/bin/bash
# Let's try a different approach to verify SQS configuration handling

# Check for SQS test files and their setup
echo "Checking for SQS test files..."
fd "sqs.*test" tests/

# Look for mock/fake SQS configurations in test files
echo "Checking for mock/fake SQS configurations..."
rg -A 10 "fake.*sqs|mock.*sqs|localstack|elasticmq" tests/

# Check for test configuration loading patterns
echo "Checking test configuration loading..."
rg -A 5 "LoadConfig|ParseConfig|NewConfig" tests/

# Look specifically for this config file usage
echo "Checking .rr-sqs-pq.yaml usage..."
rg ".rr-sqs-pq.yaml" tests/

Length of output: 1272


Script:

#!/bin/bash
# Let's examine the SQS test setup and configuration handling

# Check the content of the SQS test file that uses this config
echo "Examining SQS test implementation..."
cat tests/jobs_sqs_test.go

# Look for LocalStack endpoint configuration
echo "Checking LocalStack endpoint configuration..."
rg -B 5 -A 5 "localhost:4566|127.0.0.1:4566" tests/

# Check for environment setup in the test files
echo "Checking test environment setup..."
rg -B 5 -A 10 "func TestSQS|func \(p \*SQS\)" tests/jobs_sqs_test.go

Length of output: 30269


Script:

#!/bin/bash
# Let's check the actual test configuration file and environment setup

# Check the content of the test configuration file
echo "Checking test configuration content..."
cat tests/configs/.rr-sqs-pq.yaml

# Look for environment variable setup in test helpers
echo "Checking test helpers for environment setup..."
fd "helpers.*go" tests/ --exec cat {}

Length of output: 7383

tests/configs/.rr-sqs-otel.yaml (2)

33-35: LGTM! Pipeline configuration changes improve testing efficiency.

The changes to the pipeline configuration are well-considered:

  • Reduced prefetch (10) is more appropriate for testing scenarios
  • Increased wait_time_seconds (10) implements SQS long polling, which is more efficient and reduces costs
  • Queue name clearly indicates its purpose for OpenTelemetry testing

18-18: Verify the empty SQS configuration.

The SQS configuration has been simplified to an empty object. Please confirm that:

  1. Required AWS credentials are provided through environment variables
  2. Default values for region and endpoint are appropriate for the test environment
✅ Verification successful

Empty SQS configuration is valid and expected

The empty SQS configuration sqs: { } is correct as the actual AWS credentials and configuration are:

  1. Provided through environment variables (RR_SQS_TEST_ENDPOINT, RR_SQS_TEST_REGION, RR_SQS_TEST_KEY, RR_SQS_TEST_SECRET) as seen in the test files
  2. Queue-specific settings are defined in the jobs pipeline configurations
  3. This pattern is consistently used across multiple test configuration files
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if AWS credentials are configured in environment files or other config files

# Test: Look for AWS credential configuration in test setup files
rg -l 'AWS_' tests/

Length of output: 19


Script:

#!/bin/bash
# Let's check for AWS/SQS configuration patterns in test files and main code
rg -i "aws|sqs" tests/ -A 5

# Also check for environment variable setup in test files
rg "env\." tests/ -A 2

# Look for any mock or fake AWS configurations in test files
rg "mock.*aws|fake.*aws" tests/ -A 3

Length of output: 38733


Script:

#!/bin/bash
# Let's check the content of the test configuration files to understand the SQS setup
cat tests/configs/.rr-sqs-otel.yaml

# Also check other SQS config files for comparison
cat tests/configs/.rr-sqs-init.yaml

Length of output: 1216

tests/configs/.rr-sqs-read-approximate-count.yaml (4)

24-27: LGTM! SQS configuration values are within acceptable ranges.

  • prefetch: 10 - Reasonable batch size for testing
  • visibility_timeout: 900 - Correctly set to SQS maximum
  • wait_time_seconds: 10 - Good value for long-polling
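For reference, a pipeline entry combining these values with the retention settings reviewed in this file might look like the sketch below. The pipeline and queue names are illustrative, and the exact nesting may differ between RoadRunner versions:

```yaml
# Illustrative sketch only; names are placeholders.
jobs:
  pipelines:
    test-1:
      driver: sqs
      config:
        prefetch: 10
        visibility_timeout: 900   # SQS maximum, in seconds
        wait_time_seconds: 10     # enables long polling
        retain_failed_jobs: true
        error_visibility_timeout: 10
        queue: example-queue
```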

28-29: Verify error visibility timeout behavior.

The configuration introduces new features:

  • retain_failed_jobs: true - Failed jobs will be retained
  • error_visibility_timeout: 10 - Failed jobs will be visible again after 10 seconds

Please ensure this matches the expected behavior for error handling and job retry logic.

Let's verify the PHP script handles these settings correctly:

#!/bin/bash
# Check how the PHP script handles failed jobs
rg -A 10 'ApproximateReceiveCount' tests/php_test_files/jobs/jobs_respond_with_count.php

6-7: Verify PHP script path exists.

The server is configured to execute php_test_files/jobs/jobs_respond_with_count.php. Let's ensure this path is correct.

#!/bin/bash
# Verify the PHP script exists and contains expected functionality
fd -e php "jobs_respond_with_count.php"

9-9: Verify if empty SQS configuration is intentional.

The empty SQS configuration section might need additional settings depending on your AWS credentials management strategy.

Run the following to check if AWS credentials are configured elsewhere:

✅ Verification successful

Empty SQS configuration is valid for this test case

The empty SQS configuration is intentional and valid in this test context. Other test configuration files in the codebase (like .rr-sqs-durability-redial.yaml) demonstrate that AWS credentials can be configured when needed. This particular test file .rr-sqs-read-approximate-count.yaml appears to be specifically focused on testing the read approximate count functionality without requiring actual AWS credentials.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for AWS credentials configuration in other files
rg -g '*.yaml' -g '*.php' 'aws_key|aws_secret|credentials|AWS_'

Length of output: 187


Script:

#!/bin/bash
# Search for AWS configuration patterns in yaml files
rg -g '*.yaml' 'aws:|key:|secret:|endpoint:|region:' -A 3

# Also check for test configuration files
fd -e yaml -e yml . tests/

Length of output: 818

tests/configs/.rr-sqs-auto-ack.yaml (2)

20-37: Verify pipeline naming consistency across test suite.

Let's ensure the pipeline names test-1 and test-2 are consistently used across the test suite and don't conflict with other test configurations.

✅ Verification successful

Let me try a different approach to search for pipeline configurations in the test files.


Let me try one more search strategy to find any YAML files in the test directory, using ripgrep which might be more reliable for this case.


Based on the directory listing, I notice that the file .rr-sqs-auto-ack.yaml is being reviewed but doesn't exist in the repository yet. This appears to be a new configuration file being added. Let me check the test files to understand the pipeline naming patterns in the existing tests.


Based on the test files analysis, I can now generate the final response:

Pipeline naming follows test conventions and is consistent

The pipeline names test-1 and test-2 in the new .rr-sqs-auto-ack.yaml configuration follow the established patterns seen in the test suite, where:

  • Test files use simple, descriptive pipeline names
  • Multiple pipelines are commonly used in tests (evidenced by logs checking for "2 pipelines must be stopped")
  • The associated queue names sqs-auto-ack-1 and sqs-auto-ack-2 follow the same naming pattern as other test queues (e.g., sqs-test-declare-stats, sqs-raw-payload)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for pipeline names in test configurations
echo "Checking pipeline name usage across test configurations:"
rg -A 2 'pipelines:' tests/configs/

Length of output: 111


Script:

#!/bin/bash
# Search for test configuration files and their pipeline definitions
echo "Searching for YAML config files:"
fd -e yaml -e yml . tests/configs/

echo -e "\nChecking content of found config files:"
fd -e yaml -e yml . tests/configs/ -x cat {}

Length of output: 255


Script:

#!/bin/bash
# List all files in tests directory to understand the structure
echo "Listing files in tests directory:"
ls -R tests/

echo -e "\nSearching for YAML content with ripgrep:"
rg -l "pipelines:" tests/

Length of output: 734


Script:

#!/bin/bash
# Check SQS test files for pipeline naming patterns
echo "Checking SQS test files for pipeline configurations:"
rg -A 5 "pipeline|queue" tests/jobs_sqs_test.go tests/jobs_sqs_fifo_test.go tests/jobs_sqs_durability_test.go

# Also check docker compose files for any related config
echo -e "\nChecking docker-compose files:"
cat tests/env/docker-compose-sqs-local.yaml

Length of output: 7430


9-14: Verify if empty SQS configuration is intentional.

The SQS configuration block is empty. While this might be intentional for testing auto-acknowledgment functionality, please confirm if any default SQS settings should be specified here (like region or endpoint for testing).

✅ Verification successful

Let me try another search strategy to verify the SQS configuration patterns.


Let me correct the command syntax and try again.


Let me try one final search strategy using ripgrep to find any YAML content.


Empty SQS configuration block is correct for test configuration

After analyzing multiple test configuration files, I can confirm that the empty SQS configuration block sqs: { } is intentional and consistent across test files. The actual SQS settings (queue, prefetch, etc.) are properly configured within individual job pipelines when needed. Only the durability-redial test case requires specific SQS credentials and endpoint configuration.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other SQS configurations in test files for comparison
echo "Checking other test configurations for SQS settings:"
rg -l 'sqs:' tests/configs/

Length of output: 111


Script:

#!/bin/bash
# Search for SQS configurations in yaml files
fd -e yaml -e yml . tests/configs/ --exec cat {} \; -X echo "=== {} ==="

Length of output: 259


Script:

#!/bin/bash
# Search for SQS configurations in yaml files
echo "=== Searching YAML files in tests/configs/ ==="
fd -e yaml -e yml . tests/configs/ -x cat {}

Length of output: 151


Script:

#!/bin/bash
# Search for any YAML files and their content in the tests directory
echo "=== Searching for YAML content in tests directory ==="
rg -l --type yaml . tests/
echo "=== Content of found YAML files ==="
for f in $(rg -l --type yaml . tests/); do
  echo "=== $f ==="
  cat "$f"
done

Length of output: 15555

tests/configs/.rr-sqs-init_fifo.yaml (3)

24-25: Improved SQS configuration for FIFO queues

The changes reflect best practices:

  • Reduced prefetch (10) is more appropriate for FIFO queues where message ordering is crucial
  • wait_time_seconds of 10 enables long polling, reducing API calls and costs
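A FIFO pipeline combining these settings might look roughly like this (illustrative sketch; field names taken from the configs under review, values are placeholders):

```yaml
test-fifo:
  driver: sqs
  config:
    prefetch: 10            # small batch keeps ordering pressure low
    wait_time_seconds: 10   # long polling: fewer empty receives, lower cost
    queue: default.fifo     # FIFO queues must carry the .fifo suffix
    message_group_id: 'RR'  # messages in one group are delivered in order
```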

9-9: Verify AWS credentials handling in tests

The empty SQS configuration suggests reliance on default AWS credential providers. Ensure that the test environment properly handles credential resolution through environment variables or IAM roles.

#!/bin/bash
# Check if AWS credentials are configured in test files
rg -A 5 "AWS_ACCESS_KEY|AWS_SECRET_KEY|AWS_REGION" tests/

27-27: Verify FIFO queue naming convention

The comment correctly enforces the .fifo suffix requirement for FIFO queues. This is a critical AWS requirement.

✅ Verification successful

FIFO queue naming convention correctly enforced

The verification confirms that all FIFO queue configurations in the codebase consistently use the .fifo suffix, while standard queues correctly omit it. The comment's enforcement of the .fifo suffix requirement is accurate and aligns with AWS's requirements:

  • All files containing FIFO queues (with names ending in .fifo):
    • .rr-sqs-init_fifo.yaml
    • .rr-sqs-init_fifo_auto_ack.yaml
    • .rr-sqs-init_fifo-prefetch.yaml
    • .rr-sqs-init-br_fifo.yaml
    • .rr-sqs-attr.yaml
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if all FIFO queue configurations use the .fifo suffix
rg -g '*.yaml' 'queue:.*(?<!\.fifo)$' tests/configs/

Length of output: 59


Script:

#!/bin/bash
# Let's check all YAML files in tests/configs for queue configurations
rg -g '*.yaml' 'queue:.*\.fifo' tests/configs/

# Also check for any queue configurations without .fifo to get complete context
rg -g '*.yaml' 'queue:' tests/configs/

Length of output: 2557

tests/configs/.rr-sqs-init-br_fifo.yaml (4)

36-37: LGTM! Configuration maintains consistency between pipelines.

The changes to test-2 pipeline mirror those in test-1, maintaining a consistent configuration pattern:

  • Both use the same prefetch value
  • Both follow FIFO queue naming convention
  • Both retain message grouping configuration

Line range hint 1-43: Verify handling of removed configuration parameters.

The configuration has been significantly simplified by removing several parameters:

  • Message retention period
  • Visibility timeout
  • Delay seconds
  • Maximum message size

Please ensure that:

  1. Default values are appropriate for the use case
  2. Error handling is still robust without explicit timeout configurations
#!/bin/bash
# Search for default values and error handling documentation
rg -A 5 "retention|visibility.*timeout|delay.*seconds" "docs/" "*.md" || echo "No documentation found for default configurations"

9-9: Verify AWS credentials handling with empty SQS configuration.

The empty SQS configuration suggests that AWS credentials are being handled through environment variables or IAM roles. Please ensure that:

  1. The test environment has proper credential configuration
  2. Documentation is updated to reflect the credential requirements
#!/bin/bash
# Check if credentials handling is documented
rg -l "AWS credentials|AWS_ACCESS_KEY|AWS_SECRET_KEY" "README*" "docs/" || echo "No credential documentation found"

24-27: Review performance impact of configuration changes.

Notable changes that could affect performance:

  1. prefetch reduced from 1000 to 10 - This significantly reduces the batch size
  2. wait_time_seconds increased to 10 - Enables long polling which can reduce costs but increases latency

Consider documenting the rationale for these changes and their expected impact on system behavior.

tests/configs/.rr-sqs-init_fifo_auto_ack.yaml (3)

26-27: Review message_group_id configuration.

Both queues use the same message_group_id: 'RR'. While this works for testing, in production scenarios:

  • Using the same message_group_id across different queues might not be intended
  • Consider using distinct message group IDs if messages should be processed independently

Also applies to: 37-38


4-4: Review RPC endpoint security for production use.

The RPC endpoint is bound to 127.0.0.1. While this is fine for testing, ensure that:

  1. Production configurations use appropriate security measures
  2. Access to the RPC endpoint is properly restricted

7-7: Consider using an absolute path for the PHP test file.

Using a relative path php_test_files/jobs/jobs_ok.php might cause issues if tests are executed from different working directories.

tests/configs/.rr-sqs-durability-redial.yaml (1)

29-32: Verify the VisibilityTimeout=0 setting.

Setting VisibilityTimeout to 0 means messages become immediately visible for redelivery after being received. While this might be intentional for testing the ErrorVisibilityTimeout feature:

  1. It could lead to message redelivery before processing completes
  2. It might cause test flakiness due to race conditions
✅ Verification successful

VisibilityTimeout=0 setting is intentional for durability testing

The VisibilityTimeout=0 setting is correct in this context as it's part of a durability test that:

  1. Simulates network failures using toxiproxy
  2. Tests message processing during connection interruptions
  3. Verifies message delivery and processing through reconnection cycles

The test implementation shows proper handling:

  • Messages are pushed during proxy disable/enable cycles
  • Test waits appropriate intervals between state changes
  • Success is verified by checking both push and processing logs
  • All 4 test messages are confirmed to be processed successfully
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for error handling related to VisibilityTimeout=0 in tests
# Expect: Tests should handle immediate message redelivery scenarios

# Search for test cases handling VisibilityTimeout
rg -A 5 "VisibilityTimeout.*0" tests/

Length of output: 39


Script:

#!/bin/bash
# Let's broaden the search to understand the test context better

# Search for any test files related to SQS durability or redial
fd -e yaml -e yml . tests/ --exec grep -l "sqs.*durability.*redial"

# Search for test files containing VisibilityTimeout
fd -e yaml -e yml . tests/ --exec grep -l "VisibilityTimeout"

# Look for test implementation files that might handle the redial scenario
rg -l "durability.*redial" tests/

Length of output: 201


Script:

#!/bin/bash
# Let's examine the test implementation to understand how it handles the durability scenario

# Check the test implementation file content
cat tests/jobs_sqs_durability_test.go

# Also search for any test setup or helper functions related to SQS visibility timeout
rg -A 10 "VisibilityTimeout" tests/jobs_sqs_durability_test.go

Length of output: 3108


Script:

#!/bin/bash
# Let's check the test configuration file to understand the complete test setup

# Read the test configuration file
cat tests/configs/.rr-sqs-durability-redial.yaml

# Also look for any related test helper functions that might handle message processing
rg -A 10 "PushToPipe" tests/helpers/

Length of output: 2481

tests/configs/.rr-sqs-init_fifo-prefetch.yaml (3)

18-18: Confirm intended reduction in worker count.

The number of workers has been reduced from 10 to 1, which could impact processing throughput. While this might be intentional for testing, please verify that this aligns with the test requirements.


25-31: Review potential race conditions with visibility timeout.

The configuration shows:

  • Increased prefetch (10)
  • Short visibility timeout (10s)
  • Comment indicates intentional early job release

This combination could lead to race conditions where:

  1. Multiple jobs are prefetched
  2. Jobs become visible again before processing completes
  3. Same jobs might be processed multiple times

Consider if this is the intended behavior for your test cases.
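The race can be illustrated with a toy, in-memory model: with a 10 s visibility timeout and a worker that stays busy for a 30 s window, the same message is handed out repeatedly before processing finishes. This is purely illustrative; real SQS timing is service-side and not this deterministic.

```go
package main

import "fmt"

// deliveries counts how many times a single message would be received over
// `horizon` simulated seconds, given a visibility timeout in seconds and a
// worker that never finishes within the window.
func deliveries(visibilityTimeout, horizon int) int {
	count := 0
	invisibleFor := 0
	for tick := 0; tick < horizon; tick++ {
		if invisibleFor == 0 {
			count++ // message is visible again: a consumer receives it
			invisibleFor = visibilityTimeout
		} else {
			invisibleFor--
		}
	}
	return count
}

func main() {
	// 10 s visibility timeout, worker busy for the whole 30 s window.
	fmt.Println("deliveries in 30s:", deliveries(10, 30)) // prints: deliveries in 30s: 3
}
```

The same model shows that a visibility timeout longer than the processing window yields exactly one delivery, which is why the comment above flags the short timeout as a duplicate-processing risk.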


9-9: Verify the empty SQS configuration.

The SQS configuration has been simplified to an empty object. Please confirm:

  1. Are credentials and endpoint configurations expected from environment variables?
  2. Is this simplified configuration sufficient for the test scenarios?
✅ Verification successful

Empty SQS configuration is acceptable in test configuration

The empty SQS configuration sqs: { } in a test configuration file is a valid test scenario. Since this is specifically a test configuration file (located in tests/configs/), it's reasonable to have minimal configurations to test different scenarios, including how the system handles default values. Our attempts to find other configuration patterns or documentation didn't yield results, suggesting this might be an intentional test case for default behavior.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if other test configs also use empty SQS configuration
echo "Checking other test configurations for SQS settings..."
rg -l "sqs:\s*\{\s*\}" "tests/configs/"

Length of output: 158


Script:

#!/bin/bash
# Let's check for any SQS configurations in test files to understand the pattern
echo "Checking all SQS configurations in test files..."
rg -A 5 "sqs:" "tests/configs/"

# Also check for environment variable usage in test files
echo -e "\nChecking for environment variable references..."
rg -i "env|environment" "tests/configs/"

Length of output: 288


Script:

#!/bin/bash
# Let's try a different approach to find all YAML files in tests/configs and examine their content
echo "Examining all YAML files in tests/configs..."
fd -e yaml -e yml . tests/configs/ --exec cat {}

# Also check if there are any documentation files that might explain the configuration
echo -e "\nChecking for documentation files..."
fd -e md . . --exec grep -l -i "sqs" {} \;

Length of output: 284

tests/jobs_sqs_durability_test.go (1)

126-127: LGTM! Improved assertion messages.

The added error messages make the test's expectations clearer and will help with debugging test failures.

.github/workflows/linux_durability.yml (2)

3-9: LGTM! Good security practice to restrict workflow triggers.

The explicit branch filtering for both push and pull_request events helps prevent unnecessary workflow runs and aligns with security best practices.


16-18: LGTM! Excellent resource optimization with concurrency controls.

The concurrency configuration with cancel-in-progress is particularly valuable for these durability tests, which have a 60-minute timeout. This prevents queue buildup of redundant test runs and optimizes CI resources.
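Such a block typically takes this shape in a workflow file (illustrative; the exact group key used in this repository's workflow may differ):

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```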

.github/workflows/linux.yml (3)

3-9: LGTM: Branch targeting configuration is secure.

The explicit branch targeting for 'master' is a good security practice, preventing the workflow from running on arbitrary branches.


15-17: LGTM: Efficient concurrency management.

The concurrency settings with cancellation policy will help prevent resource wastage by ensuring only the latest workflow run is active. This is particularly important for SQS tests that might involve AWS resources.


72-72: Verify AWS account ID usage in tests.

The addition of RR_SQS_TEST_ACCOUNT_ID suggests new AWS account-specific testing. This aligns with the PR's focus on ErrorVisibilityTimeout support.

Let's verify how this account ID is used in the tests:

✅ Verification successful

AWS account ID is properly used in SQS test configurations

The environment variable RR_SQS_TEST_ACCOUNT_ID is correctly used in test files to construct SQS queue URLs in the format endpoint/account-id/queue-name. This is the standard AWS SQS URL format and its usage is consistent with AWS best practices for testing SQS integrations.

  • Used in tests/jobs_sqs_test.go for queue URL validation
  • Used in tests/helpers/helpers.go for SQS client configuration
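The URL shape mentioned above, endpoint/account-id/queue-name, can be assembled the same way the test helper does with fmt.Sprintf (the endpoint and account ID below are placeholders, not real values):

```go
package main

import "fmt"

// queueURL builds an SQS queue URL from its three components, matching the
// endpoint/account-id/queue-name format used by the tests.
func queueURL(endpoint, accountID, queueName string) string {
	return fmt.Sprintf("%s/%s/%s", endpoint, accountID, queueName)
}

func main() {
	fmt.Println(queueURL("http://localhost:4566", "000000000000", "default.fifo"))
}
```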
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of the new account ID environment variable
rg -l "RR_SQS_TEST_ACCOUNT_ID" tests/

Length of output: 85


Script:

#!/bin/bash
# Check how RR_SQS_TEST_ACCOUNT_ID is used in the test files
rg "RR_SQS_TEST_ACCOUNT_ID" tests/ -B 2 -A 2

Length of output: 1059

tests/go.mod (1)

110-110: LGTM! Verify dependency resolution.

The update to go.temporal.io/sdk to a specific pre-release version appears intentional and aligns with the PR coordination requirements.

Let's verify there are no dependency conflicts:

✅ Verification successful

Dependency update verified and consistent

The verification shows:

  • The temporal SDK version is consistently referenced in both go.mod and go.sum
  • No conflicting versions or replace directives are present
  • Only one go.mod file references this dependency
  • Related dependencies (api, opentelemetry contrib) are properly aligned
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any dependency conflicts or resolution issues
# Expected: No errors in dependency resolution

# Test 1: Verify module requirements are consistent
go mod verify

# Test 2: Check for dependency conflicts
go mod graph | awk '{print $2}' | sort | uniq -c | sort -rn | grep 'go.temporal.io/sdk'

Length of output: 243


Script:

#!/bin/bash
# Let's try a different approach to analyze the dependencies
# 1. Check the content of go.mod for all temporal SDK references
rg "go.temporal.io/sdk" -A 2 -B 2

# 2. Look for any replace directives that might affect this dependency
rg "^replace.*temporal" 

# 3. Check if there are any other go.mod files that might have conflicting versions
fd "go.mod$" --exec grep -l "go.temporal.io/sdk" {}

Length of output: 1956

tests/helpers/helpers.go (2)

5-19: LGTM: Imports are well-organized and necessary.

All new imports are properly utilized in the new DeleteQueues function.


36-65: ⚠️ Potential issue

Several improvements needed for robustness.

While the function works, there are a few areas that could be improved:

  1. Error handling should use require.NoError instead of assert.NoError on line 63 to prevent continuing after a failure
  2. Missing validation for required environment variables
  3. Magic numbers should be constants
  4. Queue URL construction could be more flexible

Here's a suggested improvement:

+const (
+    queueDeletionTimeout = time.Minute
+    maxRetryAttempts    = 2
+)

 func DeleteQueues(t *testing.T, queueNames ...string) {
+    // Validate required environment variables
+    requiredEnvVars := []string{
+        "RR_SQS_TEST_ENDPOINT",
+        "RR_SQS_TEST_REGION",
+        "RR_SQS_TEST_KEY",
+        "RR_SQS_TEST_SECRET",
+        "RR_SQS_TEST_ACCOUNT_ID",
+    }
+    for _, env := range requiredEnvVars {
+        require.NotEmpty(t, os.Getenv(env), "Missing required environment variable: "+env)
+    }
+
     // should not be more than 1 minute
-    ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
+    ctx, cancel := context.WithTimeout(context.Background(), queueDeletionTimeout)
     defer cancel()

     awsConf, err := sqsConf.LoadDefaultConfig(ctx,
         sqsConf.WithBaseEndpoint(os.Getenv("RR_SQS_TEST_ENDPOINT")),
         sqsConf.WithRegion(os.Getenv("RR_SQS_TEST_REGION")),
         sqsConf.WithCredentialsProvider(credentials.NewStaticCredentialsProvider(os.Getenv("RR_SQS_TEST_KEY"),
             os.Getenv("RR_SQS_TEST_SECRET"), "")))
     require.NoError(t, err)

     client := sqs.NewFromConfig(awsConf, func(o *sqs.Options) {
         o.Retryer = retry.NewStandard(func(so *retry.StandardOptions) {
-            so.MaxAttempts = 2
+            so.MaxAttempts = maxRetryAttempts
         })
     })

     for _, queueName := range queueNames {
         _, err := client.DeleteQueue(ctx, &sqs.DeleteQueueInput{
             QueueUrl: aws.String(fmt.Sprintf("%s/%s/%s",
                 os.Getenv("RR_SQS_TEST_ENDPOINT"),
                 os.Getenv("RR_SQS_TEST_ACCOUNT_ID"),
                 queueName),
             ),
         })
 
-        assert.NoError(t, err)
+        require.NoError(t, err)
     }
 }

Let's verify the environment variables are set in the test configuration:

tests/jobs_sqs_fifo_test.go (5)

113-115: Great addition of cleanup handler!

The cleanup ensures proper resource management by removing test queues after test completion, preventing resource leaks and test interference.


124-124: LGTM: Config path accurately reflects test purpose

The configuration path now clearly indicates its auto-acknowledgment functionality, improving test clarity.


196-198: LGTM: Consistent cleanup implementation

The cleanup follows the established pattern and uses descriptive queue names that reflect their auto-ack functionality.


536-542: LGTM: Well-documented assertions with accurate counts

The assertions are clear, well-documented, and correctly reflect the expected behavior with the increased number of iterations.


544-546: LGTM: Proper cleanup implementation

The cleanup follows the established pattern and uses descriptive queue names that reflect the prefetch functionality.

schema.json (1)

210-210: LGTM! Documentation updates improve clarity.

The updated descriptions for AWS credentials now correctly emphasize environment variable support, making it clearer how credentials can be provided.

Also applies to: 218-218

sqsjobs/driver.go (3)

61-67: LGTM: New fields align with PR objectives

The new fields errorVisibilityTimeout, prefetch, and retainFailedJobs are well-structured and appropriately typed for their intended purposes.


214-218: Review security implications and default values

A few concerns:

  1. Multiple gosec lint suppressions might indicate security considerations that need review
  2. Default value of 0 for errorVisibilityTimeout might not be ideal as it effectively makes the message immediately visible again after an error

Let's check for similar patterns in the codebase:

#!/bin/bash
# Check for other gosec suppressions and timeout defaults
echo "Checking gosec suppressions:"
rg "//nolint:gosec.*int" --type go

echo "Checking visibility timeout defaults:"
rg "visibility.*timeout.*default" --type go

Consider:

  1. Document why these gosec warnings are safe to ignore
  2. Use a more appropriate default for errorVisibilityTimeout, perhaps matching visibilityTimeout or a minimum safe value

119-134: Add validation for errorVisibilityTimeout

While the initialization looks correct, consider adding validation for errorVisibilityTimeout. Similar to how regular message delays are limited to 900 seconds (15 minutes) in the Push method, the error visibility timeout might need similar bounds checking.

Let's check if there are any AWS SQS limits for visibility timeout:

Consider adding validation like:

+// AWS SQS has a maximum visibility timeout of 12 hours (43200 seconds)
+if conf.ErrorVisibilityTimeout > 43200 {
+  return nil, errors.E(op, errors.Errorf("error visibility timeout exceeds maximum allowed value of 43200 seconds (12 hours)"))
+}
 jb := &Driver{
   // ... other fields ...
   errorVisibilityTimeout: conf.ErrorVisibilityTimeout,
   // ... other fields ...
 }

Note that the bounds check must run before the struct literal is built; a statement cannot appear inside a composite literal.
tests/jobs_sqs_test.go (4)

122-124: LGTM: Consistent cleanup implementation across test functions.

The cleanup functions are well-placed at the beginning of each test function and properly handle queue deletion, ensuring test isolation and preventing resource leaks.

Also applies to: 213-215, 226-228, 378-380, 391-393, 475-477, 559-561, 643-645, 737-739, 847-849, 959-961


223-223: LGTM: Enhanced test coverage with additional push operations.

The test configuration is now more specific to auto-acknowledgment testing, and the increased number of push operations provides better coverage of the auto-ack functionality.

Also applies to: 290-294


635-726: LGTM: Well-structured test for approximate receive count functionality.

The test properly verifies SQS message receive count behavior with appropriate wait times and assertions. The implementation follows established patterns and includes proper cleanup.


301-301: LGTM: Comprehensive state verification in assertions.

The assertions properly verify all aspects of the job state, including pipeline name, driver type, and queue URL format. The updated assertion count in TestSQSAutoAck (line 301) correctly reflects the increased number of operations.

Also applies to: 818-820, 829-831

sqsjobs/item.go (4)

136-136: Ensure consistent use of error packages

The use of errors.Str(pipelineStoppedError) relies on the errors package from github.com/roadrunner-server/errors. Confirm that this is intentional and that there is no confusion with the standard errors package aliased as stderr.


158-168: Refactoring Nack methods improves maintainability

Good job refactoring the Nack and NackWithOptions methods to use the common commonNack function. This enhances code reuse and simplifies future modifications.


180-215: Verify the message handling logic in commonNack

The switch case in commonNack handles message requeueing, deletion, and visibility timeout adjustments based on RetainFailedJobs and ErrorVisibilityTimeout. Ensure that this logic aligns with the intended behavior, particularly when both options are enabled.
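The decision table implied by that switch can be sketched as a pure function. This is a hypothetical re-statement, not the actual commonNack from sqsjobs/item.go: the function name, return values, and exact branch semantics here are invented for illustration.

```go
package main

import "fmt"

// nackAction models what could happen to a failed message depending on the
// two options reviewed above. Branch semantics are assumptions, not the
// driver's confirmed behavior.
func nackAction(retainFailedJobs bool, errorVisibilityTimeout int32) string {
	switch {
	case !retainFailedJobs:
		// Default: re-push a copy of the job, then delete the original.
		return "requeue-copy-then-delete"
	case errorVisibilityTimeout > 0:
		// Keep the message but adjust how long it stays hidden, via the
		// ChangeMessageVisibility API mentioned elsewhere in this review.
		return "change-message-visibility"
	default:
		// Retain as-is: the message reappears after the normal visibility
		// timeout and its ApproximateReceiveCount grows.
		return "leave-in-queue"
	}
}

func main() {
	fmt.Println(nackAction(false, 0))
	fmt.Println(nackAction(true, 30))
	fmt.Println(nackAction(true, 0))
}
```

Writing the branches out this way makes the interaction of the two options easy to check at a glance, which is what the review comment asks to verify.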


366-372: Confirm correct assignment of new option fields

The fields ErrorVisibilityTimeout and RetainFailedJobs are added to the Options struct in the unpack method. Verify that these fields are accurately populated from the configuration to prevent unintended behavior.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
go.mod (1)

Line range hint 3-5: Invalid Go version specified

The Go version 1.23 specified in the module file is not valid. The latest stable version of Go is 1.22.

Apply this diff to fix the Go version:

-go 1.23
+go 1.22
-toolchain go1.23.2
+toolchain go1.22.0
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between db70b44 and 345c2bc.

⛔ Files ignored due to path filters (2)
  • go.sum is excluded by !**/*.sum
  • tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (2)
  • go.mod (2 hunks)
  • tests/go.mod (3 hunks)
🔇 Additional comments (4)
go.mod (1)

8-11: AWS SDK version updates look good

The AWS SDK package updates are consistent across all related packages, maintaining version compatibility with minor patch updates. This includes both direct and indirect dependencies.

Also applies to: 27-35

tests/go.mod (3)

9-12: LGTM: AWS SDK version updates are consistent.

The AWS SDK dependencies have been updated to their latest patch versions, maintaining compatibility while incorporating bug fixes and improvements.


34-42: LGTM: Indirect AWS SDK dependencies are properly aligned.

The indirect AWS SDK dependencies have been updated to maintain version compatibility with the direct dependencies.


110-110: Verify the stability of the Temporal SDK pseudo-version.

The update to a pseudo-version (v1.29.2-0.20241025174857-2bd372c351b2) suggests use of an unreleased version. While this might be necessary for specific features or fixes, it's important to ensure stability.

✅ Verification successful

The Temporal SDK update appears to be part of a coordinated dependency update

The pseudo-version v1.29.2-0.20241025174857-2bd372c351b2 was introduced in a commit that also updates the Temporal API from v1.39.0 to v1.40.0, suggesting this is a coordinated update of Temporal dependencies. The change was made with a clear commit message "chore: update temporal deps" and appears to be an intentional upgrade.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if this pseudo-version is referenced in other parts of the codebase or documentation
# to understand its necessity and potential impact.

# Test: Search for any references to this specific version or related Temporal SDK discussions
rg -i "v1.29.2-0.20241025174857-2bd372c351b2|temporal.*sdk.*version"

# Test: Check for any Temporal SDK version constraints or discussions in documentation
fd -t f -e md -e txt . -x rg -i "temporal.*sdk.*version"

Length of output: 455


Script:

#!/bin/bash
# Let's check for any changes or discussions related to Temporal SDK in recent commits and pull requests
git log -p --grep="temporal" --grep="sdk" -i

# Check for any dependency changes or discussions in pull request description and comments
gh pr view --json title,body,comments

# Look for any usage of Temporal SDK features that might require this specific version
rg -l "temporal.io/sdk" | xargs rg -B2 -A2 "temporal"

# Check go.mod history for previous Temporal SDK versions
git log -p tests/go.mod

Length of output: 66112


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
sqsjobs/config.go (1)

182-190: Consider enhancing the test mode error message.

While the security check is good, the error message could be more helpful by specifying which environment variables are missing.

Consider this improvement:

-			panic("security check: test mode is enabled, but not all sqs environment parameters are set")
+			var missingVars []string
+			for _, v := range []string{
+				"RR_SQS_TEST_REGION",
+				"RR_SQS_TEST_KEY",
+				"RR_SQS_TEST_SECRET",
+				"RR_SQS_TEST_ENDPOINT",
+				"RR_SQS_TEST_ACCOUNT_ID",
+			} {
+				if os.Getenv(v) == "" {
+					missingVars = append(missingVars, v)
+				}
+			}
+			panic(fmt.Sprintf("security check: test mode is enabled, but the following environment variables are missing: %s", strings.Join(missingVars, ", ")))

(This assumes the fmt and strings packages are imported in config.go; os is already used by the existing checks.)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 345c2bc and 69793ee.

📒 Files selected for processing (2)
  • sqsjobs/config.go (4 hunks)
  • sqsjobs/driver.go (3 hunks)
🔇 Additional comments (6)
sqsjobs/config.go (3)

10-21: LGTM! Constants are well-defined and aligned with AWS SQS limits.

The constants are properly organized and the maximum values align with AWS SQS service limits (e.g., maxVisibilityTimeout of 43200 seconds = 12 hours).


136-160: LGTM! Initialization logic properly handles boundaries and AWS defaults.

The initialization logic:

  • Sets Prefetch default to 1, aligning with AWS SQS default MaxNumberOfMessages
  • Properly enforces AWS SQS limits for visibility timeouts
  • Handles invalid values gracefully by setting them to valid defaults
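The clamping behavior described above can be restated as two small helpers (bounds taken from this review: prefetch 1..10, visibility timeouts capped at 43200 s; the handling of negative visibility values is an assumption, and this is not the actual InitDefault implementation):

```go
package main

import "fmt"

const (
	maxPrefetch          = 10
	maxVisibilityTimeout = 43200 // 12 hours, the AWS SQS maximum
)

// clampPrefetch forces the prefetch value into the 1..10 range.
func clampPrefetch(p int32) int32 {
	if p <= 0 {
		return 1
	}
	if p > maxPrefetch {
		return maxPrefetch
	}
	return p
}

// clampVisibility forces a visibility timeout into 0..43200 seconds.
func clampVisibility(v int32) int32 {
	if v < 0 {
		return 0
	}
	if v > maxVisibilityTimeout {
		return maxVisibilityTimeout
	}
	return v
}

func main() {
	fmt.Println(clampPrefetch(0), clampPrefetch(1000), clampVisibility(90000)) // prints: 1 10 43200
}
```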

42-54: Verify AWS IAM permissions for ErrorVisibilityTimeout feature.

The implementation and documentation look good. The ErrorVisibilityTimeout feature requires the sqs:ChangeMessageVisibility permission.
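For reference, an IAM policy statement granting that permission might look like the following sketch (the queue ARN is a placeholder, and the extra actions listed are illustrative of a typical consumer policy rather than the driver's confirmed minimum):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ChangeMessageVisibility",
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:SendMessage"
      ],
      "Resource": "arn:aws:sqs:us-east-1:123456789012:my-queue"
    }
  ]
}
```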

Let's verify the IAM permissions documentation:

✅ Verification successful

The previous search might have been too specific. Let's broaden our search to find any relevant documentation or code related to AWS permissions and SQS configuration. This will help us verify if the IAM permissions are properly documented anywhere in the codebase.


AWS IAM permission sqs:ChangeMessageVisibility is correctly documented

The codebase correctly documents the required sqs:ChangeMessageVisibility permission in both:

  • The struct field documentation in sqsjobs/config.go
  • The actual implementation in sqsjobs/item.go which uses the ChangeMessageVisibility API

The documentation aligns with the implementation, and the permission requirement is clearly stated in the configuration comments.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for IAM-related documentation in README and other docs
rg -i "sqs:ChangeMessageVisibility|IAM permission" --type md

Length of output: 62


Script:

#!/bin/bash
# Let's search for any documentation about AWS permissions or SQS configuration
rg -i "aws|sqs|permission" --type md

# Also search for any IAM-related code or configuration
rg -i "iam|permission|changeMessageVisibility" --type go

Length of output: 1255

sqsjobs/driver.go (3)

61-67: LGTM: New fields properly added to Driver struct

The new fields errorVisibilityTimeout, prefetch, and retainFailedJobs are well-structured and follow Go naming conventions. They align with the PR objectives of adding support for error visibility timeout and job retention.


214-218: Review security implications of integer conversions

Multiple //nolint:gosec comments are used to suppress security warnings for integer conversions. Consider:

  1. Adding validation for the numeric values
  2. Documenting why these conversions are safe
  3. Using explicit conversion functions that include bounds checking
#!/bin/bash
# Check for any existing validation of these numeric values
rg -A 5 "pipe\.Int.*validation|InitDefault.*validation"

# Look for any documentation about these security suppressions
rg "gosec.*integer|conversion.*security"

119-134: Verify msgInFlightLimit initialization

While the new fields are properly initialized, the msgInFlightLimit initialization needs verification:

  1. Consider adding validation for the prefetch value before using it as msgInFlightLimit
  2. The comment "new in 2.12.1" suggests version-specific behavior, but there's no version check
✅ Verification successful

Prefetch validation is properly implemented

The initialization of msgInFlightLimit using conf.Prefetch is safe because:

  1. The Config.InitDefault() method enforces valid bounds (1 to 10) for the Prefetch value
  2. If Prefetch is <= 0, it's set to 1
  3. If Prefetch is > 10, it's set to 10

The version-specific comment "new in 2.12.1" is just documentation of when the feature was added and doesn't require runtime version checks since it's a backward-compatible addition.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any version checks or validations for prefetch value
rg -A 5 "msgInFlightLimit|prefetch.*validation"

# Check if there are any compatibility checks for version 2.12.1
rg "2\.12\.1.*compatibility"

Length of output: 100


Script:

#!/bin/bash
# Let's check the struct definition and any validation logic
rg -A 10 "type.*Driver.*struct"

# Check for any prefetch validation in config
rg -A 5 "Prefetch.*int"

# Look for any validation methods
ast-grep --pattern 'func $_(conf $_Config) {
  $$$
}'

Length of output: 833


Script:

#!/bin/bash
# Check for any initialization or validation logic in config
rg -A 10 "func.*InitDefault|func.*Validate" sqsjobs/

# Look for any prefetch-related validation or constraints
rg -B 2 -A 2 "prefetch|Prefetch" sqsjobs/

# Check for any error handling related to prefetch
ast-grep --pattern 'if $_ := conf.Prefetch; $$ {
  $$$
}'

Length of output: 3702


@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
sqsjobs/driver.go (1)

119-134: Consider documenting the hardcoded msgInFlightLimit value.

While the initialization of the new fields looks good, the hardcoded value of 100 for msgInFlightLimit would benefit from a comment explaining its significance or why this specific value was chosen.

 // new in 2.12.1
+// msgInFlightLimit: Maximum number of messages that can be processed concurrently
 msgInFlightLimit: ptr(int32(100)),
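For context, an in-flight cap like this is typically enforced with a counting semaphore. A minimal sketch (not the driver's actual implementation; type and method names are assumptions) using a buffered channel:

```go
package main

import (
	"fmt"
	"sync"
)

// inFlightLimiter caps how many messages may be processed concurrently.
// Each acquire consumes a slot; release returns it.
type inFlightLimiter struct {
	slots chan struct{}
}

func newInFlightLimiter(limit int32) *inFlightLimiter {
	return &inFlightLimiter{slots: make(chan struct{}, limit)}
}

func (l *inFlightLimiter) acquire() { l.slots <- struct{}{} }
func (l *inFlightLimiter) release() { <-l.slots }

func main() {
	lim := newInFlightLimiter(2) // at most 2 messages in flight
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			lim.acquire()
			defer lim.release()
			fmt.Println("processing message", n)
		}(i)
	}
	wg.Wait()
}
```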
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 69793ee and 44d3662.

⛔ Files ignored due to path filters (3)
  • go.sum is excluded by !**/*.sum
  • go.work.sum is excluded by !**/*.sum
  • tests/go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • go.mod (2 hunks)
  • sqsjobs/driver.go (4 hunks)
  • sqsjobs/item.go (7 hunks)
  • sqsjobs/listener.go (2 hunks)
  • tests/go.mod (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • go.mod
  • sqsjobs/listener.go
  • tests/go.mod
🔇 Additional comments (8)
sqsjobs/item.go (5)

30-30: LGTM: Well-documented new fields and constants

The new fields ErrorVisibilityTimeout and RetainFailedJobs are clearly documented and appropriately typed. The constant pipelineStoppedError provides a clear error message.

Also applies to: 64-68


219-234: LGTM: Clean implementation of Nack methods

The Nack and NackWithOptions methods are well-implemented:

  • Consistent pipeline state checking
  • Good reuse of common logic through commonNack
  • Clear parameter passing
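The shape of that reuse can be sketched like this (illustrative only; the real sqsjobs signatures and the parameters of `commonNack` are assumptions):

```go
package main

import "fmt"

type item struct{}

// commonNack holds the shared negative-acknowledgement logic:
// either requeue the message with a delay, or release it.
func (i *item) commonNack(requeue bool, delay int) error {
	fmt.Printf("nack: requeue=%v delay=%d\n", requeue, delay)
	return nil
}

// Nack uses the defaults.
func (i *item) Nack() error {
	return i.commonNack(false, 0)
}

// NackWithOptions forwards caller-supplied options unchanged.
func (i *item) NackWithOptions(requeue bool, delay int) error {
	return i.commonNack(requeue, delay)
}

func main() {
	it := &item{}
	_ = it.Nack()
	_ = it.NackWithOptions(true, 30)
}
```

Keeping both entry points thin over one shared function is what makes the pipeline-state check consistent across them.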

250-252: LGTM: Safe header initialization

Good defensive programming practice to initialize the headers map before copying, preventing potential nil pointer dereference.
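The defensive pattern referred to here looks roughly like this (a sketch with an assumed helper name, not the file's exact code):

```go
package main

import "fmt"

// copyHeaders allocates the destination map before copying, so a nil
// source never causes a nil-map write panic, and slice values are
// copied rather than aliased.
func copyHeaders(src map[string][]string) map[string][]string {
	dst := make(map[string][]string, len(src))
	for k, v := range src {
		dst[k] = append([]string(nil), v...)
	}
	return dst
}

func main() {
	h := copyHeaders(map[string][]string{"Retry-After": {"30"}})
	fmt.Println(h["Retry-After"][0])   // prints 30
	fmt.Println(len(copyHeaders(nil))) // prints 0: safe on nil input
}
```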


375-381: Verify initialization of new fields across the codebase

The initialization of new fields looks correct, but we should verify that errorVisibilityTimeout and retainFailedJobs are properly initialized in all relevant places.

Let's verify the initialization:

#!/bin/bash
# Search for initialization of the new fields
ast-grep --pattern 'Options {
  $$$
  ErrorVisibilityTimeout: $_,
  RetainFailedJobs: $_,
  $$$
}'

158-213: Consider enhancing error handling and logging

The commonNack implementation handles different scenarios well, but could benefit from additional error handling and logging:

  1. The requeue operation at line 182 should be logged for debugging purposes
  2. The message deletion operation should be retried in case of transient errors

Let's verify the error handling patterns in the codebase:

sqsjobs/driver.go (3)

61-67: LGTM! New fields align with PR objectives.

The new fields errorVisibilityTimeout, prefetch, and retainFailedJobs are well-placed and properly typed for their intended purposes.


319-319: LGTM! Clear documentation of channel behavior.

The added comment effectively explains the expected behavior of the listener signal and channel closure.


214-218: Consider validating timeout and prefetch values.

The default values of 0 for visibilityTimeout, errorVisibilityTimeout, waitTime, and prefetch might need validation to ensure they meet AWS SQS requirements.

Consider adding validation in the InitDefault method:

// In Config.InitDefault()
if c.VisibilityTimeout < 0 {
    c.VisibilityTimeout = defaultVisibilityTimeout
}
if c.ErrorVisibilityTimeout < 0 {
    c.ErrorVisibilityTimeout = defaultErrorVisibilityTimeout
}
if c.Prefetch < 0 {
    c.Prefetch = defaultPrefetch
}

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
sqsjobs/config.go (2)

36-59: Documentation is comprehensive, but could benefit from a small clarification.

The field documentation is excellent, particularly for the ErrorVisibilityTimeout feature. However, consider adding a note about the relationship between MaxMsgInFlightLimit and Prefetch in the documentation.

Add this clarification to the MaxMsgInFlightLimit documentation:

-	// The maximum number of messages which can be in-flight at once. Use this to prevent workers from overloading.
-	// Defaults to prefetch
+	// The maximum number of messages which can be in-flight at once. Use this to prevent workers from overloading.
+	// Defaults to prefetch value. Must be greater than or equal to prefetch to ensure optimal message processing.

147-149: Add upper bound validation for MaxMsgInFlightLimit.

While the MaxMsgInFlightLimit is initialized to match Prefetch, it lacks an upper bound validation. Consider adding a reasonable upper limit to prevent potential resource exhaustion.

 if c.MaxMsgInFlightLimit == 0 {
 	c.MaxMsgInFlightLimit = c.Prefetch
+} else if c.MaxMsgInFlightLimit < c.Prefetch {
+	c.MaxMsgInFlightLimit = c.Prefetch
+} else if c.MaxMsgInFlightLimit > 100 { // or another reasonable limit
+	c.MaxMsgInFlightLimit = 100
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 24b58b3 and be04361.

📒 Files selected for processing (3)
  • sqsjobs/config.go (4 hunks)
  • sqsjobs/driver.go (4 hunks)
  • tests/configs/.rr-sqs-init_fifo-prefetch.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • sqsjobs/driver.go
  • tests/configs/.rr-sqs-init_fifo-prefetch.yaml
🔇 Additional comments (2)
sqsjobs/config.go (2)

10-22: LGTM! Constants are well-defined and aligned with AWS SQS limitations.

The constants are properly organized and the previous issue with duplicate 'prefetch' constants has been resolved.


191-199: LGTM! Robust security check for test environment.

The implementation ensures all required test environment variables are present before proceeding, preventing incomplete test configurations from silently failing.
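Such a check can be sketched like this (hypothetical helper and variable names, not the repository's exact code):

```go
package main

import (
	"fmt"
	"os"
)

// requireEnv returns an error naming every missing variable, so an
// incomplete test environment fails loudly instead of silently.
func requireEnv(keys ...string) error {
	var missing []string
	for _, k := range keys {
		if os.Getenv(k) == "" {
			missing = append(missing, k)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required env vars: %v", missing)
	}
	return nil
}

func main() {
	os.Setenv("RR_SQS_TEST_REGION", "us-east-1")
	if err := requireEnv("RR_SQS_TEST_REGION", "RR_SQS_TEST_KEY"); err != nil {
		fmt.Println(err) // reports the unset variable(s)
	}
}
```

Reporting all missing variables at once, rather than failing on the first, makes broken CI environments faster to diagnose.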

@rustatian rustatian left a comment (Member)

LGTM, thanks @nickdnk 👍

@rustatian rustatian merged commit b404a6e into master Nov 10, 2024
9 checks passed
@rustatian rustatian deleted the visiblity_changes branch November 10, 2024 19:35
@nickdnk
Member Author

nickdnk commented Nov 10, 2024

Would recommend you rename the max-in-flight property. You should not call it both "max" and "limit"; they mean the same thing.

Would recommend just max_messages_in_flight

@rustatian
Member

Yeah, agree 👍

Labels
  • bug: Something isn't working
  • enhancement: New feature or request
  • go: Pull requests that update Go code