Skip to content

Enhance job statistics commands with optional reporting database support#418

Merged
amotl merged 4 commits intomainfrom
bw/reasonable-toad-ivory
Apr 25, 2025
Merged

Enhance job statistics commands with optional reporting database support#418
amotl merged 4 commits intomainfrom
bw/reasonable-toad-ivory

Conversation

@WalBeh
Copy link
Copy Markdown
Contributor

@WalBeh WalBeh commented Apr 23, 2025

Summary of the changes

Enhance job statistics commands with optional reporting database support.

Adds a --reportdb/-r CLI option, that allows writing statement statistics to a separate database.

Checklist

  • Link to issue this PR refers to (if applicable): Fixes #???

@WalBeh WalBeh requested review from amotl and Copilot April 23, 2025 16:04
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 23, 2025

Walkthrough

The changes introduce support for specifying a separate reporting database for job statistics collection and viewing in the CrateDB toolkit's CFR module. The CLI commands job_statistics_collect and job_statistics_view now accept an optional --reportdb parameter, which is parsed and passed to the job statistics logic. The core job statistics bootstrapping function is updated to handle this optional reporting database, establishing a separate connection and cursor when provided. All relevant database operations for job statistics now use the reporting database if specified, while maintaining compatibility with the original behavior.

Changes

File(s) Change Summary
cratedb_toolkit/cfr/cli.py Extended CLI commands job_statistics_collect and job_statistics_view to accept optional --reportdb parameter; updated function signatures, parsing, and added logging.
cratedb_toolkit/cfr/jobstats.py Modified boot function to accept optional report_address; established separate reporting database connection and cursor; updated all job statistics operations to use reporting cursor if provided.
tests/cfr/test_jobstats.py Renamed existing test to clarify it tests collection into the same database; added new test verifying job statistics collection using the --reportdb option into a separate report database.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant MainDB
    participant ReportDB

    User->>CLI: Invoke job_statistics_collect/view [--reportdb]
    CLI->>CLI: Parse --reportdb (if provided)
    CLI->>MainDB: Connect to main database
    alt --reportdb provided
        CLI->>ReportDB: Connect to reporting database
        CLI->>ReportDB: Use report_cursor for job stats operations
    else
        CLI->>MainDB: Use main cursor for job stats operations
    end
    CLI->>User: Output results or logs
Loading

Poem

In fields of data, bunnies hop and play,
Now with two burrows—one for stats each day!
A flag for reports, a cursor anew,
Collect or view, the choice is up to you.
With every hop, the toolkit grows bright,
Separate stats, handled just right!
🐇📊


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a73672 and 585325a.

📒 Files selected for processing (1)
  • tests/cfr/test_jobstats.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/cfr/test_jobstats.py
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
  • GitHub Check: build-and-test
  • GitHub Check: CFR for OS ubuntu-latest
  • GitHub Check: CFR for OS macos-latest
  • GitHub Check: CFR for OS macos-13
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Enhances the job statistics commands by adding optional support for a separate reporting database.

  • Updates the boot function in jobstats.py to accept a new optional report_address and use a separate report_cursor.
  • Modifies both the "collect" and "view" CLI commands in cli.py to support a new --reportdb/-r flag for reporting database connection configuration.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
cratedb_toolkit/cfr/jobstats.py Extends the boot function and uses a separate report_cursor for queries.
cratedb_toolkit/cfr/cli.py Adds a --reportdb CLI option to both collection and view commands.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
cratedb_toolkit/cfr/jobstats.py (1)

47-113: 💡 Verification agent

🧩 Analysis chain

Add comprehensive tests for the reporting database functionality.

Static analysis shows that most of the added code is not covered by tests. This is critical functionality that should be thoroughly tested.


🏁 Script executed:

#!/bin/bash
# Check the test coverage for job statistics functionality
rg -g "test_*.py" "job_statistics|jobstats" | grep -v "from"

Length of output: 1069


Add tests for the separate reporting‐database code path

Although we have coverage for the default collect/view flows, we’re still missing tests for the branch in boot() where a separate report_address is passed. Please add tests that:

  • Exercise boot(address, report_address=…) so that
    • stmt_log_table and last_exec_table are created in the report database, not the primary one
    • dbinit() uses report_cursor to read/write the tables
    • init_last_execution() correctly inserts the initial 0 when the report table is empty
  • Invoke via the CLI (e.g. ctk cfr jobstats collect --report-db <uri>) and assert that the report schema contains both jobstats_statements and jobstats_last with the expected records

Files to target:

  • cratedb_toolkit/cfr/jobstats.py – lines 47–113 (boot(), dbinit(), init_last_execution())
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 47-47: cratedb_toolkit/cfr/jobstats.py#L47
Added line #L47 was not covered by tests


[warning] 67-68: cratedb_toolkit/cfr/jobstats.py#L67-L68
Added lines #L67 - L68 were not covered by tests


[warning] 70-71: cratedb_toolkit/cfr/jobstats.py#L70-L71
Added lines #L70 - L71 were not covered by tests


[warning] 73-74: cratedb_toolkit/cfr/jobstats.py#L73-L74
Added lines #L73 - L74 were not covered by tests


[warning] 81-81: cratedb_toolkit/cfr/jobstats.py#L81
Added line #L81 was not covered by tests


[warning] 84-84: cratedb_toolkit/cfr/jobstats.py#L84
Added line #L84 was not covered by tests


[warning] 97-97: cratedb_toolkit/cfr/jobstats.py#L97
Added line #L97 was not covered by tests


[warning] 99-100: cratedb_toolkit/cfr/jobstats.py#L99-L100
Added lines #L99 - L100 were not covered by tests


[warning] 102-102: cratedb_toolkit/cfr/jobstats.py#L102
Added line #L102 was not covered by tests


[warning] 104-105: cratedb_toolkit/cfr/jobstats.py#L104-L105
Added lines #L104 - L105 were not covered by tests


[warning] 113-113: cratedb_toolkit/cfr/jobstats.py#L113
Added line #L113 was not covered by tests

🧹 Nitpick comments (3)
cratedb_toolkit/cfr/cli.py (1)

132-134: Consider defensive validation for reportdb string.

While the code correctly creates a DatabaseAddress from the reportdb string, there's no explicit error handling if the string format is invalid.

 if reportdb:
-    report_address = DatabaseAddress.from_string(reportdb)
-    logger.info(f"Reading from report database: {reportdb}")
+    try:
+        report_address = DatabaseAddress.from_string(reportdb)
+        logger.info(f"Reading from report database: {reportdb}")
+    except ValueError as e:
+        logger.error(f"Invalid report database URL: {e}")
+        ctx.fail(f"Invalid report database URL: {e}")
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 132-134: cratedb_toolkit/cfr/cli.py#L132-L134
Added lines #L132 - L134 were not covered by tests

cratedb_toolkit/cfr/jobstats.py (2)

176-176: Consider adding resource management for database connections.

The code correctly uses report_cursor for all statements, but there's no explicit closing of database connections, which might lead to resource leaks in long-running scenarios.

Consider updating the implementation to properly close connections, possibly with a cleanup function:

def cleanup():
    """Close database connections."""
    global cursor, report_cursor
    if cursor:
        try:
            cursor.close()
        except Exception as e:
            logger.warning(f"Error closing main cursor: {e}")
    
    if report_cursor and report_cursor != cursor:
        try:
            report_cursor.close()
        except Exception as e:
            logger.warning(f"Error closing report cursor: {e}")

This function could be called before exiting or when no longer needed.

Also applies to: 189-189, 192-192, 200-201

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 176-176: cratedb_toolkit/cfr/jobstats.py#L176
Added line #L176 was not covered by tests


48-48: Consider addressing the TODO comment in a future update.

The TODO comment indicates that the code should be refactored to avoid global variables, which would improve testability and maintainability.

Consider refactoring this module to use a class-based approach rather than global variables. This would make the code more maintainable, testable, and thread-safe. For example:

class JobStatisticsCollector:
    def __init__(self, address: DatabaseAddress, report_address: Optional[DatabaseAddress] = None):
        self.address = address
        self.report_address = report_address
        self.stmt_log_table = None
        self.last_exec_table = None
        self.cursor = None
        self.report_cursor = None
        self.last_scrape = None
        self.interval = None
        self.sys_jobs_log = {}
        # Initialize other state
        
    def boot(self):
        # Initialize connections, tables, etc.
        pass
        
    def record_once(self):
        # Record a single snapshot
        pass
        
    # Additional methods...
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d539603 and 33aa97c.

📒 Files selected for processing (2)
  • cratedb_toolkit/cfr/cli.py (1 hunks)
  • cratedb_toolkit/cfr/jobstats.py (5 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
cratedb_toolkit/cfr/cli.py

[warning] 107-107: cratedb_toolkit/cfr/cli.py#L107
Added line #L107 was not covered by tests


[warning] 109-111: cratedb_toolkit/cfr/cli.py#L109-L111
Added lines #L109 - L111 were not covered by tests


[warning] 113-113: cratedb_toolkit/cfr/cli.py#L113
Added line #L113 was not covered by tests


[warning] 130-130: cratedb_toolkit/cfr/cli.py#L130
Added line #L130 was not covered by tests


[warning] 132-134: cratedb_toolkit/cfr/cli.py#L132-L134
Added lines #L132 - L134 were not covered by tests


[warning] 136-136: cratedb_toolkit/cfr/cli.py#L136
Added line #L136 was not covered by tests

cratedb_toolkit/cfr/jobstats.py

[warning] 42-42: cratedb_toolkit/cfr/jobstats.py#L42
Added line #L42 was not covered by tests


[warning] 47-47: cratedb_toolkit/cfr/jobstats.py#L47
Added line #L47 was not covered by tests


[warning] 67-68: cratedb_toolkit/cfr/jobstats.py#L67-L68
Added lines #L67 - L68 were not covered by tests


[warning] 70-71: cratedb_toolkit/cfr/jobstats.py#L70-L71
Added lines #L70 - L71 were not covered by tests


[warning] 73-74: cratedb_toolkit/cfr/jobstats.py#L73-L74
Added lines #L73 - L74 were not covered by tests


[warning] 81-81: cratedb_toolkit/cfr/jobstats.py#L81
Added line #L81 was not covered by tests


[warning] 84-84: cratedb_toolkit/cfr/jobstats.py#L84
Added line #L84 was not covered by tests


[warning] 97-97: cratedb_toolkit/cfr/jobstats.py#L97
Added line #L97 was not covered by tests


[warning] 99-100: cratedb_toolkit/cfr/jobstats.py#L99-L100
Added lines #L99 - L100 were not covered by tests


[warning] 102-102: cratedb_toolkit/cfr/jobstats.py#L102
Added line #L102 was not covered by tests


[warning] 104-105: cratedb_toolkit/cfr/jobstats.py#L104-L105
Added lines #L104 - L105 were not covered by tests


[warning] 113-113: cratedb_toolkit/cfr/jobstats.py#L113
Added line #L113 was not covered by tests


[warning] 176-176: cratedb_toolkit/cfr/jobstats.py#L176
Added line #L176 was not covered by tests


[warning] 189-189: cratedb_toolkit/cfr/jobstats.py#L189
Added line #L189 was not covered by tests


[warning] 192-192: cratedb_toolkit/cfr/jobstats.py#L192
Added line #L192 was not covered by tests


[warning] 200-201: cratedb_toolkit/cfr/jobstats.py#L200-L201
Added lines #L200 - L201 were not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
  • GitHub Check: CFR: Python 3.12 on OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
  • GitHub Check: build-and-test
  • GitHub Check: CFR for OS ubuntu-latest
  • GitHub Check: CFR for OS macos-latest
  • GitHub Check: CFR for OS macos-13
🔇 Additional comments (6)
cratedb_toolkit/cfr/cli.py (2)

98-98: LGTM: Well-defined CLI option for report database.

The --reportdb option is clearly defined with both long and short forms, along with helpful documentation that includes an example connection string.


107-111: Good defensive programming with clear logging.

Properly initializing report_address to None and only creating it when the option is provided follows good programming practices. The informative log message clearly indicates when a separate reporting database is being used.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 107-107: cratedb_toolkit/cfr/cli.py#L107
Added line #L107 was not covered by tests


[warning] 109-111: cratedb_toolkit/cfr/cli.py#L109-L111
Added lines #L109 - L111 were not covered by tests

cratedb_toolkit/cfr/jobstats.py (4)

42-42: LGTM: Global variable for report database cursor.

The addition of a global report_cursor variable aligns with the existing pattern used for cursor.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 42-42: cratedb_toolkit/cfr/jobstats.py#L42
Added line #L42 was not covered by tests


47-47: Good function signature update.

The function signature properly uses type annotations and makes the report_address parameter optional, maintaining backward compatibility.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 47-47: cratedb_toolkit/cfr/jobstats.py#L47
Added line #L47 was not covered by tests


67-84: LGTM: Robust handling of optional reporting database.

The code correctly handles both scenarios (with and without a separate reporting database) and maintains the appropriate schema and table names.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 67-68: cratedb_toolkit/cfr/jobstats.py#L67-L68
Added lines #L67 - L68 were not covered by tests


[warning] 70-71: cratedb_toolkit/cfr/jobstats.py#L70-L71
Added lines #L70 - L71 were not covered by tests


[warning] 73-74: cratedb_toolkit/cfr/jobstats.py#L73-L74
Added lines #L73 - L74 were not covered by tests


[warning] 81-81: cratedb_toolkit/cfr/jobstats.py#L81
Added line #L81 was not covered by tests


[warning] 84-84: cratedb_toolkit/cfr/jobstats.py#L84
Added line #L84 was not covered by tests


97-97: LGTM: Consistent use of report_cursor for database operations.

All database operations for initializing tables and fetching initial data consistently use report_cursor instead of cursor.

Also applies to: 99-100, 102-102, 104-105, 113-113

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 97-97: cratedb_toolkit/cfr/jobstats.py#L97
Added line #L97 was not covered by tests

report_address = DatabaseAddress.from_string(reportdb)
logger.info(f"Using separate database for reporting: {reportdb}")

cratedb_toolkit.cfr.jobstats.boot(address=address, report_address=report_address)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Consider adding tests for this new functionality.

The code correctly passes the optional report_address to the boot function, but static analysis indicates this code path isn't covered by tests.


🏁 Script executed:

#!/bin/bash
# Check if there are any tests for the job statistics with reportdb functionality
rg -g "test_*.py" "reportdb|report_address" 

Length of output: 45


🏁 Script executed:

#!/bin/bash
# Search for any reference to report_address across the codebase
rg "report_address"

# Search for references to reportdb
rg "reportdb"

# Search for CLI flag naming in code
rg "report-address"

# Search for any 'report' occurrences in test files
rg -g "test_*.py" "report"

# Search for jobstats usage in tests
rg "jobstats"
rg -g "test_*.py" "jobstats"

# Search for boot() invocation in tests
rg "boot("
rg -g "test_*.py" "boot("

Length of output: 7021


Add tests for --reportdb (report_address) code paths

The CLI now accepts --reportdb and passes report_address to jobstats.boot, but there are no existing tests covering that branch. Please add tests to exercise both collect and view commands with the --reportdb flag to verify that statistics are written to—and read from—the separate reporting database.

• tests/cfr/test_jobstats.py

  • New test for ctk cfr jobstats collect --once --reportdb <URL>: assert records land in the expected report schema/table
  • New test for ctk cfr jobstats view --reportdb <URL>: assert the output is sourced from the separate report database
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 113-113: cratedb_toolkit/cfr/cli.py#L113
Added line #L113 was not covered by tests

Copy link
Copy Markdown
Member

@amotl amotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modulo addressing formatting nitpicks, potential adjustments as suggested by CodeRabbit, and possible test case improvements, thanks and ack. 💯

Comment thread tests/cfr/test_jobstats.py Outdated
@amotl amotl merged commit 0dab10d into main Apr 25, 2025
19 checks passed
@amotl amotl deleted the bw/reasonable-toad-ivory branch April 25, 2025 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants