Enhance job statistics commands with optional reporting database support by WalBeh · Pull Request #418 · crate/cratedb-toolkit

WalBeh · 2025-04-23T16:04:16Z

Summary of the changes

Enhance job statistics commands with optional reporting database support.

Adds a --reportdb/-r CLI option that allows writing statement statistics to a separate database.
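
A minimal sketch of how an optional reporting option can fall back to the primary database when omitted. It uses the standard library's argparse only to stay self-contained; the toolkit's real CLI framework, the `--cratedb-url` option, the `resolve_targets` helper, and the example URLs are assumptions for illustration, not taken from this PR.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser with an optional reporting database option."""
    parser = argparse.ArgumentParser(prog="jobstats-collect")
    # Hypothetical option for the primary cluster; not taken from the PR.
    parser.add_argument("--cratedb-url", default="crate://localhost:4200")
    # Mirrors the --reportdb/-r option described above; defaults to None.
    parser.add_argument(
        "--reportdb", "-r", default=None,
        help="Optional URL of a separate reporting database",
    )
    return parser


def resolve_targets(argv: Optional[Sequence[str]] = None) -> dict:
    """Return where statements are read from and where statistics are written."""
    args = build_parser().parse_args(argv)
    # Without --reportdb, statistics stay in the primary database.
    report_url = args.reportdb or args.cratedb_url
    return {"source": args.cratedb_url, "report": report_url}
```

Calling `resolve_targets(["-r", "crate://reports:4200"])` keeps the source at its default and points only the report target at the second URL.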

Checklist

Link to issue this PR refers to (if applicable): Fixes #???

coderabbitai · 2025-04-23T16:04:25Z

Walkthrough

The changes introduce support for specifying a separate reporting database for job statistics collection and viewing in the CrateDB toolkit's CFR module. The CLI commands job_statistics_collect and job_statistics_view now accept an optional --reportdb parameter, which is parsed and passed to the job statistics logic. The core job statistics bootstrapping function is updated to handle this optional reporting database, establishing a separate connection and cursor when provided. All relevant database operations for job statistics now use the reporting database if specified, while maintaining compatibility with the original behavior.
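
The fallback described in the walkthrough can be sketched roughly as follows. This is not the code from jobstats.py: sqlite3 stands in for the database driver so the example runs anywhere, and the table columns are invented for illustration.

```python
import sqlite3
from typing import Optional, Tuple


def boot(address: str, report_address: Optional[str] = None) -> Tuple[sqlite3.Cursor, sqlite3.Cursor]:
    """Return (cursor, report_cursor); both are the same object without a report address."""
    cursor = sqlite3.connect(address).cursor()
    if report_address is None:
        # Original behaviour: statistics tables live in the primary database.
        report_cursor = cursor
    else:
        # Separate connection and cursor for the reporting database.
        report_cursor = sqlite3.connect(report_address).cursor()
    # Statistics tables are always created through report_cursor.
    report_cursor.execute(
        "CREATE TABLE IF NOT EXISTS jobstats_statements (stmt TEXT, calls INTEGER)"
    )
    return cursor, report_cursor
```

Because every statistics operation goes through `report_cursor`, callers do not need to branch on whether a reporting database was configured.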

Changes

| File(s) | Change Summary |
|---|---|
| cratedb_toolkit/cfr/cli.py | Extended CLI commands `job_statistics_collect` and `job_statistics_view` to accept optional `--reportdb` parameter; updated function signatures, parsing, and added logging. |
| cratedb_toolkit/cfr/jobstats.py | Modified `boot` function to accept optional `report_address`; established separate reporting database connection and cursor; updated all job statistics operations to use reporting cursor if provided. |
| tests/cfr/test_jobstats.py | Renamed existing test to clarify it tests collection into the same database; added new test verifying job statistics collection using the `--reportdb` option into a separate report database. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant MainDB
    participant ReportDB

    User->>CLI: Invoke job_statistics_collect/view [--reportdb]
    CLI->>CLI: Parse --reportdb (if provided)
    CLI->>MainDB: Connect to main database
    alt --reportdb provided
        CLI->>ReportDB: Connect to reporting database
        CLI->>ReportDB: Use report_cursor for job stats operations
    else
        CLI->>MainDB: Use main cursor for job stats operations
    end
    CLI->>User: Output results or logs

Poem

In fields of data, bunnies hop and play,
Now with two burrows—one for stats each day!
A flag for reports, a cursor anew,
Collect or view, the choice is up to you.
With every hop, the toolkit grows bright,
Separate stats, handled just right!
🐇📊

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a73672 and 585325a.

📒 Files selected for processing (1)

tests/cfr/test_jobstats.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/cfr/test_jobstats.py

⏰ Context from checks skipped due to timeout of 90000ms (10)

GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
GitHub Check: CFR for OS windows-latest
GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
GitHub Check: build-and-test
GitHub Check: CFR for OS ubuntu-latest
GitHub Check: CFR for OS macos-latest
GitHub Check: CFR for OS macos-13

Copilot

Pull Request Overview

Enhances the job statistics commands by adding optional support for a separate reporting database.

Updates the boot function in jobstats.py to accept a new optional report_address and use a separate report_cursor.
Modifies both the "collect" and "view" CLI commands in cli.py to support a new --reportdb/-r flag for reporting database connection configuration.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| cratedb_toolkit/cfr/jobstats.py | Extends the boot function and uses a separate report_cursor for queries. |
| cratedb_toolkit/cfr/cli.py | Adds a --reportdb CLI option to both collection and view commands. |

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (1)

cratedb_toolkit/cfr/jobstats.py (1)
47-113: 💡 Verification agent
🧩 Analysis chain

Add comprehensive tests for the reporting database functionality.

Static analysis shows that most of the added code is not covered by tests. This is critical functionality that should be thoroughly tested.

🏁 Script executed:
#!/bin/bash
# Check the test coverage for job statistics functionality
rg -g "test_*.py" "job_statistics|jobstats" | grep -v "from"
Length of output: 1069
Add tests for the separate reporting‐database code path

Although we have coverage for the default collect/view flows, we’re still missing tests for the branch in boot() where a separate report_address is passed. Please add tests that:

- Exercise boot(address, report_address=…) so that:
  - stmt_log_table and last_exec_table are created in the report database, not the primary one
  - dbinit() uses report_cursor to read/write the tables
  - init_last_execution() correctly inserts the initial 0 when the report table is empty
- Invoke via the CLI (e.g. ctk cfr jobstats collect --reportdb <uri>) and assert that the report schema contains both jobstats_statements and jobstats_last with the expected records

Files to target:

cratedb_toolkit/cfr/jobstats.py – lines 47–113 (boot(), dbinit(), init_last_execution())
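
One of the requested checks, seeding the initial 0 when the report table is empty, could be tested along these lines. This is a sketch under assumptions: sqlite3 replaces the real database, the `last_execution` column name is hypothetical, and `init_last_execution` here is a simplified stand-in rather than the function in jobstats.py.

```python
import sqlite3


def init_last_execution(report_cursor: sqlite3.Cursor) -> int:
    """Create the bookkeeping table if needed and seed it with 0 when empty."""
    report_cursor.execute(
        "CREATE TABLE IF NOT EXISTS jobstats_last (last_execution INTEGER)"
    )
    row = report_cursor.execute("SELECT last_execution FROM jobstats_last").fetchone()
    if row is None:
        report_cursor.execute("INSERT INTO jobstats_last (last_execution) VALUES (0)")
        return 0
    return row[0]


def test_last_execution_lands_in_report_database() -> None:
    main = sqlite3.connect(":memory:").cursor()
    report = sqlite3.connect(":memory:").cursor()
    assert init_last_execution(report) == 0
    # A second call must read the existing row instead of inserting a duplicate.
    assert init_last_execution(report) == 0
    assert report.execute("SELECT count(*) FROM jobstats_last").fetchone() == (1,)
    # The primary database stays untouched.
    assert main.execute("SELECT count(*) FROM sqlite_master").fetchone() == (0,)
```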

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 47-47: cratedb_toolkit/cfr/jobstats.py#L47
Added line #L47 was not covered by tests

[warning] 67-68: cratedb_toolkit/cfr/jobstats.py#L67-L68
Added lines #L67 - L68 were not covered by tests

[warning] 70-71: cratedb_toolkit/cfr/jobstats.py#L70-L71
Added lines #L70 - L71 were not covered by tests

[warning] 73-74: cratedb_toolkit/cfr/jobstats.py#L73-L74
Added lines #L73 - L74 were not covered by tests

[warning] 81-81: cratedb_toolkit/cfr/jobstats.py#L81
Added line #L81 was not covered by tests

[warning] 84-84: cratedb_toolkit/cfr/jobstats.py#L84
Added line #L84 was not covered by tests

[warning] 97-97: cratedb_toolkit/cfr/jobstats.py#L97
Added line #L97 was not covered by tests

[warning] 99-100: cratedb_toolkit/cfr/jobstats.py#L99-L100
Added lines #L99 - L100 were not covered by tests

[warning] 102-102: cratedb_toolkit/cfr/jobstats.py#L102
Added line #L102 was not covered by tests

[warning] 104-105: cratedb_toolkit/cfr/jobstats.py#L104-L105
Added lines #L104 - L105 were not covered by tests

[warning] 113-113: cratedb_toolkit/cfr/jobstats.py#L113
Added line #L113 was not covered by tests

🧹 Nitpick comments (3)

cratedb_toolkit/cfr/cli.py (1)
132-134: Consider defensive validation for reportdb string.

While the code correctly creates a DatabaseAddress from the reportdb string, there's no explicit error handling if the string format is invalid.
 if reportdb:
-    report_address = DatabaseAddress.from_string(reportdb)
-    logger.info(f"Reading from report database: {reportdb}")
+    try:
+        report_address = DatabaseAddress.from_string(reportdb)
+        logger.info(f"Reading from report database: {reportdb}")
+    except ValueError as e:
+        logger.error(f"Invalid report database URL: {e}")
+        ctx.fail(f"Invalid report database URL: {e}")
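
As a rough illustration of validating the option value before connecting, the check below rejects strings without a scheme or host. It makes no claim about how DatabaseAddress.from_string behaves; `validate_database_url` and `InvalidDatabaseUrl` are hypothetical names, and the URLs are placeholders.

```python
from urllib.parse import urlsplit


class InvalidDatabaseUrl(ValueError):
    """Raised when a database URL lacks the parts needed to connect."""


def validate_database_url(url: str) -> str:
    """Return the URL unchanged if it has a scheme and a host, else raise."""
    parts = urlsplit(url)
    if not parts.scheme:
        raise InvalidDatabaseUrl(f"Missing scheme in database URL: {url!r}")
    if not parts.hostname:
        raise InvalidDatabaseUrl(f"Missing host in database URL: {url!r}")
    return url
```

Since the exception derives from ValueError, a caller that already catches ValueError, as in the suggestion above, would also catch it.
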
🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 132-134: cratedb_toolkit/cfr/cli.py#L132-L134
Added lines #L132 - L134 were not covered by tests
cratedb_toolkit/cfr/jobstats.py (2)
176-176: Consider adding resource management for database connections.

The code correctly uses report_cursor for all statements, but there's no explicit closing of database connections, which might lead to resource leaks in long-running scenarios.

Consider updating the implementation to properly close connections, possibly with a cleanup function:
def cleanup():
    """Close database connections."""
    global cursor, report_cursor
    if cursor:
        try:
            cursor.close()
        except Exception as e:
            logger.warning(f"Error closing main cursor: {e}")
    
    if report_cursor and report_cursor != cursor:
        try:
            report_cursor.close()
        except Exception as e:
            logger.warning(f"Error closing report cursor: {e}")
This function could be called before exiting or when no longer needed.
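
An alternative to a manual cleanup function is to scope the connections with context managers. The sketch below uses contextlib.ExitStack with sqlite3 as a stand-in driver; `collect_once` and the table layout are illustrative assumptions, not the toolkit's API.

```python
import sqlite3
from contextlib import ExitStack, closing
from typing import Optional


def collect_once(address: str, report_address: Optional[str] = None) -> int:
    """Open the needed connections, write one row, and close everything."""
    with ExitStack() as stack:
        conn = stack.enter_context(closing(sqlite3.connect(address)))
        if report_address is None:
            # Reuse the primary connection; it is registered only once.
            report_conn = conn
        else:
            report_conn = stack.enter_context(closing(sqlite3.connect(report_address)))
        report_conn.execute("CREATE TABLE IF NOT EXISTS jobstats_statements (stmt TEXT)")
        report_conn.execute("INSERT INTO jobstats_statements VALUES ('SELECT 1')")
        count = report_conn.execute(
            "SELECT count(*) FROM jobstats_statements"
        ).fetchone()[0]
    # Leaving the block closes each registered connection exactly once, even on errors.
    return count
```

Registering the shared connection a single time avoids the double-close case that the `report_cursor != cursor` check guards against.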

Also applies to: 189-189, 192-192, 200-201

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 176-176: cratedb_toolkit/cfr/jobstats.py#L176
Added line #L176 was not covered by tests

48-48: Consider addressing the TODO comment in a future update.

The TODO comment indicates that the code should be refactored to avoid global variables, which would improve testability and maintainability.

Consider refactoring this module to use a class-based approach rather than global variables. This would make the code more maintainable, testable, and thread-safe. For example:
from typing import Optional

# Import path assumed; adjust to wherever DatabaseAddress is defined.
from cratedb_toolkit.model import DatabaseAddress


class JobStatisticsCollector:
    def __init__(self, address: DatabaseAddress, report_address: Optional[DatabaseAddress] = None):
        self.address = address
        self.report_address = report_address
        self.stmt_log_table = None
        self.last_exec_table = None
        self.cursor = None
        self.report_cursor = None
        self.last_scrape = None
        self.interval = None
        self.sys_jobs_log = {}
        # Initialize other state

    def boot(self):
        # Initialize connections, tables, etc.
        pass

    def record_once(self):
        # Record a single snapshot
        pass

    # Additional methods...
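
To show why per-instance state helps testability, here is a small runnable variant with injected connections. It is a sketch only: `JobStatsStore`, its methods, and the sqlite3 backend are stand-ins chosen for illustration, not part of the toolkit.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobStatsStore:
    """Holds per-instance state instead of module-level globals."""

    connection: sqlite3.Connection
    report_connection: Optional[sqlite3.Connection] = None

    def __post_init__(self) -> None:
        # Fall back to the primary connection when no report database is given.
        if self.report_connection is None:
            self.report_connection = self.connection
        self.report_connection.execute(
            "CREATE TABLE IF NOT EXISTS jobstats_statements (stmt TEXT)"
        )

    def record(self, stmt: str) -> None:
        """Store one statement in the reporting database."""
        self.report_connection.execute(
            "INSERT INTO jobstats_statements VALUES (?)", (stmt,)
        )

    def count(self) -> int:
        """Return the number of recorded statements."""
        return self.report_connection.execute(
            "SELECT count(*) FROM jobstats_statements"
        ).fetchone()[0]
```

Two stores built on different connections do not share any state, so tests can run them side by side without resetting globals.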

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d539603 and 33aa97c.

📒 Files selected for processing (2)

cratedb_toolkit/cfr/cli.py (1 hunks)
cratedb_toolkit/cfr/jobstats.py (5 hunks)

🧰 Additional context used

🪛 GitHub Check: codecov/patch

cratedb_toolkit/cfr/cli.py

[warning] 107-107: cratedb_toolkit/cfr/cli.py#L107
Added line #L107 was not covered by tests

[warning] 109-111: cratedb_toolkit/cfr/cli.py#L109-L111
Added lines #L109 - L111 were not covered by tests

[warning] 113-113: cratedb_toolkit/cfr/cli.py#L113
Added line #L113 was not covered by tests

[warning] 130-130: cratedb_toolkit/cfr/cli.py#L130
Added line #L130 was not covered by tests

[warning] 132-134: cratedb_toolkit/cfr/cli.py#L132-L134
Added lines #L132 - L134 were not covered by tests

[warning] 136-136: cratedb_toolkit/cfr/cli.py#L136
Added line #L136 was not covered by tests

cratedb_toolkit/cfr/jobstats.py

[warning] 42-42: cratedb_toolkit/cfr/jobstats.py#L42
Added line #L42 was not covered by tests

[warning] 47-47: cratedb_toolkit/cfr/jobstats.py#L47
Added line #L47 was not covered by tests

[warning] 67-68: cratedb_toolkit/cfr/jobstats.py#L67-L68
Added lines #L67 - L68 were not covered by tests

[warning] 70-71: cratedb_toolkit/cfr/jobstats.py#L70-L71
Added lines #L70 - L71 were not covered by tests

[warning] 73-74: cratedb_toolkit/cfr/jobstats.py#L73-L74
Added lines #L73 - L74 were not covered by tests

[warning] 81-81: cratedb_toolkit/cfr/jobstats.py#L81
Added line #L81 was not covered by tests

[warning] 84-84: cratedb_toolkit/cfr/jobstats.py#L84
Added line #L84 was not covered by tests

[warning] 97-97: cratedb_toolkit/cfr/jobstats.py#L97
Added line #L97 was not covered by tests

[warning] 99-100: cratedb_toolkit/cfr/jobstats.py#L99-L100
Added lines #L99 - L100 were not covered by tests

[warning] 102-102: cratedb_toolkit/cfr/jobstats.py#L102
Added line #L102 was not covered by tests

[warning] 104-105: cratedb_toolkit/cfr/jobstats.py#L104-L105
Added lines #L104 - L105 were not covered by tests

[warning] 113-113: cratedb_toolkit/cfr/jobstats.py#L113
Added line #L113 was not covered by tests

[warning] 176-176: cratedb_toolkit/cfr/jobstats.py#L176
Added line #L176 was not covered by tests

[warning] 189-189: cratedb_toolkit/cfr/jobstats.py#L189
Added line #L189 was not covered by tests

[warning] 192-192: cratedb_toolkit/cfr/jobstats.py#L192
Added line #L192 was not covered by tests

[warning] 200-201: cratedb_toolkit/cfr/jobstats.py#L200-L201
Added lines #L200 - L201 were not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (9)

GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
GitHub Check: CFR: Python 3.12 on OS ubuntu-latest
GitHub Check: CFR for OS windows-latest
GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
GitHub Check: build-and-test
GitHub Check: CFR for OS ubuntu-latest
GitHub Check: CFR for OS macos-latest
GitHub Check: CFR for OS macos-13

🔇 Additional comments (6)

cratedb_toolkit/cfr/cli.py (2)

98-98: LGTM: Well-defined CLI option for report database.

The --reportdb option is clearly defined with both long and short forms, along with helpful documentation that includes an example connection string.

107-111: Good defensive programming with clear logging.

Properly initializing report_address to None and only creating it when the option is provided follows good programming practices. The informative log message clearly indicates when a separate reporting database is being used.

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 107-107: cratedb_toolkit/cfr/cli.py#L107
Added line #L107 was not covered by tests

[warning] 109-111: cratedb_toolkit/cfr/cli.py#L109-L111
Added lines #L109 - L111 were not covered by tests

cratedb_toolkit/cfr/jobstats.py (4)

42-42: LGTM: Global variable for report database cursor.

The addition of a global report_cursor variable aligns with the existing pattern used for cursor.

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 42-42: cratedb_toolkit/cfr/jobstats.py#L42
Added line #L42 was not covered by tests

47-47: Good function signature update.

The function signature properly uses type annotations and makes the report_address parameter optional, maintaining backward compatibility.

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 47-47: cratedb_toolkit/cfr/jobstats.py#L47
Added line #L47 was not covered by tests

67-84: LGTM: Robust handling of optional reporting database.

The code correctly handles both scenarios (with and without a separate reporting database) and maintains the appropriate schema and table names.

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 67-68: cratedb_toolkit/cfr/jobstats.py#L67-L68
Added lines #L67 - L68 were not covered by tests

[warning] 70-71: cratedb_toolkit/cfr/jobstats.py#L70-L71
Added lines #L70 - L71 were not covered by tests

[warning] 73-74: cratedb_toolkit/cfr/jobstats.py#L73-L74
Added lines #L73 - L74 were not covered by tests

[warning] 81-81: cratedb_toolkit/cfr/jobstats.py#L81
Added line #L81 was not covered by tests

[warning] 84-84: cratedb_toolkit/cfr/jobstats.py#L84
Added line #L84 was not covered by tests

97-97: LGTM: Consistent use of report_cursor for database operations.

All database operations for initializing tables and fetching initial data consistently use report_cursor instead of cursor.

Also applies to: 99-100, 102-102, 104-105, 113-113

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 97-97: cratedb_toolkit/cfr/jobstats.py#L97
Added line #L97 was not covered by tests

coderabbitai · 2025-04-23T16:07:55Z

+        report_address = DatabaseAddress.from_string(reportdb)
+        logger.info(f"Using separate database for reporting: {reportdb}")
+
+    cratedb_toolkit.cfr.jobstats.boot(address=address, report_address=report_address)


💡 Verification agent

🧩 Analysis chain

Consider adding tests for this new functionality.

The code correctly passes the optional report_address to the boot function, but static analysis indicates this code path isn't covered by tests.

🏁 Script executed:

#!/bin/bash
# Check if there are any tests for the job statistics with reportdb functionality
rg -g "test_*.py" "reportdb|report_address"

Length of output: 45

🏁 Script executed:

#!/bin/bash
# Search for any reference to report_address across the codebase
rg "report_address"
# Search for references to reportdb
rg "reportdb"
# Search for CLI flag naming in code
rg "report-address"
# Search for any 'report' occurrences in test files
rg -g "test_*.py" "report"
# Search for jobstats usage in tests
rg "jobstats"
rg -g "test_*.py" "jobstats"
# Search for boot() invocation in tests
rg "boot("
rg -g "test_*.py" "boot("

Length of output: 7021

Add tests for --reportdb (report_address) code paths

The CLI now accepts --reportdb and passes report_address to jobstats.boot, but there are no existing tests covering that branch. Please add tests to exercise both collect and view commands with the --reportdb flag to verify that statistics are written to—and read from—the separate reporting database.

• tests/cfr/test_jobstats.py

New test for ctk cfr jobstats collect --once --reportdb <URL>: assert records land in the expected report schema/table

New test for ctk cfr jobstats view --reportdb <URL>: assert the output is sourced from the separate report database

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 113-113: cratedb_toolkit/cfr/cli.py#L113
Added line #L113 was not covered by tests

amotl

Modulo addressing formatting nitpicks, potential adjustments as suggested by CodeRabbit, and possible test case improvements, thanks and ack. 💯

Enhance job statistics commands with optional reporting database support

33aa97c

WalBeh requested review from amotl and Copilot April 23, 2025 16:04

Copilot AI reviewed Apr 23, 2025

coderabbitai bot reviewed Apr 23, 2025

amotl approved these changes Apr 23, 2025

WalBeh added 2 commits April 25, 2025 15:18

Linting changes for cli.py

fbfcb4e

Adds a test for cfr collect to report db

8a73672

amotl reviewed Apr 25, 2025

Comment thread tests/cfr/test_jobstats.py Outdated

Update tests/cfr/test_jobstats.py

585325a

amotl merged commit 0dab10d into main Apr 25, 2025
19 checks passed

amotl deleted the bw/reasonable-toad-ivory branch April 25, 2025 16:11

amotl mentioned this pull request Apr 27, 2025

CFR: Improvements for ctk cfr jobstats #413

Closed

amotl merged 4 commits into main from bw/reasonable-toad-ivory

3 participants
