Enhance job statistics commands with optional reporting database support#418
Enhance job statistics commands with optional reporting database support#418
Conversation
WalkthroughThe changes introduce support for specifying a separate reporting database for job statistics collection and viewing in the CrateDB toolkit's CFR module. The CLI commands Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant MainDB
participant ReportDB
User->>CLI: Invoke job_statistics_collect/view [--reportdb]
CLI->>CLI: Parse --reportdb (if provided)
CLI->>MainDB: Connect to main database
alt --reportdb provided
CLI->>ReportDB: Connect to reporting database
CLI->>ReportDB: Use report_cursor for job stats operations
else
CLI->>MainDB: Use main cursor for job stats operations
end
CLI->>User: Output results or logs
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (10)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Pull Request Overview
Enhances the job statistics commands by adding optional support for a separate reporting database.
- Updates the boot function in jobstats.py to accept a new optional report_address and use a separate report_cursor.
- Modifies both the "collect" and "view" CLI commands in cli.py to support a new --reportdb/-r flag for reporting database connection configuration.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| cratedb_toolkit/cfr/jobstats.py | Extends the boot function and uses a separate report_cursor for queries. |
| cratedb_toolkit/cfr/cli.py | Adds a --reportdb CLI option to both collection and view commands. |
There was a problem hiding this comment.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
cratedb_toolkit/cfr/jobstats.py (1)
47-113: 💡 Verification agent🧩 Analysis chain
Add comprehensive tests for the reporting database functionality.
Static analysis shows that most of the added code is not covered by tests. This is critical functionality that should be thoroughly tested.
🏁 Script executed:
#!/bin/bash # Check the test coverage for job statistics functionality rg -g "test_*.py" "job_statistics|jobstats" | grep -v "from"Length of output: 1069
Add tests for the separate reporting‐database code path
Although we have coverage for the default
collect/viewflows, we’re still missing tests for the branch inboot()where a separatereport_addressis passed. Please add tests that:
- Exercise
boot(address, report_address=…)so that
stmt_log_tableandlast_exec_tableare created in the report database, not the primary onedbinit()usesreport_cursorto read/write the tablesinit_last_execution()correctly inserts the initial0when the report table is empty- Invoke via the CLI (e.g.
ctk cfr jobstats collect --report-db <uri>) and assert that the report schema contains bothjobstats_statementsandjobstats_lastwith the expected recordsFiles to target:
- cratedb_toolkit/cfr/jobstats.py – lines 47–113 (
boot(),dbinit(),init_last_execution())🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 47-47: cratedb_toolkit/cfr/jobstats.py#L47
Added line #L47 was not covered by tests
[warning] 67-68: cratedb_toolkit/cfr/jobstats.py#L67-L68
Added lines #L67 - L68 were not covered by tests
[warning] 70-71: cratedb_toolkit/cfr/jobstats.py#L70-L71
Added lines #L70 - L71 were not covered by tests
[warning] 73-74: cratedb_toolkit/cfr/jobstats.py#L73-L74
Added lines #L73 - L74 were not covered by tests
[warning] 81-81: cratedb_toolkit/cfr/jobstats.py#L81
Added line #L81 was not covered by tests
[warning] 84-84: cratedb_toolkit/cfr/jobstats.py#L84
Added line #L84 was not covered by tests
[warning] 97-97: cratedb_toolkit/cfr/jobstats.py#L97
Added line #L97 was not covered by tests
[warning] 99-100: cratedb_toolkit/cfr/jobstats.py#L99-L100
Added lines #L99 - L100 were not covered by tests
[warning] 102-102: cratedb_toolkit/cfr/jobstats.py#L102
Added line #L102 was not covered by tests
[warning] 104-105: cratedb_toolkit/cfr/jobstats.py#L104-L105
Added lines #L104 - L105 were not covered by tests
[warning] 113-113: cratedb_toolkit/cfr/jobstats.py#L113
Added line #L113 was not covered by tests
🧹 Nitpick comments (3)
cratedb_toolkit/cfr/cli.py (1)
132-134: Consider defensive validation for reportdb string.While the code correctly creates a DatabaseAddress from the reportdb string, there's no explicit error handling if the string format is invalid.
if reportdb: - report_address = DatabaseAddress.from_string(reportdb) - logger.info(f"Reading from report database: {reportdb}") + try: + report_address = DatabaseAddress.from_string(reportdb) + logger.info(f"Reading from report database: {reportdb}") + except ValueError as e: + logger.error(f"Invalid report database URL: {e}") + ctx.fail(f"Invalid report database URL: {e}")🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 132-134: cratedb_toolkit/cfr/cli.py#L132-L134
Added lines #L132 - L134 were not covered by testscratedb_toolkit/cfr/jobstats.py (2)
176-176: Consider adding resource management for database connections.The code correctly uses
report_cursorfor all statements, but there's no explicit closing of database connections, which might lead to resource leaks in long-running scenarios.Consider updating the implementation to properly close connections, possibly with a cleanup function:
def cleanup(): """Close database connections.""" global cursor, report_cursor if cursor: try: cursor.close() except Exception as e: logger.warning(f"Error closing main cursor: {e}") if report_cursor and report_cursor != cursor: try: report_cursor.close() except Exception as e: logger.warning(f"Error closing report cursor: {e}")This function could be called before exiting or when no longer needed.
Also applies to: 189-189, 192-192, 200-201
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 176-176: cratedb_toolkit/cfr/jobstats.py#L176
Added line #L176 was not covered by tests
48-48: Consider addressing the TODO comment in a future update.The TODO comment indicates that the code should be refactored to avoid global variables, which would improve testability and maintainability.
Consider refactoring this module to use a class-based approach rather than global variables. This would make the code more maintainable, testable, and thread-safe. For example:
class JobStatisticsCollector: def __init__(self, address: DatabaseAddress, report_address: Optional[DatabaseAddress] = None): self.address = address self.report_address = report_address self.stmt_log_table = None self.last_exec_table = None self.cursor = None self.report_cursor = None self.last_scrape = None self.interval = None self.sys_jobs_log = {} # Initialize other state def boot(self): # Initialize connections, tables, etc. pass def record_once(self): # Record a single snapshot pass # Additional methods...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
cratedb_toolkit/cfr/cli.py(1 hunks)cratedb_toolkit/cfr/jobstats.py(5 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
cratedb_toolkit/cfr/cli.py
[warning] 107-107: cratedb_toolkit/cfr/cli.py#L107
Added line #L107 was not covered by tests
[warning] 109-111: cratedb_toolkit/cfr/cli.py#L109-L111
Added lines #L109 - L111 were not covered by tests
[warning] 113-113: cratedb_toolkit/cfr/cli.py#L113
Added line #L113 was not covered by tests
[warning] 130-130: cratedb_toolkit/cfr/cli.py#L130
Added line #L130 was not covered by tests
[warning] 132-134: cratedb_toolkit/cfr/cli.py#L132-L134
Added lines #L132 - L134 were not covered by tests
[warning] 136-136: cratedb_toolkit/cfr/cli.py#L136
Added line #L136 was not covered by tests
cratedb_toolkit/cfr/jobstats.py
[warning] 42-42: cratedb_toolkit/cfr/jobstats.py#L42
Added line #L42 was not covered by tests
[warning] 47-47: cratedb_toolkit/cfr/jobstats.py#L47
Added line #L47 was not covered by tests
[warning] 67-68: cratedb_toolkit/cfr/jobstats.py#L67-L68
Added lines #L67 - L68 were not covered by tests
[warning] 70-71: cratedb_toolkit/cfr/jobstats.py#L70-L71
Added lines #L70 - L71 were not covered by tests
[warning] 73-74: cratedb_toolkit/cfr/jobstats.py#L73-L74
Added lines #L73 - L74 were not covered by tests
[warning] 81-81: cratedb_toolkit/cfr/jobstats.py#L81
Added line #L81 was not covered by tests
[warning] 84-84: cratedb_toolkit/cfr/jobstats.py#L84
Added line #L84 was not covered by tests
[warning] 97-97: cratedb_toolkit/cfr/jobstats.py#L97
Added line #L97 was not covered by tests
[warning] 99-100: cratedb_toolkit/cfr/jobstats.py#L99-L100
Added lines #L99 - L100 were not covered by tests
[warning] 102-102: cratedb_toolkit/cfr/jobstats.py#L102
Added line #L102 was not covered by tests
[warning] 104-105: cratedb_toolkit/cfr/jobstats.py#L104-L105
Added lines #L104 - L105 were not covered by tests
[warning] 113-113: cratedb_toolkit/cfr/jobstats.py#L113
Added line #L113 was not covered by tests
[warning] 176-176: cratedb_toolkit/cfr/jobstats.py#L176
Added line #L176 was not covered by tests
[warning] 189-189: cratedb_toolkit/cfr/jobstats.py#L189
Added line #L189 was not covered by tests
[warning] 192-192: cratedb_toolkit/cfr/jobstats.py#L192
Added line #L192 was not covered by tests
[warning] 200-201: cratedb_toolkit/cfr/jobstats.py#L200-L201
Added lines #L200 - L201 were not covered by tests
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
- GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
- GitHub Check: CFR: Python 3.12 on OS ubuntu-latest
- GitHub Check: CFR for OS windows-latest
- GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
- GitHub Check: build-and-test
- GitHub Check: CFR for OS ubuntu-latest
- GitHub Check: CFR for OS macos-latest
- GitHub Check: CFR for OS macos-13
🔇 Additional comments (6)
cratedb_toolkit/cfr/cli.py (2)
98-98: LGTM: Well-defined CLI option for report database.The
--reportdboption is clearly defined with both long and short forms, along with helpful documentation that includes an example connection string.
107-111: Good defensive programming with clear logging.Properly initializing
report_addressto None and only creating it when the option is provided follows good programming practices. The informative log message clearly indicates when a separate reporting database is being used.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 107-107: cratedb_toolkit/cfr/cli.py#L107
Added line #L107 was not covered by tests
[warning] 109-111: cratedb_toolkit/cfr/cli.py#L109-L111
Added lines #L109 - L111 were not covered by testscratedb_toolkit/cfr/jobstats.py (4)
42-42: LGTM: Global variable for report database cursor.The addition of a global
report_cursorvariable aligns with the existing pattern used forcursor.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 42-42: cratedb_toolkit/cfr/jobstats.py#L42
Added line #L42 was not covered by tests
47-47: Good function signature update.The function signature properly uses type annotations and makes the
report_addressparameter optional, maintaining backward compatibility.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 47-47: cratedb_toolkit/cfr/jobstats.py#L47
Added line #L47 was not covered by tests
67-84: LGTM: Robust handling of optional reporting database.The code correctly handles both scenarios (with and without a separate reporting database) and maintains the appropriate schema and table names.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 67-68: cratedb_toolkit/cfr/jobstats.py#L67-L68
Added lines #L67 - L68 were not covered by tests
[warning] 70-71: cratedb_toolkit/cfr/jobstats.py#L70-L71
Added lines #L70 - L71 were not covered by tests
[warning] 73-74: cratedb_toolkit/cfr/jobstats.py#L73-L74
Added lines #L73 - L74 were not covered by tests
[warning] 81-81: cratedb_toolkit/cfr/jobstats.py#L81
Added line #L81 was not covered by tests
[warning] 84-84: cratedb_toolkit/cfr/jobstats.py#L84
Added line #L84 was not covered by tests
97-97: LGTM: Consistent use of report_cursor for database operations.All database operations for initializing tables and fetching initial data consistently use
report_cursorinstead ofcursor.Also applies to: 99-100, 102-102, 104-105, 113-113
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 97-97: cratedb_toolkit/cfr/jobstats.py#L97
Added line #L97 was not covered by tests
| report_address = DatabaseAddress.from_string(reportdb) | ||
| logger.info(f"Using separate database for reporting: {reportdb}") | ||
|
|
||
| cratedb_toolkit.cfr.jobstats.boot(address=address, report_address=report_address) |
There was a problem hiding this comment.
💡 Verification agent
🧩 Analysis chain
Consider adding tests for this new functionality.
The code correctly passes the optional report_address to the boot function, but static analysis indicates this code path isn't covered by tests.
🏁 Script executed:
#!/bin/bash
# Check if there are any tests for the job statistics with reportdb functionality
rg -g "test_*.py" "reportdb|report_address" Length of output: 45
🏁 Script executed:
#!/bin/bash
# Search for any reference to report_address across the codebase
rg "report_address"
# Search for references to reportdb
rg "reportdb"
# Search for CLI flag naming in code
rg "report-address"
# Search for any 'report' occurrences in test files
rg -g "test_*.py" "report"
# Search for jobstats usage in tests
rg "jobstats"
rg -g "test_*.py" "jobstats"
# Search for boot() invocation in tests
rg "boot("
rg -g "test_*.py" "boot("Length of output: 7021
Add tests for --reportdb (report_address) code paths
The CLI now accepts --reportdb and passes report_address to jobstats.boot, but there are no existing tests covering that branch. Please add tests to exercise both collect and view commands with the --reportdb flag to verify that statistics are written to—and read from—the separate reporting database.
• tests/cfr/test_jobstats.py
- New test for
ctk cfr jobstats collect --once --reportdb <URL>: assert records land in the expected report schema/table - New test for
ctk cfr jobstats view --reportdb <URL>: assert the output is sourced from the separate report database
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 113-113: cratedb_toolkit/cfr/cli.py#L113
Added line #L113 was not covered by tests
amotl
left a comment
There was a problem hiding this comment.
Modulo addressing formatting nitpicks, potential adjustments as suggested by CodeRabbit, and possible test case improvements, thanks and ack. 💯
Summary of the changes
Enhance job statistics commands with optional reporting database support.
Adds a
--reportdb/-rCLI option, that allows writing statement statistics to a separate database.Checklist