Conversation
```python
# FIXME: Fix `croud clusters deploy`.
# It yields *two* payloads to stdout, making it
# unusable in JSON-capturing situations.
# The main advantage of the `JSONDecoder` class is that it also provides
# a `.raw_decode` method, which will ignore extra data after the end of the JSON.
# https://stackoverflow.com/a/75168292
payload = wr.invoke()
decoder = json.JSONDecoder()
data = decoder.raw_decode(payload)
```
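As a minimal sketch of why `raw_decode` helps here: it parses the first JSON document in a string and returns the offset where it ended, so any duplicated trailing payload is simply ignored. The doubled string below is an illustrative stand-in for croud's output, not its actual payload:

```python
import json

# Illustrative stand-in for a doubled stdout payload.
doubled = '{"status": "ok"}{"status": "ok"}'

decoder = json.JSONDecoder()
# `raw_decode` returns the first parsed object and the index where it ended.
data, end = decoder.raw_decode(doubled)
print(data, end)  # → {'status': 'ok'} 16
```

In contrast, a plain `json.loads(doubled)` would raise `json.JSONDecodeError: Extra data`.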
Observation
This works around a minor flaw of croud, which emits the JSON output twice, rendering the output unparseable.
Suggestion
croud should be changed to output only a single JSON payload to stdout when invoking `croud clusters deploy`.
We submitted an upstream fix about this obstacle.
```python
def fix_job_info_table_name(self):
    """
    Adjust the fully qualified table name by adding appropriate quotes.
    Fixes a minor flaw in the upstream API.

    Currently, the API returns `testdrive.pems-1`, but that value can not
    be used 1:1, because it is not properly quoted.

    So, converge the table name into `"testdrive"."pems-1"` manually, for a
    fully qualified representation.

    FIXME: Remove after upstream has fixed the flaw.
    """
    job_info = self.info
    if "destination" in job_info and "table" in job_info["destination"]:
        table = job_info["destination"]["table"]
        if '"' not in table and "." in table:
            # Split on the first dot only, to tolerate dots in table names.
            schema, table = table.split(".", 1)
            table = f'"{schema}"."{table}"'
            job_info["destination"]["table"] = table
```
Observation
This works around another minor upstream flaw.
Suggestion
The API should preferably return schema and table names within separate attributes, or at least quote the value of the existing `destination.table` attribute, so that it can be re-used without further ado.
Example
```json
{
  "destination": {
    "table": "\"testdrive\".\"pems-1\""
  }
}
```
We created an upstream ticket about this observation.
```python
def configure_croud(no_spinner: bool = None, use_spinner: bool = None):
    """
    Turn off croud's Halo spinner when running in Jupyter Notebooks. It does not work well.

    - https://github.com/ManrajGrover/halo/issues/32
    - https://github.com/manrajgrover/halo/issues/179
    """
    if no_spinner or ((CONFIG.RUNNING_ON_JUPYTER or CONFIG.RUNNING_ON_PYTEST) and not use_spinner):
        mod = types.ModuleType(
            "croud.tools.spinner", "Mocking the croud.tools.spinner module, to turn off the Halo spinner"
        )
        setattr(mod, "HALO", NoopContextManager())  # noqa: B010
        sys.modules["croud.tools.spinner"] = mod


class NoopContextManager:
    """
    For making the Halo progress bar a no-op.
    """

    def __init__(self, *args, **kwargs):
        pass

    def __enter__(self):
        pass

    def __exit__(self, exc_type, exc_value, exc_traceback):
        pass

    def stop(self):
        pass
```
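The stubbing technique above can be sketched standalone: pre-seeding `sys.modules` makes later imports resolve to the stub, so `HALO` becomes inert even when the real package is absent. This is a minimal sketch; `NoopSpinner` is an illustrative name, not part of the toolkit:

```python
import sys
import types

class NoopSpinner:
    """No-op replacement for the Halo spinner."""
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, exc_traceback):
        return None
    def stop(self):
        pass

# Register the stub module *before* anything imports it.
mod = types.ModuleType("croud.tools.spinner", "Stub, turning off the Halo spinner")
mod.HALO = NoopSpinner()
sys.modules["croud.tools.spinner"] = mod

# Later imports now resolve to the stub; `sys.modules` short-circuits the
# import machinery, so this works even without `croud` installed.
from croud.tools.spinner import HALO
with HALO:
    HALO.stop()
```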
Observation
We had to turn off the Halo spinner currently used in croud, because it did not work well within a Jupyter Notebook environment. We are observing exactly those issues, despite the first one being officially resolved. Apparently, it regressed.
- Jupyter Notebooks don't support Halo (manrajgrover/halo#32)
- BUG: `Halo.__init__.<locals>.clean_up()` takes 0 positional arguments but 1 was given (manrajgrover/halo#179)
Suggestion
Submit a patch to croud to only use interactivity when `is_tty()` is `True`, or similar. At the very least, don't start the `HALO` spinner at module scope, but initialize/configure it at runtime instead.
We've reported this to the issue tracker of the upstream project.
```shell
# Log in to CrateDB Cloud.
croud login --idp azuread
```
Observation
There is an alternative, headless way of doing that: Using `croud config show`, you can display the croud configuration, and the location of its configuration file, in YAML format.

```yaml
current-profile: cratedb.cloud
default-format: table
profiles:
  cratedb.cloud:
    auth-token: xxxxxxxxxx
    endpoint: https://console.cratedb.cloud
    key: REDACTED
    organization-id: null
    region: _any_
    secret: xxxxxxxxxx
```

If you fill in the `key` and `secret` values of an API key, obtained by running `croud api-keys create`[^1], operations like `croud clusters list` will start working without further ado, even after logging out again using `croud logout`.
It works well. Thanks, @proddata.
Suggestion
Based on those insights, improve the SDK correspondingly, by also accepting the environment variables `CRATEDB_CLOUD_API_KEY` and `CRATEDB_CLOUD_API_SECRET`.
Footnotes

[^1]: Alternatively, you can obtain an API key on your account page, at the "API keys" section. -- https://console.cratedb.cloud/account/settings
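The suggested environment-variable fallback could look roughly like this. This is a sketch; `resolve_cloud_credentials` is a hypothetical helper name, not the actual SDK API:

```python
import os
from typing import Optional, Tuple

def resolve_cloud_credentials(
    key: Optional[str] = None, secret: Optional[str] = None
) -> Tuple[str, str]:
    """Prefer explicit arguments, fall back to environment variables."""
    key = key or os.environ.get("CRATEDB_CLOUD_API_KEY")
    secret = secret or os.environ.get("CRATEDB_CLOUD_API_SECRET")
    if not key or not secret:
        raise ValueError("Missing CrateDB Cloud API key and/or secret")
    return key, secret

# Usage: export CRATEDB_CLOUD_API_KEY=... CRATEDB_CLOUD_API_SECRET=...
os.environ["CRATEDB_CLOUD_API_KEY"] = "demo-key"
os.environ["CRATEDB_CLOUD_API_SECRET"] = "demo-secret"
print(resolve_cloud_credentials())  # → ('demo-key', 'demo-secret')
```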
We've implemented headless/unattended operations in croud and ctk. Thanks again.
cratedb-toolkit/cratedb_toolkit/util/croud.py, lines 149 to 157 at 3b601c1
I think the interface could be further improved by using the new JWT authentication mechanism instead of traditional credentials?
Added the topic about using JWT to the backlog, to be processed on a later iteration.
Fluent authentication using the provided per-cluster JWT token has been implemented in 72d3fd9.
Renamed environment variables:
- `CRATEDB_CLOUD_CLUSTER_ID` to `CRATEDB_CLUSTER_ID`
- `CRATEDB_CLOUD_CLUSTER_NAME` to `CRATEDB_CLUSTER_NAME`
The current set of input parameters for addressing a database cluster (`cluster_id`, `cluster_name`, `sqlalchemy_url`, and `http_url`) needed some bundling, so they are now backed by a real model class. Likewise, the `ManagedCluster` and `StandaloneCluster` classes now provide the `adapter` property, which returns a suitable `DatabaseAdapter` instance. This connects the dots for a better composite object model. In this spirit, the `DatabaseCluster` class emerges as a new entry point to acquire a database cluster handle universally.
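The bundling described above can be sketched as a small model class. This is a simplification for illustration; the field names mirror the input parameters, not the actual toolkit classes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClusterAddress:
    """Bundle the four ways of addressing a database cluster."""
    cluster_id: Optional[str] = None
    cluster_name: Optional[str] = None
    sqlalchemy_url: Optional[str] = None
    http_url: Optional[str] = None

    @property
    def is_managed(self) -> bool:
        # Managed clusters are addressed by id or name, standalone ones by URL.
        return self.cluster_id is not None or self.cluster_name is not None

print(ClusterAddress(cluster_id="e1e38d92").is_managed)               # → True
print(ClusterAddress(sqlalchemy_url="crate://localhost").is_managed)  # → False
```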
... for addressing a database cluster by URL. The new option accepts URLs in both SQLAlchemy and HTTP formats.
Also, implement token refresh after expiry.
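The refresh-after-expiry behavior can be sketched generically. This is a minimal sketch with assumed names (`TokenCache`, `fetch`), not the actual implementation:

```python
import itertools

class TokenCache:
    """Cache a token and transparently re-fetch it once its TTL has elapsed."""
    def __init__(self, fetch, ttl: float, clock):
        self._fetch, self._ttl, self._clock = fetch, ttl, clock
        self._token, self._expires_at = None, float("-inf")

    def get(self):
        if self._clock() >= self._expires_at:
            self._token = self._fetch()
            self._expires_at = self._clock() + self._ttl
        return self._token

# Simulated clock and a counting fetcher demonstrate the refresh behavior.
now = [0.0]
counter = itertools.count(1)
cache = TokenCache(fetch=lambda: f"token-{next(counter)}", ttl=60.0, clock=lambda: now[0])
print(cache.get())  # → token-1
now[0] = 30.0
print(cache.get())  # → token-1 (still valid)
now[0] = 90.0
print(cache.get())  # → token-2 (expired, refreshed)
```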
Actionable comments posted: 7
🔭 Outside diff range comments (1)
cratedb_toolkit/cluster/croud.py (1)
**436-467:** ⚠️ Potential issue: Security risk in JWT token retrieval

The `get_jwt_token` method uses a hardcoded fallback password for the keyring, which presents a security risk. Remove the hardcoded fallback and either require the environment variable or generate a unique key:

```diff
-kr.keyring_key = os.getenv("CTK_KEYRING_CRYPTFILE_PASSWORD") or "TruaframEkEk"
+kr.keyring_key = os.getenv("CTK_KEYRING_CRYPTFILE_PASSWORD")
+if not kr.keyring_key:
+    # Generate a unique key based on machine-specific information
+    import uuid
+    import getpass
+    user_id = getpass.getuser()
+    machine_id = str(uuid.getnode())  # MAC address as integer
+    kr.keyring_key = str(uuid.uuid5(uuid.NAMESPACE_DNS, f"ctk-{user_id}-{machine_id}"))
+    logger.debug("Generated keyring key from machine-specific information")
```

This approach generates a machine-specific key that is still deterministic (same on each run on the same machine) but not hardcoded in the source code.
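The determinism claim can be verified with the standard library alone: `uuid.uuid5` is a pure function of its inputs, so the same user/machine pair always yields the same key (the helper name below is illustrative):

```python
import uuid

def derive_key(user_id: str, machine_id: str) -> str:
    # Name-based UUID (SHA-1): deterministic for identical inputs.
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, f"ctk-{user_id}-{machine_id}"))

a = derive_key("alice", "42")
b = derive_key("alice", "42")
c = derive_key("bob", "42")
print(a == b, a == c)  # → True False
```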
♻️ Duplicate comments (12)
cratedb_toolkit/cfr/marimo.py (1)

**53-54:** Environment variable and variable name standardized correctly

The change from `CRATEDB_SQLALCHEMY_URL` to `CRATEDB_CLUSTER_URL` aligns with the PR objective to unify cluster addressing across the codebase. The local variable `sqlalchemy_url` is appropriately named, as it represents a SQLAlchemy connection URL.

🧰 Tools / 🪛 GitHub Check: codecov/patch: [warning] cratedb_toolkit/cfr/marimo.py#L53-L54: Added lines #L53-L54 were not covered by tests.

cratedb_toolkit/__init__.py (1)

**14-16:** Automatic side effects during import should be avoided

The call to `preconfigure()` during module import can lead to side effects that may break tests or cause unexpected behavior in interactive sessions. This was previously flagged in an earlier review.

Consider making initialization explicit or providing a way to disable this behavior:

- Expose an explicit `initialize()` function that users must call
- Add an environment variable to control whether auto-configuration happens
- Move the `preconfigure()` call to a separate function that users can call when needed

```diff
-from .config import preconfigure
+# Import but don't call directly
+from .config import preconfigure

-preconfigure()
+# Allow explicit initialization instead
```

🧰 Tools / 🪛 GitHub Check: codecov/patch: [warning] cratedb_toolkit/__init__.py#L14, #L16: Added lines #L14 and #L16 were not covered by tests.

doc/cluster/python.md (4)
**1-3:** 🛠️ Refactor suggestion: Fix Markdown heading anchor syntax.

The `(cluster-api-python)=` reference definition does not follow standard CommonMark or MyST conventions. Combine the anchor with the heading using MyST's heading ID syntax.

```diff
-(cluster-api-python)=
-# CrateDB Cluster Python API
+# CrateDB Cluster Python API {#cluster-api-python}
```

**42-43:** 🛠️ Refactor suggestion: Improve environment variables listing format.

The current definition-list style (`:Environment variables:`) may not render as expected in MyST. Use a standard Markdown list format instead.

```diff
-:Environment variables:
-  `CRATEDB_CLUSTER_ID`, `CRATEDB_CLUSTER_NAME`, `CRATEDB_CLUSTER_URL`
+**Environment variables:**
+- `CRATEDB_CLUSTER_ID`
+- `CRATEDB_CLUSTER_NAME`
+- `CRATEDB_CLUSTER_URL`
```

**45-50:** 🛠️ Refactor suggestion: Use standard MyST admonition syntax for note blocks.

The `:::{note}` block may not render correctly. Use standard MyST admonition syntax.

````diff
-:::{note}
-- All address options are mutually exclusive.
-- The cluster identifier takes precedence over the cluster name.
-- The cluster url takes precedence over the cluster id and name.
-- Environment variables can be stored into an `.env` file in your working directory.
-:::
+```{note}
+- All address options are mutually exclusive.
+- The cluster identifier takes precedence over the cluster name.
+- The cluster url takes precedence over the cluster id and name.
+- Environment variables can be stored into an `.env` file in your working directory.
+```
````

**73-75:** 🛠️ Refactor suggestion: Use standard MyST admonition syntax for seealso blocks.

Similar to the note block above, the `:::{seealso}` block should use standard MyST syntax.

````diff
-:::{seealso}
-{ref}`cluster-api-tutorial` includes a full end-to-end tutorial.
-:::
+```{seealso}
+{ref}`cluster-api-tutorial` includes a full end-to-end tutorial.
+```
````

doc/cluster/backlog.md (1)
**33-39:** 🛠️ Refactor suggestion: Complete the resolution action for concurrent resume operations.

The explanation of handling concurrent cluster resume operations ends abruptly after evaluating the status, without specifying what action should be taken next.

```diff
 => Evaluate `cloud.info.last_async_operation.status` = `SENT|IN_PROGRESS`.
    See also https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384.
+=> Implement a retry mechanism with exponential backoff when the cluster is already in a resume
+   operation, or display a clear error message to the user about the need to wait.
```

cratedb_toolkit/cluster/cli.py (2)
**145-151:** Same null reference risk in `stop` command

Similar to previous comments, the `stop` command also accesses `cluster.info.asdict()` without checking if `cluster.info` is None.

Apply this fix to prevent a potential `AttributeError`, consistent with the recommendation for the `start` command:

```diff
 with handle_command_errors("stop cluster"):
     # Acquire the database cluster handle and submit the `suspend` command.
     cluster = ManagedCluster(cluster_id=cluster_id, cluster_name=cluster_name).probe().stop()
     logger.info(f"Successfully suspended cluster: {cluster}")
     # Display cluster information.
-    jd(cluster.info.asdict())
+    if cluster.info:
+        jd(cluster.info.asdict())
+    else:
+        logger.warning("No cluster information available")
```

🧰 Tools / 🪛 GitHub Check: codecov/patch: [warning] cratedb_toolkit/cluster/cli.py#L145, #L147-L148, #L151: Added lines were not covered by tests.

**127-134:** ⚠️ Potential issue: Null reference risk in `start` command

The `start` command accesses `cluster.info.asdict()` without checking if `cluster.info` is None, similar to the issue flagged in past reviews for the `stop` command.

Apply this fix to prevent a potential `AttributeError`:

```diff
 with handle_command_errors("start cluster"):
     # Acquire the database cluster handle and submit the `start` command.
     cluster = ManagedCluster(cluster_id=cluster_id, cluster_name=cluster_name).start()
     logger.info(f"Successfully acquired cluster: {cluster}")
     # Display cluster information.
-    jd(cluster.info.asdict())
+    if cluster.info:
+        jd(cluster.info.asdict())
+    else:
+        logger.warning("No cluster information available")
```

🧰 Tools / 🪛 GitHub Check: codecov/patch: [warning] cratedb_toolkit/cluster/cli.py#L127, #L129-L130, #L133: Added lines were not covered by tests.

cratedb_toolkit/cluster/core.py (3)
**377-380:** ⚠️ Potential issue: Fix type error in DataImportGuide initialization

The `DataImportGuide` expects a non-optional `ClusterInformation`, but `self.info` is `Optional[ClusterInformation]`.

Add a null check before passing to DataImportGuide:

```diff
-texts = DataImportGuide(
-    cluster_info=self.info,
-    job=cloud_job,
-)
+if not self.info:
+    logger.warning("Cluster information not available for detailed guidance")
+    # Create a default ClusterInformation or use another approach
+    cluster_info = ClusterInformation()
+else:
+    cluster_info = self.info
+
+texts = DataImportGuide(
+    cluster_info=cluster_info,
+    job=cloud_job,
+)
```

**251-281:** 🛠️ Refactor suggestion: Add timeout handling for asynchronous operations in `acquire()`

The `deploy()` and `resume()` operations are asynchronous and can take minutes to complete, yet `acquire()` returns immediately after invoking them and calling `probe()` once.

Implement a timeout mechanism to poll until the cluster is ready or a timeout is reached:

```diff
 def acquire(self) -> "ManagedCluster":
     """
     Acquire a database cluster.
     This means going through the steps of deploy and/or start, as applicable.

     - When cluster does not exist, create/deploy it.
     - When a cluster exists, but is stopped/hibernated, start/resume it.
     """
     self.probe()
     if self.exists:
         suspended = self.info.cloud.get("suspended")
         if suspended is True:
             logger.info(f"Cluster is suspended, resuming it: id={self.cluster_id}, name={self.cluster_name}")
             self.resume()
+            # Poll with timeout until the cluster is ready
+            start_time = time.time()
+            timeout = 300  # 5 minutes
+            poll_interval = 5  # seconds
+            while True:
+                self.probe()
+                if not self.info.cloud.get("suspended"):
+                    logger.info(f"Cluster successfully resumed: id={self.cluster_id}, name={self.cluster_name}")
+                    break
+                if time.time() - start_time > timeout:
+                    raise OperationFailed(f"Timeout waiting for cluster to resume: id={self.cluster_id}, name={self.cluster_name}")
+                logger.debug(f"Waiting for cluster to resume: id={self.cluster_id}, name={self.cluster_name}")
+                time.sleep(poll_interval)
             logger.info(f"Cluster resumed: id={self.cluster_id}, name={self.cluster_name}")
-            self.probe()
         elif suspended is False:
             logger.info(f"Cluster is running: id={self.cluster_id}, name={self.cluster_name}")
         else:
             raise CroudException(f"Cluster in unknown state: {self.cluster_name}")
     else:
         logger.info(f"Cluster does not exist, deploying it: id={self.cluster_id}, name={self.cluster_name}")
         self.deploy()
+        # Poll with timeout until the cluster exists and is ready
+        start_time = time.time()
+        timeout = 600  # 10 minutes
+        poll_interval = 10  # seconds
+        while True:
+            self.probe()
+            if self.exists and self.info.ready:
+                logger.info(f"Cluster successfully deployed: id={self.cluster_id}, name={self.cluster_name}")
+                break
+            if time.time() - start_time > timeout:
+                raise OperationFailed(f"Timeout waiting for cluster to deploy: id={self.cluster_id}, name={self.cluster_name}")
+            logger.debug(f"Waiting for cluster to deploy: id={self.cluster_id}, name={self.cluster_name}")
+            time.sleep(poll_interval)
         logger.info(f"Cluster deployed: id={self.cluster_id}, name={self.cluster_name}")
-        self.probe()
     if not self.exists:
         # TODO: Gather and propagate more information why the deployment failed.
         # Capture deployment logs or status if available.
         # Possibly use details from `last_async_operation`.
         raise CroudException(f"Deployment of cluster failed: {self.cluster_name}")
     return self
```

**319-324:** ⚠️ Potential issue: Add validation for empty `operation` in resume method

The `resume` method calls `self.operation.resume()` without checking if `self.operation` is `None`, which could lead to an `AttributeError`.

Add a validation check similar to what's in the `stop` method:

```diff
 if self.cluster_id is None:
     raise DatabaseAddressMissingError("Need cluster identifier to resume cluster")
 logger.info(f"Resuming CrateDB Cloud Cluster: id={self.cluster_id}, name={self.cluster_name}")
-if self.operation:
-    self.operation.resume()
+if not self.operation:
+    # If the operation is None, try to probe the cluster first to initialize it
+    logger.debug("Operation not initialized, probing cluster")
+    self.probe()
+    if not self.operation:
+        raise CroudException(f"Could not initialize operation for cluster: {self.cluster_id}")
+
+self.operation.resume()
 self.probe()
 return self
```
🧹 Nitpick comments (10)
cratedb_toolkit/util/setting.py (2)
**40-69:** Consider adding exception handling for edge cases in `obtain_settings`

The function handles the Click exit code exceptions, but may not handle other potential exceptions like file I/O errors from `init_dotenv()`. Also, it returns an empty dictionary when there's an exit code of 0, which might be unexpected in some cases.

```diff
 def obtain_settings(specs: t.List[Setting], prog_name: str = None) -> t.Dict[str, str]:
     """
     Employ command-line parsing at runtime, using the `click` parser.
     Obtain configuration setting from different sources, DWIM-style.
     This is the generic workhorse utility variant.

     - Command line argument, in long form. Example: `--foobar=bazqux`.
     - Positional argument on command line. Example: `bazqux`.
     - Environment variable. Example: `export FOOBAR=bazqux`.
     - Environment variable prefix. Example: `export APPNAME_FOOBAR=bazqux`.
     """
     # Load environment variables from `.env` file.
-    init_dotenv()
+    try:
+        init_dotenv()
+    except Exception as ex:
+        logger.warning(f"Failed to load environment variables from .env file: {ex}")

     # Decode settings from command-line arguments or environment variables.
     prog_name = prog_name or sys.argv[0]
     click_specs = [spec.click for spec in specs]
     command = click.Command(prog_name, params=click_specs)
     if CONFIG.RUNNING_ON_JUPYTER or CONFIG.RUNNING_ON_PYTEST:
         args = []
     else:
         args = sys.argv[1:]
     try:
         with command.make_context(prog_name, args=args) as ctx:
             return ctx.params
     except click.exceptions.Exit as ex:
         if ex.exit_code != 0:
             raise
+    logger.debug("Click parser exited with code 0")
     return {}
```

**72-104:** Enhance error messages in mutual exclusiveness check

The function correctly validates that exactly one setting among a mutually exclusive group is specified. However, the error messages could be more specific about which values were actually provided.

Consider enhancing the error messages to include the specific values that were provided for the mutually exclusive settings, to make debugging easier:

```diff
 def check_mutual_exclusiveness(
     specs: t.List[Setting], settings: t.Dict[str, str], message_none: str = None, message_multiple: str = None
 ):
     """
     Check settings for mutual exclusiveness.

     It has been inspired by click-option-group's RequiredMutuallyExclusiveOptionGroup.
     https://github.com/click-contrib/click-option-group
     """
     parameter_names = []
     environment_variables = []
     values = []
     for setting in specs:
         if setting.group is None:
             continue
         if setting.click.name is None:
             raise ValueError("Setting specification has no name")
         parameter_names.append(setting.click.opts[0])
         environment_variables.append(setting.click.envvar)
         value = settings.get(setting.click.name)
         values.append(value)

     guidance = f"Use one of the CLI argument {parameter_names} or environment variable {environment_variables}"
     if all(value is None for value in values):
         if message_none is None:
             message_none = f"One of the settings is required, but none of them have been specified. {guidance}"
         raise ValueError(message_none)
     if values.count(None) < len(values) - 1:
+        # Collect and format the specified settings
+        specified = [f"{param}={val}" for param, val in zip(parameter_names, values) if val is not None]
         if message_multiple is None:
             message_multiple = (
-                f"The settings are mutually exclusive, but multiple of them have been specified. {guidance}"
+                f"The settings are mutually exclusive, but multiple were specified: {', '.join(specified)}. {guidance}"
             )
         raise ValueError(message_multiple)
     return values
```

doc/cluster/_address.md (1)
**1-17:** Clear documentation of the cluster addressing options

The documentation clearly explains the various options for specifying cluster addresses and their precedence rules. This will be very helpful for users of the toolkit.

One minor suggestion for improved formatting consistency: consider adding a blank line between the rubric and the first paragraph, to match the spacing used elsewhere in the document.

```diff
 :::{rubric} Cluster address
 :::
+
 The program provides various options for addressing the database cluster in different situations, managed or not.
```

🧰 Tools / 🪛 LanguageTool: [uncategorized] ~5: Loose punctuation mark. Context: "... different situations, managed or not. :CLI options: `--cluster-id`, `--cluste..." (UNLIKELY_OPENING_PUNCTUATION)
doc/cluster/python.md (1)
**5-7:** Add missing article "the" before "higher level API/SDK".

For proper grammar, add the missing article.

```diff
-The `cratedb_toolkit.ManagedCluster` class provides higher level API/SDK
+The `cratedb_toolkit.ManagedCluster` class provides the higher level API/SDK
 entrypoints to start/deploy/resume a database cluster, inquire information
 about it, and stop/suspend it again.
```

🧰 Tools / 🪛 LanguageTool: [uncategorized] ~5: You might be missing the article "the" here. Context: "...on)= # CrateDB Cluster Python API The `cratedb_toolkit.ManagedCluster` class provides higher level API/SDK en..." (AI_EN_LECTOR_MISSING_DETERMINER_THE)
doc/cluster/backlog.md (1)
**98:** Add missing preposition in token expiry compensation.

For better grammar, add the preposition "for" after "compensate".

```diff
-Managed: Use `keyring` for caching the JWT token, and compensate token expiry
+Managed: Use `keyring` for caching the JWT token, and compensate for token expiry
```

🧰 Tools / 🪛 LanguageTool: [uncategorized] ~98: Possible missing preposition found. Context: "...r caching the JWT token, and compensate token expiry" (AI_EN_LECTOR_MISSING_PREPOSITION)
cratedb_toolkit/testing/testcontainers/util.py (2)
**141-162:** Robust Docker daemon detection with graceful test skipping

This new `DockerSkippingContainer` class elegantly handles scenarios where Docker is unavailable by intercepting connection exceptions and converting them to pytest skips, preventing test failures when Docker isn't running. The early `_container` attribute initialization also prevents destructor issues.

However, I notice a TODO about synchronizing with `PytestTestcontainerAdapter`. Consider creating a shared helper function for Docker error detection, to maintain consistency between the two implementations:

```diff
+def is_docker_unavailable(ex: DockerException) -> bool:
+    """Check if exception indicates Docker daemon is unavailable."""
+    return any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version"))

 # In DockerSkippingContainer
 except DockerException as ex:
-    if any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version")):
+    if is_docker_unavailable(ex):
         raise pytest.skip(reason="Skipping test because Docker is not running", allow_module_level=True) from ex

 # In PytestTestcontainerAdapter
 except DockerException as ex:
-    if any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version")):
+    if is_docker_unavailable(ex):
         raise pytest.skip(reason="Skipping test because Docker daemon is not available", allow_module_level=True) from ex
```
PytestTestcontainerAdapterprovides a clean abstraction for handling testcontainers within Pytest fixtures. It correctly manages container lifecycle and gracefully handles Docker unavailability.I noticed that
setup()is made abstract butstop()has a default implementation. Consider documenting this asymmetry to guide implementers:def setup(self): + """ + Override this method to initialize self.container with a DockerContainer instance. + This method should create the container but NOT start it, as start() will be called + automatically after setup() completes successfully. + """ raise NotImplementedError("Must be implemented by child class")cratedb_toolkit/util/croud.py (2)
**213-241:** Good custom CroudClient implementation with User-Agent header

The `CroudClient` class extends the base croud API client with a custom User-Agent header, which is a good practice for identifying the toolkit in API requests.

However, I noticed that the implementation of `create()` doesn't add any error handling for cases where environment variables might be missing. Consider adding validation to check for required credentials:

```diff
 @staticmethod
 def create() -> "croud.api.Client":
     """
     Canonical factory method for creating a `croud.api.Client` instance.
     """
     with headless_config():
         from croud.config import CONFIG

+        # Validate that either token or key/secret are available
+        has_token = CONFIG.token is not None
+        has_key_secret = CONFIG.key is not None and CONFIG.secret is not None
+        if not has_token and not has_key_secret:
+            logger.warning("Neither token nor API key/secret available. Authentication may fail.")

         return croud.api.Client(
             CONFIG.endpoint,
             token=CONFIG.token,
             on_token=CONFIG.set_current_auth_token,
             key=CONFIG.key,
             secret=CONFIG.secret,
             region=CONFIG.region,
             sudo=False,
         )
```

**242-258:** Headless config implementation could be more robust

The implementation of `get_headless_config` falls back to using the regular config file if it exists and no API key is set in the environment. This is good for flexibility, but it doesn't validate that the headless config has the necessary credentials.

Consider adding validation to ensure the credentials are available, and log a warning if they're not:

```diff
 @classproperty
 def get_headless_config(cls) -> Configuration:
     cfg = Configuration("croud.yaml")
     if cfg._file_path.exists() and "CRATEDB_CLOUD_API_KEY" not in os.environ:
         return cfg

     tmp_file = NamedTemporaryFile()
     tmp_path = Path(tmp_file.name)
     config = Configuration("headless.yaml", tmp_path)

     # Get credentials from the environment.
     config.profile["key"] = os.environ.get("CRATEDB_CLOUD_API_KEY")
     config.profile["secret"] = os.environ.get("CRATEDB_CLOUD_API_SECRET")
     config.profile["organization-id"] = os.environ.get("CRATEDB_CLOUD_ORGANIZATION_ID")
     # config.profile["endpoint"] = os.environ.get("CRATEDB_CLOUD_ENDPOINT")  # noqa: ERA001

+    if not config.profile["key"] or not config.profile["secret"]:
+        logger.warning(
+            "Missing required cloud credentials: CRATEDB_CLOUD_API_KEY and/or "
+            "CRATEDB_CLOUD_API_SECRET. Authentication may fail."
+        )
+
     return config
```

cratedb_toolkit/cluster/croud.py (1)
**211-293:** Robust deployment logic with good environment variable handling

The `deploy_cluster` method has excellent handling of subscription selection and proper validation of required parameters. The environment variable usage is well-documented, and the error messages are clear and helpful.

Consider replacing the TODO comment about those environment variables with a reference to actual documentation:

```diff
-# TODO: Add documentation about those environment variables.
+# Environment variables are documented in the project README or docs/configuration.md.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (107)
- .github/workflows/main.yml (1 hunks)
- .github/workflows/mongodb.yml (1 hunks)
- CHANGES.md (2 hunks)
- cratedb_toolkit/__init__.py (2 hunks)
- cratedb_toolkit/adapter/rockset/cli.py (2 hunks)
- cratedb_toolkit/adapter/rockset/server/dependencies.py (1 hunks)
- cratedb_toolkit/api/cli.py (0 hunks)
- cratedb_toolkit/api/guide.py (0 hunks)
- cratedb_toolkit/api/main.py (0 hunks)
- cratedb_toolkit/cfr/cli.py (9 hunks)
- cratedb_toolkit/cfr/marimo.py (1 hunks)
- cratedb_toolkit/cli.py (2 hunks)
- cratedb_toolkit/cluster/cli.py (2 hunks)
- cratedb_toolkit/cluster/core.py (1 hunks)
- cratedb_toolkit/cluster/croud.py (1 hunks)
- cratedb_toolkit/cluster/guide.py (1 hunks)
- cratedb_toolkit/cluster/model.py (1 hunks)
- cratedb_toolkit/cluster/util.py (0 hunks)
- cratedb_toolkit/cmd/tail/cli.py (3 hunks)
- cratedb_toolkit/config.py (1 hunks)
- cratedb_toolkit/exception.py (2 hunks)
- cratedb_toolkit/info/cli.py (6 hunks)
- cratedb_toolkit/info/core.py (2 hunks)
- cratedb_toolkit/info/http.py (1 hunks)
- cratedb_toolkit/io/cli.py (4 hunks)
- cratedb_toolkit/io/croud.py (4 hunks)
- cratedb_toolkit/io/dynamodb/api.py (1 hunks)
- cratedb_toolkit/io/influxdb.py (1 hunks)
- cratedb_toolkit/io/mongodb/adapter.py (3 hunks)
- cratedb_toolkit/io/mongodb/api.py (4 hunks)
- cratedb_toolkit/io/processor/kinesis_lambda.py (2 hunks)
- cratedb_toolkit/job/cli.py (0 hunks)
- cratedb_toolkit/job/croud.py (0 hunks)
- cratedb_toolkit/model.py (4 hunks)
- cratedb_toolkit/option.py (1 hunks)
- cratedb_toolkit/options.py (0 hunks)
- cratedb_toolkit/settings/compare.py (5 hunks)
- cratedb_toolkit/shell/cli.py (3 hunks)
- cratedb_toolkit/testing/testcontainers/cratedb.py (5 hunks)
- cratedb_toolkit/testing/testcontainers/influxdb2.py (2 hunks)
- cratedb_toolkit/testing/testcontainers/minio.py (2 hunks)
- cratedb_toolkit/testing/testcontainers/mongodb.py (2 hunks)
- cratedb_toolkit/testing/testcontainers/util.py (4 hunks)
- cratedb_toolkit/util/app.py (1 hunks)
- cratedb_toolkit/util/client.py (1 hunks)
- cratedb_toolkit/util/common.py (2 hunks)
- cratedb_toolkit/util/crash.py (2 hunks)
- cratedb_toolkit/util/croud.py (8 hunks)
- cratedb_toolkit/util/database.py (6 hunks)
- cratedb_toolkit/util/runtime.py (1 hunks)
- cratedb_toolkit/util/setting.py (1 hunks)
- doc/adapter/rockset.md (1 hunks)
- doc/backlog/main.md (2 hunks)
- doc/bugs.md (1 hunks)
- doc/cfr/jobstats.md (1 hunks)
- doc/cfr/systable.md (3 hunks)
- doc/cluster/_address.md (1 hunks)
- doc/cluster/backlog.md (1 hunks)
- doc/cluster/cli.md (1 hunks)
- doc/cluster/index.md (1 hunks)
- doc/cluster/python.md (1 hunks)
- doc/cluster/tutorial.md (1 hunks)
- doc/conf.py (1 hunks)
- doc/index.md (1 hunks)
- doc/info/index.md (1 hunks)
- doc/io/dms/standalone.md (1 hunks)
- doc/io/dynamodb/cdc-lambda.md (4 hunks)
- doc/io/dynamodb/cdc.md (2 hunks)
- doc/io/dynamodb/loader.md (2 hunks)
- doc/io/index.md (2 hunks)
- doc/io/influxdb/loader.md (3 hunks)
- doc/io/mongodb/cdc.md (3 hunks)
- doc/io/mongodb/loader.md (4 hunks)
- doc/io/mongodb/migr8.md (1 hunks)
- doc/sandbox.md (2 hunks)
- doc/util/index.md (1 hunks)
- doc/util/settings.md (1 hunks)
- doc/util/shell.md (1 hunks)
- examples/aws/dynamodb_kinesis_lambda_oci_cratedb.py (1 hunks)
- examples/cloud_import.py (0 hunks)
- examples/notebook/cloud_import.ipynb (1 hunks)
- examples/python/cloud_cluster.py (1 hunks)
- examples/python/cloud_import.py (1 hunks)
- examples/shell/cloud_cluster.sh (1 hunks)
- examples/shell/cloud_import.sh (1 hunks)
- pyproject.toml (8 hunks)
- tests/adapter/test_rockset.py (3 hunks)
- tests/api/test_util.py (1 hunks)
- tests/cfr/test_info.py (1 hunks)
- tests/cfr/test_jobstats.py (3 hunks)
- tests/cfr/test_systable.py (6 hunks)
- tests/cluster/test_cli.py (1 hunks)
- tests/cluster/test_core.py (1 hunks)
- tests/cluster/test_guide.py (1 hunks)
- tests/cluster/test_model.py (1 hunks)
- tests/cmd/test_tail.py (1 hunks)
- tests/conftest.py (3 hunks)
- tests/examples/test_python.py (1 hunks)
- tests/examples/test_shell.py (1 hunks)
- tests/examples/test_shell.sh (1 hunks)
- tests/info/test_cli.py (3 hunks)
- tests/info/test_http.py (1 hunks)
- tests/io/dynamodb/__init__.py (0 hunks)
- tests/io/dynamodb/conftest.py (1 hunks)
- tests/io/dynamodb/test_cli.py (1 hunks)
- tests/io/influxdb/conftest.py (4 hunks)
- tests/io/influxdb/test_cli.py (3 hunks)
⛔ Files not processed due to max files limit (12)
- tests/io/mongodb/conftest.py
- tests/io/mongodb/test_cli.py
- tests/io/test_import.py
- tests/io/test_processor.py
- tests/retention/test_cli.py
- tests/retention/test_examples.py
- tests/settings/test_cli.py
- tests/shell/test_cli.py
- tests/test_model.py
- tests/test_shell.py
- tests/util/database.py
- tests/util/shunit2
💤 Files with no reviewable changes (9)
- tests/io/dynamodb/__init__.py
- cratedb_toolkit/api/cli.py
- cratedb_toolkit/job/cli.py
- cratedb_toolkit/cluster/util.py
- cratedb_toolkit/api/guide.py
- examples/cloud_import.py
- cratedb_toolkit/options.py
- cratedb_toolkit/job/croud.py
- cratedb_toolkit/api/main.py
✅ Files skipped from review due to trivial changes (7)
- cratedb_toolkit/io/dynamodb/api.py
- doc/io/dynamodb/cdc.md
- tests/cfr/test_info.py
- tests/io/dynamodb/conftest.py
- examples/shell/cloud_import.sh
- cratedb_toolkit/option.py
- doc/util/settings.md
🚧 Files skipped from review as they are similar to previous changes (74)
- cratedb_toolkit/io/processor/kinesis_lambda.py
- doc/io/mongodb/cdc.md
- doc/cfr/jobstats.md
- cratedb_toolkit/io/influxdb.py
- doc/conf.py
- doc/util/index.md
- doc/adapter/rockset.md
- doc/io/mongodb/migr8.md
- doc/io/dynamodb/loader.md
- examples/aws/dynamodb_kinesis_lambda_oci_cratedb.py
- tests/cluster/test_model.py
- examples/shell/cloud_cluster.sh
- tests/cfr/test_jobstats.py
- doc/io/dms/standalone.md
- cratedb_toolkit/info/http.py
- .github/workflows/mongodb.yml
- tests/info/test_http.py
- tests/cfr/test_systable.py
- doc/cluster/cli.md
- tests/adapter/test_rockset.py
- doc/io/influxdb/loader.md
- examples/python/cloud_cluster.py
- doc/bugs.md
- cratedb_toolkit/info/cli.py
- tests/io/dynamodb/test_cli.py
- tests/examples/test_python.py
- doc/io/mongodb/loader.md
- tests/io/influxdb/conftest.py
- doc/cfr/systable.md
- tests/io/influxdb/test_cli.py
- doc/info/index.md
- tests/cmd/test_tail.py
- cratedb_toolkit/testing/testcontainers/influxdb2.py
- cratedb_toolkit/cli.py
- doc/cluster/tutorial.md
- doc/index.md
- tests/examples/test_shell.py
- cratedb_toolkit/testing/testcontainers/minio.py
- cratedb_toolkit/cmd/tail/cli.py
- cratedb_toolkit/config.py
- cratedb_toolkit/util/client.py
- examples/notebook/cloud_import.ipynb
- doc/cluster/index.md
- cratedb_toolkit/util/common.py
- doc/io/dynamodb/cdc-lambda.md
- cratedb_toolkit/util/app.py
- doc/sandbox.md
- cratedb_toolkit/cluster/guide.py
- cratedb_toolkit/io/mongodb/adapter.py
- cratedb_toolkit/testing/testcontainers/mongodb.py
- tests/info/test_cli.py
- cratedb_toolkit/info/core.py
- CHANGES.md
- cratedb_toolkit/io/mongodb/api.py
- cratedb_toolkit/util/crash.py
- cratedb_toolkit/util/runtime.py
- tests/cluster/test_core.py
- tests/examples/test_shell.sh
- cratedb_toolkit/shell/cli.py
- tests/cluster/test_guide.py
- cratedb_toolkit/exception.py
- doc/io/index.md
- .github/workflows/main.yml
- tests/api/test_util.py
- doc/util/shell.md
- cratedb_toolkit/settings/compare.py
- examples/python/cloud_import.py
- cratedb_toolkit/testing/testcontainers/cratedb.py
- pyproject.toml
- cratedb_toolkit/util/database.py
- tests/cluster/test_cli.py
- cratedb_toolkit/model.py
- cratedb_toolkit/cluster/model.py
- cratedb_toolkit/io/croud.py
🧰 Additional context used
🧠 Learnings (3)
tests/conftest.py (2)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: tests/conftest.py:55-62
Timestamp: 2025-04-26T17:45:20.650Z
Learning: In the CrateDB toolkit, the container start operation is idempotent, meaning it can be safely called multiple times without causing issues. This allows for flexible container lifecycle management where both `setup()` may call `container.start()` and parent classes like `PytestTestcontainerAdapter` may also invoke `start()` methods.
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: tests/conftest.py:168-170
Timestamp: 2025-05-02T15:26:10.385Z
Learning: In the `cloud_environment` fixture in tests/conftest.py, environment variables like TEST_CRATEDB_CLOUD_API_KEY, TEST_CRATEDB_CLOUD_API_SECRET, TEST_CRATEDB_CLOUD_ORGANIZATION_ID, and TEST_CRATEDB_CLUSTER_NAME are partially mutually exclusive, meaning not all of them need to be present simultaneously for tests to work.
cratedb_toolkit/cluster/core.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/api/main.py:263-265
Timestamp: 2025-04-26T21:59:59.463Z
Learning: When handling DNS propagation delays after resource creation (like CrateDB Cloud clusters), both fixed sleeps and immediate polling have drawbacks. Fixed sleeps are brittle, but immediate polling risks negative DNS caching. A hybrid approach with a short initial sleep followed by polling with exponential backoff is preferred.
cratedb_toolkit/cfr/marimo.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/cfr/marimo.py:53-54
Timestamp: 2025-05-02T11:43:42.639Z
Learning: The submodules within `cratedb_toolkit/io` employ a different naming scheme than other parts of the codebase and retain their own conventions (e.g., may continue using `cratedb_sqlalchemy_url` instead of `sqlalchemy_url`).
🧬 Code Graph Analysis (6)
cratedb_toolkit/util/setting.py (1)
- cratedb_toolkit/config.py (2)
  - RUNNING_ON_PYTEST (26-27)
  - RUNNING_ON_JUPYTER (22-23)

cratedb_toolkit/io/cli.py (3)
- cratedb_toolkit/cluster/core.py (4)
  - DatabaseCluster (603-656)
  - create (623-656)
  - load_table (342-395)
  - load_table (523-600)
- cratedb_toolkit/model.py (3)
  - InputOutputResource (205-212)
  - TableAddress (182-201)
  - schema (174-178)
- cratedb_toolkit/cluster/model.py (1)
  - load_table (184-193)

cratedb_toolkit/testing/testcontainers/util.py (5)
- tests/conftest.py (1)
  - setup (56-62)
- tests/io/dynamodb/conftest.py (1)
  - setup (45-56)
- tests/io/mongodb/conftest.py (1)
  - setup (38-48)
- tests/io/influxdb/conftest.py (1)
  - setup (28-34)
- cratedb_toolkit/testing/testcontainers/cratedb.py (2)
  - start (176-182)
  - stop (184-189)

cratedb_toolkit/__init__.py (4)
- cratedb_toolkit/config.py (2)
  - preconfigure (54-65)
  - configure (35-51)
- cratedb_toolkit/info/cli.py (1)
  - cluster (60-64)
- cratedb_toolkit/cluster/core.py (3)
  - DatabaseCluster (603-656)
  - ManagedCluster (104-445)
  - StandaloneCluster (449-600)
- cratedb_toolkit/model.py (2)
  - InputOutputResource (205-212)
  - TableAddress (182-201)

cratedb_toolkit/adapter/rockset/server/dependencies.py (2)
- cratedb_toolkit/util/database.py (1)
  - DatabaseAdapter (43-421)
- cratedb_toolkit/model.py (1)
  - dburi (103-107)

cratedb_toolkit/cfr/cli.py (4)
- cratedb_toolkit/util/app.py (1)
  - make_cli (11-52)
- cratedb_toolkit/util/data.py (1)
  - path_from_url (51-54)
- cratedb_toolkit/cfr/systable.py (2)
  - SystemTableExporter (116-190)
  - SystemTableImporter (193-257)
- cratedb_toolkit/model.py (4)
  - dburi (103-107)
  - DatabaseAddress (55-178)
  - from_string (63-72)
  - from_string (200-201)
🪛 GitHub Check: codecov/patch
cratedb_toolkit/cluster/core.py
[warning] 99-101: cratedb_toolkit/cluster/core.py#L99-L101
Added lines #L99 - L101 were not covered by tests
[warning] 121-128: cratedb_toolkit/cluster/core.py#L121-L128
Added lines #L121 - L128 were not covered by tests
[warning] 131-134: cratedb_toolkit/cluster/core.py#L131-L134
Added lines #L131 - L134 were not covered by tests
[warning] 138-140: cratedb_toolkit/cluster/core.py#L138-L140
Added lines #L138 - L140 were not covered by tests
[warning] 144-144: cratedb_toolkit/cluster/core.py#L144
Added line #L144 was not covered by tests
[warning] 148-148: cratedb_toolkit/cluster/core.py#L148
Added line #L148 was not covered by tests
cratedb_toolkit/cluster/cli.py
[warning] 80-82: cratedb_toolkit/cluster/cli.py#L80-L82
Added lines #L80 - L82 were not covered by tests
[warning] 93-95: cratedb_toolkit/cluster/cli.py#L93-L95
Added lines #L93 - L95 were not covered by tests
[warning] 106-109: cratedb_toolkit/cluster/cli.py#L106-L109
Added lines #L106 - L109 were not covered by tests
[warning] 114-115: cratedb_toolkit/cluster/cli.py#L114-L115
Added lines #L114 - L115 were not covered by tests
[warning] 127-127: cratedb_toolkit/cluster/cli.py#L127
Added line #L127 was not covered by tests
[warning] 129-130: cratedb_toolkit/cluster/cli.py#L129-L130
Added lines #L129 - L130 were not covered by tests
[warning] 133-133: cratedb_toolkit/cluster/cli.py#L133
Added line #L133 was not covered by tests
[warning] 145-145: cratedb_toolkit/cluster/cli.py#L145
Added line #L145 was not covered by tests
[warning] 147-148: cratedb_toolkit/cluster/cli.py#L147-L148
Added lines #L147 - L148 were not covered by tests
[warning] 151-151: cratedb_toolkit/cluster/cli.py#L151
Added line #L151 was not covered by tests
[warning] 171-171: cratedb_toolkit/cluster/cli.py#L171
Added line #L171 was not covered by tests
[warning] 173-176: cratedb_toolkit/cluster/cli.py#L173-L176
Added lines #L173 - L176 were not covered by tests
[warning] 179-182: cratedb_toolkit/cluster/cli.py#L179-L182
Added lines #L179 - L182 were not covered by tests
[warning] 188-191: cratedb_toolkit/cluster/cli.py#L188-L191
Added lines #L188 - L191 were not covered by tests
[warning] 197-202: cratedb_toolkit/cluster/cli.py#L197-L202
Added lines #L197 - L202 were not covered by tests
cratedb_toolkit/__init__.py
[warning] 14-14: cratedb_toolkit/__init__.py#L14
Added line #L14 was not covered by tests
[warning] 16-16: cratedb_toolkit/__init__.py#L16
Added line #L16 was not covered by tests
[warning] 18-20: cratedb_toolkit/__init__.py#L18-L20
Added lines #L18 - L20 were not covered by tests
[warning] 22-22: cratedb_toolkit/__init__.py#L22
Added line #L22 was not covered by tests
cratedb_toolkit/adapter/rockset/cli.py
[warning] 38-38: cratedb_toolkit/adapter/rockset/cli.py#L38
Added line #L38 was not covered by tests
[warning] 41-41: cratedb_toolkit/adapter/rockset/cli.py#L41
Added line #L41 was not covered by tests
cratedb_toolkit/adapter/rockset/server/dependencies.py
[warning] 12-13: cratedb_toolkit/adapter/rockset/server/dependencies.py#L12-L13
Added lines #L12 - L13 were not covered by tests
cratedb_toolkit/cfr/cli.py
[warning] 124-124: cratedb_toolkit/cfr/cli.py#L124
Added line #L124 was not covered by tests
[warning] 167-167: cratedb_toolkit/cfr/cli.py#L167
Added line #L167 was not covered by tests
[warning] 198-199: cratedb_toolkit/cfr/cli.py#L198-L199
Added lines #L198 - L199 were not covered by tests
[warning] 215-216: cratedb_toolkit/cfr/cli.py#L215-L216
Added lines #L215 - L216 were not covered by tests
[warning] 241-241: cratedb_toolkit/cfr/cli.py#L241
Added line #L241 was not covered by tests
cratedb_toolkit/cfr/marimo.py
[warning] 53-54: cratedb_toolkit/cfr/marimo.py#L53-L54
Added lines #L53 - L54 were not covered by tests
🪛 LanguageTool
doc/backlog/main.md
[uncategorized] ~8-~8: Loose punctuation mark.
Context: ... methods. - cratedb_toolkit/cfr/cli.py: Migrate to ctx.meta["address"], like ...
(UNLIKELY_OPENING_PUNCTUATION)
[style] ~10-~10: This expression is wordy and overused. Consider replacing it with a more concise and formal alternative.
Context: ...is will probably cause a disk spillover sooner or later on this instance. - When running the te...
(SOONER_OR_LATER)
[uncategorized] ~62-~62: The preposition ‘to’ seems more likely in this position.
Context: ... Accept alias --format {jsonl,ndjson} for --format json_row - Catch recursion ...
(AI_HYDRA_LEO_REPLACE_FOR_TO)
doc/cluster/_address.md
[uncategorized] ~5-~5: Loose punctuation mark.
Context: ... different situations, managed or not. :CLI options: --cluster-id, `--cluste...
(UNLIKELY_OPENING_PUNCTUATION)
doc/cluster/backlog.md
[uncategorized] ~63-~63: Loose punctuation mark.
Context: ...ng the block ``` - ctk cluster start: Avoid creating a project on each deploy...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~98-~98: Possible missing preposition found.
Context: ...r caching the JWT token, and compensate token expiry
(AI_EN_LECTOR_MISSING_PREPOSITION)
doc/cluster/python.md
[uncategorized] ~5-~5: You might be missing the article “the” here.
Context: ...on)= # CrateDB Cluster Python API The cratedb_toolkit.ManagedCluster class provides higher level API/SDK en...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~41-~41: Loose punctuation mark.
Context: ...r parameters or environment variables. :Environment variables: `CRATEDB_CLUSTE...
(UNLIKELY_OPENING_PUNCTUATION)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: CFR for OS windows-latest
- GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
🔇 Additional comments (31)
cratedb_toolkit/util/setting.py (3)
11-21: Good job defining a clean `Setting` dataclass. The `Setting` class effectively encapsulates a `click.Parameter` and an optional group identifier. The `asdict()` method is a handy utility for converting instances to dictionaries, which will be useful for serialization and debugging.
23-28: Straightforward utility function for command-line argument detection. This function cleanly checks for the presence of long options in command-line arguments, which is useful for determining whether to prioritize CLI arguments over environment variables.
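The idea is simple enough to sketch standalone. The helper below is illustrative (the name `has_long_option` and its signature are assumptions, not the toolkit's exact API), but it shows the two spellings such a check has to cover:

```python
import sys
import typing as t


def has_long_option(names: t.Iterable[str], argv: t.Optional[t.List[str]] = None) -> bool:
    """Return True if any of the given long options occurs on the command line.

    Matches both the `--cluster-id value` and `--cluster-id=value` spellings.
    """
    argv = sys.argv[1:] if argv is None else argv
    for arg in argv:
        for name in names:
            if arg == name or arg.startswith(name + "="):
                return True
    return False
```

A caller would then consult the environment only when `has_long_option(...)` returns `False`, so explicit CLI arguments win over environment variables.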
30-38: Well-implemented environment variable loading with environment detection. The function appropriately skips dotenv loading when running under pytest, which prevents test environments from being affected by local `.env` files. The logging of the dotenv file path is also helpful for debugging.

cratedb_toolkit/cfr/cli.py (2)
15-15: Updated import path reflects architectural improvements. The change from `cratedb_toolkit.api.cli` to `cratedb_toolkit.util.app` for the `make_cli` function aligns with the architectural refactoring to consolidate cluster addressing options and improve the structure of the codebase.
32-32: Details
✅ Verification successful
Unified cluster addressing approach applied consistently
All instances of `cratedb_sqlalchemy_url` and `cratedb_http_url` have been replaced with a single `cluster_url` parameter derived from `ctx.meta["cluster_url"]`. This standardization aligns with the PR objective to unify cluster addressing across the codebase.

The help text examples have also been updated to reflect the new environment variable name `CRATEDB_CLUSTER_URL`. The consistency of this change can be verified by checking that all occurrences have been updated:
Also applies to: 61-61, 77-79, 124-125, 167-168, 198-199, 215-216, 241-241
🏁 Script executed:
#!/bin/bash
# Check for any remaining occurrences of the old variable names
echo "Checking for remaining occurrences of cratedb_sqlalchemy_url:"
rg "cratedb_sqlalchemy_url" --type py "cratedb_toolkit/cfr/"
echo "Checking for remaining occurrences of cratedb_http_url:"
rg "cratedb_http_url" --type py "cratedb_toolkit/cfr/"
echo "Checking for new usages of cluster_url:"
rg "cluster_url" --type py "cratedb_toolkit/cfr/"

Length of output: 1320
Unified cluster_url usage verified
All references to `cratedb_sqlalchemy_url` and `cratedb_http_url` have been removed, and `cluster_url` is now consistently used in `cratedb_toolkit/cfr/cli.py` at the following locations:
- Lines 32, 61
- Lines 77–79
- Lines 124–125
- Lines 167–168
- Lines 198–199
- Lines 215–216
- Line 241
No remaining occurrences of the old variables were detected. Changes are approved.
cratedb_toolkit/__init__.py (1)
18-29: Well-structured public API with `__all__`. Good job defining the public API explicitly using `__all__`. This makes it clear what parts of the package are intended for external use and helps with wildcard imports.

The inclusion of both the unified `DatabaseCluster` and the specific implementation classes (`ManagedCluster`, `StandaloneCluster`) gives users flexibility while encouraging the use of the factory pattern.

🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 18-20: cratedb_toolkit/__init__.py#L18-L20
Added lines #L18 - L20 were not covered by tests
[warning] 22-22: cratedb_toolkit/__init__.py#L22
Added line #L22 was not covered by tests

cratedb_toolkit/adapter/rockset/cli.py (1)
9-9: Details
✅ Verification successful
Successfully refactored to use unified cluster addressing
The changes correctly replace the local decorator with the reusable `option_cluster_url` and update all related parameters and environment variables for consistency. This aligns well with the PR objective of unifying cluster connection options across the codebase.

Note: The error path for missing `cluster_url` still lacks test coverage. While this was flagged in a previous review and marked as out of scope, consider adding tests in a future iteration to validate this validation path.

Also applies to: 22-22, 29-30, 34-34, 38-38, 41-41
🏁 Script executed:
#!/bin/bash
# Verify this is the only place in the file that hasn't been updated
rg -n "cratedb_sqlalchemy" --glob="**/adapter/rockset/**"

Length of output: 57
No leftover `cratedb_sqlalchemy` references found.
Ran `rg -n "cratedb_sqlalchemy" --glob="**/adapter/rockset/**"` with no matches, confirming all imports have been updated.
Approve code changes.
Note: the missing `cluster_url` error path still lacks test coverage; consider adding tests for that validation in a future iteration.

doc/backlog/main.md (2)
3-19: Good backlog planning for cluster addressing simplification. The new "Iteration +0" section appropriately documents the planned simplifications to the CLI API by consolidating connection options and cluster addressing methods.
The note about software tests not pruning tables on CrateDB Cloud is an important issue to track, as this could lead to resource exhaustion in the test environment.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~8-~8: Loose punctuation mark.
Context: ... methods. -cratedb_toolkit/cfr/cli.py: Migrate toctx.meta["address"], like ...(UNLIKELY_OPENING_PUNCTUATION)
[style] ~10-~10: This expression is wordy and overused. Consider replacing it with a more concise and formal alternative.
Context: ...is will probably cause a disk spillover sooner or later on this instance. - When running the te...(SOONER_OR_LATER)
60-60: Consistent updates to command naming and environment variables. The changes correctly update the UX command name from `ctk list-jobs` to `ctk cluster list-jobs` and update environment variable references to match the new unified addressing scheme.

These changes maintain consistency with the broader refactoring of cluster addressing throughout the codebase.
Also applies to: 65-65, 73-73
tests/conftest.py (3)
45-71: Well-implemented `CrateDBFixture` class. The new `CrateDBFixture` class provides a clean wrapper for the CrateDB test container and database adapter. The container start operation is idempotent, so calling `self.container.start()` in `setup()` is safe even though `PytestTestcontainerAdapter.run_setup()` might also invoke `self.start()` afterward.
89-106: Well-designed environment management fixture. The `reset_environment` fixture complements the existing `prune_environment` fixture by specifically targeting variables from `ManagedClusterSettings`. The code correctly guards against `None` envvar names with the condition check.
140-175: Well-structured cloud environment fixtures. The new cloud-related fixtures provide a clean way to set up environment variables for CrateDB Cloud integration tests. Based on the learnings, the environment variables here are partially mutually exclusive, meaning not all of them need to be present simultaneously for tests to work.
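The "partially mutually exclusive" behavior can be sketched with a small helper that collects only the variables actually present. The helper name `collect_cloud_settings` and the exact variable set here are illustrative assumptions, not the fixture's real implementation:

```python
import typing as t

# Hypothetical variable set; the real fixture may use a different list.
CLOUD_VARIABLES = (
    "TEST_CRATEDB_CLOUD_API_KEY",
    "TEST_CRATEDB_CLOUD_API_SECRET",
    "TEST_CRATEDB_CLOUD_ORGANIZATION_ID",
    "TEST_CRATEDB_CLUSTER_NAME",
)


def collect_cloud_settings(environ: t.Mapping[str, str]) -> t.Dict[str, str]:
    """Collect the cloud-related variables that are actually set.

    Not all variables need to be present at the same time; an empty result
    signals that the cloud test environment is not configured at all.
    """
    return {name: environ[name] for name in CLOUD_VARIABLES if environ.get(name)}
```

A fixture can then decide to skip cloud tests when the returned mapping is empty, instead of requiring every variable up front.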
cratedb_toolkit/io/cli.py (4)
8-11: Good refactoring of imports for unified cluster management. The imports have been updated to use the new `DatabaseCluster` abstraction and option decorators, which aligns with the broader refactoring to unify cluster management.
30-32: Improved CLI option decorators. The CLI options have been updated to use the new unified option decorators, providing a more consistent interface across the toolkit.
60-61: Good variable renaming for clarity. Renaming `resource` to `source` improves code readability by being more descriptive of its purpose.
64-70: Simplified cluster creation and operation logic. Using the `DatabaseCluster.create` factory method centralizes cluster instantiation logic and makes the code more concise. This approach properly handles the choice between `ManagedCluster` and `StandaloneCluster` based on the provided parameters.

cratedb_toolkit/testing/testcontainers/util.py (1)
48-54: Improved port handling for Docker container endpoints. The enhanced port discovery logic correctly retrieves the port attribute from either `self.port` or `self.port_to_expose`, and strips any protocol suffixes to ensure valid connection URLs. The fallback mechanism and validation are well-implemented.

cratedb_toolkit/cluster/cli.py (3)
21-58: Well-documented command help functions with clear examples. The help functions provide excellent documentation with practical examples for each command, making the CLI more user-friendly.
72-83: Good pattern for CLI command implementation. The `info` command is well-structured with proper error handling through the context manager, and uses the new `ClusterInformation.from_id_or_name` method for flexible cluster identification.

🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 80-82: cratedb_toolkit/cluster/cli.py#L80-L82
Added lines #L80 - L82 were not covered by tests
173-182: Operation check in `list_jobs` is good. The code correctly checks `if not cluster.operation` before attempting to access the operation's methods, preventing potential null reference issues. Good defensive programming.

🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 173-176: cratedb_toolkit/cluster/cli.py#L173-L176
Added lines #L173 - L176 were not covered by tests
[warning] 179-182: cratedb_toolkit/cluster/cli.py#L179-L182
Added lines #L179 - L182 were not covered by tests

cratedb_toolkit/util/croud.py (6)
47-63: Good workaround for the `croud clusters deploy` JSON parsing issue. The `invoke_safedecode` method correctly handles the issue with multiple JSON payloads from `croud clusters deploy` by using `JSONDecoder.raw_decode`. The fix properly unpacks the tuple returned by `raw_decode` to extract the JSON object.

The TODO and FIXME comments indicate this is a temporary solution. Tracking the upstream fix in croud is a good approach (as mentioned in crate/croud#564).
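The decoding strategy is easy to demonstrate in isolation: `json.JSONDecoder.raw_decode` returns an `(object, end_index)` tuple and simply ignores trailing data, whereas a plain `json.loads` raises "Extra data" on the doubled output. The payload below is a made-up stand-in for the duplicated `croud` stdout, not real API output:

```python
import json


def decode_first_json(payload: str) -> object:
    """Decode the first JSON document in `payload`, ignoring any trailing data."""
    decoder = json.JSONDecoder()
    # `raw_decode` requires the document to start at the given index,
    # so strip leading whitespace first.
    obj, end = decoder.raw_decode(payload.lstrip())
    return obj


# A doubled payload, as `croud clusters deploy` used to emit.
doubled = '{"id": "a1b2", "name": "testdrive"}\n{"id": "a1b2", "name": "testdrive"}\n'
print(decode_first_json(doubled))  # → {'id': 'a1b2', 'name': 'testdrive'}
```

Once the upstream fix lands and only a single payload is emitted, this path keeps working unchanged, which makes it a safe interim measure.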
97-107: Good solution for removing conflicting version arguments. The `remove_version_argument` method effectively removes the `--version` argument from the parser, which is necessary to avoid conflicts with Click's version option.
173-178: Improved logging for croud output. The updated logging approach redirects croud's output to DEBUG level with a clear prefix, making it easier to trace croud operations in logs.
191-193: Properly using headless configuration for croud. The code correctly patches croud's global configuration with a headless configuration, allowing for non-interactive usage.
204-211: Clean context manager implementation for headless configuration. The `headless_config` context manager is well-designed, allowing temporary patching of the croud configuration during execution.
282-321: Well-implemented table name quoting function. The `table_fqn` function handles different table name formats correctly, including schema-qualified names. The error handling for empty table names is good, and the function handles already-quoted names properly.

cratedb_toolkit/cluster/croud.py (4)
26-64: Well-structured URL generator for cloud root API endpoints. The `CloudRootUrlGenerator` class provides a clean API for generating CrateDB Cloud API URLs at the organization level. The implementation correctly handles both global and organization-specific endpoints.
95-210: Well-implemented `CloudRootServices` with good error handling. The `CloudRootServices` class provides comprehensive functionality for interacting with the CrateDB Cloud API at the root level. The implementation includes proper error handling, useful logging, and good project management facilities.
295-339: `CloudClusterServices` with comprehensive cluster operations. The `CloudClusterServices` class provides a good set of operations for managing individual clusters, including info retrieval and suspension/resumption.
340-435: Well-implemented import job management. The import job management functions correctly handle both local and remote resources, with proper validation of required parameters and appropriate error handling.
cratedb_toolkit/cluster/core.py (1)
434-445: 🛠️ Refactor suggestion: JWT token expiration is not handled.

The `query` method uses the JWT token context once obtained from `probe()`, but doesn't refresh it when it expires (typically after 1 hour). Track token creation time and refresh the token when needed:

 def __init__(self, ...):
     # Existing code...
     self._jwt_ctx = nullcontext()
+    self._jwt_token_created_at = 0  # Timestamp when token was created

 def probe(self):
     # Existing code...
     self._jwt_ctx = jwt_token_patch(self.info.jwt.token)
+    self._jwt_token_created_at = time.time()
     # Rest of the method...

 def query(self, sql: str):
     # Ensure we have cluster connection details
     if not self.info or not self.info.cloud.get("url"):
         self.probe()
+
+    # Refresh token if it's older than 30 minutes (to be safe before the 1h expiry)
+    current_time = time.time()
+    if current_time - self._jwt_token_created_at > 1800:  # 30 minutes
+        logger.debug("Refreshing JWT token")
+        self._jwt_ctx = jwt_token_patch(self.info.jwt.token)
+        self._jwt_token_created_at = current_time
     with self._jwt_ctx:
         # Rest of the method...

Likely an incorrect or invalid review comment.
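An alternative to tracking wall-clock age is reading the `exp` claim straight from the token itself. This sketch is illustrative only (it is not the toolkit's implementation, and it deliberately skips signature verification, which is acceptable here only because the expiry is used locally to schedule a refresh, not to authenticate anything):

```python
import base64
import json
import time


def jwt_expires_at(token: str) -> float:
    """Extract the `exp` claim (a Unix timestamp) from a JWT without verifying it.

    The claims are the second dot-separated segment, base64url-encoded.
    """
    payload_b64 = token.split(".")[1]
    # Restore the padding that base64url encoding strips.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return float(claims["exp"])


def needs_refresh(token: str, leeway: float = 300.0) -> bool:
    """Return True when the token expires within `leeway` seconds."""
    return jwt_expires_at(token) - time.time() < leeway
```

This avoids guessing a fixed 30-minute window: the decision is driven by the expiry the server actually issued, with a safety leeway.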
Actionable comments posted: 2
♻️ Duplicate comments (11)
doc/cluster/python.md (4)
1-3: 🛠️ Refactor suggestion: Fix MyST syntax for document anchor.

The current anchor syntax `(cluster-api-python)=` is non-standard and may not work correctly with MyST Markdown.

-(cluster-api-python)=
-
-# CrateDB Cluster Python API
+# CrateDB Cluster Python API {#cluster-api-python}
42-43: 🛠️ Refactor suggestion: Improve environment variables listing format.

The current definition-list style may not render as expected in MyST Markdown.

-:Environment variables:
-  `CRATEDB_CLUSTER_ID`, `CRATEDB_CLUSTER_NAME`, `CRATEDB_CLUSTER_URL`
+**Environment variables**
+- `CRATEDB_CLUSTER_ID`
+- `CRATEDB_CLUSTER_NAME`
+- `CRATEDB_CLUSTER_URL`
45-50: 🛠️ Refactor suggestion: Update MyST admonition syntax for better compatibility.

The current `:::{note}` block format may not render correctly in all MyST processors.

-:::{note}
-- All address options are mutually exclusive.
-- The cluster identifier takes precedence over the cluster name.
-- The cluster url takes precedence over the cluster id and name.
-- Environment variables can be stored into an `.env` file in your working directory.
-:::
+```{note}
+- All address options are mutually exclusive.
+- The cluster identifier takes precedence over the cluster name.
+- The cluster url takes precedence over the cluster id and name.
+- Environment variables can be stored into an `.env` file in your working directory.
+```
73-75: 🛠️ Refactor suggestion: Use standard MyST admonition syntax for seealso blocks.

Similar to the note block above, the `:::{seealso}` block should use standard MyST syntax.

-:::{seealso}
-{ref}`cluster-api-tutorial` includes a full end-to-end tutorial.
-:::
+```{seealso}
+{ref}`cluster-api-tutorial` includes a full end-to-end tutorial.
+```

doc/cluster/backlog.md (2)
33-39: 🛠️ Refactor suggestion: Add missing resolution action for concurrent resume operations.
When describing how to handle concurrent cluster resume operations, the section ends with just checking the status but doesn't specify what action to take next.
   => Evaluate `cloud.info.last_async_operation.status` = `SENT|IN_PROGRESS`.
      See also https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384.
+  => Implement a retry mechanism with exponential backoff, or provide clear error messages
+     instructing users to wait until the current operation completes before trying again.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
39-39: Bare URL used
null(MD034, no-bare-urls)
40-43: 🛠️ Refactor suggestion: Add test coverage task for context manager behavior.

Based on static analysis hints and previous comments, you should add a specific task to test the `ManagedCluster` context manager behavior.

 - CI: Software tests fail when invoked with the whole test suite, but work in isolation
   `test_import_managed_csv_local`
 - Expose `ManagedCluster`'s `stop_on_exit` option per environment variable
+- CI: Software tests fail when invoked with the whole test suite, but work in isolation
+  `test_import_managed_csv_local`
+- Add comprehensive test coverage for `ManagedCluster` context manager behavior
+  (specifically `__enter__` and `__exit__` methods)
+- Expose `ManagedCluster`'s `stop_on_exit` option per environment variable

cratedb_toolkit/cluster/cli.py (3)
190-207: Add tests for handle_command_errors context manager

The `handle_command_errors` context manager lacks test coverage according to static analysis hints. Adding tests for this critical error handling component would ensure it works as expected.

#!/bin/bash
# Check for existing tests of handle_command_errors
rg -l "handle_command_errors" tests/

Please create a new test file (e.g. tests/test_cli_error_handling.py) with tests for each error path:

import pytest
from click import ClickException

from cratedb_toolkit.cluster.cli import handle_command_errors
from cratedb_toolkit.exception import CroudException


def test_croud_exception_handling(caplog):
    """Test that CroudException is logged and re-raised."""
    with pytest.raises(CroudException):
        with handle_command_errors("test operation"):
            raise CroudException("test error")
    assert "Failed to test operation" in caplog.text
    assert "test error" in caplog.text


def test_click_exception_passthrough():
    """Test that ClickException passes through unchanged."""
    with pytest.raises(ClickException):
        with handle_command_errors("test operation"):
            raise ClickException("click error")


def test_generic_exception_becomes_system_exit(caplog):
    """Test that generic exceptions cause SystemExit(1)."""
    with pytest.raises(SystemExit) as excinfo:
        with handle_command_errors("test operation"):
            raise ValueError("value error")
    assert excinfo.value.code == 1
    assert "Unexpected error" in caplog.text

🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 193-196: cratedb_toolkit/cluster/cli.py#L193-L196
Added lines #L193 - L196 were not covered by tests
[warning] 202-207: cratedb_toolkit/cluster/cli.py#L202-L207
Added lines #L202 - L207 were not covered by tests
133-139: ⚠️ Potential issue: Ensure `cluster.info` is not None before accessing `asdict()` in the start command.

The `start` command uses `cluster.info.asdict()` without verifying that `cluster.info` is not `None`. This could cause type errors if cluster information is unavailable.

 with handle_command_errors("start cluster"):
     # Acquire the database cluster handle and submit the `start` command.
     cluster = ManagedCluster(cluster_id=cluster_id, cluster_name=cluster_name).start()
     logger.info(f"Successfully acquired cluster: {cluster}")
     # Display cluster information.
-    jd(cluster.info.asdict())
+    if cluster.info:
+        jd(cluster.info.asdict())
+    else:
+        logger.warning("No cluster information available")

🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 134-135: cratedb_toolkit/cluster/cli.py#L134-L135
Added lines #L134 - L135 were not covered by tests
[warning] 138-138: cratedb_toolkit/cluster/cli.py#L138
Added line #L138 was not covered by tests
151-156: ⚠️ Potential issue: Ensure `cluster.info` is not None before accessing `asdict()` in the stop command.

The `stop` command uses `cluster.info.asdict()` without verifying that `cluster.info` is not `None`. This could cause type errors if cluster information is unavailable.

 with handle_command_errors("stop cluster"):
     # Acquire the database cluster handle and submit the `suspend` command.
     cluster = ManagedCluster(cluster_id=cluster_id, cluster_name=cluster_name).probe().stop()
     logger.info(f"Successfully suspended cluster: {cluster}")
     # Display cluster information.
-    jd(cluster.info.asdict())
+    if cluster.info:
+        jd(cluster.info.asdict())
+    else:
+        logger.warning("No cluster information available")

🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 152-153: cratedb_toolkit/cluster/cli.py#L152-L153
Added lines #L152 - L153 were not covered by tests
[warning] 156-156: cratedb_toolkit/cluster/cli.py#L156
Added line #L156 was not covered by tests

cratedb_toolkit/cluster/croud.py (2)
253-258: Document environment variables in project documentationThe TODO comment about documenting environment variables should be addressed to ensure users understand the available configuration options.
How to document environment variables in Python packages?Create a new "Environment Variables" section in your README.md or docs/configuration.md with a table listing all environment variables used in the code:
## Environment Variables | Name | Description | Required | Default | Example | |-------------------------------|--------------------------------------------------|----------|-------------------------------|----------------| | `CRATEDB_CLOUD_TIER` | CrateDB Cloud cluster tier | No | `default` | `high-memory` | | `CRATEDB_CLOUD_PRODUCT` | CrateDB Cloud product type | No | `crfree` | `s2` | | `CRATEDB_CHANNEL` | Release channel for CrateDB | No | `stable` | `nightly` | | `CRATEDB_VERSION` | CrateDB version to install | No | `5.10.4` | `nightly` | | `CRATEDB_USERNAME` | Username for cluster access | Yes | N/A | `admin` | | `CRATEDB_PASSWORD` | Password for cluster access | Yes | N/A | `securepass123`| | `CTK_KEYRING_CRYPTFILE_PASSWORD` | Password for keyring encryption | No | `TruaframEkEk` | `custom-key` |After adding this documentation, remove the TODO comment from the code.
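As a complement, reading these variables with the defaults from the table could look like the following sketch. The `read_cloud_settings` helper is hypothetical and not part of the toolkit; it only illustrates the required/optional split documented above:

```python
import os

def read_cloud_settings(env=None):
    """Read CrateDB Cloud settings from environment variables (illustrative sketch)."""
    env = os.environ if env is None else env
    # CRATEDB_USERNAME and CRATEDB_PASSWORD are required per the table above.
    missing = [name for name in ("CRATEDB_USERNAME", "CRATEDB_PASSWORD") if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
    return {
        "tier": env.get("CRATEDB_CLOUD_TIER", "default"),
        "product": env.get("CRATEDB_CLOUD_PRODUCT", "crfree"),
        "channel": env.get("CRATEDB_CHANNEL", "stable"),
        "version": env.get("CRATEDB_VERSION", "5.10.4"),
        "username": env["CRATEDB_USERNAME"],
        "password": env["CRATEDB_PASSWORD"],
    }
```

Accepting an `env` mapping instead of reading `os.environ` directly keeps such a helper trivially testable.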
`448-452`: ⚠️ Potential issue: Avoid hardcoded fallback for keyring password
Using a hardcoded fallback for the keyring password presents a security risk. A more secure approach would be to generate a unique key or require the environment variable.
```diff
- kr.keyring_key = os.getenv("CTK_KEYRING_CRYPTFILE_PASSWORD") or "TruaframEkEk"
+ kr.keyring_key = os.getenv("CTK_KEYRING_CRYPTFILE_PASSWORD")
+ if not kr.keyring_key:
+     # Generate a unique key based on user environment
+     import uuid
+     import getpass
+     user_id = getpass.getuser()
+     kr.keyring_key = str(uuid.uuid5(uuid.NAMESPACE_DNS, f"ctk-{user_id}"))
+     logger.debug("Generated unique keyring key based on user identity")
```

This generates a unique key based on the user's identity when the environment variable is not set, which is more secure than a hardcoded value.
🧹 Nitpick comments (11)
cratedb_toolkit/testing/testcontainers/util.py (1)
`190-205`: Comprehensive error handling in `run_setup`. The method properly handles Docker connection issues, but contains duplicate error detection logic compared to `DockerSkippingContainer`.

Consider extracting the common error detection patterns into a shared helper function to fulfill the "Synchronize with `DockerSkippingContainer`" TODO:

```python
def is_docker_unavailable_error(ex):
    """Determine if an exception indicates Docker daemon unavailability."""
    return any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version"))
```

Then use this in both classes:

```python
if is_docker_unavailable_error(ex):
    # Skip logic here
```

doc/cluster/backlog.md (4)
`39-39`: Format bare URL as proper Markdown link. The URL should be formatted as a proper Markdown link for better readability and accessibility.

```diff
- See also https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384.
+ See also [GitHub Actions run for PR #81](https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384).
```

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
39-39: Bare URL used
null(MD034, no-bare-urls)
`45-45`: Format bare URL as proper Markdown link. The URL should be formatted as a proper Markdown link for better readability and accessibility.

```diff
-- Update `shunit2`. -- https://github.com/kward/shunit2
+- Update `shunit2`. -- [shunit2 repository](https://github.com/kward/shunit2)
```

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
45-45: Bare URL used
null(MD034, no-bare-urls)
`70-76`: Add language identifier to fenced code block. Code blocks should include a language identifier for proper syntax highlighting.

````diff
- ```
+ ```json
  "health": {
      "last_seen": "",
      "running_operation": "CREATE",
      "status": "UNREACHABLE"
  },
````

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

70-70: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)

`19-19`: **Consider future strategy for default CrateDB version**

The TODO comment about using `latest` or `nightly` as default raises an important architectural question that should be addressed in a future iteration. Consider adding this as a dedicated item to your Iteration +1 or +2 backlog. A proper implementation would require:

1. A mechanism to determine the "latest" stable version dynamically
2. Clear documentation about stability implications when using "nightly" builds
3. Environment variable configuration options for controlling this behavior

This is especially important for CI/CD pipelines that need predictable behavior.

cratedb_toolkit/cluster/croud.py (2)

`401-417`: **Improve URL vs. local file detection logic**

The current method of detecting whether a resource URL refers to a local file or a remote URL could be improved. The check `Path(resource.url).exists()` might not work correctly for all URL types.

Your approach with `is_remote = "://" in resource.url` on line 403 is good. However, the combined condition on line 404 could be clearer by separating the checks:

```diff
- is_remote = "://" in resource.url
- if not is_remote and Path(resource.url).exists():
+ # Determine if this is a local file or remote URL
+ is_remote = "://" in resource.url
+ is_local_file = not is_remote and Path(resource.url).exists()
+
+ if is_local_file:
```

This makes the logic more explicit and easier to understand.
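Separating the two checks also makes the decision easy to test in isolation. A minimal standalone sketch (the `classify_resource` function name is hypothetical, not part of the toolkit):

```python
from pathlib import Path

def classify_resource(url: str) -> str:
    """Classify a resource address as 'remote', 'local', or 'unknown' (sketch)."""
    # Determine if this is a local file or remote URL.
    is_remote = "://" in url
    is_local_file = not is_remote and Path(url).exists()
    if is_local_file:
        return "local"
    if is_remote:
        return "remote"
    return "unknown"
```

Note that a plain `"://"` check treats any scheme-prefixed string as remote, including `file://` URLs, which may deserve dedicated handling.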
`19-20`: Consider semantic versioning for default CrateDB version. The default CrateDB version is hardcoded, which might require frequent updates.

Consider using semantic version specifiers that allow for compatible updates:

```diff
-# TODO: Use `latest` CrateDB by default, or even `nightly`?
-DEFAULT_CRATEDB_VERSION = "5.10.4"
+# Default to the latest stable 5.x version
+DEFAULT_CRATEDB_VERSION = "5.10"  # Will use latest patch version
```

Alternatively, implement a mechanism to determine the latest stable version dynamically at runtime by querying an API endpoint or release feed.
cratedb_toolkit/util/setting.py (4)
`30-38`: Consider adding a dotenv file override parameter. The current implementation works well, but you might want to add a parameter to allow specifying a custom dotenv file path for more flexibility.

```diff
-def init_dotenv():
+def init_dotenv(dotenv_path: str = None):
     """
     Load environment variables from `.env` file.
+
+    Args:
+        dotenv_path: Optional path to the dotenv file.
+                     If not provided, will try to find it automatically.
     """
     if not CONFIG.RUNNING_ON_PYTEST:
-        dotenv_file = find_dotenv()
+        dotenv_file = dotenv_path or find_dotenv()
         logger.info(f"Loading environment variables from .env file: {dotenv_file}")
         load_dotenv(dotenv_file)
```
`40-74`: Robust setting resolution with multiple sources. This function elegantly handles obtaining settings from multiple sources, with appropriate environment detection for Jupyter and pytest. The docstring clearly documents the priority order and sources.
One minor suggestion is to consider returning early if in Jupyter/pytest and no args are provided, rather than creating an empty args list.
```diff
 if CONFIG.RUNNING_ON_JUPYTER or CONFIG.RUNNING_ON_PYTEST:
     args = []
+    # Early return for notebook/test environments with empty args to avoid unnecessary context creation
+    try:
+        with command.make_context(prog_name, args=args) as ctx:
+            return ctx.params
+    except click.exceptions.Exit as ex:
+        if ex.exit_code != 0:
+            raise
+        return {}
 else:
     args = sys.argv[1:]
```
`97-98`: Improve error message readability. The guidance message directly includes Python lists (`parameter_names` and `environment_variables`), which might not be the most user-friendly format. Consider formatting these lists more readably.

```diff
-    guidance = f"Use one of the CLI argument {parameter_names} or environment variable {environment_variables}"
+    formatted_params = ", ".join(f"`{param}`" for param in parameter_names)
+    formatted_envvars = ", ".join(f"`{env}`" for env in environment_variables)
+    guidance = f"Use one of the CLI arguments {formatted_params} or environment variables {formatted_envvars}"
```
`53-58`: Consider logging the exception details at debug level. When handling exceptions from `init_dotenv()`, consider logging the full exception details at debug level for easier troubleshooting while keeping the warning message concise.

```diff
 try:
     init_dotenv()
 except Exception as ex:
     logger.warning(f"Failed to load environment variables from .env file: {ex}")
+    logger.debug("Exception details:", exc_info=True)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- `cratedb_toolkit/cluster/cli.py` (2 hunks)
- `cratedb_toolkit/cluster/croud.py` (1 hunks)
- `cratedb_toolkit/testing/testcontainers/util.py` (4 hunks)
- `cratedb_toolkit/util/setting.py` (1 hunks)
- `doc/cluster/_address.md` (1 hunks)
- `doc/cluster/backlog.md` (1 hunks)
- `doc/cluster/python.md` (1 hunks)
- `tests/cluster/test_cli.py` (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
cratedb_toolkit/cluster/cli.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/cluster/cli.py:82-89
Timestamp: 2025-04-27T03:07:04.375Z
Learning: In the cratedb-toolkit codebase, the `handle_command_errors` context manager is designed as an error handler for specific operations (not as a general protective wrapper). It's intentionally used to handle errors for targeted operations rather than wrapping entire function executions.
🧬 Code Graph Analysis (3)
cratedb_toolkit/util/setting.py (1)
cratedb_toolkit/config.py (2)
- `RUNNING_ON_PYTEST` (26-27)
- `RUNNING_ON_JUPYTER` (22-23)
tests/cluster/test_cli.py (3)
cratedb_toolkit/cluster/cli.py (2)
cli(65-69)handle_command_errors(191-207)cratedb_toolkit/exception.py (1)
CroudException(12-13)tests/conftest.py (1)
cloud_environment(149-174)
cratedb_toolkit/testing/testcontainers/util.py (5)
tests/conftest.py (1)
setup(56-62)tests/io/mongodb/conftest.py (1)
setup(38-48)tests/io/dynamodb/conftest.py (1)
setup(45-56)tests/io/influxdb/conftest.py (1)
setup(28-34)cratedb_toolkit/testing/testcontainers/cratedb.py (2)
start(176-182)stop(184-189)
🪛 LanguageTool
doc/cluster/backlog.md
[uncategorized] ~63-~63: Loose punctuation mark.
Context: ...ng the block ``` - ctk cluster start: Avoid creating a project on each deploy...
(UNLIKELY_OPENING_PUNCTUATION)
doc/cluster/_address.md
[uncategorized] ~6-~6: Loose punctuation mark.
Context: ... different situations, managed or not. :CLI options: --cluster-id, `--cluste...
(UNLIKELY_OPENING_PUNCTUATION)
doc/cluster/python.md
[uncategorized] ~5-~5: You might be missing the article “the” here.
Context: ...on)= # CrateDB Cluster Python API The cratedb_toolkit.ManagedCluster class provides the higher level API/SD...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
[uncategorized] ~41-~41: Loose punctuation mark.
Context: ...r parameters or environment variables. :Environment variables: `CRATEDB_CLUSTE...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 markdownlint-cli2 (0.17.2)
doc/cluster/backlog.md
39-39: Bare URL used
null
(MD034, no-bare-urls)
45-45: Bare URL used
null
(MD034, no-bare-urls)
70-70: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
🪛 GitHub Check: codecov/patch
cratedb_toolkit/cluster/cli.py
[warning] 80-82: cratedb_toolkit/cluster/cli.py#L80-L82
Added lines #L80 - L82 were not covered by tests
[warning] 93-95: cratedb_toolkit/cluster/cli.py#L93-L95
Added lines #L93 - L95 were not covered by tests
[warning] 106-108: cratedb_toolkit/cluster/cli.py#L106-L108
Added lines #L106 - L108 were not covered by tests
[warning] 111-112: cratedb_toolkit/cluster/cli.py#L111-L112
Added lines #L111 - L112 were not covered by tests
[warning] 114-114: cratedb_toolkit/cluster/cli.py#L114
Added line #L114 was not covered by tests
[warning] 119-120: cratedb_toolkit/cluster/cli.py#L119-L120
Added lines #L119 - L120 were not covered by tests
[warning] 132-132: cratedb_toolkit/cluster/cli.py#L132
Added line #L132 was not covered by tests
[warning] 134-135: cratedb_toolkit/cluster/cli.py#L134-L135
Added lines #L134 - L135 were not covered by tests
[warning] 138-138: cratedb_toolkit/cluster/cli.py#L138
Added line #L138 was not covered by tests
[warning] 150-150: cratedb_toolkit/cluster/cli.py#L150
Added line #L150 was not covered by tests
[warning] 152-153: cratedb_toolkit/cluster/cli.py#L152-L153
Added lines #L152 - L153 were not covered by tests
[warning] 156-156: cratedb_toolkit/cluster/cli.py#L156
Added line #L156 was not covered by tests
[warning] 176-176: cratedb_toolkit/cluster/cli.py#L176
Added line #L176 was not covered by tests
[warning] 178-181: cratedb_toolkit/cluster/cli.py#L178-L181
Added lines #L178 - L181 were not covered by tests
[warning] 184-187: cratedb_toolkit/cluster/cli.py#L184-L187
Added lines #L184 - L187 were not covered by tests
[warning] 193-196: cratedb_toolkit/cluster/cli.py#L193-L196
Added lines #L193 - L196 were not covered by tests
[warning] 202-207: cratedb_toolkit/cluster/cli.py#L202-L207
Added lines #L202 - L207 were not covered by tests
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: build-and-publish
- GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
- GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
- GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
- GitHub Check: CFR for OS windows-latest
- GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
- GitHub Check: CFR for OS macos-13
🔇 Additional comments (30)
doc/cluster/python.md (6)
`5-7`: Great introduction to the SDK capabilities. The introduction clearly outlines the purpose and capabilities of the `ManagedCluster` class.

🧰 Tools
🪛 LanguageTool
[uncategorized] ~5-~5: You might be missing the article “the” here.
Context: ...on)= # CrateDB Cluster Python API Thecratedb_toolkit.ManagedClusterclass provides the higher level API/SD...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
`14-19`: Clear installation instructions with multiple options. The installation section effectively recommends the modern `uv` package manager while providing alternatives.
`23-35`: Well-documented authentication options. The authentication section provides clear instructions for both interactive and headless authentication with CrateDB Cloud, including the necessary environment variables.
`54-62`: Excellent usage example with context manager. The example clearly demonstrates how to acquire a cluster handle from environment variables and run SQL queries within a context manager.
`64-71`: Good explanation of context manager behavior. The documentation clarifies the default behavior of not shutting down on exit, and shows how to change this with `stop_on_exit=True`.
`77-80`: Well-structured reference links. The reference links are properly formatted and provide useful resources for users.
cratedb_toolkit/testing/testcontainers/util.py (5)
`48-54`: Improved container port resolution with error handling. The enhanced `get_real_host_address` method now dynamically determines port attributes, raises a meaningful error when no port is found, and properly handles Docker protocol suffixes.
`141-162`: Well-implemented `DockerSkippingContainer` for graceful test skipping. The new `DockerSkippingContainer` class provides a robust way to handle Docker unavailability by converting connection exceptions to pytest skips, improving test reliability in environments without Docker.

However, note the FIXME to synchronize error detection logic with `PytestTestcontainerAdapter`. You'll need to ensure that both classes use the same error detection patterns when updating one of them.
`164-180`: Good abstraction with `PytestTestcontainerAdapter`. This adapter provides a clean interface for test fixtures that use Docker containers, with proper error handling and a well-defined lifecycle.
`181-189`: Clear documentation for the abstract `setup()` method. The docstring clearly explains the responsibilities of subclasses and the sequence of operations during initialization.
`206-211`: Simple and effective container lifecycle methods. The `start()` and `stop()` methods provide a clean interface for subclasses to manage the container lifecycle.

doc/cluster/_address.md (1)
`1-6`: Good introduction to cluster addressing options. The document clearly explains the purpose of the addressing options for different cluster scenarios.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~6-~6: Loose punctuation mark.
Context: ... different situations, managed or not. :CLI options:--cluster-id, `--cluste...(UNLIKELY_OPENING_PUNCTUATION)
tests/cluster/test_cli.py (9)
`9-15`: Good test for `CroudException` handling. The test properly verifies that the error handler logs the failure and re-raises the exception.

`17-21`: Simple test for `ClickException` passthrough. Confirms that `ClickException` is passed through without any transformation.

`23-29`: Effective test for generic exception conversion. Verifies that generic exceptions are converted to `SystemExit` with a code of 1 and properly logged.

`31-44`: Comprehensive test for the `cluster info` command. The test verifies successful execution and checks for expected output content when using a valid cloud environment.

`46-59`: Thorough verification of the cluster health command. Tests for specific expected output fields like cluster name and status.

`61-75`: Complete test for the ping command with flag verification. Checks for expected output including the cloud and database flags.

`77-89`: Good error case testing for unknown cluster names. Verifies that the command fails with the expected exit code and error message.

`91-103`: Effective test for an unconfigured standalone cluster. Confirms the command fails with the expected exit code and error message when no cluster is configured.

`105-118`: Thorough test for an unknown standalone cluster. Verifies both the logged error message and the command output for an unknown cluster name.

cratedb_toolkit/cluster/cli.py (2)
cratedb_toolkit/cluster/cli.py (2)
`82-89`: I see the context manager pattern is used as designed. Based on the retrieved learning, I understand that `handle_command_errors` is intentionally designed as an error handler for specific operations, not as a general protective wrapper.

The implementation correctly follows the design intent: the context manager provides consistent error handling for the specific operation of displaying cluster information.

`159-187`: Improve `list-jobs` implementation with YAML output support. The implementation adds support for YAML output formatting, which is a useful extension of the CLI's capabilities.

The code properly handles the format option with clear choices and defaults, and outputs in the requested format. The conditional check for `cluster.operation` is a good defensive programming practice.

🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 176-176: cratedb_toolkit/cluster/cli.py#L176
Added line #L176 was not covered by tests
[warning] 178-181: cratedb_toolkit/cluster/cli.py#L178-L181
Added lines #L178 - L181 were not covered by tests
[warning] 184-187: cratedb_toolkit/cluster/cli.py#L184-L187
Added lines #L184 - L187 were not covered by tests

cratedb_toolkit/cluster/croud.py (4)
`79-81`: Good implementation of URL validation. The `validate` method properly checks for the presence of a cluster ID before allowing URL generation, which prevents potential errors from malformed URLs.

This is a robust implementation that addresses validation concerns raised in previous reviews.

`180-183`: Command arguments passed safely, good practice. The code uses proper argument formatting, passing the argument name and value separately.

```python
arguments = ["--name", name]
```

This is the correct approach to handle command-line arguments, especially when values might contain spaces or special characters. It addresses concerns raised in previous reviews about argument concatenation safety.
`235-249`: Robust subscription selection logic. The implementation intelligently selects a subscription when only one exists, and provides helpful error messages when there are none or multiple subscriptions.
This is a well-designed approach that:
- Attempts to auto-select a subscription only if there's exactly one
- Provides clear, user-friendly error messages explaining why auto-selection failed
- Includes the list of available subscriptions when multiple exist, making it easy for users to choose
This improves usability while maintaining security by not making assumptions when choices are available.
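The selection behavior described above can be expressed compactly. A hedged sketch (the `select_subscription` function and its message wording are illustrative, not the toolkit's actual code):

```python
def select_subscription(subscriptions):
    """Auto-select a subscription only when the choice is unambiguous (sketch)."""
    if len(subscriptions) == 1:
        # Exactly one subscription: safe to auto-select.
        return subscriptions[0]
    if not subscriptions:
        raise ValueError("No subscriptions found, please create one first")
    # Multiple subscriptions: refuse to guess, list the options instead.
    names = ", ".join(sub["name"] for sub in subscriptions)
    raise ValueError(f"Multiple subscriptions found, please select one explicitly: {names}")
```

Raising with the list of candidates keeps the "don't guess when choices exist" property while telling the user exactly what to pick from.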
`274-297`: Good implementation of command arguments. The code now correctly passes command arguments as separate items rather than concatenated strings with values.

```python
arguments = [
    "--subscription-id", subscription_id,
    "--project-id", project_id,
    # ...other arguments...
]
```

This pattern avoids shell injection issues and properly handles special characters in values, addressing concerns raised in previous reviews.
cratedb_toolkit/util/setting.py (3)
`14-21`: Good use of dataclasses for configuration settings. The `Setting` dataclass provides a clean and type-safe way to encapsulate click parameters with optional group information. The `asdict()` method makes it easy to serialize settings.

`23-28`: LGTM: Simple and effective CLI argument detection. This utility function provides a straightforward way to check if any long-form options were provided on the command line.
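The described check boils down to scanning `argv` for long-form option prefixes. A minimal sketch; the `has_long_options` name and the optional `argv` parameter are assumptions made here for testability, not the toolkit's actual signature:

```python
import sys

def has_long_options(argv=None):
    """Return True if any long-form option (starting with '--') is present (sketch)."""
    args = sys.argv[1:] if argv is None else argv
    return any(arg.startswith("--") for arg in args)
```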
`76-110`:

Details
✅ Verification successful
Improved mutual exclusivity check with clear error messages
The mutual exclusivity validation is well-implemented with helpful error messages. However, there's a potential issue with the group handling logic.
Currently, it collects all settings that have a non-None group attribute, but doesn't filter by specific group values. This might lead to unrelated settings being included in the same mutual exclusivity check if they each have different group values.
Please verify if the intention is to check mutual exclusivity within the same group value:
Here's a suggested fix to group settings by their group value:
```diff
 def check_mutual_exclusiveness(
     specs: t.List[Setting], settings: t.Dict[str, str], message_none: str = None, message_multiple: str = None
 ):
     """
     Check settings for mutual exclusiveness.

     It has been inspired by click-option-group's RequiredMutuallyExclusiveOptionGroup.
     https://github.com/click-contrib/click-option-group
     """
-    parameter_names = []
-    environment_variables = []
-    values = []
-    for setting in specs:
-        if setting.group is None:
-            continue
-        if setting.click.name is None:
-            raise ValueError("Setting specification has no name")
-        parameter_names.append(setting.click.opts[0])
-        environment_variables.append(setting.click.envvar)
-        value = settings.get(setting.click.name)
-        values.append(value)
-    guidance = f"Use one of the CLI argument {parameter_names} or environment variable {environment_variables}"
-    if all(value is None for value in values):
-        if message_none is None:
-            message_none = f"One of the settings is required, but none of them have been specified. {guidance}"
-        raise ValueError(message_none)
-    if values.count(None) < len(values) - 1:
-        if message_multiple is None:
-            # Collect and format the specified settings.
-            specified = [f"{param}={val}" for param, val in zip(parameter_names, values) if val is not None]
-            message_multiple = (
-                f"The settings are mutually exclusive, but multiple were specified: {', '.join(specified)}. {guidance}"
-            )
-        raise ValueError(message_multiple)
-    return values
+    # Group settings by their group attribute
+    groups = {}
+    for setting in specs:
+        if setting.group is None:
+            continue
+        if setting.click.name is None:
+            raise ValueError("Setting specification has no name")
+        if setting.group not in groups:
+            groups[setting.group] = {'settings': [], 'names': [], 'envvars': [], 'values': []}
+
+        group_data = groups[setting.group]
+        group_data['settings'].append(setting)
+        group_data['names'].append(setting.click.opts[0])
+        group_data['envvars'].append(setting.click.envvar)
+        group_data['values'].append(settings.get(setting.click.name))
+
+    # Check each group for mutual exclusiveness
+    for group_name, group_data in groups.items():
+        parameter_names = group_data['names']
+        environment_variables = group_data['envvars']
+        values = group_data['values']
+
+        guidance = f"Use one of the CLI argument {parameter_names} or environment variable {environment_variables}"
+
+        if all(value is None for value in values):
+            if message_none is None:
+                message_none = f"One of the settings in group '{group_name}' is required, but none have been specified. {guidance}"
+            raise ValueError(message_none)
+
+        if values.count(None) < len(values) - 1:
+            if message_multiple is None:
+                specified = [f"{param}={val}" for param, val in zip(parameter_names, values) if val is not None]
+                message_multiple = (
+                    f"The settings in group '{group_name}' are mutually exclusive, but multiple were specified: {', '.join(specified)}. {guidance}"
+                )
+            raise ValueError(message_multiple)
+
+    # Return all values from all groups
+    return [val for group_data in groups.values() for val in group_data['values']]
```
🏁 Script executed:
```bash
#!/bin/bash
# Check if there are any other files in the codebase that use check_mutual_exclusiveness
# and see how they group settings
rg -A 3 "check_mutual_exclusiveness" --glob "!*/util/setting.py"
```

Length of output: 491
I’d like to verify how `settings_spec` is structured and what `Setting.group` actually represents. Let’s pull the relevant definitions and usages:
🏁 Script executed:
```bash
#!/bin/bash
set -e
echo "1) occurrences of settings_spec in cluster/core.py:"
rg "settings_spec" -n cratedb_toolkit/cluster/core.py
echo
echo "2) usage context around check_mutual_exclusiveness:"
rg -C5 "check_mutual_exclusiveness" -n cratedb_toolkit/cluster/core.py
echo
echo "3) definition of Setting in util/setting.py:"
rg -C5 "class Setting" -n cratedb_toolkit/util/setting.py
```

Length of output: 1589
I’d like to inspect how `settings_spec` is defined and what `group` values are used for each `Setting`:
🏁 Script executed:
```bash
#!/bin/bash
set -e
echo "Context around settings_spec definition in cluster/core.py:"
rg -C20 "settings_spec =" -n cratedb_toolkit/cluster/core.py
```

Length of output: 1509
To see all the group values in `settings_spec` and confirm whether multiple distinct groups exist, let’s dump the full definition:
🏁 Script executed:
```bash
#!/bin/bash
set -e
echo "Full settings_spec (lines 50–110):"
sed -n '50,110p' cratedb_toolkit/cluster/core.py
```

Length of output: 2121
No grouping issue detected; only one group in use
The `check_mutual_exclusiveness` helper only considers `Setting` entries with a non-`None` `group`, and in `ManagedClusterSettings` all such entries share the same `"cluster-identifier"` group. Ungrouped settings are ignored. As there aren’t multiple distinct group values, the current logic correctly enforces exclusivity only within the intended group and can remain as-is.
About
Wraps a few calls of `croud` into a different kind of API, mostly about managing clusters. It is in the same spirit as, and is also building upon, GH-73.

Documentation
https://cratedb-toolkit--81.org.readthedocs.build/cluster/
https://cratedb-toolkit--81.org.readthedocs.build/util/shell.html
Configuration
Configuration works by defining a few environment variables. For convenience, put them within a dotenv file (`.env`).

Python SDK
Deploy/start/resume cluster, run database workload, and suspend cluster again.
See also `examples/python/cloud_cluster.py` and `examples/python/cloud_import.py`.

CLI interface
Deploy cluster, import data, and query a few samples worth of data.
See also `examples/shell/cloud_cluster.sh` and `examples/shell/cloud_import.sh`.

References
Backlog
Trivia
Favorite poems
In the toolkit’s warren, new features abound,
With clusters managed, imports now sound.
CLI and shell, both smarter and spry,
Python and notebooks reach for the sky.
Docker skips gently if it’s not around,
While tests and examples in harmony are found.
Oh, what a hop for this code-loving bunny!
Data to the cloud—smooth, fast, and sunny!
A rabbit hopped to cloud and sky,
With toolkit new, it aimed to try:
"Start a cluster, import with flair,
Use Python, shell, or notebook there.
Settings checked, containers spun,
Tests and docs—oh what fun!
Cloud and code now leap as one! 🐇☁️
In the cloud and on the ground,
Rabbits hop where clusters are found.
With toolkit new and CLI bright,
They probe, deploy, and import with might.
From shell to Python, tests abound,
Data hops in—jobs resound!
🐇✨
🐇
A toolkit grows with clusters grand,
Cloud or local, at your command.
With settings checked and errors clear,
New guides and tests are gathered here.
Import, deploy, or query deep,
The rabbit’s promise: data you’ll keep!
Hopping forward, code refined,
Cloud and shell, now intertwined.