
SDK and CLI for CrateDB Cloud Cluster APIs #81

Merged
amotl merged 51 commits into main from amo/cloud-second on May 6, 2025
Conversation

@amotl
Member

@amotl amotl commented Nov 14, 2023

About

Wraps a few calls of croud into a different kind of API, mostly about managing clusters. It is in the same spirit as, and builds upon, GH-73.

Documentation

https://cratedb-toolkit--81.org.readthedocs.build/cluster/
https://cratedb-toolkit--81.org.readthedocs.build/util/shell.html

Configuration

Configuration works by defining a few environment variables. For convenience, put them within a dotenv file (.env).

CRATEDB_CLUSTER_NAME=Hotzenplotz
CRATEDB_USERNAME='admin'
CRATEDB_PASSWORD='H3IgNXNvQBJM3CiElOiVHuSp6CjXMCiQYhB4I9dLccVHGvvvitPSYr1vTpt4'
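The toolkit loads the dotenv file itself; as a rough illustration of the mechanism, a minimal stdlib-only stand-in for a dotenv parser could look like this (function name hypothetical; real parsers handle quoting and interpolation more carefully):

```python
import os

def load_dotenv_minimal(path: str = ".env") -> None:
    """Sketch: load KEY=VALUE pairs from a dotenv file into the environment."""
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                # Skip blank lines, comments, and malformed entries.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # Do not override variables already set in the environment.
                os.environ.setdefault(key.strip(), value.strip().strip("'\""))
    except FileNotFoundError:
        pass  # A missing .env file is not an error.

load_dotenv_minimal()
```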

Python SDK

Deploy/start/resume a cluster, run a database workload, and suspend the cluster again.

from pprint import pprint
from cratedb_toolkit import ManagedCluster

# Acquire database cluster handle, obtaining cluster identifier
# or name from the user's environment.
cluster = ManagedCluster.from_env().start()

# Run database workload.
results = cluster.query("SELECT * from sys.summits LIMIT 2;")
pprint(results)

# Suspend cluster.
cluster.suspend()

See also examples/python/cloud_cluster.py and examples/python/cloud_import.py.

CLI interface

Deploy a cluster, import data, and query a sample of the data.

ctk cluster start
ctk load table "https://cdn.crate.io/downloads/datasets/cratedb-datasets/machine-learning/timeseries/nab-machine-failure.csv"
ctk shell --command 'SELECT * FROM "nab-machine-failure" LIMIT 10;'

See also examples/shell/cloud_cluster.sh and examples/shell/cloud_import.sh.

References

Backlog

  • Software tests
  • Documentation

Trivia

Favorite poems

In the toolkit’s warren, new features abound,
With clusters managed, imports now sound.
CLI and shell, both smarter and spry,
Python and notebooks reach for the sky.
Docker skips gently if it’s not around,
While tests and examples in harmony are found.
Oh, what a hop for this code-loving bunny!
Data to the cloud—smooth, fast, and sunny!

A rabbit hopped to cloud and sky,
With toolkit new, it aimed to try:
"Start a cluster, import with flair,
Use Python, shell, or notebook there.
Settings checked, containers spun,
Tests and docs—oh what fun!
Cloud and code now leap as one!" 🐇☁️

In the cloud and on the ground,
Rabbits hop where clusters are found.
With toolkit new and CLI bright,
They probe, deploy, and import with might.
From shell to Python, tests abound,
Data hops in—jobs resound!
🐇✨

🐇
A toolkit grows with clusters grand,
Cloud or local, at your command.
With settings checked and errors clear,
New guides and tests are gathered here.
Import, deploy, or query deep,
The rabbit’s promise: data you’ll keep!
Hopping forward, code refined,
Cloud and shell, now intertwined.

Comment thread cratedb_toolkit/cluster/croud.py Outdated
@amotl amotl force-pushed the amo/cloud-second branch 3 times, most recently from 89f8c3a to f794fd9 on November 15, 2023 12:19
@amotl amotl marked this pull request as ready for review November 16, 2023 13:14
Comment thread cratedb_toolkit/cluster/croud.py Outdated
Comment on lines +146 to +154
# FIXME: Fix `croud clusters deploy`.
# It yields *two* payloads to stdout, making it
# unusable in JSON-capturing situations.
# The main advantage of the `JSONDecoder` class is that it also provides
# a `.raw_decode` method, which will ignore extra data after the end of the JSON.
# https://stackoverflow.com/a/75168292
payload = wr.invoke()
decoder = json.JSONDecoder()
# `raw_decode` returns an `(object, end_index)` tuple; keep only the object.
data, _ = decoder.raw_decode(payload)
Observation

This works around a minor flaw of croud, which emits the JSON payload twice, rendering the output unparseable.

Suggestion

croud should be changed to only output a single JSON payload to stdout when invoking croud clusters deploy.
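As a sketch of the workaround, `json.JSONDecoder.raw_decode` parses the first JSON document in a string and reports where it ends, ignoring any trailing data (such as the doubled payload; the sample payload below is made up):

```python
import json

# Simulated stdout of `croud clusters deploy`, with the payload emitted twice.
payload = '{"id": "123", "name": "Hotzenplotz"}\n{"id": "123", "name": "Hotzenplotz"}'

decoder = json.JSONDecoder()
# `raw_decode` returns the first decoded object and the index where it ended.
data, end = decoder.raw_decode(payload)
print(data["name"])  # Hotzenplotz
```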

@amotl amotl Apr 26, 2025

We submitted an upstream fix about this obstacle.

Comment thread cratedb_toolkit/io/croud.py Outdated
Comment on lines +66 to +86
def fix_job_info_table_name(self):
    """
    Adjust the fully qualified table name by adding appropriate quotes.
    Fixes a minor flaw on the upstream API.

    Currently, the API returns `testdrive.pems-1`, but that can not be
    used 1:1, because it is not properly quoted.

    So, converge the table name into `"testdrive"."pems-1"` manually, for a
    fully qualified representation.

    FIXME: Remove after upstream has fixed the flaw.
    """
    job_info = self.info
    if "destination" in job_info and "table" in job_info["destination"]:
        table = job_info["destination"]["table"]
        if '"' not in table and "." in table:
            schema, table = table.split(".")
            table = f'"{schema}"."{table}"'
        job_info["destination"]["table"] = table

Observation

This works around another minor upstream flaw.

Suggestion

The API should either return schema and table names within separate attributes (preferred), or quote the value of the existing destination.table attribute so that it can be reused without further ado.

Example

{
  "destination": {
    "table": "\"testdrive\".\"pems-1\""
  }
}
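The workaround boils down to a small, idempotent transformation; as an illustration (function name hypothetical):

```python
def quote_table_name(name: str) -> str:
    """Turn `schema.table` into `"schema"."table"`, leaving quoted names alone."""
    if '"' not in name and "." in name:
        schema, table = name.split(".", 1)
        return f'"{schema}"."{table}"'
    return name

print(quote_table_name("testdrive.pems-1"))  # "testdrive"."pems-1"
```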


We created an upstream ticket about this observation.

Comment thread cratedb_toolkit/config.py Outdated
Comment on lines +67 to +97
def configure_croud(no_spinner: bool = None, use_spinner: bool = None):
    """
    Turn off croud's Halo spinner when running in Jupyter Notebooks. It does not work well.

    - https://github.com/ManrajGrover/halo/issues/32
    - https://github.com/manrajgrover/halo/issues/179
    """
    if no_spinner or ((CONFIG.RUNNING_ON_JUPYTER or CONFIG.RUNNING_ON_PYTEST) and not use_spinner):
        mod = types.ModuleType(
            "croud.tools.spinner", "Mocking the croud.tools.spinner module, to turn off the Halo spinner"
        )
        setattr(mod, "HALO", NoopContextManager())  # noqa: B010
        sys.modules["croud.tools.spinner"] = mod


class NoopContextManager:
    """
    For making the Halo progressbar a no-op.
    """

    def __init__(self, *args, **kwargs):
        pass

    def __enter__(self):
        pass

    def __exit__(self, exc_type, exc_value, exc_traceback):
        pass

    def stop(self):
        pass
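The trick above relies on Python consulting `sys.modules` before loading a module from disk, so a pre-registered stand-in wins over the real module. A self-contained sketch of the pattern (the module name `fake.spinner` is made up for illustration):

```python
import sys
import types

class NoopContextManager:
    """A context manager that does nothing, standing in for a Halo spinner."""
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, exc_traceback):
        return False
    def stop(self):
        pass

# Register the stand-in module before anything imports the real one.
mod = types.ModuleType("fake.spinner", "No-op spinner module")
mod.HALO = NoopContextManager()
sys.modules["fake.spinner"] = mod

# Subsequent imports resolve to the injected module.
from fake.spinner import HALO
with HALO:
    HALO.stop()
```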
@amotl amotl Nov 17, 2023

Observation

We had to turn off the Halo spinner currently used in croud, because it did not work well within a Jupyter Notebook environment. We are observing exactly those issues, despite the former being officially resolved; apparently, it regressed.

Suggestion

Submit a patch to croud to only use interactivity when is_tty() is true, or similar. At the very least, don't start the Halo spinner at module scope, but initialize/configure it at runtime instead.
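A minimal sketch of the suggested runtime gating, using the stdlib `contextlib.nullcontext` as a dependency-free stand-in for the real Halo spinner (the `get_spinner` helper is hypothetical):

```python
import contextlib
import sys

def get_spinner(text: str = ""):
    """Return a spinner context manager only when attached to a terminal."""
    if sys.stdout.isatty():
        # In a real patch this would return `Halo(text=text)`; a nullcontext
        # keeps this sketch free of third-party dependencies.
        return contextlib.nullcontext()
    # Non-interactive environment (pipe, notebook, CI): no spinner at all.
    return contextlib.nullcontext()

with get_spinner("Deploying cluster"):
    pass  # long-running work would happen here
```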

Copy link
Copy Markdown
Member Author

@amotl amotl Apr 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've reported this to the issue tracker of the upstream project.

@amotl amotl force-pushed the amo/cloud-second branch 3 times, most recently from 062ee9b to 7d89748 on November 18, 2023 01:56
Comment on lines +18 to +19
# Log in to CrateDB Cloud.
croud login --idp azuread
@amotl amotl Nov 20, 2023

Observation

There is an alternative, headless way of doing that: using croud config show, you can display the croud configuration file in YAML format.

current-profile: cratedb.cloud
default-format: table
profiles:
  cratedb.cloud:
    auth-token: xxxxxxxxxx
    endpoint: https://console.cratedb.cloud
    key: REDACTED
    organization-id: null
    region: _any_
    secret: xxxxxxxxxx

If you fill in the key and secret values, obtained by running croud api-keys create to create an API key [1], operations like croud clusters list will start working without further ado, even after logging out again using croud logout.

It works well. Thanks, @proddata.

Suggestion

Based on those insights, improve the SDK correspondingly, by also accepting environment variables CRATEDB_CLOUD_API_KEY and CRATEDB_CLOUD_API_SECRET.

Footnotes

  [1] Alternatively, you can obtain an API key on your account page, in the "API keys" section. -- https://console.cratedb.cloud/account/settings


We've implemented headless/unattended operations in croud and ctk. Thanks again.

@property
def headless_config(self) -> Configuration:
    tmp_file = NamedTemporaryFile()
    tmp_path = Path(tmp_file.name)
    config = Configuration("headless.yaml", tmp_path)
    config.profile["key"] = os.environ.get("CRATEDB_CLOUD_API_KEY")
    config.profile["secret"] = os.environ.get("CRATEDB_CLOUD_API_SECRET")
    config.profile["organization-id"] = os.environ.get("CRATEDB_CLOUD_ORGANIZATION_ID")
    return config


I think the interface could be further improved by using the new JWT authentication mechanism instead of traditional credentials?

@amotl amotl Apr 26, 2025

Added the topic about using JWT to the backlog, to be processed on a later iteration.

@amotl amotl May 2, 2025

Fluent authentication using provided per-cluster JWT token has been implemented per 72d3fd9.

Comment thread examples/python/cloud_cluster.py
Comment thread tests/cluster/test_examples_python.py Outdated
amotl added 17 commits May 6, 2025 21:06
- `CRATEDB_CLOUD_CLUSTER_ID` to `CRATEDB_CLUSTER_ID`
- `CRATEDB_CLOUD_CLUSTER_NAME` to `CRATEDB_CLUSTER_NAME`
The current set of input parameters for addressing a database cluster,
`cluster_id`, `cluster_name`, `sqlalchemy_url`, and `http_url`, needed
a few bits of bundling, so they are now backed by a real model class.

Likewise, the `ManagedCluster` and `StandaloneCluster` classes now
provide the `adapter` property, which will return a suitable
`DatabaseAdapter` instance. This significantly connects the dots for a
better composite object model.

In this spirit, the `DatabaseCluster` class emerges as a new entry point
to acquire a database cluster handle universally.
... for addressing a database cluster by URL. The new option accepts
URLs in both SQLAlchemy and HTTP formats.
@amotl amotl force-pushed the amo/cloud-second branch from b77e818 to b66448e on May 6, 2025 19:08

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🔭 Outside diff range comments (1)
cratedb_toolkit/cluster/croud.py (1)

436-467: ⚠️ Potential issue

Security risk in JWT token retrieval

The get_jwt_token method uses a hardcoded fallback password for the keyring, which presents a security risk.

Remove the hardcoded fallback and either require the environment variable or generate a unique key:

-kr.keyring_key = os.getenv("CTK_KEYRING_CRYPTFILE_PASSWORD") or "TruaframEkEk"
+kr.keyring_key = os.getenv("CTK_KEYRING_CRYPTFILE_PASSWORD")
+if not kr.keyring_key:
+    # Generate a unique key based on machine-specific information
+    import uuid
+    import getpass
+    user_id = getpass.getuser()
+    machine_id = str(uuid.getnode())  # MAC address as integer
+    kr.keyring_key = str(uuid.uuid5(uuid.NAMESPACE_DNS, f"ctk-{user_id}-{machine_id}"))
+    logger.debug("Generated keyring key from machine-specific information")

This approach generates a machine-specific key that is still deterministic (same on each run on the same machine) but not hardcoded in the source code.

♻️ Duplicate comments (12)
cratedb_toolkit/cfr/marimo.py (1)

53-54: Environment variable and variable name standardized correctly

The change from CRATEDB_SQLALCHEMY_URL to CRATEDB_CLUSTER_URL aligns with the PR objective to unify cluster addressing across the codebase. The local variable sqlalchemy_url is appropriately named as it represents a SQLAlchemy connection URL.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 53-54: cratedb_toolkit/cfr/marimo.py#L53-L54
Added lines #L53 - L54 were not covered by tests

cratedb_toolkit/__init__.py (1)

14-16: Automatic side effects during import should be avoided

The call to preconfigure() during module import can lead to side effects that may break tests or cause unexpected behavior in interactive sessions. This was previously flagged in an earlier review.

Consider making initialization explicit or providing a way to disable this behavior:

  1. Expose an explicit initialize() function that users must call
  2. Add an environment variable to control whether auto-configuration happens
  3. Move the preconfigure() call to a separate function that users can call when needed
-from .config import preconfigure
+# Import but don't call directly
+from .config import preconfigure

-preconfigure()
+# Allow explicit initialization instead
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 14-14: cratedb_toolkit/__init__.py#L14
Added line #L14 was not covered by tests


[warning] 16-16: cratedb_toolkit/__init__.py#L16
Added line #L16 was not covered by tests

doc/cluster/python.md (4)

1-3: 🛠️ Refactor suggestion

Fix Markdown heading anchor syntax.

The (cluster-api-python)= reference definition does not follow standard CommonMark or MyST conventions. Combine the anchor with the heading using MyST's heading ID syntax.

-(cluster-api-python)=

-# CrateDB Cluster Python API
+# CrateDB Cluster Python API {#cluster-api-python}

42-43: 🛠️ Refactor suggestion

Improve environment variables listing format.

The current definition-list style (:Environment variables:) may not render as expected in MyST. Use a standard Markdown list format instead.

-:Environment variables:
-  `CRATEDB_CLUSTER_ID`, `CRATEDB_CLUSTER_NAME`, `CRATEDB_CLUSTER_URL`
+**Environment variables:**
+- `CRATEDB_CLUSTER_ID`
+- `CRATEDB_CLUSTER_NAME`
+- `CRATEDB_CLUSTER_URL`

45-50: 🛠️ Refactor suggestion

Use standard MyST admonition syntax for note blocks.

The :::{note} block may not render correctly. Use standard MyST admonition syntax.

-:::{note}
-- All address options are mutually exclusive.
-- The cluster identifier takes precedence over the cluster name.
-- The cluster url takes precedence over the cluster id and name.
-- Environment variables can be stored into an `.env` file in your working directory.
-:::
+```{note}
+- All address options are mutually exclusive.
+- The cluster identifier takes precedence over the cluster name.
+- The cluster url takes precedence over the cluster id and name.
+- Environment variables can be stored into an `.env` file in your working directory.
+```

73-75: 🛠️ Refactor suggestion

Use standard MyST admonition syntax for seealso blocks.

Similar to the note block above, the :::{seealso} block should use standard MyST syntax.

-:::{seealso}
-{ref}`cluster-api-tutorial` includes a full end-to-end tutorial.
-:::
+```{seealso}
+{ref}`cluster-api-tutorial` includes a full end-to-end tutorial.
+```
doc/cluster/backlog.md (1)

33-39: 🛠️ Refactor suggestion

Complete the resolution action for concurrent resume operations.

The explanation of handling concurrent cluster resume operations ends abruptly after evaluating the status without specifying what action should be taken next.

  => Evaluate `cloud.info.last_async_operation.status` = `SENT|IN_PROGRESS`.

  See also https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384.
+  => Implement a retry mechanism with exponential backoff when the cluster is already in a resume operation, or display a clear error message to the user about the need to wait.
cratedb_toolkit/cluster/cli.py (2)

145-151: Same null reference risk in stop command

Similar to previous comments, the stop command also accesses cluster.info.asdict() without checking if cluster.info is None.

Apply this fix to prevent a potential AttributeError, consistent with the recommendation for the start command:

 with handle_command_errors("stop cluster"):
     # Acquire the database cluster handle and submit the `suspend` command.
     cluster = ManagedCluster(cluster_id=cluster_id, cluster_name=cluster_name).probe().stop()
     logger.info(f"Successfully suspended cluster: {cluster}")

     # Display cluster information.
-    jd(cluster.info.asdict())
+    if cluster.info:
+        jd(cluster.info.asdict())
+    else:
+        logger.warning("No cluster information available")
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 145-145: cratedb_toolkit/cluster/cli.py#L145
Added line #L145 was not covered by tests


[warning] 147-148: cratedb_toolkit/cluster/cli.py#L147-L148
Added lines #L147 - L148 were not covered by tests


[warning] 151-151: cratedb_toolkit/cluster/cli.py#L151
Added line #L151 was not covered by tests


127-134: ⚠️ Potential issue

Null reference risk in start command

The start command accesses cluster.info.asdict() without checking if cluster.info is None, similar to the issue flagged in past reviews for the stop command.

Apply this fix to prevent a potential AttributeError:

 with handle_command_errors("start cluster"):
     # Acquire the database cluster handle and submit the `start` command.
     cluster = ManagedCluster(cluster_id=cluster_id, cluster_name=cluster_name).start()
     logger.info(f"Successfully acquired cluster: {cluster}")

     # Display cluster information.
-    jd(cluster.info.asdict())
+    if cluster.info:
+        jd(cluster.info.asdict())
+    else:
+        logger.warning("No cluster information available")
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 127-127: cratedb_toolkit/cluster/cli.py#L127
Added line #L127 was not covered by tests


[warning] 129-130: cratedb_toolkit/cluster/cli.py#L129-L130
Added lines #L129 - L130 were not covered by tests


[warning] 133-133: cratedb_toolkit/cluster/cli.py#L133
Added line #L133 was not covered by tests

cratedb_toolkit/cluster/core.py (3)

377-380: ⚠️ Potential issue

Fix type error in DataImportGuide initialization

The DataImportGuide expects a non-optional ClusterInformation, but self.info is Optional[ClusterInformation].

Add a null check before passing to DataImportGuide:

-            texts = DataImportGuide(
-                cluster_info=self.info,
-                job=cloud_job,
-            )
+            if not self.info:
+                logger.warning("Cluster information not available for detailed guidance")
+                # Create a default ClusterInformation or use another approach
+                cluster_info = ClusterInformation()
+            else:
+                cluster_info = self.info
+                
+            texts = DataImportGuide(
+                cluster_info=cluster_info,
+                job=cloud_job,
+            )

251-281: 🛠️ Refactor suggestion

Add timeout handling for asynchronous operations in acquire()

The deploy() and resume() operations are asynchronous and can take minutes to complete, yet acquire() returns immediately after invoking them and calling probe() once.

Implement a timeout mechanism to poll until the cluster is ready or a timeout is reached:

    def acquire(self) -> "ManagedCluster":
        """
        Acquire a database cluster.
        This means going through the steps of deploy and/or start, as applicable.

        - When cluster does not exist, create/deploy it.
        - When a cluster exists, but is stopped/hibernated, start/resume it.
        """
        self.probe()
        if self.exists:
            suspended = self.info.cloud.get("suspended")
            if suspended is True:
                logger.info(f"Cluster is suspended, resuming it: id={self.cluster_id}, name={self.cluster_name}")
                self.resume()
+               # Poll with timeout until the cluster is ready
+               start_time = time.time()
+               timeout = 300  # 5 minutes
+               poll_interval = 5  # seconds
+               while True:
+                   self.probe()
+                   if not self.info.cloud.get("suspended"):
+                       logger.info(f"Cluster successfully resumed: id={self.cluster_id}, name={self.cluster_name}")
+                       break
+                   if time.time() - start_time > timeout:
+                       raise OperationFailed(f"Timeout waiting for cluster to resume: id={self.cluster_id}, name={self.cluster_name}")
+                   logger.debug(f"Waiting for cluster to resume: id={self.cluster_id}, name={self.cluster_name}")
+                   time.sleep(poll_interval)
                logger.info(f"Cluster resumed: id={self.cluster_id}, name={self.cluster_name}")
-               self.probe()
            elif suspended is False:
                logger.info(f"Cluster is running: id={self.cluster_id}, name={self.cluster_name}")
            else:
                raise CroudException(f"Cluster in unknown state: {self.cluster_name}")
        else:
            logger.info(f"Cluster does not exist, deploying it: id={self.cluster_id}, name={self.cluster_name}")
            self.deploy()
+           # Poll with timeout until the cluster exists and is ready
+           start_time = time.time()
+           timeout = 600  # 10 minutes
+           poll_interval = 10  # seconds
+           while True:
+               self.probe()
+               if self.exists and self.info.ready:
+                   logger.info(f"Cluster successfully deployed: id={self.cluster_id}, name={self.cluster_name}")
+                   break
+               if time.time() - start_time > timeout:
+                   raise OperationFailed(f"Timeout waiting for cluster to deploy: id={self.cluster_id}, name={self.cluster_name}")
+               logger.debug(f"Waiting for cluster to deploy: id={self.cluster_id}, name={self.cluster_name}")
+               time.sleep(poll_interval)
            logger.info(f"Cluster deployed: id={self.cluster_id}, name={self.cluster_name}")
-           self.probe()
            if not self.exists:
                # TODO: Gather and propagate more information why the deployment failed.
                #       Capture deployment logs or status if available.
                #       Possibly use details from `last_async_operation`.
                raise CroudException(f"Deployment of cluster failed: {self.cluster_name}")
        return self

319-324: ⚠️ Potential issue

Add validation for empty operation in resume method

The resume method calls self.operation.resume() without checking if self.operation is None, which could lead to an AttributeError.

Add a validation check similar to what's in the stop method:

        if self.cluster_id is None:
            raise DatabaseAddressMissingError("Need cluster identifier to resume cluster")
        logger.info(f"Resuming CrateDB Cloud Cluster: id={self.cluster_id}, name={self.cluster_name}")
-        if self.operation:
-            self.operation.resume()
+        if not self.operation:
+            # If the operation is None, try to probe the cluster first to initialize it
+            logger.debug("Operation not initialized, probing cluster")
+            self.probe()
+            if not self.operation:
+                raise CroudException(f"Could not initialize operation for cluster: {self.cluster_id}")
+        
+        self.operation.resume()
        self.probe()
        return self
🧹 Nitpick comments (10)
cratedb_toolkit/util/setting.py (2)

40-69: Consider adding exception handling for edge cases in obtain_settings

The function handles the Click exit code exceptions but may not handle other potential exceptions like file I/O errors from init_dotenv(). Also, it returns an empty dictionary when there's an exit code of 0, which might be unexpected in some cases.

def obtain_settings(specs: t.List[Setting], prog_name: str = None) -> t.Dict[str, str]:
    """
    Employ command-line parsing at runtime, using the `click` parser.

    Obtain configuration setting from different sources, DWIM-style.
    This is the generic workhorse utility variant.

    - Command line argument, in long form. Example: `--foobar=bazqux`.
    - Positional argument on command line. Example: `bazqux`.
    - Environment variable. Example: `export FOOBAR=bazqux`.
    - Environment variable prefix. Example: `export APPNAME_FOOBAR=bazqux`.
    """
    # Load environment variables from `.env` file.
-   init_dotenv()
+   try:
+       init_dotenv()
+   except Exception as ex:
+       logger.warning(f"Failed to load environment variables from .env file: {ex}")

    # Decode settings from command-line arguments or environment variables.
    prog_name = prog_name or sys.argv[0]
    click_specs = [spec.click for spec in specs]
    command = click.Command(prog_name, params=click_specs)
    if CONFIG.RUNNING_ON_JUPYTER or CONFIG.RUNNING_ON_PYTEST:
        args = []
    else:
        args = sys.argv[1:]
    try:
        with command.make_context(prog_name, args=args) as ctx:
            return ctx.params
    except click.exceptions.Exit as ex:
        if ex.exit_code != 0:
            raise
+       logger.debug("Click parser exited with code 0")
    return {}

72-104: Enhance error messages in mutual exclusiveness check

The function correctly validates that exactly one setting among a mutually exclusive group is specified. However, the error messages could be more specific about which values were actually provided.

Consider enhancing the error messages to include the specific values that were provided for the mutually exclusive settings to make debugging easier:

def check_mutual_exclusiveness(
    specs: t.List[Setting], settings: t.Dict[str, str], message_none: str = None, message_multiple: str = None
):
    """
    Check settings for mutual exclusiveness.

    It has been inspired by click-option-group's RequiredMutuallyExclusiveOptionGroup.
    https://github.com/click-contrib/click-option-group
    """
    parameter_names = []
    environment_variables = []
    values = []
    for setting in specs:
        if setting.group is None:
            continue
        if setting.click.name is None:
            raise ValueError("Setting specification has no name")
        parameter_names.append(setting.click.opts[0])
        environment_variables.append(setting.click.envvar)
        value = settings.get(setting.click.name)
        values.append(value)
    guidance = f"Use one of the CLI argument {parameter_names} or environment variable {environment_variables}"
    if all(value is None for value in values):
        if message_none is None:
            message_none = f"One of the settings is required, but none of them have been specified. {guidance}"
        raise ValueError(message_none)
    if values.count(None) < len(values) - 1:
+       # Collect and format the specified settings
+       specified = [f"{param}={val}" for param, val in zip(parameter_names, values) if val is not None]
        if message_multiple is None:
            message_multiple = (
-               f"The settings are mutually exclusive, but multiple of them have been specified. {guidance}"
+               f"The settings are mutually exclusive, but multiple were specified: {', '.join(specified)}. {guidance}"
            )
        raise ValueError(message_multiple)
    return values
doc/cluster/_address.md (1)

1-17: Clear documentation of the cluster addressing options

The documentation clearly explains the various options for specifying cluster addresses and their precedence rules. This will be very helpful for users of the toolkit.

One minor suggestion for improved formatting consistency - consider adding a blank line between the rubric and the first paragraph to match the spacing used elsewhere in the document.

:::{rubric} Cluster address
:::
+
The program provides various options for addressing the database cluster
in different situations, managed or not.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~5-~5: Loose punctuation mark.
Context: ... different situations, managed or not. :CLI options: --cluster-id, `--cluste...

(UNLIKELY_OPENING_PUNCTUATION)

doc/cluster/python.md (1)

5-7: Add missing article "the" before "higher level API/SDK".

For proper grammar, add the missing article.

-The `cratedb_toolkit.ManagedCluster` class provides higher level API/SDK
+The `cratedb_toolkit.ManagedCluster` class provides the higher level API/SDK
 entrypoints to start/deploy/resume a database cluster, inquire information
 about it, and stop/suspend it again.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~5-~5: You might be missing the article “the” here.
Context: ...on)= # CrateDB Cluster Python API The cratedb_toolkit.ManagedCluster class provides higher level API/SDK en...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

doc/cluster/backlog.md (1)

98-98: Add missing preposition in token expiry compensation.

For better grammar, add the preposition "for" after "compensate".

-Managed: Use `keyring` for caching the JWT token, and compensate token expiry
+Managed: Use `keyring` for caching the JWT token, and compensate for token expiry
🧰 Tools
🪛 LanguageTool

[uncategorized] ~98-~98: Possible missing preposition found.
Context: ...r caching the JWT token, and compensate token expiry

(AI_EN_LECTOR_MISSING_PREPOSITION)

cratedb_toolkit/testing/testcontainers/util.py (2)

141-162: Robust Docker daemon detection with graceful test skipping

This new DockerSkippingContainer class elegantly handles scenarios where Docker is unavailable by intercepting connection exceptions and converting them to pytest skips, preventing test failures when Docker isn't running. The early _container attribute initialization also prevents destructor issues.

However, I notice a TODO about synchronizing with PytestTestcontainerAdapter. Consider creating a shared helper function for Docker error detection to maintain consistency between the two implementations:

+def is_docker_unavailable(ex: DockerException) -> bool:
+    """Check if exception indicates Docker daemon is unavailable."""
+    return any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version"))

 # In DockerSkippingContainer
 except DockerException as ex:
-    if any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version")):
+    if is_docker_unavailable(ex):
         raise pytest.skip(reason="Skipping test because Docker is not running", allow_module_level=True) from ex

 # In PytestTestcontainerAdapter 
 except DockerException as ex:
-    if any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version")):
+    if is_docker_unavailable(ex):
         raise pytest.skip(reason="Skipping test because Docker daemon is not available", allow_module_level=True) from ex

164-207: Well-designed Pytest adapter for testcontainers

The PytestTestcontainerAdapter provides a clean abstraction for handling testcontainers within Pytest fixtures. It correctly manages container lifecycle and gracefully handles Docker unavailability.

I noticed that setup() is made abstract but stop() has a default implementation. Consider documenting this asymmetry to guide implementers:

 def setup(self):
+    """
+    Override this method to initialize self.container with a DockerContainer instance.
+    This method should create the container but NOT start it, as start() will be called
+    automatically after setup() completes successfully.
+    """
     raise NotImplementedError("Must be implemented by child class")
cratedb_toolkit/util/croud.py (2)

213-241: Good custom CroudClient implementation with User-Agent header

The CroudClient class extends the base croud API client with a custom User-Agent header, which is a good practice for identifying the toolkit in API requests.

However, I noticed that the implementation of create() doesn't add any error handling for cases where environment variables might be missing. Consider adding validation to check for required credentials:

@staticmethod
def create() -> "croud.api.Client":
    """
    Canonical factory method for creating a `croud.api.Client` instance.
    """
    with headless_config():
        from croud.config import CONFIG
        
+       # Validate that either token or key/secret are available
+       has_token = CONFIG.token is not None
+       has_key_secret = CONFIG.key is not None and CONFIG.secret is not None
+       if not has_token and not has_key_secret:
+           logger.warning("Neither token nor API key/secret available. Authentication may fail.")

        return croud.api.Client(
            CONFIG.endpoint,
            token=CONFIG.token,
            on_token=CONFIG.set_current_auth_token,
            key=CONFIG.key,
            secret=CONFIG.secret,
            region=CONFIG.region,
            sudo=False,
        )

242-258: Headless config implementation could be more robust

The implementation of get_headless_config prefers the regular config file when it exists and no API key is set in the environment, and only then constructs a temporary headless configuration. This is good for flexibility, but it doesn't validate that the headless config carries the necessary credentials.

Consider adding validation to ensure the credentials are available and log a warning if they're not:

@classproperty
def get_headless_config(cls) -> Configuration:
    cfg = Configuration("croud.yaml")
    if cfg._file_path.exists() and "CRATEDB_CLOUD_API_KEY" not in os.environ:
        return cfg

    tmp_file = NamedTemporaryFile()
    tmp_path = Path(tmp_file.name)
    config = Configuration("headless.yaml", tmp_path)

    # Get credentials from the environment.
    config.profile["key"] = os.environ.get("CRATEDB_CLOUD_API_KEY")
    config.profile["secret"] = os.environ.get("CRATEDB_CLOUD_API_SECRET")
    config.profile["organization-id"] = os.environ.get("CRATEDB_CLOUD_ORGANIZATION_ID")
    # config.profile["endpoint"] = os.environ.get("CRATEDB_CLOUD_ENDPOINT")  # noqa: ERA001

+   if not config.profile["key"] or not config.profile["secret"]:
+       logger.warning(
+           "Missing required cloud credentials: CRATEDB_CLOUD_API_KEY and/or "
+           "CRATEDB_CLOUD_API_SECRET. Authentication may fail."
+       )
+   
    return config
cratedb_toolkit/cluster/croud.py (1)

211-293: Robust deployment logic with good environment variable handling

The deploy_cluster method has excellent handling of subscription selection and proper validation of required parameters. The environment variable usage is well-documented and the error messages are clear and helpful.

Consider replacing the TODO comment about documenting environment variables with a reference to actual documentation:

-# TODO: Add documentation about those environment variables.
+# Environment variables are documented in the project README or docs/configuration.md.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b77e818 and b66448e.

📒 Files selected for processing (107)
  • .github/workflows/main.yml (1 hunks)
  • .github/workflows/mongodb.yml (1 hunks)
  • CHANGES.md (2 hunks)
  • cratedb_toolkit/__init__.py (2 hunks)
  • cratedb_toolkit/adapter/rockset/cli.py (2 hunks)
  • cratedb_toolkit/adapter/rockset/server/dependencies.py (1 hunks)
  • cratedb_toolkit/api/cli.py (0 hunks)
  • cratedb_toolkit/api/guide.py (0 hunks)
  • cratedb_toolkit/api/main.py (0 hunks)
  • cratedb_toolkit/cfr/cli.py (9 hunks)
  • cratedb_toolkit/cfr/marimo.py (1 hunks)
  • cratedb_toolkit/cli.py (2 hunks)
  • cratedb_toolkit/cluster/cli.py (2 hunks)
  • cratedb_toolkit/cluster/core.py (1 hunks)
  • cratedb_toolkit/cluster/croud.py (1 hunks)
  • cratedb_toolkit/cluster/guide.py (1 hunks)
  • cratedb_toolkit/cluster/model.py (1 hunks)
  • cratedb_toolkit/cluster/util.py (0 hunks)
  • cratedb_toolkit/cmd/tail/cli.py (3 hunks)
  • cratedb_toolkit/config.py (1 hunks)
  • cratedb_toolkit/exception.py (2 hunks)
  • cratedb_toolkit/info/cli.py (6 hunks)
  • cratedb_toolkit/info/core.py (2 hunks)
  • cratedb_toolkit/info/http.py (1 hunks)
  • cratedb_toolkit/io/cli.py (4 hunks)
  • cratedb_toolkit/io/croud.py (4 hunks)
  • cratedb_toolkit/io/dynamodb/api.py (1 hunks)
  • cratedb_toolkit/io/influxdb.py (1 hunks)
  • cratedb_toolkit/io/mongodb/adapter.py (3 hunks)
  • cratedb_toolkit/io/mongodb/api.py (4 hunks)
  • cratedb_toolkit/io/processor/kinesis_lambda.py (2 hunks)
  • cratedb_toolkit/job/cli.py (0 hunks)
  • cratedb_toolkit/job/croud.py (0 hunks)
  • cratedb_toolkit/model.py (4 hunks)
  • cratedb_toolkit/option.py (1 hunks)
  • cratedb_toolkit/options.py (0 hunks)
  • cratedb_toolkit/settings/compare.py (5 hunks)
  • cratedb_toolkit/shell/cli.py (3 hunks)
  • cratedb_toolkit/testing/testcontainers/cratedb.py (5 hunks)
  • cratedb_toolkit/testing/testcontainers/influxdb2.py (2 hunks)
  • cratedb_toolkit/testing/testcontainers/minio.py (2 hunks)
  • cratedb_toolkit/testing/testcontainers/mongodb.py (2 hunks)
  • cratedb_toolkit/testing/testcontainers/util.py (4 hunks)
  • cratedb_toolkit/util/app.py (1 hunks)
  • cratedb_toolkit/util/client.py (1 hunks)
  • cratedb_toolkit/util/common.py (2 hunks)
  • cratedb_toolkit/util/crash.py (2 hunks)
  • cratedb_toolkit/util/croud.py (8 hunks)
  • cratedb_toolkit/util/database.py (6 hunks)
  • cratedb_toolkit/util/runtime.py (1 hunks)
  • cratedb_toolkit/util/setting.py (1 hunks)
  • doc/adapter/rockset.md (1 hunks)
  • doc/backlog/main.md (2 hunks)
  • doc/bugs.md (1 hunks)
  • doc/cfr/jobstats.md (1 hunks)
  • doc/cfr/systable.md (3 hunks)
  • doc/cluster/_address.md (1 hunks)
  • doc/cluster/backlog.md (1 hunks)
  • doc/cluster/cli.md (1 hunks)
  • doc/cluster/index.md (1 hunks)
  • doc/cluster/python.md (1 hunks)
  • doc/cluster/tutorial.md (1 hunks)
  • doc/conf.py (1 hunks)
  • doc/index.md (1 hunks)
  • doc/info/index.md (1 hunks)
  • doc/io/dms/standalone.md (1 hunks)
  • doc/io/dynamodb/cdc-lambda.md (4 hunks)
  • doc/io/dynamodb/cdc.md (2 hunks)
  • doc/io/dynamodb/loader.md (2 hunks)
  • doc/io/index.md (2 hunks)
  • doc/io/influxdb/loader.md (3 hunks)
  • doc/io/mongodb/cdc.md (3 hunks)
  • doc/io/mongodb/loader.md (4 hunks)
  • doc/io/mongodb/migr8.md (1 hunks)
  • doc/sandbox.md (2 hunks)
  • doc/util/index.md (1 hunks)
  • doc/util/settings.md (1 hunks)
  • doc/util/shell.md (1 hunks)
  • examples/aws/dynamodb_kinesis_lambda_oci_cratedb.py (1 hunks)
  • examples/cloud_import.py (0 hunks)
  • examples/notebook/cloud_import.ipynb (1 hunks)
  • examples/python/cloud_cluster.py (1 hunks)
  • examples/python/cloud_import.py (1 hunks)
  • examples/shell/cloud_cluster.sh (1 hunks)
  • examples/shell/cloud_import.sh (1 hunks)
  • pyproject.toml (8 hunks)
  • tests/adapter/test_rockset.py (3 hunks)
  • tests/api/test_util.py (1 hunks)
  • tests/cfr/test_info.py (1 hunks)
  • tests/cfr/test_jobstats.py (3 hunks)
  • tests/cfr/test_systable.py (6 hunks)
  • tests/cluster/test_cli.py (1 hunks)
  • tests/cluster/test_core.py (1 hunks)
  • tests/cluster/test_guide.py (1 hunks)
  • tests/cluster/test_model.py (1 hunks)
  • tests/cmd/test_tail.py (1 hunks)
  • tests/conftest.py (3 hunks)
  • tests/examples/test_python.py (1 hunks)
  • tests/examples/test_shell.py (1 hunks)
  • tests/examples/test_shell.sh (1 hunks)
  • tests/info/test_cli.py (3 hunks)
  • tests/info/test_http.py (1 hunks)
  • tests/io/dynamodb/__init__.py (0 hunks)
  • tests/io/dynamodb/conftest.py (1 hunks)
  • tests/io/dynamodb/test_cli.py (1 hunks)
  • tests/io/influxdb/conftest.py (4 hunks)
  • tests/io/influxdb/test_cli.py (3 hunks)
⛔ Files not processed due to max files limit (12)
  • tests/io/mongodb/conftest.py
  • tests/io/mongodb/test_cli.py
  • tests/io/test_import.py
  • tests/io/test_processor.py
  • tests/retention/test_cli.py
  • tests/retention/test_examples.py
  • tests/settings/test_cli.py
  • tests/shell/test_cli.py
  • tests/test_model.py
  • tests/test_shell.py
  • tests/util/database.py
  • tests/util/shunit2
💤 Files with no reviewable changes (9)
  • tests/io/dynamodb/__init__.py
  • cratedb_toolkit/api/cli.py
  • cratedb_toolkit/job/cli.py
  • cratedb_toolkit/cluster/util.py
  • cratedb_toolkit/api/guide.py
  • examples/cloud_import.py
  • cratedb_toolkit/options.py
  • cratedb_toolkit/job/croud.py
  • cratedb_toolkit/api/main.py
✅ Files skipped from review due to trivial changes (7)
  • cratedb_toolkit/io/dynamodb/api.py
  • doc/io/dynamodb/cdc.md
  • tests/cfr/test_info.py
  • tests/io/dynamodb/conftest.py
  • examples/shell/cloud_import.sh
  • cratedb_toolkit/option.py
  • doc/util/settings.md
🚧 Files skipped from review as they are similar to previous changes (74)
  • cratedb_toolkit/io/processor/kinesis_lambda.py
  • doc/io/mongodb/cdc.md
  • doc/cfr/jobstats.md
  • cratedb_toolkit/io/influxdb.py
  • doc/conf.py
  • doc/util/index.md
  • doc/adapter/rockset.md
  • doc/io/mongodb/migr8.md
  • doc/io/dynamodb/loader.md
  • examples/aws/dynamodb_kinesis_lambda_oci_cratedb.py
  • tests/cluster/test_model.py
  • examples/shell/cloud_cluster.sh
  • tests/cfr/test_jobstats.py
  • doc/io/dms/standalone.md
  • cratedb_toolkit/info/http.py
  • .github/workflows/mongodb.yml
  • tests/info/test_http.py
  • tests/cfr/test_systable.py
  • doc/cluster/cli.md
  • tests/adapter/test_rockset.py
  • doc/io/influxdb/loader.md
  • examples/python/cloud_cluster.py
  • doc/bugs.md
  • cratedb_toolkit/info/cli.py
  • tests/io/dynamodb/test_cli.py
  • tests/examples/test_python.py
  • doc/io/mongodb/loader.md
  • tests/io/influxdb/conftest.py
  • doc/cfr/systable.md
  • tests/io/influxdb/test_cli.py
  • doc/info/index.md
  • tests/cmd/test_tail.py
  • cratedb_toolkit/testing/testcontainers/influxdb2.py
  • cratedb_toolkit/cli.py
  • doc/cluster/tutorial.md
  • doc/index.md
  • tests/examples/test_shell.py
  • cratedb_toolkit/testing/testcontainers/minio.py
  • cratedb_toolkit/cmd/tail/cli.py
  • cratedb_toolkit/config.py
  • cratedb_toolkit/util/client.py
  • examples/notebook/cloud_import.ipynb
  • doc/cluster/index.md
  • cratedb_toolkit/util/common.py
  • doc/io/dynamodb/cdc-lambda.md
  • cratedb_toolkit/util/app.py
  • doc/sandbox.md
  • cratedb_toolkit/cluster/guide.py
  • cratedb_toolkit/io/mongodb/adapter.py
  • cratedb_toolkit/testing/testcontainers/mongodb.py
  • tests/info/test_cli.py
  • cratedb_toolkit/info/core.py
  • CHANGES.md
  • cratedb_toolkit/io/mongodb/api.py
  • cratedb_toolkit/util/crash.py
  • cratedb_toolkit/util/runtime.py
  • tests/cluster/test_core.py
  • tests/examples/test_shell.sh
  • cratedb_toolkit/shell/cli.py
  • tests/cluster/test_guide.py
  • cratedb_toolkit/exception.py
  • doc/io/index.md
  • .github/workflows/main.yml
  • tests/api/test_util.py
  • doc/util/shell.md
  • cratedb_toolkit/settings/compare.py
  • examples/python/cloud_import.py
  • cratedb_toolkit/testing/testcontainers/cratedb.py
  • pyproject.toml
  • cratedb_toolkit/util/database.py
  • tests/cluster/test_cli.py
  • cratedb_toolkit/model.py
  • cratedb_toolkit/cluster/model.py
  • cratedb_toolkit/io/croud.py
🧰 Additional context used
🧠 Learnings (3)
tests/conftest.py (2)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: tests/conftest.py:55-62
Timestamp: 2025-04-26T17:45:20.650Z
Learning: In the CrateDB toolkit, the container start operation is idempotent, meaning it can be safely called multiple times without causing issues. This allows for flexible container lifecycle management where both `setup()` may call `container.start()` and parent classes like `PytestTestcontainerAdapter` may also invoke `start()` methods.
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: tests/conftest.py:168-170
Timestamp: 2025-05-02T15:26:10.385Z
Learning: In the `cloud_environment` fixture in tests/conftest.py, environment variables like TEST_CRATEDB_CLOUD_API_KEY, TEST_CRATEDB_CLOUD_API_SECRET, TEST_CRATEDB_CLOUD_ORGANIZATION_ID, and TEST_CRATEDB_CLUSTER_NAME are partially mutually exclusive, meaning not all of them need to be present simultaneously for tests to work.
cratedb_toolkit/cluster/core.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/api/main.py:263-265
Timestamp: 2025-04-26T21:59:59.463Z
Learning: When handling DNS propagation delays after resource creation (like CrateDB Cloud clusters), both fixed sleeps and immediate polling have drawbacks. Fixed sleeps are brittle, but immediate polling risks negative DNS caching. A hybrid approach with a short initial sleep followed by polling with exponential backoff is preferred.
cratedb_toolkit/cfr/marimo.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/cfr/marimo.py:53-54
Timestamp: 2025-05-02T11:43:42.639Z
Learning: The submodules within `cratedb_toolkit/io` employ a different naming scheme than other parts of the codebase and retain their own conventions (e.g., may continue using `cratedb_sqlalchemy_url` instead of `sqlalchemy_url`).
🧬 Code Graph Analysis (6)
cratedb_toolkit/util/setting.py (1)
cratedb_toolkit/config.py (2)
  • RUNNING_ON_PYTEST (26-27)
  • RUNNING_ON_JUPYTER (22-23)
cratedb_toolkit/io/cli.py (3)
cratedb_toolkit/cluster/core.py (4)
  • DatabaseCluster (603-656)
  • create (623-656)
  • load_table (342-395)
  • load_table (523-600)
cratedb_toolkit/model.py (3)
  • InputOutputResource (205-212)
  • TableAddress (182-201)
  • schema (174-178)
cratedb_toolkit/cluster/model.py (1)
  • load_table (184-193)
cratedb_toolkit/testing/testcontainers/util.py (5)
tests/conftest.py (1)
  • setup (56-62)
tests/io/dynamodb/conftest.py (1)
  • setup (45-56)
tests/io/mongodb/conftest.py (1)
  • setup (38-48)
tests/io/influxdb/conftest.py (1)
  • setup (28-34)
cratedb_toolkit/testing/testcontainers/cratedb.py (2)
  • start (176-182)
  • stop (184-189)
cratedb_toolkit/__init__.py (4)
cratedb_toolkit/config.py (2)
  • preconfigure (54-65)
  • configure (35-51)
cratedb_toolkit/info/cli.py (1)
  • cluster (60-64)
cratedb_toolkit/cluster/core.py (3)
  • DatabaseCluster (603-656)
  • ManagedCluster (104-445)
  • StandaloneCluster (449-600)
cratedb_toolkit/model.py (2)
  • InputOutputResource (205-212)
  • TableAddress (182-201)
cratedb_toolkit/adapter/rockset/server/dependencies.py (2)
cratedb_toolkit/util/database.py (1)
  • DatabaseAdapter (43-421)
cratedb_toolkit/model.py (1)
  • dburi (103-107)
cratedb_toolkit/cfr/cli.py (4)
cratedb_toolkit/util/app.py (1)
  • make_cli (11-52)
cratedb_toolkit/util/data.py (1)
  • path_from_url (51-54)
cratedb_toolkit/cfr/systable.py (2)
  • SystemTableExporter (116-190)
  • SystemTableImporter (193-257)
cratedb_toolkit/model.py (4)
  • dburi (103-107)
  • DatabaseAddress (55-178)
  • from_string (63-72)
  • from_string (200-201)
🪛 GitHub Check: codecov/patch
cratedb_toolkit/cluster/core.py

[warning] 99-101: cratedb_toolkit/cluster/core.py#L99-L101
Added lines #L99 - L101 were not covered by tests


[warning] 121-128: cratedb_toolkit/cluster/core.py#L121-L128
Added lines #L121 - L128 were not covered by tests


[warning] 131-134: cratedb_toolkit/cluster/core.py#L131-L134
Added lines #L131 - L134 were not covered by tests


[warning] 138-140: cratedb_toolkit/cluster/core.py#L138-L140
Added lines #L138 - L140 were not covered by tests


[warning] 144-144: cratedb_toolkit/cluster/core.py#L144
Added line #L144 was not covered by tests


[warning] 148-148: cratedb_toolkit/cluster/core.py#L148
Added line #L148 was not covered by tests

cratedb_toolkit/cluster/cli.py

[warning] 80-82: cratedb_toolkit/cluster/cli.py#L80-L82
Added lines #L80 - L82 were not covered by tests


[warning] 93-95: cratedb_toolkit/cluster/cli.py#L93-L95
Added lines #L93 - L95 were not covered by tests


[warning] 106-109: cratedb_toolkit/cluster/cli.py#L106-L109
Added lines #L106 - L109 were not covered by tests


[warning] 114-115: cratedb_toolkit/cluster/cli.py#L114-L115
Added lines #L114 - L115 were not covered by tests


[warning] 127-127: cratedb_toolkit/cluster/cli.py#L127
Added line #L127 was not covered by tests


[warning] 129-130: cratedb_toolkit/cluster/cli.py#L129-L130
Added lines #L129 - L130 were not covered by tests


[warning] 133-133: cratedb_toolkit/cluster/cli.py#L133
Added line #L133 was not covered by tests


[warning] 145-145: cratedb_toolkit/cluster/cli.py#L145
Added line #L145 was not covered by tests


[warning] 147-148: cratedb_toolkit/cluster/cli.py#L147-L148
Added lines #L147 - L148 were not covered by tests


[warning] 151-151: cratedb_toolkit/cluster/cli.py#L151
Added line #L151 was not covered by tests


[warning] 171-171: cratedb_toolkit/cluster/cli.py#L171
Added line #L171 was not covered by tests


[warning] 173-176: cratedb_toolkit/cluster/cli.py#L173-L176
Added lines #L173 - L176 were not covered by tests


[warning] 179-182: cratedb_toolkit/cluster/cli.py#L179-L182
Added lines #L179 - L182 were not covered by tests


[warning] 188-191: cratedb_toolkit/cluster/cli.py#L188-L191
Added lines #L188 - L191 were not covered by tests


[warning] 197-202: cratedb_toolkit/cluster/cli.py#L197-L202
Added lines #L197 - L202 were not covered by tests

cratedb_toolkit/__init__.py

[warning] 14-14: cratedb_toolkit/__init__.py#L14
Added line #L14 was not covered by tests


[warning] 16-16: cratedb_toolkit/__init__.py#L16
Added line #L16 was not covered by tests


[warning] 18-20: cratedb_toolkit/__init__.py#L18-L20
Added lines #L18 - L20 were not covered by tests


[warning] 22-22: cratedb_toolkit/__init__.py#L22
Added line #L22 was not covered by tests

cratedb_toolkit/adapter/rockset/cli.py

[warning] 38-38: cratedb_toolkit/adapter/rockset/cli.py#L38
Added line #L38 was not covered by tests


[warning] 41-41: cratedb_toolkit/adapter/rockset/cli.py#L41
Added line #L41 was not covered by tests

cratedb_toolkit/adapter/rockset/server/dependencies.py

[warning] 12-13: cratedb_toolkit/adapter/rockset/server/dependencies.py#L12-L13
Added lines #L12 - L13 were not covered by tests

cratedb_toolkit/cfr/cli.py

[warning] 124-124: cratedb_toolkit/cfr/cli.py#L124
Added line #L124 was not covered by tests


[warning] 167-167: cratedb_toolkit/cfr/cli.py#L167
Added line #L167 was not covered by tests


[warning] 198-199: cratedb_toolkit/cfr/cli.py#L198-L199
Added lines #L198 - L199 were not covered by tests


[warning] 215-216: cratedb_toolkit/cfr/cli.py#L215-L216
Added lines #L215 - L216 were not covered by tests


[warning] 241-241: cratedb_toolkit/cfr/cli.py#L241
Added line #L241 was not covered by tests

cratedb_toolkit/cfr/marimo.py

[warning] 53-54: cratedb_toolkit/cfr/marimo.py#L53-L54
Added lines #L53 - L54 were not covered by tests

🪛 LanguageTool
doc/backlog/main.md

[uncategorized] ~8-~8: Loose punctuation mark.
Context: ... methods. - cratedb_toolkit/cfr/cli.py: Migrate to ctx.meta["address"], like ...

(UNLIKELY_OPENING_PUNCTUATION)


[style] ~10-~10: This expression is wordy and overused. Consider replacing it with a more concise and formal alternative.
Context: ...is will probably cause a disk spillover sooner or later on this instance. - When running the te...

(SOONER_OR_LATER)


[uncategorized] ~62-~62: The preposition ‘to’ seems more likely in this position.
Context: ... Accept alias --format {jsonl,ndjson} for --format json_row - Catch recursion ...

(AI_HYDRA_LEO_REPLACE_FOR_TO)

doc/cluster/_address.md

[uncategorized] ~5-~5: Loose punctuation mark.
Context: ... different situations, managed or not. :CLI options: --cluster-id, `--cluste...

(UNLIKELY_OPENING_PUNCTUATION)

doc/cluster/backlog.md

[uncategorized] ~63-~63: Loose punctuation mark.
Context: ...ng the block ``` - ctk cluster start: Avoid creating a project on each deploy...

(UNLIKELY_OPENING_PUNCTUATION)


[uncategorized] ~98-~98: Possible missing preposition found.
Context: ...r caching the JWT token, and compensate token expiry

(AI_EN_LECTOR_MISSING_PREPOSITION)

doc/cluster/python.md

[uncategorized] ~5-~5: You might be missing the article “the” here.
Context: ...on)= # CrateDB Cluster Python API The cratedb_toolkit.ManagedCluster class provides higher level API/SDK en...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~41-~41: Loose punctuation mark.
Context: ...r parameters or environment variables. :Environment variables: `CRATEDB_CLUSTE...

(UNLIKELY_OPENING_PUNCTUATION)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
🔇 Additional comments (31)
cratedb_toolkit/util/setting.py (3)

11-21: Good job defining a clean Setting dataclass

The Setting class effectively encapsulates a click.Parameter and an optional group identifier. The asdict() method is a handy utility for converting instances to dictionaries, which will be useful for serialization and debugging.


23-28: Straightforward utility function for command line argument detection

This function cleanly checks for the presence of long options in command-line arguments, which is useful for determining whether to prioritize CLI arguments over environment variables.
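A minimal sketch of such a detection helper; the name `has_cli_option` and the option-parsing details are assumptions for illustration, not the toolkit's actual implementation:

```python
import sys


def has_cli_option(option: str, argv=None) -> bool:
    """Return True when a long option such as `--cluster-id` was passed,
    either standalone (`--cluster-id abc`) or fused (`--cluster-id=abc`)."""
    argv = sys.argv[1:] if argv is None else argv
    return any(arg == option or arg.startswith(option + "=") for arg in argv)
```

A caller can then consult the corresponding environment variable only when the option is absent from the command line.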


30-38: Well-implemented environment variable loading with environment detection

The function appropriately skips dotenv loading when running under pytest, which prevents test environments from being affected by local .env files. The logging of the dotenv file path is also helpful for debugging.
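A sketch of the skip-under-pytest idea; the toolkit's own check sits behind its `RUNNING_ON_PYTEST` flag, so the `PYTEST_CURRENT_TEST` probe and the function name here are simplifying assumptions. `find_dotenv`/`load_dotenv` are the real python-dotenv entry points.

```python
import os


def load_dotenv_unless_testing() -> None:
    """Load settings from a `.env` file, but skip under pytest so local
    configuration does not leak into the test environment."""
    # pytest exports PYTEST_CURRENT_TEST for the duration of each test.
    if "PYTEST_CURRENT_TEST" in os.environ:
        return
    try:
        from dotenv import find_dotenv, load_dotenv  # python-dotenv
    except ImportError:
        return
    path = find_dotenv(usecwd=True)
    if path:
        load_dotenv(path)
```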

cratedb_toolkit/cfr/cli.py (2)

15-15: Updated import path reflects architectural improvements

The change from cratedb_toolkit.api.cli to cratedb_toolkit.util.app for the make_cli function aligns with the architectural refactoring to consolidate cluster addressing options and improve the structure of the codebase.


32-32:

Details

✅ Verification successful

Unified cluster addressing approach applied consistently

All instances of cratedb_sqlalchemy_url and cratedb_http_url have been replaced with a single cluster_url parameter derived from ctx.meta["cluster_url"]. This standardization aligns with the PR objective to unify cluster addressing across the codebase.

The help text examples have also been updated to reflect the new environment variable name CRATEDB_CLUSTER_URL.

The consistency of this change can be verified by checking that all occurrences have been updated:

Also applies to: 61-61, 77-79, 124-125, 167-168, 198-199, 215-216, 241-241


🏁 Script executed:

#!/bin/bash
# Check for any remaining occurrences of the old variable names
echo "Checking for remaining occurrences of cratedb_sqlalchemy_url:"
rg "cratedb_sqlalchemy_url" --type py "cratedb_toolkit/cfr/"

echo "Checking for remaining occurrences of cratedb_http_url:"
rg "cratedb_http_url" --type py "cratedb_toolkit/cfr/"

echo "Checking for new usages of cluster_url:"
rg "cluster_url" --type py "cratedb_toolkit/cfr/"

Length of output: 1320


Unified cluster_url usage verified

All references to cratedb_sqlalchemy_url and cratedb_http_url have been removed, and cluster_url is now consistently used in cratedb_toolkit/cfr/cli.py at the following locations:

  • Lines 32, 61
  • Lines 77–79
  • Lines 124–125
  • Lines 167–168
  • Lines 198–199
  • Lines 215–216
  • Line 241

No remaining occurrences of the old variables were detected. Changes are approved.

cratedb_toolkit/__init__.py (1)

18-29: Well-structured public API with __all__

Good job defining the public API explicitly using __all__. This makes it clear what parts of the package are intended for external use and helps with wildcard imports.

The inclusion of both the unified DatabaseCluster and the specific implementation classes (ManagedCluster, StandaloneCluster) gives users flexibility while encouraging the use of the factory pattern.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 18-20: cratedb_toolkit/__init__.py#L18-L20
Added lines #L18 - L20 were not covered by tests


[warning] 22-22: cratedb_toolkit/__init__.py#L22
Added line #L22 was not covered by tests

cratedb_toolkit/adapter/rockset/cli.py (1)

9-9:

Details

✅ Verification successful

Successfully refactored to use unified cluster addressing

The changes correctly replace the local decorator with the reusable option_cluster_url and update all related parameters and environment variables for consistency.

This aligns well with the PR objective of unifying cluster connection options across the codebase.

Note: The error path for missing cluster_url still lacks test coverage. While this was flagged in a previous review and marked as out of scope, consider adding tests in a future iteration to validate this critical validation path.

Also applies to: 22-22, 29-30, 34-34, 38-38, 41-41


🏁 Script executed:

#!/bin/bash
# Verify this is the only place in the file that hasn't been updated
rg -n "cratedb_sqlalchemy" --glob="**/adapter/rockset/**"

Length of output: 57


No leftover cratedb_sqlalchemy references found
Ran rg -n "cratedb_sqlalchemy" --glob="**/adapter/rockset/**" with no matches, confirming all imports have been updated.
Approve code changes.
Note: the missing cluster_url error path still lacks test coverage; consider adding tests for that validation in a future iteration.

doc/backlog/main.md (2)

3-19: Good backlog planning for cluster addressing simplification

The new "Iteration +0" section appropriately documents the planned simplifications to the CLI API by consolidating connection options and cluster addressing methods.

The note about software tests not pruning tables on CrateDB Cloud is an important issue to track, as this could lead to resource exhaustion in the test environment.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~8-~8: Loose punctuation mark.
Context: ... methods. - cratedb_toolkit/cfr/cli.py: Migrate to ctx.meta["address"], like ...

(UNLIKELY_OPENING_PUNCTUATION)


[style] ~10-~10: This expression is wordy and overused. Consider replacing it with a more concise and formal alternative.
Context: ...is will probably cause a disk spillover sooner or later on this instance. - When running the te...

(SOONER_OR_LATER)


60-60: Consistent updates to command naming and environment variables

The changes correctly update the UX command name from ctk list-jobs to ctk cluster list-jobs and update environment variable references to match the new unified addressing scheme.

These changes maintain consistency with the broader refactoring of cluster addressing throughout the codebase.

Also applies to: 65-65, 73-73

tests/conftest.py (3)

45-71: Well-implemented CrateDBFixture class.

The new CrateDBFixture class provides a clean wrapper for the CrateDB test container and database adapter. The container start operation is idempotent, so calling self.container.start() in setup() is safe even though PytestTestcontainerAdapter.run_setup() might also invoke self.start() afterward.


89-106: Well-designed environment management fixture.

The reset_environment fixture complements the existing prune_environment fixture by specifically targeting variables from ManagedClusterSettings. The code correctly guards against None envvar names before unsetting them.


140-175: Well-structured cloud environment fixtures.

The new cloud-related fixtures provide a clean way to set up environment variables for CrateDB Cloud integration tests. Based on the learnings, the environment variables here are partially mutually exclusive, meaning not all of them need to be present simultaneously for tests to work.

cratedb_toolkit/io/cli.py (4)

8-11: Good refactoring of imports for unified cluster management.

The imports have been updated to use the new DatabaseCluster abstraction and option decorators, which aligns with the broader refactoring to unify cluster management.


30-32: Improved CLI option decorators.

The CLI options have been updated to use the new unified option decorators, providing a more consistent interface across the toolkit.


60-61: Good variable renaming for clarity.

Renaming resource to source improves code readability by being more descriptive of its purpose.


64-70: Simplified cluster creation and operation logic.

Using the DatabaseCluster.create factory method centralizes cluster instantiation logic and makes the code more concise. This approach properly handles the choice between ManagedCluster and StandaloneCluster based on the provided parameters.
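The factory pattern described above can be sketched as follows. The class names come from this review; the selection logic and constructor signatures are assumptions, not the toolkit's actual code.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ManagedCluster:
    """Stand-in for the cloud-backed implementation."""
    cluster_id: str


@dataclass
class StandaloneCluster:
    """Stand-in for the plain-URL implementation."""
    cluster_url: str


def create_cluster(
    cluster_id: Optional[str] = None,
    cluster_url: Optional[str] = None,
) -> Union[ManagedCluster, StandaloneCluster]:
    """Factory sketch: a cloud identifier selects the managed variant,
    otherwise a database URL selects the standalone variant."""
    if cluster_id:
        return ManagedCluster(cluster_id=cluster_id)
    if cluster_url:
        return StandaloneCluster(cluster_url=cluster_url)
    raise ValueError("Provide either a cluster id or a cluster URL")
```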

cratedb_toolkit/testing/testcontainers/util.py (1)

48-54: Improved port handling for Docker container endpoints

The enhanced port discovery logic correctly retrieves the port attribute from either self.port or self.port_to_expose, and strips any protocol suffixes to ensure valid connection URLs. The fallback mechanism and validation are well-implemented.
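The suffix-stripping part of that logic can be sketched as below; the function name and signature are illustrative, not the toolkit's actual helper.

```python
def resolve_port(port=None, port_to_expose=None) -> int:
    """Pick the first defined port attribute and strip any Docker
    protocol suffix such as `4200/tcp` to yield a plain integer."""
    spec = port if port is not None else port_to_expose
    if spec is None:
        raise ValueError("Neither `port` nor `port_to_expose` is defined")
    return int(str(spec).split("/")[0])
```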

cratedb_toolkit/cluster/cli.py (3)

21-58: Well-documented command help functions with clear examples

The help functions provide excellent documentation with practical examples for each command, making the CLI more user-friendly.


72-83: Good pattern for CLI command implementation

The info command is well-structured with proper error handling through the context manager, and uses the new ClusterInformation.from_id_or_name method for flexible cluster identification.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 80-82: cratedb_toolkit/cluster/cli.py#L80-L82
Added lines #L80 - L82 were not covered by tests


173-182: Operation check in list_jobs is good

The code correctly checks if not cluster.operation before attempting to access the operation's methods, preventing potential null reference issues. Good defensive programming.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 173-176: cratedb_toolkit/cluster/cli.py#L173-L176
Added lines #L173 - L176 were not covered by tests


[warning] 179-182: cratedb_toolkit/cluster/cli.py#L179-L182
Added lines #L179 - L182 were not covered by tests

cratedb_toolkit/util/croud.py (6)

47-63: Good workaround for croud clusters deploy JSON parsing issue

The invoke_safedecode method correctly handles the issue with multiple JSON payloads from croud clusters deploy by using JSONDecoder.raw_decode. The fix properly unpacks the tuple returned by raw_decode to extract the JSON object.

The TODO and FIXME comments indicate this is a temporary solution. Tracking the upstream fix in croud is a good approach (as mentioned in crate/croud#564).
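The `raw_decode` technique is worth spelling out: unlike `json.loads`, it tolerates trailing data and returns a `(value, end_index)` tuple, so the first of several concatenated JSON payloads can be extracted cleanly. The helper name here is illustrative.

```python
import json


def decode_first_json(payload: str):
    """Decode only the first JSON document from a string that may
    contain multiple concatenated JSON payloads."""
    decoder = json.JSONDecoder()
    # raw_decode returns a (value, end_index) tuple instead of raising
    # on trailing data, which json.loads would do.
    obj, _end = decoder.raw_decode(payload.strip())
    return obj
```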


97-107: Good solution for removing conflicting version arguments

The remove_version_argument method effectively removes the --version argument from the parser, which is necessary to avoid conflicts with Click's version option.
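Since argparse offers no public API for removing a registered argument, such a method has to touch the private `_actions` registry. A sketch under that assumption (the exact private attributes may vary across Python versions):

```python
import argparse


def remove_argument(parser: argparse.ArgumentParser, option: str) -> None:
    """Remove a previously registered option such as `--version` so it
    cannot conflict with another CLI framework's option of the same name."""
    for action in list(parser._actions):
        if option in action.option_strings:
            parser._remove_action(action)
            for opt in action.option_strings:
                parser._option_string_actions.pop(opt, None)
            break


parser = argparse.ArgumentParser()
parser.add_argument("--version", action="version", version="1.0")
remove_argument(parser, "--version")
```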


173-178: Improved logging for croud output

The updated logging approach redirects croud's output to DEBUG level with a clear prefix, making it easier to trace croud operations in logs.


191-193: Properly using headless configuration for croud

The code correctly patches croud's global configuration with a headless configuration, allowing for non-interactive usage.


204-211: Clean context manager implementation for headless configuration

The headless_config context manager is well-designed, allowing temporary patching of the croud configuration during execution.
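The pattern can be sketched as follows, using a hypothetical `Config` object standing in for croud's global configuration:

```python
from contextlib import contextmanager

class Config:
    """Hypothetical stand-in for croud's global configuration object."""
    is_headless = False

CONFIG = Config()

@contextmanager
def headless_config():
    """Temporarily switch the configuration into headless mode,
    restoring the previous value on exit, even on error."""
    previous = CONFIG.is_headless
    CONFIG.is_headless = True
    try:
        yield CONFIG
    finally:
        CONFIG.is_headless = previous

with headless_config() as cfg:
    assert cfg.is_headless
# The previous value is restored after the block.
print(CONFIG.is_headless)
```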


282-321: Well-implemented table name quoting function

The table_fqn function handles different table name formats correctly, including schema-qualified names. The error handling for empty table names is good, and the function handles already-quoted names properly.
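The core of such a function can be sketched like this (a simplified illustration, not the exact `table_fqn` implementation; quoted identifiers that themselves contain dots would need extra handling):

```python
def table_fqn(name: str) -> str:
    """Quote a table name, handling optional schema prefixes and
    names that are already quoted."""
    if not name:
        raise ValueError("Table name must not be empty")

    def quote(part: str) -> str:
        # Strip any existing quotes before re-quoting.
        return '"' + part.strip('"') + '"'

    if "." in name:
        schema, table = name.split(".", 1)
        return f"{quote(schema)}.{quote(table)}"
    return quote(name)

print(table_fqn("doc.nab-machine-failure"))
```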

cratedb_toolkit/cluster/croud.py (4)

26-64: Well-structured URL generator for cloud root API endpoints

The CloudRootUrlGenerator class provides a clean API for generating CrateDB Cloud API URLs at the organization level. The implementation correctly handles both global and organization-specific endpoints.
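The shape of such a generator can be sketched as follows; the base URL and endpoint paths here are illustrative assumptions, not the actual API surface:

```python
class CloudRootUrlGenerator:
    """Sketch of an organization-level URL generator (illustrative paths)."""

    def __init__(self, base: str = "https://console.cratedb.cloud/api/v2",
                 organization_id: str = None):
        self.base = base.rstrip("/")
        self.organization_id = organization_id

    def subscriptions(self) -> str:
        # Global endpoint: no organization scope needed.
        return f"{self.base}/subscriptions/"

    def clusters(self) -> str:
        # Organization-scoped endpoint: validate scope first.
        if not self.organization_id:
            raise ValueError("organization_id is required for this endpoint")
        return f"{self.base}/organizations/{self.organization_id}/clusters/"

gen = CloudRootUrlGenerator(organization_id="org-123")
print(gen.clusters())
```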


95-210: Well-implemented CloudRootServices with good error handling

The CloudRootServices class provides comprehensive functionality for interacting with CrateDB Cloud API at the root level. The implementation includes proper error handling, useful logging, and good project management facilities.


295-339: CloudClusterServices with comprehensive cluster operations

The CloudClusterServices class provides a good set of operations for managing individual clusters, including info retrieval and suspension/resumption.


340-435: Well-implemented import job management

The import job management functions correctly handle both local and remote resources, with proper validation of required parameters and appropriate error handling.

cratedb_toolkit/cluster/core.py (1)

434-445: 🛠️ Refactor suggestion

JWT token expiration is not handled

The query method uses the JWT token context once obtained from probe(), but doesn't refresh it when it expires (typically after 1 hour).

Track token creation time and refresh the token when needed:

def __init__(self, ...):
    # Existing code...
    self._jwt_ctx = nullcontext()
+   self._jwt_token_created_at = 0  # Timestamp when token was created

def probe(self):
    # Existing code...
    self._jwt_ctx = jwt_token_patch(self.info.jwt.token)
+   self._jwt_token_created_at = time.time()
    # Rest of the method...

def query(self, sql: str):
    # Ensure we have cluster connection details
    if not self.info or not self.info.cloud.get("url"):
        self.probe()
+   
+   # Refresh token if it's older than 30 minutes (to be safe before the 1h expiry)
+   current_time = time.time()
+   if current_time - self._jwt_token_created_at > 1800:  # 30 minutes
+       logger.debug("Refreshing JWT token")
+       self._jwt_ctx = jwt_token_patch(self.info.jwt.token)
+       self._jwt_token_created_at = current_time
    
    with self._jwt_ctx:
        # Rest of the method...

Likely an incorrect or invalid review comment.

Comment thread cratedb_toolkit/adapter/rockset/server/dependencies.py
Comment thread cratedb_toolkit/cluster/cli.py
Comment thread cratedb_toolkit/cluster/cli.py
Comment thread cratedb_toolkit/cluster/croud.py
Comment thread cratedb_toolkit/cluster/core.py
Comment thread cratedb_toolkit/cluster/core.py
Comment thread cratedb_toolkit/cluster/core.py

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (11)
doc/cluster/python.md (4)

1-3: 🛠️ Refactor suggestion

Fix MyST syntax for document anchor.

The current anchor syntax (cluster-api-python)= is non-standard and may not work correctly with MyST Markdown.

-(cluster-api-python)=
-
-# CrateDB Cluster Python API
+# CrateDB Cluster Python API {#cluster-api-python}

42-43: 🛠️ Refactor suggestion

Improve environment variables listing format.

The current definition-list style may not render as expected in MyST Markdown.

-:Environment variables:
-  `CRATEDB_CLUSTER_ID`, `CRATEDB_CLUSTER_NAME`, `CRATEDB_CLUSTER_URL`
+**Environment variables**
+- `CRATEDB_CLUSTER_ID`
+- `CRATEDB_CLUSTER_NAME`
+- `CRATEDB_CLUSTER_URL`

45-50: 🛠️ Refactor suggestion

Update MyST admonition syntax for better compatibility.

The current :::{note} block format may not render correctly in all MyST processors.

-:::{note}
-- All address options are mutually exclusive.
-- The cluster identifier takes precedence over the cluster name.
-- The cluster url takes precedence over the cluster id and name.
-- Environment variables can be stored into an `.env` file in your working directory.
-:::
+```{note}
+- All address options are mutually exclusive.
+- The cluster identifier takes precedence over the cluster name.
+- The cluster url takes precedence over the cluster id and name.
+- Environment variables can be stored into an `.env` file in your working directory.
+```

73-75: 🛠️ Refactor suggestion

Use standard MyST admonition syntax for seealso blocks.

Similar to the note block above, the :::{seealso} block should use standard MyST syntax.

-:::{seealso}
-{ref}`cluster-api-tutorial` includes a full end-to-end tutorial.
-:::
+```{seealso}
+{ref}`cluster-api-tutorial` includes a full end-to-end tutorial.
+```
doc/cluster/backlog.md (2)

33-39: 🛠️ Refactor suggestion

Add missing resolution action for concurrent resume operations

When describing how to handle concurrent cluster resume operations, the section ends with just checking the status but doesn't specify what action to take next.

  => Evaluate `cloud.info.last_async_operation.status` = `SENT|IN_PROGRESS`.

  See also https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384.
+  => Implement a retry mechanism with exponential backoff, or provide clear error messages 
+     instructing users to wait until the current operation completes before trying again.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

39-39: Bare URL used
null

(MD034, no-bare-urls)


40-43: 🛠️ Refactor suggestion

Add test coverage task for context manager behavior

Based on static analysis hints and previous comments, you should add a specific task to test the ManagedCluster context manager behavior.

- CI: Software tests fail when invoked with the whole test suite, but work in isolation
  `test_import_managed_csv_local`
- Expose `ManagedCluster`'s `stop_on_exit` option per environment variable
+- CI: Software tests fail when invoked with the whole test suite, but work in isolation
+  `test_import_managed_csv_local`
+- Add comprehensive test coverage for `ManagedCluster` context manager behavior
+  (specifically `__enter__` and `__exit__` methods)
+- Expose `ManagedCluster`'s `stop_on_exit` option per environment variable
cratedb_toolkit/cluster/cli.py (3)

190-207: Add tests for handle_command_errors context manager

The handle_command_errors context manager lacks test coverage according to static analysis hints. Adding tests for this critical error handling component would ensure it works as expected.

#!/bin/bash
# Check for existing tests of handle_command_errors
rg -l "handle_command_errors" tests/

Please create a new test file (e.g. tests/test_cli_error_handling.py) with tests for each error path:

import pytest
from click import ClickException
from cratedb_toolkit.cluster.cli import handle_command_errors
from cratedb_toolkit.exception import CroudException

def test_croud_exception_handling(caplog):
    """Test that CroudException is logged and re-raised."""
    with pytest.raises(CroudException):
        with handle_command_errors("test operation"):
            raise CroudException("test error")
    assert "Failed to test operation" in caplog.text
    assert "test error" in caplog.text

def test_click_exception_passthrough():
    """Test that ClickException passes through unchanged."""
    with pytest.raises(ClickException):
        with handle_command_errors("test operation"):
            raise ClickException("click error")

def test_generic_exception_becomes_system_exit(caplog):
    """Test that generic exceptions cause SystemExit(1)."""
    with pytest.raises(SystemExit) as excinfo:
        with handle_command_errors("test operation"):
            raise ValueError("value error")
    assert excinfo.value.code == 1
    assert "Unexpected error" in caplog.text
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 193-196: cratedb_toolkit/cluster/cli.py#L193-L196
Added lines #L193 - L196 were not covered by tests


[warning] 202-207: cratedb_toolkit/cluster/cli.py#L202-L207
Added lines #L202 - L207 were not covered by tests


133-139: ⚠️ Potential issue

Ensure cluster.info is not None before accessing asdict() in the start command

The start command uses cluster.info.asdict() without verifying that cluster.info is not None. This could cause type errors if cluster information is unavailable.

    with handle_command_errors("start cluster"):
        # Acquire the database cluster handle and submit the `start` command.
        cluster = ManagedCluster(cluster_id=cluster_id, cluster_name=cluster_name).start()
        logger.info(f"Successfully acquired cluster: {cluster}")

        # Display cluster information.
-       jd(cluster.info.asdict())
+       if cluster.info:
+           jd(cluster.info.asdict())
+       else:
+           logger.warning("No cluster information available")
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 134-135: cratedb_toolkit/cluster/cli.py#L134-L135
Added lines #L134 - L135 were not covered by tests


[warning] 138-138: cratedb_toolkit/cluster/cli.py#L138
Added line #L138 was not covered by tests


151-156: ⚠️ Potential issue

Ensure cluster.info is not None before accessing asdict() in the stop command

The stop command uses cluster.info.asdict() without verifying that cluster.info is not None. This could cause type errors if cluster information is unavailable.

    with handle_command_errors("stop cluster"):
        # Acquire the database cluster handle and submit the `suspend` command.
        cluster = ManagedCluster(cluster_id=cluster_id, cluster_name=cluster_name).probe().stop()
        logger.info(f"Successfully suspended cluster: {cluster}")

        # Display cluster information.
-       jd(cluster.info.asdict())
+       if cluster.info:
+           jd(cluster.info.asdict())
+       else:
+           logger.warning("No cluster information available")
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 152-153: cratedb_toolkit/cluster/cli.py#L152-L153
Added lines #L152 - L153 were not covered by tests


[warning] 156-156: cratedb_toolkit/cluster/cli.py#L156
Added line #L156 was not covered by tests

cratedb_toolkit/cluster/croud.py (2)

253-258: Document environment variables in project documentation

The TODO comment about documenting environment variables should be addressed to ensure users understand the available configuration options.

How to document environment variables in Python packages?

Create a new "Environment Variables" section in your README.md or docs/configuration.md with a table listing all environment variables used in the code:

## Environment Variables

| Name                          | Description                                      | Required | Default                       | Example        |
|-------------------------------|--------------------------------------------------|----------|-------------------------------|----------------|
| `CRATEDB_CLOUD_TIER`          | CrateDB Cloud cluster tier                       | No       | `default`                     | `high-memory`  |
| `CRATEDB_CLOUD_PRODUCT`       | CrateDB Cloud product type                       | No       | `crfree`                      | `s2`           |
| `CRATEDB_CHANNEL`             | Release channel for CrateDB                      | No       | `stable`                      | `nightly`      |
| `CRATEDB_VERSION`             | CrateDB version to install                       | No       | `5.10.4`                      | `nightly`      |
| `CRATEDB_USERNAME`            | Username for cluster access                      | Yes      | N/A                           | `admin`        |
| `CRATEDB_PASSWORD`            | Password for cluster access                      | Yes      | N/A                           | `securepass123`|
| `CTK_KEYRING_CRYPTFILE_PASSWORD` | Password for keyring encryption               | No       | `TruaframEkEk`                | `custom-key`   |

After adding this documentation, remove the TODO comment from the code.


448-452: ⚠️ Potential issue

Avoid hardcoded fallback for keyring password

Using a hardcoded fallback for the keyring password presents a security risk. A more secure approach would be to generate a unique key or require the environment variable.

-        kr.keyring_key = os.getenv("CTK_KEYRING_CRYPTFILE_PASSWORD") or "TruaframEkEk"
+        kr.keyring_key = os.getenv("CTK_KEYRING_CRYPTFILE_PASSWORD")
+        if not kr.keyring_key:
+            # Generate a unique key based on user environment
+            import uuid
+            import getpass
+            user_id = getpass.getuser()
+            kr.keyring_key = str(uuid.uuid5(uuid.NAMESPACE_DNS, f"ctk-{user_id}"))
+            logger.debug("Generated unique keyring key based on user identity")

This generates a unique key based on the user's identity when the environment variable is not set, which is more secure than a hardcoded value.

🧹 Nitpick comments (11)
cratedb_toolkit/testing/testcontainers/util.py (1)

190-205: Comprehensive error handling in run_setup.

The method properly handles Docker connection issues, but duplicates the error detection logic already present in DockerSkippingContainer.

Consider extracting the common error detection patterns into a shared helper function to fulfill the "Synchronize with DockerSkippingContainer" TODO:

def is_docker_unavailable_error(ex):
    """Determine if an exception indicates Docker daemon unavailability."""
    return any(token in str(ex) for token in ("Connection aborted", "Error while fetching server API version"))

Then use this in both classes:

if is_docker_unavailable_error(ex):
    # Skip logic here
doc/cluster/backlog.md (4)

39-39: Format bare URL as proper Markdown link

The URL should be formatted as a proper Markdown link for better readability and accessibility.

-  See also https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384.
+  See also [GitHub Actions run for PR #81](https://github.com/crate/cratedb-toolkit/actions/runs/14682363239/job/41206608090?pr=81#step:5:384).
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

39-39: Bare URL used
null

(MD034, no-bare-urls)


45-45: Format bare URL as proper Markdown link

The URL should be formatted as a proper Markdown link for better readability and accessibility.

-- Update `shunit2`. -- https://github.com/kward/shunit2
+- Update `shunit2`. -- [shunit2 repository](https://github.com/kward/shunit2)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

45-45: Bare URL used
null

(MD034, no-bare-urls)


70-76: Add language identifier to fenced code block

Code blocks should include a language identifier for proper syntax highlighting.

-  ```
+  ```json
   "health": {
     "last_seen": "",
     "running_operation": "CREATE",
     "status": "UNREACHABLE"
   },

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

70-70: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

19-19: Consider future strategy for default CrateDB version

The TODO comment about using `latest` or `nightly` as default raises an important architectural question that should be addressed in a future iteration.


Consider adding this as a dedicated item to your Iteration +1 or +2 backlog. A proper implementation would require:
1. A mechanism to determine the "latest" stable version dynamically
2. Clear documentation about stability implications when using "nightly" builds
3. Environment variable configuration options for controlling this behavior

This is especially important for CI/CD pipelines that need predictable behavior.

</blockquote></details>
cratedb_toolkit/cluster/croud.py (2)

`401-417`: **Improve URL vs. local file detection logic**

The current method of detecting whether a resource URL refers to a local file or a remote URL could be improved. The check `Path(resource.url).exists()` might not work correctly for all URL types.


Your approach with `is_remote = "://" in resource.url` on line 403 is good. However, the combined condition on line 404 could be clearer by separating the checks:

```diff
-        is_remote = "://" in resource.url
-        if not is_remote and Path(resource.url).exists():
+        # Determine if this is a local file or remote URL
+        is_remote = "://" in resource.url
+        is_local_file = not is_remote and Path(resource.url).exists()
+        
+        if is_local_file:

```

This makes the logic more explicit and easier to understand.


19-20: Consider semantic versioning for default CrateDB version

The default CrateDB version is hardcoded, which might require frequent updates.

Consider using semantic version specifiers that allow for compatible updates:

-# TODO: Use `latest` CrateDB by default, or even `nightly`?
-DEFAULT_CRATEDB_VERSION = "5.10.4"
+# Default to the latest stable 5.x version
+DEFAULT_CRATEDB_VERSION = "5.10"  # Will use latest patch version

Alternatively, implement a mechanism to determine the latest stable version dynamically at runtime by querying an API endpoint or release feed.

cratedb_toolkit/util/setting.py (4)

30-38: Consider adding dotenv file override parameter

The current implementation works well, but you might want to add a parameter to allow specifying a custom dotenv file path for more flexibility.

-def init_dotenv():
+def init_dotenv(dotenv_path: str = None):
    """
    Load environment variables from `.env` file.
    
    Args:
        dotenv_path: Optional path to the dotenv file. If not provided, will try to find it automatically.
    """
    if not CONFIG.RUNNING_ON_PYTEST:
-       dotenv_file = find_dotenv()
+       dotenv_file = dotenv_path or find_dotenv()
        logger.info(f"Loading environment variables from .env file: {dotenv_file}")
        load_dotenv(dotenv_file)

40-74: Robust setting resolution with multiple sources

This function elegantly handles obtaining settings from multiple sources with appropriate environment detection for Jupyter and pytest. The docstring clearly documents the priority order and sources.

One minor suggestion is to consider returning early if in Jupyter/pytest and no args are provided, rather than creating an empty args list.

    if CONFIG.RUNNING_ON_JUPYTER or CONFIG.RUNNING_ON_PYTEST:
        args = []
+       # Early return for notebook/test environments with empty args to avoid unnecessary context creation
+       try:
+           with command.make_context(prog_name, args=args) as ctx:
+               return ctx.params
+       except click.exceptions.Exit as ex:
+           if ex.exit_code != 0:
+               raise
+       return {}
    else:
        args = sys.argv[1:]

97-98: Improve error message readability

The guidance message directly includes Python lists (parameter_names and environment_variables), which might not be the most user-friendly format. Consider formatting these lists more readably.

-   guidance = f"Use one of the CLI argument {parameter_names} or environment variable {environment_variables}"
+   formatted_params = ", ".join(f"`{param}`" for param in parameter_names)
+   formatted_envvars = ", ".join(f"`{env}`" for env in environment_variables)
+   guidance = f"Use one of the CLI arguments {formatted_params} or environment variables {formatted_envvars}"

53-58: Consider logging the exception details at debug level

When handling exceptions from init_dotenv(), consider logging the full exception details at debug level for easier troubleshooting while keeping the warning message concise.

    try:
        init_dotenv()
    except Exception as ex:
        logger.warning(f"Failed to load environment variables from .env file: {ex}")
+       logger.debug("Exception details:", exc_info=True)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f87822e and f19129c.

📒 Files selected for processing (8)
  • cratedb_toolkit/cluster/cli.py (2 hunks)
  • cratedb_toolkit/cluster/croud.py (1 hunks)
  • cratedb_toolkit/testing/testcontainers/util.py (4 hunks)
  • cratedb_toolkit/util/setting.py (1 hunks)
  • doc/cluster/_address.md (1 hunks)
  • doc/cluster/backlog.md (1 hunks)
  • doc/cluster/python.md (1 hunks)
  • tests/cluster/test_cli.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
cratedb_toolkit/cluster/cli.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#81
File: cratedb_toolkit/cluster/cli.py:82-89
Timestamp: 2025-04-27T03:07:04.375Z
Learning: In the cratedb-toolkit codebase, the `handle_command_errors` context manager is designed as an error handler for specific operations (not as a general protective wrapper). It's intentionally used to handle errors for targeted operations rather than wrapping entire function executions.
🧬 Code Graph Analysis (3)
cratedb_toolkit/util/setting.py (1)
cratedb_toolkit/config.py (2)
  • RUNNING_ON_PYTEST (26-27)
  • RUNNING_ON_JUPYTER (22-23)
tests/cluster/test_cli.py (3)
cratedb_toolkit/cluster/cli.py (2)
  • cli (65-69)
  • handle_command_errors (191-207)
cratedb_toolkit/exception.py (1)
  • CroudException (12-13)
tests/conftest.py (1)
  • cloud_environment (149-174)
cratedb_toolkit/testing/testcontainers/util.py (5)
tests/conftest.py (1)
  • setup (56-62)
tests/io/mongodb/conftest.py (1)
  • setup (38-48)
tests/io/dynamodb/conftest.py (1)
  • setup (45-56)
tests/io/influxdb/conftest.py (1)
  • setup (28-34)
cratedb_toolkit/testing/testcontainers/cratedb.py (2)
  • start (176-182)
  • stop (184-189)
🪛 LanguageTool
doc/cluster/backlog.md

[uncategorized] ~63-~63: Loose punctuation mark.
Context: ...ng the block ``` - ctk cluster start: Avoid creating a project on each deploy...

(UNLIKELY_OPENING_PUNCTUATION)

doc/cluster/_address.md

[uncategorized] ~6-~6: Loose punctuation mark.
Context: ... different situations, managed or not. :CLI options: --cluster-id, `--cluste...

(UNLIKELY_OPENING_PUNCTUATION)

doc/cluster/python.md

[uncategorized] ~5-~5: You might be missing the article “the” here.
Context: ...on)= # CrateDB Cluster Python API The cratedb_toolkit.ManagedCluster class provides the higher level API/SD...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~41-~41: Loose punctuation mark.
Context: ...r parameters or environment variables. :Environment variables: `CRATEDB_CLUSTE...

(UNLIKELY_OPENING_PUNCTUATION)

🪛 markdownlint-cli2 (0.17.2)
doc/cluster/backlog.md

39-39: Bare URL used
null

(MD034, no-bare-urls)


45-45: Bare URL used
null

(MD034, no-bare-urls)


70-70: Fenced code blocks should have a language specified
null

(MD040, fenced-code-language)

🪛 GitHub Check: codecov/patch
cratedb_toolkit/cluster/cli.py

[warning] 80-82: cratedb_toolkit/cluster/cli.py#L80-L82
Added lines #L80 - L82 were not covered by tests


[warning] 93-95: cratedb_toolkit/cluster/cli.py#L93-L95
Added lines #L93 - L95 were not covered by tests


[warning] 106-108: cratedb_toolkit/cluster/cli.py#L106-L108
Added lines #L106 - L108 were not covered by tests


[warning] 111-112: cratedb_toolkit/cluster/cli.py#L111-L112
Added lines #L111 - L112 were not covered by tests


[warning] 114-114: cratedb_toolkit/cluster/cli.py#L114
Added line #L114 was not covered by tests


[warning] 119-120: cratedb_toolkit/cluster/cli.py#L119-L120
Added lines #L119 - L120 were not covered by tests


[warning] 132-132: cratedb_toolkit/cluster/cli.py#L132
Added line #L132 was not covered by tests


[warning] 134-135: cratedb_toolkit/cluster/cli.py#L134-L135
Added lines #L134 - L135 were not covered by tests


[warning] 138-138: cratedb_toolkit/cluster/cli.py#L138
Added line #L138 was not covered by tests


[warning] 150-150: cratedb_toolkit/cluster/cli.py#L150
Added line #L150 was not covered by tests


[warning] 152-153: cratedb_toolkit/cluster/cli.py#L152-L153
Added lines #L152 - L153 were not covered by tests


[warning] 156-156: cratedb_toolkit/cluster/cli.py#L156
Added line #L156 was not covered by tests


[warning] 176-176: cratedb_toolkit/cluster/cli.py#L176
Added line #L176 was not covered by tests


[warning] 178-181: cratedb_toolkit/cluster/cli.py#L178-L181
Added lines #L178 - L181 were not covered by tests


[warning] 184-187: cratedb_toolkit/cluster/cli.py#L184-L187
Added lines #L184 - L187 were not covered by tests


[warning] 193-196: cratedb_toolkit/cluster/cli.py#L193-L196
Added lines #L193 - L196 were not covered by tests


[warning] 202-207: cratedb_toolkit/cluster/cli.py#L202-L207
Added lines #L202 - L207 were not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: build-and-publish
  • GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
  • GitHub Check: CFR for OS macos-13
🔇 Additional comments (30)
doc/cluster/python.md (6)

5-7: Great introduction to the SDK capabilities.

The introduction clearly outlines the purpose and capabilities of the ManagedCluster class.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~5-~5: You might be missing the article “the” here.
Context: ...on)= # CrateDB Cluster Python API The cratedb_toolkit.ManagedCluster class provides the higher level API/SD...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


14-19: Clear installation instructions with multiple options.

The installation section effectively recommends the modern uv package manager while providing alternatives.


23-35: Well-documented authentication options.

The authentication section provides clear instructions for both interactive and headless authentication with CrateDB Cloud, including the necessary environment variables.


54-62: Excellent usage example with context manager.

The example clearly demonstrates how to acquire a cluster handle from environment variables and run SQL queries within a context manager.


64-71: Good explanation of context manager behavior.

The documentation clarifies the default behavior of not shutting down on exit and shows how to change this with stop_on_exit=True.


77-80: Well-structured reference links.

The reference links are properly formatted and provide useful resources for users.

cratedb_toolkit/testing/testcontainers/util.py (5)

48-54: Improved container port resolution with error handling.

The enhanced get_real_host_address method now dynamically determines port attributes, raises a meaningful error when no port is found, and properly handles Docker protocol suffixes.


141-162: Well-implemented DockerSkippingContainer for graceful test skipping.

The new DockerSkippingContainer class provides a robust way to handle Docker unavailability by converting connection exceptions to pytest skips, improving test reliability in environments without Docker.

However, note the FIXME to synchronize error detection logic with PytestTestcontainerAdapter.

You'll need to ensure that both classes use the same error detection patterns when updating one of them.


164-180: Good abstraction with PytestTestcontainerAdapter.

This adapter provides a clean interface for test fixtures that use Docker containers, with proper error handling and a well-defined lifecycle.


181-189: Clear documentation for the abstract setup() method.

The docstring clearly explains the responsibilities of subclasses and the sequence of operations during initialization.


206-211: Simple and effective container lifecycle methods.

The start() and stop() methods provide a clean interface for subclasses to manage container lifecycle.

doc/cluster/_address.md (1)

1-6: Good introduction to cluster addressing options.

The document clearly explains the purpose of the addressing options for different cluster scenarios.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~6-~6: Loose punctuation mark.
Context: ... different situations, managed or not. :CLI options: --cluster-id, `--cluste...

(UNLIKELY_OPENING_PUNCTUATION)

tests/cluster/test_cli.py (9)

9-15: Good test for CroudException handling.

The test properly verifies that the error handler logs the failure and re-raises the exception.


17-21: Simple test for ClickException passthrough.

Confirms that ClickException is passed through without any transformation.


23-29: Effective test for generic exception conversion.

Verifies that generic exceptions are converted to SystemExit with a code of 1 and properly logged.


31-44: Comprehensive test for cluster info command.

The test verifies successful execution and checks for expected output content when using a valid cloud environment.


46-59: Thorough verification of cluster health command.

Tests for specific expected output fields like cluster name and status.


61-75: Complete test for ping command with flag verification.

Checks for expected output including the cloud and database flags.


77-89: Good error case testing for unknown cluster names.

Verifies that the command fails with the expected exit code and error message.


91-103: Effective test for unconfigured standalone cluster.

Confirms the command fails with the expected exit code and error message when no cluster is configured.


105-118: Thorough test for unknown standalone cluster.

Verifies both the logged error message and the command output for an unknown cluster name.

cratedb_toolkit/cluster/cli.py (2)

82-89: I see the context manager pattern is used as designed

Based on the retrieved learning, I understand that handle_command_errors is intentionally designed as an error handler for specific operations, not as a general protective wrapper.

The implementation correctly follows the design intent where the context manager provides consistent error handling for the specific operation of displaying cluster information.


159-187: Improve list-jobs implementation with YAML output support

The implementation adds support for YAML output formatting, which is a useful extension of the CLI's capabilities.

The code properly handles the format option with clear choices and defaults, and outputs in the requested format. The conditional check for cluster.operation is a good defensive programming practice.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 176-176: cratedb_toolkit/cluster/cli.py#L176
Added line #L176 was not covered by tests


[warning] 178-181: cratedb_toolkit/cluster/cli.py#L178-L181
Added lines #L178 - L181 were not covered by tests


[warning] 184-187: cratedb_toolkit/cluster/cli.py#L184-L187
Added lines #L184 - L187 were not covered by tests

cratedb_toolkit/cluster/croud.py (4)

79-81: Good implementation of URL validation

The validate method properly checks for the presence of a cluster ID before allowing URL generation, which prevents potential errors from malformed URLs.

This is a robust implementation that addresses validation concerns raised in previous reviews.


180-183: Command arguments passed safely - good practice

The code uses proper argument formatting, passing the argument name and value separately.

arguments = ["--name", name]

This is the correct approach for handling command-line arguments, especially when values might contain spaces or special characters. It addresses concerns raised in previous reviews about argument concatenation safety.
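A quick demonstration of why list-form arguments are safer: the hostile value stays a single argument instead of being interpreted by a shell. This sketch uses the Python interpreter itself as a portable echo:

```python
import subprocess
import sys

name = "my cluster; rm -rf /"  # hostile value, stays one argument
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", "--name", name],
    capture_output=True,
    text=True,
)
# The child process receives exactly two arguments; nothing is shell-parsed.
print(result.stdout.strip())
```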


235-249: Robust subscription selection logic

The implementation intelligently selects a subscription when only one exists and provides helpful error messages when there are none or multiple subscriptions.

This is a well-designed approach that:

  1. Attempts to auto-select a subscription only if there's exactly one
  2. Provides clear, user-friendly error messages explaining why auto-selection failed
  3. Includes the list of available subscriptions when multiple exist, making it easy for users to choose

This improves usability while maintaining security by not making assumptions when choices are available.
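The selection logic can be sketched as a small pure function. Names and the record shape (`id`/`name` keys) are assumptions for illustration, not the toolkit's actual data model:

```python
def select_subscription(subscriptions):
    """Auto-select a subscription only when the choice is unambiguous."""
    if len(subscriptions) == 1:
        return subscriptions[0]["id"]
    if not subscriptions:
        raise ValueError("No subscriptions found; please create one first")
    # Multiple candidates: refuse to guess, but list the options to help the user.
    listing = ", ".join(f'{s["name"]} ({s["id"]})' for s in subscriptions)
    raise ValueError(
        f"Multiple subscriptions found, unable to auto-select one: {listing}. "
        "Please specify a subscription id explicitly."
    )
```

The key design choice is that ambiguity is an error, not a heuristic: the function never picks silently when more than one option exists.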


274-297: Good implementation of command arguments

The code now correctly passes command arguments as separate items rather than concatenated strings with values.

arguments=[
    "--subscription-id",
    subscription_id,
    "--project-id",
    project_id,
    # ...other arguments...
]

This pattern avoids shell injection issues and properly handles special characters in values, addressing concerns raised in previous reviews.
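The safety property can be demonstrated with a small standalone experiment: list-form argv is handed to the child process verbatim, without any shell interpretation, so a value containing spaces and metacharacters arrives as a single intact argument.

```python
import subprocess
import sys

# A value containing spaces and shell metacharacters.
name = "my cluster; echo pwned"

# The child simply echoes its argument vector back, proving the value
# was neither split on whitespace nor interpreted by a shell.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", "--name", name],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # → ['--name', 'my cluster; echo pwned']
```

The same value passed through `shell=True` with string concatenation would instead be split and partially executed, which is exactly the failure mode the reviewed code avoids.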

cratedb_toolkit/util/setting.py (3)

14-21: Good use of dataclasses for configuration settings

The Setting dataclass provides a clean and type-safe way to encapsulate click parameters with optional group information. The asdict() method makes it easy to serialize settings.
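The shape of such a dataclass might look like the following sketch. In the real code the `click` field holds a `click.Option` instance; here a plain dict keeps the example dependency-free, and the names are illustrative:

```python
import dataclasses
import typing as t


@dataclasses.dataclass
class Setting:
    """Encapsulate a CLI parameter spec with an optional mutual-exclusivity group."""

    click: t.Any                     # a click.Option in the real code
    group: t.Optional[str] = None    # settings sharing a group are mutually exclusive

    def asdict(self) -> t.Dict[str, t.Any]:
        # dataclasses.asdict recurses into nested containers, yielding
        # a plain serializable dictionary.
        return dataclasses.asdict(self)
```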


23-28: LGTM: Simple and effective CLI argument detection

This utility function provides a straightforward way to check if any long-form options were provided on the command line.
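A function of this kind can be as small as the following sketch; the name is hypothetical, not necessarily the one used in `util/setting.py`:

```python
import sys


def has_long_options(argv=None):
    """Return True when any long-form option ("--foo") appears in the argument vector."""
    argv = sys.argv[1:] if argv is None else argv
    return any(arg.startswith("--") for arg in argv)
```

A typical use is deciding whether to fall back to environment variables: only when no long-form options were given on the command line.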


76-110: Improved mutual exclusivity check with clear error messages

✅ Verification successful

The mutual exclusivity validation is well-implemented with helpful error messages. However, there's a potential issue with the group handling logic.

Currently, it collects all settings that have a non-None group attribute, but doesn't filter by specific group values. This might lead to unrelated settings being included in the same mutual exclusivity check if they each have different group values.

Please verify if the intention is to check mutual exclusivity within the same group value:

Here's a suggested fix to group settings by their group value:

def check_mutual_exclusiveness(
    specs: t.List[Setting], settings: t.Dict[str, str], message_none: str = None, message_multiple: str = None
):
    """
    Check settings for mutual exclusiveness.

    It has been inspired by click-option-group's RequiredMutuallyExclusiveOptionGroup.
    https://github.com/click-contrib/click-option-group
    """
-   parameter_names = []
-   environment_variables = []
-   values = []
-   for setting in specs:
-       if setting.group is None:
-           continue
-       if setting.click.name is None:
-           raise ValueError("Setting specification has no name")
-       parameter_names.append(setting.click.opts[0])
-       environment_variables.append(setting.click.envvar)
-       value = settings.get(setting.click.name)
-       values.append(value)
-   guidance = f"Use one of the CLI argument {parameter_names} or environment variable {environment_variables}"
-   if all(value is None for value in values):
-       if message_none is None:
-           message_none = f"One of the settings is required, but none of them have been specified. {guidance}"
-       raise ValueError(message_none)
-   if values.count(None) < len(values) - 1:
-       if message_multiple is None:
-           # Collect and format the specified settings.
-           specified = [f"{param}={val}" for param, val in zip(parameter_names, values) if val is not None]
-           message_multiple = (
-               f"The settings are mutually exclusive, but multiple were specified: {', '.join(specified)}. {guidance}"
-           )
-       raise ValueError(message_multiple)
-   return values
+   # Group settings by their group attribute
+   groups = {}
+   for setting in specs:
+       if setting.group is None:
+           continue
+       if setting.click.name is None:
+           raise ValueError("Setting specification has no name")
+       if setting.group not in groups:
+           groups[setting.group] = {'settings': [], 'names': [], 'envvars': [], 'values': []}
+       
+       group_data = groups[setting.group]
+       group_data['settings'].append(setting)
+       group_data['names'].append(setting.click.opts[0])
+       group_data['envvars'].append(setting.click.envvar)
+       group_data['values'].append(settings.get(setting.click.name))
+   
+   # Check each group for mutual exclusiveness
+   for group_name, group_data in groups.items():
+       parameter_names = group_data['names']
+       environment_variables = group_data['envvars']
+       values = group_data['values']
+       
+       guidance = f"Use one of the CLI argument {parameter_names} or environment variable {environment_variables}"
+       
+       if all(value is None for value in values):
+           if message_none is None:
+               message_none = f"One of the settings in group '{group_name}' is required, but none have been specified. {guidance}"
+           raise ValueError(message_none)
+           
+       if values.count(None) < len(values) - 1:
+           if message_multiple is None:
+               specified = [f"{param}={val}" for param, val in zip(parameter_names, values) if val is not None]
+               message_multiple = (
+                   f"The settings in group '{group_name}' are mutually exclusive, but multiple were specified: {', '.join(specified)}. {guidance}"
+               )
+           raise ValueError(message_multiple)
+   
+   # Return all values from all groups
+   return [val for group_data in groups.values() for val in group_data['values']]

🏁 Script executed:

#!/bin/bash
# Check if there are any other files in the codebase that use check_mutual_exclusiveness
# and see how they group settings

rg -A 3 "check_mutual_exclusiveness" --glob "!*/util/setting.py"

Length of output: 491


I’d like to verify how settings_spec is structured and what Setting.group actually represents. Let’s pull the relevant definitions and usages:


🏁 Script executed:

#!/bin/bash
set -e

echo "1) occurrences of settings_spec in cluster/core.py:"
rg "settings_spec" -n cratedb_toolkit/cluster/core.py

echo
echo "2) usage context around check_mutual_exclusiveness:"
rg -C5 "check_mutual_exclusiveness" -n cratedb_toolkit/cluster/core.py

echo
echo "3) definition of Setting in util/setting.py:"
rg -C5 "class Setting" -n cratedb_toolkit/util/setting.py

Length of output: 1589


I’d like to inspect how settings_spec is defined and what group values are used for each Setting:


🏁 Script executed:

#!/bin/bash
set -e
echo "Context around settings_spec definition in cluster/core.py:"
rg -C20 "settings_spec =" -n cratedb_toolkit/cluster/core.py

Length of output: 1509


To see all the group values in settings_spec and confirm whether multiple distinct groups exist, let’s dump the full definition:


🏁 Script executed:

#!/bin/bash
set -e

echo "Full settings_spec (lines 50–110):"
sed -n '50,110p' cratedb_toolkit/cluster/core.py

Length of output: 2121


No grouping issue detected; only one group in use

The check_mutual_exclusiveness helper only considers Setting entries with a non-None group, and in ManagedClusterSettings all such entries share the same "cluster-identifier" group. Ungrouped settings are ignored. As there aren’t multiple distinct group values, the current logic correctly enforces exclusivity only within the intended group and can remain as-is.

Comment thread doc/cluster/_address.md
@amotl amotl marked this pull request as ready for review May 6, 2025 22:46
@amotl amotl merged commit 8876fb1 into main May 6, 2025
43 checks passed
@amotl amotl deleted the amo/cloud-second branch May 6, 2025 22:46