- Kinesis: Added `ctk kinesis` CLI group with `list-checkpoints` and `prune-checkpoints` commands for checkpoint table maintenance
- Dependencies: Permitted installation of click 8.3
- I/O: API improvements: `ctk {load,save} table` became `ctk {load,save}`. Also, the `--cluster-url` option is now optional, which makes the interface much more concise.
- I/O: Refactored `ctk.io` subsystem
- I/O: Refurbished documentation of the I/O subsystem
- I/O: Started using ingestr for loading data from Amazon Kinesis
- I/O: Started propagating `start_date` parameter to ingestr's `interval_start`
- I/O: Started propagating `batch_size` parameter to ingestr's `page_size`
- Packaging: Renamed extras `all` → `full`, `io-ingestr` → `io-ingest`, and `io-all` → `io-curated`
- I/O: Added adapter for Apache Iceberg tables
- I/O: Added adapter for Delta Lake tables
- Retention: Validated integration with Azure blob storage using Azurite
- Fixed missing `pandas` package that prevented installation/execution without extras
- Settings: Stop flagging `gateway.recover_after_time` as a difference when both `gateway.expected_nodes` and `gateway.expected_data_nodes` are unset (`-1`).
- PyMongo adapter: Adapted to CrateDB 6.2+ which disallows writes to `_id` columns
- Dependencies: Permitted installation of pyarrow 23
- Dependencies: Permitted installation of duckdb 1.x
- Dependencies: Permitted installation of pandas 3.0
- CI: Validated against Python 3.14
- CI: Validated against InfluxDB 2.8
- CI: Validated against MongoDB 8
- I/O: Updated to `influxio-0.6.0`. Thanks, @ZillKhan.
  - The target table is now the measurement name when importing ILP files.
  - The InfluxDB source URL accepts a `timeout` query parameter (seconds) to configure the network timeout when talking to the InfluxDB API.
  - For ILP imports, the CrateDB URL no longer needs a table component; you can point it at the schema only (the measurement determines the table).
- I/O: Fixed MongoDB CDC invocation. Thanks, Mỹ Duyên.
- OCI: Started producing image `ghcr.io/crate/cratedb-toolkit-ingest`
- I/O: Added drivers for ODBC and Oracle to `cratedb-toolkit-ingest`
- I/O: Updated BSON library to support ARM64
- I/O: Updated to `ingestr>=0.13.61`
- CFR: Improved log output
- CFR: Fixed double quoting of table name. Thanks, @karynzv.
- CFR: When importing, started using `replace` strategy instead of `append`
- CFR: Improved importing data re. type mapping without NumPy
- CFR: Truncated target table before importing, using `append` strategy again, because `replace` doesn't do the right DDL.
- I/O: Tuned down ingestr, it masked native I/O adapters
- Settings: Fixed comparison of `0s` vs `0ms`. Thanks, @hlcianfagna.
- DMS: Provided a recipe file to relay primary key and column type map information
- DMS: Provided a recipe option to ignore processing DMS control DDL events
- DMS: Started using the "direct" column mapping by default, retaining the "universal" column mapping optionally.
- Dependencies: Updated to `commons-codec>=0.0.23`
- I/O: Adapter for PostgreSQL full-load using ingestr
- I/O: Added documentation about ingestr adapter
- Dependencies: Migrated from `zyp` to `tikray`. It's effectively the same, but provided using a dedicated package now
- Dependencies: Updated to `croud-1.14`
- Dependencies: Updated to `async-kinesis-2.0.0`. Thanks, @hampsterx.
- CDC: Added canonical SQL example for PostgreSQL from Ibis
- CDC: Enabled loading DMS events from Kinesis streams and stream-dump files
- CDC: Added subcommand `ctk dms table-mappings`
- Added lost `pytest` dependencies to `cratedb-toolkit[testing]`
- Downgraded to sqlalchemy-cratedb 0.41, version 0.42 is not GA yet
- CFR: Enhanced job statistics with optional reporting database support. Thanks, @WalBeh.
- Settings: Added settings comparison utility. Thanks, @WalBeh.
- Meta: Added parser for `https://cratedb.com/releases.json` file. Thanks, @WalBeh.
- CFR: Added the ability to anonymize queries recorded by `collect`
- Cloud API: SDK and CLI for CrateDB Cloud Cluster and Import APIs. Supports headless/unattended operations on CrateDB Cloud clusters, covering deploy/start/resume and data import procedures using fluent API and CLI.
- Cloud API: Added JWT authentication to client API and `ctk shell`.
- Cloud API: Added `health` and `ping` subcommands to `ctk cluster`
- CLI: Downgraded to Click 8.1, as the code is not compatible with 8.2 yet
Breaking changes
Naming things for CLI options and environment variables:
- Converged `--cratedb-sqlalchemy-url` vs. `--cratedb-http-url` options into single `--cluster-url`
- Converged `CRATEDB_SQLALCHEMY_URL` vs. `CRATEDB_HTTP_URL` env vars into single `CRATEDB_CLUSTER_URL`
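
For wrapper scripts that still export the old variables, the renaming above could be bridged with a small helper like the following. This is only a sketch: `resolve_cluster_url` is a hypothetical function, not part of the toolkit, and the fallback to the legacy names is an assumption about what a migration shim might do, not behaviour the toolkit provides.

```python
import os
import warnings

# Variable names are taken from the breaking-change entries above.
NEW_ENV_VAR = "CRATEDB_CLUSTER_URL"
LEGACY_ENV_VARS = ("CRATEDB_SQLALCHEMY_URL", "CRATEDB_HTTP_URL")


def resolve_cluster_url(environ=None):
    """Hypothetical helper: prefer the new variable, fall back to legacy ones."""
    environ = os.environ if environ is None else environ
    if environ.get(NEW_ENV_VAR):
        return environ[NEW_ENV_VAR]
    for name in LEGACY_ENV_VARS:
        value = environ.get(name)
        if value:
            # Nudge the caller towards the converged variable name.
            warnings.warn(
                f"{name} is deprecated, use {NEW_ENV_VAR} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return value
    return None
```

A script would call `resolve_cluster_url()` once at startup and export the result as `CRATEDB_CLUSTER_URL` before invoking `ctk`.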
- MCP: Add subsystem providing a few server and client utilities through the `ctk query mcp {list,inquire,launch}` subcommands.
- Docs API: Added extractors for CrateDB functions and settings
- Connect: Respect `sslmode` URI parameter when converting SQLAlchemy connection URLs to `http(s)://`
- Fixed connectivity for `jobstats collect`
- Refactored code and improved CLI interface of `ctk info` vs. `ctk cfr`
- Dependencies: Updated to `crate-2.0.0`, which uses `orjson` for JSON marshalling
- CFR: Job statistics and slow-query exploration per Marimo notebook. Thanks, @WalBeh.
- Dependencies: Minimize dependencies of core installation, defer `polars` to `cratedb-toolkit[io]`.
- Fixed `ctk cfr info record` about too large values of `ulimit_hard`
- Improved `ctk shell` to also talk to CrateDB standalone databases
- Added basic utility command `ctk tail`, for tailing a database table, and optionally following the tail
- Table Loader: Added capability to load InfluxDB Line Protocol (ILP) files
- Query Collector: Now respects `CRATEDB_CLUSTER_URL` environment variable
- MongoDB: Added Zyp transformations to the CDC subsystem, making it more symmetric to the full-load procedure.
- Query Converter: Added very basic expression converter utility with CLI interface
- DynamoDB: Added query expression converter for relocating object references, to support query migrations after the breaking change with the SQL DDL schema, by v0.0.27.
- IO: Improved `BulkProcessor` when running per-record operations by also checking `rowcount` for handling `INSERT OK, 0 rows` responses
- MongoDB: Fixed BSON decoding of `{"$date": 1180690093000}` timestamps by updating to commons-codec 0.0.21.
- Testcontainers: Don't always pull the OCI image before starting. It is unfortunate in disconnected situations.
- MongoDB: Updated to pymongo 4.9
- DynamoDB: Change CrateDB data model to use (`pk`, `data`, `aux`) columns. Attention: This is a breaking change.
- MongoDB: Configure `MongoDBCrateDBConverter` after updating to commons-codec 0.0.18
- DynamoDB CDC: Fix `MODIFY` operation to also propagate deleted attributes
- Table Loader: Improved conditional handling of "transformation" parameter
- Table Loader: Improved status reporting and error logging in `BulkProcessor`
- MongoDB: Improve error reporting
- MongoDB Full: Polars' `read_ndjson` doesn't load MongoDB JSON data well, use `fsspec` and `orjson` instead
- MongoDB Full: Improved initialization of transformation subsystem
- MongoDB Adapter: Improved performance when computing collection cardinality by using `collection.estimated_document_count()`
- MongoDB Full: Optionally use `limit` parameter as number of total records
- MongoDB Adapter: Evaluate `_id` filter field by upcasting to `bson.ObjectId`, to convey a filter that makes `ctk load table` process a single document, identified by its OID
- MongoDB Dependencies: Update to commons-codec 0.0.17
- MongoDB Full: Refactor transformation subsystem to `commons-codec`
- MongoDB: Update to commons-codec v0.0.16
- MongoDB: Unlock processing multiple collections, either from server database, or from filesystem directory
- MongoDB: Unlock processing JSON files from HTTP resource, using `https+bson://`
- MongoDB: Optionally filter server collection using MongoDB query expression
- MongoDB: Improve error handling wrt. bulk operations vs. usability
- DynamoDB CDC: Add `ctk load table` interface for processing CDC events
- DynamoDB CDC: Accept a few more options for the Kinesis Stream: batch-size, create, create-shards, start, seqno, idle-sleep, buffer-time
- DynamoDB Full: Improve error handling wrt. bulk operations vs. usability
- MongoDB: Rename columns with leading underscores to use double leading underscores
- MongoDB: Add support for UUID types
- MongoDB: Improve reading timestamps in previous BSON formats
- MongoDB: Fix processing empty arrays/lists. By default, assume `TEXT` as inner type.
- MongoDB: For `ctk load table`, use "partial" scan for inferring the collection schema, based on the first 10,000 documents.
- MongoDB: Skip leaking `UNKNOWN` fields into SQL DDL. This means relevant column definitions will not be included into the SQL DDL.
- MongoDB: Make `ctk load table` use the `data OBJECT(DYNAMIC)` mapping strategy.
- MongoDB: Sanitize lists of varying objects
- MongoDB: Add treatment option for applying special treatments to certain items on real-world data
- MongoDB: Use pagination on source collection, for creating batches towards CrateDB
- MongoDB: Unlock importing MongoDB Extended JSON files using `file+bson://...`
- DynamoDB: Add special decoding for varied lists. Store them into a separate `OBJECT(IGNORED)` column in CrateDB.
- DynamoDB: Add pagination support for `full-load` table loader
- DMS/DynamoDB: Fix table name quoting within CDC processor handler
- MongoDB: Fix and verify Zyp transformations
- DMS/DynamoDB/MongoDB I/O: Use SQL with parameters instead of inlining values
- Dependencies: Unpin commons-codec, to always use the latest version
- Dependencies: Unpin lorrystream, to always use the latest version
- MongoDB: Improve type mapper by discriminating between `INTEGER` and `BIGINT`
- MongoDB: Improve type mapper by supporting BSON `DatetimeMS`, `Decimal128`, and `Int64` types
- Processor: Updated Kinesis Lambda processor to understand AWS DMS
- MongoDB: Fix missing output on STDOUT for `migr8 export`
- MongoDB: Improve timestamp parsing by using `python-dateutil`
- MongoDB: Converge `_id` input field to `id` column instead of dropping it
- MongoDB: Make user interface use stderr, so stdout is for data only
- MongoDB: Make `migr8 extract` write to stdout by default
- MongoDB: Make `migr8 translate` read from stdin by default
- MongoDB: Improve user interface messages
- MongoDB: Strip single leading underscore character from all top-level fields
- MongoDB: Map OID types to CrateDB TEXT columns
- MongoDB: Make `migr8 extract` and `migr8 export` accept the `--limit` option
- MongoDB: Fix indentation in prettified SQL output of `migr8 translate`
- MongoDB: Add capability to give type hints and add transformations
- Dependencies: Adjust code for lorrystream version 0.0.3
- Dependencies: Update to lorrystream 0.0.4 and commons-codec 0.0.7
- DynamoDB: Add table loader for full-load operations
- `ctk load table`: Added support for MongoDB Change Streams
- Fix dependency with the `kaggle` package, downgrade to `kaggle==1.6.14`
- DynamoDB CDC: Add demo to support reading DynamoDB change data capture
- IO: Added the `if-exists` query parameter by updating to influxio 0.4.0.
- Rockset: Added CrateDB Rockset Adapter, an HTTP API emulation layer
- MongoDB: Added adapter amalgamating PyMongo to use CrateDB as backend
- SQLAlchemy: Clean up and refactor SQLAlchemy polyfills to `cratedb_toolkit.util.sqlalchemy`
- CFR: Build as a self-contained program using PyInstaller
- CFR: Publish self-contained application bundle to GitHub Workflow Artifacts
- Add `ctk info` and `ctk cfr` diagnostics programs
- Remove support for Python 3.7
- SQLAlchemy dialect: Use `sqlalchemy-cratedb>=0.37.0`. This includes the fix to the `get_table_names()` reflection method.
- Dependencies: Migrate from `crate[sqlalchemy]` to `sqlalchemy-cratedb`
- Fix InfluxDB Cloud <-> CrateDB Cloud connectivity by using `ssl=true` query argument also for `influxdb2://` source URLs.
- Fix InfluxDB Cloud <-> CrateDB Cloud connectivity by propagating `ssl=true` query argument. Update dependencies to `influxio>=0.2.1,<1`.
- Dependencies: Unpin upper version bound of `dask`. Otherwise, compatibility issues cannot be resolved quickly, like with Python 3.11.9. dask/dask#11038
- Dependencies: Use `dask[dataframe]`
- datasets: Fix compatibility with Python 3.7
- datasets: Fix dataset loader
- Added `cratedb_toolkit.datasets` subsystem, for acquiring datasets from cratedb-datasets and Kaggle.
- Do not always activate pytest11 entrypoint to pytest fixture `cratedb_service`, as it depends on the `testcontainers` package, which is not always installed.
- Packaging: Use `cloud` extra to install relevant packages
- Dependencies: Add `testing` extra, which installs `testcontainers` only
- Testing: Export `cratedb_service` fixture as pytest11 entrypoint
- Sandbox: Reduce number of extras by just using `all`
- Add SQL runner utility primitives to `io.sql` namespace
- Add `import_csv_pandas` and `import_csv_dask` utility primitives
- data: Add subsystem for "loading" data.
- Add SDK and CLI for CrateDB Cloud Data Import APIs `ctk load table ...`
- Add `migr8` program from previous repository
- InfluxDB: Add adapter for `influxio`
- MongoDB: Add `migr8` program from previous repository
- MongoDB: Improve UX by using `ctk load table mongodb://...`
- load table: Refactor to use more OO
- Add `examples/cloud_import.py`
- Adapt testcontainers to be agnostic of the testing framework. Thanks, @pilosus.
- CLI: Upgrade to `click-aliases>=1.0.2`, fixing erroring out when no group aliases are specified.
- Add support for Python 3.12
- SQLAlchemy: Improve UNIQUE constraints polyfill to accept multiple column names, for emulating unique composite keys.
- SQLAlchemy: Add a few patches and polyfills, which do not fit well into the vanilla Python driver / SQLAlchemy dialect.
- Retention: Refactor strategies `delete`, `reallocate`, and `snapshot`, to standalone variants.
- Retention: Bundle configuration and runtime settings into `Settings` entity, and use more OO instead of weak dictionaries: Add `RetentionStrategy`, `TableAddress`, and `Settings` entities, to improve information passing throughout the application and the SQL templates.
- Retention: Add `--schema` option, and `CRATEDB_EXT_SCHEMA` environment variable, to configure the database schema used to store the retention policy table. The default value is `ext`.
- Retention: Use fully qualified table names everywhere.
- Retention: Fix: Compensate for `DROP REPOSITORY` now returning `RepositoryMissingException` when the repository does not exist. With previous versions of CrateDB, it was `RepositoryUnknownException`.
- Import "data retention" implementation from https://github.com/crate/crate-airflow-tutorial. Thanks, @hammerhead.