Commit 611161f

SQLAlchemy core improvements (#688)
* comprehensive join support
* fix for wrong values table function construction
* fix final and sample override bug
* support join using, and array join with multiple cols
* adjustments for sqa1.4 compat
* consolidate readme
* address PR comments
1 parent 5fd16eb commit 611161f

12 files changed, +1453 -104 lines changed


CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -24,9 +24,14 @@ The supported method of passing ClickHouse server settings is to prefix such arg
 ## UNRELEASED

 ### New Features
+- SQLAlchemy: Comprehensive ClickHouse JOIN support via the new `ch_join()` helper. All strictness modifiers (`ALL`, `ANY`, `SEMI`, `ANTI`, `ASOF`), the `GLOBAL` distribution modifier, and explicit `CROSS JOIN` are now available. Use with `select_from()` to generate ClickHouse-specific join syntax like `GLOBAL ALL LEFT OUTER JOIN`. Closes [#635](https://github.com/ClickHouse/clickhouse-connect/issues/635)
+- SQLAlchemy: `array_join()` now supports multiple columns for parallel array expansion. Pass a list of columns and a matching list of aliases to generate `ARRAY JOIN col1 AS a, col2 AS b, col3 AS c`. Single-column usage is unchanged. Closes [#633](https://github.com/ClickHouse/clickhouse-connect/issues/633)
+- SQLAlchemy: `ch_join()` now supports `USING` syntax via the new `using` parameter. Pass a list of column name strings to generate `USING (col1, col2)` instead of `ON`. This is important for `FULL OUTER JOIN` where `USING` merges the join column correctly while `ON` produces default values (0, '') for unmatched sides. Closes [#636](https://github.com/ClickHouse/clickhouse-connect/issues/636)
 - SQLAlchemy: Add missing Replicated table engine variants: `ReplicatedReplacingMergeTree`, `ReplicatedCollapsingMergeTree`, `ReplicatedVersionedCollapsingMergeTree`, and `ReplicatedGraphiteMergeTree`. Closes [#687](https://github.com/ClickHouse/clickhouse-connect/issues/687)

 ### Bug Fixes
+- SQLAlchemy: Fix `.final()` and `.sample()` silently overwriting each other when chained. Both methods now store modifiers as custom attributes on the `Select` instance and render them during compilation, replacing the previous `with_hint()` approach that only allowed one hint per table. Chaining in either order (e.g. `select(t).final().sample(0.1)`) correctly produces `FROM t FINAL SAMPLE 0.1`. Also fixes rendering for aliased tables (`FROM t AS u FINAL`) and supports explicit table targeting in joins. Fixes [#658](https://github.com/ClickHouse/clickhouse-connect/issues/658)
+- SQLAlchemy: Fix `sqlalchemy.values()` to generate ClickHouse's `VALUES` table function syntax. The compiler now emits `VALUES('col1 Type1, col2 Type2', ...)` with the column structure as the first argument, instead of the standard SQL form that places column names after the alias. Generic SQLAlchemy types are mapped to ClickHouse equivalents (e.g. `Integer` to `Int32`, `String` to `String`). Also handles CTE usage by wrapping in `SELECT * FROM VALUES(...)`. Fixes [#681](https://github.com/ClickHouse/clickhouse-connect/issues/681)
 - SQLAlchemy: Fix `GraphiteMergeTree` and `ReplicatedGraphiteMergeTree` to properly single-quote the `config_section` argument as ClickHouse requires.

 ## 0.14.1, 2026-03-11
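
The `VALUES` bug-fix entry in the changelog hunk above describes the target shape `VALUES('col1 Type1, col2 Type2', ...)`, with the column structure as the first argument. A minimal standalone sketch of that shape follows. It is not the dialect's compiler: `render_values`, `render_literal`, and `GENERIC_TO_CLICKHOUSE` are hypothetical names, and the type map holds only the two pairs the changelog mentions.

```python
from typing import Sequence, Tuple

# Hypothetical mapping for illustration. Only these two pairs are stated in
# the changelog entry; the real dialect presumably covers more types.
GENERIC_TO_CLICKHOUSE = {"Integer": "Int32", "String": "String"}


def render_literal(value) -> str:
    """Render a Python value as a SQL literal (strings quoted and escaped)."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    return str(value)


def render_values(columns: Sequence[Tuple[str, str]], rows: Sequence[Sequence]) -> str:
    """Build VALUES('name Type, ...', (row), ...) with the structure string first."""
    structure = ", ".join(
        f"{name} {GENERIC_TO_CLICKHOUSE[type_name]}" for name, type_name in columns
    )
    rendered_rows = ", ".join(
        "(" + ", ".join(render_literal(v) for v in row) + ")" for row in rows
    )
    return f"VALUES('{structure}', {rendered_rows})"


sql = render_values([("id", "Integer"), ("name", "String")], [(1, "a"), (2, "b")])
# Wrapping in SELECT * FROM mirrors the CTE handling mentioned in the changelog.
print(f"SELECT * FROM {sql}")
```

The sketch only handles integers and strings; a real compiler would also need to deal with NULLs, dates, and bound parameters.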

README.md

Lines changed: 12 additions & 5 deletions
@@ -32,16 +32,23 @@ When creating a Superset Data Source, either use the provided connection dialog,
 ### SQLAlchemy Implementation

 ClickHouse Connect includes a lightweight SQLAlchemy dialect implementation focused on compatibility with **Superset**
-and **SQLAlchemy Core**.
+and **SQLAlchemy Core**. Both SQLAlchemy 1.4 and 2.x are supported. SQLAlchemy 1.4 compatibility is maintained
+because Apache Superset currently requires `sqlalchemy>=1.4,<2`.

 Supported features include:
 - Basic query execution via SQLAlchemy Core
-- `SELECT` queries with `JOIN`s, `ARRAY JOIN`, and `FINAL` modifier
+- `SELECT` queries with `JOIN`s (including ClickHouse-specific strictness, `USING`, and `GLOBAL` modifiers),
+`ARRAY JOIN` (single and multi-column), `FINAL`, and `SAMPLE`
+- `VALUES` table function syntax
 - Lightweight `DELETE` statements

-The implementation does not include ORM support and is not intended as a full SQLAlchemy dialect. While it can support
-a range of Core-based applications beyond Superset, it may not be suitable for more complex SQLAlchemy applications
-that rely on full ORM or advanced dialect functionality.
+A small number of features require SQLAlchemy 2.x: `Values.cte()` and certain literal-rendering behaviors.
+All other dialect features, including those used by Superset, work on both 1.4 and 2.x.
+
+Basic ORM usage works for insert-heavy, read-focused workloads: declarative model definitions, `CREATE TABLE`,
+`session.add()`, `bulk_save_objects()`, and read queries all function correctly. However, full ORM support is not
+provided. UPDATE compilation, foreign key/relationship reflection, autoincrement/RETURNING, and cascade operations
+are not implemented. The dialect is best suited for SQLAlchemy Core usage and Superset connectivity.

 ### Asyncio Support

Lines changed: 3 additions & 3 deletions
@@ -1,10 +1,10 @@
 from clickhouse_connect import driver_name
 from clickhouse_connect.cc_sqlalchemy.datatypes.base import schema_types
-from clickhouse_connect.cc_sqlalchemy.sql import final
-from clickhouse_connect.cc_sqlalchemy.sql.clauses import array_join, ArrayJoin
+from clickhouse_connect.cc_sqlalchemy.sql import final, sample
+from clickhouse_connect.cc_sqlalchemy.sql.clauses import array_join, ArrayJoin, ch_join, ClickHouseJoin

 # pylint: disable=invalid-name
 dialect_name = driver_name
 ischema_names = schema_types

-__all__ = ['dialect_name', 'ischema_names', 'array_join', 'ArrayJoin', 'final']
+__all__ = ['dialect_name', 'ischema_names', 'array_join', 'ArrayJoin', 'ch_join', 'ClickHouseJoin', 'final', 'sample']
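
The newly exported `ch_join` helper is described in the changelog as producing clauses such as `GLOBAL ALL LEFT OUTER JOIN` and `USING (col1, col2)`. Its signature is not shown in this part of the diff, so the sketch below does not call it. It is a standalone string builder that illustrates how the modifiers might compose. `render_join` and its parameters are hypothetical, and the keyword order follows the changelog's example.

```python
from typing import Optional, Sequence

STRICTNESS = {"ALL", "ANY", "SEMI", "ANTI", "ASOF"}
JOIN_TYPES = {"INNER", "LEFT OUTER", "RIGHT OUTER", "FULL OUTER", "CROSS"}


def render_join(left: str, right: str, join_type: str = "INNER",
                strictness: Optional[str] = None, is_global: bool = False,
                on: Optional[str] = None,
                using: Optional[Sequence[str]] = None) -> str:
    """Compose '<left> [GLOBAL] [strictness] <type> JOIN <right> [ON ...|USING (...)]'."""
    if join_type not in JOIN_TYPES:
        raise ValueError(f"unsupported join type: {join_type}")
    if strictness is not None and strictness not in STRICTNESS:
        raise ValueError(f"unsupported strictness: {strictness}")
    if on is not None and using is not None:
        raise ValueError("pass either on or using, not both")
    if join_type == "CROSS":
        if on is not None or using is not None or strictness is not None:
            raise ValueError("CROSS JOIN takes no condition or strictness")
    elif on is None and using is None:
        raise ValueError("a join condition (on or using) is required")

    # Drop the optional keywords that were not requested.
    parts = ["GLOBAL" if is_global else None, strictness, join_type, "JOIN"]
    clause = f"{left} {' '.join(p for p in parts if p)} {right}"
    if using is not None:
        clause += f" USING ({', '.join(using)})"
    elif on is not None:
        clause += f" ON {on}"
    return clause


print(render_join("events", "users", "LEFT OUTER", strictness="ALL",
                  is_global=True, on="events.user_id = users.id"))
print(render_join("a", "b", "FULL OUTER", using=["id", "day"]))
```

Rejecting `on` together with `using` is a design choice of this sketch. Whether the real helper enforces the same rule is not visible in this section.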

clickhouse_connect/cc_sqlalchemy/sql/__init__.py

Lines changed: 64 additions & 49 deletions
@@ -5,6 +5,10 @@

 from clickhouse_connect.driver.binding import quote_identifier

+# Dialect name used for non-rendering statement hints that only serve to
+# differentiate cache keys when FINAL/SAMPLE modifiers are applied.
+_CH_MODIFIER_DIALECT = "_ch_modifier"
+

 def full_table(table_name: str, schema: Optional[str] = None) -> str:
     if table_name.startswith('(') or '.' in table_name or not schema:
@@ -16,38 +20,61 @@ def format_table(table: Table):
     return full_table(table.name, table.schema)


-def final(select_stmt: Select, table: Optional[FromClause] = None) -> Select:
-    """
-    Apply the ClickHouse FINAL modifier to a select statement.
-
-    Args:
-        select_stmt: The SQLAlchemy Select statement to modify.
-        table: Optional explicit table/alias to apply FINAL to. When omitted the
-            method will use the single FROM element present on the select. A
-            ValueError is raised if the statement has no FROMs or more than one
-            FROM element and table is not provided.
-
-    Returns:
-        A new Select that renders the FINAL modifier for the target table.
-    """
+def _resolve_target(select_stmt: Select, table: Optional[FromClause], method_name: str) -> FromClause:
+    """Resolve the target FROM clause for ClickHouse modifiers (FINAL/SAMPLE)."""
     if not isinstance(select_stmt, Select):
-        raise TypeError("final() expects a SQLAlchemy Select instance")
+        raise TypeError(f"{method_name}() expects a SQLAlchemy Select instance")

     target = table
     if target is None:
         froms = select_stmt.get_final_froms()
         if not froms:
-            raise ValueError("final() requires a table to apply the FINAL modifier.")
+            raise ValueError(f"{method_name}() requires a table to apply the {method_name.upper()} modifier.")
         if len(froms) > 1:
             raise ValueError(
-                "final() is ambiguous for statements with multiple FROM clauses. Specify the table explicitly."
+                f"{method_name}() is ambiguous for statements with multiple FROM clauses. "
+                "Specify the table explicitly."
             )
         target = froms[0]

     if not isinstance(target, FromClause):
         raise TypeError("table must be a SQLAlchemy FromClause when provided")

-    return select_stmt.with_hint(target, "FINAL")
+    return target
+
+
+def _target_cache_key(target: FromClause) -> str:
+    """Stable string identifying a FROM target for cache key differentiation."""
+    if hasattr(target, "fullname"):
+        return target.fullname
+    return target.name
+
+
+# pylint: disable=protected-access
+def final(select_stmt: Select, table: Optional[FromClause] = None) -> Select:
+    """Apply the ClickHouse FINAL modifier to a select statement.
+
+    FINAL forces ClickHouse to merge data parts before returning results,
+    guaranteeing fully collapsed rows for ReplacingMergeTree, CollapsingMergeTree,
+    and similar engines.
+
+    Args:
+        select_stmt: The SELECT statement to modify.
+        table: The target table to apply FINAL to. Required when the query
+            joins multiple tables, optional when there is a single FROM target.
+    """
+    target = _resolve_target(select_stmt, table, "final")
+    ch_final = getattr(select_stmt, "_ch_final", set())
+
+    if target in ch_final:
+        return select_stmt
+
+    # with_statement_hint creates a generative copy and adds a non-rendering
+    # hint that participates in the statement cache key.
+    hint_key = _target_cache_key(target)
+    new_stmt = select_stmt.with_statement_hint(f"FINAL:{hint_key}", dialect_name=_CH_MODIFIER_DIALECT)
+    new_stmt._ch_final = ch_final | {target}
+    return new_stmt


 def _select_final(self: Select, table: Optional[FromClause] = None) -> Select:
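
The comments in the new `final()` say the statement hint is non-rendering and exists only to differentiate cache keys. The toy model below sketches why that might matter. It is not SQLAlchemy's actual caching machinery: `ToySelect`, `compile_cached`, and both `final_*` helpers are hypothetical. The assumption it illustrates is that a side attribute such as `_ch_final` is invisible to the cache key, so two statements that differ only in that attribute would share one cached SQL string.

```python
class ToySelect:
    """Toy stand-in for a statement: only `table` and `hints` feed the cache key."""

    def __init__(self, table, hints=(), final_tables=frozenset()):
        self.table = table
        self.hints = tuple(hints)
        # Side attribute, analogous to _ch_final: not part of the cache key.
        self.final_tables = frozenset(final_tables)

    def cache_key(self):
        return (self.table, self.hints)


_compiled_cache = {}


def compile_cached(stmt):
    """Compile once per cache key, then reuse the stored SQL string."""
    key = stmt.cache_key()
    if key not in _compiled_cache:
        suffix = " FINAL" if stmt.table in stmt.final_tables else ""
        _compiled_cache[key] = f"SELECT * FROM {stmt.table}{suffix}"
    return _compiled_cache[key]


def final_without_hint(stmt):
    """Record FINAL only in the side attribute; the cache key is unchanged."""
    return ToySelect(stmt.table, stmt.hints, stmt.final_tables | {stmt.table})


def final_with_hint(stmt):
    """Record FINAL and add a marker hint so the cache key differs."""
    if stmt.table in stmt.final_tables:
        return stmt  # already applied, mirroring the early return in final()
    marker = f"FINAL:{stmt.table}"
    return ToySelect(stmt.table, stmt.hints + (marker,), stmt.final_tables | {stmt.table})


base = ToySelect("events")
plain_sql = compile_cached(base)                      # cached under ('events', ())
stale_sql = compile_cached(final_without_hint(base))  # same key: stale SQL reused
fresh_sql = compile_cached(final_with_hint(base))     # new key: compiled again
print(plain_sql, stale_sql, fresh_sql, sep="\n")
```

In this model the statement without a marker silently gets the cached SQL that lacks `FINAL`, while the marker forces a separate compilation.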
@@ -58,39 +85,27 @@ def _select_final(self: Select, table: Optional[FromClause] = None) -> Select:


 def sample(select_stmt: Select, sample_value: Union[str, int, float], table: Optional[FromClause] = None) -> Select:
-    """
-    Apply ClickHouse SAMPLE clause to a select statement.
-    Reference: https://clickhouse.com/docs/sql-reference/statements/select/sample
+    """Apply the ClickHouse SAMPLE modifier to a select statement.
+
     Args:
-        select_stmt: The SQLAlchemy Select statement to modify.
-        sample_value: Controls the sampling behavior. Accepts three forms:
-            - A float in (0, 1) for proportional sampling (e.g., 0.1 for ~10% of data).
-            - A positive integer for row-count sampling (e.g., 10000000 for ~10M rows).
-            - A string for fraction or offset notation (e.g., "1/10" or "1/10 OFFSET 1/2").
-        table: Optional explicit table to apply SAMPLE to. When omitted the
-            method will use the single FROM element present on the select. A
-            ValueError is raised if the statement has no FROMs or more than one
-            FROM element and table is not provided.
-
-    Returns:
-        A new Select that renders the SAMPLE clause for the target table.
+        select_stmt: The SELECT statement to modify.
+        sample_value: The sample expression. Can be a float between 0 and 1
+            for a fractional sample (e.g. 0.1 for 10%), an integer for an
+            approximate row count, or a string for SAMPLE expressions like
+            '1/10 OFFSET 1/2'.
+        table: The target table to sample. Required when the query joins
+            multiple tables, optional when there is a single FROM target.
     """
-    if not isinstance(select_stmt, Select):
-        raise TypeError("sample() expects a SQLAlchemy Select instance")
-
-    target_table = table
-    if target_table is None:
-        froms = select_stmt.get_final_froms()
-        if not froms:
-            raise ValueError("sample() requires a FROM clause to apply the SAMPLE modifier.")
-        if len(froms) > 1:
-            raise ValueError("sample() is ambiguous for statements with multiple FROM clauses. Specify the table explicitly.")
-        target_table = froms[0]
-
-    if not isinstance(target_table, FromClause):
-        raise TypeError("table must be a SQLAlchemy FromClause when provided")
-
-    return select_stmt.with_hint(target_table, f"SAMPLE {sample_value}")
+    target = _resolve_target(select_stmt, table, "sample")
+
+    hint_key = _target_cache_key(target)
+    new_stmt = select_stmt.with_statement_hint(
+        f"SAMPLE:{hint_key}:{sample_value}", dialect_name=_CH_MODIFIER_DIALECT
+    )
+    ch_sample = dict(getattr(select_stmt, "_ch_sample", {}))
+    ch_sample[target] = sample_value
+    new_stmt._ch_sample = ch_sample
+    return new_stmt


 def _select_sample(self: Select, sample_value: Union[str, int, float], table: Optional[FromClause] = None) -> Select:
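
The two hunks above replace a single `with_hint()` slot per table with separate `_ch_final` and `_ch_sample` attributes. The standalone sketch below contrasts the two storage schemes. It assumes the old behaviour amounted to one hint per table, as the changelog states, and the names `old_style_hints` and `render_from` are hypothetical, not the dialect's compiler hooks.

```python
from typing import Dict, Iterable, Optional, Set, Tuple, Union

SampleValue = Union[str, int, float]


def old_style_hints(calls: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """One hint slot per table: each call replaces the previous hint."""
    hints: Dict[str, str] = {}
    for table, hint in calls:
        hints[table] = hint
    return hints


def render_from(table: str, final_targets: Set[str],
                sample_values: Dict[str, SampleValue],
                alias: Optional[str] = None) -> str:
    """Render FROM with FINAL and SAMPLE stored independently per target."""
    target = alias or table  # modifiers are keyed by the alias when one exists
    sql = f"FROM {table}"
    if alias:
        sql += f" AS {alias}"
    if target in final_targets:
        sql += " FINAL"
    if target in sample_values:
        sql += f" SAMPLE {sample_values[target]}"
    return sql


# Old scheme: chaining FINAL then SAMPLE leaves only the last hint.
print(old_style_hints([("t", "FINAL"), ("t", "SAMPLE 0.1")]))

# New scheme: both modifiers survive regardless of call order.
print(render_from("t", {"t"}, {"t": 0.1}))
print(render_from("t", {"u"}, {}, alias="u"))
```

Keeping FINAL in a set and SAMPLE in a dict makes the rendering order fixed (`FINAL` before `SAMPLE`) no matter which method was chained first.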
