chore(wren-ai-service): minor updates #1782
Conversation
**Walkthrough**

The updates standardize SQL data type handling and improve SQL dialect flexibility by removing the explicit Trino dialect specification in multiple modules.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant User
    participant DDLBuilder as build_table_ddl
    participant Utils as get_engine_supported_data_type
    User->>DDLBuilder: Request DDL for table/metrics
    DDLBuilder->>Utils: Normalize column data type
    Utils-->>DDLBuilder: Return standardized type
    DDLBuilder-->>User: Return DDL (excluding "unknown" types)
```
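The flow in the diagram can be sketched in plain Python. This is an illustrative stand-in, not the PR's actual code: the helper names come from the walkthrough, while the column dict shape and the surrounding `CREATE TABLE` template are assumptions.

```python
def get_engine_supported_data_type(data_type: str) -> str:
    # Simplified stand-in for the normalization helper described above;
    # only a few of the mappings from the review are shown.
    mappings = {"FLOAT64": "DOUBLE", "INT64": "BIGINT", "DATETIME": "TIMESTAMP"}
    upper = data_type.upper()
    return mappings.get(upper, upper)

def build_table_ddl(columns: list[dict]) -> str:
    # Columns whose type is "unknown" are dropped, matching the diagram's
    # "excluding unknown types" step.
    column_ddl = [
        f"{c['name']} {get_engine_supported_data_type(c['data_type'])}"
        for c in columns
        if c["data_type"].lower() != "unknown"
    ]
    return "CREATE TABLE t (" + ", ".join(column_ddl) + ")"

print(build_table_ddl([
    {"name": "id", "data_type": "INT64"},
    {"name": "blob", "data_type": "unknown"},
]))
# CREATE TABLE t (id BIGINT)
```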
Actionable comments posted: 1
🔭 Outside diff range comments (1)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)

`98-110`: **Consistent data type normalization with same filtering concerns.**

The implementation follows the same pattern as `build_table_ddl` in `common.py`, which is good for consistency. However, the same concern about filtering "unknown" types applies here - consider adding logging to track when this filtering occurs.

Consider adding logging similar to the suggestion for `common.py`:

```diff
 def _build_metric_ddl(content: dict) -> str:
+    import logging
+    logger = logging.getLogger("wren-ai-service")
+
     columns_ddl = [
         f"{column['comment']}{column['name']} {get_engine_supported_data_type(column['data_type'])}"
         for column in content["columns"]
         if column["data_type"].lower() != "unknown"  # quick fix: filtering out UNKNOWN column type
     ]
+
+    # Log filtered unknown columns
+    unknown_columns = [col for col in content["columns"] if col["data_type"].lower() == "unknown"]
+    if unknown_columns:
+        logger.warning(f"Filtered {len(unknown_columns)} unknown type columns from metric '{content['name']}'")
```
🧹 Nitpick comments (1)
wren-ai-service/src/pipelines/common.py (1)

`8-29`: **Consider refactoring to address static analysis warning.**

The data type normalization function provides valuable standardization, but the static analysis warning about too many return statements suggests it could be refactored for better maintainability.

Consider using a dictionary mapping approach:

```diff
 def get_engine_supported_data_type(data_type: str) -> str:
     """
     This function makes sure downstream ai pipeline get column data types
     in a format that is supported by the data engine.
     """
+    type_mappings = {
+        "BPCHAR": "VARCHAR", "NAME": "VARCHAR", "UUID": "VARCHAR", "INET": "VARCHAR",
+        "OID": "INT",
+        "BIGNUMERIC": "NUMERIC",
+        "BYTES": "BYTEA",
+        "DATETIME": "TIMESTAMP",
+        "FLOAT64": "DOUBLE",
+        "INT64": "BIGINT",
+    }
+
+    return type_mappings.get(data_type.upper(), data_type.upper())
-    match data_type.upper():
-        case "BPCHAR" | "NAME" | "UUID" | "INET":
-            return "VARCHAR"
-        case "OID":
-            return "INT"
-        case "BIGNUMERIC":
-            return "NUMERIC"
-        case "BYTES":
-            return "BYTEA"
-        case "DATETIME":
-            return "TIMESTAMP"
-        case "FLOAT64":
-            return "DOUBLE"
-        case "INT64":
-            return "BIGINT"
-        case _:
-            return data_type.upper()
```
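For reference, the dictionary-mapping refactor suggested above can be written as a standalone, runnable function. This is a sketch of the reviewer's suggestion, not the merged code; the mapping table is copied from the review comment.

```python
# Mapping from engine-specific type names to types the data engine supports,
# as listed in the review suggestion above.
_TYPE_MAPPINGS = {
    "BPCHAR": "VARCHAR", "NAME": "VARCHAR", "UUID": "VARCHAR", "INET": "VARCHAR",
    "OID": "INT",
    "BIGNUMERIC": "NUMERIC",
    "BYTES": "BYTEA",
    "DATETIME": "TIMESTAMP",
    "FLOAT64": "DOUBLE",
    "INT64": "BIGINT",
}

def get_engine_supported_data_type(data_type: str) -> str:
    """Normalize a column data type to one supported by the data engine."""
    upper = data_type.upper()
    # Unmapped types pass through unchanged (uppercased), matching the
    # original match-statement's wildcard case.
    return _TYPE_MAPPINGS.get(upper, upper)

print(get_engine_supported_data_type("float64"))  # DOUBLE
print(get_engine_supported_data_type("text"))     # TEXT
```

A single dictionary lookup replaces eight return statements, which resolves the Pylint R0911 warning and makes adding new mappings a one-line change.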
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- `wren-ai-service/eval/utils.py` (1 hunks)
- `wren-ai-service/src/core/engine.py` (1 hunks)
- `wren-ai-service/src/pipelines/common.py` (2 hunks)
- `wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py` (2 hunks)
- `wren-ai-service/tools/dev/.env` (1 hunks)
- `wren-ai-service/tools/dev/docker-compose-dev.yaml` (1 hunks)
- `wren-ai-service/tools/run_sql.py` (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
wren-ai-service/src/pipelines/common.py
[refactor] 8-8: Too many return statements (8/6)
(R0911)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pytest
- GitHub Check: pytest
- GitHub Check: Analyze (go)
🔇 Additional comments (6)
wren-ai-service/tools/run_sql.py (1)

`22-29`:

```shell
#!/bin/bash
# Search for all Python usages of sqlglot.transpile to verify dialect parameter usage
rg -n "sqlglot.transpile" --type py -C 2
```
wren-ai-service/eval/utils.py (1)

`27-34`: **Consistent SQL dialect flexibility improvement.**

This change aligns with the similar update in `wren-ai-service/tools/run_sql.py`, ensuring consistent SQL parsing behavior across the codebase.

wren-ai-service/src/core/engine.py (1)

`53-63`: **Consistent dialect flexibility with proper error handling.**

This change follows the same pattern as other modules while maintaining the explicit error level configuration, which is good for debugging SQL parsing issues.

wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)

`17-17`: **Good import addition for consistency.**

The import of `get_engine_supported_data_type` ensures consistent data type handling across different DDL generation functions.

wren-ai-service/tools/dev/docker-compose-dev.yaml (1)

`30-30`: **Ensure all components consistently use the new `wren-engine` hostname.**

The endpoint host was updated from `engine` → `wren-engine`. Please grep the repo (UI, tests, CI scripts, docs) for any hard-coded `engine:` references that might have been missed, to avoid runtime DNS errors inside the Compose network.

wren-ai-service/tools/dev/.env (1)

`14-17`: **Version bumps – confirm image tags exist and remain mutually compatible.**

Tags `0.16.4`, `0.24.0`, and `0.29.2` look fine but are new. Before merging, pull each image locally or check GHCR to ensure:

1. The tags are published.
2. There are no breaking API changes between `wren-engine` ↔ `ibis-server` ↔ `wren-ui` for these specific versions.

This avoids unexpected "manifest not found" or runtime incompatibility issues during `docker compose up`.
Summary by CodeRabbit

- New Features
- Bug Fixes
- Chores
- Refactor