chore(wren-ai-service): minor updates #1782


Merged
merged 5 commits into main from chore/ai-service/minor-updates on Jun 25, 2025
Conversation

cyyeh
Member

@cyyeh cyyeh commented Jun 25, 2025

Summary by CodeRabbit

  • New Features

    • Added normalization of database column data types to a standardized set for improved compatibility in AI pipelines.
  • Bug Fixes

    • Columns with unknown data types are now excluded from generated DDL statements.
  • Chores

    • Updated environment configuration and component version numbers.
    • Adjusted Docker service endpoint configuration for improved connectivity.
  • Refactor

    • Improved SQL dialect handling to support auto-detection instead of assuming a specific dialect.

@cyyeh cyyeh requested a review from yichieh-lu June 25, 2025 03:58
@cyyeh cyyeh added the module/ai-service and ci/ai-service labels (both "ai-service related") Jun 25, 2025
Contributor

coderabbitai bot commented Jun 25, 2025

Walkthrough

The updates standardize SQL data type handling and improve SQL dialect flexibility by removing explicit Trino dialect specification in multiple add_quotes functions. A new utility function normalizes database column types, and logic is added to exclude columns with unknown types from DDL generation. Version numbers and Docker configuration are also updated.

Changes

| File(s) | Change Summary |
| --- | --- |
| wren-ai-service/eval/utils.py<br>wren-ai-service/tools/run_sql.py<br>wren-ai-service/src/core/engine.py | Modified `add_quotes` to remove the explicit `"trino"` dialect from `sqlglot.transpile`, setting `read=None` instead. |
| wren-ai-service/src/pipelines/common.py | Added `get_engine_supported_data_type` function; updated DDL logic to normalize types and filter out `"unknown"` types. |
| wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py | Imported and used `get_engine_supported_data_type`; filtered out `"unknown"` types in metric DDL generation. |
| wren-ai-service/tools/dev/.env | Updated version numbers for several components. |
| wren-ai-service/tools/dev/docker-compose-dev.yaml | Changed the `WREN_ENGINE_ENDPOINT` host from `engine` to `wren-engine` for `ibis-server`. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant DDLBuilder as build_table_ddl
    participant Utils as get_engine_supported_data_type

    User->>DDLBuilder: Request DDL for table/metrics
    DDLBuilder->>Utils: Normalize column data type
    Utils-->>DDLBuilder: Return standardized type
    DDLBuilder-->>User: Return DDL (excluding "unknown" types)

Suggested reviewers

  • yichieh-lu
  • imAsterSun

Poem

In fields of code where data grows,
Types unknown the rabbit now throws.
With quotes less bound to Trino's way,
The dialect dances, free to sway.
Versions hop, endpoints leap,
A warren of updates—tidy and neat!
🐇✨


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)

98-110: Consistent data type normalization with same filtering concerns.

The implementation follows the same pattern as build_table_ddl in common.py, which is good for consistency. However, the same concern about filtering "unknown" types applies here - consider adding logging to track when this filtering occurs.

Consider adding logging similar to the suggestion for common.py:

def _build_metric_ddl(content: dict) -> str:
+    import logging
+    logger = logging.getLogger("wren-ai-service")
+    
    columns_ddl = [
        f"{column['comment']}{column['name']} {get_engine_supported_data_type(column['data_type'])}"
        for column in content["columns"]
        if column["data_type"].lower()
        != "unknown"  # quick fix: filtering out UNKNOWN column type
    ]
+    
+    # Log filtered unknown columns
+    unknown_columns = [col for col in content["columns"] if col["data_type"].lower() == "unknown"]
+    if unknown_columns:
+        logger.warning(f"Filtered {len(unknown_columns)} unknown type columns from metric '{content['name']}'")
🧹 Nitpick comments (1)
wren-ai-service/src/pipelines/common.py (1)

8-29: Consider refactoring to address static analysis warning.

The data type normalization function provides valuable standardization, but the static analysis warning about too many return statements suggests it could be refactored for better maintainability.

Consider using a dictionary mapping approach:

def get_engine_supported_data_type(data_type: str) -> str:
    """
    This function makes sure downstream ai pipeline get column data types in a format that is supported by the data engine.
    """
+    type_mappings = {
+        "BPCHAR": "VARCHAR", "NAME": "VARCHAR", "UUID": "VARCHAR", "INET": "VARCHAR",
+        "OID": "INT",
+        "BIGNUMERIC": "NUMERIC",
+        "BYTES": "BYTEA",
+        "DATETIME": "TIMESTAMP",
+        "FLOAT64": "DOUBLE",
+        "INT64": "BIGINT"
+    }
+    
+    return type_mappings.get(data_type.upper(), data_type.upper())
-    match data_type.upper():
-        case "BPCHAR" | "NAME" | "UUID" | "INET":
-            return "VARCHAR"
-        case "OID":
-            return "INT"
-        case "BIGNUMERIC":
-            return "NUMERIC"
-        case "BYTES":
-            return "BYTEA"
-        case "DATETIME":
-            return "TIMESTAMP"
-        case "FLOAT64":
-            return "DOUBLE"
-        case "INT64":
-            return "BIGINT"
-        case _:
-            return data_type.upper()
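If the dictionary-based refactor suggested above is adopted, a quick spot-check can confirm it agrees with the original `match` version (sketch; the mapping table is reproduced from the diff above):

```python
type_mappings = {
    "BPCHAR": "VARCHAR", "NAME": "VARCHAR", "UUID": "VARCHAR", "INET": "VARCHAR",
    "OID": "INT",
    "BIGNUMERIC": "NUMERIC",
    "BYTES": "BYTEA",
    "DATETIME": "TIMESTAMP",
    "FLOAT64": "DOUBLE",
    "INT64": "BIGINT",
}


def get_engine_supported_data_type(data_type: str) -> str:
    # Unmapped types pass through upper-cased, matching the `case _` branch.
    return type_mappings.get(data_type.upper(), data_type.upper())


# Spot-check a mapped type, case-insensitive input, and a pass-through type.
for raw, expected in [("uuid", "VARCHAR"), ("float64", "DOUBLE"), ("text", "TEXT")]:
    assert get_engine_supported_data_type(raw) == expected
```

Besides silencing Pylint's R0911, the lookup table makes the supported mappings easy to extend without touching control flow.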
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4bf4e1 and 209bddf.

📒 Files selected for processing (7)
  • wren-ai-service/eval/utils.py (1 hunks)
  • wren-ai-service/src/core/engine.py (1 hunks)
  • wren-ai-service/src/pipelines/common.py (2 hunks)
  • wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (2 hunks)
  • wren-ai-service/tools/dev/.env (1 hunks)
  • wren-ai-service/tools/dev/docker-compose-dev.yaml (1 hunks)
  • wren-ai-service/tools/run_sql.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
wren-ai-service/src/pipelines/common.py

[refactor] 8-8: Too many return statements (8/6)

(R0911)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: pytest
  • GitHub Check: pytest
  • GitHub Check: Analyze (go)
🔇 Additional comments (6)
wren-ai-service/tools/run_sql.py (1)

`22-29`: Verification script for dialect parameter usage.

```shell
#!/bin/bash
# Search for all Python usages of sqlglot.transpile to verify dialect parameter usage
rg -n "sqlglot.transpile" --type py -C 2
```

wren-ai-service/eval/utils.py (1)

`27-34`: **Consistent SQL dialect flexibility improvement.**

This change aligns with the similar update in `wren-ai-service/tools/run_sql.py`, ensuring consistent SQL parsing behavior across the codebase.

wren-ai-service/src/core/engine.py (1)

`53-63`: **Consistent dialect flexibility with proper error handling.**

This change follows the same pattern as the other modules while keeping the explicit error-level configuration, which is useful for debugging SQL parsing issues.

wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)

`17-17`: **Good import addition for consistency.**

The import of `get_engine_supported_data_type` ensures consistent data type handling across different DDL generation functions.

wren-ai-service/tools/dev/docker-compose-dev.yaml (1)

`30-30`: **Ensure all components consistently use the new `wren-engine` hostname.**

The endpoint host was updated from `engine` to `wren-engine`. Please grep the repo (UI, tests, CI scripts, docs) for any hard-coded `engine:` references that might have been missed, to avoid runtime DNS errors inside the Compose network.

wren-ai-service/tools/dev/.env (1)

`14-17`: **Version bumps: confirm image tags exist and remain mutually compatible.**

Tags `0.16.4`, `0.24.0`, and `0.29.2` look fine but are new. Before merging, pull each image locally or check GHCR to ensure:

1. The tags are published.
2. There are no breaking API changes between `wren-engine`, `ibis-server`, and `wren-ui` for these specific versions.

This avoids unexpected "manifest not found" or runtime incompatibility issues during `docker compose up`.


@cyyeh cyyeh merged commit 9d85e12 into main Jun 25, 2025
15 checks passed
@cyyeh cyyeh deleted the chore/ai-service/minor-updates branch June 25, 2025 04:46
Labels
ci/ai-service (ai-service related), module/ai-service (ai-service related), wren-ai-service

2 participants