chore(wren-ai-service): minor updates #1782


Merged
merged 5 commits into main from chore/ai-service/minor-updates on Jun 25, 2025
Conversation

cyyeh
Member

@cyyeh cyyeh commented Jun 25, 2025

Summary by CodeRabbit

  • New Features

    • Added normalization of database column data types to a standardized set for improved compatibility in AI pipelines.
  • Bug Fixes

    • Columns with unknown data types are now excluded from generated DDL statements.
  • Chores

    • Updated environment configuration and component version numbers.
    • Adjusted Docker service endpoint configuration for improved connectivity.
  • Refactor

    • Improved SQL dialect handling to support auto-detection instead of assuming a specific dialect.

@cyyeh cyyeh requested a review from yichieh-lu June 25, 2025 03:58
@cyyeh cyyeh added the module/ai-service and ci/ai-service labels (both "ai-service related") Jun 25, 2025
Contributor

coderabbitai bot commented Jun 25, 2025

Walkthrough

The updates standardize SQL data type handling and improve SQL dialect flexibility by removing explicit Trino dialect specification in multiple add_quotes functions. A new utility function normalizes database column types, and logic is added to exclude columns with unknown types from DDL generation. Version numbers and Docker configuration are also updated.

Changes

| File(s) | Change Summary |
| --- | --- |
| wren-ai-service/eval/utils.py<br>wren-ai-service/tools/run_sql.py<br>wren-ai-service/src/core/engine.py | Modified `add_quotes` to remove the explicit `"trino"` dialect from `sqlglot.transpile`, setting `read=None` instead. |
| wren-ai-service/src/pipelines/common.py | Added `get_engine_supported_data_type` function; updated DDL logic to normalize types and filter out `"unknown"` types. |
| wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py | Imported and used `get_engine_supported_data_type`; filtered out `"unknown"` types in metric DDL generation. |
| wren-ai-service/tools/dev/.env | Updated version numbers for several components. |
| wren-ai-service/tools/dev/docker-compose-dev.yaml | Changed the `WREN_ENGINE_ENDPOINT` host from `engine` to `wren-engine` for `ibis-server`. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant DDLBuilder as build_table_ddl
    participant Utils as get_engine_supported_data_type

    User->>DDLBuilder: Request DDL for table/metrics
    DDLBuilder->>Utils: Normalize column data type
    Utils-->>DDLBuilder: Return standardized type
    DDLBuilder-->>User: Return DDL (excluding "unknown" types)

Suggested reviewers

  • yichieh-lu
  • imAsterSun

Poem

In fields of code where data grows,
Types unknown the rabbit now throws.
With quotes less bound to Trino's way,
The dialect dances, free to sway.
Versions hop, endpoints leap,
A warren of updates—tidy and neat!
🐇✨


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)

98-110: Consistent data type normalization with same filtering concerns.

The implementation follows the same pattern as build_table_ddl in common.py, which is good for consistency. However, the same concern about filtering "unknown" types applies here - consider adding logging to track when this filtering occurs.

Consider adding logging similar to the suggestion for common.py:

def _build_metric_ddl(content: dict) -> str:
+    import logging
+    logger = logging.getLogger("wren-ai-service")
+    
    columns_ddl = [
        f"{column['comment']}{column['name']} {get_engine_supported_data_type(column['data_type'])}"
        for column in content["columns"]
        if column["data_type"].lower()
        != "unknown"  # quick fix: filtering out UNKNOWN column type
    ]
+    
+    # Log filtered unknown columns
+    unknown_columns = [col for col in content["columns"] if col["data_type"].lower() == "unknown"]
+    if unknown_columns:
+        logger.warning(f"Filtered {len(unknown_columns)} unknown type columns from metric '{content['name']}'")
🧹 Nitpick comments (1)
wren-ai-service/src/pipelines/common.py (1)

8-29: Consider refactoring to address static analysis warning.

The data type normalization function provides valuable standardization, but the static analysis warning about too many return statements suggests it could be refactored for better maintainability.

Consider using a dictionary mapping approach:

def get_engine_supported_data_type(data_type: str) -> str:
    """
    This function makes sure downstream ai pipeline get column data types in a format that is supported by the data engine.
    """
+    type_mappings = {
+        "BPCHAR": "VARCHAR", "NAME": "VARCHAR", "UUID": "VARCHAR", "INET": "VARCHAR",
+        "OID": "INT",
+        "BIGNUMERIC": "NUMERIC",
+        "BYTES": "BYTEA",
+        "DATETIME": "TIMESTAMP",
+        "FLOAT64": "DOUBLE",
+        "INT64": "BIGINT"
+    }
+    
+    return type_mappings.get(data_type.upper(), data_type.upper())
-    match data_type.upper():
-        case "BPCHAR" | "NAME" | "UUID" | "INET":
-            return "VARCHAR"
-        case "OID":
-            return "INT"
-        case "BIGNUMERIC":
-            return "NUMERIC"
-        case "BYTES":
-            return "BYTEA"
-        case "DATETIME":
-            return "TIMESTAMP"
-        case "FLOAT64":
-            return "DOUBLE"
-        case "INT64":
-            return "BIGINT"
-        case _:
-            return data_type.upper()
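If the dictionary-based refactor suggested above is adopted, a quick spot-check can confirm it agrees with the original `match` version (sketch; the mapping table is reproduced from the diff above):

```python
type_mappings = {
    "BPCHAR": "VARCHAR", "NAME": "VARCHAR", "UUID": "VARCHAR", "INET": "VARCHAR",
    "OID": "INT",
    "BIGNUMERIC": "NUMERIC",
    "BYTES": "BYTEA",
    "DATETIME": "TIMESTAMP",
    "FLOAT64": "DOUBLE",
    "INT64": "BIGINT",
}


def get_engine_supported_data_type(data_type: str) -> str:
    # Unmapped types pass through upper-cased, matching the `case _` branch.
    return type_mappings.get(data_type.upper(), data_type.upper())


# Spot-check a mapped type, case-insensitive input, and a pass-through type.
for raw, expected in [("uuid", "VARCHAR"), ("float64", "DOUBLE"), ("text", "TEXT")]:
    assert get_engine_supported_data_type(raw) == expected
```

Besides silencing Pylint's R0911, the lookup table makes the supported mappings easy to extend without touching control flow.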
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4bf4e1 and 209bddf.

📒 Files selected for processing (7)
  • wren-ai-service/eval/utils.py (1 hunks)
  • wren-ai-service/src/core/engine.py (1 hunks)
  • wren-ai-service/src/pipelines/common.py (2 hunks)
  • wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (2 hunks)
  • wren-ai-service/tools/dev/.env (1 hunks)
  • wren-ai-service/tools/dev/docker-compose-dev.yaml (1 hunks)
  • wren-ai-service/tools/run_sql.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
wren-ai-service/src/pipelines/common.py

[refactor] 8-8: Too many return statements (8/6)

(R0911)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: pytest
  • GitHub Check: pytest
  • GitHub Check: Analyze (go)
🔇 Additional comments (6)
wren-ai-service/tools/run_sql.py (1)

`22-29`: Verification script for dialect parameter usage.

```shell
#!/bin/bash
# Search for all Python usages of sqlglot.transpile to verify dialect parameter usage
rg -n "sqlglot.transpile" --type py -C 2
```

wren-ai-service/eval/utils.py (1)

`27-34`: **Consistent SQL dialect flexibility improvement.**

This change aligns with the similar update in `wren-ai-service/tools/run_sql.py`, ensuring consistent SQL parsing behavior across the codebase.

wren-ai-service/src/core/engine.py (1)

`53-63`: **Consistent dialect flexibility with proper error handling.**

This change follows the same pattern as the other modules while keeping the explicit error-level configuration, which is useful for debugging SQL parsing issues.

wren-ai-service/src/pipelines/retrieval/db_schema_retrieval.py (1)

`17-17`: **Good import addition for consistency.**

The import of `get_engine_supported_data_type` ensures consistent data type handling across different DDL generation functions.

wren-ai-service/tools/dev/docker-compose-dev.yaml (1)

`30-30`: **Ensure all components consistently use the new `wren-engine` hostname.**

The endpoint host was updated from `engine` to `wren-engine`. Please grep the repo (UI, tests, CI scripts, docs) for any hard-coded `engine:` references that might have been missed, to avoid runtime DNS errors inside the Compose network.

wren-ai-service/tools/dev/.env (1)

`14-17`: **Version bumps: confirm image tags exist and remain mutually compatible.**

Tags `0.16.4`, `0.24.0`, and `0.29.2` look fine but are new. Before merging, pull each image locally or check GHCR to ensure:

1. The tags are published.
2. There are no breaking API changes between `wren-engine`, `ibis-server`, and `wren-ui` for these specific versions.

This avoids unexpected "manifest not found" or runtime incompatibility issues during `docker compose up`.


@cyyeh cyyeh merged commit 9d85e12 into main Jun 25, 2025
15 checks passed
@cyyeh cyyeh deleted the chore/ai-service/minor-updates branch June 25, 2025 04:46
Labels
ci/ai-service (ai-service related), module/ai-service (ai-service related), wren-ai-service

2 participants