diff --git a/topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb b/topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb
new file mode 100644
index 00000000..00988ad4
--- /dev/null
+++ b/topic/chatbot/table-augmented-generation/aws/cratedb_tag_inline_agent.ipynb
@@ -0,0 +1,1096 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "5170c824",
+ "metadata": {},
+ "source": [
+ "# π οΈ Timeseries QA with agentic LLMs & manuals\n",
+ "\n",
+ "Welcome to this **hands-on workshop** where we explore how to combine **LLMs (Large Language Models)** with **machine telemetry data** and **equipment manuals** for smart diagnostics. With the help of an **AI agent** we will talk to our data in natural language.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## π― Workshop Goals\n",
+ "\n",
+ "By the end of this session, you'll be able to:\n",
+ "\n",
+ "- β
**Query timeseries data** using natural language\n",
+ "- β
**Detect anomalies** using both data and manual thresholds\n",
+ "- β
**Interpret machine behavior** by combining real-time metrics with context from manuals\n",
+ "- β
**Generate SQL and visualizations** with the help of an LLM\n",
+ "- β
**Explain telemetry results** in plain English\n",
+ "- β
**Define an AI agent** and understand its basic building blocks such as model context protocol (MCP) servers\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## π§ Why This Matters\n",
+ "\n",
+ "Modern machines generate a huge volume of telemetry data (vibration, temperature, speed, etc.). Understanding this data is critical for:\n",
+ "\n",
+ "- π **Detecting anomalies before they cause failures**\n",
+ "- π **Explaining why something is behaving abnormally**\n",
+ "- π§° **Making maintenance more proactive and data-driven**\n",
+ "\n",
+ "But reading raw data isn't enough...\n",
+ "\n",
+ "That's why this notebook shows how **AI assistants can help domain experts and analysts** by combining:\n",
+ "\n",
+ "- π **Telemetry data** (from CrateDB)\n",
+ "- π **Manuals and expert context** (stored in SQL)\n",
+ "- π¬ **Natural language** (as the interface)\n",
+ "\n",
+ "---\n",
+ "\n",
+ "## π¦ What You'll Build\n",
+ "\n",
+ "Over the course of this workshop, you'll create a system that can:\n",
+ "\n",
+ "- Load and explore machine telemetry stored in CrateDB\n",
+ "- Ask questions like:\n",
+ " - _βIs machine 5 overheating?β_\n",
+ " - _βWhat should I do if machine 3 has an anomaly?β_\n",
+ "- Generate relevant SQL queries, tapping into the large number of LLMs available on AWS Bedrock\n",
+ "- Use an MCP server in connection with an agent, to make the smart assistant extendable with other services\n",
+ "\n",
+ "All of this will run **in a single Jupyter notebook** β no frontend or backend code needed.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "Letβs get started! π"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a8cea826",
+ "metadata": {},
+ "source": [
+ "## Step 1: Setup & Installation \n",
+ "\n",
+ "π οΈ Setup and Installation\n",
+ "In this step, we install all required Python packages to run the workshop.\n",
+ "Our hosted Jupyter notebook service already includes many by default, but we ensure compatibility and version alignment here."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cc578ad1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Install dependencies\n",
+ "%pip install -U \\\n",
+ " pandas matplotlib ipython-sql tqdm \\\n",
+ " crate cratedb-mcp==0.0.3 sqlalchemy-cratedb \\\n",
+ " \"inlineagent @ git+https://github.com/awslabs/amazon-bedrock-agent-samples@4a5d72a#subdirectory=src/InlineAgent\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "228c723b",
+ "metadata": {},
+ "source": [
+ "## Step 2: Generate and Store Synthetic Timeseries Data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "157242c8",
+ "metadata": {},
+ "source": [
+ "### Connect to CrateDB\n",
+ "\n",
+ "\n",
+ "For this workshop, weβll use **CrateDB** as our database to store both:\n",
+ "\n",
+ "- π **Timeseries telemetry data** (e.g., vibration, temperature, rotations)\n",
+ "- π **Machine manuals** (e.g., anomaly thresholds, emergency protocols)\n",
+ "\n",
+ "CrateDB is a distributed SQL database optimized for **real-time analytics on machine data and IoT workloads**. It blends the scalability of NoSQL with the familiarity and power of SQL β making it ideal for hybrid scenarios like combining sensor readings with structured documents.\n",
+ "\n",
+ "In this notebook, weβll use CrateDB to:\n",
+ "\n",
+ "- Store synthetic telemetry data across multiple machines\n",
+ "- Store matching operational manuals per machine\n",
+ "- Use natural language to **query both datasets together**\n",
+ "- Detect anomalies, extract insights, and generate contextual diagnostics\n",
+ "\n",
+ "You can use **CrateDB Cloud** to get started without any setup:\n",
+ "π [Launch a free cluster on CrateDB Cloud](https://console.cratedb.cloud/)\n",
+ "\n",
+ "Alternatively, you can also run CrateDB locally using Docker.\n",
+ "\n",
+ "Letβs connect and load our first dataset.\n",
+ "\n",
+ "Please **adjust the connection string** to point to your CrateDB cluster."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2585e16e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import sqlalchemy as sa\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Option 1: CrateDB Cloud (Update or set via CRATEDB_CONNECTION_STRING env variable)\n",
+ "# Example: crate://admin:ooToh2Paecielun@demo.eks1.eu-west-1.aws.cratedb.net/?ssl=true\n",
+ "CONNECTION_STRING = os.environ.get(\n",
+ " \"CRATEDB_CONNECTION_STRING\",\n",
+ " \"crate://USER:PASSWORD@CRATEDB_HOST/?ssl=true\",\n",
+ ")\n",
+ "\n",
+ "# Option 2: Localhost setup\n",
+ "# CONNECTION_STRING = os.environ.get(\"CRATEDB_CONNECTION_STRING\", \"crate://crate@localhost/\")\n",
+ "\n",
+ "# Try to connect\n",
+ "try:\n",
+ " engine = sa.create_engine(CONNECTION_STRING)\n",
+ " connection = engine.connect()\n",
+ "\n",
+ " # Run a simple query to validate connection\n",
+ " result = pd.read_sql(\"SELECT mountain FROM sys.summits LIMIT 1\", con=engine)\n",
+ " print(\"β
Successfully connected to CrateDB!\")\n",
+ " print(\"Sample query result from sys.summits:\", result.iloc[0][\"mountain\"])\n",
+ "except Exception as e:\n",
+ " print(\"β Failed to connect to CrateDB. Please check your connection string.\")\n",
+ " print(\"Error:\", e)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7a0c1726",
+ "metadata": {},
+ "source": [
+ "### Define the Table Schema in CrateDB\n",
+ "\n",
+ "Before inserting data, we explicitly define the `motor_readings` table in CrateDB. This ensures consistent data types and structure, which is especially important when working in production environments or collaborating across teams.\n",
+ "\n",
+ "The table will store telemetry for each machine, including timestamped readings for vibration, temperature, and rotations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "856582bb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sqlalchemy import text\n",
+ "\n",
+ "# Define the CREATE TABLE statement\n",
+ "create_table_sql = text(\n",
+ " \"\"\"\n",
+ " CREATE TABLE IF NOT EXISTS motor_readings (\n",
+ " machine_id INTEGER,\n",
+ " timestamp TIMESTAMP WITHOUT TIME ZONE,\n",
+ " vibration DOUBLE PRECISION,\n",
+ " temperature DOUBLE PRECISION,\n",
+ " rotations DOUBLE PRECISION,\n",
+ " PRIMARY KEY (machine_id, timestamp)\n",
+ " );\n",
+ " \"\"\"\n",
+ ")\n",
+ "\n",
+ "try:\n",
+ " connection.execute(create_table_sql)\n",
+ " print(\"β
Table 'motor_readings' created (if not already existing).\")\n",
+ "except Exception as e:\n",
+ " print(\"β Failed to create table.\")\n",
+ " print(\"Error:\", e)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dfaebedc",
+ "metadata": {},
+ "source": [
+ "### Generate & Load Timeseries Data\n",
+ "\n",
+ "Letβs generate synthetic telemetry data for 10 machines and store it in CrateDB under the table `motor_readings`.\n",
+ "This table will serve as the base for all LLM queries and visual analytics in the next steps.\n",
+ "\n",
+ "You can modify the number of machines, simulation days, or reading frequency by adjusting the configuration block below.\n",
+ "This gives you full control over the size and granularity of your synthetic timeseries dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b8084239",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from tqdm import tqdm\n",
+ "import numpy as np\n",
+ "import datetime\n",
+ "\n",
+ "# --- Configuration ---\n",
+ "num_machines = 10 # Number of machines to simulate\n",
+ "days = 30 # Number of days to simulate\n",
+ "freq_minutes = 15 # Frequency of readings (in minutes)\n",
+ "\n",
+ "\n",
+ "# --- Data Generation ---\n",
+ "def generate_timeseries_data(num_machines, days, freq_minutes):\n",
+ " total_intervals = int((24 * 60 / freq_minutes) * days)\n",
+ " timestamps = [\n",
+ " datetime.datetime.now() - datetime.timedelta(minutes=freq_minutes * i)\n",
+ " for i in range(total_intervals)\n",
+ " ]\n",
+ " data = []\n",
+ "\n",
+ " for machine_id in tqdm(range(num_machines)):\n",
+ " for t in timestamps:\n",
+ " vibration = np.round(np.random.normal(1.0, 0.2), 4)\n",
+ " temperature = np.round(np.random.normal(45, 2.5), 2)\n",
+ " rotations = np.round(np.random.normal(1600, 30), 2)\n",
+ " data.append([t, vibration, temperature, rotations, machine_id])\n",
+ "\n",
+ " df = pd.DataFrame(\n",
+ " data,\n",
+ " columns=[\"timestamp\", \"vibration\", \"temperature\", \"rotations\", \"machine_id\"],\n",
+ " )\n",
+ " return df\n",
+ "\n",
+ "\n",
+ "# --- Generate & Preview ---\n",
+ "df_ts = generate_timeseries_data(num_machines, days, freq_minutes)\n",
+ "print(f\"β
Generated {len(df_ts)} rows of synthetic timeseries data.\")\n",
+ "\n",
+ "# --- Load to CrateDB ---\n",
+ "df_ts.to_sql(\"motor_readings\", con=engine, if_exists=\"append\", index=False)\n",
+ "print(\"β
Data loaded into CrateDB table 'motor_readings'.\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5e73b260",
+ "metadata": {},
+ "source": [
+ "## Step 3: Previewing and Exploring the Data\n",
+ "\n",
+ "### Explore the Timeseries Data in CrateDB\n",
+ "Now that weβve generated and loaded our synthetic telemetry data, letβs run some SQL queries to explore it.\n",
+ "Weβll check how many rows were inserted, and preview a few example records."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7e53cba8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Query: Count total records\n",
+ "df_count = pd.read_sql(\"SELECT COUNT(*) as total_rows FROM motor_readings\", con=engine)\n",
+ "print(f\"π’ Total rows in 'motor_readings': {df_count.iloc[0]['total_rows']}\")\n",
+ "\n",
+ "# Query: Preview 5 sample rows (formatted timestamps)\n",
+ "query_preview = \"\"\"\n",
+ "SELECT\n",
+ " timestamp,\n",
+ " vibration,\n",
+ " temperature,\n",
+ " rotations,\n",
+ " machine_id\n",
+ "FROM motor_readings\n",
+ "ORDER BY timestamp DESC\n",
+ "LIMIT 5\n",
+ "\"\"\"\n",
+ "\n",
+ "df_preview = pd.read_sql(query_preview, con=engine)\n",
+ "df_preview[\"timestamp\"] = pd.to_datetime(df_preview.timestamp, unit=\"ms\")\n",
+ "\n",
+ "print(\"π Sample records:\")\n",
+ "df_preview"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1b02d28a",
+ "metadata": {},
+ "source": [
+ "## Step 4: Natural Language Querying with an LLM (Table-Augmented Generation)\n",
+ "\n",
+ "### Ask Questions in Natural Language\n",
+ "\n",
+ "In this step, we use an LLM to convert plain language questions into SQL queries and run them against our timeseries data in CrateDB.\n",
+ "This is an example of **Table-Augmented Generation (TAG)** β combining large language models with structured data.\n",
+ "\n",
+ "Youβll be able to ask questions like:\n",
+ "\n",
+ "- What is the average rotation for machine 3?\n",
+ "- When was the last anomaly for machine 5?\n",
+ "- How many temperature spikes were there last week?\n",
+ "\n",
+ "We will start with an implementation that still relies on classical programming to extract schema information. Our assistant will becomes more lightweight later on."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6289ccfc",
+ "metadata": {},
+ "source": [
+ "### Schema Extraction\n",
+ "\n",
+ "Here we define a helper method that queries column information for a given table. It will provide the LLM with knowledge of how columns are named and what their data type are."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ebb6ce59",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Get table schema (columns and types) from CrateDB\n",
+ "def fetch_table_schema(table_name):\n",
+ " \"Fetch the column names and data types for a given table from CrateDB's system catalog.\"\n",
+ "\n",
+ " query = text(\n",
+ " \"\"\"\n",
+ " SELECT column_name, data_type\n",
+ " FROM information_schema.columns\n",
+ " WHERE table_name = :tbl\n",
+ " ORDER BY ordinal_position\n",
+ " \"\"\"\n",
+ " )\n",
+ " try:\n",
+ " stmt = query.bindparams(tbl=table_name)\n",
+ " df = pd.read_sql(stmt, con=engine)\n",
+ "\n",
+ " schema_text = f\"Table: {table_name}\\nColumns:\\n\"\n",
+ " for _, row in df.iterrows():\n",
+ " schema_text += f\"- {row['column_name']} ({row['data_type']})\\n\"\n",
+ " return schema_text\n",
+ " except Exception as e:\n",
+ " print(f\"β Error fetching schema for table '{table_name}':\", e)\n",
+ " return f\"Error fetching schema for {table_name}\"\n",
+ "\n",
+ "\n",
+ "print(\"π Schema output for motor_readings table:\\n\")\n",
+ "print(fetch_table_schema(\"motor_readings\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ede9c5b1",
+ "metadata": {},
+ "source": [
+ "### Define Prompt Template\n",
+ "\n",
+ "We now define the prompt to the LLM. We use [AWS Bedrock](https://aws.amazon.com/bedrock/), which is a marketplace for a large number of LLMs. The `modelId` parameter configures which exact model to use, and we can easily change it to query other models if needed. For this workshop, we will go with AWS' [Nova model](https://aws.amazon.com/ai/generative-ai/nova/). \n",
+ "\n",
+ "The LLM returns a reply with the resulting SQL query, as well as some additional explanations about its thought process. The LLM doesn't execute the SQL itself, so we need to:\n",
+ "\n",
+ "1. **Extract the SQL statement from the LLM's reply.**\n",
+ "\n",
+ " The LLM comes back with a rather verbose reply. It may repeat again the question it was asked or state certain assumptions that it made. We are only interested in the plain SQL, so we need to extract it from the reply.\n",
+ "\n",
+ "2. **Connect to CrateDB and execute the SQL statement ourselves.**\n",
+ "\n",
+ " Once we know the SQL statement, we can easily connect to CrateDB using our standard Python driver and run the statement.\n",
+ "\n",
+ "***\n",
+ "\n",
+ "The LLM prompt itself also consists of two parts:\n",
+ "\n",
+ "1. π€ **System prompt**\n",
+ "\n",
+ " This is a set of instructions for the LLM on how it should behave. We can explain what expectations we have towards the LLM, and help it understand the context of our user questions.\n",
+ " When it comes to the generation of SQL statements, we can formulate certain rules that it should follow, e.g. patterns to apply when a specific keyword is used in the question.\n",
+ "\n",
+ " The system prompt is typically static and doesn't change across different user questions.\n",
+ "\n",
+ "2. π **User prompt**\n",
+ "\n",
+ " Here we inject the actual user question. We do not do any preprocessing, validation, etc. but leave it all to the LLM to make sense of it in the context of the provided system prompt.\n",
+ "\n",
+ "Let's define a method that talks to the LLM for us:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "101bdcbd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "import re\n",
+ "import boto3\n",
+ "\n",
+ "client = boto3.client(\"bedrock-runtime\")\n",
+ "\n",
+ "\n",
+ "def prompt_llm(table_schema: str, question: str) -> str:\n",
+ " \"\"\"\n",
+ " Defines the prompt for the LLM.\n",
+ "\n",
+ " It uses system prompts to give context to the LLM.\n",
+ " The user prompt consists of schema information and the actual question.\n",
+ " \"\"\"\n",
+ "\n",
+ " # Format the request payload using the model's native structure.\n",
+ " # See https://docs.aws.amazon.com/nova/latest/userguide/complete-request-schema.html\n",
+ " native_request = {\n",
+ " \"system\": [\n",
+ " {\n",
+ " \"text\": \"You are a CrateDB expert. Your task is to generate SQL queries in CrateDB's SQL syntax.\",\n",
+ " },\n",
+ " {\n",
+ " \"text\": \"\"\"\n",
+ " Assume an **anomaly** is defined as:\n",
+ " vibration > 1.5 OR temperature > 80 OR rotations > 500\n",
+ "\n",
+ " This definition is for reference only β do not apply anomaly filters unless the userβs question explicitly asks about anomalies.\n",
+ " \"\"\"\n",
+ " },\n",
+ " {\n",
+ " \"text\": \"\"\"\n",
+ " Rules for generating SQL queries:\n",
+ " - Always include the 'timestamp' column in the SELECT clause for any question involving plotting, visualizations, trends, over time, or per day / week / hour.\n",
+ " - Only exclude 'timestamp' for pure aggregations (e.g., total counts without time).\n",
+ " - When using date intervals, always include both the quantity and the unit in a string, e.g. INTERVAL '7 days'.\n",
+ " - If using an aggregation function (e.g., MAX, AVG) with other fields, include a proper GROUP BY clause.\n",
+ " \"\"\"\n",
+ " },\n",
+ " ],\n",
+ " \"messages\": [\n",
+ " {\n",
+ " \"role\": \"user\",\n",
+ " \"content\": [\n",
+ " {\n",
+ " \"text\": f\"The following table schema is available: {table_schema}\",\n",
+ " },\n",
+ " {\n",
+ " \"text\": question,\n",
+ " },\n",
+ " ],\n",
+ " },\n",
+ " ],\n",
+ " }\n",
+ "\n",
+ " # Convert the native request to JSON.\n",
+ " request = json.dumps(native_request)\n",
+ "\n",
+ " # Invoke the model\n",
+ " response = client.invoke_model(\n",
+ " modelId=\"us.amazon.nova-pro-v1:0\",\n",
+ " body=request,\n",
+ " )\n",
+ "\n",
+ " # There is lots of metadata returned, we are only interested in the actual response\n",
+ " response_body = json.loads(response[\"body\"].read())\n",
+ " return response_body[\"output\"][\"message\"][\"content\"][0][\"text\"]\n",
+ "\n",
+ "\n",
+ "def get_sql_from_llm(question: str, table_name: str = \"motor_readings\"):\n",
+ " \"\"\"\n",
+ " This puts all the building blocks together by:\n",
+ " 1. Retrieving the table schema\n",
+ " 2. Invoking the LLM\n",
+ " 3. Extracting the SQL query\n",
+ " \"\"\"\n",
+ "\n",
+ " # Dynamically fetch schema\n",
+ " table_schema = fetch_table_schema(table_name)\n",
+ "\n",
+ " # Prompt the LLM\n",
+ " response = prompt_llm(table_schema, question)\n",
+ "\n",
+ " # Optional: Uncomment the line below to see the complete LLM response\n",
+ " # print(\"Debug: The LLM output was: \\n\", response)\n",
+ "\n",
+ " # Extract the actual SQL query from the response\n",
+ " match = re.search(\"(?<=`sql)([^`]*)(?=`)\", response)\n",
+ " if match:\n",
+ " return match.group(1).strip()\n",
+ "\n",
+ " raise ValueError(\"Failed to extract SQL from LLM reply\", response)\n",
+ "\n",
+ "print(\"β
Functions `prompt_llm` and `get_sql_from_llm` defined successfully\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "49a68349",
+ "metadata": {},
+ "source": [
+ "### Code Cell: Ask a Question β Run SQL β Show Result\n",
+ "You can customize the question below to ask anything about the timeseries data using plain English.\n",
+ "\n",
+ "The assistant will translate your question into SQL, run it on the `motor_readings` table, and return the result.\n",
+ "\n",
+ "Below you will find a set of example questions. **Uncomment** one of the `question` assignments **and run** them to see the result. Feel free to **try you own questions** as well to get a feeling for how the LLM works.\n",
+ "\n",
+ "\n",
+ "---\n",
+ "**π‘ NOTE**\n",
+ "\n",
+ "When trying your own questions, you may also have to extend the system prompt if the LLM has difficulties with your question.\n",
+ "\n",
+ "---"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "10b72ecc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Ask a question in natural language\n",
+ "\n",
+ "question = \"What was the average rotation for machine 3 the last week?\"\n",
+ "# question = \"When was the last recorded anomaly?\"\n",
+ "# question = \"How many readings had a vibration greater than 1.5?\"\n",
+ "# question = \"What was the number of anomalies per machine in the last 48 hours?\"\n",
+ "\n",
+ "# Convert to SQL\n",
+ "sql_query = get_sql_from_llm(question)\n",
+ "\n",
+ "# Collect and format output in a single list\n",
+ "output = []\n",
+ "output.append(\"π§ Generated SQL:\")\n",
+ "output.append(sql_query)\n",
+ "\n",
+ "try:\n",
+ " # Execute the SQL\n",
+ " df_result = pd.read_sql(sql_query, con=engine)\n",
+ " output.append(\"\\nβ
Query executed. Result:\")\n",
+ " output.append(df_result.to_string(index=False)) # One box output\n",
+ "except Exception as e:\n",
+ " output.append(\"β Error running query:\")\n",
+ " output.append(str(e))\n",
+ "\n",
+ "# Print all in one block\n",
+ "print(\"\\n\".join(output))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "26477c0b",
+ "metadata": {},
+ "source": [
+ "## Step 5: Visualizing Timeseries Data with Natural Language\n",
+ "\n",
+ "Getting a reply to our questions is already a nice result. But often, the answer to our questions is not just a single number or a few rows. And at some point, humans have difficulties grasping the content of large amounts of text.\n",
+ "\n",
+ "Therefore, we add a bit of processing logic to detect if it makes sense to switch to a visual representation of the result. In one of the system prompts, we told the LLM to always include a `timestamp` column unless just a single row or value is returned. We take advantage of this rule now, and decide if we should plot the result or not based on the presence of the `timestamp` column using Matplotlib.\n",
+ "\n",
+ "This lets you:\n",
+ "- Plot machine readings over time\n",
+ "- Compare metrics like vibration, temperature, and rotations\n",
+ "- Quickly identify anomalies or trends\n",
+ "\n",
+ "Example questions:\n",
+ "\n",
+ "- Show temperature and vibration for machine 2 over time.\n",
+ "- Plot the average rotation per machine.\n",
+ "- Show the number of anomalies per day."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2f870d7f",
+ "metadata": {},
+ "source": [
+ "### Ask a Question β LLM Generates SQL β Plot with Matplotlib\n",
+ "\n",
+ "Weβll use the same `get_sql_from_llm` function, then add a logic layer to check if the result has a `timestamp` column (for time-based plotting). \n",
+ "\n",
+ "The assistant will generate SQL and visualize the result as a timeseries chart."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "10fee9fa",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import matplotlib.dates as mdates\n",
+ "\n",
+ "# Step 1: Ask a visualization-friendly question\n",
+ "question = \"Show temperature, rotation and vibration for machine 2 last Monday.\"\n",
+ "# question = \"Show average temperature, rotation and vibration for machine 2 on Mondays.\"\n",
+ "# question = \"Plot the average rotation and temperature for the last week. Rename the start_time column to timestamp.\"\n",
+ "\n",
+ "# Step 2: Get SQL from LLM\n",
+ "sql_query = get_sql_from_llm(question)\n",
+ "print(\"π§ Generated SQL:\\n\", sql_query)\n",
+ "\n",
+ "# Step 3: Run the query\n",
+ "try:\n",
+ " df_result = pd.read_sql(sql_query, con=engine)\n",
+ " print(\"β
Query returned\", len(df_result), \"rows.\")\n",
+ "\n",
+ " # Step 4: Try to plot if timestamp column is present\n",
+ " if \"timestamp\" in df_result.columns:\n",
+ " # Ensure timestamp is datetime and sorted (handle epoch ms)\n",
+ " df_result = df_result.sort_values(\"timestamp\")\n",
+ "\n",
+ " # Convert epoch millis to datetime if needed\n",
+ " if df_result[\"timestamp\"].dtype in [\"int64\", \"float64\"]:\n",
+ " df_result[\"timestamp\"] = pd.to_datetime(df_result[\"timestamp\"], unit=\"ms\")\n",
+ " else:\n",
+ " df_result[\"timestamp\"] = pd.to_datetime(df_result[\"timestamp\"])\n",
+ "\n",
+ " df_result.set_index(\"timestamp\", inplace=True)\n",
+ "\n",
+ " # Plot numeric columns\n",
+ " fig, ax = plt.subplots(figsize=(14, 5))\n",
+ " df_result.plot(ax=ax, title=question)\n",
+ "\n",
+ " # Format x-axis for better readability\n",
+ " ax.set_xlabel(\"Timestamp\")\n",
+ " ax.set_ylabel(\"Value\")\n",
+ " ax.grid(True)\n",
+ " ax.xaxis.set_major_locator(mdates.AutoDateLocator())\n",
+ " ax.xaxis.set_major_formatter(mdates.DateFormatter(\"%Y-%m-%d\\n%H:%M\"))\n",
+ "\n",
+ " plt.xticks(rotation=45)\n",
+ " plt.tight_layout()\n",
+ " plt.show()\n",
+ " else:\n",
+ " print(\"No 'timestamp' column in result β skipping visualization.\")\n",
+ " print(\n",
+ " \"Tip: Ask a time-based question like '...over time' or 'per day' to enable plotting.\"\n",
+ " )\n",
+ " display(df_result)\n",
+ "except Exception as e:\n",
+ " print(\"β Error during SQL execution or plotting:\")\n",
+ " print(e)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "27be8bd6",
+ "metadata": {},
+ "source": [
+ "## Step 6: Turn it into an AI Agent\n",
+ "\n",
+ "The previous steps have established the ability to generate and run queries from natural language. However, there are a few difficulties with this approach:\n",
+ "\n",
+ "π§ We need to manually implement retrieval of table schemas, and pass them as part of the prompt.\n",
+ "\n",
+ "π§ The LLM doesn't get the CrateDB SQL grammar right in a number of cases. System prompts are needed to steer the LLM into the right direction. Asking new questions likely will result in the need for additional system propmpts.\n",
+ "\n",
+ "π§ We still need to program the execution of the SQL query the LLM came up with.\n",
+ "\n",
+ "Advantages of the **agent approach**:\n",
+ "\n",
+ "β
An agent allows to integrate multiple services (called \"tools\") and process information from them. It is able to bridge the gap between tools.\n",
+ "\n",
+ "β
The CrateDB MCP Server implements a `get_table_metadata` tool for retrieving schema information. The LLM will call this tool when it needs schema information to generate a query. We no longer need to manually encode the schema into the prompt.\n",
+ "\n",
+ "β
Also due to the `query_sql` tool of the CrateDB MCP Server, the agent can execute the SQL query it came up with.\n",
+ "\n",
+ "β
The CrateDB MCP Server also has a `fetch_cratedb_docs` tool. If the LLM is in doubt how to generate a query, it can consult the CrateDB documentation. This is also helpful for features added to CrateDB after the LLM's cutoff date, which is often more than a year back.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "**Agents** are usually defined as permanent resources in AWS Bedrock. But there is also the option for them to be **transient (inline)** and only exist during runtime. This is very useful for **development and rapid prototyping**, which comes handy for this workshop.\n",
+ "\n",
+ "Let's start with defining an inline agent, configure the CrateDB MCP server, and pass an action group with it to the agent."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cbbb38f2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from urllib import parse\n",
+ "\n",
+ "from mcp import StdioServerParameters\n",
+ "\n",
+ "from InlineAgent.tools.mcp import MCPStdio\n",
+ "from InlineAgent.action_group import ActionGroup\n",
+ "from InlineAgent.agent import InlineAgent\n",
+ "from InlineAgent import AgentAppConfig\n",
+ "\n",
+ "config = AgentAppConfig()\n",
+ "\n",
+ "\n",
+ "def sqlalchemy_to_http(connection_string: str) -> str:\n",
+ " \"\"\"\n",
+ " Due to a technicality, we need to translate between two different types of connection URLs.\n",
+ " pandas was using a SQLAlchemy-style connection URL (crate://...),\n",
+ " while the MCP server uses a standard HTTP URL with basic auth.\n",
+ " \"\"\"\n",
+ "\n",
+ " parsed = parse.urlparse(connection_string)\n",
+ " if \"ssl=true\" in parsed.query:\n",
+ " protocol = \"https\"\n",
+ " else:\n",
+ " protocol = \"http\"\n",
+ "\n",
+ " return f\"{protocol}://{parsed.username}:{parse.quote(parsed.password)}@{parsed.hostname}:4200\"\n",
+ "\n",
+ "\n",
+ "async def query_agent(question):\n",
+ " \"\"\"\n",
+ " This method defines an AWS Bedrock inline agent during runtime:\n",
+ " https://docs.aws.amazon.com/bedrock/latest/userguide/agents-create-inline.html\n",
+ "\n",
+ " The action group contains one or more actions that the agent can perform.\n",
+ " In our case, we have only one action, which is reaching out to the CrateDB MCP server.\n",
+ " The CrateDB MCP server has several tools that it offers, such as `query_sql`, `get_table_metadata`, etc.\n",
+ " \"\"\"\n",
+ "\n",
+ " cratedb_mcp_client = await MCPStdio.create(\n",
+ " server_params=StdioServerParameters(\n",
+ " # The MCP server is a Python script that we installed earlier as part of `pip install`.\n",
+ " # It got placed in the `bin` directory of our virtual environment.\n",
+ " command=\"cratedb-mcp\",\n",
+ " args=[\"serve\"],\n",
+ " env={\n",
+ " \"CRATEDB_CLUSTER_URL\": sqlalchemy_to_http(CONNECTION_STRING),\n",
+ " \"CRATEDB_MCP_TRANSPORT\": \"stdio\",\n",
+ " },\n",
+ " )\n",
+ " )\n",
+ "\n",
+ " # The action group containing our MCP server.\n",
+ " # Other types of actions may be OpenAPI schemas or Python methods.\n",
+ " # https://docs.aws.amazon.com/bedrock/latest/userguide/action-define.html\n",
+ " #\n",
+ " # We stick to only CrateDB here for the sake of the workshop,\n",
+ " # although the real power of agents comes from connecting multiple components.\n",
+ " action_group = ActionGroup(\n",
+ " name=\"CratateDBActionGroup\",\n",
+ " mcp_clients=[cratedb_mcp_client],\n",
+ " )\n",
+ "\n",
+ " return await InlineAgent(\n",
+ " foundation_model=\"us.amazon.nova-pro-v1:0\",\n",
+ " instruction=f\"\"\"\n",
+ " You are a friendly assistant who receives information from CrateDB.\n",
+ " Your task is to translate questions into SQL queries, run them on CrateDB, and return back results.\n",
+ " Try to generate SQL queries based on the known data model and don't ask questions back.\n",
+ "\n",
+ " You have the following tools available:\n",
+ " 1. `query_sql`: Executes SQL queries on CrateDB\n",
+ " 2. `get_cratedb_documentation`: Returns the table of contents for the CrateDB documentation. If in doubt about CrateDB-specific syntax, you can obtain the documentation here.\n",
+ " 3. `fetch_cratedb_docs`: Once a specific link within the CrateDB documentation is identified, you can download its content here by providing the link.\n",
+ " 4. `get_table_metadata`: This returns all metadata for tables in CrateDB.\n",
+ "\n",
+ " Try to reason and give an interpretation of the result.\n",
+ "\n",
+ " When asked about manuals, query the `manual` column of the `machine_manuals` table to retrieve the manual. Interpret its content to provide an answer.\n",
+ "\n",
+ " Rules for writing SQL queries:\n",
+ " - To retrieve the latest value for a column, use CrateDB's `MAX_BY` function.\n",
+ " - When using date intervals, always include both the quantity and the unit in a string, e.g. INTERVAL '7 days'.\n",
+ " - Don't use DATE_SUB, it does not exist in CrateDB. Use DATE_TRUNC instead.\n",
+ " \"\"\",\n",
+ " agent_name=\"cratedb_query_agent\",\n",
+ " action_groups=[action_group],\n",
+ " ).invoke(input_text=question)\n",
+ "\n",
+ "print(\"β
Function `query_agent` defined successfully\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c6715d7f",
+ "metadata": {},
+ "source": [
+ "As before when communicating directly with the LLM, we can try out a few different questions. But this time, we ask it to the agent, and not the LLM."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "aa4ed895",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# fmt: off\n",
+ "question = \"Is any of my machines behaving significantly different compared to others? I'm interested in vibration from motor_readings.\"\n",
+ "# question = \"Did the vibration of machine 4 change between today and yesterday? Query the table motor_readings.\"\n",
+ "# question = \"How recent is my data in motor_readings? Is there any machine that lacks behind?\"\n",
+ "# question = \"What was the highest temperature ever observed over all machines? Apply DATE_TRUNC to generate a weekly overview and include the week in your reply. The week is returned as a timestamp in millisecond, format it in a human-readable way.\"\n",
+ "# fmt: on\n",
+ "\n",
+ "print(await query_agent(question))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "91f61529",
+ "metadata": {},
+ "source": [
+ "## Step 7: Integrate Machine Manuals into the QA Pipeline\n",
+ "\n",
+ "We so far worked on timeseries data only. Now we dynamically generate fictional manuals for each machine based on the IDs in `motor_readings`.\n",
+ "\n",
+ "Each manual includes:\n",
+ "- Operational limits\n",
+ "- Maintenance schedules\n",
+ "- Emergency protocols\n",
+ "- Manufacturer and contact info\n",
+ "\n",
+ "This ensures the manual data matches whatever telemetry data has been created, even if someone customized the setup earlier.\n",
+ "\n",
+ "We store the results in a CrateDB table: `machine_manuals`.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f51f49a2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import random\n",
+ "\n",
+ "# === Configuration ===\n",
+ "include_branding = True\n",
+ "include_contact_info = True\n",
+ "\n",
+ "brands = [\"AtlasTech\", \"RotoFlow\", \"MechAxis\", \"IndustraCore\"]\n",
+ "models = [\"VX100\", \"MX200\", \"TQ350\", \"RG450\"]\n",
+ "year_range = list(range(2017, 2023))\n",
+ "\n",
+ "# === Load unique machine IDs from CrateDB ===\n",
+ "machine_ids = pd.read_sql(\"SELECT DISTINCT machine_id FROM motor_readings\", con=engine)\n",
+ "machine_ids = machine_ids[\"machine_id\"].tolist()\n",
+ "\n",
+ "\n",
+ "def generate_manual(machine_id):\n",
+ " brand = random.choice(brands)\n",
+ " model = random.choice(models)\n",
+ " year = random.choice(year_range)\n",
+ "\n",
+ " vib_max = round(random.uniform(1.2, 1.6), 2)\n",
+ " temp_max = round(random.uniform(65, 75), 1)\n",
+ " rpm_max = random.randint(1550, 1650)\n",
+ "\n",
+ " # Build optional blocks\n",
+ " branding_section = \"\"\n",
+ " if include_branding:\n",
+ " branding_section = f\"\"\"**Manufacturer:** {brand}\n",
+ "**Model:** {model}\n",
+ "**Year of Installation:** {year}\"\"\"\n",
+ "\n",
+ " contact_section = \"\"\n",
+ " if include_contact_info:\n",
+ " contact_section = f\"\"\"**Contact:**\n",
+ "- Support: support@{brand.lower()}.com\n",
+ "- Manual Version: 1.0\"\"\"\n",
+ "\n",
+ " # Build the full manual string\n",
+ " content = f\"\"\"\n",
+ "π οΈ Machine Manual β ID: {machine_id}\n",
+ "\n",
+ "{branding_section}\n",
+ "\n",
+ "---\n",
+ "\n",
+ "**Operational Limits:**\n",
+ "- Max Vibration: {vib_max} units\n",
+ "- Max Temperature: {temp_max}Β°C\n",
+ "- Max RPM: {rpm_max} rotations/min\n",
+ "\n",
+ "**Anomaly Detection:**\n",
+ "- Vibration > {vib_max} may indicate imbalance or bearing issues\n",
+ "- Temperature > {temp_max} may suggest overheating\n",
+ "- RPM deviations > Β±100 RPM require inspection\n",
+ "\n",
+ "---\n",
+ "\n",
+ "**Maintenance Schedule:**\n",
+ "- Weekly: Inspect vibration and temperature logs\n",
+ "- Monthly: Lubricate bearings and check alignment\n",
+ "- Quarterly: Full motor calibration and safety check\n",
+ "\n",
+ "**Emergency Protocol:**\n",
+ "If vibration exceeds {vib_max + 0.2} or temperature exceeds {temp_max + 5}:\n",
+ "1. Immediately reduce load\n",
+ "2. Shut down the motor if anomaly persists for >5 mins\n",
+ "3. Notify operations lead and schedule maintenance\n",
+ "\n",
+ "---\n",
+ "\n",
+ "{contact_section}\n",
+ "\"\"\".strip()\n",
+ "\n",
+ " return {\"machine_id\": machine_id, \"manual\": content}\n",
+ "\n",
+ "\n",
+ "# Generate manuals for all machine IDs found\n",
+ "manuals = [generate_manual(mid) for mid in machine_ids]\n",
+ "df_manuals = pd.DataFrame(manuals)\n",
+ "\n",
+ "# Store in CrateDB\n",
+ "df_manuals.to_sql(\"machine_manuals\", con=engine, if_exists=\"append\", index=False)\n",
+ "print(f\"β
Stored manuals for {len(df_manuals)} machines in 'machine_manuals'.\")\n",
+ "_ = connection.execute(sa.text(\"REFRESH TABLE machine_manuals;\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ab1576cf",
+ "metadata": {},
+ "source": [
+ "### View a Random Machine Manual\n",
+ "\n",
+ "Below is a randomly selected machine manual from the `machine_manuals` table. \n",
+ "Each manual includes operational guidelines, maintenance schedules, and emergency protocols β all of which can be referenced by the assistant in later steps."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bb403cfb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from IPython.display import display, HTML\n",
+ "\n",
+ "# Step 1: Load a random manual's content\n",
+ "manual = pd.read_sql(\n",
+ " \"SELECT machine_id, manual FROM machine_manuals ORDER BY RANDOM() LIMIT 1\",\n",
+ " con=engine,\n",
+ ")\n",
+ "machine_id = manual.iloc[0][\"machine_id\"]\n",
+ "manual_text = manual.iloc[0][\"manual\"]\n",
+ "\n",
+ "# Step 2: Display in scrollable, formatted box\n",
+ "display(\n",
+ " HTML(\n",
+ " f\"\"\"\n",
+ "<h3>📘 Manual for Machine ID: {machine_id}</h3>\n",
+ "<div style='max-height: 400px; overflow-y: auto; border: 1px solid #ccc; padding: 10px; white-space: pre-wrap;'>\n",
+ "{manual_text}\n",
+ "</div>\n",
+ "\"\"\"\n",
+ " )\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4e0b80fc",
+ "metadata": {},
+ "source": [
+ "### Context-Aware Assistant Using Data + Manuals\n",
+ "\n",
+ "We can now ask our assistent questions involving both telemetry data (`motor_readings`) and manual guidance (`machine_manuals`).\n",
+ "\n",
+ "This allows it to:\n",
+ "- Detect anomalies\n",
+ "- Reference emergency protocols or limits\n",
+ "- Provide maintenance guidance\n",
+ "\n",
+ "This is your main interaction point with the assistant. Just type in a natural language question, and the assistant will:\n",
+ "- Analyze your question\n",
+ "- Decide if telemetry data or manual context is needed\n",
+ "- Generate and run one or more SQL queries\n",
+ "- Explain the results in plain language\n",
+ "- Optionally summarize emergency protocols from manuals\n",
+ "\n",
+ "\n",
+ "Here are some example queries and what kind of answers you can expect:\n",
+ "\n",
+ "| Question | Assistant Behavior |\n",
+ "|--------------------------------------------------------------------------|------------------------------------------------------------------------------------|\n",
+ "| What was the average temperature for machine 3 last week? | Retrieves average temperature from telemetry and explains the result. |\n",
+ "| Is machine 4 overheating? | Checks the latest temperature and compares it with manual thresholds if present. |\n",
+ "| What should I do if machine 2 has an anomaly? | Loads the manual for machine 2 and summarizes anomaly and emergency protocols. |\n",
+ "| Give me the max and min vibration for machine 6 when rotations > 1600. | Executes a filtered SQL query and summarizes the max/min vibration values. |\n",
+ "| Show me the maintenance steps for machine 1. | Extracts and summarizes the maintenance section from the manual. |\n",
+ "| Is machine 5's most recent temperature still ok according to the manual? | Retrieves the latest temperature from telemetry and correlates it with the manual. |"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3bbe0cce",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# fmt: off\n",
+ "question = \"Show me the maintenance schedule for machine 5. Retrieve the manual column from the machine_manuals table and extract the maintenance schedule from its content.\"\n",
+ "# question = \"Is machine 4 overheating?\"\n",
+ "# question = \"Give me the max and min vibration observed for machine 6 when rotations > 1600. Rotations are stored in the rotations column.\"\n",
+ "# question = \"What should I do if machine 2 has an anomaly?\"\n",
+ "# question = \"What can be a reason for higher than usual values for the column vibration in motor_readings for machine 2?\"\n",
+ "# question = \"Is machine 5's most recent temperature still ok according to the manual? Look up the most recent temperature from motor_readings.\"\n",
+ "# fmt: on\n",
+ "\n",
+ "print(await query_agent(question))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2b98110e",
+ "metadata": {},
+ "source": [
+ "--- \n",
+ "## Step 8: Recap & Lessons Learned\n",
+ "\n",
+ "Congratulations on completing the Timeseries QA with LLMs & Manuals workshop!\n",
+ "Letβs wrap things up with a quick recap of what weβve achieved and the key takeaways.\n",
+ "\n",
+ "### Workshop Recap\n",
+ "\n",
+ "Over the course of this notebook, you:\n",
+ "- Set up your environment and connected to CrateDB, a powerful distributed SQL database optimized for timeseries data.\n",
+ "- Generated and loaded synthetic telemetry data simulating real-world sensor readings from industrial machines.\n",
+ "- Built a natural language interface with an LLM to convert plain English into CrateDB-compatible SQL queries.\n",
+ "- Visualized time-series data directly from natural language questions, bringing clarity to trends, outliers, and anomalies.\n",
+ "- Defined an inline AI agent, communicating with CrateDB through an MCP server.\n",
+ "- Generated structured machine manuals, complete with thresholds, anomaly rules, and emergency procedures.\n",
+ "- Merged telemetry with manual context, allowing a single question to yield data-driven insights and operational guidance.\n",
+ "\n",
+ "### Key Lessons Learned\n",
+ "\n",
+ "| Skill | What You Practiced |\n",
+ "|--------------------------------------|----------------------------------------------------------------------------------------|\n",
+ "| **LLM Prompt Engineering** | How to design system prompts and templates that turn natural language into SQL. |\n",
+ "| **Table-Augmented Generation (TAG)** | Augmenting LLMs with live table schemas to improve query accuracy. |\n",
+ "| **Timeseries Analysis** | Using SQL and pandas to inspect, filter, and visualize sensor data over time. |\n",
+ "| **AI agents** | The key components of an agent, including action groups and MCP servers. |\n",
+ "| **CrateDB Features** | Leveraging a scalable time-series database with SQL support. |\n",
+ "| **RAG-like Patterns** | Combining structured telemetry with unstructured manuals for richer QA experiences. |\n",
+ "\n",
+ "\n",
+ "π Thanks for Participating!\n",
+ "\n",
+ "We hope this workshop has inspired you to combine structured data, unstructured manuals, and language models in powerful new ways."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.13.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/topic/multi-model/multi-model-offshore-wind-farms.ipynb b/topic/multi-model/multi-model-offshore-wind-farms.ipynb
index 05b29aeb..838a7d07 100644
--- a/topic/multi-model/multi-model-offshore-wind-farms.ipynb
+++ b/topic/multi-model/multi-model-offshore-wind-farms.ipynb
@@ -27,7 +27,7 @@
"metadata": {},
"outputs": [],
"source": [
- "! pip install -U ipyleaflet sqlalchemy-cratedb pandas"
+ "%pip install ipyleaflet sqlalchemy-cratedb pandas"
]
},
{
@@ -51,26 +51,30 @@
"source": [
"import os\n",
"import sqlalchemy as sa\n",
+ "import pandas as pd\n",
"\n",
- "# Define database address when using CrateDB Cloud.\n",
- "# Please find these settings on your cluster overview page.\n",
- "# Example: crate://admin:ofidiejai7iiReereHi0@demo.gke1.us-central1.gcp.cratedb.net/?ssl=true\n",
+ "# Option 1: CrateDB Cloud (Update or set via CRATEDB_CONNECTION_STRING env variable)\n",
+ "# Example: crate://admin:ooToh2Paecielun@demo.eks1.eu-west-1.aws.cratedb.net/?ssl=true\n",
"CONNECTION_STRING = os.environ.get(\n",
" \"CRATEDB_CONNECTION_STRING\",\n",
- " \"crate://:@/?ssl=true\",\n",
+ " \"crate://USER:PASSWORD@CRATEDB_HOST/?ssl=true\",\n",
")\n",
"\n",
- "# # Define database address when using CrateDB on localhost.\n",
- "# CONNECTION_STRING = os.environ.get(\n",
- "# \"CRATEDB_CONNECTION_STRING\",\n",
- "# \"crate://crate@localhost/\"\n",
- "# )\n",
- "\n",
- "# Connect to CrateDB using SQLAlchemy.\n",
- "engine = sa.create_engine(\n",
- " CONNECTION_STRING, echo=sa.util.asbool(os.environ.get(\"DEBUG\", \"false\"))\n",
- ")\n",
- "connection = engine.connect()"
+ "# Option 2: Localhost setup\n",
+ "# CONNECTION_STRING = os.environ.get(\"CRATEDB_CONNECTION_STRING\", \"crate://crate@localhost/\")\n",
+ "\n",
+ "# Try to connect\n",
+ "try:\n",
+ " engine = sa.create_engine(CONNECTION_STRING)\n",
+ " connection = engine.connect()\n",
+ "\n",
+ " # Run a simple query to validate connection\n",
+ " result = pd.read_sql(\"SELECT mountain FROM sys.summits LIMIT 1\", con=engine)\n",
+ " print(\"β
Successfully connected to CrateDB!\")\n",
+ " print(\"Sample query result from sys.summits:\", result.iloc[0][\"mountain\"])\n",
+ "except Exception as e:\n",
+ " print(\"β Failed to connect to CrateDB. Please check your connection string.\")\n",
+ " print(\"Error:\", e)"
]
},
{
@@ -90,7 +94,7 @@
},
{
"cell_type": "code",
- "execution_count": 13,
+ "execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -136,7 +140,9 @@
" ) PARTITIONED BY (month);\n",
" \"\"\"\n",
" )\n",
- ")"
+ ")\n",
+ "\n",
+ "print(\"β
Tables created successfully\")"
]
},
{
@@ -224,12 +230,14 @@
},
{
"cell_type": "code",
- "execution_count": 17,
+ "execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"_ = connection.execute(sa.text(\"REFRESH TABLE windfarms, windfarm_output\"))\n",
- "_ = connection.execute(sa.text(\"ANALYZE\"))"
+ "_ = connection.execute(sa.text(\"ANALYZE\"))\n",
+ "\n",
+ "print(\"β
Tables refreshed successfully\")"
]
},
{
@@ -247,87 +255,10 @@
},
{
"cell_type": "code",
- "execution_count": 33,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- "\n",
- "
\n",
- " \n",
- " \n",
- " | \n",
- " ts | \n",
- " output | \n",
- " outputpercentage | \n",
- "
\n",
- " \n",
- " \n",
- " \n",
- " 0 | \n",
- " 2024-10-28 05:00:00 | \n",
- " 32.0 | \n",
- " 53.33 | \n",
- "
\n",
- " \n",
- " 1 | \n",
- " 2024-10-28 04:00:00 | \n",
- " 39.5 | \n",
- " 65.83 | \n",
- "
\n",
- " \n",
- " 2 | \n",
- " 2024-10-28 03:00:00 | \n",
- " 46.0 | \n",
- " 76.67 | \n",
- "
\n",
- " \n",
- " 3 | \n",
- " 2024-10-28 02:00:00 | \n",
- " 46.9 | \n",
- " 78.17 | \n",
- "
\n",
- " \n",
- " 4 | \n",
- " 2024-10-28 01:00:00 | \n",
- " 47.8 | \n",
- " 79.67 | \n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
- "text/plain": [
- " ts output outputpercentage\n",
- "0 2024-10-28 05:00:00 32.0 53.33\n",
- "1 2024-10-28 04:00:00 39.5 65.83\n",
- "2 2024-10-28 03:00:00 46.0 76.67\n",
- "3 2024-10-28 02:00:00 46.9 78.17\n",
- "4 2024-10-28 01:00:00 47.8 79.67"
- ]
- },
- "execution_count": 33,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
- "import pandas as pd\n",
- "\n",
"query = \"\"\"\n",
"SELECT\n",
" ts,\n",
@@ -356,111 +287,10 @@
},
{
"cell_type": "code",
- "execution_count": 23,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- "\n",
- "
\n",
- " \n",
- " \n",
- " | \n",
- " name | \n",
- " avg_output_percent | \n",
- "
\n",
- " \n",
- " \n",
- " \n",
- " 0 | \n",
- " Seagreen Phase 1 | \n",
- " 69.42 | \n",
- "
\n",
- " \n",
- " 1 | \n",
- " Walney 2 | \n",
- " 67.70 | \n",
- "
\n",
- " \n",
- " 2 | \n",
- " West of Duddon Sands | \n",
- " 66.42 | \n",
- "
\n",
- " \n",
- " 3 | \n",
- " Walney Extension 4 | \n",
- " 65.72 | \n",
- "
\n",
- " \n",
- " 4 | \n",
- " Rhyl Flats | \n",
- " 61.26 | \n",
- "
\n",
- " \n",
- " 5 | \n",
- " Inner Dowsing | \n",
- " 61.00 | \n",
- "
\n",
- " \n",
- " 6 | \n",
- " Lynn | \n",
- " 61.00 | \n",
- "
\n",
- " \n",
- " 7 | \n",
- " Galloper | \n",
- " 60.52 | \n",
- "
\n",
- " \n",
- " 8 | \n",
- " Robin Rigg West | \n",
- " 59.67 | \n",
- "
\n",
- " \n",
- " 9 | \n",
- " Walney Extension 3 | \n",
- " 59.32 | \n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
- "text/plain": [
- " name avg_output_percent\n",
- "0 Seagreen Phase 1 69.42\n",
- "1 Walney 2 67.70\n",
- "2 West of Duddon Sands 66.42\n",
- "3 Walney Extension 4 65.72\n",
- "4 Rhyl Flats 61.26\n",
- "5 Inner Dowsing 61.00\n",
- "6 Lynn 61.00\n",
- "7 Galloper 60.52\n",
- "8 Robin Rigg West 59.67\n",
- "9 Walney Extension 3 59.32"
- ]
- },
- "execution_count": 23,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
- "# Slide 18\n",
- "\n",
"query = \"\"\"\n",
" SELECT\n",
" name,\n",
@@ -493,78 +323,9 @@
},
{
"cell_type": "code",
- "execution_count": 31,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- "\n",
- "
\n",
- " \n",
- " \n",
- " | \n",
- " week | \n",
- " hourly_avg_output_pct | \n",
- "
\n",
- " \n",
- " \n",
- " \n",
- " 0 | \n",
- " 2024-10-24 | \n",
- " 26.30 | \n",
- "
\n",
- " \n",
- " 1 | \n",
- " 2024-10-17 | \n",
- " 31.17 | \n",
- "
\n",
- " \n",
- " 2 | \n",
- " 2024-10-10 | \n",
- " 26.64 | \n",
- "
\n",
- " \n",
- " 3 | \n",
- " 2024-10-03 | \n",
- " 25.97 | \n",
- "
\n",
- " \n",
- " 4 | \n",
- " 2024-09-26 | \n",
- " 40.86 | \n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
- "text/plain": [
- " week hourly_avg_output_pct\n",
- "0 2024-10-24 26.30\n",
- "1 2024-10-17 31.17\n",
- "2 2024-10-10 26.64\n",
- "3 2024-10-03 25.97\n",
- "4 2024-09-26 40.86"
- ]
- },
- "execution_count": 31,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
"query = \"\"\"\n",
"SELECT\n",
@@ -594,133 +355,9 @@
},
{
"cell_type": "code",
- "execution_count": 35,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- "\n",
- "
\n",
- " \n",
- " \n",
- " | \n",
- " hour_of_day | \n",
- " hour_output | \n",
- " output_day_so_far | \n",
- "
\n",
- " \n",
- " \n",
- " \n",
- " 0 | \n",
- " 2024-08-20 00:00:00 | \n",
- " 1014.5 | \n",
- " 1014.5 | \n",
- "
\n",
- " \n",
- " 1 | \n",
- " 2024-08-20 01:00:00 | \n",
- " 1014.0 | \n",
- " 2028.5 | \n",
- "
\n",
- " \n",
- " 2 | \n",
- " 2024-08-20 02:00:00 | \n",
- " 1014.0 | \n",
- " 3042.5 | \n",
- "
\n",
- " \n",
- " 3 | \n",
- " 2024-08-20 03:00:00 | \n",
- " 1014.0 | \n",
- " 4056.5 | \n",
- "
\n",
- " \n",
- " 4 | \n",
- " 2024-08-20 05:00:00 | \n",
- " 1005.3 | \n",
- " 5061.8 | \n",
- "
\n",
- " \n",
- " 5 | \n",
- " 2024-08-20 06:00:00 | \n",
- " 981.0 | \n",
- " 6042.8 | \n",
- "
\n",
- " \n",
- " 6 | \n",
- " 2024-08-20 07:00:00 | \n",
- " 956.4 | \n",
- " 6999.2 | \n",
- "
\n",
- " \n",
- " 7 | \n",
- " 2024-08-20 08:00:00 | \n",
- " 958.9 | \n",
- " 7958.1 | \n",
- "
\n",
- " \n",
- " 8 | \n",
- " 2024-08-20 09:00:00 | \n",
- " 912.1 | \n",
- " 8870.2 | \n",
- "
\n",
- " \n",
- " 9 | \n",
- " 2024-08-20 10:00:00 | \n",
- " 927.1 | \n",
- " 9797.3 | \n",
- "
\n",
- " \n",
- " 10 | \n",
- " 2024-08-20 11:00:00 | \n",
- " 904.4 | \n",
- " 10701.7 | \n",
- "
\n",
- " \n",
- " 11 | \n",
- " 2024-08-20 12:00:00 | \n",
- " 943.8 | \n",
- " 11645.5 | \n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
- "text/plain": [
- " hour_of_day hour_output output_day_so_far\n",
- "0 2024-08-20 00:00:00 1014.5 1014.5\n",
- "1 2024-08-20 01:00:00 1014.0 2028.5\n",
- "2 2024-08-20 02:00:00 1014.0 3042.5\n",
- "3 2024-08-20 03:00:00 1014.0 4056.5\n",
- "4 2024-08-20 05:00:00 1005.3 5061.8\n",
- "5 2024-08-20 06:00:00 981.0 6042.8\n",
- "6 2024-08-20 07:00:00 956.4 6999.2\n",
- "7 2024-08-20 08:00:00 958.9 7958.1\n",
- "8 2024-08-20 09:00:00 912.1 8870.2\n",
- "9 2024-08-20 10:00:00 927.1 9797.3\n",
- "10 2024-08-20 11:00:00 904.4 10701.7\n",
- "11 2024-08-20 12:00:00 943.8 11645.5"
- ]
- },
- "execution_count": 46,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
"query = \"\"\"\n",
"SELECT\n",
@@ -787,94 +424,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- "\n",
- "
\n",
- " \n",
- " \n",
- " | \n",
- " name | \n",
- " latest_output_pct | \n",
- "
\n",
- " \n",
- " \n",
- " \n",
- " 0 | \n",
- " Race Bank | \n",
- " 71.48 | \n",
- "
\n",
- " \n",
- " 1 | \n",
- " Dudgeon | \n",
- " 63.56 | \n",
- "
\n",
- " \n",
- " 2 | \n",
- " Inner Dowsing | \n",
- " 59.48 | \n",
- "
\n",
- " \n",
- " 3 | \n",
- " Triton Knoll | \n",
- " 51.89 | \n",
- "
\n",
- " \n",
- " 4 | \n",
- " Lincs | \n",
- " 41.67 | \n",
- "
\n",
- " \n",
- " 5 | \n",
- " Sheringham Shoal | \n",
- " 39.59 | \n",
- "
\n",
- " \n",
- " 6 | \n",
- " Scroby Sands | \n",
- " 23.17 | \n",
- "
\n",
- " \n",
- " 7 | \n",
- " Humber Gateway | \n",
- " 17.12 | \n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
- "text/plain": [
- " name latest_output_pct\n",
- "0 Race Bank 71.48\n",
- "1 Dudgeon 63.56\n",
- "2 Inner Dowsing 59.48\n",
- "3 Triton Knoll 51.89\n",
- "4 Lincs 41.67\n",
- "5 Sheringham Shoal 39.59\n",
- "6 Scroby Sands 23.17\n",
- "7 Humber Gateway 17.12"
- ]
- },
- "execution_count": 28,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
"query = \"\"\"\n",
"SELECT\n",
@@ -926,81 +476,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- "\n",
- "
\n",
- " \n",
- " \n",
- " | \n",
- " name | \n",
- " num_turbines | \n",
- " description | \n",
- "
\n",
- " \n",
- " \n",
- " \n",
- " 0 | \n",
- " East Anglia One | \n",
- " 102 | \n",
- " East Anglia ONE is located in the southern are... | \n",
- "
\n",
- " \n",
- " 1 | \n",
- " Moray (East) | \n",
- " 100 | \n",
- " Moray East Wind Farm is an offshore wind farm ... | \n",
- "
\n",
- " \n",
- " 2 | \n",
- " Sheringham Shoal | \n",
- " 89 | \n",
- " Sheringham Shoal Offshore Wind Farm is a Round... | \n",
- "
\n",
- " \n",
- " 3 | \n",
- " Beatrice | \n",
- " 84 | \n",
- " The Beatrice Offshore Wind Farm now known as B... | \n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
- "text/plain": [
- " name num_turbines \\\n",
- "0 East Anglia One 102 \n",
- "1 Moray (East) 100 \n",
- "2 Sheringham Shoal 89 \n",
- "3 Beatrice 84 \n",
- "\n",
- " description \n",
- "0 East Anglia ONE is located in the southern are... \n",
- "1 Moray East Wind Farm is an offshore wind farm ... \n",
- "2 Sheringham Shoal Offshore Wind Farm is a Round... \n",
- "3 The Beatrice Offshore Wind Farm now known as B... "
- ]
- },
- "execution_count": 26,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
"query = \"\"\"\n",
"SELECT\n",
@@ -1053,113 +529,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- "\n",
- "
\n",
- " \n",
- " \n",
- " | \n",
- " _score | \n",
- " name | \n",
- " description | \n",
- "
\n",
- " \n",
- " \n",
- " \n",
- " 0 | \n",
- " 1.585061 | \n",
- " London Array 1 | \n",
- " The London Array is a 175-turbine 630 MW Round... | \n",
- "
\n",
- " \n",
- " 1 | \n",
- " 1.583606 | \n",
- " Greater Gabbard | \n",
- " Greater Gabbard is a 504 MW wind farm, built o... | \n",
- "
\n",
- " \n",
- " 2 | \n",
- " 1.561188 | \n",
- " East Anglia One | \n",
- " East Anglia ONE is located in the southern are... | \n",
- "
\n",
- " \n",
- " 3 | \n",
- " 1.551555 | \n",
- " Rampion Wind Farm | \n",
- " Rampion is an offshore wind farm developed by ... | \n",
- "
\n",
- " \n",
- " 4 | \n",
- " 1.544661 | \n",
- " Hornsea One | \n",
- " Hornsea Wind Farm is a Round 3 wind farm which... | \n",
- "
\n",
- " \n",
- " 5 | \n",
- " 1.544292 | \n",
- " Hornsea 2 | \n",
- " Hornsea Wind Farm is a Round 3 wind farm which... | \n",
- "
\n",
- " \n",
- " 6 | \n",
- " 1.526784 | \n",
- " Seagreen Phase 1 | \n",
- " Seagreen is an offshore wind farm located in t... | \n",
- "
\n",
- " \n",
- " 7 | \n",
- " 1.505355 | \n",
- " Gwynt y Mor | \n",
- " Gwynt y MΓ΄r (Welsh for 'sea wind') is a 576-me... | \n",
- "
\n",
- " \n",
- "
\n",
- "
"
- ],
- "text/plain": [
- " _score name \\\n",
- "0 1.585061 London Array 1 \n",
- "1 1.583606 Greater Gabbard \n",
- "2 1.561188 East Anglia One \n",
- "3 1.551555 Rampion Wind Farm \n",
- "4 1.544661 Hornsea One \n",
- "5 1.544292 Hornsea 2 \n",
- "6 1.526784 Seagreen Phase 1 \n",
- "7 1.505355 Gwynt y Mor \n",
- "\n",
- " description \n",
- "0 The London Array is a 175-turbine 630 MW Round... \n",
- "1 Greater Gabbard is a 504 MW wind farm, built o... \n",
- "2 East Anglia ONE is located in the southern are... \n",
- "3 Rampion is an offshore wind farm developed by ... \n",
- "4 Hornsea Wind Farm is a Round 3 wind farm which... \n",
- "5 Hornsea Wind Farm is a Round 3 wind farm which... \n",
- "6 Seagreen is an offshore wind farm located in t... \n",
- "7 Gwynt y MΓ΄r (Welsh for 'sea wind') is a 576-me... "
- ]
- },
- "execution_count": 30,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
"# Slide 26.\n",
"\n",