docs/website/docs/dlt-ecosystem/llm-tooling/cursor-restapi.md
This guide walks you through a collaborative AI-human workflow for extracting and loading data from REST APIs.

You will learn:

1. How to use dltHub's [LLM-context database](https://dlthub.com/workspace) to initialize a workspace for the source you need.
2. How to build a REST API source in minutes with AI assistance.
3. How to debug the pipeline and explore data using the pipeline dashboard.
4. How to start a new notebook and use the pipeline's dataset in it.

## Prerequisites
Before diving into the workflow, here’s a quick overview of key terms you’ll encounter:

1. **dltHub Workspace** - An environment where all data engineering tasks, from writing code to maintenance in production, can be executed by a single developer:
   - Develop and test data pipelines locally
   - Run dlt pipelines, transformations, and notebooks with one command
   - Deliver live, production-ready reports with streamlined access to the dataset

   We plan to support more functionality in the future, such as:
   - Deploy and run your data workflows in the cloud without any changes to code and schemas
   - Maintain pipelines with a Runtime Agent, customizable dashboards, and validation tests
   - Deliver live reports without worrying about schema drift or silent failures

2. **[Cursor](https://cursor.com/)** - An AI-powered code editor that lets you express tasks in natural language for an LLM agent to implement. This LLM-native workflow isn’t exclusive to Cursor, but it’s the first AI code editor we’ve integrated with.
### Set up Cursor

1. Use the right model

   For best results, use Claude 3.7-sonnet, Gemini 2.5, or newer models. Weaker models struggle with context comprehension and workflow consistency. We've observed the best results with Claude 3.7-sonnet (which requires the paid version of Cursor).

2. Add documentation

   AI code editors let you upload documentation and code examples to provide additional context. [Here](https://docs.cursor.com/context/@-symbols/@-docs) you can learn how to do it with Cursor.

   Go to `Cursor Settings > Indexing & Docs` to see all your added documentation. You can edit, delete, or add new docs here. We recommend adding documentation scoped to a specific task. Add the following documentation links:

   * [REST API Source](../verified-sources/rest_api/) as `@dlt rest api`
   * [Core dlt concepts & usage](https://dlthub.com/docs/general-usage/resource) as `@dlt docs`
### Install dlt Workspace

```sh
pip install dlt[workspace]
```

dltHub provides prepared contexts for 1000+ sources, available at [https://dlthub.com/workspace](https://dlthub.com/workspace). To get started, search for your API and follow the tailored instructions.

<div style={{textAlign:'center'}}>

![dltHub workspace source search](https://storage.googleapis.com/dlt-blog-images/workspace_sources.png)

</div>

To initialize the dltHub Workspace, execute the following:

```sh
dlt init dlthub:{source_name} duckdb
```

This command will initialize the dltHub Workspace with:
- Files and folder structure you know from [dlt init](../../walkthroughs/create-a-pipeline.md)
- Documentation scaffold for the specific source (typically a `yaml` file) optimized for LLMs
- Cursor rules tailored for `dlt`
- Pipeline script and REST API Source (`{source_name}_pipeline.py`) definition that you'll customize in the next step (a minimal sketch of its general shape follows this list)
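
To give you a feel for what you'll be customizing, here is a minimal sketch of the general shape of a dlt REST API Source pipeline. It uses a hypothetical `pokemon` source against the public PokeAPI as a stand-in; the scaffold generated for your API will differ in endpoints, auth, and naming.

```python
import dlt
from dlt.sources.rest_api import rest_api_source

# Minimal sketch only: a declarative REST API Source pointing at a public API.
# The generated {source_name}_pipeline.py will define your API's endpoints instead.
pokemon_source = rest_api_source(
    {
        "client": {"base_url": "https://pokeapi.co/api/v2/"},
        # each listed resource becomes a table in the destination dataset
        "resources": ["pokemon", "berry"],
    }
)

pipeline = dlt.pipeline(
    pipeline_name="pokemon_pipeline",
    destination="duckdb",
    dataset_name="pokemon_data",
)

load_info = pipeline.run(pokemon_source)
print(load_info)
```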
:::tip
If you can't find the source you need, start with a generic REST API Source template. Choose the source name you need, e.g.:

```sh
dlt init dlthub:my_internal_fast_api duckdb
```
This will generate the full pipeline setup, including the script (`my_internal_fast_api_pipeline.py`) and all the files and folders you’d normally get with a standard [dlt init](../../walkthroughs/create-a-pipeline.md).

To make your source available to the LLM, be sure to [include the documentation](#addon-bring-your-own-llm-scaffold) in the context so the model can understand how to use it.
:::
## Create dlt pipeline
### Generate code
To get started quickly, we recommend using our pre-defined prompts tailored for each API. Visit [https://dlthub.com/workspace](https://dlthub.com/workspace) and copy the prompt for your selected source. Prompts are adjusted per API to provide the most accurate and relevant context.

Here's a general prompt template you can adapt:

```text
Please generate a REST API Source for {source} API, as specified in @{source}-docs.yaml
...
Use @dlt rest api as a tutorial.
After adding the endpoints, allow the user to run the pipeline with python {source}_pipeline.py and await further instructions.
```

In this prompt, we use `@` references to link to source specifications and documentation. Make sure Cursor recognizes the referenced docs. You can learn more about [referencing with @ in Cursor](https://docs.cursor.com/context/@-symbols/overview).

* `@{source}-docs.yaml` contains the source specification: it describes the source with endpoints, parameters, and other details.
* `@dlt rest api` points to the documentation for dlt's REST API source that you added in the Cursor setup step.

### Add credentials

To launch the pipeline dashboard, run:

```sh
dlt pipeline {source}_pipeline show --dashboard
```

The dashboard shows:
- Pipeline overview with state and metrics
- Data schema (tables, columns, types)
- The data itself; you can even write custom queries

The dashboard helps detect silent failures due to pagination errors, schema drift, or incremental load misconfigurations.
## Use the data in a Notebook
With the pipeline and data validated, you can continue with custom data explorations and reports. You can use your preferred environment, for example, [Jupyter Notebook](https://jupyter.org/), [Marimo Notebook](https://marimo.io/), or a plain Python file.

:::tip
For an optimized data exploration experience, we recommend using a Marimo notebook. Check out the [detailed guide on using dlt with Marimo](https://dlthub.com/docs/general-usage/dataset-access/marimo).
:::

To access the data, you can use the `dataset()` method:
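
For example, in a notebook cell you might attach to the pipeline and pull a loaded table into a pandas DataFrame. The sketch below assumes the hypothetical `pokemon_pipeline` and `pokemon` table from earlier; substitute your own pipeline and table names.

```python
import dlt

# Attach to the pipeline you ran earlier; the names below are illustrative.
# Use the pipeline_name defined in your {source}_pipeline.py.
pipeline = dlt.pipeline(pipeline_name="pokemon_pipeline", destination="duckdb")

# dataset() exposes the loaded tables for reading
dataset = pipeline.dataset()

# read one table into a pandas DataFrame and peek at it
df = dataset.pokemon.df()
print(df.head())
```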

For more, see [dataset access guide](../../general-usage/dataset-access).

## Addon: bring your own LLM Scaffold

LLMs can infer a REST API Source definition from various types of input, and in many cases, it’s easy to provide what’s needed.

Here are a few effective ways to scaffold your source:

1. **FastAPI (Internal APIs)**. If you're using FastAPI, simply add a file with the autogenerated OpenAPI spec to your workspace and reference it in your prompt (a sketch of one way to export the spec follows this list).
2. **Legacy code in any programming language**. Add the relevant code files to your workspace and reference them directly in your prompt. An LLM can extract useful structure even from older codebases.
3. **Human-readable documentation**. Well-written documentation works too. You can add it to your Cursor docs and reference it in your prompt for context.
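
For the FastAPI case, one straightforward way to produce that spec is to dump the schema FastAPI already generates. The snippet below is a sketch: the import path `my_internal_api.main` and the output filename are placeholders for your own app and naming.

```python
import json

# Placeholder import: point this at your own FastAPI application object.
from my_internal_api.main import app

# FastAPI autogenerates the OpenAPI schema; write it to a file you can
# add to the workspace and reference from your prompt.
with open("my_internal_fast_api_openapi.json", "w") as f:
    json.dump(app.openapi(), f, indent=2)
```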