
Commit 642fe0e

Minor cleaning
1 parent a6eca88 commit 642fe0e

File tree

1 file changed (+44 -38 lines)


docs/website/docs/dlt-ecosystem/llm-tooling/cursor-restapi.md

Lines changed: 44 additions & 38 deletions
@@ -12,9 +12,9 @@ This guide walks you through a collaborative AI-human workflow for extracting an

 You will learn:
 1. How to use dltHub's [LLM-context database](https://dlthub.com/workspace) to init workspace for the source you need.
-2. How to build a REST API source in minutes with AI assistance
-3. How to debug the pipeline and explore data using the pipeline dashboard
-4. How to start a new notebook and use the pipeline's dataset in it
+2. How to build a REST API source in minutes with AI assistance.
+3. How to debug the pipeline and explore data using the pipeline dashboard.
+4. How to start a new notebook and use the pipeline's dataset in it.

 ## Prerequisites

@@ -25,13 +25,14 @@ You will learn:
 Before diving into the workflow, here’s a quick overview of key terms you’ll encounter:

 1. **dltHub Workspace** - An environment where all data engineering tasks, from writing code to maintenance in production, can be executed by a single developer:
-> TODO: rewrite this
-- Develop and test locally with `dlt`, `duckdb` and `filesystem` then run in the cloud without any changes to code and schemas.
-- Deploy and run dlt pipelines, transformations, and notebooks with one command
-- Maintain pipelines with a Runtime Agent, customizable dashboards, and validation tests
-- Deliver live, production-ready reports without worrying about schema drift or silent failures
+- Develop and test data pipelines locally
+- Run dlt pipelines, transformations, and notebooks with one command
+- Deliver live, production-ready reports with streamlined access to the dataset

-It's not yet fully available, but you can start with the initial workflow: LLM-native pipeline development for 1,000+ REST APIs.
+We plan to support more functionality in the future, such as:
+- Deploy and run your data workflows in the cloud without any changes to code and schemas
+- Maintain pipelines with a Runtime Agent, customizable dashboards, and validation tests
+- Deliver live reports without worrying about schema drift or silent failures

 2. **[Cursor](https://cursor.com/)** - An AI-powered code editor that lets you express tasks in natural language for an LLM agent to implement. This LLM-native workflow isn’t exclusive to Cursor, but it’s the first AI code editor we’ve integrated with.

@@ -41,20 +42,16 @@ Before diving into the workflow, here’s a quick overview of key terms you’ll

 ### Setup Cursor

-> TODO: review and make this section smooth
-
 1. Use the right model
-For best results, use Claude 3.7-sonnet or Gemini 2.5+. Weaker models struggle with context comprehension and workflow consistency.
-We've had the best results with Claude 3.7-sonnet (which requires the paid version of Cursor). Weaker models were not able to comprehend the required context fully and were not able to use tools and follow workflows consistently.
+For best results, use Claude 3.7-sonnet, Gemini 2.5, or stronger models. Weaker models struggle with context comprehension and workflow consistency.
+We've observed the best results with Claude 3.7-sonnet (which requires the paid version of Cursor).

 2. Add documentation
+AI code editors let you upload documentation and code examples to provide additional context. [Here](https://docs.cursor.com/context/@-symbols/@-docs) you can learn how to do it with Cursor.
+Go to `Cursor Settings > Indexing & Docs` to see all your added documentation. You can edit, delete, or add new docs here. We recommend adding documentation scoped for a specific task. Add the following documentation links:

-AI code editors let you upload documentation and code examples to provide additional context. [Here](https://docs.cursor.com/context/@-symbols/@-docs) you can learn how to do it.
-
-Under Cursor `Settings > Features > Docs`, you can see all the docs you have added. You can edit, delete, or add new docs here. We recommend adding documentation scoped for a specific task. For example, for developing a REST API source, consider adding:
-
-* [REST API Source](../verified-sources/rest_api/) documentation
-
+* [REST API Source](../verified-sources/rest_api/) as `@dlt rest api`
+* [Core dlt concepts & usage](https://dlthub.com/docs/general-usage/resource) as `@dlt docs`

 ### Install dlt Workspace

@@ -66,38 +63,39 @@ pip install dlt[workspace]

 dltHub provides prepared contexts for 1000+ sources, available at [https://dlthub.com/workspace](https://dlthub.com/workspace). To get started, search for your API and follow the tailored instructions.

+<div style={{textAlign: 'center'}}>
+![search for your source](https://storage.googleapis.com/dlt-blog-images/llm_workflows_search.png)
+</div>

-To initialize your workspace, execute this dltHub Workspace command:
+To initialize the dltHub Workspace, execute the following:

 ```sh
 dlt init dlthub:{source_name} duckdb
 ```

 This command will initialize the dltHub Workspace with:
-- files and folder structure you know from [dlt init](../../walkthroughs/create-a-pipeline.md)
-- Documentation scaffold for the specific source (typically a `yaml` file)
+- Files and folder structure you know from [dlt init](../../walkthroughs/create-a-pipeline.md)
+- Documentation scaffold for the specific source (typically a `yaml` file) optimized for LLMs
 - Cursor rules tailored for `dlt`
 - Pipeline script and REST API Source (`{source_name}_pipeline.py`) definition that you'll customize in the next step

 :::tip
-If you can't find the source you need, start with a generic REST API Source template. Choose source name you need ie.
+If you can't find the source you need, start with a generic REST API Source template. Choose the source name you need, e.g.
 ```sh
 dlt init dlthub:my_internal_fast_api duckdb
 ```
-You'll still get full cursor setup and pipeline script (`my_internal_fast_api_pipeline.py`) plus all files and folder you get with regular [dlt init](../../walkthroughs/create-a-pipeline.md).
-
-You'll need to [provide an useful REST API scaffold](#addon-bring-your-own-llm-scaffold) for your LLM model, though.
-
+This will generate the full pipeline setup, including the script (`my_internal_fast_api_pipeline.py`) and all the files and folders you’d normally get with a standard [dlt init](../../walkthroughs/create-a-pipeline.md).
+To make your source available to the LLM, be sure to [include the documentation](#addon-bring-your-own-llm-scaffold) in the context so the model can understand how to use it.
 :::

-
 ## Create dlt pipeline

 ### Generate code

-We recommend starting with our prepared prompts for each API. Visit [https://dlthub.com/workspace](https://dlthub.com/workspace) and copy the suggested prompt for your chosen source. Note that the prompt may vary depending on the API to ensure the best context and accuracy.
+To get started quickly, we recommend using our pre-defined prompts tailored for each API. Visit [https://dlthub.com/workspace](https://dlthub.com/workspace) and copy the prompt for your selected source.
+Prompts are adjusted per API to provide the most accurate and relevant context.

-Here's a general prompt template:
+Here's a general prompt template you can adapt:

 ```text
 Please generate a REST API Source for {source} API, as specified in @{source}-docs.yaml
@@ -109,8 +107,11 @@ Use @dlt rest api as a tutorial.
 After adding the endpoints, allow the user to run the pipeline with python {source}_pipeline.py and await further instructions.
 ```

-> TODO: the crucial part is to explain the context in the prompt. this is basically why we write this docs - to give a little background to the walkthrough on the website.
-> in the prompt above: we link to the scaffold/spec, pipeline script, dlt rest api docs etc. this should be explained
+In this prompt, we use `@` references to link to source specifications and documentation. Make sure Cursor recognizes the referenced docs.
+You can learn more about [referencing with @ in Cursor](https://docs.cursor.com/context/@-symbols/overview).
+
+* `@{source}-docs.yaml` contains the source specification. It describes the source with endpoints, parameters, and other details.
+* `@dlt rest api` contains the documentation for dlt's REST API source.

 ### Add credentials

@@ -147,15 +148,17 @@ dlt pipeline {source}_pipeline show --dashboard
 The dashboard shows:
 - Pipeline overview with state and metrics
 - Data schema (tables, columns, types)
-- Data itself - you can even write custom queries
+- The data itself; you can even write custom queries

 The dashboard helps detect silent failures due to pagination errors, schema drift, or incremental load misconfigurations.

 ## Use the data in a Notebook

 With the pipeline and data validated, you can continue with custom data explorations and reports. You can use your preferred environment, for example, [Jupyter Notebook](https://jupyter.org/), [Marimo Notebook](https://marimo.io/), or a plain Python file.

-> TODO: (1) maybe a short instruction how to bootstrap marimo notebook would help? (2) we have some instructions in our docs already https://dlthub.com/docs/general-usage/dataset-access/marimo
+:::tip
+For an optimized data exploration experience, we recommend using a Marimo notebook. Check out the [detailed guide on using dlt with Marimo](https://dlthub.com/docs/general-usage/dataset-access/marimo).
+:::

 To access the data, you can use the `dataset()` method:

@@ -176,8 +179,11 @@ For more, see [dataset access guide](../../general-usage/dataset-access).


 ## Addon: bring your own LLM Scaffold
-LLMs can infer REST API Source definition from many kinds of specs and sometimes providing one is fairly easy.

-1. If you use Fast API (ie. for internal API) - use autogenerated openAPI spec and refer to it in your prompt.
-2. If you have legacy code in any Language, add it to the workspace and refer to it in your prompt.
-3. A good human readable documentation also works! You can try to add it ot Cursor docs and refer to it in your prompt.
+LLMs can infer a REST API Source definition from various types of input, and in many cases, it’s easy to provide what’s needed.
+
+Here are a few effective ways to scaffold your source:
+
+1. **FastAPI (Internal APIs)**. If you're using FastAPI, simply add a file with the autogenerated OpenAPI spec to your workspace and reference it in your prompt (see the sketch after this list).
+2. **Legacy code in any programming language**. Add the relevant code files to your workspace and reference them directly in your prompt. An LLM can extract useful structure even from older codebases.
+3. **Human-readable documentation**. Well-written documentation works too. You can add it to your Cursor docs and reference it in your prompt for context.
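For the FastAPI case above, one way to produce the spec file is to dump the app's autogenerated OpenAPI document to disk and reference that file in your prompt. A minimal sketch, assuming your own application exposes a FastAPI instance named `app`:

```py
import json

from fastapi import FastAPI

# Placeholder app: in practice, import your own instance instead,
# e.g. `from my_service.main import app`.
app = FastAPI(title="my_internal_fast_api")

# FastAPI builds the OpenAPI spec on demand; write it next to the pipeline
# so it can be referenced in the Cursor prompt (e.g. @openapi.json).
with open("openapi.json", "w") as f:
    json.dump(app.openapi(), f, indent=2)
```
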
