docs/website/docs/dlt-ecosystem/llm-tooling/cursor-restapi.md
This guide walks you through a collaborative AI-human workflow for extracting and loading data from REST APIs.

You will learn:

1. How to use dltHub's [LLM-context database](https://dlthub.com/workspace) to initialize a workspace for the source you need.
2. How to build a REST API source in minutes with AI assistance.
3. How to debug the pipeline and explore data using the pipeline dashboard.
4. How to start a new notebook and use the pipeline's dataset in it.

## Prerequisites
Before diving into the workflow, here’s a quick overview of key terms you’ll encounter:

1. **dltHub Workspace** - An environment where all data engineering tasks, from writing code to maintenance in production, can be executed by a single developer:
   - Develop and test data pipelines locally
   - Run dlt pipelines, transformations, and notebooks with one command
   - Deliver live, production-ready reports with streamlined access to the dataset

   We plan to support more functionality in the future, such as:
   - Deploy and run your data workflows in the cloud without any changes to code and schemas
   - Maintain pipelines with a Runtime Agent, customizable dashboards, and validation tests
   - Deliver live reports without worrying about schema drift or silent failures

2. **[Cursor](https://cursor.com/)** - An AI-powered code editor that lets you express tasks in natural language for an LLM agent to implement. This LLM-native workflow isn’t exclusive to Cursor, but it’s the first AI code editor we’ve integrated with.
### Set up Cursor

1. Use the right model

   For best results, use Claude 3.7-sonnet, Gemini 2.5, or newer models. Weaker models struggle with context comprehension and workflow consistency. We've observed the best results with Claude 3.7-sonnet (which requires the paid version of Cursor).

2. Add documentation

   AI code editors let you upload documentation and code examples to provide additional context. [Here](https://docs.cursor.com/context/@-symbols/@-docs) you can learn how to do it with Cursor.

   Go to `Cursor Settings > Indexing & Docs` to see all your added documentation. You can edit, delete, or add new docs here. We recommend adding documentation scoped to a specific task. Add the following documentation links:

   * [REST API Source](../verified-sources/rest_api/) as `@dlt rest api`
   * [Core dlt concepts & usage](https://dlthub.com/docs/general-usage/resource) as `@dlt docs`
### Install dlt Workspace

```sh
pip install dlt[workspace]
```

dltHub provides prepared contexts for 1000+ sources, available at [https://dlthub.com/workspace](https://dlthub.com/workspace). To get started, search for your API and follow the tailored instructions.

<div style={{textAlign:'center'}}>

![dltHub workspace source search](https://storage.googleapis.com/dlt-blog-images/workspace_sources.png)

</div>

To initialize the dltHub Workspace, execute the following:

```sh
dlt init dlthub:{source_name} duckdb
```

This command will initialize the dltHub Workspace with:
- Files and folder structure you know from [dlt init](../../walkthroughs/create-a-pipeline.md)
- Documentation scaffold for the specific source (typically a `yaml` file) optimized for LLMs
- Cursor rules tailored for `dlt`
- Pipeline script and REST API Source (`{source_name}_pipeline.py`) definition that you'll customize in the next step (a minimal sketch of its general shape follows this list)
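
To give you a feel for what you'll be customizing, here is a minimal sketch of the general shape of a dlt REST API Source pipeline. It uses a hypothetical `pokemon` source against the public PokeAPI as a stand-in; the scaffold generated for your API will differ in endpoints, auth, and naming.

```python
import dlt
from dlt.sources.rest_api import rest_api_source

# Minimal sketch only: a declarative REST API Source pointing at a public API.
# The generated {source_name}_pipeline.py will define your API's endpoints instead.
pokemon_source = rest_api_source(
    {
        "client": {"base_url": "https://pokeapi.co/api/v2/"},
        # each listed resource becomes a table in the destination dataset
        "resources": ["pokemon", "berry"],
    }
)

pipeline = dlt.pipeline(
    pipeline_name="pokemon_pipeline",
    destination="duckdb",
    dataset_name="pokemon_data",
)

load_info = pipeline.run(pokemon_source)
print(load_info)
```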
:::tip
If you can't find the source you need, start with a generic REST API Source template. Choose the source name you need, e.g.:

```sh
dlt init dlthub:my_internal_fast_api duckdb
```
This will generate the full pipeline setup, including the script (`my_internal_fast_api_pipeline.py`) and all the files and folders you’d normally get with a standard [dlt init](../../walkthroughs/create-a-pipeline.md).

To make your source available to the LLM, be sure to [include the documentation](#addon-bring-your-own-llm-scaffold) in the context so the model can understand how to use it.
:::
## Create dlt pipeline
### Generate code
To get started quickly, we recommend using our pre-defined prompts tailored for each API. Visit [https://dlthub.com/workspace](https://dlthub.com/workspace) and copy the prompt for your selected source. Prompts are adjusted per API to provide the most accurate and relevant context.

Here's a general prompt template you can adapt:

```text
Please generate a REST API Source for {source} API, as specified in @{source}-docs.yaml
...
Use @dlt rest api as a tutorial.
After adding the endpoints, allow the user to run the pipeline with python {source}_pipeline.py and await further instructions.
```

In this prompt, we use `@` references to link to source specifications and documentation. Make sure Cursor recognizes the referenced docs. You can learn more about [referencing with @ in Cursor](https://docs.cursor.com/context/@-symbols/overview).

* `@{source}-docs.yaml` contains the source specification: it describes the source with endpoints, parameters, and other details.
* `@dlt rest api` points to the documentation for dlt's REST API source that you added in the Cursor setup step.

### Add credentials

To launch the pipeline dashboard, run:

```sh
dlt pipeline {source}_pipeline show --dashboard
```

The dashboard shows:
- Pipeline overview with state and metrics
- Data schema (tables, columns, types)
- The data itself; you can even write custom queries

The dashboard helps detect silent failures due to pagination errors, schema drift, or incremental load misconfigurations.
## Use the data in a Notebook
With the pipeline and data validated, you can continue with custom data explorations and reports. You can use your preferred environment, for example, [Jupyter Notebook](https://jupyter.org/), [Marimo Notebook](https://marimo.io/), or a plain Python file.

:::tip
For an optimized data exploration experience, we recommend using a Marimo notebook. Check out the [detailed guide on using dlt with Marimo](https://dlthub.com/docs/general-usage/dataset-access/marimo).
:::

To access the data, you can use the `dataset()` method:
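
For example, in a notebook cell you might attach to the pipeline and pull a loaded table into a pandas DataFrame. The sketch below assumes the hypothetical `pokemon_pipeline` and `pokemon` table from earlier; substitute your own pipeline and table names.

```python
import dlt

# Attach to the pipeline you ran earlier; the names below are illustrative.
# Use the pipeline_name defined in your {source}_pipeline.py.
pipeline = dlt.pipeline(pipeline_name="pokemon_pipeline", destination="duckdb")

# dataset() exposes the loaded tables for reading
dataset = pipeline.dataset()

# read one table into a pandas DataFrame and peek at it
df = dataset.pokemon.df()
print(df.head())
```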

For more, see [dataset access guide](../../general-usage/dataset-access).

## Addon: bring your own LLM Scaffold

LLMs can infer a REST API Source definition from various types of input, and in many cases, it’s easy to provide what’s needed.

Here are a few effective ways to scaffold your source:

1. **FastAPI (Internal APIs)**. If you're using FastAPI, simply add a file with the autogenerated OpenAPI spec to your workspace and reference it in your prompt (a sketch of one way to export the spec follows this list).
2. **Legacy code in any programming language**. Add the relevant code files to your workspace and reference them directly in your prompt. An LLM can extract useful structure even from older codebases.
3. **Human-readable documentation**. Well-written documentation works too. You can add it to your Cursor docs and reference it in your prompt for context.
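
For the FastAPI case, one straightforward way to produce that spec is to dump the schema FastAPI already generates. The snippet below is a sketch: the import path `my_internal_api.main` and the output filename are placeholders for your own app and naming.

```python
import json

# Placeholder import: point this at your own FastAPI application object.
from my_internal_api.main import app

# FastAPI autogenerates the OpenAPI schema; write it to a file you can
# add to the workspace and reference from your prompt.
with open("my_internal_fast_api_openapi.json", "w") as f:
    json.dump(app.openapi(), f, indent=2)
```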