Commit ea55fbc

Clarify that S3 download happens during execute(), not during fetch

Explain why as_pandas/as_arrow/as_polars don't need await: the S3 download is wrapped in asyncio.to_thread inside execute(), so data is already in memory by the time fetch/as_* methods are called.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent f0adaee commit ea55fbc
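
To illustrate the behavior this commit documents, here is a minimal sketch of the eager pattern. `EagerCursorSketch` and `_blocking_download` are hypothetical names invented for this note and are not PyAthena's implementation; the only real API used is `asyncio.to_thread()` (Python 3.9+). The sketch assumes the whole result set is produced by one blocking call.

```python
import asyncio
import time


class EagerCursorSketch:
    """Hypothetical stand-in for an eager aio cursor (not PyAthena code)."""

    def __init__(self):
        self._rows = []
        self._pos = 0

    def _blocking_download(self, query):
        # Stands in for the blocking S3 download and CSV/Parquet parse.
        time.sleep(0.05)
        return [(i,) for i in range(3)]

    async def execute(self, query):
        # The blocking work runs in a worker thread, so the event loop
        # stays free; execute() returns only once all rows are in memory.
        self._rows = await asyncio.to_thread(self._blocking_download, query)
        self._pos = 0
        return self

    def fetchone(self):
        # Plain (non-async) method: it only reads from memory.
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def fetchall(self):
        # Returns whatever has not been consumed yet.
        rows = self._rows[self._pos:]
        self._pos = len(self._rows)
        return rows


async def main():
    cursor = EagerCursorSketch()
    await cursor.execute("SELECT * FROM many_rows")  # all I/O happens here
    first = cursor.fetchone()                        # no await
    rest = cursor.fetchall()                         # no await
    return first, rest


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the only awaited call is `execute()`, every later accessor can be an ordinary method, which is the point the revised docs make.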

4 files changed: +22 -16 lines changed

docs/aio.md

Lines changed: 13 additions & 10 deletions
@@ -177,24 +177,27 @@ Native asyncio versions are available for all cursor types:
 
 ### Fetch behavior
 
-Most aio cursors load all result data eagerly during `execute()` (via `asyncio.to_thread`),
-so `fetchone()`, `fetchmany()`, and `fetchall()` are synchronous (in-memory only):
+For **AioPandasCursor**, **AioArrowCursor**, and **AioPolarsCursor**, the S3 download
+(CSV or Parquet) happens inside `execute()`, wrapped in `asyncio.to_thread()`.
+By the time `execute()` returns, all data is already loaded into memory.
+Therefore `fetchone()`, `fetchall()`, `as_pandas()`, `as_arrow()`, and `as_polars()`
+are synchronous (in-memory only) and do not need `await`:
 
 ```python
-# Pandas, Arrow, Polars — fetch is sync (data already loaded)
-await cursor.execute("SELECT * FROM many_rows")
-row = cursor.fetchone() # No await needed
-rows = cursor.fetchall() # No await needed
-df = cursor.as_pandas() # No await needed
+# Pandas, Arrow, Polars — S3 download completes during execute()
+await cursor.execute("SELECT * FROM many_rows") # Downloads data here
+row = cursor.fetchone() # No await — data already in memory
+rows = cursor.fetchall() # No await
+df = cursor.as_pandas() # No await
 ```
 
 The exceptions are **AioCursor** and **AioS3FSCursor**, which stream rows lazily from S3.
-Their fetch methods require `await`:
+Their fetch methods perform I/O and require `await`:
 
 ```python
-# AioCursor, AioS3FSCursor — fetch is async (reads from S3)
+# AioCursor, AioS3FSCursor — fetch reads from S3 lazily
 await cursor.execute("SELECT * FROM many_rows")
-row = await cursor.fetchone() # Await required
+row = await cursor.fetchone() # Await required — reads from S3
 rows = await cursor.fetchall() # Await required
 ```

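For contrast with the lazy cursors described in the hunk above, the following is a rough sketch of why their fetch methods must be awaited. `LazyCursorSketch`, `page_size`, and `_blocking_read_page` are hypothetical names made up for illustration; how PyAthena actually pages or buffers rows is not shown in this diff, so the per-page `asyncio.to_thread()` call is an assumption.

```python
import asyncio
import time


class LazyCursorSketch:
    """Hypothetical stand-in for a lazily streaming aio cursor (not PyAthena code)."""

    def __init__(self, page_size=2):
        self._page_size = page_size
        self._source = []   # pretend remote data
        self._offset = 0
        self._buffer = []

    async def execute(self, query):
        # Only sets up the result; no rows are read yet.
        self._source = [(i,) for i in range(5)]
        self._offset = 0
        self._buffer = []
        return self

    def _blocking_read_page(self):
        # Stands in for a blocking remote read of the next page of rows.
        time.sleep(0.01)
        page = self._source[self._offset:self._offset + self._page_size]
        self._offset += len(page)
        return page

    async def fetchone(self):
        # May perform I/O when the buffer is empty, so it must be a coroutine.
        if not self._buffer:
            self._buffer = await asyncio.to_thread(self._blocking_read_page)
        if not self._buffer:
            return None
        return self._buffer.pop(0)

    async def fetchall(self):
        rows = []
        while (row := await self.fetchone()) is not None:
            rows.append(row)
        return rows


async def main():
    cursor = LazyCursorSketch()
    await cursor.execute("SELECT * FROM many_rows")
    first = await cursor.fetchone()   # await required: may read a page
    rest = await cursor.fetchall()    # await required
    return first, rest


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In this sketch, calling `cursor.fetchone()` without `await` would return a coroutine object rather than a row, which is the mistake the documentation change is meant to prevent.
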
docs/arrow.md

Lines changed: 3 additions & 2 deletions
@@ -489,8 +489,9 @@ AioArrowCursor is a native asyncio cursor that returns results as Apache Arrow T
 Unlike AsyncArrowCursor which uses `concurrent.futures`, this cursor uses
 `asyncio.to_thread()` for result set creation, keeping the event loop free.
 
-Since the result set is loaded eagerly during `execute()`, fetch methods, `as_arrow()`,
-and `as_polars()` are synchronous (in-memory only) and do not need `await`.
+The S3 download (CSV or Parquet) happens inside `execute()`, wrapped in `asyncio.to_thread()`.
+By the time `execute()` returns, all data is already loaded into memory.
+Therefore fetch methods, `as_arrow()`, and `as_polars()` are synchronous and do not need `await`.
 
 ```python
 from pyathena import aconnect

docs/pandas.md

Lines changed: 3 additions & 2 deletions
@@ -776,8 +776,9 @@ AioPandasCursor is a native asyncio cursor that returns results as pandas DataFr
 Unlike AsyncPandasCursor which uses `concurrent.futures`, this cursor uses
 `asyncio.to_thread()` for result set creation, keeping the event loop free.
 
-Since the result set is loaded eagerly during `execute()`, fetch methods and `as_pandas()`
-are synchronous (in-memory only) and do not need `await`.
+The S3 download (CSV or Parquet) happens inside `execute()`, wrapped in `asyncio.to_thread()`.
+By the time `execute()` returns, all data is already loaded into memory.
+Therefore fetch methods and `as_pandas()` are synchronous and do not need `await`.
 
 ```python
 from pyathena import aconnect

docs/polars.md

Lines changed: 3 additions & 2 deletions
@@ -583,8 +583,9 @@ AioPolarsCursor is a native asyncio cursor that returns results as Polars DataFr
 Unlike AsyncPolarsCursor which uses `concurrent.futures`, this cursor uses
 `asyncio.to_thread()` for result set creation, keeping the event loop free.
 
-Since the result set is loaded eagerly during `execute()`, fetch methods, `as_polars()`,
-and `as_arrow()` are synchronous (in-memory only) and do not need `await`.
+The S3 download (CSV or Parquet) happens inside `execute()`, wrapped in `asyncio.to_thread()`.
+By the time `execute()` returns, all data is already loaded into memory.
+Therefore fetch methods, `as_polars()`, and `as_arrow()` are synchronous and do not need `await`.
 
 ```python
 from pyathena import aconnect
