
Commit 5236e15

documents standard http session settings, sets shorter timeouts in telemetry (#3074)
1 parent 61625b0 commit 5236e15

7 files changed: 71 additions & 8 deletions

dlt/common/runtime/anon_tracker.py

Lines changed: 3 additions & 2 deletions
@@ -37,9 +37,10 @@ def init_anon_tracker(config: RuntimeConfiguration) -> None:
 
     # lazily import requests to avoid binding config before initialization
     global requests
-    from dlt.sources.helpers import requests as r_
+    from dlt.sources.helpers.requests import Client
 
-    requests = r_  # type: ignore[assignment]
+    # fail fast, don't block user
+    requests = Client(request_timeout=_REQUEST_TIMEOUT, request_max_attempts=0)  # type: ignore[assignment]
 
     global _WRITE_KEY, _ANON_TRACKER_ENDPOINT, _THREAD_POOL
     # start the pool

dlt/pipeline/platform.py

Lines changed: 3 additions & 2 deletions
@@ -29,9 +29,10 @@ class TPipelineSyncPayload(TypedDict):
 def init_platform_tracker() -> None:
     # lazily import requests to avoid binding config before initialization
     global requests
-    from dlt.sources.helpers import requests as r_
+    from dlt.sources.helpers.requests import Client
 
-    requests = r_  # type: ignore[assignment]
+    # fail fast, don't block user
+    requests = Client(request_timeout=(2, 10), request_max_attempts=0)  # type: ignore[assignment]
 
     global _THREAD_POOL
     if _THREAD_POOL is None:
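
A rough sketch of why the shorter timeouts and disabled retries matter for telemetry. It assumes that `request_max_attempts=0` results in a single attempt with no retries, and that each attempt can block for at most the connect timeout plus the read timeout. Both are simplifications: the read timeout in `requests` applies per socket read, and waits between retries are ignored here. `worst_case_seconds` is a hypothetical helper, not part of `dlt`.

```python
def worst_case_seconds(
    connect_timeout: float, read_timeout: float, max_attempts: int
) -> float:
    """Rough upper bound on blocking time, ignoring waits between retries."""
    # Assumption: 0 attempts configured still means one request, no retries.
    attempts = max(1, max_attempts)
    return attempts * (connect_timeout + read_timeout)


# Platform tracker settings from this commit: (2, 10) timeout, no retries.
telemetry = worst_case_seconds(2, 10, 0)
# Documented defaults: 60 s connect and read timeouts, 5 attempts.
defaults = worst_case_seconds(60, 60, 5)

print(f"telemetry: {telemetry}s, defaults: {defaults}s")
```

Under these assumptions a stalled telemetry endpoint delays the caller by seconds rather than minutes.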

docs/website/docs/dlt-ecosystem/verified-sources/rest_api/advanced.md

Lines changed: 19 additions & 0 deletions
@@ -241,3 +241,22 @@ source_config = {
 
 In this example, the resource will set the correct encoding for all responses. More callables can be added to the list of response_actions.
 
+
+### Setup timeouts and retry strategies
+`rest_api` uses `dlt` [custom sessions](../../../general-usage/http/requests.md) and [`RESTClient`](../../../general-usage/http/rest-client.md) to access HTTP(S) endpoints. You can use them to configure timeouts, retries, and other aspects. For example:
+```py
+from dlt.sources.helpers import requests
+
+source_config: RESTAPIConfig = {
+    "client": {
+        "session": requests.Client(request_timeout=(1.0, 1.0), request_max_attempts=0)
+    },
+}
+```
+This sets up all endpoints to use short connect and read timeouts with no retries.
+Most settings [can be configured](../../../general-usage/http/requests.md#customizing-retry-settings) using `toml` files or environment variables.
+
+:::note
+By default, we set connection timeout and read timeout to 60 seconds, with
+5 retry attempts without backoff.
+:::
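
The same settings can also be expressed as environment variables. The sketch below assumes `dlt`'s usual naming convention, where the section and key are upper-cased and joined with a double underscore. `to_env_name` is a hypothetical helper used only to show the mapping, and the values are illustrative.

```python
from typing import Dict


def to_env_name(*path: str) -> str:
    """Join config path segments the way dlt environment variables are named
    (assumed convention: upper-case segments separated by '__')."""
    return "__".join(part.upper() for part in path)


# Illustrative values, not recommendations.
settings = {"request_timeout": "30", "request_max_attempts": "3"}

env: Dict[str, str] = {
    to_env_name("runtime", key): value for key, value in settings.items()
}

for name, value in sorted(env.items()):
    print(f"export {name}={value}")
```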

docs/website/docs/dlt-ecosystem/verified-sources/rest_api/basic.md

Lines changed: 1 addition & 0 deletions
@@ -240,6 +240,7 @@ The `client` configuration is used to connect to the API's endpoints. It include
 - `auth` (optional): Authentication configuration. This can be a simple token, an `AuthConfigBase` object, or a more complex authentication method.
 - `session` (requests.Session, optional): A custom session object. When provided, this session will be used for all HTTP requests instead of the default session. Can be used, for example, with [requests-oauthlib](https://github.com/requests/requests-oauthlib) for OAuth authentication.
 - `paginator` (optional): Configuration for the default pagination used for resources that support pagination. Refer to the [pagination](#pagination) section for more details.
+- `session` (optional): Custom `requests` session to set up custom [timeouts and retry strategies](advanced.md#setup-timeouts-and-retry-strategies).
 
 #### `resource_defaults` (optional)
 

docs/website/docs/general-usage/http/requests.md

Lines changed: 18 additions & 0 deletions
@@ -63,6 +63,18 @@ request_timeout = 120 # Timeout in seconds
 request_max_retry_delay = 30 # Cap exponential delay to 30 seconds
 ```
 
+:::note
+Default session retries as follows:
+
+```toml
+[runtime]
+request_timeout=60
+request_max_attempts = 5
+request_backoff_factor = 1
+request_max_retry_delay = 300
+```
+:::
+
 For more control, you can create your own instance of `dlt.sources.requests.Client` and use that instead of the global client.
 
 This lets you customize which status codes and exceptions to retry on:
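
To make the defaults listed in the note above concrete, the sketch below lists the waits between attempts under the assumption that the delay doubles after each failed attempt, starting at `request_backoff_factor` seconds and capped at `request_max_retry_delay`. The exact formula used by `dlt` is not shown in this diff, so treat the numbers as illustrative. `retry_delays` is a hypothetical helper.

```python
from typing import List


def retry_delays(
    max_attempts: int, backoff_factor: float, max_retry_delay: float
) -> List[float]:
    """Waits (in seconds) between consecutive attempts.

    Assumed schedule: backoff_factor * 2**n, capped at max_retry_delay.
    There is one wait fewer than there are attempts.
    """
    waits = max(0, max_attempts - 1)
    return [min(backoff_factor * 2**n, max_retry_delay) for n in range(waits)]


# Defaults from the note: 5 attempts, factor 1, cap 300 seconds.
default_delays = retry_delays(5, 1, 300)
# A lower cap, as in the earlier example (request_max_retry_delay = 30).
capped_delays = retry_delays(8, 1, 30)

print(default_delays)
print(capped_delays)
```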
@@ -98,6 +110,12 @@ http_client = Client(
     retry_condition=retry_if_error_key
 )
 ```
+
+:::tip
+`requests.Client` is thread safe. We recommend sharing sessions across threads for better performance.
+:::
+
+
 ## Handling API Rate Limits
 
 HTTP 429 errors indicate you've hit API rate limits. The dlt requests client retries these automatically and respects `Retry-After` headers. If rate limits persist, consider additional mitigation strategies.

docs/website/docs/general-usage/http/rest-client.md

Lines changed: 26 additions & 1 deletion
@@ -620,6 +620,7 @@ Unfortunately, most OAuth 2.0 implementations vary, and thus you might need to s
 - `client_secret`: Client credential to obtain authorization. Usually issued via a developer portal.
 - `access_token_request_data`: A dictionary with data required by the authorization server apart from the `client_id`, `client_secret`, and `"grant_type": "client_credentials"`. Defaults to `None`.
 - `default_token_expiration`: The time in seconds after which the temporary access token expires. Defaults to 3600.
+- `session`: Custom `requests` session where you can configure default timeouts and retry strategies
 
 **Example:**
 
@@ -729,6 +730,30 @@ request_timeout = 120 # Timeout in seconds
 request_max_retry_delay = 30 # Cap exponential delay to 30 seconds
 ```
 
+:::note
+`RESTClient` retries by default:
+
+```toml
+[runtime]
+request_timeout=60
+request_max_attempts = 5
+request_backoff_factor = 1
+request_max_retry_delay = 300
+```
+:::
+
+### Use custom session
+You can pass a custom `requests` `Session` to `RESTClient`. `dlt` [provides its own implementation](requests.md#customizing-retry-settings) where you can easily configure
+retry strategies, timeouts and other factors. For example:
+```py
+from dlt.sources.helpers import requests
+client = RESTClient(
+    base_url="https://api.example.com",
+    session=requests.Client(request_timeout=(1.0, 1.0), request_max_attempts=0).session
+)
+```
+This sets up the client with short connect and read timeouts and no retries.
+
 ### URL sanitization and secret protection
 
 The RESTClient automatically sanitizes URLs in logs and error messages to prevent exposure of sensitive information. Query parameters with the following names are automatically redacted:
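
The tuple passed as `request_timeout` in the custom-session example follows the `requests` convention of `(connect, read)` timeouts, while a single number applies to both. A minimal sketch of that convention; `split_timeout` is a hypothetical helper, not a `dlt` or `requests` function.

```python
from typing import Tuple, Union

Timeout = Union[float, Tuple[float, float]]


def split_timeout(timeout: Timeout) -> Tuple[float, float]:
    """Return (connect_timeout, read_timeout) in seconds."""
    if isinstance(timeout, tuple):
        connect, read = timeout
        return float(connect), float(read)
    # A single number is used for both the connect and the read phase.
    return float(timeout), float(timeout)


print(split_timeout((1.0, 1.0)))  # custom session from the example
print(split_timeout(60))          # documented default
```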
@@ -820,4 +845,4 @@ for page in client.paginate(
     hooks={"response": [response_hook]}
 ):
     print(page)
-```
+```

tests/common/runtime/test_telemetry.py

Lines changed: 1 addition & 3 deletions
@@ -137,19 +137,17 @@ def test_telemetry_endpoint_exceptions(
 def test_track_anon_event(
     mocker: MockerFixture, disable_temporary_telemetry: RuntimeConfiguration
 ) -> None:
-    from dlt.sources.helpers import requests
     from dlt.common.runtime import anon_tracker
 
     mock_github_env(os.environ)
     mock_pod_env(os.environ)
     SENT_ITEMS.clear()
     config = SentryLoggerConfiguration()
 
-    requests_post = mocker.spy(requests, "post")
-
     props = {"destination_name": "duckdb", "elapsed_time": 712.23123, "success": True}
     with patch("dlt.common.runtime.anon_tracker.before_send", _mock_before_send):
         start_test_telemetry(config)
+        requests_post = mocker.spy(anon_tracker.requests, "post")
         track("pipeline", "run", props)
         # this will send stuff
         disable_anon_tracker()