Commit c5d7ad7

Expand README with Configuration section and clearer limitations and troubleshooting.
1 parent 27db760 commit c5d7ad7

1 file changed

Lines changed: 141 additions & 22 deletions

README.md

@@ -16,12 +16,20 @@
1616
- [Environment Variables](#environment-variables)
1717
- [Install](#install)
1818
- [Basic Usage](#basic-usage)
19+
- [Configuration](#configuration)
20+
- [API token](#api-token)
21+
- [Download directory](#download-directory)
22+
- [Rate limiting](#rate-limiting)
23+
- [Per-download callback](#per-download-callback)
24+
- [Skip existing files](#skip-existing-files)
25+
- [Download results](#download-results)
1926
- [Known Limitations](#known-limitations)
20-
- [Access is IP address based only.](#access-is-ip-address-based-only)
21-
- [SICI DOIs are not supported.](#sici-dois-are-not-supported)
27+
- [Access is IP address based only](#access-is-ip-address-based-only)
28+
- [SICI DOIs are not supported](#sici-dois-are-not-supported)
2229
- [Troubleshooting](#troubleshooting)
2330
- [Installation](#installation)
2431
- [Access denied](#access-denied)
32+
- [Other issues](#other-issues)
2533
- [Contributing](#contributing)
2634
- [License](#license)
2735

@@ -66,7 +74,7 @@ You will require the following:
6674
* Python dependencies:
6775
* [requests](https://requests.readthedocs.io/) (≥2.33.1)
6876
* A [Wiley Online Library](https://onlinelibrary.wiley.com/) (WOL) Account
69-
* A TDM API Token, available from the WOL [TDM resources page](https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining) using your WOL Account
77+
* Your TDM API Token (UUID format), available from the WOL [TDM resources page](https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining) using your WOL Account
7078
* Access to the content you wish to download
7179
* Access will be determined via your [public IP address](https://api.ipify.org/?format=json)
7280

@@ -76,11 +84,12 @@ You will require the following:
7684

7785
Set the environment variable `TDM_API_TOKEN` to your API token:
7886

79-
Linux example
8087
```bash
81-
# Set your TDM API token (required)
88+
# Windows (PowerShell)
89+
$env:TDM_API_TOKEN = 'your-api-token-here'
90+
91+
# macOS / Linux
8292
export TDM_API_TOKEN='your-api-token-here'
83-
echo $TDM_API_TOKEN
8493
```
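Before running a batch, it can help to confirm that the variable is visible to Python and has the expected shape. The following is a minimal sketch, not part of the TDM Client: `looks_like_uuid` is a hypothetical helper, and the check is deliberately loose (the standard library's `uuid.UUID` also accepts UUIDs written without hyphens).

```python
import os
import uuid


def looks_like_uuid(token: str) -> bool:
    """Return True if the token parses as a UUID (hypothetical helper)."""
    try:
        uuid.UUID(token)
    except ValueError:
        return False
    return True


# Read the same environment variable the README asks you to set.
token = os.environ.get("TDM_API_TOKEN", "")
if not token:
    print("TDM_API_TOKEN is not set")
elif not looks_like_uuid(token):
    print("TDM_API_TOKEN is set but does not look like a UUID")
else:
    print("TDM_API_TOKEN is set and looks like a UUID")
```

This only checks the format of the token; it does not confirm that the token is valid for the TDM API.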
8594

8695
### Install
@@ -130,46 +139,143 @@ tdm.download_pdfs("dois.txt")
130139

131140
See more [examples](examples/).
132141

142+
## Configuration
143+
144+
`TDMClient` exposes several options to control where files are saved, how requests are paced, and what gets recorded. All paths are relative to your current working directory unless you use an absolute path.
145+
146+
### API token
147+
148+
The TDM API token is a UUID issued via the [TDM resources page](https://onlinelibrary.wiley.com/library-info/resources/text-and-datamining). Provide it via the `TDM_API_TOKEN` environment variable (recommended) or pass it directly:
149+
150+
```python
151+
from wiley_tdm import TDMClient
152+
153+
tdm = TDMClient(api_token="your-uuid-token-here")
154+
```
155+
156+
### Download directory
157+
158+
PDFs are saved to a `downloads` folder by default. Set a custom directory when creating the client, or change it later:
159+
160+
```python
161+
from pathlib import Path
162+
from wiley_tdm import TDMClient
163+
164+
# At initialization
165+
tdm = TDMClient(download_dir=Path("downloads") / "oa-pdfs")
166+
167+
# Or after initialization
168+
tdm.download_dir = "downloads/batch-2025"
169+
```
170+
171+
Files are named `<doi>.pdf` in the download directory.
172+
173+
### Rate limiting
174+
175+
Batch downloads (`download_pdfs`) wait between API requests to stay within Wiley's rate limits. The default pause is **5 seconds**; you can increase it, but not set it below the default:
176+
177+
```python
178+
tdm = TDMClient()
179+
tdm.api_rate_limit = 10.0 # Wait 10 seconds between downloads in a batch
180+
```
181+
182+
Single-article downloads (`download_pdf`) are not delayed.
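To get a feel for how the pause affects large batches, the arithmetic can be sketched as follows. This assumes one pause between each pair of consecutive requests and ignores download time and skipped files, so it is only a rough lower bound; `min_batch_seconds` is a hypothetical helper, not part of the TDM Client.

```python
def min_batch_seconds(n_dois: int, pause: float = 5.0) -> float:
    """Lower bound on pacing time: one pause between consecutive requests."""
    return max(n_dois - 1, 0) * pause


# 1000 DOIs at the default 5-second pause: 999 pauses.
print(min_batch_seconds(1000))        # 4995.0 seconds, roughly 83 minutes
# The same batch with a 10-second pause.
print(min_batch_seconds(1000, 10.0))  # 9990.0 seconds
```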
183+
184+
### Per-download callback
185+
186+
For batch downloads, pass an `on_result` callback to `download_pdfs`. It is called after each DOI finishes, so you can inject your own code into the loop without waiting for the whole batch to complete.
187+
188+
```python
189+
from wiley_tdm import DownloadResult, TDMClient
190+
191+
def track_progress(result: DownloadResult) -> None:
192+
# Your integration: database, queue, metrics, UI, etc.
193+
record_in_my_system(doi=result.doi, status=result.status.name, path=result.path)
194+
195+
tdm = TDMClient()
196+
tdm.download_pdfs("dois.txt", on_result=track_progress)
197+
```
198+
199+
The callback receives the same `DownloadResult` returned by `download_pdf`. `download_pdfs` still returns the full list when the batch completes.
200+
201+
### Skip existing files
202+
203+
By default, `skip_existing_files` is `True`: if `<doi>.pdf` already exists in the download directory, the client skips the API call and records the file as already present. Set to `False` to re-download:
204+
205+
```python
206+
tdm = TDMClient()
207+
tdm.skip_existing_files = False
208+
tdm.download_pdf("10.1111/jtsb.12390")
209+
```
210+
211+
### Download results
212+
213+
Each download returns a `DownloadResult` with status, file path, and timing. Access accumulated results via `tdm.results`, or save only failures:
214+
215+
```python
216+
from wiley_tdm import DownloadStatus, TDMClient
217+
218+
tdm = TDMClient()
219+
tdm.only_record_errors = True # Default is False; when True, only failures are kept in tdm.results
220+
221+
result = tdm.download_pdf("10.1111/jtsb.12390")
222+
if result.status == DownloadStatus.SUCCESS:
223+
print(f"Saved to {result.path}")
224+
225+
tdm.download_pdfs(["10.1111/jtsb.12390", "10.1111/jlse.12141"])
226+
tdm.save_results("my-results.csv") # Default is results.csv
227+
```
228+
133229
## Known Limitations
134230

135-
There are two known limitations of the TDM Client (/TDM API)
231+
There are two known limitations of the TDM Client (/TDM API):
136232

137-
### Access is IP address based only.
233+
### Access is IP address based only
138234

139235
The following scenarios are not supported:
140-
* Your WOL customer account doesn't have IP based access configured. (e.g. SSO only)
141-
* You do have IP based access but you are on a different network or subnet to the one configured. (e.g. Off campus)
236+
* Your WOL customer account **doesn't** have IP based access configured (e.g. SSO only).
237+
* Your WOL customer account **does** have IP based access configured, but you are outside the configured IP range (e.g. executing code off campus or in the cloud).
142238

143-
Ask your WOL Account Admin to confirm your access model. See [Troubleshooting]([#Troubleshooting) for further assistance.
239+
Ask your WOL Account Admin to confirm your access model. See [Troubleshooting](#troubleshooting) for further assistance.
144240

145-
Potential Workarounds:
146-
* Return to campus.
241+
**Potential Workarounds:**
242+
243+
**IP access available:**
244+
* Run code within the configured IP range (e.g. return to campus)
147245
* Manually download entitled content on WOL, via https://onlinelibrary.wiley.com/doi/epdf/{doi}
246+
247+
**IP access not available:**
148248
* Request IP based access via your WOL Account Admin.
149-
* Request feed-based (non‑API‑based) content dissemination for TDM with Wiley's Digital Licensing team.
249+
* Request feed-based (non‑API‑based) content dissemination for TDM via your WOL Account Admin.
150250

151-
### SICI DOIs are not supported.
251+
### SICI DOIs are not supported
152252

153-
The TDM APIs cannot support [SICI](https://en.wikipedia.org/wiki/Serial_Item_and_Contribution_Identifier) DOIs. For example:
253+
The TDM APIs do not support [SICI](https://en.wikipedia.org/wiki/Serial_Item_and_Contribution_Identifier) DOIs, which contain semicolons. For example:
154254
* 10.1002/1096-9861(20010212)430:3<283::AID-CNE1031>3.0.CO;2-V
255+
* 10.1002/(SICI)1096-8644(1998)107:27+<1::AID-AJPA2>3.0.CO;2-H
155256

156-
Potential Workarounds:
257+
**Potential Workarounds:**
157258
* Manually download entitled content on WOL, via https://onlinelibrary.wiley.com/doi/epdf/{doi}
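One way to avoid sending unsupported DOIs to the API is to separate them out before a batch run. The sketch below uses the presence of a semicolon as a heuristic for SICI DOIs, based on the examples above; it may misclassify unusual DOIs, and `split_sici_dois` is a hypothetical helper, not part of the TDM Client.

```python
def split_sici_dois(dois):
    """Split DOIs into (supported, sici) using a semicolon heuristic."""
    supported, sici = [], []
    for doi in dois:
        doi = doi.strip()
        if not doi:
            continue  # Ignore blank entries
        (sici if ";" in doi else supported).append(doi)
    return supported, sici


dois = [
    "10.1111/jtsb.12390",
    "10.1002/1096-9861(20010212)430:3<283::AID-CNE1031>3.0.CO;2-V",
]
supported, sici = split_sici_dois(dois)
print("Supported:", supported)
print("Needs manual download:", sici)
```

The `supported` list can then be passed to `download_pdfs`, while the `sici` list can be handled manually.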
158259

159260
## Troubleshooting
160261

161262
In most troubleshooting scenarios it can be helpful to enable logging and generate a report:
162263

163264
```python
164-
# Enable logging
265+
import logging
266+
from wiley_tdm import TDMClient
267+
165268
logging.basicConfig(
166269
level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
167270
)
168271

169-
# Save the download results to a CSV file: 'results.csv'
272+
tdm = TDMClient()
273+
# ... run downloads ...
170274
tdm.save_results()
171275
```
172276

277+
Review the console logs and `results.csv`. For deeper analysis, set `level=logging.DEBUG`.
278+
173279
### Installation
174280

175281
If you encounter installation issues:
@@ -200,11 +306,13 @@ If problems persist, please [open an issue](https://github.com/WileyLabs/tdm-cli
200306

201307
### Access denied
202308

309+
The majority of reported issues relate to access problems. Note that only IP based access is supported (see [Known Limitations](#known-limitations)). If you do have IP based access and are still experiencing issues, try the following:
310+
203311
Check access directly on [Wiley Online Library](https://onlinelibrary.wiley.com/).
204312
- If access denied: contact your Institution/Wiley and check your subscription is active.
205313
- If access granted: ensure you are accessing the TDM API from a known IP address (see below).
206314

207-
It is possible that the IP address you are accessing WOL from is different to where you are running your TDM code. Observe your IP address in the TDM console log and compare to the IP address in your [browser](https://api.ipify.org?format=json).
315+
It is possible that the IP address you are accessing WOL from is different to where you are running your TDM code. Enable TDM logging (see [Troubleshooting](#troubleshooting)), then compare the IP address shown in the TDM console log with the one reported in your [browser](https://api.ipify.org?format=json).
208316

209317
Example console output:
210318
```
@@ -213,14 +321,25 @@ Example console output:
213321

214322
Example Browser output:
215323

324+
https://api.ipify.org/?format=json
325+
216326
```json
217-
// https://api.ipify.org/?format=json
218327
{
219328
"ip": "XX.XX.XX.XX"
220329
}
221330
```
331+
If the IP addresses differ, seek guidance from your network administrator.
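If the TDM code runs on a machine without a browser (e.g. a server or cloud instance), the same lookup can be made from Python. This is a minimal sketch using only the standard library and the ipify endpoint linked above; `parse_ipify` and `public_ip` are hypothetical helpers, not part of the TDM Client.

```python
import json
import urllib.request


def parse_ipify(body: str) -> str:
    """Extract the address from an ipify JSON response such as {"ip": "..."}."""
    return json.loads(body)["ip"]


def public_ip(timeout: float = 10.0) -> str:
    """Return the public IP address of the machine running this code."""
    url = "https://api.ipify.org?format=json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_ipify(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(f"Public IP seen from this Python process: {public_ip()}")
    except OSError as exc:  # Covers URLError, timeouts, and DNS failures
        print(f"Could not reach api.ipify.org: {exc}")
```

Run this in the same environment as your TDM code and compare the printed address with the one your WOL Account Admin has configured.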
332+
333+
### Other issues
334+
335+
For other issues, please contact tdm@wiley.com and provide the following information:
336+
337+
* TDM API Token (UUID)
338+
* results.csv
339+
* console log
340+
* dois.txt (problematic DOIs)
341+
* Summary of problem
222342

223-
If problems persist, please contact: tdm@wiley.com
224343

225344
## Contributing
226345
