Commit 5270d0d

committed
feat: adds new arxiv and arxiv-search commands and handling of optional inclusion of images from the papers
1 parent 6bf3619 commit 5270d0d

7 files changed

+1281
-180
lines changed

README.md

Lines changed: 160 additions & 25 deletions
@@ -5,7 +5,9 @@
55
[![Tests](https://github.com/agustif/llm-arxiv/actions/workflows/test.yml/badge.svg)](https://github.com/agustif/llm-arxiv/actions/workflows/test.yml)
66
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/agustif/llm-arxiv/blob/main/LICENSE)
77

8-
LLM plugin for loading arXiv paper text and images content using the `arxiv` API and PyMuPDF.
8+
LLM plugin for loading arXiv papers and their images.
9+
10+
This plugin lets you search for arXiv papers and load their text content, and optionally their images, directly into `llm`.
911

1012
## Installation
1113

@@ -15,42 +17,175 @@ Install this plugin in the same environment as [LLM](https://llm.datasette.io/).
1517
llm install llm-arxiv
1618
```
1719

18-
This plugin requires the `arxiv` and `PyMuPDF` packages.
20+
The command above will also install the necessary dependencies: `arxiv`, `PyMuPDF`, and `Pillow`.
1921

2022
## Usage
2123

22-
This plugin adds support for the `arxiv:` fragment prefix.
23-
You can use it to load from its arXiv ID or URL.
24+
This plugin provides three main ways to interact with arXiv papers:
2425

25-
```bash
26-
# Load by arXiv ID
27-
llm -f arxiv:2310.06825
26+
1. **As a fragment loader:** Allows you to inject arXiv paper content (text and optionally images) directly into a prompt using the `-f` or `--fragment` option with `llm`.
27+
2. **As a standalone command (`llm arxiv`):** Fetches and processes a paper, then either writes its content to stdout (where it can be piped to other commands or models) or, when a prompt is given, sends it to an LLM.
28+
3. **As a search command (`llm arxiv-search`):** Allows you to search arXiv for papers based on a query string.
2829

29-
# Load by full URL
30-
llm -f arxiv:https://arxiv.org/abs/2310.06825
31-
```
32-
When the paper contains images, placeholders like ``See attached image 1`` are
33-
inserted into the text and the image data is returned as attachments.
30+
### 1. Fragment Loader (`-f arxiv:...`)
31+
32+
You can load an arXiv paper by its ID or full URL. The text content (converted to Markdown) and any selected images (as attachments) will be passed to the language model.
33+
34+
**Syntax:**
35+
36+
`llm -f 'arxiv:PAPER_ID_OR_URL[?options]' "Your prompt here..."`
37+
38+
* `PAPER_ID_OR_URL`: Can be an arXiv ID (e.g., `2310.06825`, `astro-ph/0601009`) or a full arXiv URL (e.g., `https://arxiv.org/abs/2310.06825`, `http://arxiv.org/pdf/2310.06825.pdf`).
39+
* `[?options]`: Optional query parameters to control image inclusion and resizing. (Remember to quote the argument if using `?` or `&` in your shell).
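
As an illustration of the accepted `PAPER_ID_OR_URL` forms, here is a minimal Python sketch that reduces an ID or URL to a bare arXiv ID. The function name `extract_arxiv_id` and the regular expression are assumptions made for this example; they are not taken from the plugin's source, which may normalize references differently.

```python
import re

# New-style IDs look like 2310.06825; old-style IDs look like astro-ph/0601009.
# An optional version suffix such as "v2" is accepted and kept.
# This pattern is an illustrative assumption, not the plugin's own.
_ID_PATTERN = re.compile(
    r"(?:\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?"
)


def extract_arxiv_id(reference):
    """Return the arXiv ID contained in a bare ID or an arxiv.org URL."""
    text = reference.strip()
    # Drop a trailing ".pdf" so that /pdf/ URLs reduce to the same ID.
    if text.lower().endswith(".pdf"):
        text = text[:-4]
    # For URLs, keep only what follows /abs/ or /pdf/.
    url_match = re.search(r"arxiv\.org/(?:abs|pdf)/(.+)$", text)
    if url_match:
        text = url_match.group(1)
    if _ID_PATTERN.fullmatch(text) is None:
        raise ValueError(f"Not a recognizable arXiv reference: {reference!r}")
    return text


if __name__ == "__main__":
    for ref in (
        "2310.06825",
        "astro-ph/0601009",
        "https://arxiv.org/abs/2310.06825",
        "http://arxiv.org/pdf/2310.06825.pdf",
    ):
        print(ref, "->", extract_arxiv_id(ref))
```

The sketch assumes that any `?options` suffix has already been split off before the reference is normalized.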
40+
41+
**Fragment Loader Options:**
42+
43+
* `i` / `include_images`: Controls image inclusion. If not specified, no images are included.
44+
* `?i` or `?i=` or `?i=all`: Include all images from the paper.
45+
* `?i=none`: Include no images (same as omitting `?i`).
46+
* `?i=P:pages`: Include all images from specified pages. `pages` is a comma-separated list of page numbers or ranges (e.g., `P:1`, `P:1,3-5`, `P:2,4`). Page numbers are 1-indexed.
47+
* `?i=G:indices`: Include images by their global index in the document (sequentially numbered as they appear). `indices` is a comma-separated list of image indices or ranges (e.g., `G:1`, `G:1-5,10`). Indices are 1-indexed.
48+
* `r` / `resize_images`: Controls image resizing. Resizing only applies if images are included.
49+
* `?r` or `?r=true`: Enable image resizing. Images will be resized to a maximum dimension of 512px by default, preserving aspect ratio. Only images larger than this will be downscaled.
50+
* `?r=PIXELS`: Enable image resizing and set a custom maximum dimension (e.g., `?r=800`).
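
The option rules above can be sketched in Python with the standard library's `urllib.parse.parse_qs`. This is only an illustration of the documented behavior: the function name `parse_fragment_argument` and its return shape are hypothetical, and the plugin's actual parsing code may differ.

```python
from urllib.parse import parse_qs

DEFAULT_MAX_DIMENSION = 512  # default maximum dimension stated in the README


def parse_fragment_argument(argument):
    """Split 'PAPER[?options]' into (paper, include_spec, max_dimension).

    include_spec is None when no images are requested; max_dimension is
    None when resizing is not enabled. Hypothetical helper for illustration.
    """
    paper, _, query = argument.partition("?")
    # keep_blank_values=True makes a bare "i" or "r" show up with value "".
    params = parse_qs(query, keep_blank_values=True)

    def first(*names):
        for name in names:
            if name in params:
                return params[name][0]
        return None

    include = first("i", "include_images")
    if include is None or include == "none":
        include_spec = None          # no images
    elif include == "":
        include_spec = "all"         # "?i" or "?i=" means all images
    else:
        include_spec = include       # "all", "P:...", or "G:..."

    resize = first("r", "resize_images")
    if resize is None:
        max_dimension = None         # resizing not requested
    elif resize in ("", "true"):
        max_dimension = DEFAULT_MAX_DIMENSION
    else:
        max_dimension = int(resize)  # "?r=800" sets a custom maximum

    return paper, include_spec, max_dimension


if __name__ == "__main__":
    print(parse_fragment_argument("2310.06825?i=P:1,3&r=800"))
```

The argument passed in is the part after the `arxiv:` prefix, for example `2310.06825?i&r`.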
51+
52+
**Examples (Fragment Loader):**
53+
54+
* Load text only:
55+
```bash
56+
llm -f 'arxiv:2310.06825' "Summarize this paper."
57+
```
58+
* Load text and all images (resized to default 512px max):
59+
```bash
60+
llm -f 'arxiv:2310.06825?i&r' -m gpt-4-vision-preview "Explain the diagrams in this paper."
61+
```
62+
* Load text and images from page 1 and 3, resized to 800px max:
63+
```bash
64+
llm -f 'arxiv:2310.06825?i=P:1,3&r=800' -m gemini-pro-vision "Describe the images on pages 1 and 3."
65+
```
66+
* Load text and the first 5 globally indexed images, no resizing:
67+
```bash
68+
llm -f 'arxiv:2310.06825?i=G:1-5' -m some-image-model "What do the first five images show?"
69+
```
70+
71+
### 2. Standalone Command (`llm arxiv ...`)
72+
73+
The `llm arxiv` command fetches and processes an arXiv paper.
74+
* If no prompt is provided, it outputs the paper's content as Markdown to standard output. This can be piped to other commands or LLMs.
75+
* If a `PROMPT` is provided, it processes the paper content (including any selected images as attachments) with the specified or default LLM.
76+
77+
**Syntax:**
78+
79+
`llm arxiv PAPER_ID_OR_URL [PROMPT] [OPTIONS]`
80+
81+
**Arguments:**
82+
83+
* `PAPER_ID_OR_URL`: The arXiv ID (e.g., `2310.06825`) or full URL.
84+
* `PROMPT` (Optional): A prompt to send to an LLM along with the paper's content.
85+
86+
**Command Options:**
87+
88+
* `-i SPEC` / `--include-images SPEC`:
89+
Controls image inclusion. If not specified, no images are included, whether or not a `PROMPT` is given.
90+
* `-i all` or (if `PROMPT` is present) simply `-i` with no value: Include all images.
91+
* `-i ""` (empty string value): Include all images.
92+
* `-i none`: Include no images.
93+
* `-i P:pages`: Include all images from specified pages (e.g., `P:1`, `P:1,3-5`).
94+
* `-i G:indices`: Include images by their global index (e.g., `G:1`, `G:1-5,10`).
95+
* `-r` / `--resize-images`:
96+
Enable image resizing. Images will be resized to a maximum dimension of 512px by default, preserving aspect ratio. Only images larger than this will be downscaled.
97+
* `-d PIXELS` / `--max-dimension PIXELS`:
98+
Set a custom maximum dimension in pixels for resizing. Requires `-r` to be active.
99+
* `-m MODEL_ID` / `--model MODEL_ID`:
100+
Specify the LLM model to use if a `PROMPT` is provided.
101+
* `-s SYSTEM_PROMPT` / `--system SYSTEM_PROMPT`:
102+
Specify a system prompt to use with the LLM if a `PROMPT` is provided.
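
To make the `P:pages` and `G:indices` forms concrete, here is a small Python sketch that turns a selection spec into a kind and a set of 1-indexed numbers. The names `parse_ranges` and `parse_selection` are hypothetical and deliberately distinct from the plugin's `parse_image_selection_spec`, whose actual behavior and return type are not shown in this diff.

```python
def parse_ranges(text):
    """Expand a string such as '1,3-5' into the set {1, 3, 4, 5}."""
    numbers = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            numbers.update(range(int(start), int(end) + 1))
        else:
            numbers.add(int(part))
    return numbers


def parse_selection(spec):
    """Return (kind, numbers) for an image selection spec.

    kind is one of 'none', 'all', 'pages', or 'global'. Illustrative only.
    """
    if spec is None or spec == "none":
        return ("none", set())
    if spec in ("", "all"):
        return ("all", set())
    prefix, _, ranges = spec.partition(":")
    kinds = {"P": "pages", "G": "global"}
    if prefix.upper() not in kinds or not ranges:
        raise ValueError(f"Unrecognized image selection spec: {spec!r}")
    return (kinds[prefix.upper()], parse_ranges(ranges))


if __name__ == "__main__":
    print(parse_selection("P:1,3-5"))
    print(parse_selection("G:1-5,10"))
```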
103+
104+
**Examples (Standalone Command):**
105+
106+
* Get Markdown content of a paper:
107+
```bash
108+
llm arxiv 2310.06825
109+
```
110+
* Get Markdown, prepare all images (resized), then pipe to a model:
111+
```bash
112+
llm arxiv 2310.06825 -i all -r | llm -m gpt-4-vision-preview "Summarize this, paying attention to figures."
113+
```
114+
* Directly prompt an LLM with the paper's content and images from pages 2 and 4 (resized to 600px):
115+
```bash
116+
llm arxiv 2310.06825 "Explain figures on page 2 and 4." -i P:2,4 -r -d 600 -m gpt-4o
117+
```
118+
* Summarize a paper using the default LLM and include all images:
119+
```bash
120+
llm arxiv 2310.06825 "Summarize the key findings." -i all
121+
```
34122

123+
### 3. Search Command (`llm arxiv-search ...`)
124+
125+
The `llm arxiv-search` command allows you to search for papers on arXiv using a query string.
126+
127+
**Syntax:**
128+
129+
`llm arxiv-search [OPTIONS] QUERY_STRING`
130+
131+
**Arguments:**
132+
133+
* `QUERY_STRING`: The search query (e.g., `"quantum computing"` or `au:Hawking AND ti:"black holes"`). See [arXiv API user manual](https://arxiv.org/help/api/user-manual#query_details) for advanced query syntax.
134+
135+
**Options:**
136+
137+
* `-n INT`, `--max-results INT`: Maximum number of search results to return (Default: `5`).
138+
* `--sort-by [relevance|lastUpdatedDate|submittedDate]`: Sort order for search results (Default: `relevance`).
139+
* `--details`: Show more details for each result, including authors, full abstract, categories, publication/update dates, and PDF link.
140+
141+
**Output:**
142+
143+
For each search result, the command will display:
144+
* The paper's ID and Title.
145+
* A suggested command to fetch the full paper with `llm arxiv <ID>`. This command is styled (e.g., bold, green, underlined, prefixed with `$`) for visibility.
146+
* A brief abstract (or full details if `--details` is used).
147+
148+
Additionally, the script will attempt to copy all the suggested `llm arxiv <ID>` commands (newline-separated) to your system clipboard using an OSC 52 escape sequence. A message like `(Attempted to copy N command(s) to clipboard)` will be printed to stderr. The success of this automatic copy depends on your terminal emulator's support and configuration (e.g., iTerm2 needs clipboard access enabled for applications).
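
For readers unfamiliar with OSC 52, the following Python sketch shows the general shape of such an escape sequence: the text is base64-encoded and wrapped in `ESC ] 52 ; c ;` and a terminating BEL. The function name `osc52_copy_sequence` is an assumption for this example, and the plugin may build or emit its sequence differently.

```python
import base64
import sys


def osc52_copy_sequence(text):
    """Build an OSC 52 sequence asking the terminal to copy `text`.

    'c' selects the clipboard; the payload must be base64-encoded.
    Hypothetical helper, not the plugin's own code.
    """
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"\x1b]52;c;{payload}\x07"


if __name__ == "__main__":
    commands = ["llm arxiv 2310.06825", "llm arxiv astro-ph/0601009"]
    # Whether the copy succeeds depends on the terminal emulator's settings.
    sys.stdout.write(osc52_copy_sequence("\n".join(commands)))
    sys.stdout.flush()
    print(f"(Attempted to copy {len(commands)} command(s) to clipboard)", file=sys.stderr)
```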
149+
150+
**Examples (Search Command):**
151+
152+
* Search for "large language models" and get top 3 results (brief):
153+
```bash
154+
llm arxiv-search -n 3 "large language models"
155+
```
156+
(This will also attempt to copy the 3 suggested `llm arxiv` commands to your clipboard.)
157+
158+
* Search for papers by author "Hinton" on "neural networks", sorted by submission date, with full details:
159+
```bash
160+
llm arxiv-search --sort-by submittedDate --details "au:Hinton AND ti:\"neural network\""
161+
```
162+
163+
## Image Handling Notes
164+
165+
* **Rationale for Optional Images:** Processing and including images can significantly increase the data size sent to language models. Many models have limitations on input context window size, and some may not support image inputs at all or may incur higher costs for them. The granular controls for image inclusion (all, none, specific pages/indices) and resizing allow users to manage this, ensuring that only necessary visual information is passed to the LLM, optimizing for cost, speed, and model compatibility.
166+
* Images are extracted from the PDF and represented in the Markdown text by placeholders of the form `[IMAGE: http://arxiv.org/abs/ID#page_X_img_Y]`. Selected images are also attached as `llm.Attachment` objects.
167+
* Supported input image formats from PDFs include common types such as JPEG, PNG, GIF, and BMP. The plugin attempts to convert other formats, but complex or rare ones may be skipped.
168+
* When resized, images are converted to JPEG (for most common types) or PNG (if transparency or other features warrant it) to save tokens and improve compatibility with models.
169+
* Image processing errors are printed to `stderr` but do not stop the text extraction.
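
The resizing rule described in this README (cap the longest side at a maximum dimension, preserve aspect ratio, never upscale) can be sketched as a pure size calculation. The function name `fit_within` and the rounding behavior are assumptions for illustration; the plugin performs the actual resizing with Pillow and may round differently.

```python
def fit_within(width, height, max_dimension=512):
    """Return the (width, height) an image would have after resizing.

    Images whose longest side already fits are returned unchanged;
    larger images are scaled down so the longest side equals
    max_dimension. Illustrative helper, not the plugin's code.
    """
    longest = max(width, height)
    if longest <= max_dimension:
        return width, height
    scale = max_dimension / longest
    # Guard against a dimension collapsing to zero for extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(1024, 512))
    print(fit_within(300, 200))
    print(fit_within(1600, 1200, max_dimension=800))
```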
35170

36171
## Development
37172

38-
To set up this plugin locally, first checkout the code. Then create a new virtual environment:
173+
To contribute to this plugin, clone the repository and install it in editable mode:
174+
39175
```bash
176+
git clone https://github.com/agustif/llm-arxiv.git
40177
cd llm-arxiv
178+
# It's recommended to use a virtual environment
41179
python -m venv venv
42-
source venv/bin/activate
43-
```
44-
Now install the dependencies and test dependencies:
45-
```bash
46-
# Install in editable mode with test dependencies
47-
python -m pip install -e '.[test]'
48-
```
49-
To run the tests:
50-
```bash
51-
pytest
180+
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
181+
# Install in editable mode
182+
pip install -e .
183+
# Install additional dependencies for testing (e.g., pytest, pytest-cov)
184+
pip install pytest pytest-cov
185+
# Run tests
186+
pytest tests/
52187
```
53188

54-
## AI Agent Guidance
189+
## AGENTS.md
55190

56-
For AI agents or assistants working on this codebase, please refer to the [AGENTS.md](AGENTS.md) file for specific instructions, codebase details, and development guidelines.
191+
See [AGENTS.md](AGENTS.md) for notes on how AI agents should interpret and use this tool and its outputs.

arxiv.py

Lines changed: 0 additions & 23 deletions
This file was deleted.

fitz.py

Lines changed: 0 additions & 2 deletions
This file was deleted.

llm.py

Lines changed: 0 additions & 17 deletions
This file was deleted.
