Commit 0b260ec

feat(skills): add pdf-reader skill (qwibitai#772)
Thanks @glifocat! Clean skill package — good docs, solid tests, nice intent files. Pushed a small fix for path traversal on the PDF filename before merging.
1 parent 1e89d61 commit 0b260ec

12 files changed

Lines changed: 2238 additions & 0 deletions

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
---
name: add-pdf-reader
description: Add PDF reading to NanoClaw agents. Extracts text from PDFs via pdftotext CLI. Handles WhatsApp attachments, URLs, and local files.
---

# Add PDF Reader

Adds PDF reading capability to all container agents using poppler-utils (pdftotext/pdfinfo). PDFs sent as WhatsApp attachments are auto-downloaded to the group workspace.

## Phase 1: Pre-flight

### Check if already applied

Read `.nanoclaw/state.yaml`. If `add-pdf-reader` is in `applied_skills`, skip to Phase 3 (Verify).
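
The pre-flight check can be scripted. The sketch below is a minimal example, not part of the skill: `skill_applied` is a hypothetical helper, and because the exact schema of `state.yaml` is not shown here, it assumes only that the skill name appears verbatim in the file and does a plain text match.

```bash
# Hypothetical helper: succeed if the skill name appears in the state file.
# Assumption: the name shows up verbatim under applied_skills; the YAML
# schema is not documented here, so this is a whole-word text match.
skill_applied() {
  local skill="$1"
  local state_file="${2:-.nanoclaw/state.yaml}"
  [[ -f "$state_file" ]] && grep -qwF -- "$skill" "$state_file"
}

if skill_applied add-pdf-reader; then
  echo "add-pdf-reader already applied; skip to Phase 3 (Verify)"
else
  echo "add-pdf-reader not applied; continue with Phase 2"
fi
```

A missing state file is treated the same as "not applied", which matches the initialization step in Phase 2.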

## Phase 2: Apply Code Changes

### Initialize skills system (if needed)

If `.nanoclaw/` directory doesn't exist:

```bash
npx tsx scripts/apply-skill.ts --init
```

### Apply the skill

```bash
npx tsx scripts/apply-skill.ts .claude/skills/add-pdf-reader
```

This deterministically:
- Adds `container/skills/pdf-reader/SKILL.md` (agent-facing documentation)
- Adds `container/skills/pdf-reader/pdf-reader` (CLI script)
- Three-way merges `poppler-utils` + COPY into `container/Dockerfile`
- Three-way merges PDF attachment download into `src/channels/whatsapp.ts`
- Three-way merges PDF tests into `src/channels/whatsapp.test.ts`
- Records application in `.nanoclaw/state.yaml`

If merge conflicts occur, read the intent files:
- `modify/container/Dockerfile.intent.md`
- `modify/src/channels/whatsapp.ts.intent.md`
- `modify/src/channels/whatsapp.test.ts.intent.md`
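
To see which merged files need attention before opening the intent files, a quick scan can help. This is a sketch under an assumption that is not confirmed here: that the three-way merge leaves standard git-style `<<<<<<<` / `>>>>>>>` markers in conflicted files. `files_with_conflicts` is a hypothetical helper name.

```bash
# Hypothetical helper: print each given file that still contains
# git-style conflict markers (assumed, not confirmed, to be what the
# three-way merge leaves behind).
files_with_conflicts() {
  local f
  for f in "$@"; do
    [[ -f "$f" ]] && grep -qE '^(<<<<<<<|>>>>>>>)( |$)' "$f" && echo "$f"
  done
  return 0
}

files_with_conflicts \
  container/Dockerfile \
  src/channels/whatsapp.ts \
  src/channels/whatsapp.test.ts
```

No output means none of the three merge targets contains a marker line.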

### Validate

```bash
npm test
npm run build
```

### Rebuild container

```bash
./container/build.sh
```

### Restart service

```bash
launchctl kickstart -k gui/$(id -u)/com.nanoclaw  # macOS
# Linux: systemctl --user restart nanoclaw
```
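
If the same instructions are run on both platforms, the choice can be made from `uname -s`. The helper below is a hypothetical convenience, not part of the skill; it only prints the command (using the service names shown above) so it can be reviewed before running.

```bash
# Hypothetical helper: print the restart command for a given OS name
# (as reported by `uname -s`). Service names are the ones shown above.
restart_command() {
  case "$1" in
    Darwin) echo "launchctl kickstart -k gui/$(id -u)/com.nanoclaw" ;;
    Linux)  echo "systemctl --user restart nanoclaw" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Print (rather than run) the command for the current machine.
restart_command "$(uname -s)" || true
```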

## Phase 3: Verify

### Test PDF extraction

Send a PDF file in any registered WhatsApp chat. The agent should:
1. Download the PDF to `attachments/`
2. Respond acknowledging the PDF
3. Be able to extract text when asked

### Test URL fetching

Ask the agent to read a PDF from a URL. It should use `pdf-reader fetch <url>`.

### Check logs if needed

```bash
tail -f logs/nanoclaw.log | grep -i pdf
```

Look for:
- `Downloaded PDF attachment` — successful download
- `Failed to download PDF attachment` — media download issue

## Troubleshooting

### Agent says pdf-reader command not found

The container needs rebuilding. Run `./container/build.sh` and restart the service.

### PDF text extraction is empty

The PDF may be scanned (image-based). `pdftotext` only handles text-based PDFs. Consider using the agent-browser to open the PDF visually instead.
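
One way to confirm that a PDF has no text layer is to check whether the extracted output contains anything besides whitespace. The sketch below is illustrative; `has_extractable_text` is a hypothetical helper that reads the extracted text on stdin, so it works with either `pdftotext` or `pdf-reader extract`.

```bash
# Hypothetical helper: read extracted text on stdin and succeed only if it
# contains something other than whitespace. Form feeds, which pdftotext
# emits between pages, are part of [:space:] and are ignored too.
has_extractable_text() {
  [[ -n "$(tr -d '[:space:]')" ]]
}

# Usage (requires pdftotext from poppler-utils):
#   if pdftotext scan.pdf - | has_extractable_text; then
#     echo "text-based PDF"
#   else
#     echo "no text layer; probably scanned"
#   fi
```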

### WhatsApp PDF not detected

Verify the message has `documentMessage` with `mimetype: application/pdf`. Some file-sharing apps send PDFs as generic files without the correct mimetype.
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
---
name: pdf-reader
description: Read and extract text from PDF files — documents, reports, contracts, spreadsheets. Use whenever you need to read PDF content, not just when explicitly asked. Handles local files, URLs, and WhatsApp attachments.
allowed-tools: Bash(pdf-reader:*)
---

# PDF Reader

## Quick start

```bash
pdf-reader extract report.pdf                  # Extract all text
pdf-reader extract report.pdf --layout         # Preserve tables/columns
pdf-reader fetch https://example.com/doc.pdf   # Download and extract
pdf-reader info report.pdf                     # Show metadata + size
pdf-reader list                                # List all PDFs in directory tree
```

## Commands

### extract — Extract text from PDF

```bash
pdf-reader extract <file>                         # Full text to stdout
pdf-reader extract <file> --layout                # Preserve layout (tables, columns)
pdf-reader extract <file> --pages 1-5             # Pages 1 through 5
pdf-reader extract <file> --pages 3-3             # Single page (page 3)
pdf-reader extract <file> --layout --pages 2-10   # Layout + page range
```

Options:
- `--layout` — Maintains spatial positioning. Essential for tables, spreadsheets, multi-column docs.
- `--pages N-M` — Extract only pages N through M (1-based, inclusive).
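
For reference, the `pdf-reader` script in this commit splits the `N-M` range with two bash parameter expansions and passes the halves to `pdftotext` as `-f` and `-l`. The standalone sketch below isolates that step; `parse_range` is a hypothetical wrapper name used only for illustration.

```bash
# Split "N-M" into first and last page with parameter expansion.
parse_range() {
  local range="$1"
  local first="${range%-*}"   # drop the shortest trailing "-..." part, leaving N
  local last="${range#*-}"    # drop the shortest leading "...-" part, leaving M
  echo "$first $last"
}

parse_range 1-5   # prints "1 5"
parse_range 3-3   # prints "3 3"
```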

### fetch — Download and extract PDF from URL

```bash
pdf-reader fetch <url>              # Download, verify, extract with layout
pdf-reader fetch <url> report.pdf   # Also save a local copy
```

Downloads the PDF, verifies it has a valid `%PDF` header, then extracts text with layout preservation. Temporary files are cleaned up automatically.
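
The header check amounts to comparing the first four bytes of the download with the literal `%PDF`. A minimal standalone sketch follows; `is_pdf` is a hypothetical helper name, and the temporary file exists only to make the example self-contained.

```bash
# Succeed if the file starts with the four bytes "%PDF".
is_pdf() {
  [[ "$(head -c 4 "$1" 2>/dev/null)" == "%PDF" ]]
}

sample="$(mktemp)"
printf '%%PDF-1.7\n' > "$sample"
is_pdf "$sample" && echo "looks like a PDF"

# An HTML error page saved under a .pdf name fails the check.
printf '<!DOCTYPE html>' > "$sample"
is_pdf "$sample" || echo "not a PDF"
rm -f "$sample"
```

This catches the common case where a URL returns an HTML error or login page instead of the document; it does not validate the rest of the file.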

### info — PDF metadata and file size

```bash
pdf-reader info <file>
```

Shows title, author, page count, page size, PDF version, and file size on disk.
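
Because the metadata is printed as `Key: value` lines, a single field can be picked out with `awk`. The sketch below reads that output on stdin; `page_count` is a hypothetical helper, and the sample input is illustrative rather than real output from a specific file.

```bash
# Read pdfinfo-style "Key: value" output on stdin and print the page count.
page_count() {
  awk '/^Pages:/ { print $2 }'
}

# Illustrative input only; real values come from `pdf-reader info <file>`.
printf 'Title:          Example\nPages:          12\n' | page_count   # prints 12
```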

### list — Find all PDFs in directory tree

```bash
pdf-reader list
```

Recursively lists all `.pdf` files with page count and file size.
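
The recursive search relies on bash's `globstar` option. The sketch below shows only the path discovery, without page counts or sizes; `list_pdfs` is a hypothetical helper, and it runs in a subshell so the `cd` and `shopt` settings do not leak into the caller.

```bash
# Print every .pdf path under a directory using bash's globstar option
# (bash 4+). With globstar set, **/*.pdf also matches files in the top
# directory, so one pattern is enough.
list_pdfs() (
  cd "${1:-.}" || exit 1
  shopt -s nullglob globstar
  for pdf in **/*.pdf; do
    echo "$pdf"
  done
)

# Usage: list_pdfs            # current directory tree
#        list_pdfs attachments
```

`nullglob` makes the loop body run zero times when nothing matches, instead of iterating over the literal pattern.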

## WhatsApp PDF attachments

When a user sends a PDF on WhatsApp, it is automatically saved to the `attachments/` directory. The message will include a path hint like:

> [PDF attached: attachments/document.pdf]

To read the attached PDF:

```bash
pdf-reader extract attachments/document.pdf --layout
```
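
If the path needs to be pulled out of the hint programmatically, a bash regex match is enough. This is a sketch that assumes the hint always has the exact `[PDF attached: <path>]` shape shown above; `attached_pdf_path` is a hypothetical helper.

```bash
# Hypothetical helper: pull the path out of a "[PDF attached: <path>]" hint.
attached_pdf_path() {
  local message="$1"
  local re='\[PDF attached: ([^]]+)]'
  if [[ "$message" =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

attached_pdf_path "[PDF attached: attachments/document.pdf]"   # prints attachments/document.pdf
```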

## Example workflows

### Read a contract and summarize key terms

```bash
pdf-reader info attachments/contract.pdf
pdf-reader extract attachments/contract.pdf --layout
```

### Extract specific pages from a long report

```bash
pdf-reader info report.pdf                    # Check total pages
pdf-reader extract report.pdf --pages 1-3     # Executive summary
pdf-reader extract report.pdf --pages 15-20   # Financial tables
```

### Fetch and analyze a public document

```bash
pdf-reader fetch https://example.com/annual-report.pdf report.pdf
pdf-reader info report.pdf
```
Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
#!/bin/bash
set -euo pipefail

# pdf-reader — CLI wrapper around poppler-utils (pdftotext, pdfinfo)
# Provides extract, fetch, info, list commands for PDF processing.

VERSION="1.0.0"

usage() {
  cat <<'USAGE'
pdf-reader — Extract text and metadata from PDF files

Usage:
  pdf-reader extract <file> [--layout] [--pages N-M]
  pdf-reader fetch <url> [filename]
  pdf-reader info <file>
  pdf-reader list
  pdf-reader help

Commands:
  extract   Extract text from a PDF file to stdout
  fetch     Download a PDF from a URL and extract text
  info      Show PDF metadata and file size
  list      List all PDFs in current directory tree
  help      Show this help message

Extract options:
  --layout  Preserve original layout (tables, columns)
  --pages   Page range to extract (e.g. 1-5, 3-3 for single page)
USAGE
}

cmd_extract() {
  local file=""
  local layout=false
  local first_page=""
  local last_page=""

  # Parse arguments
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --layout)
        layout=true
        shift
        ;;
      --pages)
        if [[ -z "${2:-}" ]]; then
          echo "Error: --pages requires a range argument (e.g. 1-5)" >&2
          exit 1
        fi
        local range="$2"
        first_page="${range%-*}"
        last_page="${range#*-}"
        shift 2
        ;;
      -*)
        echo "Error: Unknown option: $1" >&2
        exit 1
        ;;
      *)
        if [[ -z "$file" ]]; then
          file="$1"
        else
          echo "Error: Unexpected argument: $1" >&2
          exit 1
        fi
        shift
        ;;
    esac
  done

  if [[ -z "$file" ]]; then
    echo "Error: No file specified" >&2
    echo "Usage: pdf-reader extract <file> [--layout] [--pages N-M]" >&2
    exit 1
  fi

  if [[ ! -f "$file" ]]; then
    echo "Error: File not found: $file" >&2
    exit 1
  fi

  # Build pdftotext arguments
  local args=()
  if [[ "$layout" == true ]]; then
    args+=(-layout)
  fi
  if [[ -n "$first_page" ]]; then
    args+=(-f "$first_page")
  fi
  if [[ -n "$last_page" ]]; then
    args+=(-l "$last_page")
  fi

  pdftotext ${args[@]+"${args[@]}"} "$file" -
}

cmd_fetch() {
  local url="${1:-}"
  local filename="${2:-}"

  if [[ -z "$url" ]]; then
    echo "Error: No URL specified" >&2
    echo "Usage: pdf-reader fetch <url> [filename]" >&2
    exit 1
  fi

  # Create temporary file
  local tmpfile
  tmpfile="$(mktemp /tmp/pdf-reader-XXXXXX.pdf)"
  # Expand the path now (double quotes): on the success path the EXIT trap
  # fires after this function has returned, when the local variable is out
  # of scope and, under `set -u`, unbound.
  trap "rm -f -- '$tmpfile'" EXIT

  # Download (-f: treat HTTP errors as failures, -S: show errors despite -s)
  echo "Downloading: $url" >&2
  if ! curl -fsSL -o "$tmpfile" "$url"; then
    echo "Error: Failed to download: $url" >&2
    exit 1
  fi

  # Verify PDF header
  local header
  header="$(head -c 4 "$tmpfile")"
  if [[ "$header" != "%PDF" ]]; then
    echo "Error: Downloaded file is not a valid PDF (header: $header)" >&2
    exit 1
  fi

  # Save with name if requested
  if [[ -n "$filename" ]]; then
    cp "$tmpfile" "$filename"
    echo "Saved to: $filename" >&2
  fi

  # Extract with layout
  pdftotext -layout "$tmpfile" -
}

cmd_info() {
  local file="${1:-}"

  if [[ -z "$file" ]]; then
    echo "Error: No file specified" >&2
    echo "Usage: pdf-reader info <file>" >&2
    exit 1
  fi

  if [[ ! -f "$file" ]]; then
    echo "Error: File not found: $file" >&2
    exit 1
  fi

  pdfinfo "$file"
  echo ""
  echo "File size: $(du -h "$file" | cut -f1)"
}

cmd_list() {
  local found=false

  # Use globbing to find PDFs (globstar makes **/ match recursively)
  shopt -s nullglob globstar

  # Use associative array to deduplicate (*.pdf overlaps with **/*.pdf)
  declare -A seen
  for pdf in *.pdf **/*.pdf; do
    [[ -v seen["$pdf"] ]] && continue
    seen["$pdf"]=1
    found=true

    local pages="?"
    local size
    size="$(du -h "$pdf" | cut -f1)"

    # Try to get page count from pdfinfo
    if page_line="$(pdfinfo "$pdf" 2>/dev/null | grep '^Pages:')"; then
      pages="$(echo "$page_line" | awk '{print $2}')"
    fi

    printf "%-60s %5s pages %8s\n" "$pdf" "$pages" "$size"
  done

  if [[ "$found" == false ]]; then
    echo "No PDF files found in current directory tree." >&2
  fi
}

# Main dispatch
command="${1:-help}"
shift || true

case "$command" in
  extract) cmd_extract "$@" ;;
  fetch) cmd_fetch "$@" ;;
  info) cmd_info "$@" ;;
  list) cmd_list ;;
  help|--help|-h) usage ;;
  version|--version|-v) echo "pdf-reader $VERSION" ;;
  *)
    echo "Error: Unknown command: $command" >&2
    echo "Run 'pdf-reader help' for usage." >&2
    exit 1
    ;;
esac
