Commit f75a12a

docs+skill: add searxng-search optional skill and documentation
Closes the remaining gaps from PR NousResearch#11562 that weren't covered by the core SearXNG integration landed in NousResearch#20823.

- optional-skills/research/searxng-search/ — installable skill with SKILL.md (curl-based usage, category support, Python example) and a searxng.sh helper script for health checks and instance queries
- website/docs/user-guide/configuration.md — SearXNG added to the Web Search Backends section (5 backends, backend table, per-capability split config example, correct search-only note)
- website/docs/reference/environment-variables.md — SEARXNG_URL row
- website/docs/reference/optional-skills-catalog.md — searxng-search entry

The core SearXNG code, OPTIONAL_ENV_VARS, hermes tools picker, and tests were already on main via NousResearch#20823. This commit is purely additive docs plus the optional skill scaffold.

Credits from NousResearch#11562 salvage:

- @w4rum — original _searxng_search structure
- @nathansdev — tools_config.py integration
- @moyomartin — category support and result formatting
- @0xMihai — config/env var approach
- @nicobailon — skill and documentation structure
- @searxng-fan — error handling patterns
- @Local-First — self-hosted-first philosophy and docs
1 parent 006fcdd commit f75a12a

5 files changed

Lines changed: 246 additions & 4 deletions

Lines changed: 211 additions & 0 deletions
@@ -0,0 +1,211 @@
---
name: searxng-search
description: Free meta-search via SearXNG — aggregates results from 70+ search engines. Self-hosted or use a public instance. No API key needed. Falls back automatically when the web search toolset is unavailable.
version: 1.0.0
author: hermes-agent
license: MIT
metadata:
  hermes:
    tags: [search, searxng, meta-search, self-hosted, free, fallback]
    related_skills: [duckduckgo-search, domain-intel]
    fallback_for_toolsets: [web]
---

# SearXNG Search

Free meta-search using [SearXNG](https://searxng.org/) — a privacy-respecting, self-hosted search aggregator that queries 70+ search engines simultaneously.

**No API key required** when using a public instance. It can also be self-hosted for full control. The skill automatically appears as a fallback when the main web search toolset (`FIRECRAWL_API_KEY`) is not configured.

## Configuration

SearXNG requires a `SEARXNG_URL` environment variable pointing to your SearXNG instance:

```bash
# Public instance (no setup required)
SEARXNG_URL=https://searxng.example.com

# Self-hosted SearXNG
SEARXNG_URL=http://localhost:8888
```

Note: the instance must have `json` enabled under `search.formats` in its `settings.yml`, or `format=json` requests are rejected.

If no instance is configured, this skill is unavailable and the agent falls back to other search options.

## Detection Flow

Check what is actually available before choosing an approach:

```bash
# Check if SEARXNG_URL is set and the instance is reachable
curl -s --max-time 5 "${SEARXNG_URL}/search?q=test&format=json" | head -c 200
```

Decision tree:

1. If `SEARXNG_URL` is set and the instance responds, use SearXNG
2. If `SEARXNG_URL` is unset or unreachable, fall back to other available search tools
3. If the user wants SearXNG specifically, help them set up an instance or find a public one
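The reachability check above can be sketched with only the Python standard library; the function name `searxng_available` is illustrative, not part of the skill:

```python
import urllib.parse
import urllib.request


def searxng_available(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if base_url answers the SearXNG JSON search API."""
    if not base_url:
        return False
    url = base_url.rstrip("/") + "/search?" + urllib.parse.urlencode(
        {"q": "test", "format": "json"}
    )
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # A healthy instance with JSON output enabled answers HTTP 200.
            return resp.status == 200
    except (ValueError, OSError):
        # Malformed URL, connection refused, DNS failure, or timeout.
        return False
```

Call it with `os.environ.get("SEARXNG_URL", "")` before committing to SearXNG as the backend.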
## Method 1: CLI via curl (Preferred)

Use `curl` via `terminal` to call the SearXNG JSON API. This avoids assuming any particular Python package is installed.

```bash
# Text search (JSON output)
curl -s --max-time 10 \
  "${SEARXNG_URL}/search?q=python+async+programming&format=json&engines=google,bing"

# With safesearch off
curl -s --max-time 10 \
  "${SEARXNG_URL}/search?q=example&format=json&safesearch=0"

# Specific categories (general, news, science, etc.)
curl -s --max-time 10 \
  "${SEARXNG_URL}/search?q=AI+news&format=json&categories=news"
```

### Common Query Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| `q` | Query string (URL-encoded) | `q=python+async` |
| `format` | Output format: `json`, `csv`, `rss` | `format=json` |
| `engines` | Comma-separated engine names | `engines=google,bing` |
| `pageno` | Result page number (default 1) | `pageno=2` |
| `categories` | Filter by category | `categories=news,science` |
| `safesearch` | 0=none, 1=moderate, 2=strict | `safesearch=0` |
| `time_range` | Filter: `day`, `week`, `month`, `year` | `time_range=week` |

### Parsing JSON Results

```bash
# Extract titles and URLs from JSON
curl -s --max-time 10 "${SEARXNG_URL}/search?q=fastapi&format=json" \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
for r in data.get('results', [])[:5]:
    print(r.get('title', ''))
    print(r.get('url', ''))
    print(r.get('content', '')[:200])
    print()
"
```

Each result may include: `title`, `url`, `content` (snippet), `engine`, `parsed_url`, `img_src`, `thumbnail`, `author`, `publishedDate`.

## Method 2: Python API via `requests`

Use the SearXNG REST API directly from Python with the `requests` library:

```python
import os

import requests

base_url = os.environ.get("SEARXNG_URL", "")
if not base_url:
    raise RuntimeError("SEARXNG_URL is not set")

query = "fastapi deployment guide"
params = {
    "q": query,
    "format": "json",
    "engines": "google,bing",
}

resp = requests.get(f"{base_url}/search", params=params, timeout=10)
resp.raise_for_status()
data = resp.json()

for r in data.get("results", [])[:5]:
    print(r["title"])
    print(r["url"])
    print(r.get("content", "")[:200])
    print()
```
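Note that SearXNG paginates with `pageno` rather than accepting a server-side result limit; trim counts client-side and page with `pageno`. A small URL-building helper (the name `build_search_url` is illustrative) shows the pattern:

```python
import urllib.parse


def build_search_url(base_url: str, query: str, page: int = 1, engines: str = "") -> str:
    """Build a SearXNG JSON search URL; pageno selects the result page."""
    params = {"q": query, "format": "json", "pageno": page}
    if engines:
        params["engines"] = engines
    return base_url.rstrip("/") + "/search?" + urllib.parse.urlencode(params)


print(build_search_url("http://localhost:8888", "fastapi deployment", page=2))
# → http://localhost:8888/search?q=fastapi+deployment&format=json&pageno=2
```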
## Method 3: Helper Packages

Do not assume a SearXNG client package is installed. Third-party PyPI helpers exist, but their availability and APIs vary — verify a package is actually present (e.g. `pip show <name>`) before importing it. Such packages typically expose engine metadata only, not the search API itself; the JSON endpoint used in Methods 1 and 2 is the stable interface.

## Self-Hosting SearXNG

To run your own SearXNG instance:

```bash
# Using Docker (the container listens on 8080)
docker run -d -p 8888:8080 \
  -v "$(pwd)/searxng:/etc/searxng" \
  searxng/searxng:latest

# Then set
SEARXNG_URL=http://localhost:8888
```

Remember to enable `json` under `search.formats` in `searxng/settings.yml`, or the JSON API used by this skill will be rejected.

SearXNG is not distributed as a simple `pip install`; for a non-Docker setup, follow the official installation guide at [docs.searxng.org](https://docs.searxng.org/).

Public SearXNG instances are listed at [searx.space](https://searx.space/).
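When self-hosting, the JSON output format must be switched on explicitly; by default an instance may serve only HTML. A minimal `settings.yml` fragment (key names per SearXNG's settings schema) looks like:

```yaml
# Enable the JSON API alongside the default HTML output
search:
  formats:
    - html
    - json
```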
## Workflow: Search then Extract

SearXNG returns titles, URLs, and snippets — not full page content. To get full page content, search first and then extract the most relevant URL with `web_extract`, browser tools, or `curl`.

```bash
# Search for relevant pages
curl -s --max-time 10 "${SEARXNG_URL}/search?q=fastapi+deployment&format=json"
# Output: list of results with titles and URLs

# Then extract the best URL with web_extract
```
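The "pick the best URL" step can be sketched as a small pure function over the JSON payload (the name `pick_top_urls` is illustrative):

```python
def pick_top_urls(payload: dict, n: int = 3) -> list:
    """Return up to n result URLs from a SearXNG JSON response, best-ranked first."""
    urls = [r.get("url") for r in payload.get("results", [])]
    return [u for u in urls if u][:n]


sample = {"results": [
    {"title": "FastAPI in Containers", "url": "https://example.com/a"},
    {"title": "Deploying FastAPI", "url": "https://example.com/b"},
]}
print(pick_top_urls(sample, n=1))  # → ['https://example.com/a']
```

Feed each selected URL to `web_extract` (or `curl`) to retrieve the full page.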
## Limitations

- **Instance availability**: If the SearXNG instance is down or unreachable, search fails. Always check that `SEARXNG_URL` is set and the instance is reachable.
- **No content extraction**: SearXNG returns snippets, not full page content. Use `web_extract`, browser tools, or `curl` for full articles.
- **Rate limiting**: Some public instances limit requests. Self-hosting avoids this.
- **Engine coverage**: Available engines depend on the SearXNG instance configuration. Some engines may be disabled.
- **Result freshness**: Meta-search aggregates external engines — result freshness depends on those engines.

## Troubleshooting

| Problem | Likely Cause | What To Do |
|---------|--------------|------------|
| `SEARXNG_URL` not set | No instance configured | Use a public SearXNG instance or set up your own |
| Connection refused | Instance not running or wrong URL | Check that the URL is correct and the instance is running |
| `403` on `format=json` | JSON not enabled in the instance's `search.formats` | Enable `json` in `settings.yml` or use an instance that allows it |
| Empty results | Instance blocks the query or upstream engines failed | Try a different instance or self-host |
| Slow responses | Public instance under load | Self-host or use a less-loaded public instance |

## Pitfalls

- **Always set `SEARXNG_URL`**: Without it, the skill cannot function.
- **URL-encode queries**: Spaces and special characters must be URL-encoded — use `curl -G --data-urlencode` in the shell, or `urllib.parse.quote_plus()` in Python.
- **Use `format=json`**: The default HTML output is not machine-readable. Always request JSON explicitly.
- **Set a timeout**: Always use `--max-time` or `timeout=` to avoid hanging on unreachable instances.
- **Prefer self-hosting**: Public instances may go down, rate-limit, or block automated queries. A self-hosted instance is more reliable.
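To make the encoding pitfall concrete: `urllib.parse.quote_plus` handles both spaces and reserved characters, while a naive space replacement does not:

```python
from urllib.parse import quote_plus

query = "C++ & async?"
print(quote_plus(query))        # → C%2B%2B+%26+async%3F
print(query.replace(" ", "+"))  # naive: leaves +, & and ? unescaped
```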
## Instance Discovery

If `SEARXNG_URL` is not set and the user asks about SearXNG, help them either:

1. Find a public SearXNG instance — the community-maintained list is at [searx.space](https://searx.space/)
2. Set up their own instance with Docker
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
#!/bin/bash
# Usage: ./searxng.sh <query> [max_results] [engines]
# Example: ./searxng.sh "python async" 10 "google,bing"

QUERY="${1:-}"
MAX="${2:-5}"
ENGINES="${3:-google,bing}"

if [ -z "$SEARXNG_URL" ]; then
  echo "Error: SEARXNG_URL is not set" >&2
  exit 1
fi

if [ -z "$QUERY" ]; then
  echo "Usage: $0 <query> [max_results] [engines]" >&2
  exit 1
fi

# -G appends --data-urlencode pairs as query parameters, which handles
# spaces and special characters correctly (a bare sed 's/ /+/g' does not).
# SearXNG has no server-side result limit, so trim to MAX client-side.
curl -sG --max-time 10 \
  --data-urlencode "q=${QUERY}" \
  --data-urlencode "format=json" \
  --data-urlencode "engines=${ENGINES}" \
  "${SEARXNG_URL}/search" \
  | python3 -c "
import json, sys
data = json.load(sys.stdin)
print(json.dumps(data.get('results', [])[:int(sys.argv[1])], indent=2))
" "$MAX"

website/docs/reference/environment-variables.md

Lines changed: 1 addition & 0 deletions

@@ -120,6 +120,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
 | `FIRECRAWL_API_KEY` | Web scraping and cloud browser ([firecrawl.dev](https://firecrawl.dev/)) |
 | `FIRECRAWL_API_URL` | Custom Firecrawl API endpoint for self-hosted instances (optional) |
 | `TAVILY_API_KEY` | Tavily API key for AI-native web search, extract, and crawl ([app.tavily.com](https://app.tavily.com/home)) |
+| `SEARXNG_URL` | SearXNG instance URL for free self-hosted web search — no API key required ([searxng.github.io](https://searxng.github.io/searxng/)) |
 | `TAVILY_BASE_URL` | Override the Tavily API endpoint. Useful for corporate proxies and self-hosted Tavily-compatible search backends. Same pattern as `GROQ_BASE_URL`. |
 | `EXA_API_KEY` | Exa API key for AI-native web search and contents ([exa.ai](https://exa.ai/)) |
 | `BROWSERBASE_API_KEY` | Browser automation ([browserbase.com](https://browserbase.com/)) |

website/docs/reference/optional-skills-catalog.md

Lines changed: 1 addition & 0 deletions

@@ -143,6 +143,7 @@ hermes skills uninstall <skill-name>
 | [**domain-intel**](/docs/user-guide/skills/optional/research/research-domain-intel) | Passive domain reconnaissance using Python stdlib. Subdomain discovery, SSL certificate inspection, WHOIS lookups, DNS records, domain availability checks, and bulk multi-domain analysis. No API keys required. |
 | [**drug-discovery**](/docs/user-guide/skills/optional/research/research-drug-discovery) | Pharmaceutical research assistant for drug discovery workflows. Search bioactive compounds on ChEMBL, calculate drug-likeness (Lipinski Ro5, QED, TPSA, synthetic accessibility), look up drug-drug interactions via OpenFDA, interpret ADMET... |
 | [**duckduckgo-search**](/docs/user-guide/skills/optional/research/research-duckduckgo-search) | Free web search via DuckDuckGo — text, news, images, videos. No API key needed. Prefer the `ddgs` CLI when installed; use the Python DDGS library only after verifying that `ddgs` is available in the current runtime. |
+| [**searxng-search**](/docs/user-guide/skills/optional/research/research-searxng-search) | Free meta-search via SearXNG — aggregates results from 70+ search engines. Self-hosted or use a public instance. No API key needed. Falls back automatically when the web search toolset is unavailable. |
 | [**gitnexus-explorer**](/docs/user-guide/skills/optional/research/research-gitnexus-explorer) | Index a codebase with GitNexus and serve an interactive knowledge graph via web UI + Cloudflare tunnel. |
 | [**parallel-cli**](/docs/user-guide/skills/optional/research/research-parallel-cli) | Optional vendor skill for Parallel CLI — agent-native web search, extraction, deep research, enrichment, FindAll, and monitoring. Prefer JSON output and non-interactive flows. |
 | [**qmd**](/docs/user-guide/skills/optional/research/research-qmd) | Search personal knowledge bases, notes, docs, and meeting transcripts locally using qmd — a hybrid retrieval engine with BM25, vector search, and LLM reranking. Supports CLI and MCP integration. |

website/docs/user-guide/configuration.md

Lines changed: 11 additions & 4 deletions

@@ -1425,23 +1425,30 @@ Environment scrubbing (strips `*_API_KEY`, `*_TOKEN`, `*_SECRET`, `*_PASSWORD`,
 
 ## Web Search Backends
 
-The `web_search`, `web_extract`, and `web_crawl` tools support four backend providers. Configure the backend in `config.yaml` or via `hermes tools`:
+The `web_search`, `web_extract`, and `web_crawl` tools support five backend providers. Configure the backend in `config.yaml` or via `hermes tools`:
 
 ```yaml
 web:
-  backend: firecrawl # firecrawl | parallel | tavily | exa
+  backend: firecrawl # firecrawl | searxng | parallel | tavily | exa
+
+  # Or use per-capability keys to mix providers (e.g. free search + paid extract):
+  search_backend: "searxng"
+  extract_backend: "firecrawl"
 ```
 
 | Backend | Env Var | Search | Extract | Crawl |
 |---------|---------|--------|---------|-------|
 | **Firecrawl** (default) | `FIRECRAWL_API_KEY` | ✔ | ✔ | ✔ |
+| **SearXNG** | `SEARXNG_URL` | ✔ | — | — |
 | **Parallel** | `PARALLEL_API_KEY` | ✔ | ✔ | — |
 | **Tavily** | `TAVILY_API_KEY` | ✔ | ✔ | ✔ |
 | **Exa** | `EXA_API_KEY` | ✔ | ✔ | — |
 
-**Backend selection:** If `web.backend` is not set, the backend is auto-detected from available API keys. If only `EXA_API_KEY` is set, Exa is used. If only `TAVILY_API_KEY` is set, Tavily is used. If only `PARALLEL_API_KEY` is set, Parallel is used. Otherwise Firecrawl is the default.
+**Backend selection:** If `web.backend` is not set, the backend is auto-detected from available API keys. If only `SEARXNG_URL` is set, SearXNG is used. If only `EXA_API_KEY` is set, Exa is used. If only `TAVILY_API_KEY` is set, Tavily is used. If only `PARALLEL_API_KEY` is set, Parallel is used. Otherwise Firecrawl is the default.
+
+**SearXNG** is a free, self-hosted, privacy-respecting metasearch engine that queries 70+ search engines. No API key needed — just set `SEARXNG_URL` to your instance (e.g., `http://localhost:8080`). SearXNG is search-only; `web_extract` and `web_crawl` require a separate extract provider (set `web.extract_backend`).
 
-**Self-hosted Firecrawl:** Set `FIRECRAWL_API_URL` to point at your own instance. When a custom URL is set, the API key becomes optional (set `USE_DB_AUTHENTICATION=false` on the server to disable auth).
+**Self-hosted Firecrawl:** Set `FIRECRAWL_API_URL` to point at your own instance. When a custom URL is set, the API key becomes optional (set `USE_DB_AUTHENTICATION=false` on the server to disable auth).
 
 **Parallel search modes:** Set `PARALLEL_SEARCH_MODE` to control search behavior — `fast`, `one-shot`, or `agentic` (default: `agentic`).
