| 1 | +--- |
| 2 | +name: searxng-search |
| 3 | +description: Free meta-search via SearXNG — aggregates results from 70+ search engines. Self-hosted or use a public instance. No API key needed. Falls back automatically when the web search toolset is unavailable. |
| 4 | +version: 1.0.0 |
| 5 | +author: hermes-agent |
| 6 | +license: MIT |
| 7 | +metadata: |
| 8 | + hermes: |
| 9 | + tags: [search, searxng, meta-search, self-hosted, free, fallback] |
| 10 | + related_skills: [duckduckgo-search, domain-intel] |
| 11 | + fallback_for_toolsets: [web] |
| 12 | +--- |
| 13 | + |
| 14 | +# SearXNG Search |
| 15 | + |
| 16 | +Free meta-search using [SearXNG](https://searxng.org/) — a privacy-respecting, self-hosted search aggregator that queries 70+ search engines simultaneously. |
| 17 | + |
| 18 | +**No API key required** when using a public instance. Can also be self-hosted for full control. Automatically appears as a fallback when the main web search toolset (`FIRECRAWL_API_KEY`) is not configured. |
| 19 | + |
| 20 | +## Configuration |
| 21 | + |
| 22 | +This skill requires a `SEARXNG_URL` environment variable pointing to a SearXNG instance: |
| 23 | + |
| 24 | +```bash |
| 25 | +# Public instances (no setup required) |
| 26 | +SEARXNG_URL=https://searxng.example.com |
| 27 | + |
| 28 | +# Self-hosted SearXNG |
| 29 | +SEARXNG_URL=http://localhost:8888 |
| 30 | +``` |
| 31 | + |
| 32 | +If no instance is configured, this skill is unavailable and the agent falls back to other search options. |
| 33 | + |
| 34 | +## Detection Flow |
| 35 | + |
| 36 | +Check what is actually available before choosing an approach: |
| 37 | + |
| 38 | +```bash |
| 39 | +# Check if SEARXNG_URL is set and the instance is reachable |
| 40 | +curl -s --max-time 5 "${SEARXNG_URL}/search?q=test&format=json" | head -c 200 |
| 41 | +``` |
| 42 | + |
| 43 | +Decision tree: |
| 44 | +1. If `SEARXNG_URL` is set and the instance responds, use SearXNG |
| 45 | +2. If `SEARXNG_URL` is unset or unreachable, fall back to other available search tools |
| 46 | +3. If the user wants SearXNG specifically, help them set up an instance or find a public one |
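
The decision tree above can be sketched in Python. This is a minimal sketch under stated assumptions, not the skill's actual implementation: `http_probe` and `choose_search_backend` are hypothetical names introduced here, and the probe is injected so the branching logic can be exercised without network access.

```python
import os
import urllib.request
from typing import Callable, Mapping, Optional


def http_probe(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the instance answers a JSON search request with HTTP 200."""
    url = f"{base_url.rstrip('/')}/search?q=test&format=json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def choose_search_backend(
    env: Mapping[str, str] = os.environ,
    probe: Callable[[str], bool] = http_probe,
) -> str:
    """Return "searxng" when SEARXNG_URL is set and reachable, else "fallback"."""
    base_url: Optional[str] = env.get("SEARXNG_URL")
    if not base_url:
        return "fallback"  # unset or empty: use other search tools
    return "searxng" if probe(base_url) else "fallback"
```

Calling `choose_search_backend()` with no arguments reads the real environment and performs one HTTP probe with a 5-second timeout, mirroring the `curl` check.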
| 47 | + |
| 48 | +## Method 1: CLI via curl (Preferred) |
| 49 | + |
| 50 | +Use `curl` via `terminal` to call the SearXNG JSON API. This avoids assuming any particular Python package is installed. |
| 51 | + |
| 52 | +```bash |
| 53 | +# Text search (JSON output) |
| 54 | +curl -s --max-time 10 \ |
| 55 | + "${SEARXNG_URL}/search?q=python+async+programming&format=json&engines=google,bing" |
| 56 | + |
| 57 | +# With Safesearch off |
| 58 | +curl -s --max-time 10 \ |
| 59 | + "${SEARXNG_URL}/search?q=example&format=json&safesearch=0" |
| 60 | + |
| 61 | +# Specific categories (general, news, science, etc.) |
| 62 | +curl -s --max-time 10 \ |
| 63 | + "${SEARXNG_URL}/search?q=AI+news&format=json&categories=news" |
| 64 | +``` |
| 65 | + |
| 66 | +### Common Query Parameters |
| 67 | + |
| 68 | +| Parameter | Description | Example | |
| 69 | +|-----------|-------------|---------| |
| 70 | +| `q` | Query string (URL-encoded) | `q=python+async` | |
| 71 | +| `format` | Output format: `json`, `csv`, `rss` (each must be enabled on the instance) | `format=json` | |
| 72 | +| `engines` | Comma-separated engine names | `engines=google,bing,duckduckgo` | |
| 73 | +| `pageno` | Result page number (default 1). The search API has no result-count parameter, so truncate results client-side | `pageno=2` | |
| 74 | +| `categories` | Filter by category | `categories=news,science` | |
| 75 | +| `safesearch` | 0=none, 1=moderate, 2=strict | `safesearch=0` | |
| 76 | +| `time_range` | Filter: `day`, `week`, `month`, `year` | `time_range=week` | |
| 77 | + |
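
The query parameters in the table above map directly onto a URL query string. The following is a minimal sketch of assembling one with the standard library; `build_search_url` is a hypothetical helper introduced for illustration, and it defaults to `format=json` as this skill recommends.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, **params: str) -> str:
    """Build a SearXNG /search URL with a URL-encoded query string."""
    fields = {"q": query, "format": "json"}
    # Drop empty values so optional parameters can be passed as None or "".
    fields.update({k: v for k, v in params.items() if v})
    return f"{base_url.rstrip('/')}/search?{urlencode(fields)}"


print(build_search_url("http://localhost:8888", "python async",
                       categories="news", time_range="week"))
# http://localhost:8888/search?q=python+async&format=json&categories=news&time_range=week
```

`urlencode` handles spaces and reserved characters, so callers can pass the raw query text.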
| 78 | +### Parsing JSON Results |
| 79 | + |
| 80 | +```bash |
| 81 | +# Extract titles and URLs from JSON |
| 82 | +curl -s --max-time 10 "${SEARXNG_URL}/search?q=fastapi&format=json" \ |
| 83 | + | python3 -c " |
| 84 | +import json, sys |
| 85 | +data = json.load(sys.stdin) |
| 86 | +for r in data.get('results', [])[:5]:  # keep only the first 5 results |
| 87 | + print(r.get('title','')) |
| 88 | + print(r.get('url','')) |
| 89 | + print(r.get('content','')[:200]) |
| 90 | + print() |
| 91 | +" |
| 92 | +``` |
| 93 | + |
| 94 | +Each result typically includes `title`, `url`, `content` (snippet), `engine`, and `parsed_url`. Fields such as `img_src`, `thumbnail`, `author`, and `publishedDate` appear only when the source engine provides them. |
| 95 | + |
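
Because result fields vary by engine, it can help to normalize a response before using it. The sketch below is illustrative only: `summarize_results` is a hypothetical helper, the sample payload is made up, and the only assumption about the response is a top-level `results` list of dictionaries.

```python
from typing import Any, Dict, List


def summarize_results(
    payload: Dict[str, Any], max_results: int = 5, snippet_len: int = 200
) -> List[Dict[str, str]]:
    """Return up to max_results unique results with title, url and a trimmed snippet."""
    seen = set()
    summary: List[Dict[str, str]] = []
    for item in payload.get("results", []):
        url = item.get("url", "")
        if not url or url in seen:
            continue  # skip entries without a URL and repeats of the same URL
        seen.add(url)
        summary.append({
            "title": item.get("title", ""),
            "url": url,
            # content may be missing or null, so fall back to an empty string
            "snippet": (item.get("content") or "")[:snippet_len],
        })
        if len(summary) >= max_results:
            break
    return summary


# Made-up payload for illustration; real responses carry more fields.
sample = {
    "results": [
        {"title": "Guide", "url": "https://example.com/guide", "content": "A short snippet"},
        {"title": "Guide (duplicate)", "url": "https://example.com/guide", "content": None},
        {"title": "No URL", "content": "dropped"},
    ]
}
for row in summarize_results(sample):
    print(row["title"], row["url"])
```

Truncating in the client keeps the output size predictable regardless of how many engines answered.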
| 96 | +## Method 2: Python API via `requests` |
| 97 | + |
| 98 | +Use the SearXNG REST API directly from Python with the `requests` library: |
| 99 | + |
| 100 | +```python |
| 101 | +import os, requests |
| 102 | + |
| 103 | +base_url = os.environ.get("SEARXNG_URL", "").rstrip("/") |
| 104 | +if not base_url: |
| 105 | +    raise RuntimeError("SEARXNG_URL is not set") |
| 106 | + |
| 107 | +query = "fastapi deployment guide" |
| 108 | +params = { |
| 109 | +    "q": query,  # requests URL-encodes parameter values |
| 110 | +    "format": "json", |
| 111 | +    "pageno": 1, |
| 112 | +    "engines": "google,bing", |
| 113 | +} |
| 114 | + |
| 115 | +resp = requests.get(f"{base_url}/search", params=params, timeout=10) |
| 116 | +resp.raise_for_status() |
| 117 | +data = resp.json() |
| 118 | + |
| 119 | +for r in data.get("results", [])[:5]:  # truncate client-side |
| 120 | +    print(r.get("title", "")) |
| 121 | +    print(r.get("url", "")) |
| 122 | +    print((r.get("content") or "")[:200]) |
| 123 | +    print() |
| 124 | +``` |
| 125 | + |
| 126 | +## Method 3: searxng-data Python Package |
| 127 | + |
| 128 | +For more structured access, install the `searxng-data` package: |
| 129 | + |
| 130 | +```bash |
| 131 | +pip install searxng-data |
| 132 | +``` |
| 133 | + |
| 134 | +```python |
| 135 | +from searxng_data import engines |
| 136 | + |
| 137 | +# List available engines |
| 138 | +print(engines.list_engines()) |
| 139 | +``` |
| 140 | + |
| 141 | +Note: This package only provides engine metadata, not the search API itself. |
| 142 | + |
| 143 | +## Self-Hosting SearXNG |
| 144 | + |
| 145 | +To run your own SearXNG instance: |
| 146 | + |
| 147 | +```bash |
| 148 | +# Using Docker |
| 149 | +docker run -d -p 8888:8080 \ |
| 150 | + -v "$(pwd)/searxng:/etc/searxng" \ |
| 151 | + searxng/searxng:latest |
| 152 | + |
| 153 | +# Then set |
| 154 | +SEARXNG_URL=http://localhost:8888 |
| 155 | +``` |
| 156 | + |
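| 157 | +Or install from a source checkout (see the SearXNG installation docs for the full steps): |
| 158 | +```bash |
| 159 | +git clone https://github.com/searxng/searxng.git && cd searxng && pip install -e . |
| 160 | +# Edit /etc/searxng/settings.yml |
| 161 | +searxng-run |
| 162 | +``` |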
| 163 | + |
| 164 | +Public SearXNG instances are listed at https://searx.space/. The examples in this document use a placeholder: |
| 165 | +- `https://searxng.example.com` (replace with any public instance) |
| 166 | + |
| 167 | +## Workflow: Search then Extract |
| 168 | + |
| 169 | +SearXNG returns titles, URLs, and snippets — not full page content. To get full page content, search first and then extract the most relevant URL with `web_extract`, browser tools, or `curl`. |
| 170 | + |
| 171 | +```bash |
| 172 | +# Search for relevant pages |
| 173 | +curl -s --max-time 10 "${SEARXNG_URL}/search?q=fastapi+deployment&format=json" |
| 174 | +# Output: list of results with titles and URLs |
| 175 | + |
| 176 | +# Then extract the best URL with web_extract |
| 177 | +``` |
| 178 | + |
| 179 | +## Limitations |
| 180 | + |
| 181 | +- **Instance availability**: If the SearXNG instance is down or unreachable, search fails. Check that `SEARXNG_URL` is set and the instance responds before relying on it. |
| 182 | +- **No content extraction**: SearXNG returns snippets, not full page content. Use `web_extract`, browser tools, or `curl` for full articles. |
| 183 | +- **Rate limiting**: Some public instances limit requests. Self-hosting avoids this. |
| 184 | +- **Engine coverage**: Available engines depend on the SearXNG instance configuration. Some engines may be disabled. |
| 185 | +- **Results freshness**: Meta-search aggregates external engines — result freshness depends on those engines. |
| 186 | + |
| 187 | +## Troubleshooting |
| 188 | + |
| 189 | +| Problem | Likely Cause | What To Do | |
| 190 | +|---------|--------------|------------| |
| 191 | +| `SEARXNG_URL` not set | No instance configured | Use a public SearXNG instance or set up your own | |
| 192 | +| Connection refused | Instance not running or wrong URL | Check the URL is correct and the instance is running | |
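| 193 | +| Empty results | Engines failed or the instance blocked the query | Check `unresponsive_engines` in the JSON response, then try other engines, a different instance, or self-hosting | |
| 194 | +| Slow responses | Public instance under load | Self-host or use a less-loaded public instance | |
| 195 | +| HTTP 403 with `format=json` | `json` is not enabled under `search.formats` in the instance's `settings.yml` | Enable it when self-hosting, or try `format=rss` or a different instance | |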
| 196 | + |
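
The troubleshooting table can be turned into a small lookup for an agent to report. This is a hedged sketch: `diagnose` is a hypothetical helper, and the status mappings (403 for a disabled output format, 429 for rate limiting) are assumptions about typical instance behavior that should be checked against the instance in use.

```python
from typing import Optional


def diagnose(status: Optional[int], result_count: int = 0) -> str:
    """Map an HTTP status (None means no response at all) to a troubleshooting hint."""
    if status is None:
        return "No response: check SEARXNG_URL and that the instance is running."
    if status == 403:
        return "Forbidden: the instance may not allow format=json; try format=rss or another instance."
    if status == 429:
        return "Rate limited: slow down, switch instances, or self-host."
    if status >= 500:
        return "Server error: the instance may be overloaded; retry later or self-host."
    if status == 200 and result_count == 0:
        return "Empty results: try other engines, another instance, or self-host."
    return "OK" if status == 200 else f"Unexpected HTTP status {status}."


print(diagnose(403))
```

Keeping the mapping in one pure function makes it easy to adjust once the behavior of a particular instance is known.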
| 197 | +## Pitfalls |
| 198 | + |
| 199 | +- **Always set `SEARXNG_URL`**: Without it, the skill cannot function. |
| 200 | +- **URL-encode queries**: Spaces and special characters must be URL-encoded. With curl, use `-G --data-urlencode`; in Python, pass `params=` to `requests` or use `urllib.parse.quote()`. |
| 201 | +- **Use `format=json`**: The default output is HTML, which is not meant for machine parsing. Always request JSON explicitly. |
| 202 | +- **Set a timeout**: Always use `--max-time` or `timeout=` to avoid hanging on unreachable instances. |
| 203 | +- **Prefer self-hosting**: Public instances may go down, rate-limit, or block requests. A self-hosted instance gives you control over availability, rate limits, and enabled output formats. |
| 204 | + |
| 205 | +## Instance Discovery |
| 206 | + |
| 207 | +If `SEARXNG_URL` is not set and the user asks about SearXNG, help them either: |
| 208 | +1. Find a public SearXNG instance (search for "public searxng instance") |
| 209 | +2. Set up their own with Docker or pip |
| 210 | + |
| 211 | +Public instances are listed at: https://searx.space/ |