Commit 8848c2e

docs: clarify sync sources and wiretap
1 parent e2db10e commit 8848c2e

1 file changed

Lines changed: 50 additions & 14 deletions

README.md

````diff
@@ -1,8 +1,17 @@
 # discrawl 🛰️ — Mirror Discord into SQLite; search server history locally
 
-`discrawl` mirrors Discord guild data into local SQLite so you can search, inspect, and query server history without depending on Discord search. It can also import classifiable Discord Desktop cache messages for DM recovery/search without using a user token. Teams can publish that archive as a private Git snapshot repo, so readers get fresh org memory without Discord bot credentials.
+`discrawl` mirrors Discord guild data into local SQLite so you can search, inspect, and query server history without depending on Discord search. It can also import classifiable Discord Desktop cache messages for DM recovery/search without using a user token.
 
-Live guild sync uses real bot tokens. Desktop wiretap mode reads local cache artifacts only; it does not extract credentials or run a selfbot. Data stays local unless you explicitly publish a Git-backed snapshot.
+Teams can publish the archive as a private Git snapshot repo, so readers get fresh org memory without Discord bot credentials.
+
+There are two local archive sources:
+
+- Discord bot API sync for guilds, channels, members, threads, and message history the configured bot can access
+- Discord Desktop cache import for local, classifiable cached messages, including proven DMs under `@me`
+
+Desktop wiretap mode reads local cache artifacts only. It does not extract credentials, use user tokens, call the Discord API as your user, or run a selfbot.
+
+Data stays local unless you explicitly publish a Git-backed snapshot.
 
 ## What It Does
 
````
````diff
@@ -104,7 +113,7 @@ Examples below assume `discrawl` is on `PATH`. If you built from source without
 
 ## Quick Start
 
-Reuse an existing OpenClaw Discord bot config:
+Reuse an existing OpenClaw Discord bot config and refresh both bot-visible guild data and local desktop cache data:
 
 ```bash
 discrawl init --from-openclaw ~/.openclaw/openclaw.json
````
````diff
@@ -113,9 +122,10 @@ discrawl sync --full
 discrawl sync
 discrawl search "panic: nil pointer"
 discrawl tail
-discrawl wiretap
 ```
 
+Use `discrawl sync --source wiretap` when you only want the local Discord Desktop cache import and do not want bot-token API sync.
+
 Multi-account OpenClaw setup:
 
 ```bash
````
````diff
@@ -169,31 +179,48 @@ When OpenClaw config tokens use `${ENV_VAR}` placeholders, `init` and `doctor` r
 
 ### `sync`
 
-Refreshes guild state into SQLite. Run one explicit `--full` pass when you want a complete historical archive; use plain `sync` afterward for frequent latest-message refreshes.
+Refreshes SQLite from one or both archive sources.
+
+By default, `sync` runs both sources:
+
+- Discord bot-token sync for bot-visible guild data
+- local Discord Desktop cache import for classifiable cached messages and proven DMs
+
+Run one explicit `--full` pass when you want a complete historical guild archive. Use plain `sync` afterward for frequent latest-message and desktop-cache refreshes.
 
 ```bash
 discrawl sync
 discrawl sync --full
 discrawl sync --full --all
 discrawl sync --guild 123456789012345678
 discrawl sync --guilds 123,456 --concurrency 8
-discrawl sync --source both
-discrawl sync --source discord
-discrawl sync --source wiretap
+discrawl sync --source both # default: bot API + desktop cache
+discrawl sync --source discord # bot API only; aliases: key, bot, api
+discrawl sync --source wiretap # desktop cache only; aliases: desktop, cache
 discrawl sync --guild 123456789012345678 --all-channels
 discrawl sync --channels 111,222 --since 2026-03-01T00:00:00Z
 ```
 
-Sync modes:
+Sync sources:
+
+| Source | Reads from | Stores |
+| --- | --- | --- |
+| `both` | Discord bot API and local Discord Desktop cache | bot-visible guild data plus classifiable cached desktop messages |
+| `discord` / `key` | Discord bot API | guilds, channels, threads, members, and messages the bot can access |
+| `wiretap` | local Discord Desktop cache files | classifiable cached messages; proven DMs are stored under `@me` |
+
+Sync modes control the Discord bot API side of a run. When `wiretap` is selected, the desktop cache import runs once alongside the chosen bot sync mode.
+
+Bot sync modes:
 
 | Command | Use when | Behavior |
 | --- | --- | --- |
 | `discrawl sync` | routine refresh | imports any stale Git snapshot first, skips member refreshes, checks live top-level channels plus active threads, and only fetches new messages for channels with a stored latest cursor |
 | `discrawl sync --all-channels` | repair pass | broad incremental sweep across every stored channel/thread, including archived threads |
 | `discrawl sync --full` | historical backfill | crawls older history until channels are complete; can take a long time on large servers |
 
-`sync` already uses parallel channel workers. `--concurrency` overrides the default, and the default is auto-sized from `GOMAXPROCS` with a floor of `8` and a cap of `32`.
-`--source` selects what gets refreshed: `both` (default), `discord`/`key` for bot-token API sync only, or `wiretap` for local Discord Desktop cache import only.
+`sync` already uses parallel channel workers for bot API message crawling.
+`--concurrency` overrides the default, and the default is auto-sized from `GOMAXPROCS` with a floor of `8` and a cap of `32`.
 `--all` ignores `default_guild_id` and fans out across every discovered guild the bot can access.
 `--skip-members` refreshes guild/channel/message data without crawling the full member list, which is useful for frequent Git snapshot publishers that only need latest messages.
 `--latest-only` is still accepted for explicit latest-only runs; it is now the default for untargeted `sync`. Use `--all-channels` to opt out of the fast default without doing a full historical crawl.
````
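The `--source` table and the `--concurrency` note in the `sync` hunk describe how a run chooses its importers. The Go sketch below shows how that planning could work. It is not discrawl's code. `Plan`, `resolveSource`, `defaultConcurrency`, and `planSync` are hypothetical names. The alias spellings come from the commit's inline comments. The sketch assumes the default worker count is `GOMAXPROCS` itself, clamped to the documented floor of 8 and cap of 32; the README only says the default is "auto-sized from `GOMAXPROCS`".

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// Plan describes which importers a hypothetical sync run would execute.
type Plan struct {
	BotAPI       bool // crawl bot-visible guild data with the bot token
	DesktopCache bool // import classifiable Discord Desktop cache messages once
	Workers      int  // parallel channel workers; only used for bot API crawling
}

// resolveSource maps a --source value (including the documented aliases) to
// the importers it enables. An empty value stands for "flag not given".
func resolveSource(source string) (bot, desktop bool, err error) {
	switch strings.ToLower(strings.TrimSpace(source)) {
	case "", "both":
		return true, true, nil
	case "discord", "key", "bot", "api":
		return true, false, nil
	case "wiretap", "desktop", "cache":
		return false, true, nil
	default:
		return false, false, fmt.Errorf("unknown --source %q", source)
	}
}

// defaultConcurrency clamps a GOMAXPROCS-derived value to [8, 32].
// Assumption: the value is GOMAXPROCS itself before clamping.
func defaultConcurrency(procs int) int {
	if procs < 8 {
		return 8
	}
	if procs > 32 {
		return 32
	}
	return procs
}

// planSync combines source selection with an optional --concurrency
// override, where 0 means "use the default".
func planSync(source string, concurrency int) (Plan, error) {
	bot, desktop, err := resolveSource(source)
	if err != nil {
		return Plan{}, err
	}
	p := Plan{BotAPI: bot, DesktopCache: desktop}
	if bot {
		p.Workers = concurrency
		if p.Workers <= 0 {
			p.Workers = defaultConcurrency(runtime.GOMAXPROCS(0))
		}
	}
	return p, nil
}

func main() {
	for _, src := range []string{"both", "key", "cache", "selfbot"} {
		p, err := planSync(src, 0)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-7s bot=%v desktop=%v workers=%d\n", src, p.BotAPI, p.DesktopCache, p.Workers)
	}
}
```

The sketch leaves `Workers` at 0 for a wiretap-only plan. This follows the README, which ties parallel workers to bot API message crawling. The desktop import is modeled as a single pass.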
````diff
@@ -218,7 +245,11 @@ discrawl tail --repair-every 30m
 
 ### `wiretap`
 
-Imports classifiable Discord Desktop message payloads into the same local SQLite archive. This is the path for searchable DMs because bot tokens cannot read personal direct messages.
+Imports classifiable Discord Desktop message payloads into the same local SQLite archive.
+
+This is the path for searchable DMs because bot tokens cannot read personal direct messages.
+
+`wiretap` is also available through `discrawl sync --source wiretap` and is included in the default `discrawl sync --source both` path.
 
 ```bash
 discrawl wiretap
````
````diff
@@ -229,9 +260,10 @@ discrawl wiretap --watch-every 2m
 
 Notes:
 
-- stores only classifiable cache messages in the normal `guilds` / `channels` / `messages` tables
+- stores classifiable cache messages in the same `guilds`, `channels`, and `messages` tables used by bot sync
 - stores proven DMs under the synthetic guild id `@me`
-- drops message payloads whose channel cannot be classified from cached channel metadata or Discord route URLs
+- drops message payloads whose channel cannot be classified from cached channel metadata or Discord route URLs; dropped rows are counted as `skipped_messages`
+- imports what Discord Desktop has cached locally, not complete live DM history
 - scans local `.ldb`, `.log`, `.json`, and `.txt` artifacts for Discord message JSON
 - does not extract, store, or print Discord auth tokens
 - `--max-file-bytes` skips unusually large files; default is 64 MiB
````
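The wiretap notes name two filters that run before any message parsing. The scan covers only `.ldb`, `.log`, `.json`, and `.txt` artifacts, and `--max-file-bytes` skips large files, with a default of 64 MiB. A minimal Go sketch of that pre-filter follows. `shouldScan`, `scanExtensions`, and the sample paths are hypothetical. Two behaviors are assumptions, since the README only says large files are skipped: extension matching is case-insensitive, and a file exactly at the limit is still scanned.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// defaultMaxFileBytes mirrors the documented --max-file-bytes default (64 MiB).
const defaultMaxFileBytes int64 = 64 << 20

// scanExtensions lists the cache artifact types the README says are scanned.
var scanExtensions = map[string]bool{
	".ldb":  true,
	".log":  true,
	".json": true,
	".txt":  true,
}

// shouldScan reports whether a file is a candidate for message-JSON scanning.
// A maxBytes of 0 or less falls back to the 64 MiB default.
func shouldScan(path string, size, maxBytes int64) bool {
	if maxBytes <= 0 {
		maxBytes = defaultMaxFileBytes
	}
	if !scanExtensions[strings.ToLower(filepath.Ext(path))] {
		return false // not one of the documented artifact types
	}
	return size <= maxBytes // oversized files are skipped
}

func main() {
	// Illustrative paths and sizes only.
	files := []struct {
		path string
		size int64
	}{
		{"Local Storage/leveldb/000123.ldb", 2 << 20},
		{"Cache/Cache_Data/data_1", 1 << 20},
		{"Local Storage/leveldb/000124.log", 100 << 20},
	}
	for _, f := range files {
		fmt.Printf("%s scan=%v\n", f.path, shouldScan(f.path, f.size, 0))
	}
}
```

Checking the extension and size before reading a file keeps the scan cheap. The notes state that the scan only looks for Discord message JSON inside these artifacts.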
````diff
@@ -565,6 +597,10 @@ With remote providers, message text is sent during `discrawl embed`, and search
 - FTS index rows
 - optional local embedding queue metadata and vectors
 
+Messages imported from Discord Desktop use the same message, attachment, mention, and FTS paths as bot-synced messages.
+
+Proven DMs use `@me` as their guild id. Unclassifiable desktop-cache payloads are skipped instead of being stored as unknown synthetic data.
+
 SQLite schema migrations are versioned with `PRAGMA user_version`. Startup now fails fast when a local DB schema is newer than the supported binary.
 
 Attachment binaries are not stored in SQLite.
````
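The storage notes fix three rules for desktop-cache payloads: proven DMs get the guild id `@me`, guild messages keep their real guild id, and unclassifiable payloads are skipped. The wiretap notes add that skipped payloads are counted as `skipped_messages`. A hypothetical Go sketch of that routing step follows. `CachedMessage`, `ChannelInfo`, `classify`, and `guildFromRoute` are illustrative names, not discrawl's types. The sketch assumes a channel is classified when cached metadata marks it as a DM or gives its guild id. Failing that, it uses a Discord client route of the form `/channels/<guild>/<channel>` or `/channels/@me/<channel>` that matches the channel.

```go
package main

import (
	"fmt"
	"strings"
)

// DMGuildID is the synthetic guild id the README uses for proven DMs.
const DMGuildID = "@me"

// ChannelInfo is hypothetical cached channel metadata.
type ChannelInfo struct {
	GuildID string // empty when unknown
	IsDM    bool
}

// CachedMessage is a hypothetical message payload found in a cache artifact.
type CachedMessage struct {
	ID        string
	ChannelID string
	RouteURL  string // a Discord client route associated with the payload, if any
}

// guildFromRoute returns the guild segment of a /channels/<guild>/<channel>
// route when its channel segment matches channelID, or "" otherwise.
func guildFromRoute(route, channelID string) string {
	i := strings.Index(route, "/channels/")
	if i < 0 {
		return ""
	}
	parts := strings.Split(route[i+len("/channels/"):], "/")
	if len(parts) < 2 || parts[1] != channelID {
		return ""
	}
	return parts[0] // a guild snowflake, or "@me" for DM routes
}

// classify returns the guild id to store a message under, or ok=false when
// the channel cannot be classified and the payload should be skipped.
func classify(m CachedMessage, channels map[string]ChannelInfo) (guildID string, ok bool) {
	if info, found := channels[m.ChannelID]; found {
		if info.IsDM {
			return DMGuildID, true
		}
		if info.GuildID != "" {
			return info.GuildID, true
		}
	}
	if g := guildFromRoute(m.RouteURL, m.ChannelID); g != "" {
		return g, true
	}
	return "", false
}

func main() {
	channels := map[string]ChannelInfo{
		"111": {GuildID: "900"},
		"222": {IsDM: true},
	}
	msgs := []CachedMessage{
		{ID: "1", ChannelID: "111"},
		{ID: "2", ChannelID: "222"},
		{ID: "3", ChannelID: "333", RouteURL: "https://discord.com/channels/@me/333"},
		{ID: "4", ChannelID: "444"}, // no metadata, no route: skipped
	}
	stored := map[string][]string{} // guild id -> message ids
	skipped := 0
	for _, m := range msgs {
		g, ok := classify(m, channels)
		if !ok {
			skipped++ // corresponds to the README's skipped_messages count
			continue
		}
		stored[g] = append(stored[g], m.ID)
	}
	fmt.Println("stored:", stored, "skipped_messages:", skipped)
}
```

In this sketch, dropping a payload is the default outcome. A message is stored only when it resolves to a concrete guild id or `@me`, which matches the README's rule of skipping unclassifiable payloads instead of storing them as unknown synthetic data.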
