Add ArXiv, HuggingFace, Dev.to, Lobsters data sources (#367)
Conversation
Pull request overview
Adds new daily digest data sources (ArXiv, Hugging Face, Dev.to, Lobste.rs) and wires them into the existing fetch → prompt → LLM → save/issue pipeline.
Changes:
- Add fetchers for ArXiv papers, Hugging Face trending models, Dev.to articles, and Lobste.rs stories.
- Add new prompt builders and report saver functions for ArXiv/HF/Community digests (zh/en).
- Register new report IDs/labels across i18n, GitHub label colors, manifest generation, and the static index page.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| src/report-savers.ts | Adds saver functions for ArXiv/HF/Community reports and connects them to prompts/issues/files. |
| src/prompts-data.ts | Adds prompt builders for ArXiv/HF/Community digests. |
| src/arxiv.ts | New ArXiv fetch + lightweight Atom parsing. |
| src/hf.ts | New Hugging Face models fetcher. |
| src/devto.ts | New Dev.to articles fetcher. |
| src/lobsters.ts | New Lobste.rs stories fetcher (JSON endpoints). |
| src/index.ts | Wires new fetchers and report generation into main flow + highlights inputs. |
| src/i18n.ts | Adds report metadata, labels, and notify labels for new report types. |
| src/github.ts | Adds GitHub label colors for new labels. |
| src/generate-manifest.ts | Includes new report IDs in manifest generation list. |
| index.html | Adds new report IDs to the front-end label map. |
```diff
@@ -0,0 +1,104 @@
+/**
+ * Lobste.rs AI stories fetched via RSS feed.
```
File header says Lobste.rs stories are fetched via an RSS feed, but the implementation actually calls Lobste.rs tag-based JSON endpoints (e.g., /t/ai.json). Please update the comment to match the JSON API approach (or switch implementation to RSS if that’s the intent).
```diff
- * Lobste.rs AI stories fetched via RSS feed.
+ * Lobste.rs AI stories fetched via tag-based JSON endpoints (e.g., /t/ai.json).
```
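For context, the tag-based JSON approach the comment refers to can be sketched as below. The helper name `lobstersTagUrl` and the base-URL constant are illustrative, not names from the PR; only the `/t/<tag>.json` endpoint shape follows Lobste.rs' public JSON listings.

```typescript
// Illustrative sketch of the tag-based JSON endpoint construction.
// Lobste.rs exposes a JSON listing per tag at /t/<tag>.json — no RSS feed
// is involved in this approach, hence the suggested doc-comment fix.
const LOBSTERS_BASE = "https://lobste.rs";

function lobstersTagUrl(tag: string): string {
  // encodeURIComponent guards against unexpected characters in tag names.
  return `${LOBSTERS_BASE}/t/${encodeURIComponent(tag)}.json`;
}
```

A fetcher would then request each tag URL (e.g. for "ai") and merge the resulting story lists.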
```typescript
const oneDayAgo = Date.now() - 7 * 24 * 60 * 60 * 1000;
const stories = [...seen.values()]
  .filter((s) => new Date(s.publishedAt).getTime() > oneDayAgo)
```
The variable name oneDayAgo is misleading: it is set to Date.now() - 7 * 24 * 60 * 60 * 1000 (7 days), and the comment above also says "last 7 days". Rename the variable (e.g., sevenDaysAgo) so the filter logic is self-explanatory.
```diff
-  const oneDayAgo = Date.now() - 7 * 24 * 60 * 60 * 1000;
+  const sevenDaysAgo = Date.now() - 7 * 24 * 60 * 60 * 1000;
   const stories = [...seen.values()]
-    .filter((s) => new Date(s.publishedAt).getTime() > oneDayAgo)
+    .filter((s) => new Date(s.publishedAt).getTime() > sevenDaysAgo)
```
```typescript
 * Strategy: query cs.AI + cs.CL + cs.LG categories for papers
 * submitted in the last 24 hours, sorted by submission date.
```
The module docstring claims papers are “submitted in the last 24 hours”, but fetchArxivData() doesn’t apply any time-based filter (it fetches the newest max_results per category and dedupes). Either implement a published/updated cutoff (e.g., now - 24h) or adjust the comment so it doesn’t overpromise.
```diff
- * Strategy: query cs.AI + cs.CL + cs.LG categories for papers
- * submitted in the last 24 hours, sorted by submission date.
+ * Strategy: query cs.AI + cs.CL + cs.LG categories for the newest papers,
+ * sorted by submission date (up to ARXIV_MAX_RESULTS per category).
```
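If the implementation route is preferred instead (making the code match the "last 24 hours" claim), the cutoff could be applied to the already-fetched entries. The `ArxivEntry` shape and `filterLast24h` name below are assumptions for illustration; only the `published` timestamp field follows the arXiv Atom feed.

```typescript
// Hypothetical post-fetch filter: keep only entries published within the
// last 24 hours. The interface is a minimal stand-in for the PR's parsed
// Atom entries, not the actual type from src/arxiv.ts.
interface ArxivEntry {
  id: string;
  published: string; // ISO 8601 timestamp from the Atom <published> element
}

function filterLast24h(entries: ArxivEntry[], now: number = Date.now()): ArxivEntry[] {
  const cutoff = now - 24 * 60 * 60 * 1000;
  return entries.filter((e) => new Date(e.published).getTime() >= cutoff);
}
```

Passing `now` explicitly keeps the function deterministic and easy to test.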
```typescript
// ---------------------------------------------------------------------------

const ARXIV_MAX_RESULTS = 50;
const API_URL = "http://export.arxiv.org/api/query";
```
API_URL uses plain HTTP. If ArXiv’s export API supports HTTPS in your runtime, prefer an https:// endpoint to avoid MITM / proxy tampering; otherwise, add a short note explaining why HTTP is required here.
```diff
-const API_URL = "http://export.arxiv.org/api/query";
+const API_URL = "https://export.arxiv.org/api/query";
```
```typescript
 * Strategy: fetch recently updated models sorted by downloads/likes,
 * filter for notable AI models.
```
The top-level comment says the strategy is to “filter for notable AI models”, but fetchHfData() currently returns the raw API results without any filtering (only mapping fields). Either add an actual filter step (by tags/pipeline, etc.) or update the comment so it matches the implementation.
```diff
- * Strategy: fetch recently updated models sorted by downloads/likes,
- * filter for notable AI models.
+ * Strategy: fetch recently updated models sorted by likes and return
+ * a mapped subset of fields from the HF API response (no extra filtering).
```
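Alternatively, if the comment's original claim is the intent, an actual filter step could look like the sketch below. The `HfModel` shape mirrors fields exposed by the public Hugging Face models API (`likes`, `pipeline_tag`), but the pipeline allow-list and the `minLikes` threshold are illustrative assumptions, not values from the PR.

```typescript
// Hypothetical "notable models" filter: keep models above a likes threshold
// whose pipeline_tag is in an allow-list. Both criteria are assumptions
// chosen for illustration, not requirements from the PR.
interface HfModel {
  modelId: string;
  likes: number;
  pipeline_tag?: string;
}

const NOTABLE_PIPELINES = new Set(["text-generation", "text-to-image"]);

function filterNotable(models: HfModel[], minLikes = 50): HfModel[] {
  return models.filter(
    (m) =>
      m.likes >= minLikes &&
      m.pipeline_tag !== undefined &&
      NOTABLE_PIPELINES.has(m.pipeline_tag)
  );
}
```

Either route resolves the mismatch; the important thing is that the doc comment and fetchHfData() agree.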
No description provided.