Add ArXiv, HuggingFace, Dev.to, Lobsters data sources #367

Merged
duanyytop merged 3 commits into master from feat/new-data-sources
Apr 2, 2026

Conversation

@duanyytop
Owner

No description provided.

Copilot AI left a comment

Pull request overview

Adds new daily digest data sources (ArXiv, Hugging Face, Dev.to, Lobste.rs) and wires them into the existing fetch → prompt → LLM → save/issue pipeline.

Changes:

  • Add fetchers for ArXiv papers, Hugging Face trending models, Dev.to articles, and Lobste.rs stories.
  • Add new prompt builders and report saver functions for ArXiv/HF/Community digests (zh/en).
  • Register new report IDs/labels across i18n, GitHub label colors, manifest generation, and the static index page.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/report-savers.ts | Adds saver functions for ArXiv/HF/Community reports and connects them to prompts/issues/files. |
| src/prompts-data.ts | Adds prompt builders for ArXiv/HF/Community digests. |
| src/arxiv.ts | New ArXiv fetch + lightweight Atom parsing. |
| src/hf.ts | New Hugging Face models fetcher. |
| src/devto.ts | New Dev.to articles fetcher. |
| src/lobsters.ts | New Lobste.rs stories fetcher (JSON endpoints). |
| src/index.ts | Wires new fetchers and report generation into main flow + highlights inputs. |
| src/i18n.ts | Adds report metadata, labels, and notify labels for new report types. |
| src/github.ts | Adds GitHub label colors for new labels. |
| src/generate-manifest.ts | Includes new report IDs in manifest generation list. |
| index.html | Adds new report IDs to the front-end label map. |

Comment thread src/lobsters.ts Outdated
@@ -0,0 +1,104 @@
/**
* Lobste.rs AI stories fetched via RSS feed.

Copilot AI Apr 2, 2026

File header says Lobste.rs stories are fetched via an RSS feed, but the implementation actually calls Lobste.rs tag-based JSON endpoints (e.g., /t/ai.json). Please update the comment to match the JSON API approach (or switch implementation to RSS if that’s the intent).

Suggested change
- * Lobste.rs AI stories fetched via RSS feed.
+ * Lobste.rs AI stories fetched via tag-based JSON endpoints (e.g., /t/ai.json).
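
For reference, a minimal sketch of what the tag-based JSON approach could look like. It assumes the endpoint pattern `/t/<tag>.json` mentioned above and a story object carrying `short_id`, `title`, `url`, `created_at`, and `score`. The types and helper names (`RawLobstersStory`, `lobstersTagUrl`, `mergeStories`) are hypothetical and not taken from src/lobsters.ts.

```typescript
// Hypothetical subset of the fields a Lobste.rs story object is assumed to carry.
interface RawLobstersStory {
  short_id: string;
  title: string;
  url: string;
  created_at: string;
  score: number;
}

// Hypothetical internal shape used by the digest.
interface LobstersStory {
  id: string;
  title: string;
  url: string;
  publishedAt: string;
  score: number;
}

// Build the tag endpoint URL, e.g. https://lobste.rs/t/ai.json
function lobstersTagUrl(tag: string): string {
  return `https://lobste.rs/t/${encodeURIComponent(tag)}.json`;
}

// Map raw stories into the internal shape and dedupe by short_id,
// since the same story can be returned for more than one tag.
function mergeStories(pages: RawLobstersStory[][]): LobstersStory[] {
  const seen = new Map<string, LobstersStory>();
  for (const page of pages) {
    for (const s of page) {
      if (!seen.has(s.short_id)) {
        seen.set(s.short_id, {
          id: s.short_id,
          title: s.title,
          url: s.url,
          publishedAt: s.created_at,
          score: s.score,
        });
      }
    }
  }
  return Array.from(seen.values());
}
```

Each tag's response would be parsed with `JSON.parse` (or `response.json()`) and passed to `mergeStories` as one page.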

Comment thread src/lobsters.ts Outdated
Comment on lines +92 to +94
const oneDayAgo = Date.now() - 7 * 24 * 60 * 60 * 1000;
const stories = [...seen.values()]
.filter((s) => new Date(s.publishedAt).getTime() > oneDayAgo)

Copilot AI Apr 2, 2026

The variable name oneDayAgo is misleading: it’s set to Date.now() - 7 * 24 * 60 * 60 * 1000 (7 days), and the comment above also says “last 7 days”. Rename the variable (e.g., sevenDaysAgo) to avoid confusion and make the filter logic self-explanatory.

Suggested change
- const oneDayAgo = Date.now() - 7 * 24 * 60 * 60 * 1000;
- const stories = [...seen.values()]
-   .filter((s) => new Date(s.publishedAt).getTime() > oneDayAgo)
+ const sevenDaysAgo = Date.now() - 7 * 24 * 60 * 60 * 1000;
+ const stories = [...seen.values()]
+   .filter((s) => new Date(s.publishedAt).getTime() > sevenDaysAgo)
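
An alternative to renaming, sketched here as an illustration only: pull the window length into a parameter so the name can never drift from the value. `DAY_MS` and `filterRecent` are hypothetical names, and the optional `now` argument exists only to make the function easy to test.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Keep items whose publishedAt falls within the last `days` days before `now`.
// Items with an unparseable date are dropped, because comparing NaN is always false.
function filterRecent<T extends { publishedAt: string }>(
  items: T[],
  days: number,
  now: number = Date.now(),
): T[] {
  const cutoff = now - days * DAY_MS;
  return items.filter((item) => new Date(item.publishedAt).getTime() > cutoff);
}
```

The call site would then read `filterRecent([...seen.values()], 7)`, which states the window directly.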

Comment thread src/arxiv.ts Outdated
Comment on lines +4 to +5
* Strategy: query cs.AI + cs.CL + cs.LG categories for papers
* submitted in the last 24 hours, sorted by submission date.

Copilot AI Apr 2, 2026

The module docstring claims papers are “submitted in the last 24 hours”, but fetchArxivData() doesn’t apply any time-based filter (it fetches the newest max_results per category and dedupes). Either implement a published/updated cutoff (e.g., now - 24h) or adjust the comment so it doesn’t overpromise.

Suggested change
- * Strategy: query cs.AI + cs.CL + cs.LG categories for papers
- * submitted in the last 24 hours, sorted by submission date.
+ * Strategy: query cs.AI + cs.CL + cs.LG categories for the newest papers,
+ * sorted by submission date (up to ARXIV_MAX_RESULTS per category).
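
If the 24-hour behaviour is the one actually wanted, the cutoff option could look roughly like the sketch below. It assumes each parsed Atom entry exposes `published` and `updated` timestamps as strings; `ArxivPaper`, `DateField`, and `submittedWithin` are hypothetical names, not the ones used in src/arxiv.ts.

```typescript
// Hypothetical shape of a parsed Atom entry.
interface ArxivPaper {
  id: string;
  title: string;
  published: string;
  updated: string;
}

type DateField = "published" | "updated";

// Keep papers whose chosen timestamp is no older than `hours` before `now`.
// Entries whose timestamp cannot be parsed are excluded.
function submittedWithin(
  papers: ArxivPaper[],
  hours: number,
  field: DateField = "published",
  now: number = Date.now(),
): ArxivPaper[] {
  const cutoff = now - hours * 60 * 60 * 1000;
  return papers.filter((p) => {
    const t = Date.parse(p[field]);
    return !Number.isNaN(t) && t >= cutoff;
  });
}
```

A strict window can return an empty list on days with no new submissions, so the caller would need to decide whether an empty digest is acceptable.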

Comment thread src/arxiv.ts Outdated
// ---------------------------------------------------------------------------

const ARXIV_MAX_RESULTS = 50;
const API_URL = "http://export.arxiv.org/api/query";

Copilot AI Apr 2, 2026

API_URL uses plain HTTP. If ArXiv’s export API supports HTTPS in your runtime, prefer an https:// endpoint to avoid MITM / proxy tampering; otherwise, add a short note explaining why HTTP is required here.

Suggested change
- const API_URL = "http://export.arxiv.org/api/query";
+ const API_URL = "https://export.arxiv.org/api/query";
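
As an illustration of how the HTTPS endpoint might be combined with the per-category query, here is a small sketch using `URLSearchParams`. The query parameters (`search_query`, `sortBy`, `sortOrder`, `start`, `max_results`) are those of the public arXiv API. The helper name `buildArxivUrl` is hypothetical, and whether src/arxiv.ts builds its URLs this way is an assumption.

```typescript
const API_URL = "https://export.arxiv.org/api/query";

// Build a query URL for the newest submissions in one category, e.g. "cs.AI".
// URLSearchParams takes care of percent-encoding the values.
function buildArxivUrl(category: string, maxResults: number): string {
  const params = new URLSearchParams({
    search_query: `cat:${category}`,
    sortBy: "submittedDate",
    sortOrder: "descending",
    start: "0",
    max_results: String(maxResults),
  });
  return `${API_URL}?${params.toString()}`;
}
```

Calling `buildArxivUrl("cs.AI", ARXIV_MAX_RESULTS)` once per category keeps the scheme and host in a single constant.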

Comment thread src/hf.ts Outdated
Comment on lines +4 to +5
* Strategy: fetch recently updated models sorted by downloads/likes,
* filter for notable AI models.

Copilot AI Apr 2, 2026

The top-level comment says the strategy is to “filter for notable AI models”, but fetchHfData() currently returns the raw API results without any filtering (only mapping fields). Either add an actual filter step (by tags/pipeline, etc.) or update the comment so it matches the implementation.

Suggested change
- * Strategy: fetch recently updated models sorted by downloads/likes,
- * filter for notable AI models.
+ * Strategy: fetch recently updated models sorted by likes and return
+ * a mapped subset of fields from the HF API response (no extra filtering).
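
If an actual filter step is preferred over changing the comment, it might look something like the sketch below. It assumes the mapped model objects carry `likes`, `downloads`, and an optional pipeline tag. The `HfModel` shape, the `NOTABLE_PIPELINES` list, and the `minLikes` threshold are all hypothetical choices for illustration, not values from src/hf.ts.

```typescript
// Hypothetical mapped model shape.
interface HfModel {
  id: string;
  likes: number;
  downloads: number;
  pipelineTag?: string;
}

// Hypothetical allow-list of pipeline tags considered relevant to the digest.
const NOTABLE_PIPELINES = ["text-generation", "text-to-image", "image-text-to-text"];

// Keep models with at least `minLikes` likes and an allow-listed pipeline tag,
// ordered by likes (highest first). The input array is not mutated.
function filterNotable(
  models: HfModel[],
  minLikes: number,
  pipelines: string[] = NOTABLE_PIPELINES,
): HfModel[] {
  return models
    .filter((m) => m.likes >= minLikes)
    .filter((m) => m.pipelineTag !== undefined && pipelines.indexOf(m.pipelineTag) !== -1)
    .sort((a, b) => b.likes - a.likes);
}
```

Models without a pipeline tag are dropped here; loosening that rule is a one-line change if untagged models should be kept.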
