After source ingestion, update last_synced_at #2

@MathyouMB

NOTE: Don't assign yourself unless you have confirmed with Matthew that you've got a working environment

🧠 Context

The last_synced_at field on a source is intended to track when it was most recently ingested. However, this field is currently not updated after ingestion runs, which makes it harder to determine whether chunks are stale or current.

This field needs to be updated immediately after a source is successfully ingested to ensure downstream logic (like stale chunk cleanup) works correctly.


🛠 Implementation Plan

  1. In src/ingestion/services/webpage_ingestion_service.py, after successfully processing and storing chunks:

    • Set source.last_synced_at = datetime.utcnow().
    • Save the updated source to the database.
  2. Ensure this happens before triggering the cleanup task.

  3. Add a test to confirm that last_synced_at is updated after ingestion.
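The ordering in the plan above can be sketched as follows. This is a minimal illustration, not the project's actual code: `Source`, `InMemoryStore`, the `ingest` method signature, and the `trigger_cleanup` callback are all hypothetical stand-ins, since the issue does not show the real model or service internals. It also uses `datetime.now(timezone.utc)` as a timezone-aware stand-in for the `datetime.utcnow()` named in the plan; whether the real column expects a naive or aware datetime is an assumption to check against the schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class Source:
    """Hypothetical stand-in for the project's source model."""
    id: int
    url: str
    last_synced_at: Optional[datetime] = None


@dataclass
class InMemoryStore:
    """Hypothetical persistence layer; the real service would use its database."""
    chunks: list = field(default_factory=list)
    saved_sources: list = field(default_factory=list)

    def save_chunks(self, source: Source, chunks: List[str]) -> None:
        self.chunks.extend((source.id, chunk) for chunk in chunks)

    def save_source(self, source: Source) -> None:
        self.saved_sources.append((source.id, source.last_synced_at))


class WebpageIngestionService:
    """Sketch of the service; constructor and method names are assumptions."""

    def __init__(self, store: InMemoryStore, trigger_cleanup: Callable[[Source], None]):
        self.store = store
        self.trigger_cleanup = trigger_cleanup

    def ingest(self, source: Source, chunks: List[str]) -> Source:
        # Step 1: store the chunks. If this raises, the timestamp is never touched.
        self.store.save_chunks(source, chunks)

        # Step 2: record the sync time and persist the source.
        source.last_synced_at = datetime.now(timezone.utc)
        self.store.save_source(source)

        # Step 3: only now trigger cleanup, so it sees the fresh timestamp.
        self.trigger_cleanup(source)
        return source
```

Passing the cleanup trigger in as a callback is only a convenience for the sketch; it makes the "timestamp before cleanup" ordering easy to observe without a task queue.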


✅ Acceptance Criteria

  • After ingestion completes successfully for a source, update source.last_synced_at to the current timestamp.
  • Save the updated source object to persist the change.
  • This should happen in WebpageIngestionService after all chunks are created and saved.
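One way the criteria above might be checked in a test is sketched below. The `ingest` function is a hypothetical, reduced stand-in for the relevant part of `WebpageIngestionService`, and the `store_chunks` and `save_source` callables are assumed names; the real test would exercise the actual service with the project's own fixtures. The second case covers the "successfully" wording: a failed ingestion should leave `last_synced_at` unchanged.

```python
import unittest
from datetime import datetime, timezone
from types import SimpleNamespace


def ingest(source, store_chunks, save_source):
    """Hypothetical stand-in for the service's ingest step."""
    store_chunks(source)  # may raise; nothing below runs in that case
    source.last_synced_at = datetime.now(timezone.utc)
    save_source(source)


class LastSyncedAtTest(unittest.TestCase):
    def test_updated_and_saved_on_success(self):
        source = SimpleNamespace(last_synced_at=None)
        saved = []
        before = datetime.now(timezone.utc)

        ingest(source, store_chunks=lambda s: None, save_source=saved.append)

        self.assertIsNotNone(source.last_synced_at)
        self.assertGreaterEqual(source.last_synced_at, before)
        self.assertEqual(saved, [source])  # the source was persisted once

    def test_untouched_when_chunk_storage_fails(self):
        source = SimpleNamespace(last_synced_at=None)
        saved = []

        def failing_store(_source):
            raise RuntimeError("chunk storage failed")

        with self.assertRaises(RuntimeError):
            ingest(source, store_chunks=failing_store, save_source=saved.append)

        self.assertIsNone(source.last_synced_at)
        self.assertEqual(saved, [])  # nothing was persisted
```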

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Projects

    Status

    Ready
