
fix(providers/amazon): S3DagBundle does not delete stale dag recursively#63104

Merged
vincbeck merged 3 commits into apache:main from jerryzhou196:s3-recursive-prune
Mar 9, 2026


jerryzhou196 (Contributor) commented Mar 8, 2026

This copies the implementation used for GCS so that stale files and directories are pruned recursively in an S3DagBundle.
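The recursive pruning described above can be sketched in plain Python. This is a minimal sketch, not the actual S3Hook or GCSHook code: the function name prune_stale_files and the idea of passing in the set of files downloaded by the current sync are assumptions for illustration. It walks the local directory bottom-up, deletes files that are no longer expected, and then removes subdirectories that were left empty.

```python
import logging
import os
from pathlib import Path

log = logging.getLogger(__name__)


def prune_stale_files(local_dir: Path, expected_files: set[Path]) -> list[Path]:
    """Hypothetical helper: recursively delete local files not in ``expected_files``.

    ``expected_files`` is assumed to hold the paths of the files that the
    current sync downloaded from the bucket. Returns every path removed.
    """
    root = local_dir.resolve()
    expected = {p.resolve() for p in expected_files}
    removed: list[Path] = []
    # topdown=False visits children before parents, so a directory emptied by
    # file deletions can be removed in the same pass. os.walk does not descend
    # into symlinked directories by default (followlinks=False), so a directory
    # symlink cycle cannot cause endless recursion here.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        current = Path(dirpath)
        for name in filenames:
            path = current / name
            if path.resolve() not in expected:
                path.unlink()
                log.debug("Deleted stale local file: %s", path)
                removed.append(path)
        # Keep the bundle root itself; only remove empty subdirectories.
        if current != root and not any(current.iterdir()):
            current.rmdir()
            log.debug("Deleted empty local directory: %s", current)
            removed.append(current)
    return removed
```

The bottom-up walk is the key design choice: a top-down pass would inspect a parent directory before its stale children were deleted and would miss directories that only become empty during pruning.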

Testing:

  • It handles infinite symlink loops because it calls .resolve(), which resolves symlinks to their absolute paths.
  • The tests cover nested stale file deletion, preservation of non-stale nested files, and cleanup of empty subdirectories.
  • I also tested manually by uploading some files to S3, reading them into an S3 bundle, then removing them from S3 and checking that the old local copies were removed.
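The first testing bullet relies on Path.resolve(). As a small illustration (the helper name resolves_to_expected is hypothetical, not code from the PR), comparing resolved paths means that a symlink, or a path containing '..', matches the expected file it ultimately points to. How resolve() treats symlink loops changed in Python 3.13, so loop behavior is worth checking against the Python versions you support.

```python
from pathlib import Path


def resolves_to_expected(candidate: Path, expected: set[Path]) -> bool:
    """Hypothetical check: does ``candidate`` point at one of the expected files
    once symlinks and relative segments such as '..' are resolved?"""
    # resolve() returns an absolute path with symlinks followed, so different
    # spellings of the same file compare equal.
    resolved_expected = {p.resolve() for p in expected}
    return candidate.resolve() in resolved_expected


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        real = base / "dag.py"
        real.write_text("# dag")
        (base / "link.py").symlink_to(real)
        (base / "sub").mkdir()
        print(resolves_to_expected(base / "link.py", {real}))  # True
        print(resolves_to_expected(base / "sub" / ".." / "dag.py", {real}))  # True
        print(resolves_to_expected(base / "other.py", {real}))  # False
```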

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
  • Cursor + Claude Code

…pdate tests to cover nested directories and ensure proper logging of deleted files and directories.
jerryzhou196 requested a review from o-nikolas as a code owner March 8, 2026 06:25
boring-cyborg bot commented Mar 8, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts, contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

boring-cyborg bot added the area:providers and provider:amazon (AWS/Amazon - related issues) labels Mar 8, 2026
jerryzhou196 changed the title from "S3 recursive prune" to "Match S3 stale pruning with GCS" Mar 8, 2026
eladkal changed the title from "Match S3 stale pruning with GCS" to "fix: S3DagBundle does not delete stale dag recursively" Mar 8, 2026
eladkal requested a review from vincbeck March 8, 2026 06:34
jerryzhou196 changed the title from "fix: S3DagBundle does not delete stale dag recursively" to "feat(providers/amazon): S3DagBundle does not delete stale dag recursively" Mar 8, 2026
jerryzhou196 changed the title from "feat(providers/amazon): S3DagBundle does not delete stale dag recursively" to "fix(providers/amazon): S3DagBundle does not delete stale dag recursively" Mar 8, 2026
jerryzhou196 (Contributor, Author) commented Mar 8, 2026

Ah, fixed the failing test. Not sure how to re-run CI.

vincbeck merged commit e491aac into apache:main Mar 9, 2026 (92 checks passed)
boring-cyborg bot commented Mar 9, 2026

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

jason810496 pushed a commit to jason810496/airflow that referenced this pull request Mar 10, 2026
…ely (apache#63104)

* Refactor S3Hook's local file synchronization logic to match GCSHook. Update tests to cover nested directories and ensure proper logging of deleted files and directories.

* Update S3Hook logging level for deleted files and directories from info to debug to reduce log verbosity.
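The second commit lowers the per-item deletion messages from INFO to DEBUG. A minimal sketch of the effect, using only the standard logging module: with the logger at INFO, the DEBUG deletion messages are filtered out, while at DEBUG they are emitted. The logger name "example.s3_sync", the message text, and the helper functions are hypothetical, not what S3Hook actually uses.

```python
import logging

# Hypothetical logger name for illustration only.
log = logging.getLogger("example.s3_sync")


class ListHandler(logging.Handler):
    """Collects emitted records so we can count what would have been printed."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def log_deletion(path: str) -> None:
    # Per-file deletion messages at DEBUG, mirroring the change in the commit.
    log.debug("Deleted stale local file: %s", path)


def count_emitted(level: int, paths: list[str]) -> int:
    """Return how many deletion messages pass the logger's level filter."""
    handler = ListHandler()
    log.addHandler(handler)
    log.setLevel(level)
    log.propagate = False  # keep the demo output out of the root logger
    try:
        for p in paths:
            log_deletion(p)
    finally:
        log.removeHandler(handler)
    return len(handler.records)
```

For example, count_emitted(logging.INFO, ["a.py", "b.py"]) returns 0, while count_emitted(logging.DEBUG, ["a.py", "b.py"]) returns 2, so a large bundle sync no longer produces one INFO line per deleted file.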
thejoeejoee pushed a commit to thejoeejoee/airflow that referenced this pull request Mar 10, 2026
dominikhei pushed a commit to dominikhei/airflow that referenced this pull request Mar 11, 2026
Pyasma pushed a commit to Pyasma/airflow that referenced this pull request Mar 13, 2026

Labels

area:providers provider:amazon AWS/Amazon - related issues


Development

Successfully merging this pull request may close these issues.

S3DagBundle does not delete stale dag recursively
