AIP 67 - Multi-Team: Update Edge Executor to support multi team #61646
wjddn279 wants to merge 11 commits into apache:main from
Conversation
Not sure whether AI concluded this:
This is wrong: jobs are purged only on completion, not while they sit in the queue, so this should not be a problem. I doubt that different configurations per team are needed. I'd assume that if Edge is needed, workers are started per team, while the central tables are shared. Shall we mark the PR as draft until the questions are clear? Regarding the questions:
providers/edge3/src/airflow/providers/edge3/executors/edge_executor.py (outdated, resolved)
jscheffl left a comment
Blocking merge. Besides the changes I do not agree with, there is zero documentation being added on how to make a multi-team setup possible. Also, I assume the DB schema must be adjusted, depending on whether queues overlap.
dheerajturaga left a comment
Given that the Edge executor is unique in persisting all state to shared tables (edge_job, edge_worker), the isolation approach needs to be discussed and agreed upon with the AIP-67 authors and the Edge provider maintainers before writing the implementation, rather than left as open questions in the PR body. I'd suggest starting that conversation first, since a correct solution will involve modifying the Edge table schemas. What was done for Celery can't simply be applied as is.
I see my message wasn't conveyed as I intended.
This shouldn't happen; the AI mistranslated it. 🥲 Sorry for that. However, the cleanup process in a multi-instance edge_executor environment runs on all instances and checks all rows regardless of team. This can cause side effects, because it runs at unintended intervals compared to a single-instance setup. I thought it would be better if it ran only for the jobs managed by that particular instance. Therefore, we needed a minimum unit to distinguish teams in that table (edge_job), and the best option I saw without changing the table was to distinguish by queue. That's why I left a separate question asking whether that was actually feasible, and I tried implementing it in that direction for now. Of course, I'm open to changing the approach.
Thank you for the review. I examined several cases but couldn't determine the exact direction, so I implemented what I thought was the best approach; I'm of course willing to change it according to the decision. I should have checked first... I'll change the PR to draft.
I believe this is the best answer. Let's not abuse queue again and implement this right 😄
@o-nikolas @jscheffl @dheerajturaga To summarize your points: can I confirm that we've reached an agreement on adding a `team_name` column? If so, I'll update the worker startup to allow specifying a team_name to belong to, alongside the existing queue subscription. A worker will only execute a job if it belongs to the same team_name and the job is delivered through one of the specified queues.
Yes, and (I assume this holds for others as well) multi-team is an optional feature, so the CLI param and column are only optionally respected; they will probably be NULL for most setups, and that has to be respected as well. A good thing about the delayed discussion is that the DB manager for table migrations has meanwhile been merged in #61155, so if you need a column you can add the first migration there now.
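The NULL-respecting matching rule discussed above could look roughly like this. This is a minimal sketch under the assumptions in this thread; `job_matches_worker` and its parameters are illustrative names, not the provider's actual API:

```python
from __future__ import annotations

# Hypothetical sketch: when does a worker pick up a job under the optional
# multi-team feature? In single-team setups both team values are None, so
# the team comparison is a no-op and legacy behavior is preserved.

def job_matches_worker(
    job_team: str | None,
    job_queue: str,
    worker_team: str | None,
    worker_queues: list[str],
) -> bool:
    """A job is eligible only if its queue is subscribed AND its team matches."""
    if job_queue not in worker_queues:
        return False
    return job_team == worker_team  # None == None for non-multi-team setups

print(job_matches_worker(None, "default", None, ["default"]))      # True
print(job_matches_worker("team_a", "default", None, ["default"]))  # False
```

Keeping the team check as a plain equality (with NULL/None matching NULL/None) is what makes the column safe to ignore in existing deployments.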
Force-pushed from 1951cbf to 67c6926
@o-nikolas @jscheffl @dheerajturaga The task is complete and ready for review. The following work has been done:
Notes: when each worker starts, cases like
Also, it seems like there are no integration tests for this. Would it be okay if I write some when I have time? |
Integration tests for Edge are a long-standing item on my bucket list; I never had time to make them. I'd be very, very happy about a contribution, maybe in a separate PR. Otherwise, some back-compat tests are failing and static checks need fixing...
jscheffl left a comment
Some more comments. In general and structurally looking very good already!
Forgot to mention: there are zero docs. Can you add a description of multi team to the RST docs as well, especially highlighting the security restriction for the time being?
Force-pushed from d55c3f9 to e7400d6
Force-pushed from bf98bba to cdba6d9
Force-pushed from 18851cf to 6fcfab5
Force-pushed from 6fcfab5 to d4ba386
@dheerajturaga There are two things I'd like to confirm:
Regarding (2): 3.2.0 is the provider version the DB migration is targeted for. At the moment 3.1.0 is the available version. Tomorrow a new provider will be cut, and due to functional enhancements it will become 3.2.0. If your PR is merged before tomorrow it will get into 3.2.0; otherwise it will land two weeks later in 3.3.0. As some reviews are still open... I'm not sure this can be achieved. I'd rather propose re-targeting to the 3.3.0 version.
Force-pushed from d4ba386 to 9184670
no problem! changed! |
Force-pushed from 9184670 to f926598
Force-pushed from f926598 to 1233232
Looks good now. As we are just in the middle of releasing new providers, maybe hold on for 1-2 days so that no error needs fixing... @dheerajturaga WDYT, good enough to be merged now?
@wjddn279 @dheerajturaga — This PR has new commits since the last review requesting changes, and it looks like the author has followed up. Could you take another look when you have a chance to see if the review comments have been addressed? Thanks! |
Config isolation
No major issues here. Following the same pattern as other executors (LocalExecutor, CeleryExecutor), all direct reads from the global conf have been replaced with self.conf, which is a team-aware ExecutorConf instance created in the base executor. This ensures that each team's executor reads team-specific configuration values (e.g., heartbeat interval, purge intervals) without affecting other teams.
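The team-aware lookup pattern described above can be sketched as follows. This is a stand-in, not the real Airflow `ExecutorConf` class; the key layout and method signature are assumptions for illustration only:

```python
# Minimal sketch of a team-scoped config wrapper: a team-prefixed key wins
# over the global key, and a plain (no-team) instance behaves like the
# global conf. The tuple-keyed dict is a toy substitute for Airflow's
# real configuration backend.

class ExecutorConf:
    def __init__(self, global_conf: dict, team_name=None):
        self._conf = global_conf
        self._team = team_name

    def getint(self, section: str, key: str, fallback: int) -> int:
        # Team-specific override, e.g. section "team_a.edge", checked first.
        if self._team is not None:
            team_key = (f"{self._team}.{section}", key)
            if team_key in self._conf:
                return int(self._conf[team_key])
        return int(self._conf.get((section, key), fallback))

conf = {
    ("edge", "job_poll_interval"): 5,
    ("team_a.edge", "job_poll_interval"): 2,
}
print(ExecutorConf(conf, "team_a").getint("edge", "job_poll_interval", 10))  # 2
print(ExecutorConf(conf).getint("edge", "job_poll_interval", 10))            # 5
```

The point of routing every read through `self.conf` is exactly this fallback chain: team executors see their overrides, everyone else sees the global values unchanged.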
Multi-instance isolation
Determining the right approach for isolating resources between teams in EdgeExecutor was not straightforward. Looking at the CeleryExecutor as a reference, it achieves full team isolation by assigning each team a separate broker and separate Celery worker pool. Based on this, I concluded that edge workers should also be partitioned per team.
However, unlike CeleryExecutor which uses external brokers, EdgeExecutor manages all persistence through shared DB tables. This means team-level isolation needs to happen at the query level. Specifically, the maintenance operations (_purge_jobs, _update_orphaned_jobs, _check_worker_liveness) were previously operating on all rows in these tables indiscriminately. In a multi-team setup where each team may have different configuration values, this could lead to one team's executor incorrectly purging another team's jobs or marking another team's workers as dead.
To address this, I introduced _managed_queues -- a per-instance set that tracks which queues this executor is responsible for. It is initialized with the default_queue from the (possibly team-specific) config and grows as queue_workload() is called. All maintenance queries now filter by WHERE queue IN (_managed_queues), and worker liveness checks skip workers whose registered queues do not overlap with the executor's managed queues.
This approach assumes that each team uses a distinct set of queues and that different teams do not share the same queue names.
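The queue-scoped maintenance queries can be illustrated with a self-contained sketch. This is not the provider's actual code: the table layout, state value, and `managed_queues` handling are simplified assumptions, using an in-memory SQLite table in place of the shared Airflow metadata DB:

```python
import sqlite3

# Toy edge_job table: two completed jobs belonging to different teams,
# distinguished (per the assumption above) only by their queue names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge_job (id INTEGER PRIMARY KEY, queue TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO edge_job (queue, state) VALUES (?, ?)",
    [("team_a_default", "success"), ("team_b_default", "success")],
)

# The set this executor instance is responsible for; in the PR it is seeded
# from default_queue and grows as queue_workload() is called.
managed_queues = {"team_a_default"}

# Purge completed jobs, but only on queues this instance manages,
# i.e. WHERE queue IN (managed_queues).
placeholders = ",".join("?" * len(managed_queues))
conn.execute(
    f"DELETE FROM edge_job WHERE state = 'success' AND queue IN ({placeholders})",
    tuple(managed_queues),
)
remaining = [row[0] for row in conn.execute("SELECT queue FROM edge_job ORDER BY id")]
print(remaining)  # ['team_b_default'] -- the other team's job is untouched
```

Without the `queue IN (...)` predicate, every instance's purge would sweep all teams' rows, which is precisely the cross-team side effect described above.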