Description
Apache Airflow Provider(s)
google
Versions of Apache Airflow Providers
apache-airflow-providers-google==20.0.0rc1
Apache Airflow version
main
Operating System
Debian GNU/Linux 12 (bookworm)
Deployment
Other
Deployment details
No response
What happened
ComputeEngineInsertInstanceOperator currently treats the mere presence of a Compute Engine instance as success. When an instance already exists, the operator logs that it exists and immediately returns success without verifying whether the existing resource matches the requested configuration. As a result, configuration differences are not detected on subsequent DAG runs—particularly changes to critical fields such as machine_type or boot disk settings. If these values are modified in the DAG and the task is re-run, the operator neither recreates nor updates the instance; it simply succeeds silently. This leads to incorrect idempotence semantics and allows infrastructure drift to persist undetected across DAG executions.
What you think should happen instead
The operator should verify that the existing Compute Engine instance matches the configuration defined in the DAG rather than relying solely on presence-based idempotence. On subsequent DAG runs, it should detect differences in critical configuration fields—such as machine_type and disk parameters—and surface those differences explicitly. At minimum, configuration drift should be logged so that users are aware of the mismatch. Ideally, the operator should support reconciling the resource to the declared state, for example by recreating the instance when differences are detected and an explicit flag is set. This would ensure consistent, declarative behavior across DAG re-runs and prevent silent infrastructure drift.
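As a rough illustration of the comparison described above, the following sketch diffs a requested body against an existing instance, both represented as plain dicts. The function name `find_drift`, its helpers, and the choice of compared fields (machine type and the boot disk's `auto_delete` flag) are assumptions made for this example, not part of the provider. It also assumes the API reports the machine type as a full URL while the DAG uses a partial path, so only the final path segment is compared.

```python
from typing import Any


def _short_name(value: str) -> str:
    """Reduce a partial or full resource URL to its final path segment."""
    return value.rstrip("/").rsplit("/", 1)[-1]


def _boot_disk(body: dict[str, Any]) -> dict[str, Any]:
    """Return the first disk marked as boot, or an empty dict if there is none."""
    for disk in body.get("disks", []):
        if disk.get("boot"):
            return disk
    return {}


def find_drift(requested: dict[str, Any], existing: dict[str, Any]) -> dict[str, tuple]:
    """Return {field: (requested, existing)} for compared fields that differ.

    Hypothetical helper: only fields declared in `requested` are compared.
    """
    drift: dict[str, tuple] = {}

    if "machine_type" in requested:
        want = _short_name(requested["machine_type"])
        have = _short_name(existing.get("machine_type", ""))
        if want != have:
            drift["machine_type"] = (want, have)

    want_disk, have_disk = _boot_disk(requested), _boot_disk(existing)
    if "auto_delete" in want_disk and want_disk["auto_delete"] != have_disk.get("auto_delete"):
        drift["boot_disk.auto_delete"] = (want_disk["auto_delete"], have_disk.get("auto_delete"))

    return drift


requested = {"machine_type": "zones/us-central1-a/machineTypes/n2-standard-4"}
existing = {
    "machine_type": "https://www.googleapis.com/compute/v1/projects/p/zones/us-central1-a/machineTypes/e2-medium"
}
print(find_drift(requested, existing))  # {'machine_type': ('n2-standard-4', 'e2-medium')}
```

Comparing only the fields the DAG declares keeps server-populated defaults from being reported as drift; a real implementation would likely need per-field normalization beyond this.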
How to reproduce
- Configure a Google Cloud connection in Airflow (e.g. google_cloud_default) with a service account that has permission to create and manage Compute Engine instances.
- Ensure you have a valid GCP project ID and zone (for example, us-central1-a).
- Create the following DAG:
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.compute import (
    ComputeEngineInsertInstanceOperator,
)

PROJECT_ID = "<YOUR_PROJECT_ID>"
ZONE = "us-central1-a"
INSTANCE_NAME = "airflow-idempotence-test"

with DAG(
    dag_id="gce_idempotence_repro",
    start_date=datetime(2025, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create_instance = ComputeEngineInsertInstanceOperator(
        task_id="create_instance",
        project_id=PROJECT_ID,
        zone=ZONE,
        body={
            "name": INSTANCE_NAME,
            "machine_type": f"zones/{ZONE}/machineTypes/e2-medium",  # Initial machine type used in this repro
            "disks": [
                {
                    "boot": True,
                    "auto_delete": True,
                    "initialize_params": {
                        "source_image": "projects/debian-cloud/global/images/family/debian-11"
                    },
                }
            ],
            "network_interfaces": [
                {
                    "network": "global/networks/default"
                }
            ],
        },
    )
- Trigger the DAG once and confirm that the instance is created successfully.
- Update the machine type:
  "machine_type": f"zones/{ZONE}/machineTypes/n2-standard-4",
  (Any valid machine type different from the original may be used; in this repro, n2-standard-4 is used.)
- Trigger the DAG again.
Observed Behavior
The task logs that the instance already exists and completes successfully. However, the instance configuration remains unchanged (e.g., the machine type stays e2-medium), and no configuration differences are detected or logged. This shows that changes to fields such as machine_type (and disk configuration) are not recognized on DAG re-run, and the operator does not reconcile the resource to the requested state.
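One way to confirm the observed state independently of the operator is to read the instance back and inspect its machine type. The helper below is a sketch: `current_machine_type` is a name invented for this example, and it assumes the passed-in client behaves like `InstancesClient` from the `google-cloud-compute` package, whose `get` call returns an object with a `machine_type` URL.

```python
def current_machine_type(client, project_id: str, zone: str, instance_name: str) -> str:
    """Return the short machine type name (e.g. "e2-medium") of a live instance.

    `client` is assumed to expose get(project=..., zone=..., instance=...)
    returning an object with a `machine_type` URL attribute, as
    google.cloud.compute_v1.InstancesClient does.
    """
    instance = client.get(project=project_id, zone=zone, instance=instance_name)
    # The API reports a full resource URL; keep only the trailing type name.
    return instance.machine_type.rsplit("/", 1)[-1]


# Usage against a real project (requires google-cloud-compute and credentials):
#   from google.cloud import compute_v1
#   client = compute_v1.InstancesClient()
#   print(current_machine_type(client, "<YOUR_PROJECT_ID>", "us-central1-a",
#                              "airflow-idempotence-test"))
```

If the second DAG run had reconciled the instance, this would report n2-standard-4; in the repro described here it is expected to still report e2-medium.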
Anything else
This report is not proposing destructive behavior by default. Automatically deleting and recreating instances when differences are detected may not be appropriate for all users or environments. However, at a minimum, configuration differences should be surfaced in logs. Any reconciliation behavior (such as recreation) should be explicitly opt-in, allowing users to choose stronger convergence semantics when desired without changing existing default behavior.
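The opt-in semantics could look roughly like the sketch below. The function `resolve_existing_instance`, the flag name `recreate_on_drift`, and the returned action strings are all hypothetical and only illustrate the proposal: with the flag off, differences are logged and nothing else changes; only with the flag on is recreation requested. The `drift` argument is assumed to map a field name to a `(requested, existing)` pair.

```python
import logging

log = logging.getLogger(__name__)


def resolve_existing_instance(drift: dict, recreate_on_drift: bool = False) -> str:
    """Decide what to do when the instance already exists.

    `recreate_on_drift` is a hypothetical flag, not an existing operator parameter.
    Returns one of "unchanged", "drift_logged", or "recreate".
    """
    if not drift:
        return "unchanged"

    # Always surface the differences, regardless of the flag.
    for field, (requested, existing) in drift.items():
        log.warning(
            "Existing instance differs on %s: requested %r, found %r",
            field, requested, existing,
        )

    # Destructive reconciliation only happens when explicitly requested.
    return "recreate" if recreate_on_drift else "drift_logged"


drift = {"machine_type": ("n2-standard-4", "e2-medium")}
print(resolve_existing_instance(drift))                          # drift_logged
print(resolve_existing_instance(drift, recreate_on_drift=True))  # recreate
```

Defaulting the flag to False keeps the task outcome the same as today for existing DAGs, with a warning in the logs as the only visible change.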
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct