Data Resource Lifecycle Adjustments #17034

@apparentlymart

Background Info

Back in #6598 we introduced the idea of data sources, allowing us to model reading data from external sources as a first-class concept. This has generally been a successful addition, with some great new patterns emerging around it.

However, the current lifecycle for data sources creates some minor problems. As currently implemented, data sources are read during the "refresh" phase that runs prior to creating a plan, except in two situations:

  • If any of the configuration arguments in the corresponding data block have <computed> values.
  • If depends_on is non-empty for the data resource.

In both of the above cases, the read action is deferred until the "apply" phase, which in turn causes all of the result attributes to appear as <computed> in the plan.
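The current rule can be summarized in a few lines of Go. This is only an illustrative sketch: the DataBlock type, its fields, the deferredToApply function, and the example addresses are hypothetical names invented here, not Terraform's internal API.

```go
package main

import "fmt"

// DataBlock is a hypothetical, simplified model of a data block in configuration.
type DataBlock struct {
	Name           string
	HasComputedArg bool     // true if any configuration argument is <computed>
	DependsOn      []string // explicit depends_on entries
}

// deferredToApply models the current behavior described above: the read is
// deferred to the "apply" phase if any argument is <computed> or if
// depends_on is non-empty. Otherwise the read happens during "refresh".
func deferredToApply(d DataBlock) bool {
	return d.HasComputedArg || len(d.DependsOn) > 0
}

func main() {
	// The addresses below are made up for illustration.
	blocks := []DataBlock{
		{Name: "data.example.static"},
		{Name: "data.example.computed_arg", HasComputedArg: true},
		{Name: "data.example.explicit_dep", DependsOn: []string{"example_resource.a"}},
	}
	for _, b := range blocks {
		fmt.Printf("%s deferred to apply: %t\n", b.Name, deferredToApply(b))
	}
}
```

Note that the depends_on branch takes no account of whether the dependency actually has changes pending.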

Unfortunately, both of these situations are problematic today because data sources are processed during "refresh". The following sections describe the problems in more detail.

When Data Resource Arguments change

Because data resources are read during the "refresh" phase, references to attributes of resources are resolved from their value in state rather than their value in the resulting diff. This results in a "change lag", where certain changes to configuration require two runs of terraform apply to fully take effect. The first run reads the data source with the old resource values and then updates the resource, while the second run reads the data source using the new resource values, possibly causing further cascading changes to other resources.
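The lag can be seen in a toy simulation. The sketch below assumes a single managed resource attribute and a single data source argument that references it; the State type and runApply function are hypothetical and model only the ordering of the phases, not real Terraform behavior.

```go
package main

import "fmt"

// State is a hypothetical stand-in for Terraform state. It tracks one managed
// resource attribute and the argument value the data source was last read with.
type State struct {
	ResourceValue string
	DataReadWith  string
}

// runApply simulates one run of terraform apply under the current lifecycle.
func runApply(s State, configured string) State {
	// "refresh" phase: the data source argument is resolved from state,
	// so it sees the resource value from before this run.
	s.DataReadWith = s.ResourceValue
	// "apply" phase: the managed resource is updated to the configured value.
	s.ResourceValue = configured
	return s
}

func main() {
	s := State{ResourceValue: "old", DataReadWith: "old"}

	s = runApply(s, "new")
	fmt.Printf("after run 1: resource=%s, data source read with=%s\n", s.ResourceValue, s.DataReadWith)

	s = runApply(s, "new")
	fmt.Printf("after run 2: resource=%s, data source read with=%s\n", s.ResourceValue, s.DataReadWith)
}
```

In this model the data source only catches up with the resource on the second run, even though the configuration did not change between runs.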

This is particularly tricky for situations where a resource has custom diff logic (via the mechanism added in #14887) that detects and reports changes to Computed attributes that are side-effects of the requested changes, since this can result in additional value changes that are not reflected in the data source read.

The most problematic case is when an attribute is marked as <computed> during an update: this should cause any dependent data resource to be deferred until apply time, but instead the old value is used to do the read and the computed value resulting from the change is not detected at all.

Trouble with depends_on

The current behavior of depends_on for data resources is essentially useless, since it always results in a "perma-diff": every plan shows the data resource as pending a read, even when nothing has changed. The reason for this is that depends_on doesn't give Terraform enough information to know what aspect of the dependency is important, and so it must conservatively always defer the read until the "apply" phase to preserve the guarantee that it happens after the resource changes are finalized.

Ideally we'd like the data resource read to be deferred until apply time only if there are pending changes to a dependent resource, but that is not currently possible because we process data resources during the "refresh" phase where resource diffs have not yet been created, and thus we cannot determine if a change is pending.

Proposed Change

The above issues can be addressed by moving data source processing into the "plan" phase.

This was seen as undesirable during the original data source design because it would cause the "plan" phase to, for the first time, require reaching out to external endpoints. However, we have since made that compromise in order to improve the robustness of planning in #14887. In this new context, reading from data sources during plan is consistent with our goal of having Terraform make whatever requests it needs to make in order to produce an accurate, comprehensive plan. #15895 proposes some adjustments to the behavior of terraform validate so that it can be used as the "offline static check" command, allowing terraform plan to be more complex and require valid credentials for remote APIs even when the implicit refresh is disabled.

Including data source reads in the plan graph means that Terraform will produce diffs for resources before attempting to read data sources that depend on them, which addresses all of the problems described above: the data source arguments can be interpolated from the new values as defined in the diff, rather than the old values present in state.

In particular, the diff can be consulted in order to decide whether a data resource must be deferred until the "apply" phase, allowing any new <computed> values to be considered, and allowing depends_on to defer only if there is a non-empty diff for the referenced resource(s).
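A minimal sketch of that decision, assuming the diffs for managed resources are already available when the data resource is visited in the plan graph. The ResourceDiff, Ref, and DataBlock types, the deferUntilApply function, and the resource addresses are all hypothetical and exist only for illustration.

```go
package main

import "fmt"

// ResourceDiff is a hypothetical summary of the planned diff for one managed resource.
type ResourceDiff struct {
	Changed  []string        // attributes with a planned change
	Computed map[string]bool // attributes whose new value is <computed>
}

// Empty reports whether the diff contains no pending changes.
func (d ResourceDiff) Empty() bool {
	return len(d.Changed) == 0 && len(d.Computed) == 0
}

// Ref is a reference from a data block argument to a resource attribute.
type Ref struct {
	Resource string
	Attr     string
}

// DataBlock is a hypothetical, simplified model of a data block.
type DataBlock struct {
	Refs      []Ref    // attribute references used in arguments
	DependsOn []string // explicit depends_on entries
}

// deferUntilApply models the proposed rule: defer the read only if a
// referenced attribute will be <computed> after the change, or if a
// depends_on target has a non-empty diff.
func deferUntilApply(d DataBlock, diffs map[string]ResourceDiff) bool {
	for _, r := range d.Refs {
		// A missing diff yields a zero ResourceDiff; indexing its nil map is safe.
		if diffs[r.Resource].Computed[r.Attr] {
			return true
		}
	}
	for _, dep := range d.DependsOn {
		if diff, ok := diffs[dep]; ok && !diff.Empty() {
			return true
		}
	}
	return false
}

func main() {
	diffs := map[string]ResourceDiff{
		"example_instance.web": {Changed: []string{"tags"}},
		"example_network.main": {}, // no pending changes
	}
	changedDep := DataBlock{DependsOn: []string{"example_instance.web"}}
	stableDep := DataBlock{DependsOn: []string{"example_network.main"}}

	fmt.Println("depends on changed resource, defer:", deferUntilApply(changedDep, diffs))
	fmt.Println("depends on unchanged resource, defer:", deferUntilApply(stableDep, diffs))
}
```

Under this rule a depends_on entry no longer forces a deferral on every run, which is what would eliminate the perma-diff.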

Effect on the Managed Resource Lifecycle

This change does not greatly affect the lifecycle for managed resources, but it does restore the original property that the "refresh" phase is a state-only operation with the exception of provider configuration.

An interesting implication of this is that it is in principle safe to disregard any inter-resource dependencies for refresh purposes and instead construct a flatter graph where each resource depends only on its provider. This in turn can permit greater parallelism in read calls, and more opportunities for API request consolidation once #7388 is addressed.
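As a rough illustration of the flatter graph, the sketch below builds refresh edges that point only at each resource's provider and ignore inter-resource dependencies. The Resource type, the flatRefreshGraph function, the "provider." node naming, and the addresses are assumptions made for this example, not Terraform's actual graph builder.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a hypothetical, simplified model of a managed resource.
type Resource struct {
	Addr      string
	Provider  string
	DependsOn []string // inter-resource dependencies, ignored for refresh
}

// flatRefreshGraph returns, for each resource address, the nodes it must
// wait for during refresh. In the flat graph that is only its provider.
func flatRefreshGraph(resources []Resource) map[string][]string {
	graph := make(map[string][]string, len(resources))
	for _, r := range resources {
		graph[r.Addr] = []string{"provider." + r.Provider}
	}
	return graph
}

func main() {
	resources := []Resource{
		{Addr: "example_network.main", Provider: "example"},
		{Addr: "example_instance.web", Provider: "example", DependsOn: []string{"example_network.main"}},
	}
	graph := flatRefreshGraph(resources)

	// Sort addresses so the printed order is deterministic.
	addrs := make([]string, 0, len(graph))
	for addr := range graph {
		addrs = append(addrs, addr)
	}
	sort.Strings(addrs)
	for _, addr := range addrs {
		fmt.Printf("%s waits for %v\n", addr, graph[addr])
	}
}
```

Because no resource waits on another resource in this graph, every read for a given provider can start as soon as that provider is configured.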

Effect on terraform refresh

Since data sources are currently processed in the "refresh" phase, the terraform refresh command currently updates them. This can be useful in situations where a root module output depends on a data source attribute and its state is being consumed with terraform_remote_state.

Moving data source reads to the plan phase will mean that terraform refresh will no longer update them. The change proposed in #15419 can partially mitigate this by making data resource updates -- and corresponding output updates -- an explicit part of the normal terraform apply flow, which is a more desirable outcome for the reasons described in that issue.

To retain the ability to update only data resources, without applying other changes, we can add a new -read-only argument to terraform plan (and, by extension, to terraform apply with no explicit plan file argument), which produces a reduced plan graph that includes only the data resources.

Bringing data source refresh into the main plan+apply workflow is superior to the current terraform refresh approach because it allows the user to evaluate and approve the resulting changes to outputs, rather than just blindly accepting these updates and potentially disrupting downstream remote state consumers.

For users that still want to accept data source updates without a confirmation step, the command line terraform apply -read-only -auto-approve would be equivalent to the current terraform refresh behavior.
