Data Resource Lifecycle Adjustments

# Background Info

Back in #6598 we introduced the idea of _data sources_, allowing us to model reading data from external sources as a first-class concept. This has generally been a successful addition, with some great new patterns emerging around it.

However, the current lifecycle for data sources creates some minor problems. As currently implemented, data sources are read during the "refresh" phase that runs prior to creating a plan, except in two situations:

* If any of the configuration arguments in the corresponding `data` block have `<computed>` values.
* If `depends_on` is non-empty for the data resource.

In both of the above cases, the read action is deferred until the "apply" phase, which in turn causes all of the result attributes to appear as `<computed>` in the plan.

Unfortunately, both of these situations are problematic today, as a consequence of data sources being processed during "refresh". These problems are described in more detail in the following sections.

## When Data Resource Arguments Change

Because data resources are read during the "refresh" phase, references to attributes of resources are resolved from their value in state rather than their value in the resulting diff. This results in a "change lag", where certain changes to configuration require two runs of `terraform apply` to fully take effect. The first run reads the data source with the _old_ resource values and _then_ updates the resource, while the second run reads the data source using the _new_ resource values, possibly causing further cascading changes to other resources.
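To make the "change lag" concrete, here is a minimal sketch of a configuration that would exhibit it. The resource names, the AMI ID, and the variable are hypothetical, and the interpolation syntax assumes a Terraform version contemporary with this issue.

```hcl
# Hypothetical names and values throughout; illustrative only.
variable "subnet_id" {}

resource "aws_instance" "web" {
  ami           = "ami-12345678" # placeholder AMI ID
  instance_type = "t2.micro"
  subnet_id     = "${var.subnet_id}"
}

# Read during "refresh", so the reference below resolves from state.
# After var.subnet_id changes, the first run still looks up the OLD subnet.
data "aws_subnet" "selected" {
  id = "${aws_instance.web.subnet_id}"
}

# Only reflects the new subnet's CIDR block on the second run.
output "web_subnet_cidr" {
  value = "${data.aws_subnet.selected.cidr_block}"
}
```

With this configuration, changing `var.subnet_id` would update the instance on the first `terraform apply`, but `web_subnet_cidr` would only catch up on the next run.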

This is particularly tricky for situations where a resource has custom diff logic (via the mechanism added in #14887) that detects and reports changes to `Computed` attributes that are side-effects of the requested changes, since this can result in additional value changes that are not reflected in the data source read.

The most problematic case is when an attribute is marked as `<computed>` during an update: this _should_ cause any dependent data resource to be deferred until apply time, but instead the old value is used to do the read and the computed value resulting from the change is not detected at all.

## Trouble with `depends_on`

The current behavior for `depends_on` for data resources is essentially useless, since it always results in a "perma-diff". The reason for this is that `depends_on` doesn't give Terraform enough information to know what aspect of the dependency is important, and so it must conservatively _always_ defer the read until the "apply" phase to preserve the guarantee that it happens after the resource changes are finalized.

Ideally we'd like the data resource read to be deferred until apply time only if there are pending changes to a dependent resource, but that is not currently possible because we process data resources during the "refresh" phase where resource diffs have not yet been created, and thus we cannot determine if a change is pending.
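As an illustration of the `depends_on` problem, consider a sketch in which a data resource reads back an object that a managed resource writes, with no attribute reference connecting the two. The bucket name, key, and variable are hypothetical, and the quoted `depends_on` form assumes the syntax in use at the time of this issue.

```hcl
# Hypothetical names and values throughout; illustrative only.
variable "settings_json" {}

resource "aws_s3_bucket_object" "config" {
  bucket       = "example-config-bucket"
  key          = "app/settings.json"
  content      = "${var.settings_json}"
  content_type = "application/json"
}

data "aws_s3_bucket_object" "config_read" {
  bucket = "example-config-bucket"
  key    = "app/settings.json"

  # No attribute reference links this block to the resource above, so the
  # ordering is expressed with depends_on. Under the current behavior this
  # defers the read to "apply" on every run, changed or not.
  depends_on = ["aws_s3_bucket_object.config"]
}

output "settings_body" {
  # Appears as <computed> in every plan under the current behavior.
  value = "${data.aws_s3_bucket_object.config_read.body}"
}
```

Ideally the read here would be deferred only when `aws_s3_bucket_object.config` has a pending change, which is the behavior this proposal aims to make possible.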

# Proposed Change

The above issues can be addressed by moving data source processing into the "plan" phase.

This was seen as undesirable during the original data source design because it would cause the "plan" phase to, for the first time, require reaching out to external endpoints. However, we have since made that compromise in order to improve the robustness of planning in #14887. In this new context, reading from data sources during plan is consistent with our goal of having Terraform make whatever requests it needs to make in order to produce an accurate, comprehensive plan. #15895 proposes some adjustments to the behavior of `terraform validate` so that it can be used as the "offline static check" command, allowing `terraform plan` to be more complex and require valid credentials for remote APIs even when the implicit refresh is disabled.

Including data source reads in the _plan_ graph means that Terraform will produce diffs for resources _before_ attempting to read data sources that depend on them, which addresses all of the problems described above: the data source arguments can be interpolated from the new values as defined in the diff, rather than the old values present in state.

In particular, the diff can be consulted in order to decide whether a data resource must be deferred until the "apply" phase, allowing any new `<computed>` values to be considered, and allowing `depends_on` to defer only if there is a non-empty diff for the referenced resource(s).

## Effect on the Managed Resource Lifecycle

This change does not greatly affect the lifecycle for _managed_ resources, but it _does_ restore the original property that the "refresh" phase is a state-only operation with the exception of provider configuration.

An interesting implication of this is that it is in principle safe to disregard any inter-resource dependencies for refresh purposes and instead construct a flatter graph where each resource depends only on its provider. This in turn can permit greater parallelism in read calls, and more opportunities for API request consolidation once #7388 is addressed.

## Effect on `terraform refresh`

Since data sources are currently processed in the "refresh" phase, the `terraform refresh` command currently updates them. This can be useful in situations where a root module output depends on a data source attribute and its state is being consumed with `terraform_remote_state`.
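The pattern described here might look like the following sketch, which shows two separate configurations in one listing. All names, the backend settings, and the AMI naming scheme are hypothetical, and the `terraform_remote_state` syntax assumes a Terraform version contemporary with this issue.

```hcl
# --- Producer configuration: a root module output backed by a data source ---
data "aws_ami" "base" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["base-image-*"] # hypothetical naming scheme
  }
}

output "base_ami_id" {
  # Today, `terraform refresh` re-reads the data source and updates this output.
  value = "${data.aws_ami.base.id}"
}

# --- Consumer configuration (separate working directory) ---
data "terraform_remote_state" "images" {
  backend = "s3"

  config {
    bucket = "example-terraform-state" # hypothetical state bucket
    key    = "images/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = "${data.terraform_remote_state.images.base_ami_id}"
  instance_type = "t2.micro"
}
```

In this arrangement, the consumer sees a new `base_ami_id` only after the producer's state has been updated, which is why the ability to update data resources on their own matters.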

Moving data source reads to the plan phase will mean that `terraform refresh` will no longer update them. The change proposed in #15419 can partially mitigate this by making data resource updates -- and corresponding output updates -- an explicit part of the normal `terraform apply` flow, which is a more desirable outcome for the reasons described in that issue.

To retain the ability to update _only_ data resources, without applying other changes, we can add a new `-read-only` argument to `terraform plan` (and, by extension, to `terraform apply` when no explicit plan file argument is given). This argument produces a reduced plan graph that includes _only_ the data resources.

Bringing data source refresh into the main plan+apply workflow is superior to the current `terraform refresh` approach because it allows the user to evaluate and approve the resulting changes to outputs, rather than just blindly accepting these updates and potentially disrupting downstream remote state consumers.

For users who still want to accept data source updates without a confirmation step, running `terraform apply -read-only -auto-approve` would be equivalent to the current `terraform refresh` behavior.


  
