Skip to content

Fix & improve state locking of OSS backend#24149

Merged
jbardin merged 7 commits intohashicorp:masterfrom
mlafeldt:fix-oss-state-locking
Mar 11, 2020
Merged

Fix & improve state locking of OSS backend#24149
jbardin merged 7 commits intohashicorp:masterfrom
mlafeldt:fix-oss-state-locking

Conversation

@mlafeldt
Copy link
Copy Markdown
Contributor

While using the OSS backend with Terragrunt, which can execute many Terraform modules in parallel, I found multiple bugs in the way the backend handles state locking via Tablestore.

Current behavior

Currently, the locking works as follows:

  • User creates a Tablestore table with some arbitrary primary key
  • The backend discovers this primary key and always writes the fixed string "terraform-remote-state-lock" to it
  • The backend then tries to update/delete rows based on this primary key alone, ignoring any other attributes
Something (PK) Digest Info
terraform-remote-state-lock (empty) {ID: ...}
terraform-remote-state-lock-md5 <hash> (empty)

This won't work when having more than one lock in the table at a time. What the backend actually does is provide a global lock for Terraform, not per-module locking.

For the same reason, MD5 hashing is broken as there can only be one hash at a time.

New behavior

This PR changes the logic to follow the more obvious data model of DynamoDB, which requires users to create a Tablestore table with a primary key named LockID:

LockID (PK) Digest Info
<statefile-path> (empty) {ID: ...}
<statefile-path>-md5 <hash> (empty)

Note that this breaks backward compatibility since the schema of the Tablestore table is different. I tried to fix the backend without having to change the schema, but couldn't make it work, even with additional row conditions/filters.

Tests are passing too:

$ TF_ACC=1 go test ./backend/remote-state/oss/
ok      github.com/hashicorp/terraform/backend/remote-state/oss 24.788s

cc @arafato @xiaozhu36

So that is shows up in lock errors, etc.
To allow using the same Tablestore table with multiple OSS buckets.

e.g. instead of env:/some/path/terraform.tfstate

the LockID now becomes some-bucket/env:/some/path/terraform.tfstate
@mlafeldt mlafeldt changed the title Fix state locking bugs in OSS backend Fix & improve state locking of OSS backend Feb 18, 2020
@mlafeldt
Copy link
Copy Markdown
Contributor Author

I also added a test that shows the buggy behavior fixed by the PR. PTAL.

@mlafeldt
Copy link
Copy Markdown
Contributor Author

@apparentlymart @pselle Is there anything I can do to move this forward?

@xiaozhu36
Copy link
Copy Markdown
Contributor

HI @mlafeldt Thanks for your contribution and I have a question that what do you mean "this breaks backward compatibility since the schema of the Tablestore table is different. I tried to fix the backend without having to change the schema, but couldn't make it work, even with additional row conditions/filters." ? Does it effect the existing state and lock?

@mlafeldt
Copy link
Copy Markdown
Contributor Author

Does it effect the existing state and lock?

Yes, that's what I meant with breaking backward compatibility. Users have to create a new TableStore table with a primary key named LockID (see my doc changes). As I wrote, I tried to fix the code without breaking the schema, but I don't think it's possible given that the PK is currently hard-coded.

@xiaozhu36
Copy link
Copy Markdown
Contributor

@mlafeldt Ok. Thanks for you feedback.

@danieldreier
Copy link
Copy Markdown
Contributor

@mlafeldt I wanted to check in here as the Terraform Core engineering manager. Thank you for contributing this improvement! @pkolyvas, our product manager, prompted @xiaozhu36 for review after you reached out, because he's the original author of the backend. I'm going to prompt the core team to review this now that he's approved it.

@jbardin jbardin self-assigned this Mar 11, 2020
@jbardin jbardin merged commit f622110 into hashicorp:master Mar 11, 2020
@jbardin
Copy link
Copy Markdown
Member

jbardin commented Mar 11, 2020

Thanks for the contribution @mlafeldt, and the review @xiaozhu36 !

The master branch is the in-progress work for the 0.13 release, and this will be included in the next major release. This PR cannot be backported to the 0.12 release branch due to the backwards incompatible change.

Thanks!

noahmercado pushed a commit to noahmercado/terraform that referenced this pull request Apr 8, 2020
Fix & improve state locking of OSS backend
@ghost
Copy link
Copy Markdown

ghost commented Apr 11, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants