Data sharing #17
I'm putting this in draft for now because there's still a bit of work to finish off the update logic, and probably some cleanup as well. But I wanted to go ahead and open the PR to show what it will look like. Before putting this into ready-for-review I'll probably squash the commits.
Force-pushed from 6203392 to 598db51 ("…ove entire schema with all assets works. Modify manually-managed assets in datashare schema still not implemented.")
One other thing to note... UPDATE: I've identified the issue, but it's kind of a deal-breaker for being able to manage functions in a data share. The issue is that when you manage UDFs in Redshift, you have to include the parameter list, like so: `ALTER DATASHARE ADD FUNCTION foo(varchar);` (NB: this is also the syntax you need in order to …). However, you can't read back the parameter list.
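To illustrate the asymmetry, here's a sketch (`my_share` and `public.foo` are placeholder names; the read-back query assumes the `SVV_DATASHARE_OBJECTS` system view):

```sql
-- Adding a UDF to a datashare requires the full signature:
ALTER DATASHARE my_share ADD FUNCTION public.foo(varchar);

-- But reading the share's contents back returns only the object name,
-- with no parameter list to diff against:
SELECT object_type, object_name
FROM svv_datashare_objects
WHERE share_name = 'my_share';
```

Without the signature, the provider has no reliable way to match a configured function against what's actually in the share.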
I can think of 3 ways of handling this:
@winglot what is your preferred approach? I think option 1 makes the most sense from a user standpoint, but I don't want to rip out all of the code I've written yet if you think there's value in being able to manage specific tables within a datashare/schema.
Further to my comment above, I think what I will probably do is create an alternative branch/PR for option 1 and close this one for the time being. That will make things work in a reliable and expected way, and leave the door open to more fine-grained controls if AWS ever roll out a fix for the above issue, without having to rewrite all the code from scratch.
Adds a `redshift_datashare` resource to manage data sharing between Redshift clusters. This should be defined on the producer cluster. For managing the schemas and objects, we use a nested attribute block.

Note that we're rolling up the management of schemas (and their objects) into two modes, `auto` and `manual`. This avoids a lot of weird edge cases that would otherwise come up if we'd tried to expose individual settings for `ALL TABLES`, `ALL FUNCTIONS`, and `INCLUDENEW` in the `ALTER DATASHARE` command:

- In `auto` mode, we add `ALL TABLES IN SCHEMA` and `ALL FUNCTIONS IN SCHEMA` and `SET INCLUDENEW = TRUE FOR SCHEMA`, so that newly-created tables/functions are automatically exposed to the datashare by the Redshift cluster itself, without needing to re-run Terraform.
- In `manual` mode, we only expose the specific tables/functions that are configured in the `schema` block.

This PR turned out to be so big that I decided it best not to include the corresponding data source. I intend for it to follow the same structure as what's in this PR, and allow it to be defined on either the producer or the consumer.
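As a rough sketch of what each mode issues under the hood (`my_share`, `public`, and the object names are placeholders; the statements follow Redshift's `ALTER DATASHARE` syntax):

```sql
-- auto mode: share everything in the schema, including future objects
ALTER DATASHARE my_share ADD SCHEMA public;
ALTER DATASHARE my_share ADD ALL TABLES IN SCHEMA public;
ALTER DATASHARE my_share ADD ALL FUNCTIONS IN SCHEMA public;
ALTER DATASHARE my_share SET INCLUDENEW = TRUE FOR SCHEMA public;

-- manual mode: share only the objects listed in the schema block
ALTER DATASHARE my_share ADD SCHEMA public;
ALTER DATASHARE my_share ADD TABLE public.users;
```

Rolling these up into two modes means the resource never has to reconcile a half-configured combination of `ALL TABLES` / `INCLUDENEW` settings.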
The update code turned out to be way more involved than I'd originally hoped, due to issues with terraform-plugin-sdk. What I really wanted for the nested schema blocks was to use blocks, but treat them in the backend as a map instead of a set/list (in other words, treat `name` as the unique identifier for the nested schema block). Unfortunately, `schema.TypeMap` can only store primitive types. Doing a simple hash function on the name meant Terraform wasn't picking up changes to the tables/functions inside the schema block. I tried this in combination with `CustomizeDiff` but never could get it to properly detect the changes.

I ended up taking inspiration from the `aws_security_group` resource in terraform-provider-aws, which also has to work around this issue of doing incremental updates to nested attributes in a `schema.TypeSet`. The terraform plan output makes it appear that we're completely dropping the schema from the datashare and then re-adding it (I'd hoped that hashing on the name would solve that problem, but instead Terraform simply didn't detect changes to the schema configuration). During the update, however, there's a bunch of extra logic to figure out which tables/functions actually need adding or removing. It's uglier than I'd like, but it works when none of the other approaches I've taken did.
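The add/remove reconciliation described above boils down to a set difference between the old and new object lists. A minimal Go sketch of that idea (`diffObjects` and the table names are hypothetical, not the actual provider code):

```go
package main

import "fmt"

// diffObjects compares the object list from the prior state with the list
// from the new configuration, and returns which objects need to be added
// to and removed from the datashare.
func diffObjects(oldObjs, newObjs []string) (toAdd, toRemove []string) {
	oldSet := make(map[string]bool, len(oldObjs))
	for _, o := range oldObjs {
		oldSet[o] = true
	}
	newSet := make(map[string]bool, len(newObjs))
	for _, n := range newObjs {
		newSet[n] = true
		if !oldSet[n] {
			toAdd = append(toAdd, n) // in the config, missing from the share
		}
	}
	for _, o := range oldObjs {
		if !newSet[o] {
			toRemove = append(toRemove, o) // dropped from the config
		}
	}
	return toAdd, toRemove
}

func main() {
	add, remove := diffObjects(
		[]string{"public.users", "public.orders"},
		[]string{"public.orders", "public.events"},
	)
	fmt.Println(add, remove) // [public.events] [public.users]
}
```

The real update logic then issues one `ALTER DATASHARE ... ADD/REMOVE` per entry instead of dropping and re-adding the whole schema, even though the plan output suggests otherwise.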