-
Notifications
You must be signed in to change notification settings - Fork 74
[cost monitoring] Configure and roll out cost-monitoring across AWS clusters #6515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Merging this PR will trigger the following deployment actions. Support deployments
Staging deployments
Production deployments
|
9732fad
to
419a510
Compare
helm-charts/support/values.jsonnet
Outdated
@@ -1,4 +1,6 @@ | |||
local cluster_name = std.extVar('2I2C_VARS.CLUSTER_NAME'); | |||
local provider_name = std.extVar('2I2C_VARS.PROVIDER'); | |||
local aws_account_id = std.extVar('2I2C_VARS.AWS_ACCOUNT_ID'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment on what this variable will resolve to if we aren't on AWS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment: "# undefined if provider_name != 'aws'"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be fine, since the variable is currently only needed if provider_name == 'aws'
in
'jupyterhub-cost-monitoring': if provider_name == 'aws' then configCostMonitoring else { enabled: false },
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually! I have changed the approach to include aws_account_id
as a top-level argument to handle the case where this is undefined for non-AWS cloud providers.
This is passed to the jsonnet via the deployer deploy-support
command based on the new key aws.active_cost_tags
under the cluster.yaml
file.
Two comments, otherwise this looks good! Do test with staging on both openscapes and ghg where we are doing a rename to make sure that they work correctly. |
2a45e30
to
9b7885d
Compare
@yuvipanda i have tested this on openscapeshub and nasa-ghg-hub and all looks good. If you decide to approve this, then I can slowly (and calmly :D) apply the terraform changes across all our AWS clusters over the weekend when user activity is low in case something gets broken. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly looks great! I want to expound a principle for our cluster.yaml and then ask for a specific change here.
The principle is that 'we should keep things in cluster.yaml as minimal as possible'. This means that every time we add something in there, we should check to see 'can this be done by another means?'. The primary goal here is to rely on things that are upstream (helm, z2jh) as much as possible. My intuition here makes me want to be particularly careful with things that are bools. billing.paid_by_us
is not really used anywhere for example.
In this case, it looks like the only point of active_cost_tags
is to enable or disable cost monitoring. If we were to apply the previous principle here, we can accomplish that goal with helm directly. support/values.jsonnet
takes precedence over any support.values.yaml
in a particular cluster. So I suggest we:
- Remove
billing.active_cost_tags
as a field - Document that for cases where we don't have cost tags enabled, we should set
jupyterhub-cost-monitoring.enabled
to false insupport.values.yaml
- Set this wherever needed.
This also means we don't have to have a field in cluster.yaml that's true
most of the time!
deployer/utils/jsonnet.py
Outdated
if provider is not None: | ||
command += ["--ext-str", f"2I2C_VARS.PROVIDER={provider}"] | ||
if aws_account_id is not None: | ||
command += ["--tla-str", f"aws_account_id={aws_account_id}"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use tla-str rather than ext-str here? If so, document it and if not let's use the same one everywhere.
Let's also use 2I2C_VARS.<ALL_CAPS>
as the format for variables we pass in? If that's not something we can do for tla let's find an equivalent one that very clearly marks it as a 2i2c specific variable we are passing in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, tla-str can be referenced in the support/values.jsonnet
if they are undefined, which is the case for support charts for non-AWS clusters. I've wrapped everything but the global --ext-str variables into a function so that the tla-str can be passed in.
tla-str does not accept the format proposed, so I have tweaked our 2i2c variable names to: VAR_2I2C_<variable_name>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deployer/infra_components/cluster.py
Outdated
} | ||
if ( | ||
self.spec["provider"] == "aws" | ||
and self.spec["aws"]["billing"]["active_cost_tags"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can pass this account id in unconditionally on all AWS things regardless of wether we enable cost monitoring or not. Let's generally try to keep the knowledge about what's in the cluster files and conditional logic as minimal as possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
helm-charts/support/values.jsonnet
Outdated
}, | ||
}, | ||
}, | ||
} | ||
'jupyterhub-cost-monitoring': if std.type(aws_account_id) != 'null' then configCostMonitoring(aws_account_id) else { enabled: false }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's unconditionally enable this on AWS here by checking on provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay! I've checked that the jupyterhub-cost-monitoring.enabled: false
override in the support/values.yaml
works as expected and clobbers the jsonnet config.
…defined for other cloud providers
…_tags = true in cluster.yaml
a970265
to
b5406b4
Compare
Okay, requested changes made! Thanks for the review – I got too laser-focused on "how much can i generalise this with jsonnet?" and ended up with too much conditional logic like you said. Absolutely
Docs are updated in #6488 :) |
Closes #6519
Summary
aws-ce-grafana-backend
deployment into a helm dependency on a standalone repo: https://github.com/2i2c-org/jupyterhub-cost-monitoring/aws.active_cost_tags
key incluster.yaml
so we know when to deploy the resources for cost monitoring based on whether cost allocation tags are active (programatically in terraform for AWS accounts we manage, but manually on our behalf for AWS accounts where only communities have access to billing)jupyterhub-cost-monitoring
k8s service account annotation withsupport/values.jsonnet
Breaking changes
openscapes
toopenscapeshub
andnasa-ghg
tonasa-ghg-hub
for consistency with AWS cluster namesname
key in thecluster.yaml
did not match theaws.clusterName
keyaws.clusterName
value into thesupport/values.jsonnet
to configure the standalone cost monitoring applicationaws.clusterName
is immutable, we rename references in other configs for consistency~/.aws/config
and~/.aws/credentials
on their local machinestwo-eye-two-see-org-terraform-state/terraform/state/pilot-hubs
–tfstate
files need to be renamed after merging this PRjupyterhub-cost-monitoring
IAM roleaws.account
key incluster.yaml
must be populated with the 12 digit account ID