
Commit 90184f0

[RFC-0010] Add workload identity support for remote generic clusters
Signed-off-by: Matheus Pimenta <[email protected]>
1 parent 315dad8 commit 90184f0

1 file changed, +110 -113 lines changed
rfcs/0010-multi-tenant-workload-identity/README.md

Lines changed: 110 additions & 113 deletions
@@ -641,36 +641,56 @@ itself while decrypting the secrets, so we don't need to introduce

 The `Kustomization` and `HelmRelease` APIs have the field
 `spec.kubeConfig.secretRef` for specifying a Kubernetes `Secret` containing
-a static kubeconfig file for accessing a remote Kubernetes cluster. We
-propose adding the following new fields, mutually exclusive with
-`spec.kubeConfig.secretRef`, for supporting workload identity
-for managed Kubernetes services from the cloud providers:
-- `spec.kubeConfig.provider`: the cloud provider to use for obtaining
-  the access token for the remote cluster, one of `aws`, `azure` or `gcp`.
-- `spec.kubeConfig.cluster`: the fully qualified name of the remote
-  cluster resource in the respective cloud provider. This would be used
-  to get the cluster CA certificate and the cluster API server address.
-- `spec.kubeConfig.address`: the optional address of the remote cluster
-  API server. Some cloud providers may have a list of addresses for the
-  remote cluster API server, so this field can be used to specify one
-  of them. If not specified, the controller would use the first address
-  in the list.
-- `spec.kubeConfig.serviceAccountName`: the optional Kubernetes
-  `ServiceAccount` to use for obtaining the access token for the
-  remote cluster, implementing object-level workload identity.
-
-For remote cluster access, the configured cloud identity, be it controller-level
+a static kubeconfig for accessing a remote Kubernetes cluster. We propose
+adding `spec.kubeConfig.configMapRef` for specifying a Kubernetes `ConfigMap`
+that is mutually exclusive with `spec.kubeConfig.secretRef` for supporting
+workload identity for both managed Kubernetes services from the cloud
+providers and also a `generic` provider. The fields in the `ConfigMap`
+would be the following:
+- `data.provider`: The provider to use for obtaining the temporary
+  `*rest.Config` for the remote cluster. One of `generic`, `aws`, `azure`
+  or `gcp`. Required.
+- `data.cluster`: Used only by `aws`, `azure` and `gcp`. The fully qualified
+  name of the cluster resource in the respective cloud provider API. Needed
+  for obtaining the unspecified fields `data.address` and `data["ca.crt"]`
+  (not required if both are specified).
+- `data.address`: The HTTPS address of the API server of the remote cluster.
+  Required for `generic`, optional for `aws`, `azure` and `gcp`.
+- `data.serviceAccountName`: The optional Kubernetes `ServiceAccount` to use
+  for obtaining access to the remote cluster, implementing object-level
+  workload identity. If not specified, the controller identity will be used.
+- `data.audiences`: The audiences Kubernetes `ServiceAccount` tokens must
+  be issued for as a list of strings in YAML format. Optional. Defaults to
+  `data.address` for `generic`, and has hardcoded default/specific values for
+  `aws`, `azure` and `gcp` depending on the provider.
+- `data["ca.crt"]`: The optional PEM-encoded CA certificate of the remote
+  cluster.
+
+For remote cluster access, the configured identity, be it controller-level
 or object-level, must have the necessary permissions to:
 - Access the cluster resource in the cloud provider API to get the
-  cluster CA certificate and the cluster API server address (or list of
-  addresses).
-- Apply resources in the remote cluster using the Kubernetes API, i.e.
-  the required Kubernetes RBAC permissions must be granted to the
-  cloud identity in the remote cluster.
-- When used with `spec.serviceAccountName`, the cloud identity must
-  have the necessary Kubernetes RBAC permissions to impersonate this
-  `ServiceAccount` in the remote cluster (related
-  [bug](https://github.com/fluxcd/pkg/issues/959)).
+  cluster CA certificate and the cluster API server address. This is
+  only necessary if one of `data.address` or `data["ca.crt"]` is not
+  specified in the `ConfigMap`. In other words, at least two of the
+  three fields `data.address`, `data["ca.crt"]` and `data.cluster`
+  must be specified. If both `data.address` and `data["ca.crt"]`
+  are specified, then the `data.cluster` field *must not* be specified,
+  the controller will error out if it is. If only `data.cluster` and
+  `data.address` are specified, then `data.address` has to match at
+  least one of the addresses of the cluster resource in the cloud
+  provider API. If only `data.cluster` and `data["ca.crt"]` are
+  specified, then the first address of the cluster resource in the
+  cloud provider API will be used as the address of the remote cluster
+  and the CA returned by the cloud provider API will be ignored.
+  If only `data.cluster` is specified, then the first address
+  of the cluster resource in the cloud provider API will be used.
+- The relevant permissions for applying and managing the target resources
+  in the remote cluster. For cloud providers this means either Kubernetes
+  RBAC or the cloud provider API permissions, as managed Kubernetes services
+  support authorizing requests through both ways.
+- When used with `spec.serviceAccountName`, the authenticated identity must
+  have the necessary permissions to impersonate this `ServiceAccount` in the
+  remote cluster (related [bug](https://github.com/fluxcd/pkg/issues/959)).

 To enable using the new `serviceAccountName` fields, we propose introducing
 a feature gate called `ObjectLevelWorkloadIdentity` in the controllers that
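
Illustrative note (not part of the commit): the field rules added in the hunk above can be read as a small validation routine. The Go snippet below is a minimal sketch under stated assumptions. The type `kubeConfigData` and the function `validateKubeConfigData` are hypothetical names that do not appear in the RFC. The added text asks for at least two of the three fields, yet it also describes the case where only `data.cluster` is set, so the sketch assumes the more permissive reading: for cloud providers `data.cluster` is required unless both `data.address` and `data["ca.crt"]` are given, and it is rejected when both are given.

```go
package main

import (
	"errors"
	"fmt"
)

// kubeConfigData mirrors the ConfigMap keys listed in the hunk above.
// Hypothetical type, used only for illustration.
type kubeConfigData struct {
	Provider string // data.provider
	Cluster  string // data.cluster
	Address  string // data.address
	CACert   string // data["ca.crt"]
}

// validateKubeConfigData checks the field combinations described in the RFC
// text. Assumption: for the generic provider only data.address is checked.
func validateKubeConfigData(d kubeConfigData) error {
	switch d.Provider {
	case "generic":
		if d.Address == "" {
			return errors.New("data.address is required for the generic provider")
		}
		return nil
	case "aws", "azure", "gcp":
		// Cloud providers fall through to the checks below.
	default:
		return fmt.Errorf("unsupported provider %q", d.Provider)
	}

	hasAddress := d.Address != ""
	hasCA := d.CACert != ""
	hasCluster := d.Cluster != ""

	if hasAddress && hasCA {
		// Nothing is left to look up, so data.cluster must not be set.
		if hasCluster {
			return errors.New("data.cluster must not be specified when data.address and data[\"ca.crt\"] are both set")
		}
		return nil
	}
	// At least one of address or CA is missing and has to come from the
	// cluster resource in the cloud provider API.
	if !hasCluster {
		return errors.New("data.cluster is required unless data.address and data[\"ca.crt\"] are both set")
	}
	return nil
}
```

Under these assumptions, `validateKubeConfigData(kubeConfigData{Provider: "aws", Cluster: "my-cluster"})` returns `nil`, while adding both an address and a CA certificate to that same value produces an error. Matching `data.address` against the addresses returned by the cloud provider API needs a network call and is left out of the sketch.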
@@ -887,16 +907,19 @@ package auth
 // Options contains options for configuring the behavior of the provider methods.
 // Not all providers/methods support all options.
 type Options struct {
-	Client         client.Client
-	Cache          *cache.TokenCache
-	ServiceAccount *client.ObjectKey
-	InvolvedObject cache.InvolvedObject
-	Scopes         []string
-	STSRegion      string
-	STSEndpoint    string
-	ProxyURL       *url.URL
-	ClusterAddress string
-	AllowShellOut  bool
+	Client          client.Client
+	Cache           *cache.TokenCache
+	ServiceAccount  *client.ObjectKey
+	InvolvedObject  cache.InvolvedObject
+	Audiences       []string
+	Scopes          []string
+	STSRegion       string
+	STSEndpoint     string
+	ProxyURL        *url.URL
+	ClusterResource string
+	ClusterAddress  string
+	CAData          string
+	AllowShellOut   bool
 }

 // WithServiceAccount sets the ServiceAccount reference for the token
@@ -911,25 +934,61 @@ func WithCache(cache cache.TokenCache, involvedObject cache.InvolvedObject) Opti
 	// ...
 }

+// WithAudiences sets the audiences for the Kubernetes ServiceAccount token.
+func WithAudiences(audiences ...string) Option {
+	// ...
+}
+
 // WithScopes sets the scopes for the token.
 func WithScopes(scopes ...string) Option {
 	// ...
 }

-// WithSTSEndpoint sets the endpoint for the STS service.
-func WithSTSEndpoint(stsEndpoint string) Option {
+// WithSTSRegion sets the region for the STS service (some cloud providers
+// require a region, e.g. AWS).
+func WithSTSRegion(stsRegion string) Option {
 	// ...
 }

-// WithSTSRegion sets the region for the STS service.
-func WithSTSRegion(stsRegion string) Option {
+// WithSTSEndpoint sets the endpoint for the STS service.
+func WithSTSEndpoint(stsEndpoint string) Option {
 	// ...
 }

 // WithProxyURL sets a *url.URL for an HTTP/S proxy for acquiring the token.
 func WithProxyURL(proxyURL url.URL) Option {
 	// ...
 }
+
+// WithCAData sets the CA data for credentials that require a CA,
+// e.g. for Kubernetes REST config.
+func WithCAData(caData string) Option {
+	// ...
+}
+
+// WithClusterResource sets the cluster resource for creating a REST config.
+// Must be the fully qualified name of the cluster resource in the cloud
+// provider API.
+func WithClusterResource(clusterResource string) Option {
+	// ...
+}
+
+// WithClusterAddress sets the cluster address for creating a REST config.
+// This address is used to select the correct cluster endpoint and CA data
+// when the provider has a list of endpoints to choose from, or to simply
+// validate the address against the cluster resource when the provider
+// returns a single endpoint. This is optional, providers returning a list
+// of endpoints will select the first one if no address is provided.
+func WithClusterAddress(clusterAddress string) Option {
+	// ...
+}
+
+// WithAllowShellOut allows the provider to shell out to binary tools
+// for acquiring controller tokens. MUST be used only by the Flux CLI,
+// i.e. in the github.com/fluxcd/flux2 Git repository.
+func WithAllowShellOut() Option {
+	// ...
+}
 ```

 The `auth/aws/aws.go`, `auth/azure/azure.go` and
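
Illustrative note (not part of the commit): the diff elides every option body with `// ...`. The Go snippet below is a minimal sketch of how such functional options are commonly wired. It assumes that `Option` is a function mutating `*Options`, which the diff does not show, and it uses a trimmed-down `Options` struct holding only the plain string fields so that it compiles without the controller-runtime and cache packages. The `newOptions` helper is a hypothetical name.

```go
package main

// Options is a reduced stand-in for the struct in the diff. It keeps only
// the fields that need no external packages.
type Options struct {
	Audiences       []string
	ClusterResource string
	ClusterAddress  string
	CAData          string
}

// Option mutates Options. Assumption: the real definition is not part of the
// diff and may differ.
type Option func(*Options)

// WithAudiences sets the audiences for the Kubernetes ServiceAccount token.
func WithAudiences(audiences ...string) Option {
	return func(o *Options) { o.Audiences = audiences }
}

// WithClusterResource sets the fully qualified cluster resource name.
func WithClusterResource(clusterResource string) Option {
	return func(o *Options) { o.ClusterResource = clusterResource }
}

// WithClusterAddress sets the optional cluster API server address.
func WithClusterAddress(clusterAddress string) Option {
	return func(o *Options) { o.ClusterAddress = clusterAddress }
}

// WithCAData sets the PEM-encoded CA data.
func WithCAData(caData string) Option {
	return func(o *Options) { o.CAData = caData }
}

// newOptions applies the given options in order to a zero-valued Options.
// Hypothetical helper, not taken from the RFC.
func newOptions(opts ...Option) Options {
	var o Options
	for _, opt := range opts {
		opt(&o)
	}
	return o
}
```

A caller would write something like `newOptions(WithClusterResource("my-cluster"), WithAudiences("https://example.com"))`. Fields that no option touches keep their zero value, which matches the way the RFC treats `data.address` and `data["ca.crt"]` as optional inputs.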
@@ -1098,75 +1157,9 @@ metadata:

 #### Cache Key

-The cache key must include the following components:
-
-* The cloud provider name.
-* The optional `ServiceAccount` reference and cloud provider identity.
-  The identity is the string representing the identity which the `ServiceAccount`
-  is impersonating, e.g. for `gcp` this would be a GCP IAM Service Account email,
-  for `aws` this would be an AWS IAM Role ARN, etc. When there is no identity
-  configured for impersonation, only the `ServiceAccount` reference is included.
-* The optional scopes added to the token.
-* The optional STS region used for issuing the token.
-* The optional STS endpoint used for issuing the token.
-* The optional proxy URL when the STS endpoint is present.
-* The cache key extracted from the optional artifact repository.
-* The cluster resource name and address if specified.
-
-##### Justification
-
-When single-tenant workload identity is being used, the identity associated with
-the controller is the one represented by the token, so there is no identity or
-`ServiceAccount` to identify in the cache key besides the implicit ones associated
-with the controller. In this case, including only the cloud provider name in the
-cache key is enough.
-
-In multi-tenant workload identity, the reason for including both the `ServiceAccount`
-and the identity in the cache key is to establish the fact that the `ServiceAccount`
-had permission to impersonate the identity at the time when the token was issued.
-This is very important. For the sake of the argument, suppose we include only the
-identity. Then a malicious actor could specify any identity in their `ServiceAccount`
-and get a token cached for that identity even if their `ServiceAccount` did not have
-permission to impersonate that identity. We also need to include the identity in the
-cache key because, otherwise, if including only the `ServiceAccount`, changes to the
-`ServiceAccount` annotations to impersonate a different identity would not cause a
-new token impersonating the new identity to be created since the cache key did not
-change.
-
-The scopes are included in the cache key because they delimit the permissions that
-the token has. They don't *grant* the permissions, they just set an upper bound for
-the permissions that the token can have. Providers requiring scopes unfortunately
-benefit less from caching, e.g. a token issued for an Azure identity can't be
-seamlessly used for both Azure DevOps and the Azure Container Registry, because the
-respective scopes are different, so the issued tokens are different.
-
-The STS region is included in the cache key because it could influence how the
-token is fetched and ultimately issued. For example, in AWS the STS endpoint is
-constructed using the region, so if the region is different, the endpoint is
-different, and hence the cache key must be different as well.
-
-The STS endpoint and proxy URL are included in the cache key because they could
-influence how the token is fetched and ultimately issued. The proxy URL is included
-only when the STS endpoint is present, because all the default STS endpoints are
-HTTPS and belong to cloud providers, so they are all well-known, unique, and the
-proxy is guaranteed not to tamper with the issuance of the token since it only
-sees an opaque TLS session passing through.
-
-In most cases container registry credentials require an additional token exchange
-at the end. In order to benefit from caching the final token and freeing the
-library consumers from this responsibility, we allow an image repository to
-be included in the options and implement the exchange. Depending on the cloud
-provider, a part of the image repository string is extracted and used to issue
-the token, e.g. for ECR the region is extracted and used to configure the client,
-and in the case of ACR the registry host is included in the resulting token.
-Those parts of the image repository must be included in the cache key. This is
-accomplished by the `Provider.ParseArtifactRepository()` method. In the case of GCP
-container registries the image repository does not influence how the token is
-issued.
-
-The cluster resource name and address are included in the cache key because
-they necessarily influence how the credentials are built and stored in the
-cache.
+The cache key *MUST* include *ALL* the inputs specified for acquiring the
+temporary credentials, as they all obviously influence how the credentials
+are created.

 ##### Format

@@ -1180,20 +1173,22 @@ scopes=<comma-separated-scopes>
 stsRegion=<sts-region>
 stsEndpoint=<sts-endpoint>
 proxyURL=<proxy-url>
+caData=<ca-data>
 ```

 Multi-tenant/object-level access token cache key:

 ```
 provider=<cloud-provider-name>
-providerAudience=<cloud-provider-audience>
 providerIdentity=<cloud-provider-identity>
 serviceAccountName=<service-account-name>
 serviceAccountNamespace=<service-account-namespace>
+serviceAccountTokenAudiences=<comma-separated-audiences>
 scopes=<comma-separated-scopes>
 stsRegion=<sts-region>
 stsEndpoint=<sts-endpoint>
 proxyURL=<proxy-url>
+caData=<ca-data>
 ```

 Artifact registry credentials:
@@ -1206,7 +1201,9 @@ artifactRepositoryCacheKey=<'gcp'-for-gcp|registry-region-for-aws|registry-host-
 REST config:

 ```
-accessTokenCacheKey=sha256(<access-token-cache-key>)
+accessToken1CacheKey=sha256(<cache-key-for-access-token-1>)
+...
+accessTokenNCacheKey=sha256(<cache-key-for-access-token-N>)
 cluster=<cluster-resource-name>
 address=<cluster-api-server-address>
 ```
