A controlled network bridge that enables workloads running in air-gapped VPCs (no Internet Gateway, no NAT) to validate AWS Verified Access (AVA) signed JWTs.
AVA publishes its ES384 signing public keys on an internet-reachable endpoint. Workloads inside air-gapped VPCs cannot reach that endpoint directly. This sample provides a narrowly scoped Lambda function with internet egress (the "fetcher") and a consumer-side Interface VPC Endpoint for the Lambda service, so that the workload can invoke the fetcher over AWS PrivateLink without any traffic leaving the AWS network from the consumer side.
The fetcher accepts only a UUID-shaped kid, constructs the AVA URL server-side, fetches the PEM, validates it is an ES384 public key, and returns it. It cannot be coerced into fetching anything else.
flowchart LR
    subgraph consumer["Consumer Account"]
        subgraph vpc["Air-Gapped VPC · no Internet route"]
            workload["Workload<br/>EC2 / ECS / Lambda"]
            vpce["Lambda<br/>VPC Endpoint"]
        end
    end
    subgraph producer["Producer Account"]
        fetcher["Fetcher Lambda<br/>python3.14 · arm64"]
    end
    ava[/"AVA Public Keys<br/>public-keys.prod.verified-access.<br/>{REGION}.amazonaws.com"/]
    workload -- "InvokeFunction {kid}" --> vpce
    vpce -- "PrivateLink" --> fetcher
    fetcher -- "HTTPS GET" --> ava

Key facts:
- The consumer VPC has no IGW and no NAT. Its only egress path for this purpose is the Lambda Interface VPC Endpoint.
- The VPC endpoint policy restricts `lambda:InvokeFunction` to the fetcher's ARN. Any other Lambda ARN is denied at the endpoint.
- The fetcher runs in the Lambda-managed network (not VPC-attached). It is the only component with internet reachability.
- The fetcher has no function URL, no API Gateway, no ALB, and no event source mapping. Its only invocation path is `lambda:InvokeFunction`.
- Cross-account trust is explicit: the fetcher's resource policy allows only the declared consumer principal ARNs.
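
As a rough illustration of the endpoint-policy fact above, the following sketch builds a policy document that allows `lambda:InvokeFunction` on a single function ARN. The helper name `build_endpoint_policy` and the wildcard principal are assumptions made for illustration; the policy shipped in this sample's IaC may be stricter or shaped differently.

```python
import json


def build_endpoint_policy(fetcher_arn: str) -> dict:
    """Return a VPC endpoint policy document scoped to one Lambda function.

    Hypothetical helper: the Principal of "*" is an assumption, not taken
    from this sample's stacks. Requests that match no Allow statement are
    implicitly denied at the endpoint.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "lambda:InvokeFunction",
                "Resource": fetcher_arn,
            }
        ],
    }


def render_endpoint_policy(fetcher_arn: str) -> str:
    """Serialize the policy as JSON, the form IaC tools typically expect."""
    return json.dumps(build_endpoint_policy(fetcher_arn), indent=2)
```

Because the only statement names the fetcher's ARN, an invoke request for any other function through the endpoint matches nothing and is rejected before it reaches the Lambda service.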
This sample supports the commercial AWS partition (aws) only.
GovCloud (aws-us-gov), China (aws-cn), and other non-commercial partitions are out of scope.
This system is a network bridge from an air-gapped VPC to the public internet. The design ensures it cannot be abused as a generic egress primitive or arbitrary-fetch gadget. Three primary threats drive the architecture:
| Threat | Description | Mitigation |
|---|---|---|
| T1 — SSRF via caller-supplied URL | A caller attempts to supply a URL or hostname to coerce the fetcher into reaching an attacker-controlled endpoint. | The URL is constructed entirely server-side from the region (parsed from the fetcher's own ARN) and the validated kid. The caller has no URL, scheme, host, or path input. |
| T4 — Fetcher used as generic egress | A compromised or malicious principal invokes the fetcher to use it as an arbitrary internet fetch tool. | The fetcher has exactly one behavior: fetch the AVA public-keys URL constructed from a UUID-shaped kid. There is no "arbitrary URL" code path. The resource policy further restricts invoke to designated consumer principals. |
| T6 — VPCE used to invoke other Lambdas | A workload in the consumer VPC attempts to use the Lambda VPC endpoint to invoke Lambda functions other than the fetcher. | The VPC endpoint policy scopes lambda:InvokeFunction to the fetcher ARN only. Invocation of any other ARN is denied at the endpoint. |
Additional threats (path injection via crafted kid, upstream response tampering, key staleness, log leakage, cross-account misconfiguration) are addressed in the design but are secondary to the three above.
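
The T1 mitigation can be sketched as follows: the caller contributes only a `kid`, which must match the UUID shape in full, and the host is derived from the region field of the function's own ARN. The function names here are hypothetical, and placing the `kid` directly after the host as the URL path is an assumption; the real handler in src/fetcher/ may differ in detail.

```python
import re

# UUID shape as documented for the InvalidKid error in this README.
_KID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)

# Host pattern from the architecture diagram.
_HOST_TEMPLATE = "public-keys.prod.verified-access.{region}.amazonaws.com"


def region_from_arn(function_arn: str) -> str:
    """Extract the region from arn:aws:lambda:<region>:<account>:function:<name>."""
    parts = function_arn.split(":")
    if len(parts) < 7 or parts[0] != "arn" or parts[2] != "lambda" or not parts[3]:
        raise ValueError("not a Lambda function ARN")
    return parts[3]


def build_key_url(function_arn: str, kid: object) -> str:
    """Build the public-key URL; the caller never supplies scheme, host, or path."""
    # fullmatch (rather than match or search) rejects trailing newlines and
    # any extra characters that could alter the path.
    if not isinstance(kid, str) or _KID_RE.fullmatch(kid) is None:
        raise ValueError("InvalidKid")
    host = _HOST_TEMPLATE.format(region=region_from_arn(function_arn))
    # Assumption: the key is served at /<kid> on that host.
    return f"https://{host}/{kid}"
```

Inside a Lambda handler, the function's own ARN would typically come from `context.invoked_function_arn`, so the region is never taken from the request payload.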
This sample ships two equivalent IaC implementations. Choose whichever matches your toolchain:
| Option | Directory | Docs |
|---|---|---|
| CDK (Python) | infra/ | Below |
| Terraform | terraform/ | terraform/README.md |
Both share the same handler code in src/fetcher/.
- AWS CDK v2 installed
- uv installed (Python project manager)
- Two AWS accounts (or one account playing both roles): a producer account for the fetcher and a consumer account for the air-gapped VPC workload
- AWS CLI profiles configured for each account with credentials that have sufficient permissions to deploy CDK stacks
uv sync

Both accounts must be bootstrapped in the target region before the first deployment.
# Producer account
uv run cdk bootstrap aws://<PRODUCER_ACCOUNT_ID>/<REGION> \
--profile <PRODUCER_PROFILE>
# Consumer account
uv run cdk bootstrap aws://<CONSUMER_ACCOUNT_ID>/<REGION> \
--profile <CONSUMER_PROFILE>

CDK resolves the account ID from the profile's credentials.
uv run cdk deploy FetcherStack \
--profile <PRODUCER_PROFILE> \
--region <REGION> \
--context consumer_principal_arns='["arn:aws:iam::<CONSUMER_ACCOUNT_ID>:role/<CONSUMER_ROLE_NAME>"]'

Capture the FetcherFunctionArn output from the deployment.
uv run cdk deploy ConsumerStack \
--profile <CONSUMER_PROFILE> \
--region <REGION> \
--context fetcher_function_arn="<FETCHER_FUNCTION_ARN_FROM_STEP_3>" \
--context vpc_id="<YOUR_VPC_ID>" \
--context vpc_cidr="<YOUR_VPC_CIDR>" \
--context subnet_ids='["<SUBNET_1>","<SUBNET_2>"]'

This creates the Lambda Interface VPC Endpoint (with a policy scoped to the fetcher ARN) and a security group allowing HTTPS (443) from the VPC CIDR to the endpoint ENIs.
Important: The ConsumerStack does not create an IAM role. Your workload's existing execution role must include the following policy statement to invoke the fetcher:
{
  "Effect": "Allow",
  "Action": "lambda:InvokeFunction",
  "Resource": "<FETCHER_FUNCTION_ARN_FROM_STEP_3>"
}

The following Python snippet shows how a workload inside the air-gapped VPC invokes the fetcher and verifies an AVA JWT. This is a reference only; it is not deployed as infrastructure.
import json
import os
from collections import OrderedDict

import boto3
import jwt

FETCHER_ARN = os.environ["AVA_FETCHER_ARN"]
EXPECTED_SIGNER = os.environ["AVA_INSTANCE_ARN"]

_lambda = boto3.client("lambda")

# Process-local LRU cache: kid -> PEM string.
# Bounded to _MAX_CACHE_SIZE entries so that long-running processes do not
# accumulate stale keys indefinitely after AVA rotates its signing key.
# Override via the AVA_PEM_CACHE_SIZE environment variable if needed.
_MAX_CACHE_SIZE = int(os.environ.get("AVA_PEM_CACHE_SIZE", "10"))
_cache: OrderedDict[str, str] = OrderedDict()


def _resolve_pem(kid: str) -> str:
    """Resolve a PEM for the given kid, using a bounded local cache."""
    cached = _cache.get(kid)
    if cached is not None:
        _cache.move_to_end(kid)  # mark as recently used
        return cached

    resp = _lambda.invoke(
        FunctionName=FETCHER_ARN,
        InvocationType="RequestResponse",
        Payload=json.dumps({"kid": kid}).encode(),
    )
    body = json.loads(resp["Payload"].read())
    if "error" in body:
        raise RuntimeError(f"fetcher error: {body['error']}")

    pem = body["pem"]
    _cache[kid] = pem
    if len(_cache) > _MAX_CACHE_SIZE:
        _cache.popitem(last=False)  # evict the least recently used entry
    return pem


def verify_ava_jwt(token: str) -> dict:
    """Verify an AVA-signed JWT using the fetcher for key resolution."""
    # 1. Extract kid from the JWT header (no signature check yet).
    header = jwt.get_unverified_header(token)
    kid = header["kid"]

    # 2. Resolve PEM via the fetcher (cache hit or remote invoke).
    pem = _resolve_pem(kid)

    # 3. Verify signature. jwt.decode enforces exp by default.
    claims = jwt.decode(
        token,
        pem,
        algorithms=["ES384"],
        options={"require": ["exp", "iss"]},
    )

    # 4. Verify the signer claim matches the expected AVA instance ARN.
    if claims.get("signer") != EXPECTED_SIGNER:
        raise jwt.InvalidIssuerError("signer ARN mismatch")

    return claims

Key points:
- `kid` is extracted from the JWT header via `jwt.get_unverified_header`.
- The cache is keyed by `kid` alone. Each `kid` identifies a unique signing key, and the fetcher already binds to a specific region via its own ARN, so there is no ambiguity.
- The cache is bounded to `_MAX_CACHE_SIZE` entries using an `OrderedDict` as a simple LRU. This prevents unbounded memory growth in long-running processes as AVA rotates signing keys (see below).
- On cache miss, the fetcher is invoked via `boto3` with `InvocationType="RequestResponse"` and a payload of `{"kid": kid}`.
- JWT verification uses `jwt.decode` with `algorithms=["ES384"]`.
- The `signer` claim is verified against the `AVA_INSTANCE_ARN` environment variable, as required by the AWS Verified Access documentation.

AVA rotates its signing keys approximately every 7 days by issuing a new kid. The old key remains valid during an overlap period so that in-flight JWTs can still be verified. Because each kid maps to a unique, immutable public key, cached entries never become incorrect — they just become unused once AVA stops signing with that kid.
For long-running processes (web servers, ECS tasks, daemons), the bounded LRU cache ensures that old entries are evicted naturally as new kid values arrive. The default cache size of 10 is generous — under normal rotation cadence there are at most 2 active kid values at any time (current + previous overlap). Set the AVA_PEM_CACHE_SIZE environment variable to override the default if your workload handles traffic from multiple AVA instances or has unusual rotation patterns.
Short-lived processes (Lambda functions, batch jobs) do not need cache pruning because the cache is discarded when the process exits.
The fetcher returns structured error responses as {"error": "<code>", "message": "<detail>"}.
| Error Code | When Returned |
|---|---|
| `InvalidKid` | The `kid` field is missing, is not a string, or does not match the UUID shape (`[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}`). Also returned when the payload is not a JSON object. |
| `ExtraFields` | The payload contains keys other than `kid`. |
| `UpstreamNotFound` | The AVA public-keys endpoint returned HTTP 404 for the requested `kid`. The key does not exist or has been rotated out. |
| `UpstreamError` | The AVA public-keys endpoint returned a non-2xx, non-404 HTTP status. Retry with exponential backoff (max 2 retries). |
| `UpstreamTimeout` | The HTTPS request to the AVA public-keys endpoint timed out. Retry with exponential backoff (max 2 retries). |
| `InvalidPem` | The upstream response did not decode as UTF-8 or did not conform to the PEM SubjectPublicKeyInfo envelope. This is alarm-worthy: the upstream endpoint may be misbehaving. Do not retry. |
| `InvalidKeyType` | The upstream response parsed as a valid PEM public key but the algorithm/curve is not ES384 (SECP384R1). This is alarm-worthy. Do not retry. |

Error messages never include caller-supplied input values.
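
One way a consumer might act on the retry guidance in the table is sketched below. The names `fetch_pem_with_retry` and `FetcherError`, the 0.5 second base delay, and the injectable `invoke` and `sleep` callables are all hypothetical. Treating `UpstreamNotFound` as non-retryable is also an assumption, since the table gives no retry guidance for it.

```python
import time
from typing import Callable

# Codes the table marks as retryable with exponential backoff.
RETRYABLE = frozenset({"UpstreamError", "UpstreamTimeout"})


class FetcherError(RuntimeError):
    """Raised when the fetcher returns a structured error response."""

    def __init__(self, code: str) -> None:
        super().__init__(f"fetcher error: {code}")
        self.code = code


def fetch_pem_with_retry(
    invoke: Callable[[str], dict],
    kid: str,
    max_retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call invoke(kid) and retry only on retryable error codes.

    invoke must return the fetcher's decoded JSON body: either a dict
    containing "pem", or {"error": <code>, "message": <detail>}.
    """
    attempt = 0
    while True:
        body = invoke(kid)
        code = body.get("error")
        if code is None:
            return body["pem"]
        # Give up immediately on non-retryable codes, or once the retry
        # budget (max_retries additional attempts) is spent.
        if code not in RETRYABLE or attempt >= max_retries:
            raise FetcherError(code)
        sleep(base_delay * (2 ** attempt))  # 0.5 s, then 1.0 s by default
        attempt += 1
```

Passing the invoke step in as a callable keeps the retry policy separate from the `boto3` call and lets it be exercised without network access. A caller could catch `FetcherError` and raise an alarm when `code` is `InvalidPem` or `InvalidKeyType`.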
See LICENSE.