| 1 | +[//]: # (Copyright Amazon.com Inc. or its affiliates. All Rights Reserved.) |
| 2 | +[//]: # (SPDX-License-Identifier: CC-BY-SA-4.0) |
| 3 | + |
| 4 | +# The AWS Encryption SDK (ESDK) and the keyring trace |
| 5 | + |
| 6 | +## Background |
| 7 | + |
| 8 | +When we designed keyrings, |
| 9 | +we added a concept of a "keyring trace" |
| 10 | +that the keyring uses to communicate |
| 11 | +what actions it took. |
| 12 | +This is an evolution of earlier indicators |
| 13 | +in the decryption API that identified which master key |
| 14 | +decrypted the data key. |
| 15 | +In both cases, we exposed the data to the caller |
| 16 | +but did not include any guidance on what they should do with it, |
| 17 | +how to interact with it, |
| 18 | +or why it is important. |
| 19 | +This is similar to how we treat encryption context |
| 20 | +in the encryption and decryption API results. |
| 21 | + |
| 22 | +## Goals |
| 23 | + |
| 24 | +Our goal is to determine whether, and if so how, |
| 25 | +we should expose the keyring trace. |
| 26 | + |
| 27 | +## Success Measurements |
| 28 | + |
| 29 | +We will know we are succeeding if we can assemble |
| 30 | +multiple known customer problems that we think keyring traces solve |
| 31 | +and present examples that address each problem |
| 32 | +that _either_ demonstrate why keyring traces are needed |
| 33 | +and how they solve those problems |
| 34 | +_or_ demonstrate why keyring traces are not needed. |
| 35 | + |
| 36 | +## Out of Scope |
| 37 | + |
| 38 | +Anything that requires us to add API surface area, |
| 39 | +whether that is modifying existing APIs or interfaces, |
| 40 | +must be treated as new features. |
| 41 | +All new features must be |
| 42 | +reviewed through the specification modification process. |
| 43 | + |
| 44 | +## Issues and Alternatives |
| 45 | + |
| 46 | +As they exist today, |
| 47 | +keyring traces are not very usable, |
| 48 | +but more importantly |
| 49 | +we never explain or show why they should be used. |
| 50 | + |
| 51 | +Each issue depends on |
| 52 | +the answer to the previous issue. |
| 53 | + |
| 54 | +_Preferred options are in italics._ |
| 55 | + |
| 56 | +**New feature requirements are in bold.** |
| 57 | + |
| 58 | +### Issue 0: Why should callers interact with the keyring trace? |
| 59 | + |
| 60 | +If we cannot define a clear purpose for the keyring trace |
| 61 | +that is not already met by other ESDK framework components, |
| 62 | +we should not expose it to callers. |
| 63 | +This needs to include not only |
| 64 | +an explanation of what problems the keyring trace solves, |
| 65 | +but also guidance on how to use the keyring trace |
| 66 | +to solve those problems |
| 67 | +and where in the framework those problems should be solved. |
| 68 | + |
| 69 | +- _Option: They shouldn't._ |
| 70 | + |
| 71 | + - If we cannot come up with any problems |
| 72 | + that the keyring trace solves in its current state, |
| 73 | + then we should not expose it to customers in any way |
| 74 | + and we should not mention it in any documentation or examples. |
| 75 | + It should remain an implementation detail |
| 76 | + until or unless we find a use for it. |
| 77 | + |
| 78 | +- Option: Asynchronous audit log. |
| 79 | + |
| 80 | + - Writing the keyring trace to an audit log |
| 81 | + would give customers useful metrics on |
| 82 | + how they are using the ESDK |
| 83 | + throughout their systems. |
| 84 | + |
| 85 | + - counter: This just moves the question of "why" down the road. |
| 86 | + |
| 87 | +- Option: Data protection controls. |
| 88 | + |
| 89 | + - Not all keyrings provide the same protections. |
| 90 | + One use of the keyring trace could be to |
| 91 | + validate that certain protections were applied to |
| 92 | + the encrypted data keys in an encrypted message. |
| 93 | + |
| 94 | + - ex: Require that all keyrings that encrypted the data key |
| 95 | + also signed the encryption context. |
| 96 | + |
| 97 | + - alternative: Inspect keyrings before use |
| 98 | + to check that they meet your requirements. |
| 99 | + |
| 100 | +- Option: Live usage audit. |
| 101 | + |
| 102 | + - Because keyring behaviors can get complex, |
| 103 | + a live audit of keyring actions |
| 104 | + could be useful to enforce wrapping key requirements. |
| 105 | + |
| 106 | + - ex: Allow only AWS KMS wrapping keys within a specific account |
| 107 | + on decryption. |
| 108 | + |
| 109 | + - alternative: Make a keyring that filters out undesirable EDKs. |
| 110 | + |
| 111 | + - If a customer accepts encrypted messages from unverified sources, |
| 112 | + they might want to reject encrypted messages |
| 113 | + that contain EDKs for unknown wrapping keys |
| 114 | + and that use unsigned algorithm suites. |
| 115 | + |
| 116 | + - alternative: Make a CMM that checks these requirements |
| 117 | + before attempting to decrypt any EDKs. |
| 118 | + |
| 119 | +- Option: Notification of failures and no-ops on decryption. |
| 120 | + |
| 121 | + - **Requires adding a new keyring trace action flag.** |
| 122 | + - Because CMMs and keyrings can be deeply nested |
| 123 | + and keyrings do not halt decryption |
| 124 | + if they encounter an error on decrypt, |
| 125 | + it can be difficult to determine |
| 126 | + why a decryption request failed. |
| 127 | + Requiring keyrings to add keyring trace entries |
| 128 | + that describe no-op and failure events |
| 129 | + would help a caller determine |
| 130 | + why no EDKs could be decrypted. |
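
The "data protection controls" option above could be sketched roughly as follows. This is a minimal illustration, not the ESDK API: `TraceFlag`, `TraceEntry`, and `all_encryptors_signed` are hypothetical names, and the sketch assumes that a keyring records every action it took on a data key in a single trace entry.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Iterable


class TraceFlag(Flag):
    # Hypothetical names for two of the actions this document lists.
    ENCRYPTED_DATA_KEY = auto()
    SIGNED_ENCRYPTION_CONTEXT = auto()


@dataclass(frozen=True)
class TraceEntry:
    # Hypothetical entry shape: which key acted, and what it did.
    key_namespace: str
    key_name: str
    flags: TraceFlag


def all_encryptors_signed(trace: Iterable[TraceEntry]) -> bool:
    """Return True if every entry that encrypted the data key also signed the encryption context."""
    return all(
        TraceFlag.SIGNED_ENCRYPTION_CONTEXT in entry.flags
        for entry in trace
        if TraceFlag.ENCRYPTED_DATA_KEY in entry.flags
    )


# Illustrative entries; the namespaces and key names are made up.
signed = TraceEntry(
    "aws-kms",
    "key-a",
    TraceFlag.ENCRYPTED_DATA_KEY | TraceFlag.SIGNED_ENCRYPTION_CONTEXT,
)
unsigned = TraceEntry("raw-aes", "key-b", TraceFlag.ENCRYPTED_DATA_KEY)
```

A check like this only inspects what happened after the fact. The "inspect keyrings before use" alternative would instead reject an unsuitable keyring before it touches any data key.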
| 131 | + |
| 132 | +### Issue 1: What should callers read from the keyring trace? |
| 133 | + |
| 134 | +The keyring trace is defined as a list of entries, |
| 135 | +each entry composed of |
| 136 | +one or more action flags that describe what a keyring did, |
| 137 | +as well as information that identifies the keyring that performed those actions. |
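
As a rough illustration of that definition, a trace could be modeled as below. The class and member names are hypothetical and are not taken from any ESDK implementation. The flags mirror the five successful actions listed under Issue 4, and the identifier is assumed to be a key namespace and key name pair.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import List


class KeyringTraceFlag(Flag):
    # Hypothetical flag names, one per action named in this document.
    GENERATED_DATA_KEY = auto()
    ENCRYPTED_DATA_KEY = auto()
    SIGNED_ENCRYPTION_CONTEXT = auto()
    DECRYPTED_DATA_KEY = auto()
    VERIFIED_ENCRYPTION_CONTEXT = auto()


@dataclass(frozen=True)
class KeyringTraceEntry:
    # Assumed identifier: the key namespace and key name that acted.
    key_namespace: str
    key_name: str
    # One or more action flags, combined with "|".
    flags: KeyringTraceFlag


# An illustrative trace for one encrypt call with two keyrings.
trace: List[KeyringTraceEntry] = [
    KeyringTraceEntry(
        "aws-kms",
        "key-a",
        KeyringTraceFlag.GENERATED_DATA_KEY | KeyringTraceFlag.ENCRYPTED_DATA_KEY,
    ),
    KeyringTraceEntry(
        "raw-rsa",
        "key-b",
        KeyringTraceFlag.ENCRYPTED_DATA_KEY,
    ),
]
```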
| 138 | + |
| 139 | +1. _Option: Both the action flag and the keyring identifier._ |
| 140 | + |
| 141 | + - If both the action taken and the keyring that took it are important, |
| 142 | + the caller MUST be able to connect a trace entry |
| 143 | + to a keyring instance. |
| 144 | + |
| 145 | +1. Option: Nothing. |
| 146 | + |
| 147 | + - If the keyring trace is intended solely for asynchronous audit, |
| 148 | + the caller should not interact with it at runtime. |
| 149 | + |
| 150 | +1. Option: Only the action flag values. |
| 151 | + |
| 152 | + - If the primary value is in the action taken |
| 153 | + rather than the keyring that took that action, |
| 154 | + the caller should not attempt to connect a trace entry |
| 155 | + to a keyring instance |
| 156 | + or to an EDK. |
| 157 | + |
| 158 | +1. Option: Only the keyring identifier. |
| 159 | + |
| 160 | + - Included for completeness. |
| 161 | + If the only thing that is important is which keyrings took any action, |
| 162 | + the keyring trace is already overly complicated. |
| 163 | + |
| 164 | +### Issue 2: How should callers interact with the keyring trace? |
| 165 | + |
| 166 | +More than one of these options might be necessary, |
| 167 | +depending on the answer to **Issue 1**. |
| 168 | + |
| 169 | +1. Option: Given an action flag, find all entries containing that flag. |
| 170 | + |
| 171 | + - This is straightforward and already possible |
| 172 | + with the current structure of the keyring trace entries. |
| 173 | + |
| 174 | +1. Option: Given a keyring, find all entries created by that keyring. |
| 175 | + |
| 176 | + - **This will likely require an addition to the keyring interface.** |
| 177 | + - Because keyrings can have more than one key namespace and key name, |
| 178 | + connecting a keyring to one or more trace entries can be difficult. |
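
The difference between the two lookups can be sketched as follows, using hypothetical `TraceFlag` and `TraceEntry` types. Filtering by flag needs nothing beyond the entries themselves. Filtering by keyring needs the set of (key namespace, key name) pairs that the keyring can emit, which this sketch assumes the caller supplies by hand because the keyring interface does not expose it today.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import List, Set, Tuple


class TraceFlag(Flag):
    # Hypothetical flag names.
    ENCRYPTED_DATA_KEY = auto()
    DECRYPTED_DATA_KEY = auto()


@dataclass(frozen=True)
class TraceEntry:
    key_namespace: str
    key_name: str
    flags: TraceFlag


def entries_with_flag(trace: List[TraceEntry], flag: TraceFlag) -> List[TraceEntry]:
    """Option 1: select entries by action flag alone."""
    return [entry for entry in trace if flag in entry.flags]


def entries_for_identifiers(
    trace: List[TraceEntry], identifiers: Set[Tuple[str, str]]
) -> List[TraceEntry]:
    """Option 2: select entries by identifier.

    The caller must already know which (namespace, name) pairs belong
    to the keyring of interest; that mapping is the missing piece.
    """
    return [
        entry
        for entry in trace
        if (entry.key_namespace, entry.key_name) in identifiers
    ]


trace = [
    TraceEntry("aws-kms", "key-a", TraceFlag.ENCRYPTED_DATA_KEY),
    TraceEntry("raw-rsa", "key-b", TraceFlag.ENCRYPTED_DATA_KEY),
    TraceEntry("aws-kms", "key-a", TraceFlag.DECRYPTED_DATA_KEY),
]
encrypted = entries_with_flag(trace, TraceFlag.ENCRYPTED_DATA_KEY)
from_kms = entries_for_identifiers(trace, {("aws-kms", "key-a")})
```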
| 179 | + |
| 180 | +### Issue 3: Where and when should callers interact with the keyring trace? |
| 181 | + |
| 182 | +1. _Option: Within cryptographic materials managers (CMMs)._ |
| 183 | + |
| 184 | + - All request and message values can be accessed at this level. |
| 185 | + - This should be sufficient for enforcing requirements |
| 186 | + either statically or based on the request or message metadata. |
| 187 | + |
| 188 | + - ex: A CMM that requires that |
| 189 | + all keyrings that encrypted the data key |
| 190 | + also signed the encryption context. |
| 191 | + - ex: A CMM that requires that an escrow keyring |
| 192 | + encrypted the data key for any messages |
| 193 | + whose encryption context contains a specific value. |
| 194 | + - ex: A CMM that writes the keyring trace to an audit log. |
| 195 | + |
| 196 | +1. _Option: Within keyrings._ |
| 197 | + |
| 198 | + - Not all request and message values can be accessed at this level. |
| 199 | + - This should be sufficient for keyrings that might choose |
| 200 | + to take (or not take) certain actions based on |
| 201 | + previous actions. |
| 202 | + |
| 203 | + - ex: A multi-keyring that keeps trying child keyrings |
| 204 | + until at least one keyring has |
| 205 | + verified the encryption context. |
| 206 | + |
| 207 | +1. Option: Outside of the ESDK. |
| 208 | + |
| 209 | + - **Requires adding output values to the API signatures.** |
| 210 | + - The keyring trace must be returned from the top-level APIs. |
| 211 | + - This should only be necessary if the requirements |
| 212 | + that we expect customers to want to enforce |
| 213 | + vary across messages |
| 214 | + or depend on details outside of |
| 215 | + the message and request metadata. |
| 216 | + |
| 217 | +1. Option: Within the ESDK client. |
| 218 | + |
| 219 | + - **Requires adding input values to the API signatures.** |
| 220 | + - **Requires adding a new conceptual feature.** |
| 221 | + - The caller provides per-request keyring trace checking requirements |
| 222 | + that the ESDK client performs after calling the CMM. |
| 223 | + |
| 224 | + - This is conceptually similar to previous ideas about |
| 225 | + how to give customers a way to check the encryption context |
| 226 | + before decrypting an encrypted message. |
| 227 | + - This should only be necessary if the requirements |
| 228 | + that we expect customers to want to enforce |
| 229 | + vary across messages |
| 230 | + or depend on details outside of |
| 231 | + the message and request metadata. |
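
To make the "within CMMs" option concrete, here is a minimal sketch of the escrow example. Every name in it is hypothetical. `EncryptionMaterials`, `StaticCMM`, and `EscrowEnforcingCMM` are simplified stand-ins, not the ESDK's CMM interface, and the sketch assumes that the materials returned by the inner CMM carry both the encryption context and the keyring trace.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Dict, List, Tuple


class TraceFlag(Flag):
    ENCRYPTED_DATA_KEY = auto()  # hypothetical flag name


@dataclass(frozen=True)
class TraceEntry:
    key_namespace: str
    key_name: str
    flags: TraceFlag


@dataclass
class EncryptionMaterials:
    # Simplified stand-in: only the fields this check needs.
    encryption_context: Dict[str, str]
    keyring_trace: List[TraceEntry]


class StaticCMM:
    """Stand-in inner CMM that always reports a fixed keyring trace."""

    def __init__(self, trace: List[TraceEntry]) -> None:
        self._trace = trace

    def get_encryption_materials(self, encryption_context: Dict[str, str]) -> EncryptionMaterials:
        return EncryptionMaterials(dict(encryption_context), list(self._trace))


class EscrowRequiredError(Exception):
    pass


class EscrowEnforcingCMM:
    """Wraps another CMM and requires an escrow key for marked messages."""

    def __init__(self, inner, escrow_id: Tuple[str, str], context_key: str, context_value: str) -> None:
        self._inner = inner
        self._escrow_id = escrow_id
        self._context_key = context_key
        self._context_value = context_value

    def get_encryption_materials(self, encryption_context: Dict[str, str]) -> EncryptionMaterials:
        materials = self._inner.get_encryption_materials(encryption_context)
        # Messages without the marker value are not subject to the rule.
        if materials.encryption_context.get(self._context_key) != self._context_value:
            return materials
        for entry in materials.keyring_trace:
            identifier = (entry.key_namespace, entry.key_name)
            if identifier == self._escrow_id and TraceFlag.ENCRYPTED_DATA_KEY in entry.flags:
                return materials
        raise EscrowRequiredError("escrow key did not encrypt the data key")
```

Because the check runs inside a CMM, it sees the encryption context and the trace together. That is the reason the CMM layer can enforce rules that depend on message metadata without any change to the top-level API signatures.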
| 232 | + |
| 233 | +### Issue 4: Which action flags should a keyring trace entry allow? |
| 234 | + |
| 235 | +1. Option: Successful actions. |
| 236 | + |
| 237 | + - Any action that a keyring completes successfully. |
| 238 | + - This is what happens today for: |
| 239 | + - generate data key |
| 240 | + - encrypt data key |
| 241 | + - sign encryption context |
| 242 | + - decrypt data key |
| 243 | + - verify encryption context |
| 244 | + |
| 245 | +1. Option: Failure. |
| 246 | + |
| 247 | + - **Requires adding a new keyring trace action flag.** |
| 248 | + - Any action that a keyring attempted but failed to complete. |
| 249 | + - This is useful for debugging why an encrypt or decrypt request failed. |
| 250 | + |
| 251 | +1. Option: No-op. |
| 252 | + |
| 253 | + - **Requires adding a new keyring trace action flag.** |
| 254 | + - If a keyring chooses to do nothing. |
| 255 | + - This is useful for debugging why an encrypt or decrypt request failed. |
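
If failure and no-op flags were added, a caller could summarize a failed decryption along these lines. `FAILED` and `NO_OP` do not exist today. They are hypothetical stand-ins for the new flags that the options above would require, as are the `TraceEntry` type and the `explain_decrypt_failure` helper.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import List


class TraceFlag(Flag):
    DECRYPTED_DATA_KEY = auto()
    FAILED = auto()  # hypothetical: the keyring attempted an action and failed
    NO_OP = auto()   # hypothetical: the keyring chose to do nothing


@dataclass(frozen=True)
class TraceEntry:
    key_namespace: str
    key_name: str
    flags: TraceFlag


def explain_decrypt_failure(trace: List[TraceEntry]) -> List[str]:
    """Describe what each keyring did when no data key was decrypted.

    Returns an empty list if any entry shows a successful decryption.
    """
    if any(TraceFlag.DECRYPTED_DATA_KEY in entry.flags for entry in trace):
        return []
    reasons = []
    for entry in trace:
        name = f"{entry.key_namespace}/{entry.key_name}"
        if TraceFlag.FAILED in entry.flags:
            reasons.append(f"{name}: attempted and failed")
        elif TraceFlag.NO_OP in entry.flags:
            reasons.append(f"{name}: took no action")
    return reasons


failed_trace = [
    TraceEntry("aws-kms", "key-a", TraceFlag.FAILED),
    TraceEntry("raw-rsa", "key-b", TraceFlag.NO_OP),
]
```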
| 256 | + |
| 257 | +## One-Way Doors |
| 258 | + |
| 259 | +Any change that would add API surface area is a one-way door. |
| 260 | +Any such changes must be treated as new features |
| 261 | +and handled through the specification modification process. |
| 262 | + |
| 263 | +1. Adding functionality to the keyring interface. (**Issue 2**) |
| 264 | +1. Returning the keyring trace from the ESDK APIs. (**Issue 3**) |
| 265 | +1. Adding a "message requirements" system to the ESDK APIs. (**Issue 3**) |
| 266 | +1. Adding new keyring trace action flags. (**Issue 4**) |
| 267 | + |
| 268 | +## Impact |
| 269 | + |
| 270 | +1. All pending and future ESDK releases are blocked by these issues. |
| 271 | +1. Each of the one-way doors also represents a new feature |
| 272 | + that must be reviewed through the specification modification process. |
| 273 | + This will impact all projected ESDK development and release targets. |
| 274 | + |
| 275 | +## Open Questions |
| 276 | + |
| 277 | +- Is it important to be able to tie |
| 278 | + a successful keyring trace entry to an EDK? |
| 279 | +- Is the order of entries in the keyring trace important? If so, what order? |
| 280 | + |
| 281 | + - Absolute order? |
| 282 | + - Relative order? |
| 283 | + - State of materials beforehand? |
| 284 | + - What about concurrent actions? (ex: parallel multi-keyring) |
| 285 | + |
| 286 | +- "[..] the requirements |
| 287 | + that we expect customers to want to enforce |
| 288 | + vary across messages |
| 289 | + or depend on details outside of |
| 290 | + the message and request metadata." |
| 291 | + |
| 292 | + - Do these requirements exist |
| 293 | + and are they requirements that |
| 294 | + the ESDK should support solving? |