DataProtection - Wrong activation time #33071
Comments
That's very odd. To clarify, do the other services sharing the keyring work just fine with the same cookies? |
Yes, the three services share the same keyring. With the same cookies, it works for two of the services but fails for the third. |
If it's working for two apps, then the date doesn't have any effect. Do you have any way to repro this for us? |
Sorry, I'm not able to share a repro. However, I have prepared the ConfigureServices details from Startup and the key.xml stored in the data protection blob for your analysis. On further analysis, we found that one of our services also hit this error briefly and recovered automatically; the other services did not recover. After we manually restarted the pod, it worked as expected. This occurred in production and we were down for more than an hour. We're worried this bug could occur again at any time, so we need your suggestions or a workaround. Additional note: the default key rotation interval is 90 days, but the key was rotated regardless of that; please refer to the blob key.xml for more details.
Key.XML
|
This issue occurs because there is no propagation window between the creation and activation times. The following illustration explains it. A new key is generated on POD 1. User A logs in, hits POD 1, and receives a cookie protected with the new key. The same user's next request goes to POD 2 (which still has the old key ring cached); the request fails with "Cookies was not authenticated. Failure message: Unprotect ticket failed" and the user is redirected to login. User A logs in again, lands on POD 1, and succeeds, but the next request goes to POD 2 and is redirected to login once more. This becomes a deadlock: POD 2 is effectively useless and the user faces repeated logins. As explained in the documentation, a sufficient propagation window should be provided so that all pods can sync the new key before it becomes the default. |
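For readers who want to check their own key ring for this condition, here is a minimal diagnostic sketch (not from the original report; the file name is a placeholder and the two-day margin is an assumption based on the propagation window described in the Data Protection docs):

```csharp
using System;
using System.Xml.Linq;

// Hypothetical check over an exported key ring file ("keys.xml" is a placeholder path).
// Rotated keys are normally created roughly two days before they activate so that every
// node's 24-hour key ring cache can pick them up first; anything tighter is suspicious.
var expectedPropagation = TimeSpan.FromDays(2);

foreach (var key in XDocument.Load("keys.xml").Descendants("key"))
{
    var created = DateTimeOffset.Parse((string)key.Element("creationDate"));
    var activated = DateTimeOffset.Parse((string)key.Element("activationDate"));
    var margin = activated - created;

    if (margin < expectedPropagation)
    {
        Console.WriteLine(
            $"Key {key.Attribute("id")?.Value} activates {margin} after creation - " +
            "other pods may still be serving a cached key ring when it becomes the default.");
    }
}
```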
Thanks for reporting this one @georgelivingston. I believe I am experiencing the same problem on my .NET 5 applications. I have two apps storing the Data Protection keys in Azure Blob Storage, where the activation date of the key is >1 second after the creation date. One app is responsible for creating a cookie, which is then sent to the second app to authenticate the user there. This issue presents as the second app failing to authenticate. Similarly, the keys seem to be rotated extremely frequently (not every 90 days). This would happen at least once every 24-48 hours, after which the second app would start to fail with an authentication error. Both apps use the same code in Startup.cs:
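The Startup.cs block referenced above did not survive in this copy of the thread. As a rough reconstruction only, assuming the Azure.Extensions.AspNetCore.DataProtection.Blobs and .Keys packages and placeholder URIs, the configuration being described would look something like:

```csharp
using System;
using Azure.Identity;
using Azure.Storage.Blobs;
using Microsoft.AspNetCore.DataProtection;
using Microsoft.Extensions.DependencyInjection;

public void ConfigureServices(IServiceCollection services)
{
    // Placeholder blob and Key Vault URIs - not the reporter's actual values.
    var keysBlob = new BlobClient(
        new Uri("https://<account>.blob.core.windows.net/dataprotection/keys.xml"),
        new DefaultAzureCredential());

    services.AddDataProtection()
        .PersistKeysToAzureBlobStorage(keysBlob)                      // shared key ring in Blob Storage
        .ProtectKeysWithAzureKeyVault(
            new Uri("https://<vault>.vault.azure.net/keys/dp-key"),   // key-wrapping key in Key Vault
            new DefaultAzureCredential());
}
```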
Unlike @georgelivingston, I'm protecting the keys with Azure Key Vault. MS support seemed fairly confident it wasn't related to the Azure Key Vault side of things, which would seem to be supported by the fact that we both have the same problem despite using different protection mechanisms. MS support is reportedly looking into this for me now. Fingers crossed they identify the issue and fix it shortly, as this is causing real problems in my production application. In the meantime I have set up an auto-heal rule to restart the second app when it sees a few 401 errors in quick succession. Restarting the app appears to resolve the error, presumably because it retrieves the new key at startup. |
The support team was able to identify the issue: multiple apps were using the same storage location to persist the Data Protection keys. Both of my apps were storing these keys in the same Azure Storage blob. I had figured this was OK based on my reading of the docs relating to the … @georgelivingston, I'm unsure if this is the same setup you've got, but I'd recommend checking for it. Using a different location to store the data protection keys immediately resolved all my issues. |
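For anyone else in this situation, a sketch of the fix being described: give each app its own key ring location and application name (all names and URIs below are placeholders). Note that apps which genuinely need to unprotect each other's payloads must instead share both the same storage location and the same SetApplicationName value.

```csharp
using System;
using Azure.Identity;
using Azure.Storage.Blobs;
using Microsoft.AspNetCore.DataProtection;
using Microsoft.Extensions.DependencyInjection;

public static class DataProtectionSetup
{
    // App 1: its own blob, so it manages its own key ring.
    public static void ConfigureAppOne(IServiceCollection services) =>
        services.AddDataProtection()
            .SetApplicationName("app-one")
            .PersistKeysToAzureBlobStorage(new BlobClient(
                new Uri("https://<account>.blob.core.windows.net/dataprotection/app-one/keys.xml"),
                new DefaultAzureCredential()));

    // App 2: a different blob and application name, fully isolated from App 1.
    public static void ConfigureAppTwo(IServiceCollection services) =>
        services.AddDataProtection()
            .SetApplicationName("app-two")
            .PersistKeysToAzureBlobStorage(new BlobClient(
                new Uri("https://<account>.blob.core.windows.net/dataprotection/app-two/keys.xml"),
                new DefaultAzureCredential()));
}
```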
This is hard to investigate without a repro. If someone can provide one, we'd be happy to investigate. |
Is there any information on how to stop this issue from occurring? We are running a .NET 6 app in production that has run into it. Creation date: 2022-02-16T00:19:05.0768936Z. There isn't much to share on consistently reproducing the issue. We are using .NET 6, Kubernetes, and Redis (ElastiCache) to persist the keys, and we set the application name so the same keys are used across all instances. Our dev environment has seen this behavior as well (with MANY keys). Program.cs configuration for data protection:
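The Program.cs block did not make it into this thread either. A minimal sketch of the kind of setup described (Redis persistence plus a shared application name), with a placeholder ElastiCache endpoint and key name, assuming the Microsoft.AspNetCore.DataProtection.StackExchangeRedis package:

```csharp
using Microsoft.AspNetCore.DataProtection;
using StackExchange.Redis;

var builder = WebApplication.CreateBuilder(args);

// Placeholder endpoint - every instance must point at the same Redis and use the same
// application name so they all read (and can unprotect with) the same key ring.
var redis = ConnectionMultiplexer.Connect("<elasticache-endpoint>:6379");

builder.Services.AddDataProtection()
    .SetApplicationName("my-app")
    .PersistKeysToStackExchangeRedis(redis, "DataProtection-Keys");

var app = builder.Build();
app.Run();
```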
|
I am also having a similar issue. I'm using .NET 6 with Blob Storage for persistence and Key Vault for wrapping the key. I have over 100 instances of my app running in Azure App Service Linux Docker containers. I am finding that a new key is randomly added to the blob XML file well before the current key expires. This new key has an activation date two minutes before its creation date. The previous default key is neither expired nor revoked. This causes all my other instances to fail for up to 24 hours, until their key ring cache expires. There is a separate issue mentioning that when Key Vault cannot be accessed during app start-up, a new key is simply created instead. I'm trying to add more logging to capture this failure, since start-up logs are not easy to get for a Linux Docker container hosted in App Service (see the logging sketch after the XML below). Is there a setting to always reload the key ring when there is a key ring cache miss? I see there is a two-minute window during start-up; can I extend this to be indefinite?
aspnetcore/src/DataProtection/DataProtection/src/KeyManagement/KeyRingProvider.cs Line 59 in c37ee91
My sample keyring XML:
<key id="ddcdd943-2a4a-41fa-814d-cefec5644516" version="1">
<creationDate>2022-12-20T20:09:42.6476227Z</creationDate>
<activationDate>2022-12-22T20:07:31.3953347Z</activationDate>
<expirationDate>2023-01-19T20:09:41.8118313Z</expirationDate>
<descriptor deserializerType="Microsoft.AspNetCore.DataProtection.AuthenticatedEncryption.ConfigurationModel.AuthenticatedEncryptorDescriptorDeserializer, Microsoft.AspNetCore.DataProtection, Version=6.0.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60">
<descriptor>
<encryption algorithm="AES_256_CBC"/>
<validation algorithm="HMACSHA256"/>
<encryptedSecret decryptorType="Azure.Extensions.AspNetCore.DataProtection.Keys.AzureKeyVaultXmlDecryptor, Azure.Extensions.AspNetCore.DataProtection.Keys, Version=1.1.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8" xmlns="http://schemas.asp.net/2015/03/dataProtection">
<encryptedKey xmlns="">
<!-- This key is encrypted with Azure Key Vault. -->
<kid>...</kid>
<key>...</key>
<iv>...</iv>
<value>...</value>
</encryptedKey>
</encryptedSecret>
</descriptor>
</descriptor>
</key>
<key id="759405fc-01a6-492f-a06b-67ced9f05e6e" version="1">
<creationDate>2023-01-05T17:59:15.444065Z</creationDate>
<activationDate>2023-01-05T17:57:45.8103086Z</activationDate>
<expirationDate>2023-02-04T17:57:45.8103086Z</expirationDate>
<descriptor deserializerType="Microsoft.AspNetCore.DataProtection.AuthenticatedEncryption.ConfigurationModel.AuthenticatedEncryptorDescriptorDeserializer, Microsoft.AspNetCore.DataProtection, Version=6.0.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60">
<descriptor>
<encryption algorithm="AES_256_CBC"/>
<validation algorithm="HMACSHA256"/>
<encryptedSecret decryptorType="Azure.Extensions.AspNetCore.DataProtection.Keys.AzureKeyVaultXmlDecryptor, Azure.Extensions.AspNetCore.DataProtection.Keys, Version=1.1.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8" xmlns="http://schemas.asp.net/2015/03/dataProtection">
<encryptedKey xmlns="">
<!-- This key is encrypted with Azure Key Vault. -->
<kid>...</kid>
<key>...</key>
<iv>...</iv>
<value>...</value>
</encryptedKey>
</encryptedSecret>
</descriptor>
</descriptor>
</key>
<key id="1e930dbd-b293-4001-981d-233315ae3826" version="1">
<creationDate>2023-01-23T21:48:10.1974563Z</creationDate>
<activationDate>2023-01-23T21:46:31.3660167Z</activationDate>
<expirationDate>2023-02-22T21:46:31.3660167Z</expirationDate>
<descriptor deserializerType="Microsoft.AspNetCore.DataProtection.AuthenticatedEncryption.ConfigurationModel.AuthenticatedEncryptorDescriptorDeserializer, Microsoft.AspNetCore.DataProtection, Version=6.0.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60">
<descriptor>
<encryption algorithm="AES_256_CBC"/>
<validation algorithm="HMACSHA256"/>
<encryptedSecret decryptorType="Azure.Extensions.AspNetCore.DataProtection.Keys.AzureKeyVaultXmlDecryptor, Azure.Extensions.AspNetCore.DataProtection.Keys, Version=1.1.0.0, Culture=neutral, PublicKeyToken=92742159e12e44c8" xmlns="http://schemas.asp.net/2015/03/dataProtection">
<encryptedKey xmlns="">
<!-- This key is encrypted with Azure Key Vault. -->
<kid>...</kid>
<key>...</key>
<iv>...</iv>
<value>...</value>
</encryptedKey>
</encryptedSecret>
</descriptor>
</descriptor>
</key> |
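Regarding the note above about capturing these failures at start-up, one simple option is to raise the log level for Data Protection's own categories so that key creation, key ring refreshes, and unwrap failures land in the container's stdout (a suggestion, not something from the thread):

```csharp
using Microsoft.Extensions.Logging;

var builder = WebApplication.CreateBuilder(args);

// Data Protection logs key creation, key ring refreshes, and decryption failures under
// categories starting with "Microsoft.AspNetCore.DataProtection"; Debug makes them visible.
builder.Logging.AddConsole();
builder.Logging.AddFilter("Microsoft.AspNetCore.DataProtection", LogLevel.Debug);
```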
This issue also occurs when the AzureBlobXmlRepository suffers from network issues. I've found the code path causing it. The |
We have also experienced a similar case in our production system. From the telemetry we can conclude that:
Would it be possible to add an option to control what should happen in that case? Issuing a new key with immediate activation breaks all nodes in the system. |
We have experienced this as well: three separate times, in three separate environments, all related to transient connectivity issues in East US. Statistically, the fact that these specific key operations have failed more than once when we run several million transactions through Key Vault monthly is staggering. This is causing us to reconsider how we are persisting and protecting keys that, by design, are meant to be self-managed. |
Does anyone have a good mitigation strategy they are using here? Also, what has been the best resolution when this occurs? We just had it happen in a lower environment, and redeploying the application seems to have fixed the issue, but it's strange because there is still a key in the ring with an activation date prior to its creation date, which, if I'm following correctly, is what causes the issue in the first place. |
@JarrodJ83 Sorry for the late reply - this wasn't the main tracking issue for Data Protection races. There are multiple races in this area, especially prior to 9.0, and most effort to date has been focused on making these exceptions rarer (rather than eliminating them). An approach we've been exploring for addressing the root cause is to have a separate component that is responsible for all key generation, while disabling key generation in the app instances. If there's only one writer, there's no race. Unfortunately, the approach isn't mature enough for us to have a doc or a sample at this point, but let me know if you have questions. |
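A rough sketch of that shape (this is not the official sample mentioned above, which doesn't exist yet; all resource URIs are placeholders): every ordinary instance consumes the shared key ring but opts out of key generation, and a single designated component runs the same configuration without the opt-out so there is exactly one writer.

```csharp
using System;
using Azure.Identity;
using Azure.Storage.Blobs;
using Microsoft.AspNetCore.DataProtection;

var builder = WebApplication.CreateBuilder(args);

// Placeholder resources - substitute real Blob Storage and Key Vault URIs.
var sharedKeysBlob = new BlobClient(
    new Uri("https://<account>.blob.core.windows.net/dataprotection/keys.xml"),
    new DefaultAzureCredential());

// Ordinary app instances: read the shared key ring, never create keys.
builder.Services.AddDataProtection()
    .SetApplicationName("my-app")
    .PersistKeysToAzureBlobStorage(sharedKeysBlob)
    .ProtectKeysWithAzureKeyVault(
        new Uri("https://<vault>.vault.azure.net/keys/dp-key"),
        new DefaultAzureCredential())
    .DisableAutomaticKeyGeneration();

// A single designated component (a job, sidecar, or one elected instance) runs the same
// configuration *without* DisableAutomaticKeyGeneration(), so keys are generated by one
// writer only and well ahead of their activation date.
```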
I'm going to close this issue. Please post additional feedback in #36157 so we can track it centrally and don't drop any. |
Describe the bug
The key's activation time is at or before its creation time, leaving no propagation window and causing (Cookies was not authenticated. Failure message: Unprotect ticket failed)
To Reproduce
Multiple services are deployed with the code below to protect the cookies. Everything worked as expected until yesterday, when one service suddenly started failing with this error (Cookies was not authenticated. Failure message: Unprotect ticket failed) while the other services continued to work. On further analysis, we found that the activation date falls at or before the creation date in the XML stored in the blob.
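The ConfigureServices block referenced above was not captured in this copy of the issue. As a rough reconstruction only, assuming Blob Storage persistence, a shared application name across the three services, and cookie authentication as described (all names and URIs are placeholders):

```csharp
using System;
using Azure.Identity;
using Azure.Storage.Blobs;
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.DataProtection;
using Microsoft.Extensions.DependencyInjection;

public void ConfigureServices(IServiceCollection services)
{
    services.AddDataProtection()
        .SetApplicationName("shared-app")                    // identical in all three services
        .PersistKeysToAzureBlobStorage(new BlobClient(
            new Uri("https://<account>.blob.core.windows.net/dataprotection/keys.xml"),
            new DefaultAzureCredential()))
        .SetDefaultKeyLifetime(TimeSpan.FromDays(90));       // the 90-day rotation mentioned above

    services.AddAuthentication(CookieAuthenticationDefaults.AuthenticationScheme)
        .AddCookie();
}
```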
Below is the active key from the XML file stored in the blob.
Exceptions (if any)
Microsoft.AspNetCore.Authentication.Cookies.CookieAuthenticationHandler[7]
Cookies was not authenticated. Failure message: Unprotect ticket failed
Further technical details