Skip to content

Add data-protected browser storage library #198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 12, 2019

Conversation

SteveSandersonMS
Copy link
Member

This comes out of work on dotnet/aspnetcore#11793

Much of our guidance for handling transient user state revolves around storing it in localStorage/sessionStorage. There are existing libraries for using these browser-based stores from Blazor apps, but as far as I'm aware, none of them integrate with data protection.

This PR adds a simple package containing services ProtectedLocalStorage and ProtectedSessionStorage. These transparently perform data protection for the stored data, which deals with one of the major caveats about browser storage that developers might not anticipate: users can both see and modify the data stored locally. Preventing tampering/disclosure is important for server-side Blazor in general, as developers may assume that any stored data can be trusted to have been set by their own application code.

One benefit of the approach used here is that "purpose" strings are automatically decided based on the storage location. This fixes another caveat that developers might not anticipate: that a user might manually copy the value stored in one slot to a different slot.

Overall, this is in AspLabs because we want a reasonably well verified implementation that we can reference from guidance materials, but we're not anywhere near nailing down any design permanently enough for a shipped and supported library. For example, maybe we would want to let developers pass further "purpose" strings of their own in case they want to control whether or not values are scoped to the current user.

@blowdart
Copy link
Member

@GrabYourPitchforks as well


namespace Microsoft.AspNetCore.ProtectedBrowserStorage
{
// Open design question: should the data protection purpose be scoped to the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We link CSRF tokens to users for reasons, but yes, when users transition it breaks the crap out of things and has been a cause of pain. Maybe an overload to allow some items to be user locked?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've added an overload that lets developers take control of the purpose explicitly if they want to. This allows for scenarios like "lock to user ID" for developers who know they want that.

Later we might still want to change the default behavior (when no purpose is specified) to automatically lock to user ID, but we would need to allow for "anonymous -> logged in" transitions, such as described in option (2) here. I don't want to expand the scope of this PR to include that before we've even had any user feedback.

There is no reason to hold back on things like breaking changes; this is asplabs.

// or via malicious actions from user B). However it's not wise to just embed
// User.Identity.Name into the purpose string, because there are cases where the
// value has to be retained across a login flow, such as in the Blazing Pizzas
// case where a logged-out user builds an order then logs in.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea here that we're offering users on public computers some degree of protection for any private information that's stored in local storage?

It makes sense to me, I remember from our discussion before we weren't sure if the data protection bit would be required or not, but when I think about the likeliness of someone using local storage haphazardly I start to believe it's important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance CSRF uses the users unique ID claim (with fallback to hashing all claims), to try and uniquify users - but we don't use that in the cookie name, we use it in the form value.

So I guess the thing about this strategy for CSRF is that we don't accumulate lots of cookies by varying the name. We could apply a similar strategy here by storing a token related to the user's identity in storage, and validating it on reads.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea here that we're offering users on public computers some degree of protection for any private information that's stored in local storage?

That's not what I was thinking, but it's interesting to consider if we could do something to help with that. Here are some possible designs:

  1. We overload SetAsync and GetAsync to allow some extra flag for "scope to user", or even just tell developers to pass a custom purpose parameter that contains the user ID. Then it becomes impossible to reuse the data across separate user IDs.
    • Pro: easy to understand
    • Con: no way to migrate stored data to a new user ID, e.g., in the common case where a logged-out user builds up some state then logs in
  2. Same as (1) above, but when we read the data back, we make two attempts to unprotect it. First with the logged-in user's ID in the purpose; second with "anonymous" in the purpose. Now it's possible to migrate state from "logged out user" to "logged in user", but not in the other direction or between two different logged-in users.
  3. When we store data, we don't just store protect(plaintext), but instead we store [protect(plaintext), hash(plaintext | userid | somesecret)] or whatever our crypto-people say is the right way of combining hashing and encryption. Now the server can read back and unprotect data and independently check whether it was created for any given user ID.
    • Pro: Now there can be an overload of GetAsync that takes a "validate user ID" flag, and developers can choose whether or not to use that for any given read

Overall if the "anonymous -> logged in" transition is the only case where we'd want to allow preservation of state, then option 2 seems good. Developers might still want to have the choice to clearing all stored data at the point where a user logs in, in case an earlier anonymous user left behind some state that the logged-in user wouldn't want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But overall, I think this level of further functionality should be well out of scope for this PR. We can continue to develop our ideas here as time goes on. That's the whole idea of it being in asplabs.

// for a given key versus no value being stored for that key. However we should
// still data-protect the 'null' value so it's indistinguishable to the end
// user from non-null values.
if (protectedJson == null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually feel like these details are more trouble than they're worth 😕 - people that are in love with complexity tend to disagree with me on this 😆

Do you have a scenario in mind where "present but null" is really important?

Do we need a way to clear data if we're storing "encrypted null"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you also have delete 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I don't have a clear scenario for the null/undefined distinction. It's just that the distinction does exist in the underlying stores in JavaScript, and in IDictionary in .NET.

If we unilaterally decide that both of these are wrong and the distinction shouldn't exist, sooner or later people will say they rely on that distinction for some reason.

// IDataProtect isn't disposable, so we're fine holding these indefinitely.
// Only a bounded number of them will be created, as the 'key' values should
// come from a bounded set known at compile-time. There's no use case for
// letting runtime data determine the 'key' values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem obvious given the API that's defined here. SetAsync(string key). If the key had to be passed in the constructor, I would agree with you.

Is it really important that the data protection purpose vary by key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of varying by key is for scenarios like: you are storing accountBalanceInUSD in one slot, and lastLoginTimestampTicks in a different slot. If a user was able to take the value from the second slot and copy it into the first slot, that could be an issue :)

Copy link
Member Author

@SteveSandersonMS SteveSandersonMS Jul 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, depending on user feedback, we might end up also adding an overload like SetAsync(string purpose, string key, object value) in which the purpose does not vary by key (but just varies by the specified purpose). This gives the developer total control, for example if they want to be able to have categories of keys that map to a single purpose, such as if those keys themselves come from runtime data.

Then SetAsync(key, value) would be a shorthand for "derive a purpose from the key", because this is likely the safe choice when you don't have any reason not to vary by key.

To improve things further, we might want to do something like change how the IDataProtector instances are cached so it's not unlimited in size, but rather is something like a LRU cache with max 1000 entries or so. This protects against unlimited growth without changing anything for normal apps that have a small bounded number of keys.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've added overloads of GetAsync/SetAsync that accept a custom purpose value. This allows for more advanced scenarios like "lock to user ID" and "share purposes across all these dynamic key values".

{
public List<string> ProtectorsCreated { get; } = new List<string>();


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for bringing this to my attention. It has been noted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

James always helps people when they have an extra blank line. He's the expert on this (among other things).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Thanks for adding value, everyone!

Copy link
Member

@rynowak rynowak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Samples look great!

else
{
endpoints.MapBlazorHub<App>("app");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any harm in just using this one all of the time (L56) it supports both prerendered and not right?

Copy link
Member Author

@SteveSandersonMS SteveSandersonMS Jul 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish that were true, but unfortunately it isn't (cc @javiercn).

If you combine both explicit component registration (like L56) and prerendering, then you end up with two copies of the component trying to render themselves to the same location, which gives an error and kills the circuit.

This makes it very difficult to have a site where you prerender for some URLs but not for all URLs. You would have to have two different circuit hub endpoints, each configured differently (one like L52; the other like L56). That difficulty is why I ended up putting in the static flag on L11 to toggle EnablePrerendering.

I guess the fundamental difficulty is: how should the circuit know whether a certain prerendered component should be the "same" instance as an explicitly-rendered one? It can't figure it out based on CSS selectors, because prerendering doesn't know where in the document its output will be put. We could let pass some kind of optional identifier (e.g., an otherwise-pointless CSS selector to Html.RenderComponentAsync), but that's hard to discover and still confusing.

If either of you ( @rynowak / @javiercn ) thinks this is problematic enough that we need to solve it, then we should talk about it soon as it may be nontrivial and would very likely be a breaking change in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But obviously it's a totally separate matter from this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are talking about a non-prerendered component creating a new and separate circuit, I already fix that as part of the circuit id tweaks PR. I'm going to remove the cookie pieces from it and I plan to take it forward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javiercn Does your change mean that it's now safe to use endpoints.MapBlazorHub<App>("app"); even if you're also prerendering that component? If so, how does the system figure out that the prerendered instance is the same one that's being registered manually?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly, it means that if you have component A and component B, A prerendered and B not, they will play nicely. If you have two A components and one is prerendered inside a "comp" tag and you choose to map "A" (non-prerendered) to the "comp" tag, in that case you are "hurting yourself.

In the sense that you are trying to render a component within an already prerendered component. I'm not sure what we can do there, other than tell you not to do that.

We could do a few things, with the way we do things now, we reconnect prerendered components before asking to render the additional ones.

  • We could ask the browser for all the selectors to check which ones don't have a component inside, and if there is at least one, then render the component, otherwise skip it as we consider it was prerendered. There are 3 cases:

    • Prerendered components only. We don't have selectors so we don't do anything.
    • Non-prerendered components only, we won't have a circuit so we simply render the components as normal.
    • Prerendered + non prerendered, we first reconnect prerendered components. When the client calls StartCircuit, the server can do a JS interop call to ask the client which selectors don't have a component inside (prerendered) then for those that don't, the server renders the component and ask the client to attach the component.
  • We could change the way start circuit works to NOOP when there are prerendered components (do not create a circuit) and simply return a list of selectors for non-prerendered components.

    • The client can then look-up the selectors and filter those that have a component rendered inside (it was prerendered)
    • The client then sends a StartComponentsRender message to the server with the filtered list of selectors and the server simply renders that subset.
  • We could simply check the list of component descriptors and assume that if there is a component with the same component type as a non prerendered component, we can assume it was prerendered. This takes the ability to render the same component as prerendered and non-prerendered at the same time (as siblings) but its uber cheap and I don't think ultimately is a fatal drawback.

My preference is 3 as its uber-cheap

// for a given key versus no value being stored for that key. However we should
// still data-protect the 'null' value so it's indistinguishable to the end
// user from non-null values.
if (protectedJson == null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you also have delete 👍

@SteveSandersonMS
Copy link
Member Author

Merging as the discussion seems settled, but if @blowdart or @GrabYourPitchforks have any further feedback please let me know!

@SteveSandersonMS SteveSandersonMS merged commit 5a4d5b9 into master Jul 12, 2019
@ghost ghost deleted the stevesa/add-protected-browser-storage branch July 12, 2019 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants