Skip to content

Verifiable Credential Batch HTTP APIs are an Anti-pattern #100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
msporny opened this issue Feb 4, 2021 · 7 comments
Closed

Verifiable Credential Batch HTTP APIs are an Anti-pattern #100

msporny opened this issue Feb 4, 2021 · 7 comments
Labels
documentation Improvements or additions to documentation v2 Targeted for version 2 of the API

Comments

@msporny
Copy link
Contributor

msporny commented Feb 4, 2021

In general, batch HTTP APIs are an anti-pattern, and if you think you need one (e.g., PR #95), you may want to think through the following architectural considerations and questions:

  1. Why can't your software scale horizontally? As an example, the current VC HTTP API is capable of scaling to 1,000s of verifications per second. What is preventing you from scaling horizontally?
  2. Batching anything can be used as a Denial-of-Service attack vector that requires implementations to put DoS mitigations in their application code (which you really want to avoid doing, as it increases your system complexity and should be used as a last line of defense). Applications should protect themselves at the edges of a network, where bailing early on processing saves computing resources and thus increases throughput in the system, even during an attack/heavy usage.
  3. Batching code requires more-complex-than-necessary input and output processing. For example, batch processing return values are typically an array which then have to be matched up and aligned with input... which means the client has to hold on to either hold on to all the data (thousands of VCs) or the server has to hold on to and return all the data (thousands of VCs).

I expect there to be strong push back in a future WG that attempts to standardize any sort of Batch HTTP API. You will note that Twitter, Facebook, Github, Pinterest, etc. don't provide batch APIs for their most heavily utilized endpoints -- there are very good architectural reasons for this. HTTP/2 and HTTP/3 make it trivial to parallelize many requests per second. There are a handful of cases where batch APIs make sense (like deleting a bunch of IDs), but in general, expect there to be heavy push back against large payload/return value or long processing HTTP APIs.

image

@msporny msporny added the documentation Improvements or additions to documentation label Feb 4, 2021
@mprorock
Copy link
Contributor

mprorock commented Feb 4, 2021

In general I agree. In the proposed case via verify presentation however, a very specific use case is accounted for: the natural and intuitive presentation of multiple credentials in a manner that is consistent with the way in which multiple credentials are attached to a given shipment or container of goods. This use case does indeed have some scaling implications, but as important, is the natural collection and presentation of items related to one another (and in our case sharing a parent class) in a single call, and by which a single response is received. If we think from the perspective of a developer who might want to provide an api call or interface such as https://example.org/traceability/shipment/verify it would be very natural to provide a corresponding call on the vc-http-api that can handle a presentation related to the VCs attached to that shipment (or container, or whatever) whether or not there are proofs attached to that presentation. @dlongley and @OR13 discussed options to handle this use case in #88 - as noted by @bumblefudge this appears to be a common use case across multiple supply chain companies working with and testing against the vc-http-api. In this case the "batch" is introduced as a part of responding to a single item; a presentation of credentials related to a uniquely identifiable item such as a container, pallet, or crate of goods.

This exact scenario appears to be covered well in the VC Specification around presentations where multiple times it references both holders and proofs being handled "If present" and that:

Presentations MAY be used to combine and present credentials. They can be packaged in such a way that the authorship of the data is verifiable. The data in a presentation is often all about the same subject, but there is no limit to the number of subjects or issuers in the data. The aggregation of information from multiple verifiable credentials is a typical use of verifiable presentations.

My read on this and the reset of the spec around presentations is that the changes proposed in pr #95 as they stand now to allow /presentations/verify to also accept a presentation without proofs simply brings the vc-http-api in line with the spec.

@mprorock
Copy link
Contributor

mprorock commented Feb 5, 2021

adding an additional note on this, I think as a rule, we should always permit rate limiting and size limits for any implementation against this API, for example with a 413 or 429 error code, especially for APIs endpoints which may take an array of objects, where in some cases for cost, scaling, or other reasons, the implementer wishes to restrict that particular endpoint (where appropriate) to a single object.

@msporny
Copy link
Contributor Author

msporny commented Feb 5, 2021

@peacekeeper wrote:

I tend to agree with @mprorock that verifying a presentation with multiple credentials isn't really "batch processing",

Well, yeah... totally agree with that -- but that's not where this PR #95 started (see the state of the PR when I first reviewed it). This PR started as sending in an unbounded array of VCs and getting an unbounded set of results. -1 to that.

If what we're talking about here is allowing one presentation containing a number of embedded VCs to be submitted and getting back whether or not all/some failed (as a list of errors, which I think we support) -- then 100% onboard with that. +1 to being able to verify VPs that contain multiple VCs (even if the VP isn't signed). Presentations don't need to be signed... they only need to be signed to be "Verifiable", so we can add an option to note that VP proof check can be skipped.

We do want to state that implementations may impose a size/number restriction on the VP being provided... For example, "If you submit a 1MB VP, or one that contains 100s of VCs, go pound sand -- we're not going to process that, even if you're authorized." ... but that can be up to implementations to draw that line.

@msporny
Copy link
Contributor Author

msporny commented Feb 5, 2021

@mprorock wrote:

especially for APIs endpoints which may take an array of objects

It's that kind of statement that gets my spidey-sense tingling... API endpoints that take an array of objects don't pass the sniff test for me... they smell bad -- something's not right. Folks should expect natural push back on API endpoints that do that.

There are exceptions to the rule... query return values can be array oriented. Just trying to point out that there will always be skepticism wrt. arrays as input values.

@mprorock
Copy link
Contributor

mprorock commented Feb 5, 2021

totally understand. we live more in a data and analysis world on our side, so i think we encounter this more often, e.g. "please post all disease reports and samples collected over the last 30 days to X endpoint in Z format", e.g. also a lot of stuff like this queuing service for loading data into snowflake, where typically an array of files to process are loaded in one pass. you can see the notes around size limits, etc in the api (e.g. they cap at 5k per request)

@peacekeeper peacekeeper mentioned this issue Feb 19, 2021
@msporny msporny added the v2 Targeted for version 2 of the API label Nov 16, 2021
@msporny
Copy link
Contributor Author

msporny commented Nov 16, 2021

The group discussed this on 2021-11-16 and decided that batching is out of scope for this version of the API. Closing.

@msporny msporny closed this as completed Nov 16, 2021
@TallTed
Copy link
Collaborator

TallTed commented Nov 16, 2021

@msporny -- Might it make sense to remove the meme, and replace it with the relevant text, in the description of this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation v2 Targeted for version 2 of the API
Projects
None yet
Development

No branches or pull requests

3 participants