bundle: trade offs of schemes for bundle digest #5
Comments
I agree OC should support creating a consistent digest. And yes, I think we can generate a set of recognized versioned digest formats and provide an example implementation to test against. Here are my questions about that: […]

To sum this up against the goals: […]

I think there are at least two reasons to keep them separated: […]
The issue with all of these formats is that they do not meet the stability requirement. If one tars and gzips the same filesystem layout twice, the generated bytes and their resulting hash are different. If there is a format that can generate a stable hash, please suggest it.
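For illustration only (my own sketch, not from the thread; the rootfs/ directory and file names are placeholders), this instability is easy to reproduce locally:

```sh
# Archive the same, unchanged tree twice. gzip embeds a timestamp when
# compressing from a pipe, so two runs usually differ byte-for-byte; tar
# additionally serializes mtimes, ownership, and directory order, so the
# same logical content produced elsewhere differs even more.
tar -czf first.tar.gz rootfs/
sleep 1
tar -czf second.tar.gz rootfs/
sha256sum first.tar.gz second.tar.gz   # typically prints two different digests
```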
Part of this issue is deciding whether the trade-offs that this digest system presents are valuable enough to specify at the runtime level. It would be good to remember that the scope of this effort is the container's on-disk format. The main question we must answer is whether we'd like a common digest that can operate regardless of transport mechanism. The main benefit of separating transport and runtime format is that it allows one to be flexible in distribution and assembly of the target root filesystem. A valid example is distributing a container image through […]
Let's start by reviewing the third item under the scoping of this feature: […]

This part of the specification is not intended as a replacement for transport verification. Indeed, this digest algorithm would be redundant with those used during transport, but it would provide a common base which works the same regardless of the transport. Going back to the […]
I am not sure what you mean by this. Could you please clarify? Perhaps an example would help. There may be some confusion here, since the proposal is not to enforce lexical naming, but rather to have a stable ordering of hashed elements based on the filename.
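For illustration only, here is a minimal sketch of what "a stable ordering of hashed elements based on the filename" could look like; the hash choice (sha256sum), the record format, and the CONTAINER_ROOT variable are my assumptions, not part of the proposal:

```sh
# Emit one "<relative path> <content hash>" record per regular file, then
# sort the records by path. The resulting stream is deterministic no matter
# what order the tree was walked in, and adding or removing a file only
# inserts or drops its own record.
cd "$CONTAINER_ROOT"                 # placeholder for the bundle root
find . -type f -exec sha256sum {} + \
  | awk '{print $2, $1}' \
  | LC_ALL=C sort
```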
On Fri, Jun 19, 2015 at 8:05 PM Stephen Day [email protected] wrote:

tar can meet the stability requirement given practical constraints. […]

- Ordering: the string of bytes is always serialized in the same order.
- Filesystem stability: every serialized property exactly reflects the […]

For all of these cases I think we can follow the lead of Linux package […] The basic idea is that you store the canonical metadata next to the […] To address this filesystem stability problem in appc, our idea is to split […]

Yes, I agree. What we are trying to arrive at here is a way of serializing […]

The first method is simpler and gives us the same property. The advantage […]

Does this all make sense?
Given that serializing a local filesystem is (a) going to be tricky and (b) completely independent of launching containers, can it be a separate layer from the container-lifecycle stuff handled by runc and the OC spec? Looking through the standard operations, it seems like they group by […]
I'm not sure what “tagged” is about. Maybe discoverability? Maybe auth? Anyhow, those sections pull apart pretty well, so in the spirit of building a toolbox with interchangeable parts (why we have an OC project in the first place) I think we should have separate specs/tooling for each. Then folks launching a container from a FUSE-mounted […]
Another option here is to explore deferring to the operating system to provide the verification. The solution would be to provide an OS-agnostic way of doing this.
On Thu, Jun 25, 2015 at 01:21:25PM -0700, Stephen Day wrote:
Once you're comfortable offloading this check to external software […]

Why not just leave it to users to install a pre-create hook (if they […]) With IPFS-FUSE or IMA you wouldn't even need the pre-create hook. The […]
IMA is unlikely to be enabled in most Linux deployments because of the associated performance overhead.
On Thu, Jun 25, 2015 at 02:10:25PM -0700, mjg59 wrote:
Does that matter for us? Folks who are running it can use it to […]
Closing in favor of the discussion in #11.
The current version of the specification proposes a signature system based on
a verifiable executable, allowing agility in the calculation of cryptographic
content digests. A more stable approach would be to define a specific
algorithm for walking the container directory tree and calculating a digest.
We need to compare and contrast these approaches and identify one that can
meet the requirements.
The goal of this issue is to identify the full benefits of this approach and
decide on the level of flexibility we should provide in the specification. Such a
calculation would involve content in the container root, including the
filesystem and configuration.
Benefits and Cost
Let's review the features we get from digesting a container:

- […] be invariant to distribution methods. Any implementation that creates a container distributed in any manner (tar, rsync, docker, rkt, etc.) will have a common identifier to verify and sign.
- […] implementations. Signing the digest should be sufficient to verify that a container root file system has not been tampered with. We provide a common base for pre-run verification.
- […] root. Such a system is not a replacement for validation of content from an untrusted source. Ensuring trust and content integrity are left to the content distribution system.
We need to consider the following properties of any approach to achieve these goals:

- […] be useful. Content should be well-identified by its hash.
- […] and wrecks the buffer cache. Minimizing this IO, or not doing it at all, is ideal. We need to consider the cost against the benefits.
- […] layout is not changing. It also needs to be reproducible across runtime environments.
Requirements
We can take the above to define specific requirements for the digest:

- […] container.
- […] order of the relative container path of the resource, ensuring stability under additions and deletions.
- […] files.
The Straw Man
The specification currently proposes the following approach: a common "script" location through which containers provide a digest. It is included here for reference.
Digest
The purpose of the "digest" step is to create a stable summary of the
content, invariant to irrelevant changes yet strong enough to detect tampering.
The algorithm for the digest is defined by an executable file, named “digest”,
directly in the container directory. If such a file is present, it can be run
with the container path as the first argument:
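The invocation example did not survive extraction here; as a hedged stand-in, running such an executable could look like this (the path is a placeholder):

```sh
# Hypothetical invocation: the bundle ships an executable named "digest" at
# its top level; a verifier runs it with the container path as the first
# argument and captures the digest printed on stdout.
CONTAINER_PATH=/containers/myapp      # placeholder
"$CONTAINER_PATH/digest" "$CONTAINER_PATH"
```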
The nature of this executable is not important other than that it should run
on a variety of systems with minimal dependencies. Typically, this can be a
Bourne shell script. The output of the script is left to the implementation,
but it is recommended that the output adhere to the following properties:

- […] tampering
- […] root and any other attributes that must be static for the container to operate correctly
- […] avoid the act of signing preventing the content from being verified
The following is a naive example:
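The example script itself was lost in extraction; the following is a stand-in sketch in the same spirit (the hash algorithm, file selection, and overall structure are my assumptions):

```sh
#!/bin/sh
# Naive digest sketch: hash every regular file under the container path in a
# deterministic (sorted) order, then hash that list to produce one summary
# digest. It ignores permissions, ownership, symlinks, and empty directories,
# and breaks on filenames containing whitespace -- demo quality only.
set -e
cd "$1"
find . -type f -print \
  | LC_ALL=C sort \
  | xargs sha256sum \
  | sha256sum \
  | cut -d ' ' -f 1
```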
The above is still pretty naive. It does not include permissions and users and
other important aspects. This is just a demo. Part of the specification
process would be producing a rock-solid, standard version of this script. It
can be updated at any time and containers can use different versions depending
on the use case.
Goals
Let's use this issue to decide the following:

- […] approach, or should we define a very specific algorithm?