bundle: trade offs of schemes for bundle digest #5

Closed
stevvooe opened this issue Jun 19, 2015 · 10 comments
@stevvooe

The current version of the specification proposes a signature system based on
a verifiable executable, allowing agility in the calculation of cryptographic
content digests. A more stable approach would be to define a specific
algorithm for walking the container directory tree and calculating a digest.
We need to compare and contrast these approaches and identify one that can
meet the requirements.

The goal of this issue is to identify the full benefits of this approach and
decide on the level of flexibility we should provide in the specification. Such a
calculation would involve content in the container root, including the
filesystem and configuration.

Benefits and Cost

Let's review the features we get from digesting a container:

  1. Provide a common digest based on the on-disk container image. It should
    be invariant to distribution methods. Any implementation that creates a container
    distributed in any manner (tar, rsync, docker, rkt, etc.) will have a common
    identifier to verify and sign.
  2. The digest should be cryptographically secure and verifiable across
    implementations. Signing the digest should be sufficient to verify that a
    container root filesystem has not been tampered with, giving us a common
    basis for pre-run verification.
  3. Such a digest should only be used to verify after building a container
    root. Such a system is not a replacement for validation of content from an
    untrusted source. Ensuring trust and content integrity are left to the content
    distribution system.

We need to consider the following properties of any approach to achieve these goals:

  1. Security - Such a system needs to provide a sufficient level of security to
    be useful. Content should be well-identified by its hash.
  2. Cost - Walking a filesystem tree is slow and hashing all files is expensive
    and wrecks the buffer cache. Minimizing this IO, or avoiding it altogether, is
    ideal. We need to consider the cost against the benefits.
  3. Stability - The digest needs to be calculated at a time when the container
    layout is not changing. It also needs to be reproducible across runtime
    environments.

Requirements

We can take the above to define specific requirements for the digest (a rough
sketch of one possible construction follows this list):

  1. The digest will be made up of the hash of hashes of each resource in the
    container.
  2. The order of the additions to the digest should be based on the lexical sort
    order of the relative container path of the resource ensuring stability under
    additions and deletions.
  3. Each resource should only be stat’ed and read once during a digesting process.
  4. Unless specifically omitted, the digest should include the following resource types:
    1. files
    2. directories
    3. hard links
    4. soft links
    5. character devices
    6. block devices
    7. named fifo/pipes
    8. sockets
  5. The digest of each resource must fix the following attributes:
    1. File contents
    2. File path relative to the container root.
    3. Owners (uid or names?)
    4. Groups (gid or names?)
    5. File mode/permissions
    6. xattr
    7. major/minor device numbers for block/char devices
    8. link target names for hard/soft links
  6. The digest should be re-calculable using information about only changed
    files.
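
As a rough, non-normative sketch of how these requirements might compose
(assuming GNU find, sort and stat plus shasum; xattrs, device numbers and the
remaining resource types are omitted for brevity):

#!/usr/bin/env bash

# Illustrative sketch only, not the proposed algorithm: a hash of per-resource
# hashes over lexically sorted relative paths.
set -e

cd "$1"

find . -not -path './signatures/*' -print0 |
    sort -z |                                 # requirement 2: lexical path ordering
    while IFS= read -r -d '' path; do
        meta=$(stat -c '%n %a %u %g' "$path")     # requirement 5 (partial): path, mode, uid, gid
        if [ -L "$path" ]; then
            extra=$(readlink "$path")             # link target for symlinks
        elif [ -f "$path" ]; then
            extra=$(shasum -a 256 < "$path" | awk '{print $1}')   # file content hash
        else
            extra=-
        fi
        # hash one record per resource
        printf '%s %s\n' "$meta" "$extra" | shasum -a 256 | awk '{print $1}'
    done | shasum -a 256                      # requirement 1: hash of the per-resource hashes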

The Straw Man

The specification currently proposes the following approach: a common "script"
location within the container from which a digest can be produced. It is included
here for reference.

Digest

The purpose of the "digest" step is to create a stable summary of the
content, invariant to irrelevant changes yet strong enough to avoid tampering.
The algorithm for the digest is defined by an executable file, named “digest”,
directly in the container directory. If such a file is present, it can be run
with the container path as the first argument:

$ $CONTAINER_PATH/digest $CONTAINER_PATH

The nature of this executable is not important other than that it should run
on a variety of systems with minimal dependencies. Typically, this can be a
Bourne shell script. The output of the script is left to the implementation,
but it is recommended that the output adhere to the following properties:

  • The script itself should be included in the output in some way to avoid
    tampering
  • The output should include the content, each filesystem path relative to the
    root and any other attributes that must be static for the container to
    operate correctly
  • The output must be stable
  • The only hard constraint is that the signatures directory must be ignored,
    so that the act of signing does not prevent the content from being verified

The following is a naive example:

#!/usr/bin/env bash

set -e

# emit content for building a hash of the container filesystem.

content() {

    root=$1
    if [ -z "$root" ]; then
        echo "must specify root" 1>&2;
        exit 1;
    fi

    cd "$root"

    # emit the file paths and their content hashes
    find . -type f -not -path './signatures/*' -exec shasum -a 256 {} \; | sort

    # emit the script itself to prevent tampering
    cat "$scriptpath"
}

scriptpath=$( cd "$(dirname "$0")" ; pwd -P )/$(basename "$0")

content "$1" | shasum -a 256

The above is still pretty naive. It does not include permissions and users and
other important aspects. This is just a demo. Part of the specification
process would be producing a rock-solid, standard version of this script. It
can be updated at any time and containers can use different versions depending
on the use case.
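
As a purely illustrative addition (GNU findutils and stat assumed; a portable
version would need the BSD stat syntax), the content() function above could
also emit ownership and permissions for every path:

    # hypothetical extra line for content(): emit path, mode, uid and gid for
    # every resource, not just regular files
    find . -not -path './signatures/*' -exec stat -c '%n %a %u %g' {} \; | sort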

Goals

Let's use this issue to decide the following:

  • Do we all agree on the benefits of generating a common digest and signature scheme for containers at the runtime level?
  • Are there any benefits, trade offs or considerations missed above?
  • Should we provide algorithmic flexibility with a verified "script"
    approach or should we define a very specific algorithm?
@philips
Contributor

philips commented Jun 19, 2015

I agree OC should support creating a consistent digest.

And yes, I think we can generate a set of recognized versioned digest formats and provide an example implementation to test against. Here are my questions about that:

  • Is there an existing algorithm, or set of algorithms, that are already widely deployed that we can use to turn on-disk file systems into a stream of bytes? The advantage is these formats likely have already thought through the corner cases and encapsulate all of the filesystem features. Are there disadvantages of standardizing on one per-OS? e.g. tar for posix and zip for windows?
  • Is there a clear benefit from distinctly separating the runtime format from the transport format for digest calculation? The transport format necessarily needs to encapsulate the same information that the runtime format defines; having the two separated will result in redundancy in calculation and verification of content integrity and trust.
    • An advantage of defining and using the same format is that the digest can be calculated and verified before being placed onto a "live" filesystem. To avoid the problem of interpreting the file before writing it to disk we can either
      1. treat the entire stream-of-bytes as an opaque blob, or
      2. use a hash list and a signature can say where each file extent lives in a comment. This would increase the safety/security properties of the digest as interpretation of the stream isn't necessary. Also, instead of relying on the lexical naming of files we could define the root hash as a sorted hash list, thus individual file records could be processed in any order and potentially in parallel.

To sum this up against the goals:

  • Yes, I think there is a benefit to a digest for identity and signatures.
  • We should consider providing something that has expanded utility beyond verifying on-disk expansions only, to reduce duplication of effort and increase security.
  • I think we should define a specific set of versioned algorithms and an example implementation of that algorithm to test other implementations against; if we do it right the implementation method is irrelevant.

@shykes

shykes commented Jun 20, 2015

Is there a clear benefit from distinctly separating the runtime format from the transport format for digest calculation?

I think there's at least 2 reasons to keep them separated:

  1. To support the many different distribution methods that exist today and will potentially soon support ocf payloads. Different versions of Docker registry, current appc implementations, rpm, deb, bittorrent, and of course the plethora of custom download+checksum tools. These tools will not have compatible (or even comparable) checksums.

  2. The requirements are different. For example, checksums in the context of secure distribution require verification before any of the inner content is processed (a common complaint against earlier versions of checksumming in docker pull). By contrast, we don't need that in the ocf checksum. It is a means of unique identification of a runnable object, which is already unpacked on the local filesystem.

@stevvooe
Author

@philips

Is there an existing algorithm, or set of algorithms, that are already widely deployed that we can use to turn on-disk file systems into a stream of bytes? The advantage is these formats likely have already thought through the corner cases and encapsulate all of the filesystem features. Are there disadvantages of standardizing on one per-OS? e.g. tar for posix and zip for windows?

The issue with all of these formats is that they do not meet the stability requirement. If one tars and gzips the same filesystem layout twice, the generated bytes and their resulting hash are different. If there is a format that can generate a stable hash, please suggest it.
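
For instance (an illustrative run against a hypothetical rootfs directory),
archiving the same unchanged tree twice typically does not produce the same
bytes, mainly because gzip embeds a timestamp in its header (and tar's entry
order follows readdir, which is not defined):

$ tar -cf - rootfs | gzip | shasum -a 256
$ tar -cf - rootfs | gzip | shasum -a 256   # typically a different digest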

Is there a clear benefit from distinctly separating the runtime format from the transport format for digest calculation? The transport format necessarily needs to encapsulate the same information that the runtime format defines; having the two separated will result in redundancy in calculation and verification of content integrity and trust.

Part of this issue is deciding whether the trade offs that this digest system presents are valuable enough to specify at the runtime level. It would be good to remember that the scope of this effort is the container's on-disk format. The main question we must answer is whether or not we'd like a common digest that can operate regardless of transport mechanism.

The main benefit of separating transport and runtime format is that it allows one to be flexible in distribution and assembly of the target root filesystem. A valid example is distributing a container image through rsync. If we tie the digest calculation to a transport format, such a digest mechanism would require the daemon to repack the content in a format that is not related to the runtime or distribution mechanism.

  • An advantage of defining and using the same format is that the digest can be calculated and verified before being placed onto a "live" filesystem. To avoid the problem of interpreting the file before writing it to disk we can either
  • treat the entire stream-of-bytes as an opaque blob, or

Let's start by reviewing the third item under the scoping of this feature:

Such a digest should only be used to verify after building a container root. Such a system is not a replacement for validation of content from an untrusted source. Ensuring trust and content integrity are left to the content distribution system.

This part of the specification is not intended as a replacement for transport verification. Indeed, this digest algorithm would be redundant to those used during transport but it would provide a common base which works the same regardless of the transport.

Going back to the rsync example, we get transport verification through rsync's file sync algorithm and ssh. Afterwards, one might want to confirm that the container was represented accurately on the target host.

  • use a hash list and a signature can say where each file extent lives in a comment. This would increase the safety/security properties of the digest as interpretation of the stream isn't necessary. Also, instead of relying on the lexical naming of files we could define the root hash as a sorted hash list, thus individual file records could be processed in any order and potentially in parallel.

I am not sure what you mean by this. Could you please clarify? Perhaps, an example would help. There may be some confusion here, since the proposal is not to enforce lexical naming, but rather to have a stable ordering of hashed elements based on the filename.

@philips
Contributor

philips commented Jun 22, 2015

On Fri, Jun 19, 2015 at 8:05 PM Stephen Day [email protected]
wrote:

@philips https://github.com/philips

Is there an existing algorithm, or set of algorithms, that are already
widely deployed that we can use to turn on-disk file systems into a stream
of bytes? The advantage is these formats likely have already thought
through the corner cases and encapsulate all of the filesystem features.
Are there disadvantages of standardizing on one per-OS? e.g. tar for posix
and zip for windows?

The issue with all of these formats is that they do not meet the stability
requirement. If one tars and gzips the same filesystem layout twice, the
generated bytes and their resulting hash are different. If there is a
format that can generate a stable hash, please suggest it.

tar can meet the stability requirement given practical constraints. The
goal here is to serialize a list of objects and their properties into a byte
stream; this is what tar does. Now I see there are two concerns in making
this byte stream in practice:

Ordering: the string of bytes is always serialized in the same order.
There are two approaches (a sketch of the first follows the list):

  1. enforce serializing in lexicographical order of the filesystem directory
    entries
  2. hash each entry individually and use a hash list, see below.
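
As a hedged illustration of the first approach (GNU tar 1.28+ assumed for
--sort, with gzip -n to drop the embedded timestamp), an archive can be made
reproducible, at the cost of normalizing away timestamps and ownership rather
than digesting them:

# deterministic serialization of a hypothetical rootfs tree (illustrative only)
tar --sort=name \
    --mtime='2015-01-01 00:00Z' \
    --owner=0 --group=0 --numeric-owner \
    -cf - rootfs | gzip -n | shasum -a 256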

Filesystem stability: every serialized property exactly reflects the
on-disk version. Now, there are several ways the on-disk metadata may get
changed or where an implementation will do tricks leading to a non-stable
image:

  1. A system administrator uses SELinux, ACLs, etc to set additional local
    constraints on the container filesystem. In some cases these properties
    might also come along with the original image.
  2. Through some very common actions a filesystem may get modified on disk,
    e.g. via an accidental chown -R, find . -exec {} \;, or touch
  3. Smart filesystem handling of the container runtime that wants to be
    flexible with timestamps or uid/gid owner of r/o files like shared
    libraries and binaries to reduce on-disk usage.
  4. Filesystems that don't support xattrs, ACLs, etc. but still need to verify
    the digest, e.g. NFS

For all of these cases I think we can follow the lead of Linux package
management systems and store filesystem metadata adjacent so that it can be
reassembled, enforced and verified later. For example, rpm -a --setperms
and rpm -V.

The basic idea is that you store the canonical metadata next to the
filesystem in a separate file so that if any filesystem properties get
lost, corrupted, or changed, the user can be notified and digest
stability is preserved.

To address this filesystem stability problem in appc our idea is to split
out the relatively small metadata sections and put them aside; this is part
of why @vbatts built tar-split: https://github.com/vbatts/tar-split

The main benefit of separating transport and runtime format is that it
allows one to be flexible in distribution and assembly of the target root
filesystem. A valid example is distributing a container image through
rsync. If we tie the digest calculation to a transport format, such a
digest mechanism would require the daemon to repack the content in a format
that is not related to the runtime or distribution mechanism.

Yes, I agree. What we are trying to arrive at here is a way of serializing
filesystems into a string of bytes to run through a hash function. scp,
rsync, etc. can get the files to disk, but once there we need to take
that on-disk representation and turn it into a string of bytes to run
through a hash function. I am suggesting that instead of inventing some new
record format, we use something that is known to be able to serialize
filesystems: archive formats.

This part of the specification is not intended as a replacement for
transport verification. Indeed, this digest algorithm would be redundant to
those used during transport but it would provide a common base which works
the same regardless of the transport.

Yes, I agree.

Going back to the rsync example, we get transport verification through
rsync's file sync algorithm and ssh. Afterwards, one might want to
confirm that the container was represented accurately on the target host.

  • use a hash list and a signature can say where each file extent lives
    in a comment. This would increase the safety/security properties of the
    digest as interpretation of the stream isn't necessary. Also, instead of
    relying on the lexical naming of files we could define the root hash as a
    sorted hash list, thus individual file records could be processed in any
    order and potentially in parallel.

    I am not sure what you mean by this. Could you please clarify? Perhaps,
    an example would help. There may be some confusion here, since the proposal
    is not to enforce lexical naming, but rather to have a stable ordering of
    hashed elements based on the filename.

As discussed above I am saying there are perhaps two ways to solve the
"sort order" problem:

  1. Enforce lexicographical ordering of the entries and run them through the
    hash function.
  2. Given a filesystem with files f1 & f2, contents c1 & c2, and
    serialized file metadata m1 & m2, we calculate the digests of c1+m1 and
    c2+m2, arriving at a list of digests [d1, d2]. Then calculate
    hash(sort([d1, d2])); this gives us the root identifying hash of the fs.

The first method is simpler and gives us the same property. The advantage
of the hash-list method is that adding a file doesn't require running the
entire byte array through the hash function, just the one additional file.
The downside is the complexity of implementation.
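
A minimal, hypothetical shell sketch of the second method (GNU stat assumed;
the serialized metadata here is just path, mode, uid and gid):

#!/usr/bin/env bash

# Illustrative only: per-entry digests over metadata + content, then a root
# digest over the sorted digest list, so record order does not matter.
set -e

cd "$1"

find . -type f -not -path './signatures/*' -print0 |
    while IFS= read -r -d '' f; do
        # d_i = hash(m_i + c_i): serialized metadata followed by file contents
        { stat -c '%n %a %u %g' "$f"; cat "$f"; } | shasum -a 256 | awk '{print $1}'
    done |
    sort |          # sort([d1, d2, ...]); entries could be hashed in parallel
    shasum -a 256   # root identifying hash of the fs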

Does this all make sense?

@philips changed the title from "Trade offs of digest scheme for container root" to "bundle: trade offs of schemes for bundle digest" on Jun 25, 2015
@wking
Contributor

wking commented Jun 25, 2015

Given that serializing a local filesystem is (a) going to be tricky and (b) completely independent of launching containers, can it be a separate layer from the container-lifecycle stuff handled by runc and the OC spec? Looking through the standard operations, it seems like they group by

  • Using a container locally (started, stopped, snapshotted?)
  • Authenticating an unpacked local container (not listed in the standard operations, but the point of this issue).
  • Transport (copied, downloaded, uploaded)

I'm not sure what “tagged” is about. Maybe discoverability? Maybe auth? Anyhow, those sections pull apart pretty well, so in the spirit of building a toolbox with interchangeable parts (why we have an OC project in the first place) I think we should have separate specs/tooling for each. Then folks launching a container from a FUSE-mounted /ipfs/{container-root-multihash}/ don't have to get involved in the auth discussion, and don't have to worry about weird auth side effects trickling into the launcher or spec ;).

@stevvooe
Author

Another option here is to explore deferring to the operating system for the verification. The solution would be to define an OS-agnostic way of doing this.

http://sourceforge.net/p/linux-ima/wiki/Home/

@wking
Contributor

wking commented Jun 25, 2015

On Thu, Jun 25, 2015 at 01:21:25PM -0700, Stephen Day wrote:

Another option here is to explore deferring to the operating system
to provide the verification.

Once you're comfortable offloading this check to external software
(and I am), I don't think it matters to OC whether that verifier is
OS-level tooling or not.

The solution would be to provide an OS-agnostic way of providing this.

Why not just leave it to users to install a pre-create hook (if they
want one) using whichever tooling they like? Having a list of such
pre-create hooks in a contrib/ directory would be nice, but I don't
see the need for binding this more tightly to OC than that.

http://sourceforge.net/p/linux-ima/wiki/Home/

With IPFS-FUSE or IMA you wouldn't even need the pre-create hook. The
verification would be handled automatically per-file accessed.

@mjg59

mjg59 commented Jun 25, 2015

ima is unlikely to be enabled in most Linux deployments because of associated performance overhead.

@wking
Contributor

wking commented Jun 25, 2015

On Thu, Jun 25, 2015 at 02:10:25PM -0700, mjg59 wrote:

ima is unlikely to be enabled in most Linux deployments because of
associated performance overhead.

Does that matter for us? Folks who are running it can use it to
verify their container trees. Folks who choose to run from IPFS can
use its hashing/signature model to verify their container trees.
Folks who want to serialize to a reproducible tarball and check an
OpenPGP signature against that tarball can do that. Folks who just
want to trust their local filesystem can do that. The point is that
all of these folks might want to use an OC implementation to
create/start/stop their container. We just need to give them hooks
where they can attach any verification logic they like, and then get
out of the way.

@crosbymichael
Member

closing in favor of the discussion in #11
