Vault and File Schema - Ingress and Egress Schemas #222

Open
@CMCDragonkai

Description

Specification

Egress Schema

Schema

The schema is a .json.schema file which can have an arbitrary name. Schema files should be able to import other schema files to form a hierarchy of schemas, keeping repetition to a minimum. This also enables schema composition: sub-schemas, such as one for credit cards and one for login information, can be composed to make an AWS credential schema. The ingress schema must be present in the vault, and the egress schema must be present outside it.
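As a rough sketch of what such composition could look like, the snippet below writes two sub-schemas and a composed schema as plain objects, using the standard JSON Schema keywords `$ref` and `allOf`. The file names (`login.json`, `credit-card.json`), the field names, and the `collectRefs` helper are all hypothetical and only illustrate the idea; they are not an agreed Polykey format.

```typescript
// Hypothetical sub-schema for login information (would live in "login.json").
const loginSchema = {
  type: "object",
  properties: {
    username: { type: "string" },
    password: { type: "string" },
  },
  required: ["username", "password"],
};

// Hypothetical sub-schema for credit cards (would live in "credit-card.json").
const creditCardSchema = {
  type: "object",
  properties: {
    cardNumber: { type: "string" },
    expiry: { type: "string" },
  },
  required: ["cardNumber", "expiry"],
};

// Composed schema: imports both sub-schemas by reference instead of repeating them.
const awsCredentialSchema = {
  allOf: [{ $ref: "login.json" }, { $ref: "credit-card.json" }],
};

// Illustrative helper: list every schema file a schema imports via "$ref".
function collectRefs(node: unknown, found: string[] = []): string[] {
  if (Array.isArray(node)) {
    for (const item of node) collectRefs(item, found);
  } else if (typeof node === "object" && node !== null) {
    const record = node as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const value = record[key];
      if (key === "$ref" && typeof value === "string") found.push(value);
      else collectRefs(value, found);
    }
  }
  return found;
}
```

Calling `collectRefs(awsCredentialSchema)` lists the two imported files, which is the information a loader would need in order to resolve the hierarchy before validating anything.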

The library https://github.com/bcherny/json-schema-to-typescript can be used to generate TypeScript interfaces from a JSON schema file. However, this particular library requires the project to be ESM, which Polykey currently is not. js-db and js-workers need to be migrated before Polykey can be migrated to ESM too.

The selected library cannot use arbitrary extensions for the schema file, so the schema file name must be restricted accordingly. Moreover, the schema must itself be a valid schema, otherwise the command will error out.

Ingress

Ingress schemas control inputs and mutations to vaults; mutations can all be considered inputs. They thus maintain guarantees on how the vault can change over commits. These schemas are saved along with the vaults and are shared with them too.

If a schema is specified, the incoming data must conform to the requirements of that schema. If the new data does not conform, an error is thrown. If the schema does not allow the file to exist, a different error is thrown. In short, the schema ensures that the secrets inside a vault are maintained with respect to a schema.
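The two failure cases described here can be sketched as follows. This is only an illustration under stated assumptions: the `IngressSchema` shape is a small hand-rolled subset of JSON Schema (`properties`, `pattern`, `additionalProperties`) chosen to keep the example self-contained, secrets are assumed to be keyed by path with string contents, and `checkIngress` is a hypothetical name. A real implementation would more likely delegate to a full JSON Schema validator.

```typescript
// Hypothetical, JSON-Schema-like subset describing which secrets a vault accepts.
interface IngressSchema {
  // Secret path -> constraint on its content.
  properties: Record<string, { type: "string"; pattern?: string }>;
  // When false, secrets not listed in `properties` may not exist.
  additionalProperties: boolean;
}

// Throws if writing `content` to `secretPath` would violate the schema.
function checkIngress(schema: IngressSchema, secretPath: string, content: string): void {
  const listed = Object.prototype.hasOwnProperty.call(schema.properties, secretPath);
  if (!listed) {
    // Case 2: the schema does not allow this file to exist.
    if (!schema.additionalProperties) {
      throw new Error(`secret "${secretPath}" is not allowed by the schema`);
    }
    return;
  }
  const rule = schema.properties[secretPath];
  // Case 1: the file is allowed, but its new content does not conform.
  if (rule.pattern !== undefined && !new RegExp(rule.pattern).test(content)) {
    throw new Error(`secret "${secretPath}" does not match pattern ${rule.pattern}`);
  }
}

// Example schema: only a numeric DB_PORT secret may be written.
const exampleIngressSchema: IngressSchema = {
  properties: { DB_PORT: { type: "string", pattern: "^[0-9]+$" } },
  additionalProperties: false,
};
```

Keeping the two cases as separate errors lets the caller report whether the problem is the content of the secret or the existence of the secret itself.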

That last statement is not fully correct, however, as secrets which already exist in a vault will not be affected. The schema implementation will only affect new incoming data, and only when the schema flag is used. So the schema does not describe the state of a vault, but rather describes constraints on data that can be optionally applied.

Implementing this would be challenging, as it would require either a pre-write hook, or a post-write and pre-commit hook, that applies the schema. However, this can be worked around where secret data is streamed, as the data can be checked before being written to the EFS.
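One way the streaming workaround might look is a writer that buffers incoming chunks and only commits them once the whole secret has passed validation. The sketch below is a simplification: `CheckedSecretWriter`, the `validate` callback, and the `commit` callback (standing in for the actual write to the EFS) are hypothetical names, chunks are modelled as strings, and everything is synchronous for brevity.

```typescript
// Buffers streamed chunks and validates the full secret before anything is written.
class CheckedSecretWriter {
  private chunks: string[] = [];
  private validate: (data: string) => string[];
  private commit: (data: string) => void;

  // `validate` returns a list of violations (empty when the data conforms);
  // `commit` stands in for the real write to the vault's file system.
  constructor(validate: (data: string) => string[], commit: (data: string) => void) {
    this.validate = validate;
    this.commit = commit;
  }

  // Called for each chunk as it arrives; nothing is written yet.
  push(chunk: string): void {
    this.chunks.push(chunk);
  }

  // Called once the stream ends: check first, write only if the check passes.
  end(): void {
    const data = this.chunks.join("");
    const errors = this.validate(data);
    if (errors.length > 0) {
      throw new Error(`schema violation: ${errors.join("; ")}`);
    }
    this.commit(data);
  }
}
```

Because the check happens before `commit` is ever called, a failing secret never reaches storage, which avoids the need for a separate pre-commit hook in this path. The cost is that the whole secret is held in memory until the stream ends.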

Egress

Egress schemas are interpreted by Polykey during egress commands. They allow the user to document, in a machine-parseable and human-readable way (which is why JSON schema is best for this), what the expected egress output should look like. When Polykey performs an egress action, it can check whether what it is egressing matches what is expected, fail if it doesn't, and explain to the user WHY it is failing.

This is useful because it allows Polykey to provide a dynamic, fine-grained access control context, where the vault path references are resolved contextually, so that gestalt A's vault contents are not necessarily the same as gestalt B's vault contents. This allows easier separation of dev, staging, and prod, as well as per-user secrets.

The main reason for the existence of egress schemas is to enforce the Principle of Least Authority. A vault containing secrets for a repository doesn't need to export all of them at once. The development workflow might need only half the secrets, and production might need the other half. Constantly exporting all secrets could be a source of authority leaks. As such, a schema would restrict the exported secrets to only the ones specified, tightening the security as much as possible around the egress points.

This will probably be the most common use-case of the schemas. It would ensure that data egresses in a particular format, only for the files specified, and only in the specified types. If a secret does not match the specified type, an error is thrown. We could attempt a conversion, like converting a number to a string, but that would be inconsistent, so it should throw an error instead.

Thanks to egress schemas, a command runs normally if the available secrets comply with the specified schema, and fails to run if they do not. This ensures that either the environment is correctly set up with the secrets, or the command fails. Of course, this setup will be correct only if the schema was correct in the first place.
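The egress behaviour described above (export only what the schema names, check types strictly, never coerce, and explain failures) can be sketched as below. The `EgressSchema` shape is a hand-rolled subset of JSON Schema used to keep the example self-contained, the secrets are assumed to be available as already-parsed values keyed by name, and `selectForEgress` is a hypothetical name rather than an existing Polykey function.

```typescript
type JsonType = "string" | "number" | "boolean";

// Hypothetical, JSON-Schema-like subset describing the expected egress output.
interface EgressSchema {
  properties: Record<string, { type: JsonType }>;
  required: string[];
}

// Returns only the secrets named by the schema, or throws explaining every mismatch.
function selectForEgress(
  secrets: Record<string, unknown>,
  schema: EgressSchema,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  const errors: string[] = [];
  for (const name of Object.keys(schema.properties)) {
    const expected = schema.properties[name].type;
    if (!Object.prototype.hasOwnProperty.call(secrets, name)) {
      if (schema.required.indexOf(name) !== -1) {
        errors.push(`missing required secret "${name}"`);
      }
      continue;
    }
    const value = secrets[name];
    // Strict type check: no conversion is attempted (e.g. number -> string).
    if (typeof value !== expected) {
      errors.push(`secret "${name}" is ${typeof value}, schema expects ${expected}`);
      continue;
    }
    out[name] = value;
  }
  if (errors.length > 0) {
    throw new Error(`egress rejected: ${errors.join("; ")}`);
  }
  // Secrets not named in the schema are never copied into `out`.
  return out;
}
```

Collecting all violations before throwing, rather than stopping at the first, is one way to tell the user why the egress failed in a single pass. Secrets that the schema does not mention are silently left behind, which is the least-authority behaviour the section argues for.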

The focus should be on egress points first, as that will be useful sooner than ingress points, and is also easier to implement and test out.

Additional context

Also see zeta.house and Zeta-House-Docs for the latest usage of composed schemas.

The following links point to the legacy codebase on GitLab, which does not exist anymore

Tasks

  1. Develop the JSON structure of schema
  2. Apply validation logic of schema to the vault contents at ingress points
  3. Apply schema enforcing at egress points
  4. Integrate into creation of vault
