# ADR: NanoTDF Attribute Storage Optimization in 255-bytes

## Context and Problem Statement

We need to store attributes in the NanoTDF policy header, which has a maximum length of 255 bytes. In ztdf, each attribute value is currently a Fully Qualified Name (FQN) wrapped in JSON. Even after removing the JSON overhead, the FQN is too verbose to fit multiple attributes within the 255-byte limit.

Example:

```text
https://namespace.com/attr/classification/val/topsecret
```

The goal is to define a syntax that will compress the data to allow for efficient storage of multiple attributes within the 255-byte limit.

## Considered Options

1. **Schema-Based Syntax with Full URLs**
2. **Index-Based Syntax**
3. **Protobuf Compression**

## Decision Outcome

We have decided to use the **Schema-Based Syntax with Full URLs**. This decision was made based on the need for a federatable and customer-friendly approach that retains full attribute names and avoids using indexes.

We also considered **Protobuf Compression** for further optimization. However, it makes debugging harder,
since the data cannot be read without a protobuf decoder.

## Options

### Option 1: Schema-Based Syntax with Full URLs

**Format:**

```text
{schema}|{base_url}|{attribute}:{value}[,{value}...];{attribute}:{value}[,{value}...];...
```

**Components:**

- **Schema (`schema`)**: A digit representing the URL scheme (0 for HTTP, 1 for HTTPS).
- **Base URL (`base_url`)**: The full namespace URL without the scheme.
- **Attributes**: `{attribute}:{value}` pairs separated by semicolons (`;`). Multiple values within an attribute are separated by commas (`,`).
- **Entries**: Each namespace gets its own entry, with one entry per line, as in the example below.

**Example:**

```text
1|namespace.com|classification:topsecret;relto:usa,gba,cda
1|ns.namespace.com|group:a
```
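To illustrate how one entry in this format maps back to FQNs, here is a minimal Go sketch. It is not the playground code linked later in this section. The function name `expandEntry` is hypothetical, and the `/attr/{name}/val/{value}` layout is taken from the FQN example in the problem statement. Validation is deliberately light.

```go
package main

import (
	"fmt"
	"strings"
)

// schemes maps the leading digit to a URL scheme, as listed under Components.
var schemes = map[string]string{"0": "http", "1": "https"}

// expandEntry (hypothetical helper) turns one compact entry such as
// "1|namespace.com|classification:topsecret" into fully qualified names.
func expandEntry(entry string) ([]string, error) {
	parts := strings.SplitN(entry, "|", 3)
	if len(parts) != 3 {
		return nil, fmt.Errorf("expected 3 pipe-separated fields, got %d", len(parts))
	}
	scheme, ok := schemes[parts[0]]
	if !ok {
		return nil, fmt.Errorf("unknown schema digit %q", parts[0])
	}
	var fqns []string
	for _, pair := range strings.Split(parts[2], ";") {
		name, values, found := strings.Cut(pair, ":")
		if !found || name == "" || values == "" {
			return nil, fmt.Errorf("malformed attribute %q", pair)
		}
		// One FQN per comma-separated value.
		for _, v := range strings.Split(values, ",") {
			fqns = append(fqns, fmt.Sprintf("%s://%s/attr/%s/val/%s", scheme, parts[1], name, v))
		}
	}
	return fqns, nil
}

func main() {
	fqns, err := expandEntry("1|namespace.com|classification:topsecret;relto:usa,gba,cda")
	if err != nil {
		panic(err)
	}
	for _, f := range fqns {
		fmt.Println(f)
	}
}
```

Run against the first example entry, this prints four FQNs, starting with `https://namespace.com/attr/classification/val/topsecret`. A policy with several namespaces would call `expandEntry` once per line.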

**Advantages:**

- Retains full attribute names and base URLs, making it customer-friendly and federatable.
- Clear and easy to parse structure.

**Disadvantages:**

- Attributes starting with numbers (0 or 1) need careful handling to avoid confusion with schema indicators.
- Slightly more verbose due to retaining full URLs.

#### Approximate Range of Attributes

Given the 255-byte limit, the number of attributes that can be stored depends on the length of the base URLs and attribute names. For estimation:

- Assume average domain name (.com) length: 13 bytes
- Average attribute name length: 5-15 bytes
- Average value length: 1-10 bytes
- Delimiters and schema indicators: 3-10 bytes

Example calculation for a single attribute set:

```text
1|namespace.com|classification:topsecret
```

This example is about 40 bytes.

For multiple attributes:

```text
1|namespace.com|classification:topsecret;relto:usa,gba,cda
```

This example is about 60 bytes.

For multiple attributes across multiple namespaces:

```text
1|namespace.com|classification:topsecret;relto:usa,gba,cda
1|namespace2.com|classification:topsecret;relto:usa,gba,cda
1|namespace3.com|classification:topsecret;relto:usa,gba,cda
1|namespace4.com|classification:topsecret;relto:usa,gba,cda
```

This example is about 240 bytes.

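The four-namespace example holds 8 attributes (16 values) in about 240 bytes. Approximately 15-20 attribute values of similar length therefore fit within the 255-byte limit. Entries that share a namespace fit more, because the base URL is written only once.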
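The byte counts in this section can be checked by measuring the example strings directly. The sketch below assumes entries are joined with a single newline byte and that all characters are ASCII. `encodedSize` and `policyLimit` are illustrative names, not part of any NanoTDF API.

```go
package main

import (
	"fmt"
	"strings"
)

// policyLimit is the maximum policy length stated in this ADR.
const policyLimit = 255

// encodedSize joins the entries with newlines and returns the total byte length.
func encodedSize(entries []string) int {
	return len(strings.Join(entries, "\n"))
}

func main() {
	single := []string{"1|namespace.com|classification:topsecret"}
	multi := []string{"1|namespace.com|classification:topsecret;relto:usa,gba,cda"}
	fourNamespaces := []string{
		"1|namespace.com|classification:topsecret;relto:usa,gba,cda",
		"1|namespace2.com|classification:topsecret;relto:usa,gba,cda",
		"1|namespace3.com|classification:topsecret;relto:usa,gba,cda",
		"1|namespace4.com|classification:topsecret;relto:usa,gba,cda",
	}
	for _, entries := range [][]string{single, multi, fourNamespaces} {
		size := encodedSize(entries)
		fmt.Printf("%d bytes, fits in %d: %t\n", size, policyLimit, size <= policyLimit)
	}
}
```

The three cases measure 40, 58, and 238 bytes, matching the approximate figures above.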

#### Example

See playground [https://go.dev/play/p/M9s8QOtTn4Y](https://go.dev/play/p/M9s8QOtTn4Y)

### Option 2: Index-Based Syntax

**Format:**

```text
{schema}|{index}|{attribute_index}:{value_index};{attribute_index}:{value_index};...
```

**Components:**

- **Schema (`schema`)**: A digit representing the URL scheme (0 for HTTP, 1 for HTTPS).
- **Index (`index`)**: A numeric index representing the base URL.
- **Attributes**: `{attribute_index}:{value_index}` pairs separated by semicolons (`;`). Multiple values within an attribute are separated by commas (`,`).

**Example:**

```text
1|1|1:1;2:2,3,4
```

**Advantages:**

- Extremely compact representation.
- Potentially allows storing a higher number of attributes within the 255-byte limit.

**Disadvantages:**

- Requires a predefined mapping of indexes to base URLs and attributes, which is not federatable.
- Harder to manage and less transparent to customers.
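The dependency on a predefined mapping can be made concrete with a short Go sketch. Everything in the `registry` below is hypothetical: this ADR does not define the index tables, and the sketch assumes a single shared table for values rather than one table per attribute. Decoding works only if every party holds the same tables.

```go
package main

import (
	"fmt"
	"strings"
)

// registry is a hypothetical lookup table that writer and reader must share.
type registry struct {
	baseURLs   map[string]string
	attributes map[string]string
	values     map[string]string
}

// demoRegistry holds made-up index assignments for illustration only.
var demoRegistry = registry{
	baseURLs:   map[string]string{"1": "namespace.com"},
	attributes: map[string]string{"1": "classification", "2": "relto"},
	values:     map[string]string{"1": "topsecret", "2": "usa", "3": "gba", "4": "cda"},
}

// expand resolves an index-based entry such as "1|1|1:1;2:2,3,4" into FQNs.
func (r registry) expand(entry string) ([]string, error) {
	parts := strings.SplitN(entry, "|", 3)
	if len(parts) != 3 {
		return nil, fmt.Errorf("expected 3 pipe-separated fields, got %d", len(parts))
	}
	scheme := map[string]string{"0": "http", "1": "https"}[parts[0]]
	base := r.baseURLs[parts[1]]
	if scheme == "" || base == "" {
		return nil, fmt.Errorf("unknown schema %q or base URL index %q", parts[0], parts[1])
	}
	var fqns []string
	for _, pair := range strings.Split(parts[2], ";") {
		attrIdx, valIdxs, found := strings.Cut(pair, ":")
		name := r.attributes[attrIdx]
		if !found || name == "" {
			return nil, fmt.Errorf("unknown attribute in %q", pair)
		}
		for _, vi := range strings.Split(valIdxs, ",") {
			v := r.values[vi]
			if v == "" {
				return nil, fmt.Errorf("unknown value index %q", vi)
			}
			fqns = append(fqns, fmt.Sprintf("%s://%s/attr/%s/val/%s", scheme, base, name, v))
		}
	}
	return fqns, nil
}

func main() {
	fqns, err := demoRegistry.expand("1|1|1:1;2:2,3,4")
	fmt.Println(fqns, err)
}
```

An index that is missing from the reader's tables, for example a base URL registered by another organization, cannot be resolved. That is the federation problem noted above.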

### Option 3: Protobuf Compression

Protobuf can serialize the data into a compact binary format, potentially reducing the size further than ASCII or other text-based formats.

**Advantages:**

- Compact binary format that is efficient for storage and transmission.
- Strongly typed data ensures consistency and integrity.
- Supports multiple programming languages and versioning.

**Disadvantages:**

- Requires additional tooling and setup to define and compile Protobuf schemas.
- Requires additional tooling to decode and read the binary data.
- May not provide significant savings over the schema-based approach for small datasets.

#### Protobuf Example

```proto
syntax = "proto3";

enum Schema {
  HTTP = 0;
  HTTPS = 1;
}

message Attribute {
  string name = 1;
  repeated string values = 2;
}

message AttributeSet {
  Schema schema = 1;
  string base_url = 2;
  repeated Attribute attributes = 3;
}
```
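To gauge whether the schema above would save space, the following Go sketch estimates the serialized size by hand from the standard protobuf wire-format rules: one tag byte for field numbers below 16, a varint length prefix for strings and nested messages, and zero-valued scalars omitted in proto3. It does not use a protobuf library, and it assumes non-empty strings. The helper names are illustrative.

```go
package main

import "fmt"

// attribute mirrors the Attribute message from the schema above.
type attribute struct {
	name   string
	values []string
}

// varintLen returns how many bytes a protobuf varint needs for n (n >= 0).
func varintLen(n int) int {
	size := 1
	for n >= 0x80 {
		n >>= 7
		size++
	}
	return size
}

// lenDelimited is the wire size of a length-delimited field whose field
// number is below 16: one tag byte, the length varint, then the payload.
func lenDelimited(payload int) int {
	return 1 + varintLen(payload) + payload
}

// attributeSize is the encoded size of one Attribute message body.
func attributeSize(a attribute) int {
	size := lenDelimited(len(a.name))
	for _, v := range a.values {
		size += lenDelimited(len(v))
	}
	return size
}

// attributeSetSize is the encoded size of a whole AttributeSet message.
func attributeSetSize(schema int, baseURL string, attrs []attribute) int {
	size := 0
	if schema != 0 { // proto3 omits zero-valued scalars, so HTTP costs nothing
		size += 1 + varintLen(schema)
	}
	size += lenDelimited(len(baseURL))
	for _, a := range attrs {
		size += lenDelimited(attributeSize(a))
	}
	return size
}

func main() {
	classification := attribute{"classification", []string{"topsecret"}}
	relto := attribute{"relto", []string{"usa", "gba", "cda"}}
	fmt.Println(attributeSetSize(1, "namespace.com", []attribute{classification}))
	fmt.Println(attributeSetSize(1, "namespace.com", []attribute{classification, relto}))
}
```

Under these assumptions the two cases come to 46 and 70 bytes, compared with 40 and about 60 bytes for the equivalent Option 1 strings. Each protobuf string carries a tag and a length byte where the text format spends a single delimiter, which is consistent with the last disadvantage listed above.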

