Description
This is in part a question and open for discussion.
When building `TableMetadata` through the `TableMetadataBuilder`, all options for building "from scratch" force a reassignment of field IDs:

- Using `TableMetadataBuilder::new`
- Using `TableMetadataBuilder::from_table_creation`, as this is a wrapper over `TableMetadataBuilder::new` using the `TableCreation` struct.

I noticed that it would be possible to get any `TableMetadata` desired by constructing the struct directly, but all of its fields are restricted to `pub(crate)` scope. I suspect the reason for this is safety, i.e. ensuring that creation occurs through the builder pattern, where the relevant checks are performed on the call to `build()`.
Questions:

- Would it be problematic to lift the restriction on the `TableMetadata` fields to be `pub`[^1], or to allow the creation of `TableMetadata` without reassigning field IDs?
- If the above is not possible, is there an example of creating the Iceberg metadata file hierarchy in the correct way?
For extra context, we're currently constructing Iceberg metadata around pre-existing parquet files written by another system; there is no Iceberg catalog or prior metadata JSON. I noticed there is also a `StaticTable`; however, this requires either pre-existing JSON loaded via `FileIO` or an input `TableMetadata`, and the second option brings us back to the issue above.
This reassignment leads to a mismatch between what is shown in the table metadata JSON and the actual parquet file:
**parquet schema**

```
required group field_id=-1 arrow_schema {
  optional binary field_id=2 cpu (String);
  optional binary field_id=3 host1 (String);
  optional int64 field_id=1 time (Timestamp(isAdjustedToUTC=false, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false));
}
```
**iceberg metadata JSON schema snippet**

The reassigned IDs follow the order in which the fields appear within the parquet/arrow `Schema`, rather than the field IDs they were given:
```
"schemas": [
  {
    "schema-id": 0,
    "type": "struct",
    "fields": [
      {
        "id": 1,          <-- field_id=2 in parquet
        "name": "cpu",
        "required": false,
        "type": "string"
      },
      {
        "id": 2,          <-- field_id=3 in parquet
        "name": "host1",
        "required": false,
        "type": "string"
      },
      {
        "id": 3,          <-- field_id=1 in parquet
        "name": "time",
        "required": false,
        "type": "timestamp"
      }
    ]
  }
],
```
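To make the mismatch concrete, here is a minimal standalone sketch of the reassignment behavior described above. This is *not* the actual iceberg-rust code (the `Field` struct and `reassign_field_ids` function are invented for illustration); it only mirrors the observed effect: incoming parquet field IDs are ignored and fields are renumbered 1..N in order of appearance.

```rust
/// Hypothetical, simplified field representation (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    id: i32,
    name: String,
}

/// Reassign IDs by position, mirroring the builder behavior described
/// in this issue: the original `id` values are discarded and replaced
/// with 1-based indices in order of appearance.
fn reassign_field_ids(fields: &[Field]) -> Vec<Field> {
    fields
        .iter()
        .enumerate()
        .map(|(i, f)| Field {
            id: (i + 1) as i32,
            name: f.name.clone(),
        })
        .collect()
}

fn main() {
    // Parquet order from the snippet above: cpu (field_id=2),
    // host1 (field_id=3), time (field_id=1).
    let parquet_fields = vec![
        Field { id: 2, name: "cpu".into() },
        Field { id: 3, name: "host1".into() },
        Field { id: 1, name: "time".into() },
    ];

    let iceberg_fields = reassign_field_ids(&parquet_fields);

    // IDs now follow appearance order and diverge from the parquet
    // field_ids: cpu -> 1, host1 -> 2, time -> 3.
    for f in &iceberg_fields {
        println!("{} -> {}", f.name, f.id);
    }
}
```

Under this positional scheme, any parquet file whose field IDs are not already 1..N in appearance order will disagree with the generated metadata JSON, which is exactly the mismatch shown in the snippets above.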
This is also referenced by a question in the Iceberg Slack.
[^1]: Considering this conflicts with the native Java implementation, I would also suspect it is problematic to do in the Rust version.