-
Notifications
You must be signed in to change notification settings - Fork 286
feat: re-export name mapping #1116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 10 commits
86db09c
7cc1977
b57742d
6bdf66e
1c4cec0
abd4e51
85b024e
3e766cc
dea509b
ea31484
886ba3f
d22edce
140a42b
c18338a
e93d846
0cfa5f7
3bda87a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,165 @@ | ||
// Licensed to the Apache Software Foundation (ASF) under one | ||
// or more contributor license agreements. See the NOTICE file | ||
// distributed with this work for additional information | ||
// regarding copyright ownership. The ASF licenses this file | ||
// to you under the Apache License, Version 2.0 (the | ||
// "License"); you may not use this file except in compliance | ||
// with the License. You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, | ||
// software distributed under the License is distributed on an | ||
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
// KIND, either express or implied. See the License for the | ||
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
//! Iceberg mapped fields. | ||
|
||
use std::collections::HashMap; | ||
|
||
use serde::{Deserialize, Serialize}; | ||
|
||
use crate::spec::MappedField; | ||
use crate::Error; | ||
|
||
/// Utility mapping which contains field names to IDs and | ||
/// field IDs to the underlying [`MappedField`]. | ||
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)] | ||
pub struct MappedFields { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given we are going to add a lot of visitors for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not quite sure what you mean, there is already a mod name_mapping {
// contents of name_mapping.rs here
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes mapped field should also be included in name_mapping.rs. You can move everything from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I still don't understand why we need to have the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TBH I haven't come up with a case where There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Oh sorry for misclarification. I mean we could create following file structures:
In rust, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In PyIceberg we've have a visitor to traverse the name-mapping and schema at the same time: https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/name_mapping.py#L251-L372 This is because when looking up by column name, the
|
||
fields: Vec<MappedField>, | ||
name_to_id: HashMap<String, i32>, | ||
id_to_field: HashMap<i32, MappedField>, | ||
} | ||
|
||
impl MappedFields { | ||
/// Create a new [`MappedFields`]. | ||
/// | ||
/// # Errors | ||
/// | ||
/// This will produce an error if the input fields contain either duplicate | ||
/// names or field IDs. | ||
pub fn try_new(fields: Vec<MappedField>) -> Result<Self, Error> { | ||
let mut name_to_id = HashMap::new(); | ||
let mut id_to_field = HashMap::new(); | ||
|
||
for field in &fields { | ||
if let Some(id) = field.field_id() { | ||
if id_to_field.contains_key(&id) { | ||
return Err(Error::new( | ||
crate::ErrorKind::DataInvalid, | ||
format!("duplicate id '{id}' is not allowed"), | ||
)); | ||
} | ||
|
||
id_to_field.insert(id, field.clone()); | ||
jdockerty marked this conversation as resolved.
Show resolved
Hide resolved
|
||
for name in field.names() { | ||
if name_to_id.contains_key(name) { | ||
return Err(Error::new( | ||
crate::ErrorKind::DataInvalid, | ||
format!("duplicate name '{name}' is not allowed"), | ||
)); | ||
} | ||
name_to_id.insert(name.to_string(), id); | ||
} | ||
} | ||
} | ||
|
||
Ok(Self { | ||
fields, | ||
name_to_id, | ||
id_to_field, | ||
}) | ||
} | ||
|
||
/// Get a reference to the underlying fields. | ||
pub fn fields(&self) -> &[MappedField] { | ||
&self.fields | ||
} | ||
|
||
/// Get a field, by name, returning its ID if it exists, otherwise `None`. | ||
pub fn id(&self, field_name: &str) -> Option<i32> { | ||
self.name_to_id.get(field_name).copied() | ||
} | ||
|
||
/// Get a field, by ID, returning the underlying [`MappedField`] if it exists, | ||
/// otherwise `None`. | ||
pub fn field(&self, id: i32) -> Option<&MappedField> { | ||
self.id_to_field.get(&id) | ||
} | ||
} | ||
|
||
#[cfg(test)] | ||
mod test { | ||
|
||
use pretty_assertions::assert_eq; | ||
|
||
use super::*; | ||
use crate::ErrorKind; | ||
|
||
#[test] | ||
fn mapped_fields() { | ||
let my_fields = vec![ | ||
MappedField::new(Some(1), vec!["field_one".to_string()], Vec::new()), | ||
MappedField::new( | ||
Some(2), | ||
vec!["field_two".to_string(), "field_foo".to_string()], | ||
Vec::new(), | ||
), | ||
]; | ||
|
||
let mapped_fields = MappedFields::try_new(my_fields).expect("valid mapped fields for test"); | ||
|
||
assert!( | ||
mapped_fields.field(1000).is_none(), | ||
"Field with ID '1000' was not inserted to the collection" | ||
); | ||
assert_eq!( | ||
*mapped_fields.field(1).unwrap(), | ||
MappedField::new(Some(1), vec!["field_one".to_string()], Vec::new()), | ||
); | ||
assert_eq!( | ||
*mapped_fields.field(2).unwrap(), | ||
MappedField::new( | ||
Some(2), | ||
vec!["field_two".to_string(), "field_foo".to_string()], | ||
Vec::new(), | ||
) | ||
); | ||
|
||
assert!( | ||
mapped_fields.id("not_exist").is_none(), | ||
"Field was not expected to exist in the collection" | ||
); | ||
assert_eq!(mapped_fields.id("field_two"), Some(2)); | ||
assert_eq!( | ||
mapped_fields.id("field_foo"), | ||
Some(2), | ||
"Field with another name shares the same ID of '2'" | ||
); | ||
assert_eq!(mapped_fields.id("field_one"), Some(1)); | ||
} | ||
|
||
#[test] | ||
fn invalid_mapped_fields() { | ||
let duplicate_ids = vec![ | ||
MappedField::new(Some(1), vec!["field_one".to_string()], Vec::new()), | ||
MappedField::new(Some(1), vec!["field_two".to_string()], Vec::new()), | ||
]; | ||
|
||
assert_eq!( | ||
MappedFields::try_new(duplicate_ids).unwrap_err().kind(), | ||
ErrorKind::DataInvalid | ||
); | ||
|
||
let duplicate_names = vec![ | ||
MappedField::new(Some(1), vec!["field_one".to_string()], Vec::new()), | ||
MappedField::new(Some(2), vec!["field_one".to_string()], Vec::new()), | ||
]; | ||
assert_eq!( | ||
MappedFields::try_new(duplicate_names).unwrap_err().kind(), | ||
ErrorKind::DataInvalid | ||
); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,11 +20,26 @@ | |
use serde::{Deserialize, Serialize}; | ||
use serde_with::{serde_as, DefaultOnNull}; | ||
|
||
/// Property name for name mapping. | ||
pub const DEFAULT_SCHEMA_NAME_MAPPING: &str = "schema.name-mapping.default"; | ||
|
||
/// Iceberg fallback field name to ID mapping. | ||
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)] | ||
#[serde(transparent)] | ||
pub struct NameMapping { | ||
pub root: Vec<MappedField>, | ||
root: Vec<MappedField>, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should contains a |
||
} | ||
|
||
impl NameMapping { | ||
/// Create a new [`NameMapping`] given a collection of mapped fields. | ||
pub fn new(fields: Vec<MappedField>) -> NameMapping { | ||
Self { root: fields } | ||
} | ||
|
||
/// Get a reference to fields which are to be mapped from name to field ID. | ||
pub fn root(&self) -> &[MappedField] { | ||
&self.root | ||
} | ||
} | ||
|
||
/// Maps field names to IDs. | ||
|
@@ -33,12 +48,38 @@ pub struct NameMapping { | |
#[serde(rename_all = "kebab-case")] | ||
pub struct MappedField { | ||
#[serde(skip_serializing_if = "Option::is_none")] | ||
pub field_id: Option<i32>, | ||
pub names: Vec<String>, | ||
field_id: Option<i32>, | ||
names: Vec<String>, | ||
#[serde(default)] | ||
#[serde(skip_serializing_if = "Vec::is_empty")] | ||
#[serde_as(deserialize_as = "DefaultOnNull")] | ||
pub fields: Vec<MappedField>, | ||
fields: Vec<MappedField>, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By changing this one to I assume that is expected for now and I'll update the other tests 👍 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't change json output since it's iceberg spec. I think we could use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @liurenjie1024 I've been taking another look at this again, although including Do you mind elaborating more on this? As I don't quite understand how to stop the field being serialised incorrectly when this is altered. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To clarify here, do you mean pull the information that used to be in the E.g. the It looks like that exists in pyiceberg via the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Exactly. |
||
} | ||
|
||
impl MappedField { | ||
/// Create a new [`MappedField`]. | ||
pub fn new(field_id: Option<i32>, names: Vec<String>, fields: Vec<MappedField>) -> Self { | ||
Self { | ||
field_id, | ||
names, | ||
fields, | ||
} | ||
} | ||
|
||
/// Iceberg field ID when a field's name is present within `names`. | ||
pub fn field_id(&self) -> Option<i32> { | ||
self.field_id | ||
} | ||
|
||
/// Get a reference to names for a mapped field. | ||
pub fn names(&self) -> &[String] { | ||
&self.names | ||
} | ||
|
||
/// Get a reference to the field mapping for any child fields. | ||
pub fn fields(&self) -> &[MappedField] { | ||
&self.fields | ||
} | ||
} | ||
|
||
#[cfg(test)] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we put this in the name_mapping dir?🤔
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe this was mentioned above as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this should be removed, see #1116 (comment)