-
Notifications
You must be signed in to change notification settings - Fork 583
Support for Unknown Fields (Rebase of #574) #1340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Support for Unknown Fields (Rebase of #574) #1340
Conversation
Does not compile, incomplete.
- Rename UnknownFieldSet to UnknownFieldList, as requested by maintainer
caspermeijn
- Change fixed32 and fixed64 representation to be consistent with C++
implementation, as requested by caspermeijn
- start resolving build-time errors
- Currently failing with reexported crate issues
- TODO: What endianness is protobuf?
This section should be added back (and/or modified to fit new standards), but commenting out for now to reduce points of failure to resolve creating bad Copy derives
Instead of checking against empty, which can fail differently. Attempting to force fields with unknown to not have a derived Copy.
While this is a change made by rustfmt/clippy, it's outside the scope of this PR, so it's been removed
Seems like local clippy check runs in rust edition 2021, CI runs in 2024
|
@caspermeijn Unfortunately, Github doesn't have a clear way to request reviews on a PR for someone outside the org and/or not a direct developer on the project, otherwise I would have done so. Apologies for the ping if that isn't the right channel. |
|
@blizzardfinnegan AFAIK you could remove or comment out the ignored tests here in order to verify the implementation: https://github.com/blizzardfinnegan/prost/blob/feature/unknown_fields_support/conformance/failing_tests.txt |
|
@alebar42 Thank you for the correction and clarification! Either way, the tests do in fact pass Edit: Unclear what was the issue, but said changes are now in the PR, which is the important part. |
|
@caspermeijn |
|
Ran into an issue using pbjson_build with this. Because when you generate unknown fields, pbjson_build doesn't know what to do with them and it errors: |
caspermeijn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking this forward. It is a large PR, so it takes some time for me to respond.
I miss user facing documentation. Could you add a section to the README that describes this new functionality?
|
|
||
| for attr in attrs { | ||
| if word_attr("unknown_fields", attr) { | ||
| set_bool(&mut unknown, "duplicate message attribute")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error message is incorrect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that? If the same attribute (unknown_fields) is found twice, that should be a duplicate attribute. I do think that it should probably have different behaviour though; it should probably just pull all the unknown fields in, I would assume.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the miscommunication, I do agree the error should be here.
The error text doesn't match the error situation. I think it should be something like: "multiple unknown_fields in one message". Do you have an example of what the complete error message that the users sees is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can definitely make the error message something more descriptive, sure! I don't have an example off-hand though, unfortunately.
| }); | ||
| let merge_fallback = match fields.iter().find(|&(_, f)| f.is_unknown()) { | ||
| Some((field_ident, field)) => { | ||
| let merge = field.merge(&prost_path, quote!(value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this in seperate section? The fields are already sorted with the unknown fields as last, so this can be in the first loop, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it's in a separate iterator call so it can check if there are any unknown fields, and if there aren't, add the skip_field call at the end of the iteration. There's definitely a better way to do it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great (but not required) if you could find a better way. My suggestion would be to do a single loop for all field and then always do the skip. If unknown fields is enabled, then the skip branch is never triggered. I don't know whether the compiler accepts a double catch all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me!
|
Thank you for the review! I pretty much did as clean of a direct uplift as I could with the original MR, without changing too much. I'll go through and add docs and resolve the issues you pointed out. Apologies for the repeated pings. It's definitely a bear of a PR, I wasn't sure if you had seen it or not. |
- Remove Option wrapper on unknown fields - Actually utilise custom iterator instead of nested for loops - use Iterator.sum() instead of manual counter - use Buf.get_u64_le() and Buf.get_u32_le() instead of manual copies - Update conformance to not require failing_tests.txt - remove empty failing_tests.txt
- Add user-facing documentation for `prost` and `prost-build` for enabling unknown fields, along with limitations - Modify `prost_build::Config::include_unknown_fields` to have a default value for the field in which unknown fields are stored. (defaults to `unknown_fields`)
|
CI is currently failing, for reasons unclear to me. All CI failures are due to this error: Which is not accurate. It's only failing in CI (this failure does not occur if you run the same commands locally, afaict), and only on the files I've changed. |
I think that might be a separate uplift for the pbjson to support unknown fields as currently written in this PR. There might be some changes to make in the prost implementation to help with that change, but I think getting initial support in to prost is more important than making sure the entire ecosystem works all at once. If this PR goes through, I would highly recommend opening an issue in pbjson to that effect. I'm not familiar with the project personally, and the devs there would have a better sense of what, if anything, would need to change in prost to get their project to work properly, and could stack a patch on top of this PR. |
|
The change was quite minor, and it worked for me, but I wanted to call out how much of a breaking change this is. And how it is kind of awkward to have to initialize the unknown field field during struct creation. In all other languages, unknown fields are accessed after the fact, but the user doesn't need to think about them unless needed. I think this is partially an issue with rust not allowing an inherent default value (like other languages) and prost allowing struct literal creation. In the issue I suggested moving to a constructor with a private unknown fields or something, but those are also breaking changes. |
|
Part of the issue is also the fact that, the PR as originally written and as reimplemented here, is almost entirely for round trip consistency, and largely doesn't expose the unknown fields. You get an iterator, so it's not impossible to check based on tags, but it's not intuitive or ergonomic, especially since other structs just directly expose the internal fields. This PR also doesn't fully interpret fields beyond the bare minimum to parse the raw message, so there would be more logic necessary to fully interpret contained information. I like the suggestion of introducing constructors, I think that would help a lot in terms of new stuff being written. That being said, I think you could build a transition plan to introduce constructors, and still allow but deprecate manually creating struct literals, and then loop back around to disallowing struct construction directly later on. |
|
That's fair. I'm new to the rust world and was mostly hoping there was some magic that could be used to include something in a struct, but not have it be in a struct literal. As my use case (and most unknown fields use cases) are just making sure they don't get dropped in transit. I'm not actually doing anything with the unknown fields. |
|
There is the |
|
Yes, that's what I used in my pbjson update, but either way, it's a breaking change if you want to support unknown Fields |
1 similar comment
|
Yes, that's what I used in my pbjson update, but either way, it's a breaking change if you want to support unknown Fields |
I recently merge a change to DecodeError: #1330 I think you can reproduce this problem if you rebase your branch to the latest main |
No, this PR is not a breaking change. You have to explicitly enable the unknown fields, so existing users are not broken. Only the users that choose for unknown fields are broken. |
|
Right. I was saying that enabling unknown fields is a breaking change, and in all other languages it is just implicitly supported. I don't think there is a current way to make this not the case, but I originally was starting a discussion to see if one could be teased out. |
|
|
||
| for attr in attrs { | ||
| if word_attr("unknown_fields", attr) { | ||
| set_bool(&mut unknown, "duplicate message attribute")?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the miscommunication, I do agree the error should be here.
The error text doesn't match the error situation. I think it should be something like: "multiple unknown_fields in one message". Do you have an example of what the complete error message that the users sees is?
| }); | ||
| let merge_fallback = match fields.iter().find(|&(_, f)| f.is_unknown()) { | ||
| Some((field_ident, field)) => { | ||
| let merge = field.merge(&prost_path, quote!(value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great (but not required) if you could find a better way. My suggestion would be to do a single loop for all field and then always do the skip. If unknown fields is enabled, then the skip branch is never triggered. I don't know whether the compiler accepts a double catch all
|
|
||
| /// Gets an iterator over the fields contained in this set. | ||
| pub fn iter(&self) -> impl Iterator<Item = (u32, &UnknownField)> { | ||
| UnknownFieldIter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to replace the custom iterator by some kind of Iterator::flatten() call? Please let me know if you don't understand what I mean with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried tinkering with it for a while, but since the second iterator is inside a tuple, I wasn't having much luck. I'll have another go at it this coming Tuesday, see if I can manage something better.
|
Random thought. Protobuf naming guide recommends not using underscores as the initial character. Although it's not enforced, suggesting or using a default struct field name with a underscore prefix would be much more unlikely to have naming conflicts. |
For more information see Issue #2
This PR is largely similar to #574 , rebased to main, and with edits suggested by the maintainer on said PR implemented to the best of my ability.