Skip to content

v3.2: Support all common XML node types (element, attribute, text, cdata) #4592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: v3.2-dev
Choose a base branch
from

Conversation

handrews
Copy link
Member

@handrews handrews commented May 16, 2025


This change removes the restriction on where the xml field and XML Object can appear, and adds a nodeType field to support the four most commonly used XML node types: element, attribute, text, and cdata. A fifth nodetype, none, is used to prevent a Schema Object from producing a node.

It also adds many examples of how to model different XML scenarios, including using prefixItems for complex ordered nodes.

While this could have been made much simpler by always defaulting nodeType to element, a more complex requirement was needed to preserve compatibility with the existing (but now deprecated - see also PR #4591) attribute and wrapped fields. This way, the behavior of an XML object that is empty or uses only name, prefix, and/or namespace behaves the same whether you look at defaults for nodeType or for attribute and wrapped.


I expect the deprecation of attribute and wrapped to be controversial, but there are several reasons for doing it:

  • The solution proposed in How to represent XML elements with attributes #630 was to add another boolean field, text, to indicate text nodes. But if we are going to support text nodes, we definitely also want to support CDATA sections, which would require three fields (attribute, text, and cdata) that were all mutually exclusive. One boolean is fine, but three to manage a single state is more design smell than I can handle.
  • The wrapped field is a very specific case of a more general problem, which is Schema Objects that do not correspond to XML nodes. The old wording only acknowledged "property" and "root" schemas, and ignored the many other sorts of Schema Objects that might appear, and the complex ways in which they might map (or not) to XML nodes. The nodeType: none construct allows controlling this in a generalized way.
  • It becomes much more straightforward to define the name behavior, as certain types have names and others do not. This is much more intuitive than logic involving wrapped.
  • We might want to support additional node types in the future. I just don't know enough about any of them to know what might be worth supporting, much less how to model them, and no one has asked for those (yet).
  • The null-handling problems in Formalize how to express null value in xml #3959 are tricky, and this allows us to (if we want) define clear behavior for nodeType: attribute without breaking existing support for attribute: true.

In addition to null handling, I came up with several other possible caveats to call out or clarify, but I wanted to keep this as minimal as possible. If the related concerns come up in review I can add them, otherwise I will do it as a follow-on if it seems useful. But let's see how this lands first.


  • schema changes are included in this pull request
  • schema changes are needed for this pull request but not done yet
  • no schema changes are needed for this pull request

handrews added 3 commits May 15, 2025 16:13
Clarifies that the name of the root XML element comes from
the component name, which was shown in an example but was unclear
due to the use of the obsolete OAS 2.0 terminology "model."

This does not change the restriction (in the `xml` field of the
Schema Object) that the `xml` field only applies to property
schemas (and not root schemas).
This avoids reinforcing the "root schema" vs "property schema"
restriction that we plan to relax.
This change adds a nodeType field to support the four most
commonly used XML node types: element, attribute, text, and cdata.
A fifth nodetype, none, is used to prevent a Schema Object from
producing a node.

This also removes the restriction on where the xml field and
XML Object can appear, as the nodeType system is more flexible
than the old system.

This deprecates two existing fields:

* attribute, replaced by nodeType: attribute
* wrapped, replaced by nodeType: none
@handrews handrews added this to the v3.2.0 milestone May 16, 2025
@handrews handrews requested review from a team as code owners May 16, 2025 01:21
@handrews handrews added enhancement xml media and encoding Issues regarding media type support and how to encode data (outside of query/path params) labels May 16, 2025

##### Fixed Fields

| Field Name | Type | Description |
| ---- | :----: | ---- |
| <a name="xml-name"></a>name | `string` | Replaces the name of the element/attribute used for the described schema property. When defined within `items`, it will affect the name of the individual XML elements within the list. When defined alongside `type` being `"array"` (outside the `items`), it will affect the wrapping element if and only if `wrapped` is `true`. If `wrapped` is `false`, it will be ignored. |
| <a name="xml-node-type"></a>nodeType | `string` | One of `element`, `attribute`, `text`, `cdata`, or `none`, as explained under [XML Node Types](#xml-node-types). The default value is `none` if `$ref`, `$dynamicRef`, or `type: "array"` is present in the [Schema Object](#schema-object) containing the XML Object, and `element` otherwise. |
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% the bits about $ref and $dynamicRef are backwards-compatible, because I'm not sure what, if anything, the behavior was in those cases before, as "property schema" was not well-defined. If we mean "literal inline schema directly under a name under properties without references", then there was no defined behavior and we can do what we want.

Otherwise, it might not be, and we should let schemas containing $ref or $dynamicRef have a default nodeType of element, and folks can explicitly set them to none. I don't think that change would impact anything else in this PR, and might be the more conservative compatibility choice.

Copy link
Contributor

@ralfhandl ralfhandl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed up to "XML Object Examples"

| <a name="xml-wrapped"></a>wrapped | `boolean` | MAY be used only for an array definition. Signifies whether the array is wrapped (for example, `<books><book/><book/></books>`) or unwrapped (`<book/><book/>`). Default value is `false`. The definition takes effect only when defined alongside `type` being `"array"` (outside the `items`). If `nodeType` is present, this field MUST NOT be present.<br /><br />**Deprecated:** Set `nodeType: "element"` explicitly in place of `wrapped: true` |

Note that when generating an XML document from object data, the order of the nodes is undefined.
Use `prefixItems` to control node ordering.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would that work? Given that document order is significant in XML, an example would be helpful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ralfhandl there is an example... didn't you say you hadn't read the examples section yet? ;-)

In all seriousness, I'd love better examples once you get there, I struggle to come up with them. But there is a prefixItems example.

@ralfhandl ralfhandl requested a review from a team May 16, 2025 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement media and encoding Issues regarding media type support and how to encode data (outside of query/path params) xml
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants