Refactor binary encoding of canon builtins for easier future extensibility #496

Open
alexcrichton opened this issue Apr 8, 2025 · 0 comments
@alexcrichton
Collaborator

Currently canon builtins are primarily encoded as a prefix byte plus any payload immediately afterwards. Over time though we might want to add more options/extensibility to preexisting builtins, such as the try idea from #444. In this situation it's always possible to add new builtin codes at the end of the index space, and functionally there's no issue with that. Conceptually though it'd be unfortunate if the same intrinsic were defined across multiple opcodes, and it can make implementations a little more awkward to maintain, e.g. parsing is spread out across major opcodes for the "same intrinsic".

An example of this split today is that 0x03 indicates the resource.drop intrinsic while 0x07 is resource.drop async. Morally these are the same intrinsic, just with a different option, and spreading it out across two opcodes is a little unfortunate.
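To make the split concrete, here is a minimal sketch of how a decoder might dispatch on the prefix byte today. Only the two opcodes come from this issue; the `Builtin` enum and `decode_opcode` function are hypothetical names, and the payload that follows the opcode is left out entirely.

```rust
/// Hypothetical decoded form of a canon builtin. The payload that follows
/// the opcode in the real format is omitted from this sketch.
#[derive(Debug, PartialEq)]
enum Builtin {
    ResourceDrop { is_async: bool },
}

/// Decode only the prefix byte. Note that one logical intrinsic
/// is reachable through two unrelated opcodes.
fn decode_opcode(op: u8) -> Option<Builtin> {
    match op {
        0x03 => Some(Builtin::ResourceDrop { is_async: false }),
        0x07 => Some(Builtin::ResourceDrop { is_async: true }),
        // All other builtins are outside the scope of this sketch.
        _ => None,
    }
}
```

Each new option on an existing intrinsic adds another arm like `0x07` somewhere else in the opcode space, which is the maintenance concern described above.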

What I'd envision in the future is something like:

  • Each canon builtin gets a prefix opcode, just as today.
  • Each canon builtin is then followed by flags:varu32, a LEB128-encoded 32-bit integer. This integer is a bitset of optional fields that follow.
    • For example bit 0 could mean "async" so resource.drop async would be encoded as 0x03 0x01 while resource.drop would be encoded as 0x03 0x00.
  • The meaning of each bit would be intrinsic-specific, but a loose guideline would be that each bit may optionally indicate that there are more bytes to decode. For example async? wouldn't have any more bytes to decode, but some future flag may require another immediate to decode.
  • Intrinsics could still reserve the right to use this extensibility u32 as a way of completely changing how the rest of the intrinsic is encoded. For example, in the future an intrinsic might completely drop a canonopt list or something like that.
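The scheme in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: bit 0 meaning "async" is the example from this issue, while `FLAG_EXTRA` (bit 1, carrying one extra varu32 immediate), the `ResourceDrop` struct, and both function names are hypothetical. The existing payload of resource.drop is omitted, since the issue doesn't say where it would sit relative to the flags.

```rust
/// Read an unsigned LEB128 u32 from the front of `bytes`, returning the
/// value and the number of bytes consumed, or None if malformed.
fn read_varu32(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &b) in bytes.iter().enumerate().take(5) {
        let low = (b & 0x7f) as u32;
        // The fifth byte may only contribute 4 bits to a 32-bit value.
        if i == 4 && low > 0x0f {
            return None;
        }
        result |= low << (7 * i);
        if b & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None // ran out of input, or the encoding is longer than 5 bytes
}

const OP_RESOURCE_DROP: u8 = 0x03;
/// Bit 0 = "async", as in the example from the issue.
const FLAG_ASYNC: u32 = 1 << 0;
/// Hypothetical bit that is followed by one extra varu32 immediate.
const FLAG_EXTRA: u32 = 1 << 1;

#[derive(Debug, PartialEq)]
struct ResourceDrop {
    is_async: bool,
    extra: Option<u32>,
}

/// Decode `opcode flags:varu32 [immediates]`, returning the intrinsic and
/// the total number of bytes consumed.
fn decode_resource_drop(bytes: &[u8]) -> Option<(ResourceDrop, usize)> {
    let (&op, rest) = bytes.split_first()?;
    if op != OP_RESOURCE_DROP {
        return None;
    }
    let (flags, mut used) = read_varu32(rest)?;
    // Reject bits this decoder doesn't understand, since an unknown bit
    // may imply trailing bytes that can't be skipped safely.
    if flags & !(FLAG_ASYNC | FLAG_EXTRA) != 0 {
        return None;
    }
    // Only flags that carry an immediate consume more bytes.
    let extra = if flags & FLAG_EXTRA != 0 {
        let (value, n) = read_varu32(&rest[used..])?;
        used += n;
        Some(value)
    } else {
        None
    };
    let is_async = flags & FLAG_ASYNC != 0;
    Some((ResourceDrop { is_async, extra }, 1 + used))
}
```

With this shape, `0x03 0x00` and `0x03 0x01` decode through the same code path, and a later option only needs a new bit rather than a new opcode. Rejecting unknown bits is one possible policy; the issue doesn't prescribe how decoders should treat them.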

I don't think we should make this change in the near term per se as this is basically just a stylistic concern for the binary format. This might be good to finalize/discuss just before a final release of the component model though.
