Problem description
The current MultiAddr spec does not have any good way for dealing with optional protocol parameters that have well defined defaults. Depending on the specific protocol in question different workarounds have been proposed, the predominant theme being recursion:
- IPv6 link scopes:
/ip6/fe00::32/ip6zone/6/…
- TLS Server Name Identification:
/tls/sni/example.com/…
This has the obvious problems that:
- Each protocol must have a special parser which will then greedily swallow up all following components that it considers relevant
- All possible “attribute protocol items” must be reserved to ensure that their names are not used as “regular” / “top-level” protocol names
- As protocols evolve this may also cause nasty conflicts between newly defined attributes and existing protocol names.
- While attribute names may be shared between different protocols they must still be treated as a separate class from top-level protocol names since they may never appear top-level while still sharing a common namespace with that top-level class.
- It is not immediately obvious to a human reader which items are items of the previous top-level protocol and which constitute the start of a new encapsulation layer
There also do not seem to be any obvious advantages to this scheme that would make the above problems appear like reasonable trade-offs.
Another proposal suggested in some places (#63) was using plain greediness: After a given protocol item shows up in the path, all further items are swallowed up and used as a single “path parameter”:
- HTTP:
/http/example.com/api/v1
(here example.com is the hostname and /api/v1 the HTTP path base)
- WS and WSS:
/wss/example.com/api/v1/tls/ws
- Unix domain:
/unix/path/to/socket.sock/tls/ws
While HTTP arguably is a terminator protocol (meaning that no other protocol may follow it anyway – this notion needs separate discussion!), Unix domain sockets and WebSockets definitely are not. Hence, it is unclear how a parser should figure out that /tls does not refer to a path component, and whether this even is the case (the parser would have to proactively probe the file system for this, which is very much not in line with the vision of MultiAddr being a common description of paths to application endpoints; with WebSockets this is not even reliably possible to start with).
The example with WebSockets in particular demonstrates why this cannot work. A suggested alternative was to wrap the path parameter inside some kind of special set of delimiters (different kinds of braces were suggested):
/wss/(/example.com/api/v1)/tls/ws
While this works, it does not take into account the fact that there is nothing usually required about the given parameter: The hostname can usually be inferred from previous protocol levels (and left empty if unknown) and the path may always be empty.
Also potentially relevant data (such as HTTP basic auth) may be missing from the above. By combining the two approaches discussed above we arrive at something similar to the following:
/wss/(/example.com:4443/api/v1)/user/john/password/doh/cookie/bla=blab/tls/ws
Or the following when excluding all attributes:
/wss/()/tls/ws
Neither of these strikes the author as particularly intelligible.
This proposal will not attempt to resolve the issues with Unix domain sockets.
Proposed solution
Summary:
- Allow each protocol to carry an arbitrary number of keyword arguments whose meaning is protocol dependent
- Deprecate existing attribute protocol items:
ip6zone
- (Are there more actually standardized at the moment?)
Text-representation syntax
Extending the current spec, each protocol name may now optionally be followed by an opening parenthesis character "(" indicating the start of the protocol parameter list. This is followed by an arbitrary number of key-value parameters, separated by the comma character "," and terminated by a closing parenthesis character ")". After this closing character a forward slash "/" is expected. If the parameter list is skipped, the protocol name should immediately be followed by a forward slash (as is currently the case); an empty parameter list "()" is allowed as well.
Each key-value pair consists of a name, made up only of ASCII lower-case characters, ASCII digits and the ASCII minus sign "-", followed by a single equals sign "=", followed by an arbitrary UTF-8 encoded value. The value may contain any character other than the NUL byte, but the following characters must be escaped with a single backward slash "\" if they are to appear inside the value field: the opening "(" and closing ")" parentheses, the comma character "," and the backward slash "\" itself. Most importantly, the forward slash "/" does not need to be escaped since it carries no special significance inside a protocol parameter list; this allows for easy embedding of paths, as in the following examples:
/http(host=example.com,base=/api/v1)
/http(base=/endpoint\(1:2\))
More examples:
/tls(sni=example.com)
/ip6(scope=6)/fe00::32/tcp/80/http
/wss(host=example.com:4443,base=/api/v1,user=john,password=doh,cookie=bla=blab)/tls/ws
- Note: The name host here refers to the HTTP Host header and has nothing to do with where the connection will actually be made.
/wss/tls/ws
Each protocol may still accept zero or one static parameters of known or unknown binary length after the final forward slash. It is expected that the use of optional parameters will be minimal in practice (HTTP-y stuff probably being the prominent exception here, not the rule).
(Precise syntax subject to change/bikeshedding!)
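To make the grammar above concrete, here is a minimal Python sketch of a parser for a single protocol item with an optional parameter list. The function name parse_item is hypothetical, and the sketch assumes that protocol names use the same character set as parameter names (the proposal only specifies this for parameter names). It does not handle the static parameter after the slash, since that requires a protocol table.

```python
import re

# Assumption: protocol names and parameter names share one character set.
NAME_RE = re.compile(r"[a-z0-9-]+")
ESCAPABLE = "(),\\"


def parse_item(text, pos=0):
    """Parse `name` or `name(key=value,...)` starting at text[pos].

    Returns (name, params, next_pos); next_pos points just past the item.
    """
    match = NAME_RE.match(text, pos)
    if match is None:
        raise ValueError(f"expected protocol name at offset {pos}")
    name, pos = match.group(), match.end()
    params = {}
    if pos == len(text) or text[pos] != "(":
        return name, params, pos  # no parameter list at all
    pos += 1  # skip "("
    if pos < len(text) and text[pos] == ")":
        return name, params, pos + 1  # empty parameter list "()"
    while True:
        key = NAME_RE.match(text, pos)
        if key is None or text[key.end():key.end() + 1] != "=":
            raise ValueError(f"expected key=value at offset {pos}")
        pos = key.end() + 1
        value = []
        while pos < len(text) and text[pos] not in "(),":
            if text[pos] == "\0":
                raise ValueError("NUL is not allowed in values")
            if text[pos] == "\\":
                pos += 1  # a backslash must be followed by an escapable character
                if pos == len(text) or text[pos] not in ESCAPABLE:
                    raise ValueError(f"invalid escape at offset {pos}")
            value.append(text[pos])
            pos += 1
        if pos == len(text) or text[pos] == "(":
            raise ValueError(f"unterminated list or unescaped '(' at offset {pos}")
        params[key.group()] = "".join(value)
        pos += 1  # skip "," or ")"
        if text[pos - 1] == ")":
            return name, params, pos


name, params, _ = parse_item(r"http(base=/endpoint\(1:2\))")
print(name, params)
```

A caller walking a full MultiAddr would still have to check that the character at next_pos is a forward slash (or the end of the string) before moving on to the next item.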
Binary-representation syntax
The general format for the binary syntax is:
<BinaryMultiAddr> := (<ProtocolBinary>(<AttributeBinary>*))+
<ProtocolBinary> is the binary MultiAddr representation of the protocol itself and uses the following format:
<ProtocolBinary> := <ProtocolType>([NIL]|<ProtocolValue>|<ProtocolLength><ProtocolValue>)
The format used for the <ProtocolValue> part of the representation depends on the <ProtocolType>:
- [NIL] (No value): Used by all protocols with zero static parameters; no value follows and attributes or further protocols may immediately follow.
- <ProtocolValue>: Used by all protocols with one static parameter of known binary length; the value, of a length predefined for each protocol type, immediately follows.
- <ProtocolLength><ProtocolValue>: Used by all protocols with one static parameter of variable binary length; the <ProtocolLength> is a UVarInt containing the length of the following protocol value.
- The mapping between the text and binary representation of the protocol's value may be implemented by an arbitrary protocol-specific function, as long as it is ensured that such transformation may be performed without loss of information with regards to the protocol described. That is, the following constraints must hold:
text_value ≃ binary2text(text2binary(text_value))
binary_value ≃ text2binary(binary2text(binary_value))
≃ means “must be equal with regards to the constraints imposed by the protocol” – for instance, DNS names are case-insensitive hence a loss of case may be acceptable as this is not considered relevant “information” in this protocol (XXX: find better wording for this).
- Due to this definition it is not possible to parse binary MultiAddrs with unknown protocol values.
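As an illustration of the round-trip constraints above, the following Python sketch uses a made-up DNS-like protocol whose binary form stores the name lower-cased. The names text2binary and binary2text come from the constraints; the lower-casing rule and the helper equivalent_text are assumptions chosen only to show a lossy but acceptable mapping.

```python
def text2binary(text_value):
    # Hypothetical encoding: normalise case, then store as ASCII bytes.
    return text_value.lower().encode("ascii")


def binary2text(binary_value):
    return binary_value.decode("ascii")


def equivalent_text(a, b):
    # Protocol-specific equivalence: names compare case-insensitively.
    return a.lower() == b.lower()


original = "Example.COM"
round_tripped = binary2text(text2binary(original))

# The text changed, but it is still "equal" as far as this protocol cares.
print(round_tripped != original, equivalent_text(original, round_tripped))
```

On the binary side this particular mapping happens to round-trip exactly for lower-case ASCII input, so the second constraint holds with plain equality.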
<AttributeBinary> is the binary MultiAddr representation of a single protocol attribute and must follow either a protocol binary representation or another attribute. All attributes share a single format:
<AttributeBinary> := [ATTR_TOKEN]<AttributeKey><AttributeLength><AttributeValue>
In this definition:
- [ATTR_TOKEN] is a reserved UVarInt indicating the start of an attribute, whose value must never be used for a <ProtocolValue> (TODO: Decide on a value)
- <AttributeKey> is a UVarInt from a table of known attribute names. Attributes in this table are not bound to any specific protocol; the table serves only as a look-up table for keeping the binary representation of attributes small.
- <AttributeLength> is a UVarInt determining the length of the following <AttributeValue> in bytes.
- <AttributeValue> is the UTF-8 encoded text of the attribute's value in the text representation.
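The attribute layout can be sketched in Python as follows. This assumes that UVarInt means the LEB128-style unsigned varint used elsewhere in multiformats. The ATTR_TOKEN value and the ATTRIBUTE_KEYS table are placeholders invented for this example, since the proposal has not decided on either.

```python
ATTR_TOKEN = 0x7F  # placeholder only: the proposal leaves this value open
ATTRIBUTE_KEYS = {"host": 1, "base": 2, "sni": 3}  # hypothetical key table


def encode_uvarint(value):
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    if value < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(data, pos=0):
    """Return (value, next_pos) for the uvarint starting at data[pos]."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_attribute(key, value):
    raw = value.encode("utf-8")
    return (encode_uvarint(ATTR_TOKEN) + encode_uvarint(ATTRIBUTE_KEYS[key])
            + encode_uvarint(len(raw)) + raw)


def decode_attribute(data, pos=0):
    """Return (key, value, next_pos) for the attribute starting at data[pos]."""
    token, pos = decode_uvarint(data, pos)
    if token != ATTR_TOKEN:
        raise ValueError("not an attribute")
    key_code, pos = decode_uvarint(data, pos)
    length, pos = decode_uvarint(data, pos)
    raw = data[pos:pos + length]
    if len(raw) != length:
        raise ValueError("truncated attribute value")
    names = {code: name for name, code in ATTRIBUTE_KEYS.items()}
    return names[key_code], raw.decode("utf-8"), pos + length


encoded = encode_attribute("sni", "example.com")
print(encoded.hex(), decode_attribute(encoded))
```

A decoder for a full binary MultiAddr would peek at the next UVarInt after each protocol value: if it equals the reserved token, an attribute follows; otherwise it is read as the next protocol type, so the token must not collide with any protocol type code either.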
TODO: Allow storing unknown attributes in binary, whose names are not in the table?
Other requirements
Unexpected parameters should result in an error when trying to instantiate the given protocol and may result in an error during parsing of the given MultiAddr. For each expected parameter there must be a sensible default value, and parameters whose value corresponds to that default should be omitted from the textual and binary representations. All parameters must be optional; for mandatory parameters the current /protoname/param syntax should be used instead.
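These requirements could be enforced roughly as in the Python sketch below. The HTTP_DEFAULTS table and both function names are hypothetical. The proposal only says that every parameter needs a sensible default; it does not define the defaults themselves.

```python
# Hypothetical defaults for an HTTP-like protocol; not defined by the proposal.
HTTP_DEFAULTS = {"host": "", "base": ""}


def normalise_params(defaults, params):
    """Reject unexpected parameters and drop those equal to their default."""
    unexpected = sorted(set(params) - set(defaults))
    if unexpected:
        raise ValueError("unexpected parameters: " + ", ".join(unexpected))
    return {key: value for key, value in params.items() if value != defaults[key]}


def effective_params(defaults, params):
    """Fill in defaults for every parameter that was omitted."""
    return {**defaults, **normalise_params(defaults, params)}


print(normalise_params(HTTP_DEFAULTS, {"host": "example.com", "base": ""}))
```

Dropping default-valued parameters before serialising keeps the textual and binary forms canonical, so two addresses that mean the same thing also compare equal.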
EDIT 1: Some language improvements + language-change to always call it an “HTTP path base”, since the path only refers to the path bases used to multiplex different HTTP services of a single hostname and not about referring to actual single files
EDIT 2: Added example for escaping
EDIT 3: Specify binary encoding (but specific to the proposal at hand and for what we already have)