Define `/unix` #174

achingbrain · 2024-10-28T19:28:24Z

Summary

Adds a protocol note describing how to encode paths to Unix domain sockets as strings, given that such paths may include the delimiting character `/`.

This allows us to append other tuples to the multiaddr while also ensuring we can round-trip the address to a string and back.

This doesn't affect the binary representation of the multiaddr since everything is length-delimited.
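As a rough illustration of why the binary form is unaffected, the sketch below builds a length-delimited `/unix` component. It assumes the protocol code for `/unix` is 400 and that both the code and the length are unsigned varints; the helper names `encode_varint` and `encode_unix_component` are hypothetical and not taken from any multiaddr library.

```python
UNIX_CODE = 400  # assumed protocol code for /unix (0x0190)


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_unix_component(path: bytes) -> bytes:
    """Protocol code, then value length, then the raw path bytes."""
    return encode_varint(UNIX_CODE) + encode_varint(len(path)) + path


# The "/" bytes inside the path need no escaping here because the
# length prefix already says where the value ends.
component = encode_unix_component(b"/dir/file.socket")
print(component.hex())
```

Because the value carries its own length, another component can follow it in the byte string without ambiguity; only the string form needs escaping.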

Takes inspiration from #164 and proposes using URI encoding for the segment, the same as the /http-path component.

One difference: if the path is to represent the filesystem root, it must be included in the value portion of the tuple; otherwise it can be omitted.
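A minimal sketch of the string round trip, assuming the value is percent-encoded with every reserved character (including `/`) escaped and encoded exactly as given. The function names are hypothetical, and only Python's standard `urllib.parse` is used.

```python
from urllib.parse import quote, unquote


def unix_component_to_string(path: str) -> str:
    """Render a socket path as a /unix component, escaping "/" as %2F."""
    return "/unix/" + quote(path, safe="")


def split_unix_multiaddr(addr: str) -> tuple[str, str]:
    """Return (decoded socket path, remaining multiaddr string)."""
    parts = addr.split("/")
    if len(parts) < 3 or parts[0] != "" or parts[1] != "unix":
        raise ValueError("not a /unix multiaddr: " + addr)
    path = unquote(parts[2])
    rest = "/" + "/".join(parts[3:]) if len(parts) > 3 else ""
    return path, rest


encoded = unix_component_to_string("/dir/file.socket")
print(encoded)
print(split_unix_multiaddr(encoded + "/http"))
```

With the value escaped, the first unescaped `/` after it marks the start of the next tuple, which is what allows further components to be appended.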

Before Merge

Allow at least 24 hours for community input
If this is a new protocol, has the change been applied to https://github.com/multiformats/multicodec as well? If so, please link the multicodec PR.

MarcoPolo · 2024-11-05T17:51:27Z

Are there any backwards compatibility issues? Is anyone using this multiaddr currently?

aschmahmann · 2024-11-05T20:15:10Z

> Is anyone using this multiaddr currently

I'm not sure how heavily it's utilized, but kubo can use those addresses. e.g. https://github.com/ipfs/kubo/blob/4009ad3e5a502518ddc7d48a888707f812ddc629/docs/config.md?plain=1#L219

aschmahmann · 2024-11-05T20:30:19Z

> Are there any backwards compatibility issues?

I haven't looked too deeply at usage, but I'd suspect so. The question is what to do about it. There are a bunch of issues that are basically about protocols (like http or unix) failing when they take unescaped paths, such as #139 and #55. These problems were also known from the earliest days (multiformats/go-multiaddr#31).

Maybe the answer is to just break it, or to say that implementations MAY/SHOULD/MUST preferentially try looking for the entire path as a socket and if that fails will try with just the first path segment as a socket.
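One way to picture the fallback suggested above is the sketch below. It is only an illustration of the idea, not behaviour specified anywhere: the function name, the `is_socket` callback (a stand-in for a real filesystem check), and the choice to keep a leading `/` on the first segment are all assumptions.

```python
from typing import Callable, Tuple


def resolve_legacy_unix(addr: str, is_socket: Callable[[str], bool]) -> Tuple[str, str]:
    """Try the whole unescaped remainder as a socket path first,
    then fall back to treating only the first segment as the path."""
    if not addr.startswith("/unix/"):
        raise ValueError("not a /unix multiaddr: " + addr)
    remainder = addr[len("/unix"):]  # keeps the leading "/"
    if is_socket(remainder):
        return remainder, ""
    segments = remainder.split("/")  # ["", first, rest...]
    first = "/" + segments[1]
    rest = "/" + "/".join(segments[2:]) if len(segments) > 2 else ""
    return first, rest


# Hypothetical filesystem in which only /tmp/app.sock is a socket.
known = {"/tmp/app.sock"}
print(resolve_legacy_unix("/unix/tmp/app.sock", known.__contains__))
print(resolve_legacy_unix("/unix/app.sock/http", known.__contains__))
```

Note that the result depends on what exists on disk at lookup time, so the same string can parse differently on different hosts.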

Not specific to this PR, but between http-path and this perhaps worth biting the bullet on specifying how "path" types should be done in general. Maybe the answer is just to require them all to be escaped, maybe it's something else (e.g. closer to @Stebalien's proposal in multiformats/multiformats#55).

achingbrain · 2024-11-08T18:53:59Z

There will be backwards compatibility issues with the string representation of the path due to the escaping, though not with the binary representation.

I think this is used but not very commonly. The js stack can also use unix addresses but currently only as a terminal element, that is no tuples can follow it.

Arguably the lack of escaping has harmed its use - I know I've tried to use it a few times over the years but always come up hard against the inability to append anything else to the address, and the various issues asking for clarification etc. speak to a need for this.

> or to say that implementations MAY/SHOULD/MUST preferentially try looking for the entire path as a socket and if that fails will try with just the first path segment as a socket

I wonder if this could get exploited by making longer paths with segments that mirror tuples after the unix section? Possibly with .. in them that go somewhere bad?

> Maybe the answer is to just break it

I kind of think so. The original PR was merged in haste, the question about if we need to append things to it was unresolved.

It was flagged as experimental and subject to change so..

> proposal in multiformats/multiformats#55

It's interesting that - it seems to mandate parsing paths left to right, we recently switched multiaddr-to-uri to parsing right to left to support some forms of multiaddr.

It also doesn't seem to say much about escaping so we'd probably still need to solve the same problem there.

achingbrain · 2025-03-14T08:23:03Z

Are there any further thoughts here?

ntninja · 2025-03-14T09:13:15Z

@achingbrain: Since I just noticed it wasn’t mentioned I'd just like to mention my old MultiAddr proposal from 2019 again: #87

It’s mainly about arguments to individual MultiAddr path segments, but it also proposes syntax for encoding paths in MultiAddr segments:

- `/unix/(/dir/file.socket)/http`, rather than `/unix/dir%2Ffile.socket/http`
- `/unix/(/dir\\weird/\(file\).sock)/ws`, rather than `/unix/dir\weird%2F(file).sock/ws`
- `/unix/dir/file.socket`, rather than `/unix/dir%2Ffile.socket` for backward-compatibility

This is IMHO significantly easier to read than using percent-encoded local file system paths.

achingbrain · 2025-03-14T17:40:05Z

Thanks, I didn't see that issue. I like your proposal for using () for tuple values and the flexibility it brings, but I think it's more in the realm of a Multiaddr v2, as you point out. I would certainly support that style of stringification if those sorts of discussions started happening.

To solve this particular issue though, I would prefer to be consistent with other multiaddr tuple types. HTTP has already solved this problem by percent escaping the characters so I think we should use that here too.

ntninja · 2025-03-22T16:10:15Z

@achingbrain: I can see where you’re coming from, but I’d just like to point out that the major benefit, besides nicer style, is the fact that it’s backwards-compatible! (We are talking about MultiAddr v1 here after all. 🙂)

I know most of the MultiFormat people tend to lean on the string-representation => unimportant side, but at least ipfs-cluster and py-ipfs-http-client are using the /unix/dir/file.socket string-representation as part of their (potentially) persistent configuration right now. So changing it to /unix/dir%2Ffile.socket would actually be a backward-incompatible/major-release change for them.

By contrast, /unix/(/dir/file.socket)/… would remain compatible except in the super-obscure edge-case of a socket file path actually starting with /(/…

Not saying you shouldn’t go with your proposal, but that is something to thoroughly consider I think!

dhuseby · 2025-05-07T15:41:21Z

Are there any limitations on the character set that can be used in a path? I'm assuming that the characters are all UTF-8 encoded so unicode characters like ↑ (e.g. up arrow, UTF-8: 0xE2 0x86 0x91) are encoded as %25E28691. Is that correct? Or would it be encoded as %E28691?

achingbrain · 2025-05-07T15:51:13Z

> Are there any limitations on the character set that can be used in a path?

Not really, it's kind of the wild west since most "unix" implementations can trace their lineage prior to UTF-8 and other modern conveniences.

POSIX has a definition of portable file name characters but realistically it could be UTF-8, ASCII or char[]. I think just not NUL \0

As per this proposal ↑ would be encoded as %E2%86%91.

Full details are in RFC 3986, linked to from the proposed changes in the PR.
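To make the two candidate encodings concrete, here is a small check using Python's standard `urllib.parse`, which percent-encodes the UTF-8 bytes of a string one byte at a time. It is only an illustration of RFC 3986 style encoding, not code from the PR.

```python
from urllib.parse import quote, unquote

# Each UTF-8 byte of the character gets its own %XX escape.
encoded = quote("↑", safe="")
print(encoded)

# Decoding the per-byte form recovers the original character.
print(unquote("%E2%86%91"))

# "%25" is an escaped percent sign, so this decodes to literal text
# rather than to the arrow.
print(unquote("%25E28691"))
```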

achingbrain · 2025-05-28T15:28:30Z

Refs ipshipyard/roadmaps#16

MarcoPolo

Commenting on the spec itself, it seems reasonable. I won't comment on whether this is safe to deploy or if it will break existing users of the unix multiaddr component.

achingbrain · 2025-06-05T19:18:13Z

> Allow at least 24 hours for community input

Given this has been open for over six months I think this has been satisfied.

I will merge this next week unless there are strenuous objections.

nabijaczleweli · 2025-06-11T21:24:38Z

If this only governs the serialised form: Please do not merge this while either (a) this implies that an initial / is inserted if a path doesn't have one or (b) this implies that a relative path somehow magically becomes an absolute one. These are both severe footguns, and in the (a) case it also excludes serialising abstract socketaddrs.

If this governs both the serialised form and opines on ORM/binary forms: Please do not merge this. The spec is not reasonable because it (a) doesn't include abstract socketaddrs, (b) allows dropping the initial / which is nonsensical since that, instead of indicating a relative path (which is normal and real), somehow indicates an absolute path despite indicating the opposite, (c) allows dropping the initial / which precludes further extensions that would allow encoding abstract addresses.

Also, this completely misses the 3 unix socket types (STREAM/DGRAM/SEQPACKET). These map directly onto tcp/udp/{nothing}. What's the convention for these?

I'm implementing a libp2p unix-domain back-end where I need to parse Multiaddrs and I would like to not clash with the ecosystem. (If there is no ecosystem I will define one following the principles below.)

The two real address formats are:

1. normal (available universally): it's just a length-limited path (i.e. a sequence of non-0 bytes)
2. abstract (available under linux): it's a binary blob of length [1, 108], the first byte is 0

(1) are subject to normal path lookup; (2) are used directly as keys into a kernel hashmap.

Canonically, these are rendered (1) directly and (2) with an @ replacing embedded 0s (incl. the one at the start, observe ss --unix; these do NOT end at the second NUL, it's just convention to terminate them there; you can (and it is done) embed NULs anywhere). This is not acceptable for a serialisation format since an @ is a valid byte in a path.

I don't really see a reason to not render these directly (with an optional percent-encoding step for display, sure), so
- `/unix/%2Frun%2Fsystemd%2Fnotify` ‒ (1) absolute path
- `/unix/notify` ‒ (1) relative path, equivalent to the one above for peers in /run/systemd
- `/unix/%009e48095dfe839881%2Fbus%2Fsystemd-timesyn%2Fbus-api-timesync` ‒ (2)

are all valid addresses of UNIX-domain sockets taken from my system.
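The three renderings above can be reproduced with a byte-oriented percent-encoder. This is a sketch assuming the address is treated as an opaque byte string and encoded as-is; `encode_socket_address` and `decode_socket_address` are hypothetical names.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def encode_socket_address(raw: bytes) -> str:
    """Percent-encode the raw address bytes, including "/" and NUL."""
    return "/unix/" + quote_from_bytes(raw, safe="")


def decode_socket_address(component: str) -> bytes:
    """Inverse of encode_socket_address for a lone /unix component."""
    prefix = "/unix/"
    if not component.startswith(prefix):
        raise ValueError("not a /unix component: " + component)
    return unquote_to_bytes(component[len(prefix):])


for raw in (
    b"/run/systemd/notify",  # absolute path
    b"notify",               # relative path
    b"\x009e48095dfe839881/bus/systemd-timesyn/bus-api-timesync",  # leading NUL
):
    text = encode_socket_address(raw)
    assert decode_socket_address(text) == raw
    print(text)
```

No leading `/` is added or removed, so relative paths and addresses starting with a NUL byte survive the round trip unchanged.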

To actually use them you would strictly need to add /stream (connection-based protocol like TCP) or /seqpacket (TCP 2, Linux-only) and /dgram (connection-less like UDP) HOWEVER they occupy the same namespace, so unlike in TCP/UDP, you CANNOT have both
/unix/%2Frun%2Fsystemd%2Fnotify/stream &
/unix/%2Frun%2Fsystemd%2Fnotify/dgram
(this fails at bind/sendto/connect time).

nabijaczleweli · 2025-06-12T00:31:53Z

God that's a lot of writing to say "i think we should just percent-encode the address instead of making it worse". Implementation of this proposal in https://github.com/libp2p/rust-libp2p/pull/6056/files#diff-ccb7f1d7de0deedbdbbcb4262ff70cd8658a1a0178fb3795e1b1a9dbbd00f919R363

achingbrain · 2025-06-12T06:02:27Z

@nabijaczleweli thanks for your feedback and the additional use cases.

> If this only governs the serialised form:
> If this governs both the serialised form and opines on ORM/binary forms:

As noted in the OP, this only affects the string representation of a multiaddr.

> (a) this implies that an initial / is inserted if a path doesn't have one or (b) this implies that a relative path somehow magically becomes an absolute one

The assumption in this PR is that a relative path is only useful in the current context - if you try to interpret it starting from a different directory the path may not resolve or may resolve to an unexpected file which also seems like a significant footgun.

I can appreciate a light-touch garbage-in/garbage-out approach however so I am not opposed to dropping the initial / removal/insertion and just encoding whatever the value contains, though that complicates handling abstract socket paths as you point out.

> i think we should just percent-encode the address

This sounds reasonable.

> abstract (available under linux): it's a binary blob of length [1, 108], the first byte is 0
> This is not acceptable for a serialisation format since an @ is a valid byte in a path.

I think there are two ways forward here:

1. Declare this out of scope and say that /unix only refers to filesystem paths. This way we can keep garbage-in/garbage-out encoding (e.g. support relative paths that may later make no sense). To handle this properly it may need a new protocol to be defined since, as you point out, any string serialization of the address could also be interpreted as a filesystem path - /unix-abstract or something, I don't have a strong opinion on the name.
2. An alternative might be to say:
   1. All filesystem paths must be absolute
   2. Filesystem paths start with / (which we will now not strip, as per the above)
   3. Abstract socket paths start with @
   4. All paths are percent-encoded

achingbrain · 2025-06-12T08:57:31Z

My preference would be for option 2 since relative paths in multiaddrs seem to be too fragile to be useful. What do you think @MarcoPolo?

nabijaczleweli · 2025-06-12T10:25:25Z

Sorry, I don't think this makes sense. "This is different if you resolve it in a different context" is true of all paths/addresses, the point of an address is to tell the specific process you're giving it to how to call the peer you're thinking of. The "root" directory and the current directory are exactly the same in this model, and both are start-of-lookup inodes stored in the resolving process, configurable by the resolving process. Restricting this to just the root directory is strictly worse (and more ambiguous and less readable and less convention-following) since controlling the root directory is more convoluted and more disruptive for the resolving process than controlling the current directory.

You could analyse inet sockets the same way: on the same host, depending on the time, network namespace, routing table, &c., two different inet addresses can resolve to the same peer and two different peers can be reachable at the same address. The same more obviously applies to dns entries.

I think splitting unix into unix and unix-abstract makes sense (and this additionally cleans up the into of "unix works everywhere, unix-abstract works on linux" from "unix sometimes only works on linux, depending on the data") and GIGO percent-encoding them both makes the most sense.

nabijaczleweli · 2025-06-12T10:58:26Z

Please cf. #179 for my proposed wording.

achingbrain · 2025-06-12T11:26:02Z

> "This is different if you resolve it in a different context" is true of all paths/addresses

It's not really. Something like /ip4/123.123.123.123/tcp/1234 is unambiguous (assuming you can route to the host, etc). When resolving: /unix/foo in a tree like:

/
├─ foo
├─ bar/
│  ├─ foo
├─ baz/
│  ├─ qux/
│      ├─ foo

..which foo you get depends entirely on where you start from. Resolving /unix/%2Ffoo gets you the same foo no matter which directory is the current working directory for the resolving process, though I suppose you could be chrooted or something.
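The difference can be sketched with plain path joining, assuming ordinary POSIX lookup rules and ignoring chroot and symlinks. `resolve_unix_value` is a hypothetical helper, and the working directories are the ones from the tree above.

```python
import posixpath
from urllib.parse import unquote


def resolve_unix_value(encoded_value: str, cwd: str) -> str:
    """Decode the /unix value and resolve it against a working directory."""
    path = unquote(encoded_value)
    # posixpath.join discards cwd when path is absolute.
    return posixpath.normpath(posixpath.join(cwd, path))


for cwd in ("/", "/bar", "/baz/qux"):
    print(cwd, resolve_unix_value("foo", cwd), resolve_unix_value("%2Ffoo", cwd))
```

The relative value lands on a different `foo` for each working directory, while the escaped absolute value always resolves to `/foo`.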

> controlling the root directory is more convoluted and more disruptive for the resolving process than controlling the current directory

It's possible, if the resolving process doesn't have access to the root or some intermediate directory(s). Maybe then you'd need relative paths and accept that it's all GIGO.

Anyway, like I said, I think just percent-encoding the address and dropping the / removal/insertion is reasonable.

> I think splitting unix into unix and unix-abstract makes sense

Great, it sounds like you're in favour of option 1. I'm fine with that.

I think we can make the amendments to /unix here, then add /unix-abstract (or whatever it ends up being called) as a separate PR.

I see you've added a bunch of other stuff to #179, I don't want simply being able to escape unix paths (the real win here) to get bogged down in discussion of other features.

nabijaczleweli · 2025-06-12T11:36:20Z

I'm happy with the current wording, I can rebase #179 when this is merged.

achingbrain requested review from lidel and MarcoPolo October 28, 2024 19:28

achingbrain commented Oct 29, 2024

chore: add slash

df9f077

achingbrain added a commit to multiformats/js-multiaddr-matcher that referenced this pull request Nov 4, 2024

feat: add unix address matchers

72b0c54

Need to resolve how to encode unix paths so peer ids can be appended to them - multiformats/multiaddr#174

achingbrain mentioned this pull request Nov 4, 2024

feat: add unix address matchers multiformats/js-multiaddr-matcher#41

Merged

achingbrain mentioned this pull request Jun 4, 2025

define how to handle / in component values when represented as a string #178

Open

MarcoPolo approved these changes Jun 4, 2025

lidel mentioned this pull request Jun 9, 2025

refactor: support percent-encoded /unix paths ipfs/kubo#10833

Draft

1 task

nabijaczleweli added a commit to nabijaczleweli/rust-libp2p that referenced this pull request Jun 11, 2025

transports/unix-stream: add

bb9c8cd

Ripped 100% off transports/tcp Uses proposed encoding from multiformats/multiaddr#174 (comment)

nabijaczleweli mentioned this pull request Jun 11, 2025

feat(transports/unix-stream): add libp2p/rust-libp2p#6056

Open

4 tasks

nabijaczleweli added a commit to nabijaczleweli/rust-libp2p that referenced this pull request Jun 11, 2025

transports/unix-stream: add

cf4a285

Ripped 100% off transports/tcp Uses proposed encoding from multiformats/multiaddr#174 (comment)

nabijaczleweli added a commit to nabijaczleweli/rust-libp2p that referenced this pull request Jun 11, 2025

feat(transports/unix-stream): add

7474677

Ripped 100% off transports/tcp Uses proposed encoding from multiformats/multiaddr#174 (comment)

nabijaczleweli added a commit to nabijaczleweli/rust-libp2p that referenced this pull request Jun 12, 2025

feat(transports/unix-stream): add

29961e6

Ripped 100% off transports/tcp Uses proposed encoding from multiformats/multiaddr#174 (comment)

nabijaczleweli added a commit to nabijaczleweli/rust-libp2p that referenced this pull request Jun 12, 2025

feat(transports/unix-stream): add

4d5f60e

Ripped 100% off transports/tcp Uses proposed encoding from multiformats/multiaddr#174 (comment)

nabijaczleweli added a commit to nabijaczleweli/rust-libp2p that referenced this pull request Jun 12, 2025

feat(transports/unix-stream): add

19d5dd4

Ripped 100% off transports/tcp Uses proposed encoding from multiformats/multiaddr#174 (comment)

nabijaczleweli added a commit to nabijaczleweli/rust-libp2p that referenced this pull request Jun 12, 2025

feat(transports/unix-stream): add

eebc01c

Ripped 100% off transports/tcp Uses proposed encoding from multiformats/multiaddr#174 (comment)

nabijaczleweli added a commit to nabijaczleweli/rust-libp2p that referenced this pull request Jun 12, 2025

feat(transports/unix-stream): add

5305280

Ripped 100% off transports/tcp Uses proposed encoding from multiformats/multiaddr#174 (comment)

nabijaczleweli added a commit to nabijaczleweli/multiaddr that referenced this pull request Jun 12, 2025

Specify /unix. Add /unix-abstract. Add /stream, /seqpacket, /dgram

1a494bd

Based on multiformats#174

nabijaczleweli mentioned this pull request Jun 12, 2025

Specify /unix. Add /unix-abstract. Add /stream, /seqpacket, /dgram #179

Open

2 tasks

nabijaczleweli added a commit to nabijaczleweli/multiaddr that referenced this pull request Jun 12, 2025

Specify /unix. Add /unix-abstract. Add /stream, /seqpacket, /dgram

eb8e465

Based on multiformats#174

docs: just encode the path as-is

4ea4411

achingbrain merged commit 8f9c2ee into master Jun 16, 2025
1 check passed

achingbrain deleted the define-unix branch June 16, 2025 09:00

nabijaczleweli added a commit to nabijaczleweli/multiaddr that referenced this pull request Jun 16, 2025

Specify /unix. Add /unix-abstract. Add /stream, /seqpacket, /dgram

b690498

Based on multiformats#174
