The dag API - One API to manipulate all the IPLD Format objects #2882
Description
We need to come up with an API to manipulate IPLD Format objects.
Currently, go-ipfs master ships with a dag API that offers get and put methods; it doesn't yet expose a dag resolve API.
For reference, here are the help texts:
» ipfs dag --help
USAGE
ipfs dag - Interact with ipld dag objects.
SYNOPSIS
ipfs dag
DESCRIPTION
'ipfs dag' is used for creating and manipulating dag objects.
This subcommand is currently an experimental feature, but it is intended
to deprecate and replace the existing 'ipfs object' command moving forward.
SUBCOMMANDS
ipfs dag get <cid> - Get a dag node from ipfs.
ipfs dag put <object data> - Add a dag node to ipfs.
Use 'ipfs dag <subcmd> --help' for more information about each command.
» ipfs dag get --help
USAGE
ipfs dag get <cid> - Get a dag node from ipfs.
SYNOPSIS
ipfs dag get [--] <cid>
ARGUMENTS
<cid> - The cid of the object to get
DESCRIPTION
'ipfs dag get' fetches a dag node from ipfs and prints it out in the specifed format.
» ipfs dag put --help
USAGE
ipfs dag put <object data> - Add a dag node to ipfs.
SYNOPSIS
ipfs dag put [--format=<format> | -f] [--input-enc=<input-enc>] [--] <object data>
ARGUMENTS
<object data> - The object to put
OPTIONS
-f, --format string - Format that the object will be added as. Default: cbor.
--input-enc string - Format that the input object will be. Default: json.
DESCRIPTION
'ipfs dag put' accepts input from a file or stdin and parses it
into an object of the specified format.
In js-ipfs, meanwhile, we have a pretty much straight copy of this API, defined as an interface at: https://github.com/ipfs/interface-ipfs-core/tree/master/API/dag and a resolve API exposed by the IPLD Resolver that goes (as simply) as follows:
Note: this function is capable of resolving through different formats.
We need to complete the dag API definition, taking into account the following issues.
Current shortcomings
It is impossible to ensure that the right type is returned when using a non-strict IPLD Format
To help understand this issue, let's define a strict IPLD Format as something like dag-pb, eth-block, git-block and other Merkle data structures that have been predefined and whose format follows a fixed structure. Non-strict IPLD Formats are data structures like dag-cbor (so far the one main case), which have no definition when it comes to the keys and the value types of their data.
When resolving through non-strict IPLD Formats, the entity that requests a .resolve can't tell the type of the value that is going to be returned. This problem can be somewhat mitigated in languages that support type inference (or that have no type system at all), but it is unavoidable when we have to pass a node through a transport like HTTP. Let's illustrate the issue:
//Imagine we have an object that is stored in cbor that looks like
{
name: 'fancy-music.mp3',
data: new Buffer(<bytes of fancy-music.mp3>)
}
// This object can be serialized and deserialized as many times as we want,
// since cbor has a 1:1 mapping with JSON. However, if an HTTP client
// requests this object, it will have to be JSON.stringify'ed in order to
// pass through the wire, and so the previous object will be converted to:
"{ \"name\": \"fancy-music.mp3\", \"data\": { \"type\": \"Buffer\", \"data\": [<array of bytes>] } }"
// Now, if we do JSON.parse, we get:
{
name: 'fancy-music.mp3',
data: {
type: 'Buffer',
data: [<array of bytes>]
}
}
Now the client would have to know that, in the context of this application, the data field is a Buffer and cast it manually. This knowledge is application specific, which makes it especially hard to work with.
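The round trip described above can be reproduced in Node.js. This is only a minimal sketch: the three bytes stand in for the contents of fancy-music.mp3, and the manual cast at the end is the application-specific step that the client is forced to perform.

```javascript
// Stand-in bytes; the real object would carry the contents of fancy-music.mp3.
const node = {
  name: 'fancy-music.mp3',
  data: Buffer.from([0x49, 0x44, 0x33])
}

// What an HTTP API would put on the wire.
const wire = JSON.stringify(node)

// What the client gets back: `data` is now a plain object, not a Buffer.
const received = JSON.parse(wire)
console.log(Buffer.isBuffer(received.data)) // false
console.log(received.data.type) // 'Buffer'

// The client has to know, out of band, that `data` was a Buffer and cast it.
const restored = Buffer.from(received.data.data)
console.log(restored.equals(node.data)) // true
```

Nothing in the parsed JSON forces the cast; a client that does not know the schema would simply keep working with the plain object.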
Another case, which is what happens today, is that go-ipfs base64-encodes any buffer it has to send and converts it to a string, so the object returned from a go-ipfs http-api would in fact look more like this on the wire:
"{ \"name\": \"fancy-music.mp3\", \"data\": \"base64encodedArrayofBytes\" }"
This is ok for dag-pb, because we can easily cast, since we always know that data in dag-pb needs to be a buffer.
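A rough sketch of the client side of this case, in Node.js. The wire string is built by hand to mimic the shape shown above (it is not captured from a real go-ipfs response), and `decodeDagPbData` is a hypothetical helper name used only for illustration.

```javascript
// Mimics the wire shape above: the buffer arrives as a base64 string.
const wire = JSON.stringify({
  name: 'fancy-music.mp3',
  data: Buffer.from([0x49, 0x44, 0x33]).toString('base64')
})

const received = JSON.parse(wire)
console.log(typeof received.data) // 'string'

// Hypothetical helper: safe only because the caller knows that, in dag-pb,
// `data` is always a buffer.
function decodeDagPbData (obj) {
  return Object.assign({}, obj, { data: Buffer.from(obj.data, 'base64') })
}

const pbNode = decodeDagPbData(received)
console.log(Buffer.isBuffer(pbNode.data)) // true

// For a non-strict format there is no such rule: after the JSON round trip,
// a field that was text and a field that was bytes look identical.
const ambiguous = JSON.parse(JSON.stringify({
  text: 'SUQz',
  bytes: Buffer.from([0x49, 0x44, 0x33]).toString('base64')
}))
console.log(ambiguous.text === ambiguous.bytes) // true
```

The last lines show why the same trick does not carry over to dag-cbor: without a fixed schema, the receiver cannot tell which strings to decode.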
The user needs to know which type is going to be returned when doing an ipld.resolve across multiple IPLD Formats
In a similar way, each time a CID/path gets resolved and a change of IPLD Format is performed, the receiver needs to know beforehand what the data type of the returned object is going to be.
Proposed solutions
1:1 JSON mapping. In order to support the weird casting, every IPLD Format would be required to have a 1:1 mapping with JSON (toJSON, fromJSON methods), which is non-trivial (even if we reduced the scope to 'every object needs to be created first in its native serialized format and then converted to JSON').
Last mile resolve. Another suggestion would be to do a last mile resolve, where what gets passed on the wire is the last IPLD node serialized (in a block); the client then deserializes that node and resolves any remainderPath itself, and so is able to capture the right type for that object.
Boxing of values. We can also consider the boxing of values, where every value is passed around as a byte array inside a 'box', and that box also has a label saying its type, so that the consumer can properly cast it to the right type.
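To make the boxing proposal a bit more concrete, here is a minimal sketch in Node.js. The box layout (`{ type, value }` with a base64 payload), the type labels, and the function names `box`, `unbox`, `boxNode` and `unboxNode` are all assumptions made for illustration; none of them is a defined wire format or an existing API.

```javascript
// Turn one value into a labelled box whose payload is base64-encoded bytes.
function box (value) {
  if (Buffer.isBuffer(value)) {
    return { type: 'bytes', value: value.toString('base64') }
  }
  if (typeof value === 'string') {
    return { type: 'string', value: Buffer.from(value, 'utf8').toString('base64') }
  }
  if (typeof value === 'number') {
    return { type: 'number', value: Buffer.from(String(value), 'utf8').toString('base64') }
  }
  throw new Error('unsupported type in this sketch: ' + typeof value)
}

// Use the label to cast the payload back to the right type.
function unbox (boxed) {
  const bytes = Buffer.from(boxed.value, 'base64')
  switch (boxed.type) {
    case 'bytes': return bytes
    case 'string': return bytes.toString('utf8')
    case 'number': return Number(bytes.toString('utf8'))
    default: throw new Error('unknown box type: ' + boxed.type)
  }
}

// Box / unbox every field of a flat node (nested nodes are left out here).
function boxNode (node) {
  const out = {}
  Object.keys(node).forEach((key) => { out[key] = box(node[key]) })
  return out
}

function unboxNode (boxedNode) {
  const out = {}
  Object.keys(boxedNode).forEach((key) => { out[key] = unbox(boxedNode[key]) })
  return out
}

// Round trip through JSON, as an HTTP transport would do.
const original = { name: 'fancy-music.mp3', data: Buffer.from([0x49, 0x44, 0x33]) }
const wire = JSON.stringify(boxNode(original))
const received = unboxNode(JSON.parse(wire))

console.log(Buffer.isBuffer(received.data)) // true
console.log(received.name) // 'fancy-music.mp3'
```

The trade-off in this sketch is a larger payload and an extra decoding step for every value, in exchange for the consumer never having to guess a type.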
Notes
We still haven't had the chance to have a long discussion about this dag API; the purpose of this issue is to collect ideas and get feedback on the proposals.
@whyrusleeping here is the brain dump including notes from our chat yesterday, please add anything that I missed :)