Data round tripping - Sandro's review #237

Closed
lanthaler opened this issue Apr 1, 2013 · 12 comments

@lanthaler
Member

_This has been raised in #234 by @sandhawke. To simplify discussions, I've created this separate issue for it. Below is Sandro's original mail and my reply._

10.6 Data Round Tripping

This whole section was very confusing. Maybe add a paragraph at the start saying what you're talking about. I could never figure out if you meant round tripping (1) from RDF to JSON-LD and back to RDF or (2) from JSON-LD to RDF and back to JSON-LD.

Not sure I understand the difference!?

There was also a lot of duplication of XSD -- where you're spelling out the canonical forms -- but it's not clear whether you are just rephrasing the other spec or mean to be changing something about it. I suggest that, in general, it's best not to try to rephrase what other specs say.

We are just rephrasing it. Since this spec addresses JSON developers, we wanted to avoid requiring them to read the XML Schema spec. What do others think about this?

The bits of javascript are nice, but are they really examples? Hm.

I would say so. It's just an example of how this could be done in one specific programming language.

Trying to make sense of this..... The point of this section seems to be to say that in going JSON->RDF you need to use the canonical form. Why would that matter? I guess it would matter if, when going from RDF->JSON, you only convert to native types when the lexical representation is in canonical form. If that rule were in place, then I think datatypes would round-trip perfectly. I think. I'm not seeing that rule, though, in either this section or the algorithm.

It is there to ensure that the result is deterministic and testing is simplified (you can verify the result using simple string comparison).

Considering that, do you think we need to change something?
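
For concreteness, here is a tiny TypeScript illustration (the subject and predicate IRIs are made up) of why string comparison only works if everyone emits the same lexical form:

```typescript
// These two N-Quads lines encode the same double value, but a test suite
// that compares serialized output will only pass if every processor picks
// the same (canonical) lexical form.
const a = '<http://example.com/s> <http://example.com/p> "5.3E0"^^<http://www.w3.org/2001/XMLSchema#double> .';
const b = '<http://example.com/s> <http://example.com/p> "5.3e+00"^^<http://www.w3.org/2001/XMLSchema#double> .';
console.log(a === b); // false, even though both literals denote the same value
```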

When data such as decimals need to be normalized, JSON-LD authors should not use values that are going to undergo automatic conversion. This is due to the lossy nature of xsd:double values.

I can't quite make sense of this.

Is the word "normalized" confusing you? That's probably a left over from the normalization algorithm. What we are trying to say here is: if you have decimal values (e.g. money) you shouldn't use JSON number or a xsd:double but a string. Maybe we can just drop this sentence!?

When JSON-native numbers are type coerced, lossless data round-tripping cannot be guaranteed as rounding errors might occur.

You mean in going RDF-JSON-RDF, if you have a literal like "1.99999999999999999999999999999999E0"^^xs:double, that it's likely to get messed up while in JSON double form? That's true. But what are you saying to do about it? How about saying RDF->JSON converters MUST leave things like that in expanded form? Then we'd have round-tripping RDF-JSON-RDF. However, it would break JSON-RDF-JSON round tripping, if the JSON in question had a number like 1.999999999999999999999999999999E0 in it. (Of course, many JSON parsers would mess that up right away; that's not really our fault that we can't round trip that.)

Yes, we mean exactly that. You should use strings instead. In most cases this won't matter and consequently I don't think the MUST you propose makes much sense. JSON developers want numbers and not strings. Just out of curiosity, isn't the same true in Turtle for instance?
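
A quick TypeScript demonstration of the rounding concern (assuming the usual IEEE-754 double that JSON numbers end up as):

```typescript
const lexical = "1.99999999999999999999999999999999E0";
const asNumber = Number(lexical); // parsed into a 64-bit double
console.log(asNumber);            // 2 -- the extra digits are lost
console.log(asNumber === 2);      // true
// Going RDF -> JSON-LD -> RDF through a native number therefore cannot
// reproduce the original lexical form; keeping it as a string can.
```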

@sandhawke

So the idea here is that going RDF->JSON-LD->RDF we want to get back an isomorphic graph? (That is, the same triples, but with blank nodes replaced in a consistent way.)

I think people might want (or accept) some rewriting of literals that doesn't change the value, like "01"^^xs:int being rewritten as "001"^^xs:int or "1"^^xs:int. In the SPARQL world that kind of rewriting is allowed; in parsing of RDF/XML or Turtle it is not. (I'm pretty sure SPARQL also lets you convert "1"^^xs:int to "1"^^xs:integer, and such.) I don't really care which way we decide on this, but there should be a decision and test cases, so people know what to expect.

So can literals be rewritten in RDF -> JSON-LD -> RDF or not?

If they can't, then the RDF->JSON and JSON->RDF algorithms have to line up. As you have it now, JSON->RDF MUST produce canonical literals; that's fine. But we have to also say in RDF->JSON that only literals already in canonical form are to be output as native types; the rest MUST be left as strings in type objects.

If literals CAN be rewritten, then it seems to me both algorithms are free to ignore canonical form. Right? Why would we require canonical literals in JSON->RDF? Of course, canonical form is kind of friendly/nice, but it doesn't actually help with round tripping.
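
As a sketch of the stricter option (an illustrative helper, not the spec's algorithm), an RDF->JSON-LD converter could convert only canonical xsd:integer literals to native numbers and leave everything else as a typed string:

```typescript
const XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

// Only canonical lexical forms become native JSON numbers; non-canonical
// forms keep their exact lexical representation and datatype, so converting
// back to RDF reproduces the original literal.
function integerLiteralToJsonLdValue(lexical: string) {
  const canonical = String(parseInt(lexical, 10));
  if (lexical === canonical) {
    return { "@value": parseInt(lexical, 10) };
  }
  return { "@value": lexical, "@type": XSD_INTEGER };
}

integerLiteralToJsonLdValue("1");  // { "@value": 1 }
integerLiteralToJsonLdValue("01"); // { "@value": "01", "@type": ...#integer }
```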

(The issue of handling things like 1.9999999999999999999999E0 is kind of trickier and less important, so let's settle the above one first.)

@lanthaler
Member Author

Yes, that's correct. We already have test cases for this:

and a combined one for the reverse direction: fromRdf-0002-in.nq -> fromRdf-0002-out.jsonld.

So can literals be rewritten in RDF -> JSON-LD -> RDF or not?

Yes.

As you have it now, JSON->RDF MUST produce canonical literals; that's fine. But we have to also say in RDF->JSON that only literals already in canonical form are to be output as native types; the rest MUST be left as strings in type objects.

If literals CAN be rewritten, then it seems to me both algorithms are free to ignore canonical form. Right? Why would we require canonical literals in JSON->RDF? Of course, canonical form is kind of friendly/nice, but it doesn't actually help with round tripping.

We produce canonical literals to simplify testing. Why should we convert only literals in canonical form to native types? To simplify things? That would be OK with me.

@sandhawke

I don't think simplifying testing merits a MUST..... Or, if it does, then say that, instead of saying it's because of round-tripping....

Why should we convert only literals in canonical form to native types? To simplify things? That would be OK with me.

Yeah -- I don't have much opinion on this, as long as the story makes sense. Seems like something RDF WG folks might care about, though -- whether round-tripping an RDF graph through JSON preserves the non-canonical forms of literals (e.g. the number of leading zeros on an integer). We might ask at the same time whether people care about values like 1.99999999999999999999999E0 being preserved through such round-tripping. Do you want to ask or shall I?

@lanthaler
Member Author

RESOLVED: Specify what canonical lexical form is for xsd:integer and xsd:double by referencing the XML Schema 1.1 Datatypes specification. When processors are generating output, they are required to use this form.
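
In practice (a simplified TypeScript sketch, not the spec's wording, and ignoring corner cases such as NaN, Infinity, and very large integers), the resolution means a processor converting a JSON-native number to an RDF literal picks the canonical lexical form along these lines:

```typescript
const XSD = "http://www.w3.org/2001/XMLSchema#";

function numberToLiteral(value: number): { lexical: string; datatype: string } {
  if (Number.isInteger(value)) {
    // Canonical xsd:integer: decimal digits, optional minus sign, no leading zeros.
    return { lexical: value.toString(), datatype: XSD + "integer" };
  }
  // Canonical xsd:double: one non-zero digit before the decimal point, at least
  // one digit after it, capital "E", no "+" or leading zeros in the exponent.
  let lexical = value.toExponential().replace("e+", "E").replace("e-", "E-");
  if (!lexical.includes(".")) lexical = lexical.replace("E", ".0E");
  return { lexical, datatype: XSD + "double" };
}

numberToLiteral(42);  // { lexical: "42",    datatype: ...#integer }
numberToLiteral(5.3); // { lexical: "5.3E0", datatype: ...#double }
```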

@sandhawke

And....? That doesn't address my questions about round tripping.

@sandhawke

also, not to nitpick, but "processor" is a poor choice of name for things implementing the API -- as evidenced by the fact that the resolution uses that word when it should have used the term Implementation. Right?

@lanthaler
Member Author

Sorry, as usual I just pasted the resolution we made during the telecon. We had quite a long discussion about this. The majority of the group thought it is required to achieve interoperability. I personally think that the conversion of JSON-LD to RDF (the abstract syntax) does not need to require canonical lexical form, but I wasn't able to argue the point properly.

We also had problems finding some guidance in RDF Concepts. All we found was literal equality (http://www.w3.org/TR/rdf11-concepts/#dfn-literal-equality), which requires literals to match character by character. Perhaps we should discuss this briefly in tomorrow's RDF WG telecon.

@lanthaler
Member Author

also, not to nitpick, but "processor" is a poor choice of name for things implementing the API -- as evidenced by the fact that the resolution uses that word when it should have used the term Implementation. Right?

Yes, you are completely right. What about changing Implementation to "JSON-LD 1.0 Processor" and Processor to "JSON-LD 1.0 API Implementation" (a bit clunky but definitely clearer I think)?

/cc @msporny @gkellogg @dlongley @niklasl

@sandhawke

Yes, I agree those terms are better.

And yes, let's talk about roundtripping with the WG.

@gkellogg
Member

gkellogg commented Apr 2, 2013

@sandhawke As we were discussing the need to specify lexical form, it seemed that RDF Concepts defines a literal as having a lexical form, a datatype, and possibly a language tag, and that two literals are equal only if all of these compare equal. For the specific case of JSON numbers, it is effectively impossible to use the original lexical representation, as it is not maintained when a JSON document is processed. Therefore, when generating an RDF literal, we must choose a lexical representation for the native values the processor is working with. For interoperability, it seems that we must specify a format for this.
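
The loss of the original lexical form is easy to see (a trivial TypeScript example):

```typescript
// Once a JSON document has been parsed, the author's original spelling of a
// number is gone, so a serializer has to pick some lexical form on its own.
const parsed = JSON.parse('{ "n": 1.0 }');
console.log(parsed.n);               // 1 -- the ".0" is not preserved
console.log(JSON.stringify(parsed)); // {"n":1}
// Without a mandated canonical form, two processors could emit e.g.
// "1.0E0" and "1E0" for the same input document.
```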

The issue of round-tripping RDF->JSON-LD->RDF is somewhat different. The algorithm does involve a conversion to native form, but, re-using previous language, this can be controlled by an option so that xsd:integer and xsd:double values maintain their string representation, which eliminates such round-tripping issues.
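
Something along these lines (the option name is an assumption for illustration, not necessarily what the spec will use):

```typescript
const XSD = "http://www.w3.org/2001/XMLSchema#";

interface FromRdfOptions {
  useNativeTypes?: boolean; // assumed name; false keeps typed strings
}

// When converting an RDF literal to a JSON-LD value object, only convert to
// a native number if the caller opted in; otherwise keep the exact lexical
// form and datatype so RDF -> JSON-LD -> RDF reproduces the original literal.
function literalToValueObject(lexical: string, datatype: string, opts: FromRdfOptions = {}) {
  if (opts.useNativeTypes &&
      (datatype === XSD + "integer" || datatype === XSD + "double")) {
    return { "@value": Number(lexical) }; // the original lexical form is lost here
  }
  return { "@value": lexical, "@type": datatype };
}
```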

@gkellogg
Member

gkellogg commented Apr 2, 2013

Yes, you are completely right. What about changing Implementation to "JSON-LD 1.0 Processor" and Processor to "JSON-LD 1.0 API Implementation" (a bit clunky but definitely clearer I think)?

When talking about algorithms, I think that "processor" is a reasonable term, and consistent at least with how RDFa refers to them.

When talking about the API, I think that talking about implementations is appropriate.

lanthaler added a commit that referenced this issue Apr 3, 2013
... and clarify parts of the relevant algorithms.

@sandhawke, could you please have a look at the new section and tell me whether it's clearer or if it still needs some love. Thanks.

This addresses #237.
lanthaler added a commit that referenced this issue Apr 4, 2013
lanthaler added a commit that referenced this issue Apr 4, 2013
@lanthaler
Member Author

The data round-tripping section [1] has been improved considerably. Sandro already indicated that the updates address his concerns [2]. I will thus close this issue in 24 hours unless I hear objections.

[1] http://json-ld.org/spec/latest/json-ld-api/#data-round-tripping
[2] http://lists.w3.org/Archives/Public/public-rdf-wg/2013Apr/0046.html
