Commit ac9ad81

gkellogg, TallTed, afs, and yamdan authored
Canonicalization (#17)
* Improve canonicalization section.
* Reference issue #16 as a future direction for canonicalization.
* Add prohibition on using a datatype IRI if the datatype is xsd:string when canonicalizing.
* Update change note on PN_CHARS_U to describe the change in blank node representation.
* White space updates.
* Change note motivating the use of canonical N-Quads.
* Sync recent changes to w3c/rdf-concepts#16.
* Fix IRI term references.
* Update note motivating canonical N-Quads.

---------

Co-authored-by: Ted Thibodeau Jr <[email protected]>
Co-authored-by: Andy Seaborne <[email protected]>
Co-authored-by: Dan Yamamoto <[email protected]>
1 parent 274e733 commit ac9ad81

1 file changed

+51
-31
lines changed

spec/index.html

Lines changed: 51 additions & 31 deletions
@@ -93,7 +93,7 @@ <h2>Introduction</h2>
  also known as a <a data-cite="RDF12-CONCEPTS#dfn-quad">quad</a>.
  These may be separated by white space (spaces <code>#x20</code> or tabs <code>#x9</code>).
  This sequence is terminated by a '<code>.</code>'
- (optionaly proceded by white space and/or a comment),
+ (optionally followed by white space and/or a comment),
  and a new line (optional at the end of a document).</p>

  <pre id="ex-comments" class="example nquads" data-transform="updateExample"
@@ -125,7 +125,7 @@ <h2>N-Quads Language</h2>
  <a data-cite="RDF12-CONCEPTS#dfn-predicate">predicate</a>,
  <a data-cite="RDF12-CONCEPTS#dfn-object">object</a>, an optional
  <a data-cite="RDF12-CONCEPTS#dfn-graph-name">graph name</a>
- and optional blank lines.
+ and optional <a>blank lines</a>.
  Comments may be given after a '<code>#</code>' that is not part of
  another lexical token and continue to the end of the line.</p>

@@ -234,10 +234,6 @@ <h3>RDF Blank Nodes</h3>
  _:bob <http://xmlns.com/foaf/0.1/knows> _:alice .
  -->
  </pre>
- <p class="issue" data-number="2">
- Note open <a href="https://www.w3.org/2001/sw/wiki/RDF1.1_Errata#erratum_30">erratum</a> on aligning the definition and production
- for blank node lables with Turtle.
- </p>
  </section>
  </section>

@@ -248,36 +244,56 @@ <h2>A Canonical form of N-Quads</h2>
  less variability in layout.
  The grammar for the language is the same.</p>

- <p class="note">Even when not explicitly serializing
- canonical N-Quads, implementers are encouraged to produce this form.</p>
+ <p class="note">A canonical form of N-Quads can be used to ensure
+ that variations in the syntactic representation of terms
+ within that quad are determined; each code point
+ can be represented by only one of
+ <code><a href="#grammar-production-UCHAR">UCHAR</a></code>,
+ <code><a href="#grammar-production-ECHAR">ECHAR</a></code>,
+ or unencoded character,
+ where the relevant production allows for a choice in representation.</p>

  <p>Canonical N-Quads has the following additional constraints on layout:</p>
  <ul>
- <li>The white space following <code>subject</code>,
+ <li>White space MUST NOT be used except after
+ <code>subject</code>,
  <code>predicate</code>,
  <code>object</code>,
- and <code>graphLabel</code> if present, MUST be a single space,
- (<code>U+0020</code>). All other locations that allow
- white space MUST be empty.</li>
- <li>There MUST be no comments.</li>
- <li><code>HEX</code> MUST use only uppercase letters (<code>[A-F]</code>).</li>
- <li>Characters MUST NOT be represented by <code>UCHAR</code>.</li>
+ and <code>graphLabel</code>,
+ any of which MUST be a single space (<code>U+0020</code>).</li>
+ <li><a data-cite="RDF12-CONCEPTS#literal">Literals</a> with the
+ datatype <code>http://www.w3.org/2001/XMLSchema#string</code>
+ MUST NOT use the datatype IRI part of the <a href="#grammar-production-literal">literal</a>,
+ and are represented using only <a href="#grammar-production-STRING_LITERAL_QUOTE">STRING_LITERAL_QUOTE</a>.
+ </li>
+ <!--li><code><a href="#grammar-production-HEX">HEX</a></code> MUST use only uppercase letters (<code>[A-F]</code>).</li-->
+ <li>Characters MUST NOT be represented by <code><a href="#grammar-production-UCHAR">UCHAR</a></code>.</li>
  <li>Within <a href="#grammar-production-STRING_LITERAL_QUOTE">STRING_LITERAL_QUOTE</a>,
- only the characters
+ the characters
  <code>U+0022</code>, <code>U+005C</code>, <code>U+000A</code>, <code>U+000D</code>
- are encoded using <code>ECHAR</code>.
- <code>ECHAR</code> MUST NOT be used for characters that are
+ MUST be encoded using <code><a href="#grammar-production-ECHAR">ECHAR</a></code>.
+ <code><a href="#grammar-production-ECHAR">ECHAR</a></code> MUST NOT be used for characters that are
  allowed directly in
- <a href="#grammar-production-STRING_LITERAL_QUOTE">STRING_LITERAL_QUOTE</a>. </li>
- <li>The token EOL MUST be a single <code>U+000A</code>.</li>
- <li>The final EOL MUST be provided.</li>
+ <code><a href="#grammar-production-STRING_LITERAL_QUOTE">STRING_LITERAL_QUOTE</a></code>. </li>
+ <li>The token <code><a href="#grammar-production-EOL">EOL</a></code> MUST be a single <code>U+000A</code>.</li>
+ <li>The final <code><a href="#grammar-production-EOL">EOL</a></code> MUST be provided.</li>
  </ul>
- <div class="issue" data-number="2">
- Note open errata:
- <ul>
- <li><a href="https://www.w3.org/2001/sw/wiki/RDF1.1_Errata#erratum_32">32 – Issues with N-Quads/N-Triples canonicalization</a>.</li>
- <li><a href="https://www.w3.org/2001/sw/wiki/RDF1.1_Errata#erratum_33">33 – Ambiguity of canonical N-Triples</a></li>
- </ul>
+
+ <div class="issue" data-number="16">
+ <p>Re-consider the use of `UCHAR` and `ECHAR` escapes in N-Triples/N-Quads canonicalization.
+ The 1.1-based recommendation prohibits the use of `UCHAR` (`U+XXXX`)
+ and allows `ECHAR` only for `U+0022` (quote `\"`),
+ `U+005C` (backslash `\\`),
+ `U+000A` (<code title="LINE FEED"><sub>LF</sub></code> `\n`),
+ and `U+000D` (<code title="CARRIAGE RETURN"><sub>CR</sub></code> `\r`).
+ However, the use of control characters can obfuscate text when presented,
+ creating a potential security concern.</p>
+
+ <p>A future version may consider requiring all characters between
+ `U+0000` and `U+001F` (other than `U+000A` (<code title="LINE FEED"><sub>LF</sub></code>)
+ and `U+000D` (<code title="CARRIAGE RETURN"><sub>CR</sub></code>))
+ along with `U+007F` (<code title="delete"><sub>DEL</sub></code>)
+ to be represented using `UCHAR`.</p>
  </div>
  </section>

@@ -395,9 +411,9 @@ <h3>White Space</h3>

  <p>White space is significant in the production <a href="#grammar-production-STRING_LITERAL_QUOTE">STRING_LITERAL_QUOTE</a>.</p>

- <p>A blank line, consisting of only white space and/or a comment,
+ <p>A <dfn class="no=export">blank line</dfn>, consisting of only white space and/or a comment,
  may appear wherever a <code><a href="#grammar-production-statement">statement</a></code> production is allowed,
- and are treated as white space.</p>
+ and is treated as white space.</p>

  <p class="note">As with, N-Triples [[RDF12-N-TRIPLES]],
  N-Quads allows only horizontal white space (tab U+0009 or space U+0020).</p>
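A parser might recognize the blank line defined in this hunk with a check along these lines. This is a minimal Python sketch, assuming the input has already been split into lines; `is_blank_line` is a hypothetical helper name, not something defined by the specification.

```python
import re

# White space here is horizontal only: space (U+0020) or tab (U+0009).
# A comment starts at '#' and continues to the end of the line.
_BLANK_LINE = re.compile(r"[ \t]*(?:#[^\r\n]*)?")


def is_blank_line(line: str) -> bool:
    """Return True if the line holds only white space and/or a comment."""
    # Drop the line terminator, if any, before matching the whole line.
    return _BLANK_LINE.fullmatch(line.rstrip("\r\n")) is not None
```

A line containing a statement followed by a comment is not blank under this check, since the pattern must match the entire line.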
@@ -528,7 +544,7 @@ <h2>Changes between RDF 1.1 and RDF 1.2</h2>
  <li>Better align the use of white space and comments with [[RDF12-TURTLE]].</li>
  <li>Removed language about white space use between terminals that would otherwise
  be (mis-)recognized, is this can't happen in N-Triples.</li>
- <li>Clarify the use of blank lines, including those composed of white space
+ <li>Clarify the use of <a>blank lines</a>, including those composed of white space
  and/or comments.
  Comments can appear at the end of a triple before the newline as
  was already evident from <a href="#ex-comments"></a>.</li>
@@ -537,7 +553,11 @@ <h2>Changes between RDF 1.1 and RDF 1.2</h2>
  <a href="#sec-grammar-comments">Comments</a>,
  better mirroring [[RDF12-TURTLE]].</li>
  <li>Updated the <a href="#grammar-production-PN_CHARS_U">PN_CHARS_U</a>
- grammar production to be consisten with with Turtle.</li>
+ grammar production to be consistent with Turtle.
+ Formerly, <a href="#grammar-production-PN_CHARS_U">PN_CHARS_U</a>
+ included "`:`" in N-Triples and N-Quads, but not in Turtle nor TriG.
+ <a href="#grammar-production-PN_CHARS_U">PN_CHARS_U</a> is a component
+ of <a href="#grammar-production-PN_CHARS_U">BLANK_NODE_LABEL</a>.</li>
  <li>Separated <a href="#security"></a> from <a href="#sec-mediatype"></a>
  and updated language.</li>
  </ul>
