Commit 38f6ae8

Fix percent-encoding for ISO-2022-JP
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto a single instance of the encoder and perform error handling manually. This also requires the input to be the full string rather than individual code points, as otherwise every caller of percent-encoding would need to be aware of the encoder state too. (As UTF-8 encoding cannot fail, this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317.

Fixes #557.
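
The approach described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the spec algorithm itself: it uses Python's `codecs` incremental encoders as stand-ins for the Encoding Standard's encoders (their mappings may differ), and the names `percent_encode_after_encoding` and `EXAMPLE_SET` are hypothetical. The sketch also treats the space-to-plus substitution as exclusive of percent-encoding, which is an assumption.

```python
import codecs

# Hypothetical stand-in for a percent-encode set: C0 controls, DEL, and some
# ASCII punctuation. This is NOT the spec's exact userinfo percent-encode set.
EXAMPLE_SET = {chr(c) for c in range(0x20)} | {"\x7f"} | set(' "#<>?`{}/:;=@[]^|')


def percent_encode_after_encoding(encoding, text, percent_encode_set,
                                  space_as_plus=False):
    """Percent-encode a whole string using one long-lived encoder (a sketch)."""
    # A single incremental encoder is kept for the full string, so any shift
    # state (as in ISO-2022-JP) carries over from one code point to the next.
    encoder = codecs.getincrementalencoder(encoding)(errors="strict")
    output = []

    def emit(data):
        for byte in data:
            if space_as_plus and byte == 0x20:
                output.append("+")
            elif byte < 0x80 and chr(byte) not in percent_encode_set:
                # ASCII byte outside the set: copy through unchanged.
                output.append(chr(byte))
            else:
                # Non-ASCII bytes are always percent-encoded.
                output.append(f"%{byte:02X}")

    for code_point in text:
        try:
            emit(encoder.encode(code_point))
        except UnicodeEncodeError:
            # Manual error handling: a percent-encoded "&#NNNN;" reference
            # built from the code point's value in base ten.
            output.append(f"%26%23{ord(code_point)}%3B")

    # Flush the encoder so a stateful encoding can emit its trailing bytes.
    emit(encoder.encode("", final=True))
    return "".join(output)


print(percent_encode_after_encoding("shift_jis", "\u2261", EXAMPLE_SET))
print(percent_encode_after_encoding("shift_jis", "\u203d", EXAMPLE_SET))
```

Calling the same function with `"iso2022_jp"` exercises the stateful case; the exact bytes depend on Python's codec, so they may not match the spec's ISO-2022-JP encoder.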
1 parent 9637645 commit 38f6ae8

File tree

1 file changed

+42
-53
lines changed


url.bs

Lines changed: 42 additions & 53 deletions
@@ -217,81 +217,70 @@ inclusive, and U+007E (~).
 all code points, except the <a>ASCII alphanumeric</a>, U+002A (*), U+002D (-), U+002E (.), and
 U+005F (_).

-<p>To <dfn for="code point">percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
-<var>encoding</var>, <a for=/>code point</a> <var>codePoint</var>, and a
-<var>percentEncodeSet</var>, run these steps:
+<p>To <dfn for=string>percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
+<var>encoding</var>, <a for=/>string</a> <var>input</var>, a <var>percentEncodeSet</var>, and an
+optional boolean <var>spaceAsPlus</var> (default false), run these steps:

 <ol>
-<li><p>Let <var>bytes</var> be the result of <a lt=encode>encoding</a> <var>codePoint</var> using
-<var>encoding</var>.
+<li><p>Let <var>encoder</var> be the result of <a>getting an encoder</a> from <var>encoding</var>.

-<li>
-<p>If <var>bytes</var> starts with 0x26 (&amp;) 0x23 (#) and ends with 0x3B (;), then:
-
-<ol>
-<li><p>Let <var>output</var> be <var>bytes</var>, <a>isomorphic decoded</a>.
+<li><p>Let <var>inputQueue</var> be <var>input</var> converted to an <a for=/>I/O queue</a>.

-<li><p>Replace the first two code points of <var>output</var> with "<code>%26%23</code>".
-
-<li><p>Replace the last code point of <var>output</var> with "<code>%3B</code>".
-
-<li><p>Return <var>output</var>.
-</ol>
+<li><p>Let <var>output</var> be the empty string.

-<p class="note no-backref">This can happen when <var>encoding</var> is not <a>UTF-8</a>.
+<li>
+<p>Let <var>potentialError</var> be 0.

-<li><p>Let <var>output</var> be the empty string.</p></li>
+<p class=note>This needs to be a non-null value to initiate the subsequent while loop.

 <li>
-<p>For each <var>byte</var> of <var>bytes</var>:
+<p>While <var>potentialError</var> is non-null:

 <ol>
-<li><p>Let <var>isomorph</var> be a <a for=/>code point</a> whose <a for="code point">value</a>
-is <var>byte</var>'s <a for=byte>value</a>.
+<li><p>Let <var>encodeOutput</var> be an empty <a for=/>I/O queue</a>.

-<li><p>Assert: <var>percentEncodeSet</var> includes all non-<a>ASCII code points</a>.
+<li><p>Set <var>potentialError</var> to the result of running <a>encode or fail</a> with
+<var>inputQueue</var>, <var>encoder</var>, and <var>encodeOutput</var>.

-<li><p>If <var>isomorph</var> is not in <var>percentEncodeSet</var>, then append
-<var>isomorph</var> to <var>output</var>.
+<li>
+<p>For each <var>byte</var> of <var>encodeOutput</var> converted to a byte sequence:

-<li><p>Otherwise, <a for=byte>percent-encode</a> <var>byte</var> and append the result to
-<var>output</var>.
-</ol>
+<ol>
+<li><p>If <var>spaceAsPlus</var> is true and <var>byte</var> is 0x20 (SP), then append
+U+002B (+) to <var>output</var>.

-<li><p>Return <var>output</var>.
-</ol>
+<li><p>Let <var>isomorph</var> be a <a for=/>code point</a> whose <a for="code point">value</a>
+is <var>byte</var>'s <a for=byte>value</a>.

-<p>To <dfn for="string">percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
-<var>encoding</var>, <a for=/>string</a> <var>input</var>, a <var>percentEncodeSet</var>, and a
-boolean <var>spaceAsPlus</var>, run these steps:
+<li><p>Assert: <var>percentEncodeSet</var> includes all non-<a>ASCII code points</a>.

-<ol>
-<li><p>Let <var>output</var> be the empty string.</p></li>
+<li><p>If <var>isomorph</var> is not in <var>percentEncodeSet</var>, then append
+<var>isomorph</var> to <var>output</var>.

-<li>
-<p>For each <var>codePoint</var> of <var>input</var>:
+<li><p>Otherwise, <a for=byte>percent-encode</a> <var>byte</var> and append the result to
+<var>output</var>.
+</ol>

-<ol>
-<li><p>If <var>spaceAsPlus</var> is true and <var>codePoint</var> is U+0020, then append
-U+002B (+) to <var>output</var>.
+<li>
+<p>If <var>potentialError</var> is non-null, then append "<code>%26%23</code>", followed by the
+shortest sequence of <a for=/>ASCII digits</a> representing <var>potentialError</var> in base
+ten, followed by "<code>%3B</code>", to <var>output</var>.

-<li><p>Otherwise, run <a for="code point">percent-encode after encoding</a> with
-<var>encoding</var>, <var>codePoint</var>, and <var>percentEncodeSet</var>, and append the result
-to <var>output</var>.
+<p class="note no-backref">This can happen when <var>encoding</var> is not <a>UTF-8</a>.
 </ol>

 <li><p>Return <var>output</var>.
 </ol>

 <p>To <dfn for="code point" id=utf-8-percent-encode>UTF-8 percent-encode</dfn> a
 <a for=/>code point</a> <var>codePoint</var> using a <var>percentEncodeSet</var>, return the result
-of running <a for="code point">percent-encode after encoding</a> with <a for=/>UTF-8</a>,
-<var>codePoint</var>, and <var>percentEncodeSet</var>.
+of running <a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>,
+<var>codePoint</var> as a <a for=/>string</a>, and <var>percentEncodeSet</var>.

 <p>To <dfn export for=string>UTF-8 percent-encode</dfn> a <a for=/>string</a> <var>input</var> using
 a <var>percentEncodeSet</var>, return the result of running
-<a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>, <var>input</var>,
-<var>percentEncodeSet</var>, and false.
+<a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>, <var>input</var>, and
+<var>percentEncodeSet</var>.

 <hr>

@@ -319,20 +308,20 @@ a <var>percentEncodeSet</var>, return the result of running
 <td>"<code>‽%25%2E</code>"
 <td>0xE2 0x80 0xBD 0x25 0x2E
 <tr>
-<td rowspan=3><a for="code point">Percent-encode after encoding</a> with <a>Shift_JIS</a>,
+<td rowspan=3><a for=string>Percent-encode after encoding</a> with <a>Shift_JIS</a>,
 <var>input</var>, and the <a>userinfo percent-encode set</a>
-<td>U+0020
+<td>"<code> </code>"
 <td>"<code>%20</code>"
 <tr>
-<td>U+2261 (≡)
+<td>"<code>≡</code>"
 <td>"<code>%81%DF</code>"
 <tr>
-<td>U+203D (‽)
+<td>"<code>‽</code>"
 <td>"<code>%26%238253%3B</code>"
 <tr>
-<td><a for="code point">Percent-encode after encoding</a> with <a>ISO-2022-JP</a>,
-<var>input</var>, and the <a>userinfo percent-encode set</a>
-<td>U+00A5 (¥)
+<td><a for=string>Percent-encode after encoding</a> with <a>ISO-2022-JP</a>, <var>input</var>,
+and the <a>userinfo percent-encode set</a>
+<td>"<code>¥</code>"
 <td>"<code>%1B(J\%1B(B</code>"
 <tr>
 <td><a for=string>Percent-encode after encoding</a> with <a>Shift_JIS</a>, <var>input</var>, the
