This repository was archived by the owner on Jun 24, 2024. It is now read-only.

Commit d422607

Define how to extract the sourceMappingURL comment
This patch explicitly defines how to extract such comments from JavaScript, CSS and WebAssembly sources. It defines multiple ways to do so: either by actually parsing the code, or by going through all the lines of the program looking for what "looks like" a comment. This lets different implementations choose what is best for them, depending on whether they are already parsing the code or not.

To ensure consistent behavior across implementations that choose different strategies, the specification enforces additional requirements on tools that append a `sourceMappingURL` comment to the generated code: the comment must be placed in such a way that all extraction methods yield the same result. This is not an unreasonable burden: if the program is syntactically valid, simply adding the comment at the end of the file, followed at most by other tool-injected comments, is enough. This requirement is lifted if the input code given to the tool is already "maliciously crafted", since we would otherwise require tools to rewrite that code (for example, splitting strings that contain something that looks like a comment).

I have left the CSS extraction method as TODO because I first want to check how you feel about the JS one. It has the following properties:

- It iterates line by line. Implementations can thus optimize it by going through the lines _in reverse order_, scanning the characters of each line from beginning to end (which is what a regexp would do).
- It expects multi-line comments to actually be on a single line.
- It returns the last `sourceMappingURL` comment (or rather, comment-like text) found in the source.
- It only considers comments after the last piece of code (i.e. it discards any comment found so far every time it sees a non-comment, non-whitespace character).
- It has no requirements about what comes _before_ a comment. Adding the comment at the end of the file without first ensuring that there is a newline before it is valid.
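
To illustrate the reverse-order optimization listed among the properties above, here is a minimal sketch of a line-based extractor. It is not the code from this patch: the names `matchSourceMapURL`, `scanLine` and `extractSourceMapURL` are hypothetical, and only the `sourceMappingURL` regular expression is copied from the diff. The sketch assumes that a lone `/` counts as a code character and that a multi-line comment left unclosed on its line is ignored.

```js
// Pattern copied from the patch; all other names are illustrative assumptions.
const SOURCE_MAP_URL = /^[@#]\s*sourceMappingURL=(\S*?)\s*$/;
// ECMAScript line terminator code points: LF, CR, LS, PS.
const LINE_TERMINATORS = /[\n\r\u2028\u2029]/;

function matchSourceMapURL(comment) {
  const match = SOURCE_MAP_URL.exec(comment);
  return match === null ? null : match[1];
}

// Scans one line from left to right. Returns:
//   undefined - only whitespace and non-matching comments were seen
//   null      - code was seen and no matching comment follows it
//   a string  - the URL of the last matching comment after any code
function scanLine(line) {
  let result;
  let i = 0;
  while (i < line.length) {
    if (/\s/.test(line[i])) {
      i++;
      continue;
    }
    if (line[i] === "/" && line[i + 1] === "/") {
      const url = matchSourceMapURL(line.slice(i + 2));
      if (url !== null) result = url;
      break; // a single-line comment runs to the end of the line
    }
    if (line[i] === "/" && line[i + 1] === "*") {
      const end = line.indexOf("*/", i + 2);
      if (end === -1) break; // unclosed on this line: ignore the rest
      const url = matchSourceMapURL(line.slice(i + 2, end));
      if (url !== null) result = url;
      i = end + 2;
      continue;
    }
    result = null; // any other character counts as code
    i++;
  }
  return result;
}

function extractSourceMapURL(source) {
  const lines = source.split(LINE_TERMINATORS);
  // Walk the lines backwards and stop at the first line that decides the
  // outcome: either it contains code, or it contains a matching comment.
  for (let n = lines.length - 1; n >= 0; n--) {
    const result = scanLine(lines[n]);
    if (result !== undefined) return result;
  }
  return null;
}
```

With this sketch, `extractSourceMapURL("foo();\n//# sourceMappingURL=foo.js.map\n")` returns `"foo.js.map"`, while a source whose last non-blank line is code returns `null`. Returning early is safe because any line scanned after the deciding one (in reverse order) could only be overridden by lines that have already been ruled out.
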
1 parent fbcf32f commit d422607

1 file changed, +205 -38 lines changed

source-map.bs

@@ -24,12 +24,29 @@ spec:html; type:element;
2424
text:title
2525
text:link
2626

27-
spec:bikeshed-1; type:dfn; for:railroad; text:optional
28-
29-
spec:fetch; type:dfn; for:/; text:request
30-
spec:fetch; type:dfn; for:/; text:response
27+
spec:fetch; type:dfn; for:/;
28+
text:request
29+
text:response
3130

3231
spec:url; type:dfn; for:/; text:url
32+
33+
spec:infra; type:dfn;
34+
text:list
35+
for:list; text:for each
36+
</pre>
37+
<pre class="anchors">
38+
urlPrefix:https://tc39.es/ecma262/#; type:dfn; spec:ecmascript
39+
url:sec-lexical-and-regexp-grammars; text:tokens
40+
url:table-line-terminator-code-points; text:line terminator code points
41+
url:sec-white-space; text: white space code points
42+
url:prod-SingleLineComment; text:single-line comment
43+
url:prod-MultiLineComment; text:multi-line comment
44+
url:prod-MultiLineComment; text:multi-line comment
45+
url:sec-regexpbuiltinexec; text:RegExpBuiltinExec
46+
47+
urlPrefix:https://webassembly.github.io/spec/core/; type:dfn; spec:wasm
48+
url:binary/modules.html#binary-customsec; text:custom section
49+
url:appendix/embedding.html#embed-module-decode; text:module_decode
3350
</pre>
3451

3552
<pre class="biblio">
@@ -59,17 +76,18 @@ spec:url; type:dfn; for:/; text:url
5976
"status": "archive",
6077
"title": "Give your eval a name with //@ sourceURL"
6178
},
79+
"ECMA-262": {
80+
"href": "https://tc39.es/ecma262/",
81+
"id": "esma262",
82+
"publisher": "ECMA",
83+
"status": "Standards Track",
84+
"title": "ECMAScript® Language Specification"
85+
},
6286
"V2Format": {
6387
"href": "https://docs.google.com/document/d/1xi12LrcqjqIHTtZzrzZKmQ3lbTv9mKrN076UB-j3UZQ/edit?hl=en_US",
6488
"publisher": "Google",
6589
"title": "Source Map Revision 2 Proposal"
6690
},
67-
"WasmCustomSection": {
68-
"href": "https://www.w3.org/TR/wasm-core-2/binary/modules.html#custom-section",
69-
"publisher": "W3C",
70-
"status": "Living Standard",
71-
"title": "WebAssembly custom section"
72-
},
7391
"WasmNamesBinaryFormat": {
7492
"href": "https://www.w3.org/TR/wasm-core-2/binary/values.html#names",
7593
"publisher": "W3C",
@@ -339,38 +357,12 @@ to have some conventions for the expected use-case of web server-hosted JavaScri
339357
There are two suggested ways to link source maps to the output. The first requires server
340358
support in order to add an HTTP header and the second requires an annotation in the source.
341359

342-
The HTTP header should supply the source map URL reference as:
343-
344-
```
345-
sourcemap: <url>
346-
```
347-
348-
Note: Previous revisions of this document recommended a header name of `x-sourcemap`. This
349-
is now deprecated; `sourcemap` is now expected.
350-
351-
The generated code should include a line at the end of the source, with the following form:
352-
353-
```
354-
//# sourceMappingURL=<url>
355-
```
356-
357-
Note: The prefix for this annotation was initially `//@` however this conflicts with Internet
358-
Explorer's Conditional Compilation and was changed to `//#`. Source map generators must only emit `//#`
359-
while source map consumers must accept both `//@` and `//#`.
360-
361-
Note: `//@` is needed for compatibility with some existing legacy source maps.
362-
363-
364-
This recommendation works well for JavaScript, but it is expected that other source files will
365-
have different conventions. For instance, for CSS `/*# sourceMappingURL=<url> */` is proposed.
366-
On the WebAssembly side, such a URL is encoded using [[WasmNamesBinaryFormat]], and it's placed as the content of the custom section ([[WasmCustomSection]]) named `sourceMappingURL`.
367-
368-
`<url>` is a URL as defined in [[URL]]; in particular,
360+
Source maps are linked through URLs as defined in [[URL]]; in particular,
369361
characters outside the set permitted to appear in URIs must be percent-encoded
370362
and it may be a data URI. Using a data URI along with [=sourcesContent=] allows
371363
for a completely self-contained source map.
372364

373-
<ins>The HTTP `SourceMap` header has precedence over a source annotation, and if both are present,
365+
<ins>The HTTP `sourcemap` header has precedence over a source annotation, and if both are present,
374366
the header URL should be used to resolve the source map file.</ins>
375367

376368
Regardless of the method used to retrieve the [=Source Mapping URL=] the same
@@ -394,6 +386,181 @@ When the [=Source Mapping URL=] is not absolute, then it is relative to the gene
394386
- If the generated code is being evaluated as a string with the `eval()` function or
395387
via `new Function()`, then the [=source origin=] will be the page's origin.
396388

389+
### Linking through HTTP headers
390+
391+
If a file is served through HTTP(S) with a `sourcemap` header, the value of the header is
392+
the URL of the linked source map.
393+
394+
```
395+
sourcemap: <url>
396+
```
397+
398+
Note: Previous revisions of this document recommended a header name of `x-sourcemap`. This
399+
is now deprecated; `sourcemap` is now expected.
400+
401+
### Linking through inline annotations
402+
403+
The generated code should include a comment, or the equivalent construct depending on its
404+
language or format, named `sourceMappingURL` and that contains the URL of the source map. This
405+
specification defines what the comment should look like for JavaScript, CSS, and WebAssembly.
406+
Other languages should follow a similar convention.
407+
408+
For a given language there can be multiple ways of detecting the `sourceMappingURL` comment,
409+
to allow for different implementations to choose what is less complex for them. The generated
410+
code <dfn>unambiguously links to a source map</dfn> if the result of all the extraction methods
411+
is the same.
412+
413+
If a tool consumes one or more source files that [=unambiguously links to a source map|unambiguously link to a source map=] and it
414+
produces an output file that links to a source map, it must do so [=unambiguously links to a
415+
source map|unambiguously=].
416+
417+
<div class="example">
418+
The following JavaScript code links to a source map, but it does not do so [=unambiguously links
419+
to a source map|unambiguously=]:
420+
421+
```js
422+
let a = `
423+
//# sourceMappingURL=foo.js.map
424+
//`;
425+
```
426+
427+
Extracting a Source Map URL from it [=extract a Source Map URL from JavaScript through
428+
parsing|through parsing=] gives null, while [=extract a Source Map URL from JavaScript
429+
without parsing|without parsing=] gives `foo.js.map`.
430+
431+
</div>
432+
433+
#### Extraction methods for JavaScript sources
434+
435+
To <dfn export>extract a Source Map URL from JavaScript through parsing</dfn> a [=string=] |source|,
436+
run the following steps:
437+
438+
1. Let |tokens| be the [=list=] of [=tokens=]
439+
obtained by parsing |source| according to [[ECMA-262]].
440+
1. [=For each=] |token| in |tokens|, in reverse order:
441+
1. If |token| is not a [=single-line comment=] or a [=multi-line comment=], return null.
442+
1. Let |comment| be the content of |token|.
443+
1. If [=match a Source Map URL in a comment|matching a Source Map URL in=]
444+
|comment| returns a [=string=], return it.
445+
446+
To <dfn export>extract a Source Map URL from JavaScript without parsing</dfn> a [=string=] |source|,
447+
run the following steps:
448+
449+
1. Let |lines| be the result of [=strictly split|strictly splitting=] |source| on [=line
450+
terminator code points|ECMAScript line terminator code points=].
451+
1. Let |lastURL| be null.
452+
1. [=For each=] |line| in |lines|:
453+
1. Let |position| be a [=position variable=] for |line|, initially pointing at the start of |line|.
454+
1. [=While=] |position| doesn't point past the end of |line|:
455+
1. [=Collect a sequence of code points=] that are [=white space code points|ECMAScript
456+
white space code points=] from |line| given |position|.
457+
458+
NOTE: The collected code points are not used, but |position| is still updated.
459+
1. If |position| points past the end of |line|, [=break=].
460+
1. Let |first| be the [=code point=] of |line| at |position|.
461+
1. Increment |position| by 1.
462+
1. If |first| is U+002F (/) and |position| does not point past the end of |line|, then:
463+
1. Let |second| be the [=code point=] of |line| at |position|.
464+
1. Increment |position| by 1.
465+
1. If |second| is U+002F (/), then:
466+
1. Let |comment| be the [=code point substring=] from |position| to the end of |line|.
467+
1. If [=match a Source Map URL in a comment|matching a Source Map URL in=]
468+
|comment| returns a [=string=], set |lastURL| to it.
469+
1. [=Break=].
470+
1. Else if |second| is U+002A (*), then:
471+
1. Let |comment| be the empty [=string=].
472+
1. While |position| + 1 doesn't point past the end of |line|:
473+
1. Let |c1| be the [=code point=] of |line| at |position|.
474+
1. Increment |position| by 1.
475+
1. Let |c2| be the [=code point=] of |line| at |position|.
476+
1. If |c1| is U+002A (*) and |c2| is U+002F (/), then:
477+
1. If [=match a Source Map URL in a comment|matching a Source Map URL in=]
478+
|comment| returns a [=string=], set |lastURL| to it.
479+
1. Increment |position| by 1.
480+
1. Append |c1| to |comment|.
481+
1. Else, set |lastURL| to null.
482+
1. Else, set |lastURL| to null.
483+
484+
Note: We reset |lastURL| to null whenever we find a non-comment code character.
485+
1. Return |lastURL|.
486+
487+
NOTE: The algorithm above has been designed so that the source lines can be iterated in reverse order,
488+
returning early after scanning through a line that contains a `sourceMappingURL` comment.
489+
490+
<div class="note">
491+
<span class="marker">Note:</span> The algorithm above is equivalent to the following JavaScript implementation:
492+
493+
```js
494+
const JS_NEWLINE = /^/m;
495+
496+
// This RegExp will always match one of the following:
497+
// - single-line comments
498+
// - "single-line" multi-line comments
499+
// - unclosed multi-line comments
500+
// - just trailing whitespaces
501+
// - a code character
502+
// The loop below differentiates between all these cases.
503+
const JS_COMMENT =
504+
/\s*(?:\/\/(?<single>.*)|\/\*(?<multi>.*?)\*\/|\/\*.*|$|(?<code>[^\/]+|\/))/uym;
505+
506+
const PATTERN = /^[@#]\s*sourceMappingURL=(\S*?)\s*$/;
507+
508+
let lastURL = null;
509+
for (const line of source.split(JS_NEWLINE)) {
510+
JS_COMMENT.lastIndex = 0;
511+
while (JS_COMMENT.lastIndex < line.length) {
512+
let commentMatch = JS_COMMENT.exec(line).groups;
513+
let comment = commentMatch.single ?? commentMatch.multi;
514+
if (comment != null) {
515+
let match = PATTERN.exec(comment);
516+
if (match !== null) lastURL = match[1];
517+
} else if (commentMatch.code != null) {
518+
lastURL = null;
519+
} else {
520+
// We found either trailing whitespaces or an unclosed comment.
521+
// Assert: JS_COMMENT.lastIndex === line.length
522+
}
523+
}
524+
}
525+
return lastURL;
526+
```
527+
528+
</div>
529+
530+
To <dfn>match a Source Map URL in a comment</dfn> |comment| (a [=string=]), run the following steps:
531+
532+
1. Let |pattern| be the regular expression `/^[@#]\s*sourceMappingURL=(\S*?)\s*$/`.
533+
1. Let |match| be ! [=RegExpBuiltInExec=](|pattern|, |comment|).
534+
1. If |match| is not null, return |match|[1].
535+
1. Return null.
536+
537+
538+
Note: The prefix for this annotation was initially `//@` however this conflicts with Internet
539+
Explorer's Conditional Compilation and was changed to `//#`.
540+
541+
Source map generators must only emit `//#` while source map consumers must accept both `//@` and `//#`.
542+
543+
#### Extraction methods for CSS sources
544+
545+
TODO: `/*# sourceMappingURL=<url> */`
546+
547+
#### Extraction methods for WebAssembly binaries
548+
549+
To <dfn export>extract a Source Map URL from a WebAssembly source</dfn> given
550+
a [=byte sequence=] |bytes|, run the following steps:
551+
552+
1. Let |module| be [=module_decode=](|bytes|).
553+
1. If |module| is error, return null.
554+
1. [=For each=] [=custom section=] |customSection| of |module|,
555+
1. Let |name| be the `name` of |customSection|, [=UTF-8 decode without BOM or fail|decoded as UTF-8=].
556+
1. If |name| is "sourceMappingURL", then:
557+
1. Let |value| be the `bytes` of |customSection|, [=UTF-8 decode without BOM or fail|decoded as UTF-8=].
558+
1. If |value| is failure, return null.
559+
1. Return |value|.
560+
561+
Since WebAssembly is not a textual format and it does not support comments, it supports a single unambiguous extraction method.
562+
The URL is encoded using [[WasmNamesBinaryFormat]], and it's placed as the content of the [=custom section=].
563+
397564
Linking eval'd code to named generated code
398565
-------------------------------------------
399566
