util: expose toUSVString #39814
@@ -17,6 +17,7 @@ const {
  Promise,
  ReflectApply,
  ReflectConstruct,
  RegExpPrototypeExec,
  RegExpPrototypeTest,
  SafeMap,
  SafeSet,

@@ -27,6 +28,10 @@ const {
  SymbolFor,
} = primordials;

const {
  toUSVString: _toUSVString,
} = internalBinding('url');

const {
  hideStackFrames,
  codes: {

@@ -53,6 +58,18 @@ const experimentalWarnings = new SafeSet();

const colorRegExp = /\u001b\[\d\d?m/g; // eslint-disable-line no-control-regex

const unpairedSurrogateRe =
  /(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])/;
Patch: #39891

I tried both this and a negative lookbehind, but they are slower:

import Benchmark from 'benchmark';
const re1 =
  /(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])/;
const re2 =
  /(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])/;
const re3 = /\p{Surrogate}/u;
const str = 'foo bar baz qux\uD800';
const suite = new Benchmark.Suite();
suite.add('re1', function () {
  re1.exec(str);
});
suite.add('re2', function () {
  re2.exec(str);
});
suite.add('re3', function () {
  re3.exec(str);
});
suite.on('cycle', function (event) {
  console.log(String(event.target));
});
suite.on('complete', function () {
  console.log('Fastest is ' + this.filter('fastest').map('name'));
});
suite.run({ async: true });
It’s currently slower for the one specific scenario you’re testing, yes. Change the test string to something like:

const str = 'foo bar baz qux\uD800 foo \uDBFF foo \uDC00 foo \uDFFF';

…and now

Change it to:

const str = '\uD800_\uDC00 foo \uD800 foo \uDBFF foo \uDC00 foo \uDFFF';

…and now

It’s possible to construct a benchmark that shows any of the three options as the “fastest”, but I don’t think that’s very meaningful.
by a tiny margin, and I think that's because the surrogate is at the beginning of the string.

IMHO, we should optimize for the case where there are no surrogates. We could even accept false positives if that helps performance, since this is just an optimization to avoid calling into native code.
A benchmark where
I agree, no one will notice this in a real-world application, but switching to a slower option for a minor readability improvement seems a bit silly to me. The original regex is not that complex. Also, FWIW, I think
@ronag I tend to agree; that’s probably the common case. All three solutions perform similarly in this case:

const str = 'a'.repeat(1024 * 1024);
Do we all agree that all other things equal,
@lpinca So you’d reject a readability improvement unless it also happens to double performance? Whoa, that’s a high bar :)

Yes, a 2x performance drop does not justify a minor readability improvement, in my opinion. Anyway, I'm not blocking #39891. I just wanted to report this because I tried the negative lookbehind regex (

FWIW, I would be happy with this readability improvement if performance were more or less the same for all inputs. That seems to be the case only for some inputs. Others show a 2x slowdown, and to accept that I would like to see a 2x improvement in other cases. I don't need it to always be 2x faster: if performance is within ±10% for all inputs, that's great.

function toUSVString(val) {
  const str = `${val}`;
  // As of V8 5.5, `str.search()` (and `unpairedSurrogateRe[@@search]()`) are
  // slower than `unpairedSurrogateRe.exec()`.
  const match = RegExpPrototypeExec(unpairedSurrogateRe, str);
  if (!match)
    return str;
  return _toUSVString(str, match.index);
}

let uvBinding;

function lazyUv() {

@@ -487,6 +504,7 @@ module.exports = {
  sleep,
  spliceOne,
  structuredClone,
  toUSVString,
  removeColors,

  // Symbol used to customize promisify conversion
@@ -148,6 +148,8 @@ assert.strictEqual(util.isFunction(function() {}), true);
assert.strictEqual(util.isFunction(), false);
assert.strictEqual(util.isFunction('string'), false);

assert.strictEqual(util.toUSVString('string\ud801'), 'string\ufffd');
Ideally we’d test lone lead surrogates, lone trail surrogates, and also surrogate pairs (to ensure those are unaffected). Patch: #39891

{
  assert.strictEqual(util.types.isNativeError(new Error()), true);
  assert.strictEqual(util.types.isNativeError(new TypeError()), true);