Normalize request path to NFC and remove/resolve dot segments (#273). #573

cesarblum · 2016-01-11T23:39:22Z

#273

cc @Tratcher @halter73 @blowdart

cesarblum · 2016-01-13T19:01:58Z

Ping.

halter73 · 2016-01-13T19:54:23Z

I'm curious to see how this affects perf for requests that don't need normalization (e.g. the plaintext benchmark).

cesarblum · 2016-01-13T19:56:32Z

test/Microsoft.AspNet.Server.Kestrel.FunctionalTests/RequestTests.cs

+
+        [ConditionalFact]
+        [FrameworkSkipCondition(RuntimeFrameworks.Mono, SkipReason = "Test hangs after execution on Mono.")]
+        public async Task RequestPathIsNormalized()


Note to self: add test with PathBase.

benaadams · 2016-01-13T19:58:27Z

It shouldn't affect it much, as needDecode would be false in that case and it should just skip it?

i.e. it only applies to %-encoded strings.

halter73 · 2016-01-13T20:02:20Z

There shouldn't be much of an effect as long as PathNormalizer.NeedsNormalization isn't too expensive. I would like to know for sure that's the case.

Tratcher · 2016-01-13T23:46:42Z

src/Microsoft.AspNet.Server.Kestrel/Http/Frame.cs

@@ -775,6 +775,11 @@ protected bool TakeStartLine(SocketInput input)
                    // URI was encoded, unescape and then parse as utf8
                    pathEnd = UrlPathDecoder.Unescape(pathBegin, pathEnd);
                    requestUrlPath = pathBegin.GetUtf8String(pathEnd);
+
+                    if (PathNormalizer.NeedsNormalization(requestUrlPath))


needDecode and NeedsNormalization are now out of sync because dots don't trigger needDecode. Add a functional test because I don't think this code is actually executing right now.

cesarblum · 2016-01-19T22:05:23Z

@benaadams Actually, I was wrong to apply the normalization only to paths with percent-encoded characters in the URL. Since normalization consists of normalization to NFC plus dot segment removal, I have to check whether normalization is needed in the plain-text path too (because there might be dot segments in the request path).
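For reference, the dot-segment half of the normalization corresponds to the "remove_dot_segments" procedure in RFC 3986, section 5.2.4. The Python sketch below follows that procedure step by step; it is an illustration only, not the C# code in this PR, and the function name `remove_dot_segments` is our own.

```python
def remove_dot_segments(path: str) -> str:
    """Resolve '.' and '..' segments following RFC 3986, section 5.2.4."""
    inp = path
    # Output buffer as a list of segments; each entry keeps its leading '/'
    # (if any), so "remove the last segment and its preceding '/'" is a pop().
    out = []
    while inp:
        if inp.startswith("../"):        # step A
            inp = inp[3:]
        elif inp.startswith("./"):       # step A
            inp = inp[2:]
        elif inp.startswith("/./"):      # step B
            inp = "/" + inp[3:]
        elif inp == "/.":                # step B ('.' is the final segment)
            inp = "/"
        elif inp.startswith("/../"):     # step C
            inp = "/" + inp[4:]
            if out:
                out.pop()
        elif inp == "/..":               # step C ('..' is the final segment)
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):         # step D
            inp = ""
        else:                            # step E: move first segment to output
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)
```

With this, `/a/b/c/./../../g` resolves to `/a/g` (one of the RFC's own examples), while a path such as `/plaintext` passes through unchanged.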

cesarblum · 2016-01-19T23:18:07Z

Comments addressed, perf test still pending.

cesarblum · 2016-01-20T00:11:08Z

There's a small perf hit in the plain text benchmark:

Before:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.67ms   12.42ms 187.86ms   94.60%
    Req/Sec    34.92k     3.16k   61.86k    77.44%
  11197736 requests in 10.10s, 1.38GB read
Requests/sec: 1108681.79
Transfer/sec:    139.57MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.96ms    8.65ms 127.63ms   91.75%
    Req/Sec    34.84k     3.09k   44.80k    72.59%
  11194786 requests in 10.10s, 1.38GB read
Requests/sec: 1108504.52
Transfer/sec:    139.54MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     9.66ms   22.11ms 378.68ms   89.15%
    Req/Sec    34.24k     3.55k   59.86k    73.48%
  10997567 requests in 10.10s, 1.35GB read
Requests/sec: 1088929.91
Transfer/sec:    137.08MB

After:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.54ms   22.43ms 334.46ms   93.61%
    Req/Sec    33.75k     3.49k   64.74k    78.30%
  10834047 requests in 10.10s, 1.33GB read
Requests/sec: 1072774.90
Transfer/sec:    135.05MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.06ms   15.59ms 285.80ms   92.87%
    Req/Sec    33.99k     2.79k   51.70k    74.58%
  10911305 requests in 10.10s, 1.34GB read
Requests/sec: 1080355.79
Transfer/sec:    136.00MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.93ms   11.70ms 254.75ms   92.00%
    Req/Sec    33.52k     3.10k   49.13k    74.06%
  10763402 requests in 10.10s, 1.32GB read
Requests/sec: 1065676.09
Transfer/sec:    134.15MB

The ratio between the worst "after" run and the worst "before" run is 0.9786 (roughly a 2% drop); against the best "before" run it is about 0.9612.
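The ratios can be recomputed from the Requests/sec lines above. A quick Python check (our own arithmetic, not part of the PR) shows that 0.9786 is the worst "after" run divided by the worst "before" run; the worst-to-best and mean-to-mean ratios are printed for comparison.

```python
# Requests/sec copied from the three "Before" and three "After" wrk runs.
before = [1108681.79, 1108504.52, 1088929.91]
after = [1072774.90, 1080355.79, 1065676.09]

worst_after_vs_worst_before = min(after) / min(before)
worst_after_vs_best_before = min(after) / max(before)
mean_ratio = (sum(after) / len(after)) / (sum(before) / len(before))

print(f"worst/worst: {worst_after_vs_worst_before:.4f}")  # worst/worst: 0.9786
print(f"worst/best:  {worst_after_vs_best_before:.4f}")   # worst/best:  0.9612
print(f"mean/mean:   {mean_ratio:.4f}")                   # mean/mean:   0.9736
```

With only three 10-second runs per side and run-to-run variation of about 2%, these numbers are indicative rather than precise.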

cesarblum · 2016-01-20T00:18:56Z

The hit is significant when decoding + normalization are needed:

Before:

Running 10s test @ http://10.0.0.100:5001/plaintext/%41%CC%8A
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.79ms    7.68ms 200.81ms   96.34%
    Req/Sec    29.86k     3.22k   51.04k    73.78%
  9585223 requests in 10.10s, 1.18GB read
Requests/sec: 949067.03
Transfer/sec:    119.47MB

Running 10s test @ http://10.0.0.100:5001/plaintext/%41%CC%8A
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.21ms    9.23ms 129.26ms   94.94%
    Req/Sec    30.78k     2.88k   46.97k    78.71%
  9882682 requests in 10.10s, 1.21GB read
Requests/sec: 978584.03
Transfer/sec:    123.19MB

Running 10s test @ http://10.0.0.100:5001/plaintext/%41%CC%8A
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.91ms   12.42ms 235.40ms   94.12%
    Req/Sec    31.24k     2.58k   49.29k    77.63%
  10036237 requests in 10.10s, 1.23GB read
Requests/sec: 993710.05
Transfer/sec:    125.09MB

After:

Running 10s test @ http://10.0.0.100:5001/plaintext/%41%CC%8A
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.75ms    3.77ms  94.38ms   94.26%
    Req/Sec    22.60k     1.53k   32.20k    79.50%
  7252043 requests in 10.10s, 0.89GB read
Requests/sec: 718004.23
Transfer/sec:     90.39MB

Running 10s test @ http://10.0.0.100:5001/plaintext/%41%CC%8A
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.56ms    3.10ms  58.15ms   91.99%
    Req/Sec    22.61k     1.54k   31.35k    77.02%
  7255969 requests in 10.10s, 0.89GB read
  Socket errors: connect 0, read 0, write 81, timeout 0
Requests/sec: 718417.96
Transfer/sec:     90.44MB

Running 10s test @ http://10.0.0.100:5001/plaintext/%41%CC%8A
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.51ms    2.89ms  54.24ms   88.69%
    Req/Sec    22.50k     1.49k   32.51k    78.47%
  7218541 requests in 10.10s, 0.89GB read
Requests/sec: 714747.15
Transfer/sec:     89.98MB
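For context, the benchmark path above exercises both steps: `%41%CC%8A` percent-decodes to "A" followed by U+030A COMBINING RING ABOVE, which is not in NFC and composes to U+00C5 "Å". The Python illustration below uses only the standard library (`urllib.parse.unquote`, `unicodedata`), not Kestrel's `UrlPathDecoder`.

```python
import unicodedata
from urllib.parse import unquote

raw_path = "/plaintext/%41%CC%8A"

# Percent-decode as UTF-8: %41 -> "A", %CC%8A -> U+030A COMBINING RING ABOVE.
decoded = unquote(raw_path, encoding="utf-8")
print(ascii(decoded))                             # '/plaintext/A\u030a'

# The decoded string is not in NFC, so normalization has real work to do.
print(unicodedata.is_normalized("NFC", decoded))  # False

# NFC composes "A" + U+030A into the single code point U+00C5.
normalized = unicodedata.normalize("NFC", decoded)
print(ascii(normalized))                          # '/plaintext/\xc5'
```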

cesarblum · 2016-01-20T00:21:44Z

@benaadams NFC normalization will only take place in paths with percent-encoded characters.

I'm making a small change that might make a difference - I'll only try to remove dot segments if I detect any. This will save cycles and allocations.
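A cheap pre-check of this kind might look like the Python sketch below. `contains_dot_segments` is a hypothetical name that mirrors the `ContainsDotSegments` helper mentioned later in this thread; it is not the PR's C# code. It reports whether any `/`-delimited segment is exactly `.` or `..`, so paths like `/plaintext` or `/file.txt` can skip dot-segment removal (and its allocations) entirely.

```python
def contains_dot_segments(path: str) -> bool:
    """Return True if any '/'-delimited segment is exactly '.' or '..'."""
    n = len(path)
    for i, ch in enumerate(path):
        # A dot segment can only start at the beginning or right after '/'.
        if ch != "." or (i > 0 and path[i - 1] != "/"):
            continue
        j = i + 1
        if j < n and path[j] == ".":
            j += 1  # second dot of a possible '..'
        # The segment must end here: at '/' or at the end of the path.
        if j == n or path[j] == "/":
            return True
    return False
```

The single forward scan avoids splitting the path into substrings; `any(s in (".", "..") for s in path.split("/"))` gives the same answer but allocates.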

benaadams · 2016-01-20T00:22:25Z

src/Microsoft.AspNet.Server.Kestrel/Http/PathNormalizer.cs

+        {
+            if (path.IndexOf('/') > -1)
+            {
+                var normalizedChars = new char[path.Length];


System.Buffers?

Or jump in before the path string is created and work on the byte buffer?

I'm not familiar with it. Any pointers/examples?

halter73 · 2016-01-20T00:22:26Z

Do you have any data for normalizing ../ or ./ in paths?

cesarblum · 2016-01-20T00:25:07Z

@halter73 Gathering some.

Tratcher · 2016-01-20T00:25:27Z

src/Microsoft.AspNet.Server.Kestrel/Http/PathNormalizer.cs

+                path = new string(normalizedChars, normalizedIndex, normalizedChars.Length - normalizedIndex);
+            }
+
+            if (!path.IsNormalized(NormalizationForm.FormC))


I don't see any reason dot compression and unicode normalization have to be run at the same time. If you separate them then you can limit the unicode normalization to only run in the needDecode scenario.

You're right, will change.
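Separated this way, the Unicode stage can be sketched on its own. This Python sketch assumes (our assumption, not stated in the PR) that a path with no `%` is pure ASCII, and an ASCII-only string is always already in NFC, so the stage returns early exactly when needDecode would be false. `unicode_stage` is a hypothetical name.

```python
import unicodedata
from urllib.parse import unquote


def unicode_stage(raw_path: str) -> str:
    """Percent-decode and NFC-normalize, but only when the path was encoded.

    Assumes a path without '%' is ASCII-only and therefore already NFC.
    """
    if "%" not in raw_path:  # analogous to needDecode == false
        return raw_path
    decoded = unquote(raw_path, encoding="utf-8")
    # Cheap check first; only allocate a new string when composition is needed.
    if unicodedata.is_normalized("NFC", decoded):
        return decoded
    return unicodedata.normalize("NFC", decoded)
```

The dot-segment stage then runs independently on every path. Because `%2E` decodes to `.`, that stage has to run after this one (or treat `%2E` as a dot itself).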

cesarblum · 2016-01-20T00:38:21Z

Got some perf back by avoiding dot segment removal when not necessary:

Running 10s test @ http://10.0.0.100:5001/plaintext/%41%CC%8A
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.91ms    5.80ms 173.92ms   95.48%
    Req/Sec    24.26k     1.88k   37.44k    75.29%
  7790526 requests in 10.10s, 0.96GB read
Requests/sec: 771378.77
Transfer/sec:     97.11MB

Running 10s test @ http://10.0.0.100:5001/plaintext/%41%CC%8A
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.52ms    4.20ms 102.13ms   94.37%
    Req/Sec    24.24k     1.69k   36.02k    77.64%
  7780465 requests in 10.10s, 0.96GB read
Requests/sec: 770358.86
Transfer/sec:     96.98MB

Running 10s test @ http://10.0.0.100:5001/plaintext/%41%CC%8A
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.63ms    4.17ms 113.08ms   93.35%
    Req/Sec    24.48k     1.76k   36.00k    74.83%
  7859579 requests in 10.10s, 0.97GB read
Requests/sec: 778191.01
Transfer/sec:     97.96MB

@Tratcher suggested a better change which I'll implement now.

cesarblum · 2016-01-20T00:55:48Z

New plain text numbers. A veeery small improvement with suggested changes:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.83ms   16.09ms 271.85ms   93.81%
    Req/Sec    34.11k     2.59k   48.42k    75.19%
  10949164 requests in 10.10s, 1.35GB read
Requests/sec: 1084253.42
Transfer/sec:    136.49MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.68ms   10.46ms 209.49ms   93.48%
    Req/Sec    34.01k     2.61k   47.20k    76.08%
  10923702 requests in 10.10s, 1.34GB read
Requests/sec: 1081589.99
Transfer/sec:    136.16MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.48ms   17.26ms 297.70ms   93.62%
    Req/Sec    33.99k     3.81k   60.87k    85.81%
  10857619 requests in 10.10s, 1.33GB read
Requests/sec: 1075094.28
Transfer/sec:    135.34MB

halter73 · 2016-01-20T01:08:56Z

src/Microsoft.AspNet.Server.Kestrel/Http/PathNormalizer.cs

+            return path;
+        }
+
+        private static bool ContainsDotSegments(string path)


Can we use MemoryPool2Iterator2.Seek to find any '/' or '.' characters? I think we need to ask @troydai if this could work and if any UTF-8 characters could be normalized to a '/' or a '.'.

Check back on find to see if the previous byte has the high bit set? (e.g. >= 128)

@benaadams I don't follow. How would that work?

@CesarBS actually you don't need to back check; you can remove dots before, after or during UTF-8 decoding.
@halter73 for URL path encoding, %2E is a valid '.'; however, in UTF-8 bytes only '.' is '.' and only '/' is '/'.

Good table at start of https://en.wikipedia.org/wiki/UTF-8#Description

Backward compatibility: One-byte codes are used only for the ASCII values 0 through 127. In this case the UTF-8 code has the same value as the ASCII code. The high-order bit of these codes is always 0. This means that ASCII text is valid UTF-8, and UTF-8 can be used for parsers expecting 8-bit extended ASCII even if they are not designed for UTF-8.

Clear distinction between multi-byte and single-byte characters: Code points larger than 127 are represented by multi-byte sequences, composed of a leading byte and one or more continuation bytes. The leading byte has two or more high-order 1s followed by a 0, while continuation bytes all have '10' in the high-order position.

And later as part of advantages

UTF-8 uses the codes 0–127 only for the ASCII characters. This means that UTF-8 is an ASCII extension and can be processed by software that supports 7-bit characters and assigns no meaning to non-ASCII bytes.
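This property is what makes a byte-level scan safe before the path string is created: bytes 0x2E ('.') and 0x2F ('/') never appear inside a multi-byte UTF-8 sequence. A Python sketch follows; `ascii_byte_positions` is a hypothetical helper for illustration, not `MemoryPool2Iterator2.Seek`.

```python
from typing import List


def ascii_byte_positions(data: bytes, targets: bytes = b"./") -> List[int]:
    """Return indexes of '.' and '/' bytes in raw UTF-8 data.

    No decoding is needed: every byte of a multi-byte UTF-8 sequence is
    >= 0x80, so 0x2E is always '.' and 0x2F is always '/'.
    """
    return [i for i, b in enumerate(data) if b in targets]


raw = "/caf\u00e9/./x".encode("utf-8")  # 'é' encodes as 0xC3 0xA9
print(ascii_byte_positions(raw))        # [0, 6, 7, 8]
```

As noted above, a percent-encoded `%2E` is still three ASCII bytes at this stage, so a raw-byte scan would need to treat it as a dot separately.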

halter73 · 2016-01-20T19:49:36Z

We're hoping to move the path/dot-segment normalization logic to https://github.com/aspnet/FileSystem

cesarblum · 2016-01-20T22:06:12Z

Changing ContainsDotSegments to use a pointer instead of an index improved things:

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.33ms   14.74ms 263.96ms   93.40%
    Req/Sec    35.23k     3.33k   61.66k    79.25%
  11212229 requests in 10.10s, 1.38GB read
  Socket errors: connect 0, read 0, write 542, timeout 0
Requests/sec: 1110137.07
Transfer/sec:    139.75MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.64ms   16.03ms 243.53ms   93.68%
    Req/Sec    35.10k     3.50k   63.18k    84.33%
  11226167 requests in 10.10s, 1.38GB read
Requests/sec: 1111526.44
Transfer/sec:    139.92MB

Running 10s test @ http://10.0.0.100:5001/plaintext
  32 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.57ms   12.28ms 236.93ms   95.26%
    Req/Sec    34.36k     2.81k   52.99k    71.87%
  11025641 requests in 10.10s, 1.36GB read
Requests/sec: 1091646.87
Transfer/sec:    137.42MB

blowdart · 2016-01-20T22:11:13Z

Path and dot-segment normalization isn't just about file systems, though, so it needs to happen higher up.

Also we need to have an unnormalized property on the request for people who want to do really weird things.

Tratcher · 2016-01-20T22:19:12Z

@blowdart can we design the GetRawUrl feature later? We'd need to decide how it flows through the entire stack.

blowdart · 2016-01-20T22:20:28Z

Sure, it can wait till after RC2

Tratcher · 2016-01-20T22:21:32Z

Ok, @CesarBS file a separate bug for that

cesarblum · 2016-01-21T19:50:46Z

Filed #594.

halter73 · 2016-01-21T20:27:02Z

src/Microsoft.AspNet.Server.Kestrel/Http/Frame.FeatureCollection.cs

@@ -325,5 +325,8 @@ void IHttpRequestLifetimeFeature.Abort()
        {
            Abort();
        }
+
+        // TODO: remove before merging.


Don't forget!

halter73 · 2016-01-21T20:37:19Z

dnfclas added the cla-already-signed label Jan 11, 2016

cesarblum reviewed Jan 13, 2016

Tratcher reviewed Jan 13, 2016

cesarblum force-pushed the cesarbs/normalize-request-path branch from a2d48ce to de913c4 January 19, 2016 22:40

benaadams reviewed Jan 20, 2016

Tratcher reviewed Jan 20, 2016

halter73 reviewed Jan 20, 2016

cesarblum mentioned this pull request Jan 21, 2016

Expose raw request path/URL in HTTP context #594

Closed

halter73 reviewed Jan 21, 2016

cesarblum force-pushed the cesarbs/normalize-request-path branch 2 times, most recently from ae6a3d0 to 747fdca January 26, 2016 21:40

cesarblum force-pushed the cesarbs/normalize-request-path branch 2 times, most recently from 0e2b9ec to abc10a0 January 28, 2016 22:24

Normalize request path to NFC and resolve dot segments (#273).

1209eca

cesarblum force-pushed the cesarbs/normalize-request-path branch from abc10a0 to 1209eca January 28, 2016 23:30

cesarblum merged commit 1209eca into dev Jan 29, 2016

cesarblum deleted the cesarbs/normalize-request-path branch January 29, 2016 17:03
