Reduce memory usage when rendering markdown by lunny · Pull Request #20326 · go-gitea/gitea

lunny · 2022-07-12T02:59:39Z

This PR removes at least two whole copies of the markdown content from memory when rendering a document.

delvh

Took me (surprisingly) way too long to understand where we reduce the memory usage until I noticed it...

lunny · 2022-07-12T09:11:17Z

> Took me (surprisingly) way too long to understand where we reduce the memory usage until I noticed it...

Not only that: I also removed rawHTML, err := io.ReadAll(input), which also copied the whole content. So previously there were at least two copies of the content in memory, and now they are gone.


delvh · 2022-07-12T09:20:40Z

rawHTML, err := io.ReadAll(input) - yeah, that was the thing that took me so long.
But what is the other reduction I missed?
The replaceAll(buffer) vs replaceAll(input)?

lunny · 2022-07-12T09:25:27Z

> rawHTML, err := io.ReadAll(input) - yeah, that was the thing that took me so long. But what is the other reduction I missed?

Converting the whole content at once and converting it piece by piece use very different amounts of memory. When an io.Reader is used, only about 1 KB or 2 KB is held at a time while the whole content is read.

> The replaceAll(buffer) vs replaceAll(input)?

Yes

wxiaoguang · 2022-07-12T09:31:10Z

-		return err
+		return n, err
 	}
+	n = copy(bs, tagCleaner.ReplaceAll([]byte(nulCleaner.Replace(string(original[:n]))), []byte("&lt;$1")))


I do not think it is correct.

The original buffer holds only part of the content, so tagCleaner may not work correctly on it. tagCleaner is also quite a complex regexp, and I am not sure whether it works on incomplete content.
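The boundary problem is easy to reproduce with a deliberately simple pattern. The regexp <(/?html\b) below is a hypothetical stand-in, not Gitea's actual tagCleaner, and replaceInChunks is an illustrative helper that applies the pattern to each chunk separately, as a per-Read replacement would. When "<html" is split across two chunks, neither chunk contains a full match, so the tag passes through unescaped.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical stand-in for a tag-escaping regexp; not Gitea's tagCleaner.
var tagPattern = regexp.MustCompile(`<(/?html\b)`)

// replaceInChunks applies the regexp to each fixed-size chunk independently,
// the way a per-Read replacement would see the input.
func replaceInChunks(s string, size int) string {
	var sb strings.Builder
	for len(s) > 0 {
		n := size
		if n > len(s) {
			n = len(s)
		}
		sb.WriteString(tagPattern.ReplaceAllString(s[:n], "&lt;$1"))
		s = s[n:]
	}
	return sb.String()
}

func main() {
	input := "x<html>y"
	whole := tagPattern.ReplaceAllString(input, "&lt;$1")
	chunked := replaceInChunks(input, 4) // chunks: "x<ht", "ml>y"
	fmt.Println(whole)   // prints "x&lt;html>y"
	fmt.Println(chunked) // prints "x<html>y"
}
```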

One more thing: after the replacement, the result may be longer than the original, so it may not fit into the bs buffer.

Gusted · 2022-07-12T09:39:58Z

The memory reduction only holds for larger inputs.

1024 bytes (this PR uses more memory):

-> % benchstat old.bench new.bench 
name            old time/op    new time/op    delta
PostProcess-12     199µs ± 2%     206µs ± 3%   +3.56%  (p=0.008 n=5+5)

name            old alloc/op   new alloc/op   delta
PostProcess-12    18.7kB ± 0%    22.1kB ± 0%  +17.72%  (p=0.008 n=5+5)

name            old allocs/op  new allocs/op  delta
PostProcess-12      27.0 ± 0%      29.0 ± 0%   +7.41%  (p=0.008 n=5+5)

1024 * 8 bytes (this PR uses less memory):

-> % benchstat old.bench new.bench
name            old time/op    new time/op    delta
PostProcess-12    1.76ms ± 4%    1.72ms ± 3%     ~     (p=0.310 n=5+5)

name            old alloc/op   new alloc/op   delta
PostProcess-12     119kB ± 0%      92kB ± 0%  -22.57%  (p=0.008 n=5+5)

name            old allocs/op  new allocs/op  delta
PostProcess-12      35.0 ± 0%      36.0 ± 0%   +2.86%  (p=0.008 n=5+5)

And for reference, 1024 * 128 bytes:

-> % benchstat old.bench new.bench
name            old time/op    new time/op    delta
PostProcess-12    27.1ms ± 6%    26.5ms ± 2%     ~     (p=0.421 n=5+5)

name            old alloc/op   new alloc/op   delta
PostProcess-12    2.01MB ± 0%    1.45MB ± 0%  -27.94%  (p=0.016 n=4+5)

name            old allocs/op  new allocs/op  delta
PostProcess-12      52.0 ± 2%      62.2 ± 2%  +19.62%  (p=0.008 n=5+5)

Bench code:

diff --git a/bench/bench_test.go b/bench/bench_test.go
new file mode 100644
index 000000000..33d9279cc
--- /dev/null
+++ b/bench/bench_test.go
@@ -0,0 +1,17 @@
+package bench
+
+import (
+       "io"
+       "strings"
+       "testing"
+
+       "code.gitea.io/gitea/modules/markup"
+)
+
+func BenchmarkPostProcess(b *testing.B) {
+       input := strings.Repeat("a", 1024)
+       b.ReportAllocs()
+       for i := 0; i < b.N; i++ {
+               markup.PostProcess(&markup.RenderContext{}, strings.NewReader(input), io.Discard)
+       }
+}

6543 · 2022-07-13T02:33:00Z

☝️ It would be nice to also add this benchmark to this pull request.

lunny · 2022-12-11T13:36:06Z

Closed, as it's not always correct.


Reduce memory usage when rendering markdown

b9194bf

lunny added the performance/memory Performance issues affecting memory use label Jul 12, 2022

lunny added 2 commits July 12, 2022 13:37

Reduce memory usage when rendering markdown

e46f754

Merge branch 'main' into lunny/performance_renderer

ea3b13e

lunny added this to the 1.18.0 milestone Jul 12, 2022

lunny requested a review from zeripath July 12, 2022 05:43

Fix bug

e2f2300

delvh approved these changes Jul 12, 2022


Comment thread modules/markup/html.go Outdated

Comment thread modules/markup/html.go Outdated

GiteaBot added the lgtm/need 1 This PR needs approval from one additional maintainer to be merged. label Jul 12, 2022

Apply suggestions from code review

4c2c466

Co-authored-by: delvh <dev.lh@web.de>

wxiaoguang reviewed Jul 12, 2022


Merge branch 'main' into lunny/performance_renderer

d121484

6543 self-requested a review July 12, 2022 13:05

lunny added the pr/wip This PR is not ready for review label Jul 12, 2022

Merge branch 'main' into lunny/performance_renderer

424b7f8

6543 added 2 commits July 16, 2022 16:05

Merge branch 'master' into lunny/performance_renderer

d2bd8c2

add BenchmarkPostProcess

d6ee57a

lunny modified the milestones: 1.18.0, 1.19.0 Oct 17, 2022

lunny mentioned this pull request Dec 11, 2022

Use multi reader instead to concat strings #22099

Merged

lunny closed this Dec 11, 2022

lunny deleted the lunny/performance_renderer branch December 11, 2022 13:36

lunny removed this from the 1.19.0 milestone Dec 11, 2022

lunny added a commit that referenced this pull request Dec 12, 2022

Use multi reader instead to concat strings (#22099)

3e8285b

extract from #20326

go-gitea locked and limited conversation to collaborators May 3, 2023

lunny wants to merge 9 commits into go-gitea:main from lunny:lunny/performance_renderer

lunny commented Jul 12, 2022 •

edited

Loading

Uh oh!

delvh left a comment

Uh oh!

Uh oh!

Uh oh!

lunny commented Jul 12, 2022

Uh oh!

delvh commented Jul 12, 2022 •

edited

Loading

Uh oh!

lunny commented Jul 12, 2022 •

edited

Loading

Uh oh!

wxiaoguang Jul 12, 2022

Uh oh!

wxiaoguang Jul 12, 2022

Uh oh!

Gusted commented Jul 12, 2022

Uh oh!

6543 commented Jul 13, 2022

Uh oh!

lunny commented Dec 11, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Uh oh!

Conversation

lunny commented Jul 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

delvh left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

lunny commented Jul 12, 2022

Uh oh!

delvh commented Jul 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lunny commented Jul 12, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wxiaoguang Jul 12, 2022

Choose a reason for hiding this comment

Uh oh!

wxiaoguang Jul 12, 2022

Choose a reason for hiding this comment

Uh oh!

Gusted commented Jul 12, 2022

Uh oh!

6543 commented Jul 13, 2022

Uh oh!

lunny commented Dec 11, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

lunny commented Jul 12, 2022 •

edited

Loading

delvh commented Jul 12, 2022 •

edited

Loading

lunny commented Jul 12, 2022 •

edited

Loading