Add max_upstream_conn parameter for each proxy_cache project #22348

Merged
stonezdj merged 1 commit into goharbor:main from stonezdj:25aug26_rate_limit_proxycache_upstream on Sep 30, 2025

Conversation

@stonezdj (Contributor) commented Sep 12, 2025

Limit the proxy connection to the upstream registry

fixes #22184

Thank you for contributing to Harbor!

Comprehensive Summary of your change

Issue being fixed

Fixes #(issue)

Please indicate you've done the following:

  • Well Written Title and Summary of the PR
  • Label the PR as needed. "release-note/ignore-for-release, release-note/new-feature, release-note/update, release-note/enhancement, release-note/community, release-note/breaking-change, release-note/docs, release-note/infra, release-note/deprecation"
  • Accepted the DCO. Commits without the DCO will delay acceptance.
  • Made sure tests are passing and test coverage is added if needed.
  • Considered the docs impact and opened a new docs issue or PR with docs changes if needed in website repository.
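
For reviewers skimming the change, here is a minimal sketch of the idea, assuming the max_upstream_conn value arrives as a project-metadata string and that a buffered channel serves as a counting semaphore. The names (NewLimiter, Acquire, ErrTooManyConnections) and the parsing are illustrative assumptions, not necessarily the code in src/pkg/proxy/connection/limit.go:

```go
package connection

import (
	"errors"
	"strconv"
)

// ErrTooManyConnections tells the caller the per-project cap on concurrent
// upstream connections is reached; the proxy middleware maps it to HTTP 429.
var ErrTooManyConnections = errors.New("too many connections to the upstream registry")

// limiter caps concurrent upstream connections with a buffered channel
// used as a counting semaphore.
type limiter struct {
	slots chan struct{}
}

// NewLimiter builds a limiter from the project's max_upstream_conn metadata
// value; a missing or non-positive value means "unlimited".
func NewLimiter(metaValue string) *limiter {
	n, err := strconv.Atoi(metaValue)
	if err != nil || n <= 0 {
		return &limiter{} // nil slots channel => no limit applied
	}
	return &limiter{slots: make(chan struct{}, n)}
}

// Acquire reserves a slot without blocking; it fails fast when the project
// is already at its cap instead of queuing the request.
func (l *limiter) Acquire() error {
	if l.slots == nil {
		return nil
	}
	select {
	case l.slots <- struct{}{}:
		return nil
	default:
		return ErrTooManyConnections
	}
}

// Release frees a previously acquired slot.
func (l *limiter) Release() {
	if l.slots != nil {
		<-l.slots
	}
}
```

The important design choice is the non-blocking select: a request that cannot get a slot is rejected immediately rather than parked on an open connection.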

codecov bot commented Sep 12, 2025

Codecov Report

❌ Patch coverage is 17.64706% with 56 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.87%. Comparing base (c8c11b4) to head (e430ebc).
⚠️ Report is 573 commits behind head on main.

Files with missing lines | Patch % | Lines
src/server/middleware/repoproxy/proxy.go | 0.00% | 32 Missing ⚠️
src/pkg/project/models/project.go | 0.00% | 11 Missing ⚠️
src/pkg/proxy/connection/limit.go | 46.15% | 5 Missing and 2 partials ⚠️
src/server/v2.0/handler/project.go | 0.00% | 6 Missing ⚠️
Additional details and impacted files


@@             Coverage Diff             @@
##             main   #22348       +/-   ##
===========================================
+ Coverage   45.36%   65.87%   +20.50%     
===========================================
  Files         244     1073      +829     
  Lines       13333   116018   +102685     
  Branches     2719     2927      +208     
===========================================
+ Hits         6049    76427    +70378     
- Misses       6983    35354    +28371     
- Partials      301     4237     +3936     
Flag | Coverage Δ
unittests | 65.87% <17.64%> (+20.50%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
src/server/v2.0/handler/project_metadata.go | 14.28% <100.00%> (ø)
src/server/v2.0/handler/project.go | 4.87% <0.00%> (ø)
src/pkg/proxy/connection/limit.go | 46.15% <46.15%> (ø)
src/pkg/project/models/project.go | 37.25% <0.00%> (ø)
src/server/middleware/repoproxy/proxy.go | 3.71% <0.00%> (ø)

... and 982 files with indirect coverage changes


@stonezdj force-pushed the 25aug26_rate_limit_proxycache_upstream branch 2 times, most recently from 59dd2bf to 38e3056 on September 12, 2025 08:54
@stonezdj added the release-note/enhancement label on Sep 12, 2025
Comment thread api/v2.0/swagger.yaml Outdated
@stonezdj force-pushed the 25aug26_rate_limit_proxycache_upstream branch 2 times, most recently from 9077c61 to 8879689 on September 15, 2025 02:52
@Vad1mo (Member) commented Sep 17, 2025

@Strainy does this replace your PR #22185?

@Strainy commented Sep 17, 2025

> @Strainy does this replace your PR #22185?

I've had a quick scan through, and it seems like the client will experience 429s when the maximum connection limit is reached? This is different from my change (but perhaps complementary?): in my change, requests for the same manifests/blobs queue behind a single request and return the result from the cache when it completes (i.e., no request failures when there is contention for the same resources).

@stonezdj (Contributor, Author) commented Sep 18, 2025

> @Strainy does this replace your PR #22185?

> I've had a quick scan through, and it seems like the client will experience 429s when the maximum connection limit is reached? This is different from my change (but perhaps complementary?): in my change, requests for the same manifests/blobs queue behind a single request and return the result from the cache when it completes (i.e., no request failures when there is contention for the same resources).

Queuing all subsequent requests would consume all TCP connections on the Harbor server: if there are 500 requests, only 1 can proceed and the other 499 will wait; if the first takes 30 minutes, the others wait 30 minutes as well. Meanwhile, other connections might fail, because TCP connections are a limited resource. Returning 429 is better than blocking the request.

@Strainy commented Sep 18, 2025

> @Strainy does this replace your PR #22185?

> I've had a quick scan through, and it seems like the client will experience 429s when the maximum connection limit is reached? This is different from my change (but perhaps complementary?): in my change, requests for the same manifests/blobs queue behind a single request and return the result from the cache when it completes (i.e., no request failures when there is contention for the same resources).

> Queuing all subsequent requests would consume all TCP connections on the Harbor server: if there are 500 requests, only 1 can proceed and the other 499 will wait; if the first takes 30 minutes, the others wait 30 minutes as well. Meanwhile, other connections might fail, because TCP connections are a limited resource. Returning 429 is better than blocking the request.

I am in full agreement that returning a 429 is better than blocking the request, which is why I feel this change is complementary to mine. This change doesn't address the upstream request deduplication issue that I was originally trying to solve on my branch.

@Strainy commented Sep 19, 2025

To add a bit more context: I have a use case where we are very sensitive to boot times, particularly for new images that need to be pulled from upstream. We also have a very large number of pods often concurrently hitting the registry for the same artifact.

So I would just like to make sure we're efficient about retrieving resources from the upstream. That goes further than just rate limiting, imo: we should aim to de-duplicate concurrent requests where possible, which is the motivation for my change.

This approach has been working well for us. We're using our fork of Harbor as a DC-local cache for images in Google Artifact Registry. We routinely handle spikes of >1K concurrent pulls with this approach.

But I think if we were to just use this rate limiting approach, we'd have quite a lot of pods just spinning in CrashLoopBackOff... and that'd be a bad time for us.
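
For readers unfamiliar with the de-duplication Strainy describes, the sketch below shows the general coalescing pattern using golang.org/x/sync/singleflight: concurrent requests for the same digest share a single upstream fetch. This is only an illustration of the concept, with a made-up key and fetch function; it is not taken from PR #22185.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// fetchFromUpstream stands in for one slow pull of a blob from the upstream
// registry; with coalescing it runs once per key, however many clients ask.
func fetchFromUpstream(digest string) (any, error) {
	time.Sleep(100 * time.Millisecond)
	return "blob bytes for " + digest, nil
}

func main() {
	var wg sync.WaitGroup
	var sharedCount atomic.Int64

	// 50 concurrent "clients" request the same blob; singleflight collapses
	// them into a single upstream fetch and hands the result to everyone.
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_, _, shared := group.Do("sha256:abc", func() (any, error) {
				return fetchFromUpstream("sha256:abc")
			})
			if shared {
				sharedCount.Add(1)
			}
		}()
	}
	wg.Wait()
	fmt.Printf("%d of 50 requests reused another request's upstream fetch\n", sharedCount.Load())
}
```

Coalescing and a connection cap can compose: the cap bounds how many distinct upstream fetches run at once, while coalescing keeps identical pulls from consuming the cap more than once, which is one way the two PRs could be complementary.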

Comment thread src/pkg/proxy/connection/limit.go Outdated
@stonezdj (Contributor, Author) commented:

> To add a bit more context: I have a use case where we are very sensitive to boot times, particularly for new images that need to be pulled from upstream. We also have a very large number of pods often concurrently hitting the registry for the same artifact.

> So I would just like to make sure we're efficient about retrieving resources from the upstream. That goes further than just rate limiting, imo: we should aim to de-duplicate concurrent requests where possible, which is the motivation for my change.

> This approach has been working well for us. We're using our fork of Harbor as a DC-local cache for images in Google Artifact Registry. We routinely handle spikes of >1K concurrent pulls with this approach.

> But I think if we were to just use this rate limiting approach, we'd have quite a lot of pods just spinning in CrashLoopBackOff... and that'd be a bad time for us.

The event is ImagePullBackOff when the image pull fails, and the kubelet will retry the pull with a backoff policy.
I prefer to return 429 immediately rather than hang the connection on the server side. If 500 requests come in to pull the same image at the same time, only 1 is served, but it may take longer than expected for the cache to become ready; during this interval the other 499 connections are left hanging. Because the maximum number of connections to a server is a limited resource, it is possible that other clients cannot connect to Harbor and get any response at all. That is not a good user experience.
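
To make the 429-vs-hang trade-off concrete, here is a rough sketch of how a repoproxy-style middleware could answer immediately when the limiter is exhausted, so clients back off and retry instead of holding TCP connections open for the whole pull. The Limiter interface and LimitUpstream wrapper are hypothetical names for illustration, not the PR's actual code:

```go
package repoproxy

import "net/http"

// Limiter is the minimal contract the middleware needs: a fail-fast
// reservation that never blocks the request goroutine.
type Limiter interface {
	Acquire() error
	Release()
}

// LimitUpstream wraps a proxy handler so that, once the project's
// max_upstream_conn cap is hit, additional pulls receive an immediate 429
// (which pullers retry with backoff, as noted above) instead of hanging on
// an open connection for the duration of the first pull.
func LimitUpstream(l Limiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := l.Acquire(); err != nil {
			http.Error(w, "too many connections to the upstream registry, retry later",
				http.StatusTooManyRequests)
			return
		}
		defer l.Release()
		next.ServeHTTP(w, r)
	})
}
```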

Comment thread src/pkg/project/models/pro_meta.go
Comment thread src/server/middleware/repoproxy/proxy.go Outdated
Comment thread src/server/middleware/repoproxy/proxy.go
Comment thread src/server/middleware/repoproxy/proxy.go
Comment thread src/pkg/proxy/connection/limit.go
Comment thread src/pkg/proxy/connection/limit.go
Comment thread src/server/middleware/repoproxy/proxy.go Outdated
Comment thread src/server/middleware/repoproxy/proxy.go Outdated
@wy65701436 (Contributor) left a comment

lgtm

Comment thread api/v2.0/swagger.yaml
Comment thread src/server/middleware/repoproxy/proxy.go Outdated
@stonezdj force-pushed the 25aug26_rate_limit_proxycache_upstream branch 5 times, most recently from 9fc124a to 1fb5176 on September 28, 2025 05:28
Comment thread src/pkg/project/models/project.go
Comment thread src/pkg/proxy/connection/limit.go Outdated
@reasonerjt (Contributor) commented:

Could you make Limiter an interface and add some test cases?
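
A sketch of what such a test might look like, assuming the fail-fast Acquire/Release semantics and the hypothetical NewLimiter constructor from the sketch earlier in this thread (not the actual test added in the PR):

```go
package connection

import "testing"

// TestLimiterAcquireRelease exercises the fail-fast semantics: with a cap
// of 2, the third Acquire must fail until a slot is released again.
func TestLimiterAcquireRelease(t *testing.T) {
	l := NewLimiter("2") // hypothetical constructor; cap of 2 concurrent connections

	if err := l.Acquire(); err != nil {
		t.Fatalf("first acquire should succeed, got %v", err)
	}
	if err := l.Acquire(); err != nil {
		t.Fatalf("second acquire should succeed, got %v", err)
	}
	if err := l.Acquire(); err == nil {
		t.Fatal("third acquire should fail once the cap of 2 is reached")
	}

	l.Release()
	if err := l.Acquire(); err != nil {
		t.Fatalf("acquire after release should succeed, got %v", err)
	}
}
```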

@stonezdj force-pushed the 25aug26_rate_limit_proxycache_upstream branch from 0a096a6 to 9632140 on September 30, 2025 06:47
@stonezdj enabled auto-merge (squash) on September 30, 2025 06:48
@stonezdj force-pushed the 25aug26_rate_limit_proxycache_upstream branch 3 times, most recently from c9b1ad7 to 7bfbf41 on September 30, 2025 07:01
  limit the proxy connection to upstream registry

Signed-off-by: stonezdj <stonezdj@gmail.com>
@stonezdj force-pushed the 25aug26_rate_limit_proxycache_upstream branch from 7bfbf41 to e430ebc on September 30, 2025 09:49
@stonezdj merged commit c004f2d into goharbor:main on Sep 30, 2025
12 checks passed
stonezdj added a commit to stonezdj/harbor that referenced this pull request Sep 30, 2025
…r#22348)

limit the proxy connection to upstream registry

Signed-off-by: stonezdj <stonezdj@gmail.com>
stonezdj added a commit that referenced this pull request Oct 9, 2025
…#22409)

limit the proxy connection to upstream registry

Signed-off-by: stonezdj <stonezdj@gmail.com>
OrlinVasilev pushed a commit to OrlinVasilev/harbor that referenced this pull request Oct 29, 2025
…r#22348)

limit the proxy connection to upstream registry

Signed-off-by: stonezdj <stonezdj@gmail.com>

Labels

release-note/enhancement (Label to mark PR to be added under release notes as enhancement)
target/2.14.1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Coalesce upstream requests where possible

8 participants