Add max_upstream_conn parameter for each proxy_cache project #22348
stonezdj merged 1 commit into goharbor:main
Conversation
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main   #22348       +/-   ##
===========================================
+ Coverage   45.36%   65.87%    +20.50%
===========================================
  Files         244     1073       +829
  Lines       13333   116018    +102685
  Branches     2719     2927       +208
===========================================
+ Hits         6049    76427     +70378
- Misses       6983    35354     +28371
- Partials      301     4237      +3936
stonezdj force-pushed from 59dd2bf to 38e3056
stonezdj force-pushed from 9077c61 to 8879689
I've had a quick scan through, and it seems like the client will experience 429s when the maximum connection limit is reached? This is different from my change (but perhaps complementary?): in my change, requests for the same manifests/blobs queue behind a single upstream request and return the result from the cache when it completes (i.e., no request failures when there is contention for the same resources).
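For reference, the deduplication idea could be sketched roughly like this (a minimal sketch, not the actual branch: it assumes golang.org/x/sync/singleflight, and the cache and upstream callback are hypothetical stand-ins for Harbor's real manifest/blob handling):

```go
package proxy

import (
	"context"
	"sync"

	"golang.org/x/sync/singleflight"
)

// dedupFetcher collapses concurrent requests for the same digest into a
// single upstream fetch; all callers receive the same result once it completes.
type dedupFetcher struct {
	group    singleflight.Group
	mu       sync.Mutex
	cache    map[string][]byte // stand-in for the real manifest/blob cache
	upstream func(ctx context.Context, digest string) ([]byte, error)
}

// Fetch serves from the cache when possible; otherwise only one upstream
// request per digest is in flight, and concurrent callers wait for its result.
func (f *dedupFetcher) Fetch(ctx context.Context, digest string) ([]byte, error) {
	v, err, _ := f.group.Do(digest, func() (interface{}, error) {
		f.mu.Lock()
		cached, ok := f.cache[digest]
		f.mu.Unlock()
		if ok {
			return cached, nil
		}
		data, err := f.upstream(ctx, digest)
		if err != nil {
			return nil, err
		}
		f.mu.Lock()
		f.cache[digest] = data
		f.mu.Unlock()
		return data, nil
	})
	if err != nil {
		return nil, err
	}
	return v.([]byte), nil
}
```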
Queuing all subsequent requests would consume all TCP connections on the Harbor server: if there are 500 requests, only one can proceed while the other 499 wait, and if the first takes 30 minutes, the others wait 30 minutes as well. Meanwhile, other connections might fail because TCP connections are a limited resource. Returning a 429 is better than blocking the request.
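As a rough illustration of that behaviour (a minimal sketch only; the type and handler names are made up, not Harbor's actual code), the proxy can reject a request immediately with 429 once the configured maximum number of concurrent upstream connections is in use:

```go
package proxy

import (
	"net/http"
	"sync/atomic"
)

// upstreamGate tracks how many upstream requests are currently in flight.
type upstreamGate struct {
	max     int64
	current int64
}

// tryAcquire reserves a slot if one is free; it never blocks.
func (g *upstreamGate) tryAcquire() bool {
	if atomic.AddInt64(&g.current, 1) > g.max {
		atomic.AddInt64(&g.current, -1)
		return false
	}
	return true
}

func (g *upstreamGate) release() { atomic.AddInt64(&g.current, -1) }

// limitUpstream answers 429 instead of queuing when the limit is reached.
func limitUpstream(g *upstreamGate, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !g.tryAcquire() {
			http.Error(w, "too many concurrent upstream connections", http.StatusTooManyRequests)
			return
		}
		defer g.release()
		next.ServeHTTP(w, r)
	})
}
```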
I am in full agreement that returning a 429 is better than blocking the request, which is why I feel this change is complementary to mine. This change doesn't address the upstream request deduplication issue that I was originally trying to solve on my branch.
To add a bit more context: I have a use case where we are very sensitive to boot times, particularly for new images that need to be pulled from upstream. We also have a very large number of pods often concurrently hitting the registry for the same artifact, so I would just like to make sure we're efficient about retrieving resources from the upstream. That goes further than just rate limiting, imo: we should aim to de-duplicate concurrent requests where possible, which is the motivation of my change. This approach has been working well for us. We're using our fork of Harbor as a DC-local cache for images in Google Artifact Registry, and we routinely handle spikes of >1K concurrent pulls with it. But I think if we were to just use this rate-limiting approach, we'd have quite a lot of pods just spinning in CrashLoopBackOff... and that'd be a bad time for us.
The event is ImagePullBackOff when the image pull fails, and it will retry pulling the image with a backoff policy.
stonezdj force-pushed from 9fc124a to 1fb5176
Could you make Limiter an interface and add some test cases?
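Something along these lines, perhaps (a sketch under the assumption that simple acquire/release semantics are enough; the names are illustrative, not the final API):

```go
package limiter

// Limiter caps the number of concurrent upstream connections.
type Limiter interface {
	// Acquire reports whether a connection slot was obtained; it never blocks.
	Acquire() bool
	// Release frees a previously acquired slot.
	Release()
}

// chanLimiter is a channel-backed implementation of Limiter.
type chanLimiter struct {
	slots chan struct{}
}

// New returns a Limiter allowing at most max concurrent acquisitions.
func New(max int) Limiter {
	return &chanLimiter{slots: make(chan struct{}, max)}
}

func (l *chanLimiter) Acquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

func (l *chanLimiter) Release() { <-l.slots }
```

And a corresponding test case, e.g. in limiter_test.go:

```go
package limiter

import "testing"

func TestLimiter(t *testing.T) {
	l := New(2)
	if !l.Acquire() || !l.Acquire() {
		t.Fatal("expected two acquisitions to succeed")
	}
	if l.Acquire() {
		t.Fatal("expected third acquisition to fail at the limit")
	}
	l.Release()
	if !l.Acquire() {
		t.Fatal("expected acquisition to succeed after release")
	}
}
```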
stonezdj force-pushed from 0a096a6 to 9632140
stonezdj force-pushed from c9b1ad7 to 7bfbf41
limit the proxy connection to upstream registry
Signed-off-by: stonezdj <stonezdj@gmail.com>
stonezdj force-pushed from 7bfbf41 to e430ebc
…r#22348) limit the proxy connection to upstream registry
Signed-off-by: stonezdj <stonezdj@gmail.com>
Comprehensive Summary of your change
Limit the proxy connection to the upstream registry.

Issue being fixed
Fixes #22184
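Since the parameter is per proxy-cache project, one possible way to wire it up is to keep a separate gate per project, sized from that project's max_upstream_conn value (a sketch based only on the PR title; the lookup structure and field names are assumptions, not Harbor's confirmed design):

```go
package proxy

import "sync"

// perProjectGates keeps one connection gate per proxy-cache project.
type perProjectGates struct {
	mu    sync.Mutex
	gates map[string]chan struct{}
}

func newPerProjectGates() *perProjectGates {
	return &perProjectGates{gates: make(map[string]chan struct{})}
}

// gateFor returns the project's gate, creating it on first use with the
// capacity taken from that project's max_upstream_conn configuration.
func (p *perProjectGates) gateFor(project string, maxUpstreamConn int) chan struct{} {
	p.mu.Lock()
	defer p.mu.Unlock()
	g, ok := p.gates[project]
	if !ok {
		g = make(chan struct{}, maxUpstreamConn)
		p.gates[project] = g
	}
	return g
}
```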