Setting to disable expensive endpoints for anonymous users #33966
Comments
This feature would be very welcome. I have the same problem, as there are AI scrapers, which do not even set an appropriate user-agent string, terrorizing my Gitea instance for a small open source project. The vast majority of accesses go to these endpoints. Edit: my proposal to block queries with GET parameters seems pointless, as even navigating through the issues needs a query parameter.
Or perhaps anonymous users could have limited access or reduced traffic.
What do you think about this? -> Add a config option to block "expensive" pages #34024
@wxiaoguang isn't that another solution for #33951?
These two are different. These two PRs could also co-exist and won't conflict. And since #33951 has seen no progress in recent days, I think we can at least implement this issue's proposal as a quick solution to help users under "AI crawler attack".
…o-gitea#34024) Fix go-gitea#33966

```ini
;; User must sign in to view anything.
;; It could be set to "expensive" to block anonymous users accessing some pages which consume a lot of resources,
;; for example: block anonymous AI crawlers from accessing repo code pages.
;; The "expensive" mode is experimental and subject to change.
;REQUIRE_SIGNIN_VIEW = false
```

```
# Conflicts:
#	routers/api/v1/api.go
#	tests/integration/api_org_test.go
```
1.23 nightly is ready (it is a stable release and will be 1.23.7 soon). It has a new config option:

```ini
[service]
;; User must sign in to view anything.
;; It could be set to "expensive" to block anonymous users accessing some pages which consume a lot of resources,
;; for example: block anonymous AI crawlers from accessing repo code pages.
;; The "expensive" mode is experimental and subject to change.
;REQUIRE_SIGNIN_VIEW = false
```

Welcome to try it and provide feedback.
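For anyone trying this out, a minimal sketch of what enabling the experimental mode looks like in app.ini, based on the snippet above:

```ini
[service]
;; experimental: block anonymous access to expensive pages such as repo code views
REQUIRE_SIGNIN_VIEW = expensive
```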
Works wonderfully, thank you. It already reduced both the load and the number of accesses. @wxiaoguang Is it possible to configure which endpoints are considered "expensive"? I would like to give anonymous users access to the source code, because that is already what one can see on the main repository page, but only the root directory. Also, if users are able to check out the complete repo anonymously, it makes little sense to restrict the
Since the
Normally I would say this is an expensive endpoint. It's just that in my case the bots hammered on the
@wxiaoguang Was there any change to the admin config settings page to reflect this? I turned on the "expensive" setting, but I still get an X. Could we maybe get it to say whether the expensive setting took effect?
"admin config settings page" is quite out-dated. A lot of config options are not there or incorrectly displayed there. To confirm, just try to visit a repo's file anonymously. If we'd like to make the "admin config settings page" work correctly, it needs to completely refactor the setting system. |
Just enabled this, good stuff. My two cents: I think it could be a little more fine-grained though - not being able to view issues strikes me as too much. I also think it should be possible to browse code, but only for the default branch, not for arbitrary commits. And can we add the
I have a complete plan here: #33951 (comment)
Then end users could choose a combination to satisfy their requirements. To answer your question:
For this one: would you like to build your own binary to allow the "issues"-related endpoints? I think they are not that heavy, and it might work well in your case.
I like these ideas!
I was still able to reach it via the releases.
Would you like to try this one? Rate-limit anonymous accesses to expensive pages when the server is overloaded #34167. It will do its best to serve content to end users and crawlers; anonymous end users just need to wait 1 second to see the "expensive" page if the server is overloaded.
Feature Description
Since AI scrapers are terrorizing the web and flooding innocent Gitea instances, it would make sense to have an option to only allow expensive endpoints (like `/src/commit` or `/blame`) for logged-in users.

What I have observed is that crawlers like Claudebot and Bytespider don't respect my robots.txt and decide to crawl every single file from every single commit. For big repositories this can become a massive performance hit, since Gitea has to run git to serve the requests, which has a lot of overhead. I even enabled a Redis cache, but since they hit new files all the time, it didn't help much.
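For reference, rules along these lines in robots.txt are exactly what such crawlers ignore (the wildcard paths are illustrative, not taken from the original report):

```
User-agent: *
Disallow: /*/*/src/commit/
Disallow: /*/*/blame/
```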
As a workaround, I have configured my nginx reverse proxy to redirect these endpoints to an Anubis instance (https://anubis.techaro.lol/), which seems to kill most of the scrapers, or at least wastes their time long enough to make their DDoS (because that's what it is, really!) less annoying.
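A minimal sketch of that kind of nginx setup, assuming Anubis listens on 127.0.0.1:8923 and Gitea on 127.0.0.1:3000 (the ports and the location regex are placeholder assumptions, not the reporter's actual config):

```nginx
# send the expensive endpoints through Anubis; everything else goes straight to Gitea
location ~ ^/[^/]+/[^/]+/(src/commit|blame)/ {
    proxy_pass http://127.0.0.1:8923;  # Anubis, which forwards to Gitea after its challenge
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}

location / {
    proxy_pass http://127.0.0.1:3000;  # Gitea directly
    proxy_set_header Host $host;
}
```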
However, since this solution works by proxying with nginx, every user sees the Anubis challenge before being able to look at commits, even if they are logged in. It would therefore be preferable to just have an option to disallow these endpoints. If someone external wants to look at the commits, they can just check out the repository and look at the history there.