-
-
Notifications
You must be signed in to change notification settings - Fork 5.8k
Index Page Incredibly Slow to Load - Slow SQL Query #32112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Any time we need to teach dumb database engine to execute simple queries with such a mess, it leads to "wtf is happening here???" in the feature. |
The indexes are already there, the query that is being ran is just not optimized to use the indexes appropriately. The tl;dr is the because of the inner join, every row on actions needs to be compared Limiting the number of comparisons (by only comparing against the set of repo_id that do not exist in repository.id) massively increases performance. In my example, I have ~70 repositories and ~86k actions. Each of the 86k actions needs to be compared against ~70 repository.id values, leading to 86k * 70 = ~6 million comparisons. Reducing that to... 0 (in my case) means that the query can run quite a lot faster. It's making the database do less work. Instead of asking "get me the set of actions for the repo_ids in repository.id" ask "get me the set of actions with a repo_id not in the set of ids that do not exist in repository.id" i.e. it's a "double negative" to get back to the same place, but let's the database figure it out a lot faster. Ideally action.repo_id should be a foreign key to repository.id, then the database would already know if the repository exists in the repository table already. Without it being a foreign key, it needs to do all this computation every time (or use the "not not" trick I showed above). |
inner join is not that straightforward. It leads to inner loop in case of bad planning or, for example, because of query hint (inner loop join for MSSQL). Mostly it is a main tool for queries and highly optimized. No 6 mils comparisons in loops. |
Maybe we can remove the inner join directly and provide a doctor command to clean the dirty records on action table. |
@lunny that would probably work pretty well, maybe have it be one of the "cronjobs" where it removes action entries if they refer to repos that do not exist in the repository table. This could be done using the select unique(xxx) not in (select id) query example. The select without the join is plenty fast, it's just the comparison between tables that's slow. Ideally, I believe that action.repo_id should be a foreign key to the repository table and then a delete cascade can be added to automatically "clean up" the action table as well. |
On the index page (and on pagging)
On which hardware do you run gitea? After #31752 the whole start page need ~3-5 seconds. It run on an NUC with an Intel Pentium Silver J5005 and 32GB RAM and PostgreSQL.
In my case the problem is this
which is made on every page. All other stuff is fast now. |
Description
Recently on my Gitea instance, loading the index page started taking an absolute eternity, 30 seconds+ quite often. This is a bug report / analysis as to why.
I took a look in the logs for the container and noticed this:
Hmm... that's odd. Why would that query be so slow?
Running similar query against database using my actual user_id=2 and is_deleted = 0 returns in 0.006 seconds
Strange... similar query gets results quite quickly.
Let's try removing the where...
Wow! That makes the query take significantly longer!
Output from the actual select statement:
Changing the select to remove the inner join:
From reviewing the source code, the columns for
repository
are intentionally ignored.gitea/models/activities/action.go
Lines 456 to 459 in e1f0598
With this knowledge, I was able to build a significantly faster query that achieves the same thing (only getting actions for repositories that exist in the
repository
table):Look at that! 0.028 seconds! Much better! But wait? What in the world is that query? Let me break it down a bit:
The original query is completing an "inner join" for the apparently sole purpose of determing if the actions belong to repositories that currently exist.
The first thing I tried was to change it from an inner join to a subquery:
Unfortunately... this is also just as slow...
So thinking about it... how do you make it faster? What are we trying to do? Exclude actions that have a repo_id that is not in repositories right?
Let's take a look at the queries we just used above!
Ok, both of those look to be pretty similar, but what's happening is MariaDB needs to compare all of the ids against the "select id from repository" part of the subquery.
This is likely the "same thing" that MariaDB is doing when completing the above inner join.
So how do you make this faster?
Let's flip it around and only get the repo_ids of actions that are not in the repository table!
Neat! For me it's an empty set! (and it generally should be; however, repo_id is not a foreign key in the current schema)
Now, when you do a select against that subquery, it is much faster to execute as there is "less" that needs to be scanned for each row.
Bringing us back to this...
That is where the odd looking subquery comes from; however, it leads to significantly faster queries that still maintain the "same" restrictions
as the existing inner join!
Finally, let's add user_id=2 and is_deleted=0 back just for a "apples to apples" comparison with original query:
Neat! Just as fast (0.002 seconds), I'm omitting the actual output from this query as it's just an absolute load of json in the
content
column, but otherwise looks similar to above samples.I'd test changing the query in the source code; however, I do not have a Go development environment currently setup. I do hope this knowledge can be incorporated into the next version update for faster index page loads!
Thanks!
Gitea Version
1.22.2
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
No response
Screenshots
No response
Git Version
No response
Operating System
AlmaLinux 9.4 - Official Gitea container
How are you running Gitea?
Official gitea container, running on podman:
Database
MySQL/MariaDB
The text was updated successfully, but these errors were encountered: