Skip to content

Do not block the database(sqlite) during archive creation #27563

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

matrss
Copy link

@matrss matrss commented Oct 10, 2023

The database transaction in doArchive is currently held open for the entirety of the archive generation process. For larger repositories this could be a long time. If the database is sqlite3 (others not tested) then this can block other processes from using the database, which e.g. leads to long load times for the web UI route of all repositories on the gitea server, among other things. This patch finishes the transaction right before generating the archive and starts a new one to update the database afterwards.

You can reproduce the issue with the following steps:

  1. create a git repository with a large file (e.g. 1GB seems sufficient to see the issue)
  2. commit the file and push to a gitea instance using a sqlite database
  3. from the web UI start a .tar.gz (or .zip) download
  4. try to navigate to the same or another repository
  5. observe a long loading time until the archive is created (i.e. the download starts)

I am new to Go and therefore not sure if this is the most idiomatic way to fix this. I essentially just copied the way the transaction is created in the beginning to after the archive is created and .Commit and .Closed before that. Specifically I am not sure if I would have to propagate the err from .Close as well, since that is deferred in the beginning which would silence the error as well, right? Is this "double .Close" from the defer and the explicit call OK to do?

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Oct 10, 2023
@lunny lunny added the type/enhancement An improvement of existing functionality label Oct 11, 2023
@lunny lunny added this to the 1.22.0 milestone Oct 11, 2023
matrss added a commit to matrss/gitea that referenced this pull request Oct 31, 2023
This commit can be dropped as soon as
go-gitea#27563 is accepted.
kousu pushed a commit to matrss/gitea that referenced this pull request Nov 4, 2023
This commit can be dropped as soon as
go-gitea#27563 is accepted.
kousu pushed a commit to matrss/gitea that referenced this pull request Nov 4, 2023
This commit can be dropped as soon as
go-gitea#27563 is accepted.
@lunny
Copy link
Member

lunny commented Mar 4, 2024

How about moving the archive generation before database updates?

@matrss
Copy link
Author

matrss commented Mar 11, 2024

That would work if the doArchive function is only ever called when the archive should definitely be created, but the existence of

if archiver != nil {
// FIXME: If another process are generating it, we think it's not ready and just return
// Or we should wait until the archive generated.
if archiver.Status == repo_model.ArchiverGenerating {
return nil, nil
}
suggests that this is not the case, i.e. there can be a situation in which the archive shouldn't be created. I am not familiar enough with the codebase to judge if that would be a real issue though, or if it would "just" be unnecessary work.

@lunny
Copy link
Member

lunny commented Mar 12, 2024

That would work if the doArchive function is only ever called when the archive should definitely be created, but the existence of

if archiver != nil {
// FIXME: If another process are generating it, we think it's not ready and just return
// Or we should wait until the archive generated.
if archiver.Status == repo_model.ArchiverGenerating {
return nil, nil
}

suggests that this is not the case, i.e. there can be a situation in which the archive shouldn't be created. I am not familiar enough with the codebase to judge if that would be a real issue though, or if it would "just" be unnecessary work.

Yes, that's a simpler lock mechanism based on database. And I think this is only affecting sqlite but for other database, it works.

@lunny lunny changed the title Do not block the database during archive creation Do not block the database(sqlite) during archive creation Mar 12, 2024
@matrss
Copy link
Author

matrss commented Mar 12, 2024

And I think this is only affecting sqlite but for other database, it works.

Yes, I think the underlying issue is sqlite's limitations in regard to concurrent reads/writes. Holding the transaction open for the entire duration of archive creation seems to even block read access to the database.

Other databases do not have these limitations, although keeping the transaction open for a long time might have some effect there as well, like increased resource usage and such to accommodate this situation.

Some sqlite options, like WAL mode, might also have an effect on how this issue manifests itself, but I haven't tested out anything in that regard.

@lunny lunny modified the milestones: 1.22.0, 1.23.0 Mar 29, 2024
@lunny
Copy link
Member

lunny commented Sep 28, 2024

It's better to use the current globallock modules to instead of database operations.

@lunny
Copy link
Member

lunny commented Oct 3, 2024

Can you help to confirm that #32186 can resolve the problem?

@lunny
Copy link
Member

lunny commented Oct 9, 2024

Replaced by #32186

@lunny lunny closed this Oct 9, 2024
@GiteaBot GiteaBot removed this from the 1.23.0 milestone Oct 9, 2024
lunny added a commit that referenced this pull request Nov 15, 2024
Since there is a status column in the database, the transaction is
unnecessary when downloading an archive. The transaction is blocking
database operations, especially with SQLite.

Replace #27563
GiteaBot pushed a commit to GiteaBot/gitea that referenced this pull request Nov 15, 2024
Since there is a status column in the database, the transaction is
unnecessary when downloading an archive. The transaction is blocking
database operations, especially with SQLite.

Replace go-gitea#27563
silverwind pushed a commit that referenced this pull request Nov 15, 2024
Backport #32186 by @lunny

Since there is a status column in the database, the transaction is
unnecessary when downloading an archive. The transaction is blocking
database operations, especially with SQLite.

Replace #27563

Co-authored-by: Lunny Xiao <[email protected]>
@go-gitea go-gitea locked as resolved and limited conversation to collaborators Jan 7, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. type/enhancement An improvement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants