
Infrastructure: Change db from mariadb to postgres #711


Open · wants to merge 21 commits into master

Conversation

@ConnorNelson (Member) commented May 30, 2025

This resolves #710.

TODO:

  • Fix dojo db backup
  • Fix dojo db restore
  • Resolve all PG-JSON TODOs (JSON is handled differently by SQLAlchemy on MariaDB vs. on PostgreSQL; see the sketch below)
  • Determine and document migration story
  • Improve slow queries
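
For the PG-JSON item, here is a minimal sketch of one way to normalize the difference, assuming a SQLAlchemy TypeDecorator is an acceptable approach (PortableJSON is a hypothetical name, not something in this PR): MariaDB stores JSON as text and can hand values back as strings, while PostgreSQL's JSON type deserializes them to dicts/lists.

import json
from sqlalchemy.types import JSON, TypeDecorator

class PortableJSON(TypeDecorator):
    # Hypothetical sketch: make JSON columns return parsed objects on both
    # backends. MariaDB may return the raw JSON text; PostgreSQL's JSON
    # type already returns dicts/lists.
    impl = JSON
    cache_ok = True

    def process_result_value(self, value, dialect):
        if isinstance(value, str):
            return json.loads(value)
        return value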

@ConnorNelson (Member Author)

As part of this PR, on production, I am going to put the DB back on the main node. Both the main node and the db node sit almost entirely idle.

@ConnorNelson (Member Author) commented Jun 11, 2025

Roughly speaking, here is the plan for doing the migration.

Let's assume we merge the PR, and do something like:

dojo compose down ctfd
dojo compose down db
dojo backup

We block anyone from connecting during the migration:

iptables -I INPUT -p tcp --dport  80 -j DROP 
iptables -I INPUT -p tcp --dport 443 -j DROP
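
To confirm the rules actually took effect before proceeding (assuming plain iptables rather than an nftables frontend; -C checks for an exact rule match):

iptables -C INPUT -p tcp --dport  80 -j DROP \
  && iptables -C INPUT -p tcp --dport 443 -j DROP \
  && echo "traffic blocked"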

Then we save off the backup to /tmp:

cp /data/backups/$(ls -th /data/backups | head -n 1) /tmp/db.sql.gz
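
Before tearing anything else down, it's worth checking that the copy is a valid gzip stream; gunzip -t tests integrity without writing any output:

gunzip -t /tmp/db.sql.gz && echo "backup archive OK"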

We'll want to update /data/config.env with some sensible new defaults like:

DB_HOST=db
DB_NAME=ctfd
DB_USER=ctfd
DB_PASS=ctfd

(and also remove DB_EXTERNAL)
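
One way to script those edits, assuming each DB_* key already appears in /data/config.env (adjust if a key is missing):

sed -i \
  -e 's/^DB_HOST=.*/DB_HOST=db/' \
  -e 's/^DB_NAME=.*/DB_NAME=ctfd/' \
  -e 's/^DB_USER=.*/DB_USER=ctfd/' \
  -e 's/^DB_PASS=.*/DB_PASS=ctfd/' \
  -e '/^DB_EXTERNAL=/d' \
  /data/config.env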

And bring everything up, with the updates:

git pull
dojo sync
dojo compose up -d --build db
dojo compose up -d --build --no-deps ctfd

Create a temporary MariaDB container into which we will load all the data; the extra flags disable binary logging and relax InnoDB durability to speed up this one-shot import.

docker run -d --name mariadb-tmp --network pwncollege_default \
  -e MYSQL_ROOT_PASSWORD=mypass \
  -e MYSQL_DATABASE=importdb \
  mariadb:10.4.12 \
  --skip-log-bin \
  --innodb_flush_log_at_trx_commit=0 --sync_binlog=0 --innodb_doublewrite=0 \
  --innodb_buffer_pool_size=1G \
  --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci

We need to wait for the temp db to finish setting up:

sleep 30
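
Instead of a fixed sleep, one can poll until the server answers (mysqladmin ships inside the official mariadb image):

until docker exec mariadb-tmp mysqladmin ping -u root -pmypass --silent; do
  sleep 2
done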

Load the backup into the temp db:

gunzip -c /tmp/db.sql.gz | docker exec -i mariadb-tmp mysql -u root -pmypass importdb

Now, migrate the data into postgres. We are intentionally ignoring the comments/files tables (which contain old data and cause migration issues).

cat <<EOF > pgloader.load
LOAD DATABASE
     FROM mysql://root:mypass@mariadb-tmp/importdb
     INTO postgresql://ctfd:ctfd@db/ctfd

 WITH data only, truncate, reset sequences, prefetch rows = 1000

 EXCLUDING TABLE NAMES MATCHING ~/^comments$/, ~/^files$/

 ALTER SCHEMA 'importdb' RENAME TO 'public';
EOF

docker run --rm --network pwncollege_default \
       -v $(pwd)/pgloader.load:/tmp/pgloader.load \
       dimitri/pgloader:latest pgloader /tmp/pgloader.load
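
Before reopening traffic, a quick sanity check is to compare row counts for a few key tables on both sides. This sketch assumes users/solves/challenges (from CTFd's schema) are representative, and uses a throwaway postgres:16 container on the same network as the client:

for t in users solves challenges; do
  m=$(docker exec mariadb-tmp mysql -N -u root -pmypass importdb -e "SELECT COUNT(*) FROM $t;")
  p=$(docker run --rm --network pwncollege_default postgres:16 \
        psql postgresql://ctfd:ctfd@db/ctfd -tAc "SELECT COUNT(*) FROM $t;")
  echo "$t: mariadb=$m postgres=$p"
done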

We should be good to go now. So, we let people connect again and clean up:

iptables -D INPUT -p tcp --dport  80 -j DROP
iptables -D INPUT -p tcp --dport 443 -j DROP
docker kill mariadb-tmp
docker rm mariadb-tmp

@ConnorNelson (Member Author)

Here's some profiling.

This compares the PR branch on a mostly-idle box (postgres) against master on production (mariadb).

It is unclear how much of the difference is due to the load conditions and how much is due to the database engines, but it should be noted that production was relatively quiet at the time of testing.


Dojo Stats

# Run in an IPython flask shell session, with the dojo models (e.g. Dojos)
# already in scope.
import os

from CTFd.plugins.dojo_plugin.utils.stats import get_dojo_stats

os.environ["CACHE_WARMER"] = "true"

for dojo_id in ["computing-101", "welcome", "intro-to-cybersecurity", "cse365-s2025"]:
    print(dojo_id)
    dojo = Dojos.from_id(dojo_id).first()
    %timeit -n1 -r3 get_dojo_stats(dojo)

This PR on a mostly-idle box (postgres):

computing-101
22.7 s ± 73.7 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
welcome
8.89 s ± 23.2 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
intro-to-cybersecurity
15.9 s ± 269 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
cse365-s2025
4.4 s ± 325 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)

Master on production (mariadb):

computing-101
18.9 s ± 435 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
welcome
7.06 s ± 36.4 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
intro-to-cybersecurity
19.3 s ± 217 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
cse365-s2025
1min 32s ± 462 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)

Dojo Scoreboard

import datetime

# db, Solves, and Users come from the same flask shell context as above.
def get_scoreboard_for(model, duration):
    duration_filter = (
        Solves.date >= datetime.datetime.utcnow() - datetime.timedelta(days=duration)
        if duration else True
    )
    solves = db.func.count().label("solves")
    rank = (
        db.func.row_number()
        .over(order_by=(solves.desc(), db.func.max(Solves.id)))
        .label("rank")
    )
    user_entities = [Solves.user_id, Users.name, Users.email]
    query = (
        model.solves()
        .filter(duration_filter)
        .group_by(*user_entities)
        .order_by(rank)
        .with_entities(rank, solves, *user_entities)
    )

    row_results = query.all()
    results = [{key: getattr(item, key) for key in item.keys()} for item in row_results]
    return results

for dojo_id in ["computing-101", "welcome", "intro-to-cybersecurity", "cse365-s2025"]:
    print(dojo_id)
    dojo = Dojos.from_id(dojo_id).first()
    %timeit -n1 -r3 get_scoreboard_for(dojo, None)

This PR on a mostly-idle box (postgres):

computing-101
2.77 s ± 60.1 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
welcome
3.15 s ± 20.3 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
intro-to-cybersecurity
2.71 s ± 13.1 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
cse365-s2025
3.46 s ± 143 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)

Master on production (mariadb):

computing-101
4.19 s ± 111 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
welcome
2.26 s ± 51.6 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
intro-to-cybersecurity
4.37 s ± 55 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)
cse365-s2025
17.5 s ± 296 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)

Dojo Scores

from sqlalchemy import or_

# db and the models (Dojos, DojoChallenges, Solves) come from the shell context.
def scores_query(granularity, dojo_filter):
    solve_count = db.func.count(Solves.id).label("solve_count")
    last_solve_date = db.func.max(Solves.date).label("last_solve_date")
    fields = granularity + [ Solves.user_id, solve_count, last_solve_date ]
    grouping = granularity + [ Solves.user_id ]

    dsc_query = db.session.query(*fields).where(
        Dojos.dojo_id == DojoChallenges.dojo_id, DojoChallenges.challenge_id == Solves.challenge_id,
        dojo_filter
    ).group_by(*grouping).order_by(Dojos.id, solve_count.desc(), last_solve_date)

    return dsc_query

def dojo_scores():
    dsc_query = scores_query([Dojos.id], or_(Dojos.data["type"].astext == "public", Dojos.official))

    user_ranks = { }
    user_solves = { }
    dojo_ranks = { }
    for dojo_id, user_id, solve_count, _ in dsc_query:
        dojo_ranks.setdefault(dojo_id, [ ]).append(user_id)
        user_ranks.setdefault(user_id, {})[dojo_id] = len(dojo_ranks[dojo_id])
        user_solves.setdefault(user_id, {})[dojo_id] = solve_count

    return {
        "user_ranks": user_ranks,
        "user_solves": user_solves,
        "dojo_ranks": dojo_ranks
    }

%timeit -n1 -r3 dojo_scores()

This PR on a mostly-idle box (postgres):

24.2 s ± 884 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)

Master on production (mariadb):

1min 44s ± 530 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)

Module Scores

# scores_query repeated verbatim from the Dojo Scores section above.
def scores_query(granularity, dojo_filter):
    solve_count = db.func.count(Solves.id).label("solve_count")
    last_solve_date = db.func.max(Solves.date).label("last_solve_date")
    fields = granularity + [ Solves.user_id, solve_count, last_solve_date ]
    grouping = granularity + [ Solves.user_id ]

    dsc_query = db.session.query(*fields).where(
        Dojos.dojo_id == DojoChallenges.dojo_id, DojoChallenges.challenge_id == Solves.challenge_id,
        dojo_filter
    ).group_by(*grouping).order_by(Dojos.id, solve_count.desc(), last_solve_date)

    return dsc_query

def module_scores():
    dsc_query = scores_query([Dojos.id, DojoChallenges.module_index], or_(Dojos.data["type"].astext == "public", Dojos.official))

    user_ranks = { }
    user_solves = { }
    module_ranks = { }
    for dojo_id, module_idx, user_id, solve_count, _ in dsc_query:
        module_ranks.setdefault(dojo_id, {}).setdefault(module_idx, []).append(user_id)
        user_ranks.setdefault(user_id, {}).setdefault(dojo_id, {})[module_idx] = len(module_ranks[dojo_id][module_idx])
        user_solves.setdefault(user_id, {}).setdefault(dojo_id, {})[module_idx] = solve_count

    return {
        "user_ranks": user_ranks,
        "user_solves": user_solves,
        "module_ranks": module_ranks
    }

%timeit -n1 -r3 module_scores()

This PR on a mostly-idle box (postgres):

38.8 s ± 279 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)

Master on production (mariadb):

5min 43s ± 1.13 s per loop (mean ± std. dev. of 3 runs, 1 loop each)
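
For the "Improve slow queries" TODO, one generic way to inspect a plan (a sketch, not part of this PR; literal_binds can fail for exotic parameter types) is to compile a query to SQL and run EXPLAIN ANALYZE on it:

from sqlalchemy import text
from sqlalchemy.dialects import postgresql

query = scores_query([Dojos.id], or_(Dojos.data["type"].astext == "public", Dojos.official))
sql = str(query.statement.compile(dialect=postgresql.dialect(),
                                  compile_kwargs={"literal_binds": True}))
for row in db.session.execute(text(f"EXPLAIN ANALYZE {sql}")):
    print(row[0])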

Successfully merging this pull request may close these issues.

Migrate DB to PostgreSQL