tools: add build_frontend.sh for reliable Toolforge frontend builds #534

Open
lgelauff wants to merge 90 commits into master from tools/build-frontend-script

Conversation

@lgelauff
Collaborator

Summary

  • Adds tools/build_frontend.sh, a script that builds the Vue frontend on Toolforge in a toolforge jobs run job (node20, 4Gi) rather than in an interactive webservice shell
  • Cherry-picks frontend/.env.production from #516 ("fix(frontend): set empty VITE_API_ENDPOINT for production builds"), setting VITE_API_ENDPOINT= so that production builds use a relative API base URL instead of the localhost default
  • Updates deployment.md (both fresh-install and update sections) to use the script instead of the manual npm install + npm run toolforge:build approach

Known workarounds documented in the script

Temporary: --ignore-scripts + explicit @esbuild/linux-x64 install:
The Toolforge node20 image ships npm 9.2.0, which does not correctly install platform-specific optional binaries when package-lock.json was generated on macOS with npm 10+. This causes esbuild's post-install validation to fail. Workaround: skip post-install scripts, then install the correct linux-x64 binary explicitly at the version from the lock file.
Long-term fix: switch to --image node22 (ships npm 10+), see T393437.

Permanent: frontend/.env.production:
Required so production builds use /v1/ as the API base URL. Must stay in the repo.
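
The temporary workaround above could be sketched roughly as follows. This is a minimal illustration, not the contents of tools/build_frontend.sh: the function names, the `--no-save` flag, and the assumption of a lockfileVersion 2 or 3 `package-lock.json` are all mine.

```shell
# Print the esbuild version pinned in a package-lock.json.
# Assumes the lockfileVersion 2/3 layout, where the entry is keyed
# "node_modules/esbuild" and its "version" field follows it.
esbuild_version_from_lock() {
  awk '
    /"node_modules\/esbuild": *[{]/ { in_block = 1; next }
    in_block && /"version":/ {
      gsub(/[",]/, "", $2)   # strip the quotes and the trailing comma
      print $2
      exit
    }
  ' "$1"
}

# Hypothetical install sequence, run inside the frontend directory:
# skip post-install scripts, then install the Linux binary explicitly.
install_frontend_deps() {
  version=$(esbuild_version_from_lock package-lock.json)
  [ -n "$version" ] || { echo "esbuild not found in lock file" >&2; return 1; }
  npm ci --ignore-scripts &&
    npm install --no-save --ignore-scripts "@esbuild/linux-x64@$version"
}
```

Because esbuild's post-install validation never runs, a mismatched binary would only surface at build time, so it seems worth keeping the version lookup and the explicit install next to each other.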

Test plan

  • git pull && bash tools/build_frontend.sh completes with ✓ built in Xs on montage-beta
  • montage/static/assets/ contains freshly built files
  • No errors in --- errors --- section of script output
  • Service responds correctly after toolforge webservice python3.11 restart

🤖 Generated with Claude Code

lgelauff and others added 9 commits April 27, 2026 17:51
Runs npm install + toolforge:build via `toolforge jobs run` to avoid
OOM crashes in the interactive webservice shell. Works for any tool
account (montage-beta, montage-dev) after `become <account>`.

Usage: bash tools/build_frontend.sh
Without frontend/.env.production, the .env file is gitignored on
Toolforge so VITE_API_ENDPOINT is undefined at build time, causing
Axios baseURL to resolve to undefined/v1/ and breaking all API calls.

Verified working on montage-dev (2026-04-20).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
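
Since a missing `frontend/.env.production` only shows up as broken API calls after deployment, a pre-build guard may be useful. The sketch below is an assumption about how such a check could look; the function name is hypothetical and nothing here is taken from the script itself.

```shell
# Return 0 if the given env file defines VITE_API_ENDPOINT.
# An empty value counts, because an empty endpoint is what yields a
# relative API base URL in production builds.
has_api_endpoint() {
  [ -f "$1" ] && grep -q '^VITE_API_ENDPOINT=' "$1"
}

# Example guard before a build (path relative to the repo root):
#   has_api_endpoint frontend/.env.production || {
#     echo "frontend/.env.production must define VITE_API_ENDPOINT=" >&2
#     exit 1
#   }
```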
- Wrap command in bash -c so && is interpreted by a shell
- Use npm ci instead of npm install (strict, never modifies lock file)
- Restore package-lock.json from git before building to prevent binary version drift
- Derive log paths from tool name instead of hardcoding montage-beta
- Replace unreliable toolforge jobs logs polling with tail -f on the output file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nary fix

- Use --wait flag on toolforge jobs run instead of polling
- Derive esbuild version from package-lock.json and install matching
  @esbuild/linux-x64 binary explicitly (workaround for npm 9 in node20
  image not installing correct optional platform dep)
- Show logs with cat after job completes instead of tail -f polling
- Drop rm -rf node_modules (not needed with explicit binary fix)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
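
The job invocation described in these commits might look something like the sketch below. The job name, the frontend path under `$HOME`, and the exact npm commands are assumptions; only `--image node20`, the 4Gi memory request, `--wait`, and the `bash -c` wrapper come from the description above.

```shell
# Submit the frontend build as a Toolforge job and block until it finishes.
# Setting TOOLFORGE=echo prints the arguments instead of submitting a job.
run_build_job() {
  # The whole chain is passed to bash -c so that && is interpreted by a
  # shell inside the job container rather than treated as an argument.
  inner='cd "$HOME/www/python/src/frontend" && npm ci --ignore-scripts && npm run toolforge:build'
  ${TOOLFORGE:-toolforge} jobs run build-frontend \
    --image node20 --mem 4Gi --wait \
    --command "bash -c '$inner'"
}
```

A dry run such as `TOOLFORGE=echo run_build_job` prints the arguments, which is a cheap way to review the quoting before submitting a real job.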
esbuild's install.js post-install script validates the platform binary
version immediately during npm install. With a stale npm cache serving
@esbuild/linux-x64@0.25.5, it fails before we can replace the binary.
--ignore-scripts skips the validation; the explicit binary install that
follows provides the correct version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Truncate log files before each run so stale errors from previous
  attempts don't appear in output
- Filter EBADENGINE/WARN noise from stderr; only show real errors
- Consolidate comments explaining the --ignore-scripts workaround

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
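
The log handling described above can be approximated with two small helpers. This is a sketch under assumptions: the function names are hypothetical, and the filter pattern simply mirrors the EBADENGINE/WARN wording of the commit message.

```shell
# Empty a log file in place (creates it if it does not exist), so output
# from earlier attempts cannot leak into the current run.
truncate_log() {
  : > "$1"
}

# Print the stderr log without npm warning noise.
# grep exits non-zero when every line is filtered out, hence `|| true`.
show_real_errors() {
  echo "--- errors ---"
  grep -v -E 'EBADENGINE|WARN' "$1" || true
}
```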
Add a KNOWN WORKAROUNDS header distinguishing temporary hacks (esbuild
binary mismatch due to npm 9 / cross-platform lock file) from permanent
requirements (VITE_API_ENDPOINT in .env.production), with pointers to
root causes and what removes each workaround.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the manual webservice shell approach (npm install + build in an
interactive node20 shell) with the build script, which handles the
esbuild binary workaround and runs as a proper Toolforge job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…/patch

- vite 6.3.6 → 6.4.2 (fixes 3 high-severity dev-server CVEs)
- vue 3.5.22 → 3.5.33
- axios 1.12.2 → 1.15.2
- dayjs 1.11.18 → 1.11.20
- prettier 3.6.2 → 3.8.3

Remaining 12 vulnerabilities are all in Cypress test deps or build
tooling and do not affect the production deployment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lgelauff marked this pull request as ready for review April 27, 2026 18:31
lgelauff and others added 20 commits April 27, 2026 20:32
…flicts

npm install inside the Toolforge job modifies package-lock.json on the
bastion's NFS mount. This causes 'git pull' to abort with a merge
conflict on the next deploy. Restore the file from git at the start of
the script so it's always clean before the job runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
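
Restoring the lock file before each build could be as small as the helper below. It is a sketch that assumes the script runs from the repository root and that the lock file lives at `frontend/package-lock.json`; the function name is hypothetical.

```shell
# Discard local modifications to the lock file so that a later
# `git pull` cannot stop on a conflict in it.
restore_lockfile() {
  git checkout -- "${1:-frontend/package-lock.json}"
}
```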
The build script now restores package-lock.json from git before running,
so git pull + bash tools/build_frontend.sh is the single deploy command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- README: add GitHub Issues link, mark Phabricator as archived, add pointer to dev.md
- dev.md: fix Node.js version (v16→v18), rdb.py line number (113→119), and project tree (filenames, removed non-existent config/ dir)
- .gitignore: add .claude/ and tmp/
- app.py: fix /a/ StaticFileRoute to point to static/index.html (not static/a/index.html which no longer exists)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…static files

- dev.md: document debug:true and userid options for bypassing OAuth locally
- config.default.yaml: add debug:false as an optional documented field
- Remove montage/static/a/index.html and static/dist/ (obsolete pre-Vue frontend files)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…keys

Add debug_userid and debug_username config options so developers can set
which account they are auto-logged in as when debug: true, instead of
always defaulting to Slaporte. Document the /complete_login workaround
for bypassing OAuth locally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- deployment.md: correct db_url format (mysql+pymysql://), document
  log paths for both production and beta instances separately
- dev.md: remove non-existent config/ directory from project tree,
  fix Dockerfile -> dockerfile reference in Docker files section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…evision/filetypes

Closes #425. Relates to #505.

Changes:
- labs.py: new query using file/filerevision/filetypes tables; adds file_id;
  SELECT DISTINCT to prevent duplicates from multiple linktarget rows
- rdb.py: add file_id column to Entry; deduplicate entries case-insensitively
  in add_entries() to match MariaDB utf8mb4_unicode_ci collation
- loaders.py: pass file_id through make_entry(); update export dicts
- tests: update fixtures and assertions for new query shape
- tools/migrate_prod_db.sql, revert_prod_db.sql: production schema migration
- requirements.txt: cffi 1.17.1, setuptools pin for Python 3.13
- deployment.md: update python3.9 → python3.13 references

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without frontend/.env.production, the .env file is gitignored on
Toolforge so VITE_API_ENDPOINT is undefined at build time, causing
Axios baseURL to resolve to undefined/v1/ and breaking all API calls.

Verified working on montage-dev (2026-04-20).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n debugging section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Step 6: replace vague bullet list with cp command, chmod, and annotated config template
- Step 7: fix venv creation (python3.13, --without-pip + curl bootstrap)
- Step 8: fix path (tools/create_schema.py, not montage/create_schema.py)
- Fix python3.11 → python3.13 throughout
- Fix log paths (no logs/ subdirectory)
- Fix step numbering in "Deploying new changes" (was 1,2,3,5,8,9)
- Add cd ~ before restart commands (must run from ~, not repo)
- General grammar and clarity pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…step

Brings deployment.md up to date from tools/build-frontend-script (PR #534):
- Replace vague bullet instructions with actual commands
- Add config template with openssl cookie secret generation
- Fix venv creation (python3.13, --without-pip + curl bootstrap)
- Fix tools/create_schema.py path
- Fix log paths (no logs/ subdirectory)
- Add step 0 (optional clean slate) with backup of irreplaceable files
- Add venv rebuild procedure to debugging section
- Fix python3.11 → python3.13 throughout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ge reinstall

- reinstall.sh: backs up config, wipes home dir, clones fresh, restores config,
  builds frontend, inits schema. Interactive confirmations before destructive steps.
  Refuses to run on production (montage) account. Detects service running under
  unexpected Python version and warns.
- reinstall_venv.sh: rebuilds venv inside webservice shell pod using --without-pip
  + curl bootstrap (avoids subprocess hang in pod).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dump usernames with is_organizer=1 to ~/backup/organizers.txt before
the wipe, then re-insert them after schema init so organizer access
survives a clean-slate reinstall.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
deploy.sh handles routine updates (git pull + build + restart), with
--no-frontend and --pip flags. tools/steps/pip_install.sh and
restart_service.sh are reusable building blocks; reinstall_venv.sh
now delegates pip install to the step script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
lgelauff and others added 29 commits May 3, 2026 15:34
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
toolforge webservice stop returns non-zero when no service is running,
causing set -e to silently exit the script. Add || true to the command
substitution so the exit code is always 0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
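
The `|| true` pattern from this fix is illustrated below as a minimal sketch. The wrapper name is hypothetical, and the stop command is passed in as arguments so the helper is not tied to one CLI.

```shell
# Run a stop command and keep its output, without letting a non-zero
# exit status abort a script that runs under `set -e`.
# Usage: stop_quietly toolforge webservice stop
stop_quietly() {
  stop_out=$("$@" 2>&1) || true
  printf '%s\n' "$stop_out"
}
```

Without the `|| true`, a plain `stop_out=$(...)` assignment takes the exit status of the command substitution, so `set -e` would end the script at that line with no message.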
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If a previous run created ~/backup/ and exited early, the directory
may have wrong permissions on the next run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Avoids permission errors when re-running reinstall.sh after a partial
run — if the backup already matches the source, no copy is needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three-pass wipe for NFS-backed directories; clone step explicitly
removes any leftover src directory before cloning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If running from NFS (/data/project/), copy script to /tmp and re-exec
before the wipe so bash holds no open handles on the NFS filesystem.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
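
A rough sketch of the relocate-and-re-exec step follows. The `/data/project/` prefix comes from the commit message; the helper name and the `/tmp` file name are assumptions.

```shell
# Return 0 when a script path lives on the tool's NFS-backed home.
on_nfs() {
  case "$1" in
    /data/project/*) return 0 ;;
    *) return 1 ;;
  esac
}

# At the top of the script, before anything destructive (sketch):
#   self=$(readlink -f "$0")
#   if on_nfs "$self"; then
#     cp "$self" "/tmp/reinstall.$$.sh"
#     exec bash "/tmp/reinstall.$$.sh" "$@"
#   fi
```

After the `exec`, the running copy sits under `/tmp`, so the check fails on the second pass and the script does not loop.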
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The webservice pod holds NFS file handles. Without waiting for it to
terminate, rm -rf fails silently on the open directories.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Instead of silently failing at git clone, detect when ~/www/python/src
survives the wipe (webservice pod still holding NFS handles), try a
file-by-file deletion as a second attempt, and if still stuck exit with
clear step-by-step recovery instructions.
With set -e, CLONE_ERR=$(git clone ...) exits the script as soon as the clone
fails, before if [ $? -ne 0 ] can run. Switch to the $() || { } form.

Also detects 'already exists' in the error and prints the NFS-lock
recovery steps inline rather than showing a generic error.
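
The `$() || { }` form mentioned above can be sketched as follows. The function name and the hint text are placeholders, not the wording used by reinstall.sh.

```shell
# Run a clone command, capturing its output. On failure, print either a
# targeted hint or the raw error, then return non-zero.
# Usage: clone_with_hint git clone <url> <dest>
clone_with_hint() {
  clone_err=$("$@" 2>&1) || {
    case "$clone_err" in
      *"already exists"*)
        # Placeholder hint; the real script prints recovery steps here.
        echo "Destination still exists (possible NFS lock); see the recovery steps." >&2
        ;;
      *)
        printf '%s\n' "$clone_err" >&2
        ;;
    esac
    return 1
  }
}
```

Because the assignment is the left-hand side of `||`, `set -e` does not fire on a failed clone and the handler block gets to run.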
rm -rf removes contents but NFS prevents removing the directory itself.
Strategy: delete all files first, then rmdir the empty directory — rmdir
succeeds even when rm -rf on the parent tree fails.
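
The files-first strategy might be sketched like this. The helper name is hypothetical, and the behavior on a real NFS mount with open handles is not something this example can demonstrate.

```shell
# Remove a directory tree in two passes: everything that is not a
# directory first, then the now-empty directories from the deepest level up.
# Returns 0 only if the top-level directory is gone afterwards.
remove_tree_two_pass() {
  [ -d "$1" ] || return 0
  find "$1" ! -type d -exec rm -f {} +
  find "$1" -depth -type d -exec rmdir {} \; 2>/dev/null
  [ ! -e "$1" ]
}
```

Returning non-zero when the directory survives lets the caller decide whether to retry or fall back to another approach.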
If the src directory survives all deletion attempts (NFS lock on the
tools/ subdirectory), fall back to initialising a git repo in-place
and doing fetch + reset --hard + clean instead of clone.
- Preserve ~/.kube/ during reinstall wipe so toolforge-jobs keeps its
  kubeconfig and the frontend build doesn't break.
- Use Python import instead of regex to get ENV_NAME, with 'dev' as
  fallback. Guard against ENV_NAME='default' so credentials never get
  written into the committed config.default.yaml template.
…t from lockfile

The lockfile version and the version npm actually installs can diverge
(cross-platform lockfile + npm 9), causing a host/binary mismatch.
Reading the version after npm install ensures they always match.
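
Reading the version from the installed package rather than the lock file could look roughly like this. The function name is hypothetical, and `--no-save` in the commented follow-up step is an assumption.

```shell
# Print the version of the esbuild package that npm actually installed.
# $1 is the frontend directory that contains node_modules/.
installed_esbuild_version() {
  sed -n 's/^[[:space:]]*"version":[[:space:]]*"\([^"]*\)".*/\1/p' \
    "$1/node_modules/esbuild/package.json" | head -n 1
}

# Sketch of the follow-up step:
#   v=$(installed_esbuild_version frontend)
#   (cd frontend && npm install --no-save --ignore-scripts "@esbuild/linux-x64@$v")
```
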
Curl-bootstrapping pip is faster than a full venv rebuild when only pip
itself is missing. Seen on montage-beta 2026-05-04.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>