Single-file installer for a club-3090 inference host, providing a full-featured browser admin control panel, distro-aware install/update logic, systemd service setup, a reverse proxy that automatically routes requests to the correct containers via a single endpoint with optional configurable vLLM setting presets, live log management, GPU power and fan speed controls, multi-instance GPU orchestration, and API access control for multiple users.
This repository is the server-management layer. It integrates with the upstream club-3090 runtime and, on a fresh install clones that upstream repo automatically into /opt/ai/club-3090 before wiring up the control plane.
- Users are encouraged to first deploy club-3090 to /opt/ai/, follow the Quick Start Guide in the original project and ensure that everything works correctly before attempting to deploy Club-3090-Server. If /opt/ai/club-3090 is not found, the script will automatically clone the latest repo for you and deploy it in that location and attempt to run setup.sh and set everything up for you anyway, but it's best to first familiarize yourself with the original project's strengths and ensure it meets your needs and works correctly before attempting to host a server.
- The latest release of this script may or may not be up to date with the original repo's fast evolving architecture. To avoid unforeseen issues, make sure the commit of club-3090 you're using with club-3090-server matches the expected version from the latest release. You may also simply trust the installer and let it pull the latest commit but your YMMV depending on whether the original project has introduced breaking changes since this script was last updated.
- Fully self contained script that handles everything you need to self-host a club-3090 based inference server, optimized for headless Arch and Debian/Ubuntu based distros
- A control stack installed under
/opt/club3090-controland an admin web panel running onlocalhost:8008/admin - An OpenAI-compatible proxy on
:8009so you can chat with the LLM on a unified port regardless of what docker containers are in use. - GPU-aware backend selection with per-GPU and multi-instance runtime management for single-card, dual-card or multi-card presets with dynamic runtime inventory discovery from the local
club-3090checkout - A modern scope model built around
GLOBAL,GPU<n>, andPAIRx_ytargets, including built-in sequential auto-pairs and custom user-defined pairs - A Presets experience that includes model-summary relaunch surfaces, Setup Assistant recommendations, selector-scoped launch overrides, and a first-class Custom Model import path backed by upstream
pull.sh - Built-in local inference chat with server-persisted conversations across browsers/devices, per-conversation saved settings, folders, automatic conversation naming, system-prompt templates, rich markdown/media rendering, collapsible thinking trace rendering, local attachments, and automatic context compaction/rollover
- Live Docker, Audit, and Debug log streaming that stays aware of multiple backend containers, including an interactive debug-shell lane with autocomplete, history, upload/download helpers, and zip-based folder transfers
- Preset launch state that only flips to
Activeafter the backend fully finishes booting, with boot/error browser notifications and fast failure reporting when a compose never actually starts a container - UI-driven model download/setup jobs with stdout/stderr streamed into Audit Logs
- Hardware metrics, uptime, health reporting, and backend auto-start and manual control, as well as full server remote management controls
- Optional GDDR6/GDDR6X junction and VRAM temperature telemetry in the Main GPU cards and Metrics graphs, using a vendored helper source/header compiled locally when supported
- Power presets, fan speed control, idle downclocking, and system information for remote server management
- Create dynamic instances and run multiple docker containers per GPU or GPU Pairs and be able to run inference on a single unified endpoint with different configurable vLLM setting presets
- Optional per-user API key authentication, access control, and optional quota enforcement
- An optional loopback-only automation API for local tools on the same machine
- Optional
--onlinemode that opens and forwards only the admin/proxy ports to the internet - Optional
--online use-httpsmode that enables a Caddy-backed self-signed HTTPS frontend for the admin panel and proxy - Docker logs automatically printed to tty1 and last instance autostart if the kernel condition club3090.server=1 is active
- Optional
--skip-tempsinstaller switch for operators who do not want the extra temperature helper, dependency install, bootloaderiomem=relaxedstaging, or reboot notice
This section is written for someone who just wants the server working and may not be so familiar with the complexities of LLM configuration.
- club-3090 is the stack that actually runs the AI models
- this server script adds a browser-based admin panel, a stable proxy port, logs, power controls, and easier model/preset management
- after setup, you can control everything from the admin panel instead of manually typing long terminal commands
Make sure your Linux machine already has:
- a supported NVIDIA GPU
- working NVIDIA drivers
- Docker
- internet access
- a normal Linux user account that can use
sudo
If you plan to download gated Hugging Face models, also have your Hugging Face token ready.
By default the installer also prepares optional GDDR6/GDDR6X junction and VRAM temperature telemetry for supported NVIDIA cards. It vendors the helper source and NVIDIA NVML header inside the installer, compiles the helper locally against the host NVIDIA/libpci libraries, stages the required iomem=relaxed kernel command-line flag when it is not already active, and tells you to reboot only when that reboot is needed for the new readings to work.
Run the installer directly:
curl -fsSL https://tinyurl.com/club-3090-webserver | bashIf you already downloaded this repo locally, you can also run:
bash install-club3090-server.shIf you need gated model downloads and already have a Hugging Face token, use:
curl -fsSL https://tinyurl.com/club-3090-webserver | \
HF_TOKEN=hf_xxx bashOn a fresh machine, the installer will create /opt/ai/ if needed, clone the upstream club-3090 repo into /opt/ai/club-3090, fix script permissions, and then continue with the normal setup.
When the installer finishes, open a browser and go to:
http://YOUR-SERVER-IP:8008/admin
If you installed with custom ports, replace 8008 with your chosen admin port.
When the login screen appears, sign in with your normal Linux user credentials.
Use:
- your Linux username
- your Linux password
In other words, log in with the same account you would normally use for sudo or for signing into that Linux machine. The admin panel does not create a separate default password for you.
After logging in, open the Presets tab.
You will see discovered model presets from the local club-3090 checkout. Some presets may show that downloads are still required.
If a preset is not ready:
- Click its
Downloadbutton. - Wait for the download/setup job to finish.
- Watch progress in
Audit Logs.
If you prefer to prepare the model manually in the terminal first, a common example is:
cd /opt/ai/club-3090
bash scripts/setup.sh qwen3.6-27bSome advanced presets, such as DFlash variants, may require extra model files. The admin panel will usually tell you what command is needed.
Still in the Presets tab:
- Pick the model you want.
- Pick a preset that says it is ready.
- Click
Apply. - The UI will jump into Docker Logs immediately so you can follow the launch.
- Wait while the instance launches.
- When startup completes successfully, the ready badge flips to
Active, the card shows how long launch took, and the action button becomesStop
If something fails, the preset will show Error and the logs will show what happened.
For most beginners:
- Use a single-GPU preset if you only want one model running on one GPU, or the global scope to automatically distribute presets to all available GPUs.
- Use the default or recommended preset first before trying special variants like long-context, dual-card, turbo, or DFlash
- Test if inference is working correctly via the built-in chat interface and watch the generated statistics
Once a preset is active, the proxy is usually available on:
http://YOUR-SERVER-IP:8009/v1
That proxy gives you one stable API endpoint even if you switch between different backends later.
You can test it with:
curl http://YOUR-SERVER-IP:8009/v1/modelsOr with a chat request:
curl -s http://YOUR-SERVER-IP:8009/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"qwen3.6-27b-autoround","messages":[{"role":"user","content":"Hello"}],"max_tokens":128}'Here is the simple mental model for the main tabs:
Main: quick status, running containers, uptime, and GPU overviewSystem: service controls, power/fan controls, and machine informationPresets: download models, start/stop presets, and manage which runtime is activeChat: built-in local streaming chat client with server-backed conversation storage, lazy detail loading, per-conversation stats caching, share/export links, configuration, and local or remote MCP serversUsers: create API users and keys if you want to share the server safelyMetrics: request counts, usage history, and runtime performance dataLogs: Docker logs and audit logs for troubleshooting
After installation, most people will want to do these things:
- Log in with their Linux username and password.
- Open
Presets. - Download the model files the desired preset needs.
- Click
Launchon a recommended preset. - Wait until it becomes
Active. - Copy the proxy base URL
http://YOUR-SERVER-IP:8009/v1into their AI client.
Most OpenAI-compatible apps need:
- Base URL:
http://YOUR-SERVER-IP:8009/v1 - API key: only if you enabled user/API-key protection
- Model name: whichever model the active backend serves
If the client asks for an endpoint and does not mention presets, start with the plain proxy URL above.
Try these checks in order:
- Confirm the admin panel opens.
- Confirm you can log in with your Linux user credentials.
- Confirm the preset you selected finished downloading.
- Confirm the preset reached
Active, not justBooting. - Read
Docker LogsandAudit Logsfor the exact failure message. - Confirm Docker and the NVIDIA driver are working on the Linux host.
If you changed the upstream repo manually or pulled a new upstream version, run update or migrate again so the control layer and upstream runtime stay in sync.
- Arch
- Manjaro
- EndeavourOS
- Ubuntu
- Debian
- Linux Mint
- Pop!_OS
The script reads /etc/os-release and installs the right package names for the detected family.
- Linux with
systemd - NVIDIA drivers already installed
- Docker already installed and usable
- Docker Compose plugin or compatible
docker compose - Internet access so the installer can clone the upstream
club-3090repo into/opt/ai/club-3090on first install
As of v0.5, the server no longer ships a hardcoded preset catalog. Instead it scans the local upstream repo under /opt/ai/club-3090, parses compose headers and compose files directly, and builds /opt/club3090-control/runtime_inventory.json.
That means the Presets tab now reflects whatever models and variants exist in the checked-out upstream repo, including:
- single-GPU presets
- dual-GPU presets
- multi-GPU presets
- experimental or caveat-marked upstream variants
Upstream switch tags such as vllm/default, vllm/dual, vllm/gemma-mtp, and llamacpp/default are still recognized when present, but the control layer can also launch compose variants that are only discoverable by scanning the repo tree.
Since the broader v0.8.x and v0.9.x migrations, the inventory path is also registry/profile-aware and understands the newer nested upstream compose hierarchy, so the admin panel can surface richer upstream metadata such as:
- preview, production, experimental, and caveat-marked variants
- structured launch defaults pulled from compose/profile metadata instead of a local hardcoded table
- compose-registry aliases, shared profile resources, and newer upstream families such as
autoround-int4,awq,unsloth-q4km,ubergarm-iq4ks, andex0bit-prism - profile-driven asset planning for model paths, draft models, projectors, Hugging Face cache-backed assets, directory-backed vLLM resources, and setup-script driven resources
- family summaries and model-summary relaunch cards
- built-in and custom API-preset aware routes
- Custom Model imports that register as persistent local runtime cards once upstream import succeeds
The older single-mode/legacy-dual distinction has gradually been replaced by a clearer scope-oriented control model.
The admin panel now thinks in terms of:
GLOBAL: fan a compatible preset out across every eligible target automaticallyGPU0,GPU1,GPU2, ...: explicit single-card targetsPAIR0_1,PAIR2_3, ...: dual-card targets- built-in automatic sequential pairs plus custom user-defined pairs
Important behaviors added across the v0.7.x and v0.8.x series include:
- built-in auto-pairs are preserved separately from custom pairs
- Global single-card and Global dual-card launches fan out in parallel instead of waiting for each target to finish before starting the next one
- generated compose overrides now keep host GPU reservation separate from the in-container CUDA ordinal view, which is critical for higher-index GPUs and pairs
- preset cards can expose selector-scoped Launch Settings overrides sourced from compose/profile metadata
- preset cards show richer disk/download status, shared-resource markers, provenance tooltips, and safer delete dialogs that preview the exact files and reclaim size
- per-runtime Generation Statistics reflect the actual launched context and runtime state rather than only a static catalog guess
- Setup Assistant / Preset Recommendation Survey can recommend curated presets based on detected hardware and desired rollout style
If you use many presets often, the Summary view is also worth calling out. It keeps a relaunch-oriented cache of recent and active presets per model, including temporary boot entries and bulk stop/restart actions.
One of the biggest v0.8.x additions is the first-class Custom Model flow in the Presets tab.
Instead of manually treating every non-curated model as a one-off local hack, the panel now supports:
- a dedicated
[+] Custom Modelentry point - import flows that delegate to upstream
scripts/pull.sh - streaming import progress and gate/caveat output into Audit Logs
- reference-profile guidance so imports can reuse a known-good compose shape
- persistent registration of successful imports as local runtime cards
- removal flows that clean up the registered runtime state instead of leaving orphaned cards behind
This keeps the UI model-first while still deferring the actual import/evaluation logic to upstream Club-3090.
Default install:
curl -fsSL https://tinyurl.com/club-3090-webserver | bashCustom upstream checkout path and default mode:
curl -fsSL https://tinyurl.com/club-3090-webserver | \
CLUB3090_DIR=/opt/ai/club-3090 DEFAULT_MODE=vllm/dual-dflash bashPass a Hugging Face token inline when gated downloads are needed:
curl -fsSL https://tinyurl.com/club-3090-webserver | \
HF_TOKEN=hf_xxx bashRunning a local copy:
bash install-club3090-server.shCustom admin and proxy ports:
bash install-club3090-server.sh --ports 18008:18009Enable the local automation API:
bash install-club3090-server.sh --local-automationPersist a Hugging Face token into the local club-3090 repo .env so setup jobs and later admin-panel model downloads can reuse it automatically:
bash install-club3090-server.sh --hf-token hf_xxxPublic internet exposure:
bash install-club3090-server.sh --onlinePublic internet exposure with TLS layer:
bash install-club3090-server.sh --online use-httpsSkip optional junction/VRAM temperature telemetry setup and keep the old install/update behavior:
bash install-club3090-server.sh --skip-temps--skip-temps bypasses helper unpacking, compiler/libpci dependency installation, bootloader iomem=relaxed staging, and the related reboot notice.
Updates the control layer without intentionally stopping an already-running backend:
curl -fsSL https://tinyurl.com/club-3090-webserver | bash -s -- --updateOr from a local copy:
bash install-club3090-server.sh --updatePublic-mode update:
bash install-club3090-server.sh --update --onlineUpdate with custom ports:
bash install-club3090-server.sh --update --ports 18008:18009Use --migrate when the upstream repo checkout itself needs to be replaced with a fresh clone while preserving downloaded model assets and runtime state:
curl -fsSL https://tinyurl.com/club-3090-webserver | bash -s -- --migrateOr from a local copy:
bash install-club3090-server.sh --migrateIf an interrupted migration needs to be discarded and restarted from scratch instead of resumed:
bash install-club3090-server.sh --migrate restart--migrate:
- compares the local checkout head to the latest upstream head and skips repo replacement entirely if they already match
- renames the old checkout to a timestamped backup
- clones a fresh upstream
club-3090repo into the original path - preserves
models-cache, safe compose cache directories, and.env - logs the resolved old and new
MODEL_DIRpaths during migration and mirrors preserved weights into the effective post-merge model directory when it differs from the repo-local default - normalizes older preserved Qwen GGUF assets into the paths expected by the newer upstream compose/profile tree
- preserves old Genesis patch trees in the migration backup instead of copying them back into the fresh repo before
setup.shreclones them - rebuilds the dynamic runtime inventory
- replays the required upstream setup commands and re-runs only the model setup commands required by the currently configured presets
- writes resumable migration state to
/opt/club3090-control/migration-state.env - emits continuous step-by-step progress and heartbeat messages during long-running phases like clone, setup, and update
- resumes interrupted migrations automatically on the next
--migrate - deletes the resume-state file automatically after a successful migration
- accepts Hugging Face credentials either through
HF_TOKEN=...or--hf-token ..., then stores that token in the repo.envso later setup and admin-panel download jobs can reuse it
If DEFAULT_MODE is not set, the installer counts GPUs with nvidia-smi and chooses:
vllm/defaultwhen 0 or 1 GPU is detectedvllm/dualwhen 2 or more GPUs are detected
For testing or overrides, you can force the detected count:
CLUB3090_GPU_COUNT_OVERRIDE=1 bash install-club3090-server.shYou can still hard-set a default mode:
DEFAULT_MODE=vllm/long-text bash install-club3090-server.shWhen the server is using single-card presets, the control service can manage one backend instance per GPU. Each instance has:
- its own GPU binding
- its own docker project/container naming
- its own backend port
- its own admin-panel state
- its own proxy path prefix
- its own boot autostart toggle
The instance catalog is stored in:
/opt/club3090-control/instances.json
On multi-GPU systems, this lets you run:
- the same single-card preset on every GPU
- different single-card presets on different GPUs
- only a subset of GPUs
This is intentionally separate from the legacy dual-GPU presets. If no per-GPU single-card instances are enabled, the control service can still fall back to the old upstream dual-mode flow.
In newer releases this model also extends cleanly into the scope-based control plane:
- instance relaunch, boot tracking, and log handoff are scope-aware
- popout log viewers keep the selected runtime target instead of collapsing back to a generic bootstrap snapshot
- runtime readiness for non-vLLM stacks such as llama.cpp and ik-llama is tracked by real local API reachability, not only by vLLM-style log markers
- higher-index GPU scopes and pair scopes are protected by the remapped env/export fixes added during the
v0.8.4xcycle
When you install or update with --online, the script treats the control service as the only public surface.
That means:
- only the configured admin and proxy ports are opened in the firewall
- only the configured admin and proxy ports are attempted for UPnP forwarding
- raw backend ports such as
8010-8020and per-instance ports like8200+are intentionally left private - all internet-facing inference traffic is expected to go through the proxy
This prevents security bypass around API-key checks, user limits, and per-instance access controls.
If you also pass use-https, the installer binds the Python control service to loopback-only internal ports and places Caddy in front of it as the public HTTPS endpoint for both admin and proxy traffic.
When HTTPS mode is enabled, the installer places Caddy in front of the admin UI and proxy on the public admin/proxy ports. Direct IP access uses the self-signed certificate path under /opt/club3090-control/tls.crt and /opt/club3090-control/tls.key by default. For a browser-trusted certificate, point a DNS or DDNS hostname at the server and set CLUB3090_HTTPS_HOST or CLUB3090_ONLINE_HOST; Caddy will then request and renew a normal Let's Encrypt certificate for that hostname. Direct IP Let's Encrypt certificates are only attempted with the explicit --online use-https:cert-ip mode, or when you explicitly set an IP host and CLUB3090_ENABLE_IP_LE=true.
If the server is already on Tailscale, --online use-https:tailscale detects the node's MagicDNS name, such as aardwolf-halfmoon.ts.net, and writes Caddy routes for that hostname. This avoids router port-forwarding for certificate issuance and is intended for access from devices inside the tailnet. Public internet access still requires Tailscale Funnel or a separately configured public hostname/DDNS path.
--online use-https:tailscale:enable-funnel additionally configures Tailscale Funnel. Because Funnel only exposes HTTPS on selected public ports, the installer maps the admin UI to https://YOUR-NODE.ts.net/admin on public port 443 and the proxy API to https://YOUR-NODE.ts.net:8443/v1/chat/completions. Funnel may require interactive approval in Tailscale or a tailnet policy update the first time it is used.
Tailscale HTTPS modes skip UPnP installation, router port-forwarding, and UPnP cleanup because tailnet/Funnel routing does not depend on router NAT mappings. The installer materializes the Tailscale certificate with tailscale cert into /opt/club3090-control/tailscale.crt and /opt/club3090-control/tailscale.key, then points the hostname-specific Caddy route at those files.
The admin and proxy ports are configurable at install/update time with:
--ports ADMIN:PROXY
Example:
bash install-club3090-server.sh --ports 18008:18009Defaults remain:
- admin:
8008 - proxy:
8009
The chosen ports are propagated into the generated control service and, when --online is used, those same ports are the ones opened in the firewall and requested through UPnP. In use-https mode, those public ports are owned by Caddy while the control service moves behind it onto internal loopback-only ports. The installer also opens ports 80 and 443 when needed so Caddy can complete ACME validation for automatic certificates.
- Admin UI:
http://SERVER:8008/admin - Default proxy base:
http://SERVER:8009/v1 - Default chat completions:
http://SERVER:8009/v1/chat/completions
Per-GPU routes are also supported:
http://SERVER:8009/GPU0/v1/chat/completionshttp://SERVER:8009/GPU1/v1/chat/completionshttp://SERVER:8009/GPU2/v1/chat/completions
Custom preset routes also work globally and per GPU:
http://SERVER:8009/v1/<preset>/chat/completionshttp://SERVER:8009/<preset>/v1/chat/completionshttp://SERVER:8009/GPU0/v1/<preset>/chat/completionshttp://SERVER:8009/GPU1/<preset>/v1/chat/completions
Length-capped variants remain available with short- and concise- prefixes.
The admin web panel uses Linux account credentials through pamtester and is served from:
http://SERVER:8008/admin
With --online use-https, the public entrypoint becomes:
https://SERVER:8008/admin
The inference proxy can run in either mode:
- open proxy mode, where requests to
:8009are allowed without a per-user API key - authenticated proxy mode, where OpenAI-compatible API keys are required
This is controlled from the Users tab in the admin panel. In --online installs, authenticated proxy mode is enabled by default.
Per-user API entries support:
- generated API keys
- optional group/plan membership for shared policies
- enable/disable state
- allowed backend targets such as
legacy,GPU0,GPU1, or* - total request limits
- request limits per 5 hours
- request limits per week
- total token limits
- total tool-call limits
- total thinking-time limits
The proxy enforces those controls before forwarding a request to the backend.
User groups/plans can define shared allowed-target rules and quota defaults, so multiple users can inherit the same service tier without repeating the whole policy by hand.
If installed or updated with --local-automation, the control service also exposes a loopback-only API:
http://127.0.0.1:10881
This API is intentionally not part of the online/public surface. It is:
- bound only to loopback
- protected with a local token stored in
/opt/club3090-control/local_api_token
Typical uses:
- automatic user creation
- automatic user deletion
- user quota updates
- API key rotation
The current implementation exposes local user-management and server-config endpoints for same-machine tools.
--online, use-https, and --local-automation are explicit opt-in flags on each install or update run. If you omit them on a later --update, the installer turns those surfaces back off and removes the tracked firewall and UPnP exposure it previously created.
The admin UI is designed to control the whole server from one place. It exposes:
- current active backend, service state, and uptime
- request counters and recent latency/token metrics
- GPU, RAM, CPU, storage, and network telemetry
- optional GPU core, junction/hotspot, and VRAM temperature telemetry when the helper is installed and the kernel allows MMIO access
- per-instance GPU subtabs for multi-GPU systems
- dynamic model cards for upstream-discovered single, dual, multi, and experimental presets
- model-summary view that caches up to the latest five active or previously active presets per model as a quick relaunch surface, including temporary boot entries and bulk stop/restart controls
- Setup Assistant / Preset Recommendation Survey for guided preset selection from detected hardware and workload intent
- first-class Custom Model import, registration, and removal flows
- per-instance preset assignment for single-card runtimes
- per-instance start, restart, stop, and boot-autostart toggles with boot-progress log handoff, active/booting/error state badges, and launch-time reporting
- one-click runtime inventory rebuild from the web panel
- preset-aware model downloads that stream installer output into Audit Logs
- direct Hugging Face downloads, setup-script downloads, and missing-size file probes report progress through the same Audit Log job path
- selector-scoped Launch Settings / engine-switch override editing sourced from compose/profile defaults
- per-runtime generation stats cards that aggregate live latency, TTFT, throughput, context fill, KV-cache usage, prefix-cache, speculative/MTP drafted and accepted tokens, acceptance rate, and token counters across all running instances
- a local inference chat interface with realtime streaming, container and API-preset selection, richer markdown rendering, multi-attachment image/text upload support, paste-to-Markdown attachment conversion for long pastes, optional browser voice dictation, shareable Markdown conversation exports, a compact modal chat-settings editor, per-conversation plus per-runtime generation stats, and archived-chat restore flows
- chat conversations are stored server-side in
/opt/club3090-control/conversations/state.json, with the conversation list loading first and individual transcripts fetched on demand to keep large histories responsive - chat conversations archive by default from the Chats tab so they can be restored later from Chat Options; hold
Shiftwhile using the archive button for permanent deletion instead - assistant transcript bodies now favor a plain-text presentation path for stability, while preserving the expandable reasoning panel, attachments, exports, and per-runtime stats
- live chat recovery across refreshes, including persisted partial assistant output and server-side stream-state tracking
- per-conversation stop/recovery handling so refreshed pages can still abort an active server-side generation
- chat transcript auto-follow, fit-to-content resize helpers, and lighter live streaming paths for long responses
- optional MCP integration for the local chat interface, with UI-managed add/enable/disable flows for both local stdio commands and remote MCP URLs so enabled tools can be exposed to the model during chat
- a Users tab for API-key users, quotas, access rules, and proxy-auth policy
- global power profile management
- fan control and optimization toggles
- Wake-on-LAN support
- machine restart and shutdown controls
- self-update controls that read shipped release metadata directly from
build/metadata.json - update flows lock the panel onto updater logs while the updater is running and resume update monitoring after a page reload
- custom API preset creation, editing, and deletion
- live Docker and Audit logs streaming with search tools
- an interactive Debug Logs shell lane with inline ghost completion, command history, file upload/download helpers, and shell-aware zip transfers for files, folders, and glob patterns
- shared searchable audit-log streaming backed by
/opt/club3090-control/audit.log
Authentication uses Linux usernames and passwords through pamtester.
On Ubuntu and Debian-family systems the installer pulls pamtester from apt. On Arch-family systems it first tries the normal package path and then falls back to yay or paru if needed, failing safely with a clear manual-install message if it still cannot be installed.
The console log follower and browser log stream both understand the newer instance-aware docker naming scheme.
That means:
- single backend mode still works
- multiple backend containers can be tailed correctly
- logs remain usable even when container names differ by instance or runtime
- the admin panel can subscribe to the selected instance's log stream
The browser-side logging stack is also much richer now than the early v0.7.x path:
- Docker, Audit, Debug, updater, and upstream-service streams can all be switched in-panel
- detached/popped-out viewers stay pinned to the same runtime target
- Audit Logs carry long-running installer and model-download progress instead of stalling on carriage-return updates
- Debug Logs include an interactive bash command box with autocomplete, history, and transfer actions
- the debug-shell transfer helpers can upload files, zip and upload folders, and zip/download explicit files, folders, or glob-expanded results from the current shell working directory
- updater logs remain selected during self-update runs so duplicate updates and accidental log-source switches are avoided
The built-in local chat client has grown significantly since early v0.7.x and the v0.9.x stabilization pass. In addition to basic local chat, it now includes:
- server-persisted conversations with lazy transcript loading
- per-conversation state recovery after refreshes or network interruptions
- per-conversation and per-runtime generation statistics
- support for reasoning/thinking traces with collapsible display
- image and text attachments, including paste-to-Markdown conversion for long pasted text
- optional browser voice dictation
- share/export flows for conversation Markdown
- optional local and remote MCP tool integration
- streaming paths that prioritize transcript stability and progressively lighter render/update behavior during long generations
- incremental stream-metrics handling so per-message and conversation Generation Statistics update before the final response completes
The chat renderer and syntax pipeline also received a long series of fixes across the v0.7.x, v0.8.x, and v0.9.x lines:
- streamed fenced code and inline code render more reliably while text is still arriving
- syntax colors now come from the normalized
code_syntax.jsontheme/config instead of brittle embedded fallbacks - Markdown list/inline-span split bugs were fixed for live streaming
- HTML/XML fenced blocks, footnotes, Mermaid export helpers, and admonition rendering were all improved
- The control service stores the active default mode in
/opt/club3090-control/active_mode - It stores the last known-good legacy mode in
/opt/club3090-control/last_good_mode - It stores per-instance configuration in
/opt/club3090-control/instances.json - It stores the scanned upstream runtime registry in
/opt/club3090-control/runtime_inventory.json - It stores public/auth policy in
/opt/club3090-control/server_config.json - It stores group/plan definitions in
/opt/club3090-control/groups.json - It stores tracked online firewall and UPnP state in
/opt/club3090-control/network_state.json - It stores chat conversations and chat prompt-template state in
/opt/club3090-control/conversations/state.json - It stores API users, quotas, and usage accounting in
/opt/club3090-control/users.json - It stores the loopback automation token in
/opt/club3090-control/local_api_token - It applies the selected power profile before backend startup
- It can auto-start a backend when a proxied inference request arrives
- It can stop idle containers after a configurable quiet period
- It can downclock hardware during idle windows
- It can restore enabled per-GPU instances on boot
- It can fall back to the last known-good legacy mode if a startup attempt fails
- It can persist and resume migration/update state with metadata-aware self-update handling
- It can rebuild the upstream-derived runtime inventory after repo/profile changes
- It can keep shipped metadata, changelog text, and code-syntax config synchronized with the browser/admin payloads
- It can compile and install the optional vendored
gputempshelper under/opt/club3090-control/bin/gputempsfor junction/hotspot and VRAM temperature telemetry
The script writes these systemd units:
club3090-vllm.service- restores enabled per-GPU instances on boot, or falls back to legacy mode startup
club3090-control.service- hosts the admin UI and proxy
club3090-console-log.service- streams matching backend Docker logs to
tty1
- streams matching backend Docker logs to
club3090-headless-x.service- provides a private on-demand Xorg session for NVIDIA fan control
These services are gated by the kernel command-line flag club3090.server=1, so they can be installed without forcing server mode on every boot.
/opt/club3090-control/control.py/opt/club3090-control/start-vllm-last-mode.sh/opt/club3090-control/follow-vllm-log.sh/opt/club3090-control/prepare-headless-x.sh/opt/club3090-control/active_mode/opt/club3090-control/last_good_mode/opt/club3090-control/instances.json/opt/club3090-control/server_config.json/opt/club3090-control/groups.json/opt/club3090-control/users.json/opt/club3090-control/network_state.json/opt/club3090-control/Caddyfile/opt/club3090-control/local_api_token/opt/club3090-control/bin/gputemps/opt/club3090-control/include/nvml.h/opt/club3090-control/src/gputemps/gputemps.c/opt/club3090-control/src/gputemps/nvml.h/opt/club3090-control/control.log/opt/club3090-control/audit.log/etc/systemd/system/club3090-vllm.service/etc/systemd/system/club3090-control.service/etc/systemd/system/club3090-caddy.service/etc/systemd/system/club3090-console-log.service/etc/systemd/system/club3090-headless-x.service
- This script manages upstream
club-3090; it does not replace it - The design goal is to avoid forking upstream runtime files wherever possible
- Docker and NVIDIA drivers should be installed before first use
- Headless fan control depends on
nvidia-settingsand a private Xorg display - Extra junction/VRAM telemetry depends on a locally compiled helper that links against the host NVIDIA Management Library and libpci; the installer vendors the helper source/header, installs only the compile/link packages when the helper is missing, and can be bypassed with
--skip-temps - The
iomem=relaxedkernel switch is required for that extra telemetry on systems that otherwise block GPU MMIO mapping. It is useful on trusted single-user inference hosts, but it does relax a kernel I/O-memory safety boundary, so security-sensitive or shared systems may prefer--skip-temps - The installer attempts to install
nvidia-settings, Xorg,pamtester, OpenSSL, Caddy,miniupnpc, and firewall tooling as needed, and exits safely with a clear error if a required dependency still cannot be installed - The control layer is intended for unattended inference hosts and physical server consoles
install-club3090-server.sh: full installer/updaterREADME.md: feature and usage referencecontrol/: split-source Python backend modules used to build the shipped control planeweb/: split-source HTML/CSS/JS for the admin panelbuild/: build pipeline, metadata, and smoke tests used to compose the shipped single-file installerbuild/vendor/: vendored helper source/header payloads embedded into the installer for optional junction/VRAM temperature telemetry
The project has used a split-source build pipeline since the v0.7.0 refactor, but still ships a single integrated installer artifact. Day-to-day development happens in control/, web/, and build/, then build/build.py regenerates the monolithic script and bundled assets. Shipped control, updater, admin UI, code-syntax, and optional temperature-helper payloads are compressed into the installer so release artifacts stay deterministic and do not depend on helper repositories being reachable during install.









