Anchor is a high-fidelity machine learning system engineered to map SaaS telemetry and preemptively identify revenue leakage. Moving beyond traditional "black-box" models, Anchor leverages Explainable AI (XAI) to provide transparent, strategic directives—not just predicting who will churn, but illuminating exactly why.
In its early stages, Anchor reported 99.8% accuracy. In a production SaaS environment, a figure that high is almost always a symptom of data leakage.
The primary culprit was tenure_days. Because historical tenure was included in the training set without strict temporal partitioning, the model "learned" that customers who stay long don't churn. That is circular logic: tenure measured over the full history already reflects whether the customer stayed, so the feature provides zero predictive value for new cohorts.
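Strict temporal partitioning can be sketched as a split on the date the features were observed, rather than a random split. The example below is a minimal illustration, not Anchor's actual pipeline: `AccountSnapshot`, `snapshot_date`, `churned_later`, and `temporal_split` are hypothetical names, and the rows are made-up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccountSnapshot:
    """Hypothetical row: features measured on snapshot_date, label observed afterwards."""
    account_id: str
    snapshot_date: date   # when the features were measured
    churned_later: bool   # outcome observed after snapshot_date


def temporal_split(rows, cutoff):
    """Train on snapshots strictly before the cutoff, validate on the rest."""
    train = [r for r in rows if r.snapshot_date < cutoff]
    valid = [r for r in rows if r.snapshot_date >= cutoff]
    return train, valid


# Made-up snapshots, for illustration only.
rows = [
    AccountSnapshot("a1", date(2024, 1, 31), False),
    AccountSnapshot("a2", date(2024, 2, 29), True),
    AccountSnapshot("a3", date(2024, 4, 30), False),
    AccountSnapshot("a4", date(2024, 5, 31), True),
]
train, valid = temporal_split(rows, cutoff=date(2024, 4, 1))
```

With a split like this, every validation snapshot is later than every training snapshot, so a feature such as tenure cannot carry information from the validation period back into training.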
We re-engineered the feature space to prioritize Product Velocity and Seating Dynamics:
- The Shift: We intentionally decoupled temporal bias, moving from the "leaky" 99.8% accuracy to a 44% precision benchmark on true churners.
- Why 44%? In SaaS retention, 44% precision on a balanced signal is "High-Signal": roughly 44 of every 100 flagged accounts are genuine churn risks. That is a realistic, actionable window where intervention strategies (CSM outreach, plan adjustments) yield the highest ROI, without the inflated confidence that overfitted models produce.
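To make the metric concrete, the sketch below computes precision and recall from confusion-matrix counts. The counts are illustrative only, chosen so that precision comes out at 0.44; they are not Anchor's evaluation results.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged accounts that actually churned."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no accounts were flagged")
    return true_positives / flagged


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of actual churners that the model flagged."""
    churners = true_positives + false_negatives
    if churners == 0:
        raise ValueError("no accounts churned")
    return true_positives / churners


# Illustrative counts only: 100 flagged accounts, 44 of which churned,
# plus 11 churners the model missed.
tp, fp, fn = 44, 56, 11
p = precision(tp, fp)
r = recall(tp, fn)
```

Here `p` is 0.44 and `r` is 0.8. The two numbers answer different questions, which is why a single accuracy figure can hide a leaky model.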
The Anchor console is designed for high-signal visibility, transforming complex telemetry into actionable attritional risk levels with real-time SHAP analysis.
Figure 1: High-fidelity predictive console with real-time risk assessment.
Anchor doesn't just provide a probability score; it breaks down the Influence Drivers behind every prediction. This allows CSM and Revenue teams to see exactly which features (e.g., mrr_per_seat, seat_growth_ratio) are pushing a customer toward churn.
Figure 2: Prediction with SHAP-based influence drivers providing transparent "Why" behind predictions.
Anchor operates as a clean monorepo, orchestrating a polyglot stack designed for sub-second inference and production stability.
- Backend: FastAPI (Python 3.13) utilizing Poetry 2.0.1 with PEP 621 compliance.
- Frontend: React 19 + Vite 8 + Framer Motion.
- The Proxy Strategy: To resolve common Docker networking "404/Refused" issues, we employ an Nginx reverse proxy. This layer routes requests between the static React frontend and the FastAPI inference engine. Because the browser sees a single origin, API calls are same-origin requests and most CORS configuration is sidestepped.
anchor/
├── backend/ # FastAPI Inference Engine
│ ├── src/ # Modular Python Logic
│ │ ├── serving/ # API Endpoints & Uvicorn entry
│ │ ├── training/ # LightGBM + Optuna Pipelines
│ │ └── shared/ # Pydantic Settings & Shared Config
│ ├── models/ # Local DVC symlinks for .pkl artifacts
│ └── Dockerfile # Optimized Python 3.13 Slim image
├── frontend/ # React Console
│ ├── src/ # Component architecture
│ ├── Dockerfile # Multi-stage Nginx production build
│ └── Dockerfile.dev # Vite HMR-enabled dev environment
├── data/ # DVC-tracked telemetry (GCS Remote)
└── docker-compose.yml # Multi-container orchestration
Anchor utilizes a 13-feature input vector (including mrr_per_seat, seat_growth_ratio, and billing_frequency) to calculate attritional risk.
- Optimization: We used Optuna to perform Bayesian optimization over the LightGBM hyperparameter space, focusing on `is_churn` recall.
- Inference: The model artifact is loaded with joblib and served through FastAPI, achieving sub-50ms inference latency.
- Explainability: Every prediction is passed through a SHAP (SHapley Additive exPlanations) explainer. This transforms the raw probability into a set of Influence Drivers, showing the top 3 factors (e.g., "Plan Downgrade" or "Low MRR per Seat") pushing a customer toward churn.
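The "top 3 factors" step can be sketched as a ranking over per-feature contributions. The example assumes the SHAP explainer has already produced one contribution per feature for a single customer, and that positive values push toward the churn class. `top_influence_drivers`, the `plan_downgraded` feature name, and the numbers are hypothetical.

```python
def top_influence_drivers(feature_names, contributions, k=3):
    """Return the k features with the largest positive contribution toward churn."""
    if len(feature_names) != len(contributions):
        raise ValueError("one contribution is expected per feature")
    # Keep only features that push the prediction toward churn (assumed positive sign).
    toward_churn = [(n, v) for n, v in zip(feature_names, contributions) if v > 0]
    toward_churn.sort(key=lambda pair: pair[1], reverse=True)
    return toward_churn[:k]


# Made-up contributions for one customer, for illustration only.
names = ["mrr_per_seat", "seat_growth_ratio", "billing_frequency", "plan_downgraded"]
values = [0.42, -0.10, 0.05, 0.31]
drivers = top_influence_drivers(names, values)
```

In this example `seat_growth_ratio` is excluded because its contribution is negative, meaning it pulls the prediction away from churn.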
To keep the Git repository lightweight, all model artifacts (.pkl) and raw telemetry datasets are managed by DVC (Data Version Control).
- Remote: Google Cloud Storage (GCS).
- Workflow: Run `dvc pull` to synchronize the local `models/` folder with the latest production-validated weights.
Bring up the entire stack with three commands:
git clone https://github.com/your-repo/anchor.git
cd anchor
docker compose up --build
- Frontend Console: http://localhost:3000
- Inference API (Swagger): http://localhost:8000/docs
During the containerization of the frontend, we initially faced build failures with native ARM64 bindings.
- Resolution: Switched to `node:20-alpine` and explicitly included `libc6-compat` in the APK layer to support high-performance native dependencies required by modern JS toolchains.
Direct browser-to-API calls frequently failed in containerized environments due to host-resolution mismatches.
- Resolution: Implemented the Nginx proxy to map `/predict` calls to the `api` service internally. This allows the frontend to call its own origin, letting Nginx handle the cross-service bridge.
Standard joblib serialization can lose pandas categorical types.
- Resolution: Implemented a robust metadata reconstruction layer in `src/serving/app.py` that maps the booster's internal `pandas_categorical` list back to `CategoricalDtype` on the fly during inference.
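A minimal sketch of that reconstruction step, assuming pandas is available and that the category lists are stored in the same order as the categorical columns. `restore_categoricals` is a hypothetical helper, not the code in `src/serving/app.py`. Treating `billing_frequency` as categorical is also an assumption, and the category values are made-up.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def restore_categoricals(frame, categorical_columns, pandas_categorical):
    """Cast each categorical column back to the category list seen at training time."""
    if len(categorical_columns) != len(pandas_categorical):
        raise ValueError("one category list is expected per categorical column")
    restored = frame.copy()  # leave the caller's frame untouched
    for column, categories in zip(categorical_columns, pandas_categorical):
        restored[column] = restored[column].astype(CategoricalDtype(categories=categories))
    return restored


# Illustrative request payload; the category list stands in for the
# booster's pandas_categorical attribute.
request_frame = pd.DataFrame({"billing_frequency": ["annual"], "mrr_per_seat": [12.5]})
training_categories = [["annual", "monthly"]]
fixed = restore_categoricals(request_frame, ["billing_frequency"], training_categories)
```

Casting against the training-time category list keeps the integer codes aligned with what the model saw during training. Values that were not present at training time become NaN after the cast, so a production version would probably want to log or reject them.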
Anchor v1.0 — Mapping the future of retention.