ResisTrack — Agent Rules

Project-Specific Guidelines for AI Model Building

Project: ResisTrack · AI-Powered AMR Risk Prediction & Infection Control Platform
Team: Curelytics · Impact-AI-Thon 2026
Version: 1.0

1. Project Context & Mission

ResisTrack predicts antimicrobial resistance (AMR) risk in hospitalized patients within 6 hours of admission, before culture results are available (which take 48–120 hours). The platform ingests EHR data, lab results, vital signs, prior antibiotic history, and clinical notes to generate real-time AMR risk scores and stewardship recommendations.

Primary Goal: Close the diagnostic gap. Reduce inappropriate antibiotic prescribing. Support — never replace — clinical judgment.

2. Core Ethical & Safety Rules

These rules are non-negotiable and override all other instructions:

RULE-SAFETY-01: The model MUST NEVER issue antibiotic prescriptions, 
modify medication orders, or make autonomous treatment decisions.
All outputs are decision support — final authority rests with the clinician.

RULE-SAFETY-02: The model MUST NOT surface predictions with a 
calibrated confidence score below 0.60 without attaching a 
LOW_CONFIDENCE_FLAG = true to the output payload.

RULE-SAFETY-03: The model MUST flag when input data quality is 
insufficient (e.g., < 3 lab values in prior 72 hours, missing vitals) 
and must communicate data completeness score alongside the risk output.

RULE-SAFETY-04: Model outputs must NEVER be used to deny treatment.
They are risk stratification tools only.

RULE-SAFETY-05: No model version may be promoted to production without 
passing clinical validation on >= 1,000 patient records with documented 
sensitivity >= 0.80 and specificity >= 0.75.
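
As an illustration of RULE-SAFETY-02 and RULE-SAFETY-03, the flags could be set in a single post-processing step. This is a minimal sketch, not the project's implementation: the function name `apply_safety_flags` and its parameters `labs_in_72h` and `vitals_present` are hypothetical, and the field names are borrowed from the output schema in Section 5.

```python
from typing import Any

CONFIDENCE_FLOOR = 0.60  # RULE-SAFETY-02 threshold
MIN_LABS_72H = 3         # RULE-SAFETY-03 example threshold


def apply_safety_flags(
    payload: dict[str, Any], labs_in_72h: int, vitals_present: bool
) -> dict[str, Any]:
    """Return a copy of the payload with both safety flags set."""
    flagged = dict(payload)
    flagged["low_confidence_flag"] = payload["confidence_score"] < CONFIDENCE_FLOOR
    flagged["data_quality_flag"] = labs_in_72h < MIN_LABS_72H or not vitals_present
    return flagged


result = apply_safety_flags(
    {"confidence_score": 0.55, "data_completeness_score": 0.40},
    labs_in_72h=2,
    vitals_present=True,
)
print(result["low_confidence_flag"], result["data_quality_flag"])  # True True
```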

3. Data Handling Rules

RULE-DATA-01: ALL patient data must be treated as Protected Health 
Information (PHI) under HIPAA. No PHI may leave the AWS VPC boundary.

RULE-DATA-02: The model must NEVER receive raw patient identifiers 
(name, SSN, DOB, MRN) as direct input features. All patient references 
must use tokenized internal IDs only.

RULE-DATA-03: Clinical notes passed to ClinicalBERT/BioBERT must be 
processed inside the VPC only. No external API calls (e.g., OpenAI, 
Anthropic) with patient note content.

RULE-DATA-04: Model training data must originate only from hospitals 
that have signed a Business Associate Agreement (BAA) and data sharing 
consent. Training on non-consented data is prohibited.

RULE-DATA-05: All model inputs and outputs must be logged to the 
audit trail in RDS with timestamp, user role, hospital_tenant_id, 
and de-identified patient_token. Log retention: 7 years minimum.

RULE-DATA-06: Training datasets must be de-identified per HIPAA 
Safe Harbor (removing all 18 PHI identifiers) before use in 
any non-production environment.
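
One possible way to produce the tokenized internal IDs required by RULE-DATA-02 is a keyed hash. The sketch below assumes HMAC-SHA256 and a hypothetical `pt_` prefix; neither is specified by these rules, and whether a keyed hash satisfies the project's de-identification requirements is a compliance question this example does not settle.

```python
import hashlib
import hmac


def tokenize_patient_id(mrn: str, secret_key: bytes) -> str:
    """Derive a stable token from an MRN with HMAC-SHA256 (hypothetical scheme)."""
    digest = hmac.new(secret_key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"pt_{digest[:32]}"


# Placeholder key for illustration only. A real key would be fetched from
# AWS Secrets Manager at runtime and never appear in code.
demo_key = b"demo-key-not-for-production"
token = tokenize_patient_id("MRN-0001", demo_key)
```

The same MRN and key always yield the same token, so records can be joined across tables without the raw identifier entering the feature pipeline.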

4. Input Feature Specification

4.1 Accepted Structured Features (XGBoost / Tabular Model)

Feature Name	Type	Source	Notes
`wbc_trend_7d`	float	LIS	White blood cell count — 7-day slope
`crp_latest`	float	LIS	C-Reactive Protein, most recent value
`creatinine_trend`	float	LIS	Creatinine 72h delta
`prior_beta_lactam_count`	int	Pharmacy	Count of prior beta-lactam Rx in past 90 days
`prior_fluoroquinolone_count`	int	Pharmacy	Count in past 90 days
`prior_carbapenem_flag`	bool	Pharmacy	Any carbapenem exposure in past 12 months
`icu_admission_flag`	bool	EHR	Whether the current encounter is an ICU admission
`age_years`	int	EHR	Patient age — do NOT use DOB directly
`charlson_comorbidity_index`	int	Calculated	From ICD-10 codes in active problem list
`admission_ward_code`	categorical	EHR	Encoded ward ID (not ward name)
`days_since_last_hospitalization`	int	EHR	0 if no prior admission in system
`culture_positive_history_flag`	bool	LIS	Any prior positive culture on record
`isolation_flag_current`	bool	EHR	Active contact/droplet isolation order
`temperature_max_48h`	float	Vitals	Max temp (°C) in past 48 hours
`heart_rate_max_48h`	float	Vitals	Max HR in past 48 hours

All feature values must be validated against acceptable ranges before inference. Values outside physiologically plausible ranges must trigger DATA_QUALITY_FLAG.
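
A range check of this kind might look like the sketch below. The bounds in `PLAUSIBLE_RANGES` are illustrative placeholders, not clinically reviewed limits, and treating a missing feature as a violation is an assumption of this example.

```python
from typing import Mapping

# Illustrative bounds only; real limits need clinical review.
PLAUSIBLE_RANGES: dict[str, tuple[float, float]] = {
    "temperature_max_48h": (30.0, 45.0),  # degrees Celsius
    "heart_rate_max_48h": (20.0, 300.0),  # beats per minute
    "age_years": (0.0, 120.0),
}


def out_of_range_features(features: Mapping[str, float]) -> list[str]:
    """Return names of checked features that are missing or out of range."""
    bad = []
    for name, (low, high) in PLAUSIBLE_RANGES.items():
        value = features.get(name)
        # NaN fails both comparisons, so it is reported as well.
        if value is None or not (low <= value <= high):
            bad.append(name)
    return bad


# 98.6 looks like a Fahrenheit reading entered into a Celsius field.
violations = out_of_range_features(
    {"temperature_max_48h": 98.6, "heart_rate_max_48h": 88.0, "age_years": 54}
)
data_quality_flag = bool(violations)
print(violations, data_quality_flag)  # ['temperature_max_48h'] True
```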

4.2 Temporal Features (PyTorch LSTM)

Input shape: (batch_size, 72, 13), i.e. 72 hourly time steps and 13 channels (8 lab values + 5 vitals)
Missing time steps: forward-fill with last known value; if >30% of time steps are missing, report a DATA_COMPLETENESS_SCORE below 0.70 and attach a warning
Normalization: z-score per feature using hospital-cohort training statistics (not global statistics)
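
The preprocessing steps above can be sketched in plain Python. This example makes two assumptions that the rules do not state: a time step counts as missing only when every channel is absent, and the completeness score is the fraction of observed time steps. The demo window uses 4 steps and 2 channels instead of 72 and 13 to stay readable.

```python
from typing import Optional

Row = list[Optional[float]]


def forward_fill(window: list[Row]) -> list[Row]:
    """Fill each channel's gaps with its last known value.

    Gaps before a channel's first observation stay None.
    """
    filled: list[Row] = []
    last: Row = [None] * len(window[0])
    for row in window:
        last = [v if v is not None else prev for v, prev in zip(row, last)]
        filled.append(list(last))
    return filled


def completeness_score(window: list[Row]) -> float:
    """Fraction of time steps with at least one observed channel."""
    observed = sum(any(v is not None for v in row) for row in window)
    return observed / len(window)


def zscore_row(row: list[float], means: list[float], stds: list[float]) -> list[float]:
    """Z-score one fully observed time step with hospital-cohort statistics."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


window: list[Row] = [[1.0, None], [None, None], [3.0, 5.0], [None, None]]
filled = forward_fill(window)
score = completeness_score(window)
print(filled[3], score)  # [3.0, 5.0] 0.5
```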

4.3 NLP Features (ClinicalBERT)

Input: last 3 clinical notes (physician + nursing), max 512 tokens each after truncation
Truncation strategy: keep first 128 tokens (contains chief complaint / assessment) + last 384 tokens
Notes older than 72 hours: exclude unless no newer notes exist
Do NOT pass radiology report image data — text reports only
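
The head-plus-tail truncation can be illustrated on a plain token list. The function name `head_tail_truncate` is hypothetical. BERT-style models count special tokens such as [CLS] and [SEP] toward their 512-position limit, so a real tokenizer pipeline would likely need to reserve room for them within this budget.

```python
def head_tail_truncate(
    tokens: list[str], head: int = 128, tail: int = 384
) -> list[str]:
    """Keep the first `head` and last `tail` tokens of an over-long note."""
    if len(tokens) <= head + tail:
        return list(tokens)
    # Slice from an explicit index so tail=0 yields an empty tail.
    return tokens[:head] + tokens[len(tokens) - tail:]


tokens = [str(i) for i in range(1000)]
truncated = head_tail_truncate(tokens)
print(len(truncated), truncated[127], truncated[128], truncated[-1])
# 512 127 616 999
```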

5. Model Output Schema

Every inference call must return the following structured JSON payload:

{
  "patient_token": "string (de-identified internal token)",
  "hospital_tenant_id": "string",
  "inference_timestamp": "ISO 8601 UTC",
  "amr_risk_score": 0.0,
  "risk_tier": "LOW | MEDIUM | HIGH | CRITICAL",
  "confidence_score": 0.0,
  "low_confidence_flag": false,
  "data_completeness_score": 0.0,
  "data_quality_flag": false,
  "antibiotic_class_risk": {
    "beta_lactam": 0.0,
    "carbapenem": 0.0,
    "fluoroquinolone": 0.0,
    "aminoglycoside": 0.0,
    "vancomycin": 0.0
  },
  "shap_top_features": [
    {
      "feature_name": "string",
      "shap_value": 0.0,
      "direction": "INCREASES_RISK | DECREASES_RISK",
      "human_readable": "string (plain English explanation for clinician)"
    }
  ],
  "recommended_action": "string (stewardship recommendation text)",
  "model_version": "string",
  "explanation_available": true
}

Risk Tier Thresholds

Score Range	Tier	Required Action
0 – 24	LOW	No immediate action required; monitor
25 – 49	MEDIUM	Flag for pharmacist review within 24h
50 – 74	HIGH	Trigger CDS Hook alert to attending physician and pharmacy
75 – 100	CRITICAL	Immediate CDS alert + infection control notification
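
A tier lookup for the table above might be written as follows. The sketch assumes `amr_risk_score` is reported on a 0 to 100 scale and that fractional scores between two listed ranges (for example 24.5) belong to the lower tier; the table itself does not say how such values are handled.

```python
def risk_tier(score: float) -> str:
    """Map a 0-100 risk score to its tier using half-open intervals."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score out of range: {score}")
    if score < 25:
        return "LOW"
    if score < 50:
        return "MEDIUM"
    if score < 75:
        return "HIGH"
    return "CRITICAL"


print(risk_tier(24.5), risk_tier(50), risk_tier(100))  # LOW HIGH CRITICAL
```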

6. Model Training Rules

RULE-TRAIN-01: Train/validation/test split must be approximately 70/15/15, 
stratified on outcome label (resistant/sensitive) and partitioned by 
hospital_tenant_id so that no hospital contributes records to both the 
training and test splits. Do NOT train and test on data from the same 
hospital; this avoids site-specific overfitting.
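
One way to keep training and test data from the same hospital apart is to split at the hospital level first. The sketch below is only an assumption about how that could be done: it assigns whole hospitals to splits, so record-level proportions will only approximate 70/15/15, and it does not handle stratification on the outcome label.

```python
import random


def split_hospitals(
    hospital_ids: list[str],
    seed: int = 42,
    fractions: tuple[float, float, float] = (0.70, 0.15, 0.15),
) -> dict[str, set[str]]:
    """Assign each hospital to exactly one of train/validation/test."""
    ids = sorted(set(hospital_ids))
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_train = round(len(ids) * fractions[0])
    n_val = round(len(ids) * fractions[1])
    return {
        "train": set(ids[:n_train]),
        "validation": set(ids[n_train:n_train + n_val]),
        "test": set(ids[n_train + n_val:]),
    }


hospitals = [f"H{i:02d}" for i in range(20)]
splits = split_hospitals(hospitals)
print({name: len(members) for name, members in splits.items()})
# {'train': 14, 'validation': 3, 'test': 3}
```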

RULE-TRAIN-02: Class imbalance handling — apply SMOTE or class_weight 
balancing when positive (resistant) class prevalence < 20%.
Document imbalance ratio in the model card.

RULE-TRAIN-03: XGBoost hyperparameter search must use Bayesian 
optimization (not random search) with 50+ trials via SageMaker HPO.
Key parameters to tune: max_depth (3–8), learning_rate (0.01–0.3),
n_estimators (100–1000), subsample (0.6–1.0).

RULE-TRAIN-04: ClinicalBERT fine-tuning must use a clinical-domain 
pre-trained checkpoint (e.g., emilyalsentzer/Bio_ClinicalBERT).
Do NOT fine-tune general-domain BERT on clinical notes.

RULE-TRAIN-05: All training runs must be logged to SageMaker 
Experiments with: dataset version, feature set version, hyperparameters, 
AUC-ROC, AUPRC, sensitivity@80%specificity, and confusion matrix.

RULE-TRAIN-06: Ensemble weights (XGBoost vs LSTM vs NLP) must be 
learned via a held-out validation set meta-learner. Do NOT hardcode 
equal weights.

RULE-TRAIN-07: Model performance must be disaggregated by 
subgroup: age band (< 18, 18–65, > 65), ICU vs non-ICU, 
and primary organism if label is available.
Report any subgroup performance gap of 0.10 AUC or more as a risk item.
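
The subgroup check in RULE-TRAIN-07 could be sketched as below. The example reads the 10% gap as an absolute difference of 0.10 between the best and worst subgroup AUC, which is an interpretation rather than something the rule states. The pairwise AUC computation is quadratic and meant for illustration only.

```python
def auc_roc(labels: list[int], scores: list[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auc_gap(
    groups: dict[str, tuple[list[int], list[float]]]
) -> tuple[dict[str, float], float]:
    """Return per-subgroup AUC and the best-minus-worst gap."""
    aucs = {name: auc_roc(y, s) for name, (y, s) in groups.items()}
    return aucs, max(aucs.values()) - min(aucs.values())


aucs, gap = subgroup_auc_gap({
    "icu": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]),
    "non_icu": ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]),
})
is_risk_item = gap >= 0.10
print(aucs, gap, is_risk_item)
# {'icu': 1.0, 'non_icu': 0.75} 0.25 True
```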

7. Evaluation Metrics & Acceptance Criteria

The following metrics must ALL be met before a model can be promoted to production:

Metric	Threshold	Rationale
AUC-ROC	≥ 0.82	Primary evaluation metric
AUPRC	≥ 0.70	Required for imbalanced data fairness
Sensitivity @ 80% Specificity	≥ 0.80	Critical for patient safety — miss rate
False Positive Rate	≤ 0.20	Alert fatigue prevention
Calibration (Brier Score)	≤ 0.15	Probability reliability
Inference Latency (p95)	≤ 2,000 ms	Real-time CDS requirement

Mandatory: All thresholds must be validated on a held-out test set (not validation set) before the model card is signed off.
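
A promotion gate over these criteria might look like the following sketch. The metric keys (`auc_roc`, `latency_p95_ms`, and so on) are hypothetical names chosen for this example, and a metric missing from the report is treated as a failure.

```python
import operator
from typing import Callable

# (comparison, limit) per metric, taken from the table above.
CRITERIA: dict[str, tuple[Callable[[float, float], bool], float]] = {
    "auc_roc": (operator.ge, 0.82),
    "auprc": (operator.ge, 0.70),
    "sensitivity_at_80_specificity": (operator.ge, 0.80),
    "false_positive_rate": (operator.le, 0.20),
    "brier_score": (operator.le, 0.15),
    "latency_p95_ms": (operator.le, 2000),
}


def failed_criteria(metrics: dict[str, float]) -> list[str]:
    """Return the criteria a test-set report misses; empty means pass."""
    return [
        name
        for name, (compare, limit) in CRITERIA.items()
        if name not in metrics or not compare(metrics[name], limit)
    ]


candidate = {
    "auc_roc": 0.85,
    "auprc": 0.68,
    "sensitivity_at_80_specificity": 0.81,
    "false_positive_rate": 0.18,
    "brier_score": 0.12,
    "latency_p95_ms": 1400,
}
print(failed_criteria(candidate))  # ['auprc']
```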

8. CDS Hook Integration Rules

RULE-CDS-01: CDS Hook responses must be returned within 2 seconds 
(p95). If inference endpoint latency exceeds 1.5 seconds, return 
a cached score from the last inference run (max 24 hours old) 
and flag CACHED_RESULT = true in the response.
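
The fallback logic in RULE-CDS-01 can be sketched as a small decision function. The name `choose_score` and its parameters are hypothetical, and returning `None` when no cached score is recent enough is an assumption, since the rule does not say what should happen in that case.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

LATENCY_LIMIT_S = 1.5
MAX_CACHE_AGE = timedelta(hours=24)


def choose_score(
    live_score: Optional[float],
    latency_s: float,
    cached_score: Optional[float],
    cached_at: Optional[datetime],
    now: datetime,
) -> Optional[dict[str, object]]:
    """Use the live score if it arrived in time, else a recent cached one."""
    if live_score is not None and latency_s <= LATENCY_LIMIT_S:
        return {"amr_risk_score": live_score, "CACHED_RESULT": False}
    if cached_score is not None and cached_at is not None:
        if now - cached_at <= MAX_CACHE_AGE:
            return {"amr_risk_score": cached_score, "CACHED_RESULT": True}
    return None  # assumption: the caller handles a missing fallback


now = datetime(2026, 1, 2, 12, 0, tzinfo=timezone.utc)
response = choose_score(71.0, 1.8, 64.0, now - timedelta(hours=6), now)
print(response)  # {'amr_risk_score': 64.0, 'CACHED_RESULT': True}
```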

RULE-CDS-02: CDS Hook cards must include a "Why this alert?" 
link that opens the SHAP explainability panel — mandatory for 
High and Critical tier alerts.

RULE-CDS-03: Every CDS Hook alert must provide three response 
options to the clinician: 
  (a) "Acknowledged — will act" 
  (b) "Override — not applicable" (requires reason code selection)
  (c) "Escalate to ID specialist"
All responses must be logged.

RULE-CDS-04: Override rate per clinician must be monitored. 
If any clinician's override rate exceeds 60% over a 30-day period,
auto-generate a model feedback report for review by the clinical 
informatics team.
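
The override monitoring in RULE-CDS-04 might be computed as below. The sketch assumes the logged responses have already been filtered to the 30-day window and that overrides are recorded with the hypothetical label `"override"`.

```python
from collections import Counter

OVERRIDE_THRESHOLD = 0.60


def clinicians_over_threshold(responses: list[tuple[str, str]]) -> list[str]:
    """Return clinician IDs whose override rate exceeds the threshold.

    `responses` holds (clinician_id, response) pairs for one 30-day window.
    """
    totals: Counter[str] = Counter()
    overrides: Counter[str] = Counter()
    for clinician_id, response in responses:
        totals[clinician_id] += 1
        if response == "override":
            overrides[clinician_id] += 1
    return sorted(
        cid for cid in totals if overrides[cid] / totals[cid] > OVERRIDE_THRESHOLD
    )


log = (
    [("dr_a", "override")] * 7 + [("dr_a", "acknowledged")] * 3
    + [("dr_b", "override")] * 3 + [("dr_b", "escalate")] * 2
)
print(clinicians_over_threshold(log))  # ['dr_a']
```

Here `dr_b` sits at exactly 60% and is not reported, because the rule triggers only when the rate exceeds 60%.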

9. MLOps & Deployment Rules

RULE-MLOPS-01: Model retraining schedule — monthly automated 
SageMaker Pipeline run on new hospital data. Emergency retraining 
triggered if model drift score (PSI > 0.20) is detected in 
production monitoring.
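
For reference, PSI over pre-binned counts can be computed as shown below. The binning scheme and the small `eps` floor for empty bins are assumptions of this sketch; the rule only fixes the 0.20 trigger.

```python
import math


def psi(expected_counts: list[int], actual_counts: list[int], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_frac = max(e / e_total, eps)  # floor avoids log(0) on empty bins
        a_frac = max(a / a_total, eps)
        total += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return total


training_bins = [25, 25, 25, 25]
production_bins = [10, 20, 30, 40]
drift = psi(training_bins, production_bins)
needs_retraining = drift > 0.20
print(round(drift, 3), needs_retraining)
```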

RULE-MLOPS-02: Blue/green deployment required for all model 
updates. New model receives 10% traffic initially; auto-promote 
to 100% if, over 72 hours, its AUC-ROC on production shadow traffic 
is no more than 0.02 below the previous model's.

RULE-MLOPS-03: Model versioning: semantic versioning (MAJOR.MINOR.PATCH).
MAJOR version bump required for changes to feature set.
MINOR for retrained weights on same feature set.
PATCH for calibration-only updates.
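
The versioning rule can be expressed as a small helper. The change labels `feature_set`, `retrain`, and `calibration` are hypothetical names for the three cases above; resetting the lower components to zero follows the usual semantic versioning convention.

```python
def bump_version(current: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a given change type."""
    major, minor, patch = (int(part) for part in current.split("."))
    if change == "feature_set":
        return f"{major + 1}.0.0"
    if change == "retrain":
        return f"{major}.{minor + 1}.0"
    if change == "calibration":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


print(bump_version("1.4.2", "feature_set"))  # 2.0.0
print(bump_version("1.4.2", "retrain"))      # 1.5.0
print(bump_version("1.4.2", "calibration"))  # 1.4.3
```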

RULE-MLOPS-04: Model rollback capability must be maintained for 
the previous 2 production versions. Rollback execution time 
target: < 15 minutes.

RULE-MLOPS-05: All production model predictions must be stored 
(de-identified) for post-hoc analysis and ground truth 
comparison once culture results are available.
Model accuracy against culture ground truth must be reported monthly.

10. Prohibited Patterns

The following patterns are strictly prohibited in any code, model, or pipeline component:

# ❌ NEVER DO: External API calls with patient data
import openai
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": patient_note}]  # PROHIBITED
)

# ❌ NEVER DO: Raw PHI as model features
features["patient_name"] = row["patient_name"]  # PROHIBITED
features["date_of_birth"] = row["dob"]           # PROHIBITED
features["social_security"] = row["ssn"]         # PROHIBITED

# ❌ NEVER DO: Log PHI to CloudWatch or stdout
print(f"Processing patient {patient_mrn}")        # PROHIBITED
logger.info(f"Patient name: {patient_name}")       # PROHIBITED

# ❌ NEVER DO: Hardcode thresholds for clinical decisions
if amr_score > 50:
    prescribe_vancomycin()  # PROHIBITED — model never prescribes

# ❌ NEVER DO: Deploy model without validation gate
model.deploy(validation_passed=False)             # PROHIBITED

11. Code Quality Standards

All Python code must pass mypy --strict type checking
All ML pipelines must be reproducible: set random_state=42 or equivalent for all stochastic operations
Feature engineering functions must have unit tests with ≥80% line coverage
SageMaker Processing scripts must be containerized (Docker) and version-pinned for reproducibility
Secrets (API keys, DB credentials) must NEVER appear in code — use AWS Secrets Manager exclusively
All infrastructure must be provisioned via AWS CDK (TypeScript) — no console-created resources in production

12. Glossary

Term	Definition
AMR	Antimicrobial Resistance — resistance of microorganisms to antimicrobial medicines
MDRO	Multi-Drug Resistant Organism
SHAP	SHapley Additive exPlanations — model explainability method
CDS Hooks	Clinical Decision Support Hooks — standard for EHR-integrated alerts
SMART on FHIR	Substitutable Medical Applications, Reusable Technologies on FHIR
HL7 v2	Health Level 7 version 2 — legacy healthcare messaging standard
FHIR R4	Fast Healthcare Interoperability Resources Release 4 — modern healthcare data standard
BAA	Business Associate Agreement — HIPAA-required contract for PHI handling
PHI	Protected Health Information
PSI	Population Stability Index — metric for detecting model/data drift
AUPRC	Area Under the Precision-Recall Curve

ResisTrack Agent Rules v1.0 — Team Curelytics — Impact-AI-Thon 2026
These rules must be reviewed and updated with each MAJOR model version release.
