Tarek0/zerotune

ZeroTune

ZeroTune provides zero-shot hyperparameter optimization using pre-trained models. It predicts competitive hyperparameters for your machine learning model from dataset characteristics in under a millisecond, with consistent gains across diverse datasets.

🏆 Decision Tree: 100% win rate • 🌲 Random Forest: 100% win rate • 🔧 XGBoost: 100% win rate • 🚀 +7.08%, +1.47% & +0.80% improvements • ⚡ <1ms prediction • 📊 50-seed validated

🚀 Quick Start

from zerotune import ZeroTunePredictor
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load your dataset
df = pd.read_csv('your_dataset.csv')
X = df.drop('target', axis=1)
y = df['target']

# Get optimal hyperparameters instantly
predictor = ZeroTunePredictor(model_name='decision_tree', task_type='binary')
best_params = predictor.predict(X, y)

# Train model with predicted hyperparameters
model = DecisionTreeClassifier(**best_params)
model.fit(X, y)

print(f"Optimal hyperparameters: {best_params}")
# Reported average for Decision Tree: +7.08% improvement over random hyperparameters
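
To check whether predicted hyperparameters help on your own data, one option is to compare them against scikit-learn's defaults with cross-validation. The sketch below does not call ZeroTune: `candidate_params` is a hypothetical stand-in for the dictionary returned by `predictor.predict(X, y)`, and the synthetic dataset, the helper `mean_cv_auc`, and the ROC AUC metric are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary dataset standing in for your own X, y.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hypothetical stand-in for best_params from ZeroTunePredictor.predict(X, y).
candidate_params = {"max_depth": 5, "min_samples_split": 10, "min_samples_leaf": 5}


def mean_cv_auc(params, X, y, seed=0):
    """Mean 5-fold ROC AUC of a decision tree built with the given parameters."""
    model = DecisionTreeClassifier(random_state=seed, **params)
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()


default_auc = mean_cv_auc({}, X, y)                   # scikit-learn defaults
candidate_auc = mean_cv_auc(candidate_params, X, y)   # candidate configuration
print(f"default: {default_auc:.4f}  candidate: {candidate_auc:.4f}")
```

Scoring on held-out folds, rather than on the data used for fitting, keeps the comparison from rewarding configurations that merely overfit.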

✨ Key Features

  • πŸ† 100% Win Rate: All three models (Decision Tree, Random Forest, XGBoost) beat random hyperparameters on every test dataset
  • ⚑ Instant Predictions: Sub-millisecond hyperparameter optimization (vs hours of traditional HPO)
  • 🎯 Significant Improvements: +7.08%, +1.47%, +0.80% average performance gains respectively
  • πŸ”¬ Scientifically Validated: 50-seed evaluation across diverse datasets with statistical rigor
  • πŸš€ Production Ready: Pre-trained models included - no training required
  • πŸ”§ Optuna Integration: Warm-start TPE optimization with perfect baseline consistency

🎯 Supported Models

| Model | Binary Classification | Performance |
|---|---|---|
| 🏆 Decision Tree | ✅ | 100% win rate, +7.08% |
| 🌲 Random Forest | ✅ | 100% win rate, +1.47% |
| 🔧 XGBoost | ✅ | 100% win rate, +0.80% |

All three models achieve a 100% win rate: on every test dataset, the predicted hyperparameters outperform random hyperparameter selection.

📦 Installation

# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# Install ZeroTune
git clone https://github.com/Tarek0/zerotune.git
cd zerotune
poetry install

🚀 Ready-to-Use: All trained models are included, so you can start predicting immediately.

🔧 Usage

Zero-Shot Predictions (Main Use Case)

from zerotune import ZeroTunePredictor

# For different models
predictor_dt = ZeroTunePredictor(model_name='decision_tree', task_type='binary')
predictor_rf = ZeroTunePredictor(model_name='random_forest', task_type='binary')
predictor_xgb = ZeroTunePredictor(model_name='xgboost', task_type='binary')

# Get instant predictions
hyperparams = predictor_dt.predict(X, y)

Optuna TPE Warm-Start

from zerotune.core.optimization import optimize_hyperparameters

# Use zero-shot predictions to warm-start Optuna TPE.
# X, y, and param_grid (the hyperparameter search space) must already be defined.
best_params, study = optimize_hyperparameters(
    X=X, y=y,
    model_type='decision_tree',
    param_grid=param_grid,
    n_trials=20,
    warm_start=True,  # Uses ZeroTune predictions
    n_jobs=1
)

Command Line Interface

# Quick evaluation on test datasets
poetry run python decision_tree_experiment.py eval-test
poetry run python random_forest_experiment.py eval-test  
poetry run python xgb_experiment.py eval-test

# Full evaluation with Optuna benchmarking
poetry run python decision_tree_experiment.py eval-full --optuna_trials 25 --seeds 50

🏗️ Architecture

┌─────────────────────┐    ┌──────────────────────┐    ┌─────────────────────┐
│   Knowledge Base    │───▶│   Pre-trained Model  │───▶│  Zero-Shot Predict  │
│   Building          │    │   Training           │    │  (ZeroTunePredictor)│
│   (ZeroTune)        │    │                      │    │                     │
│                     │    │                      │    │                     │
│ • Multi-seed HPO on │    │ • RFECV feature      │    │ • Sub-ms prediction │
│   many datasets     │    │   selection (15/22)  │    │ • 100% win rate     │
│ • Extract 22+ meta- │    │ • Top-K filtering    │    │ • Feature selection │
│   features          │    │ • RandomForest +HPO  │    │ • High performance  │
│ • Store full trials │    │ • Meta-features →    │    │                     │
│   dataframes        │    │   Hyperparameters    │    │                     │
└─────────────────────┘    └──────────────────────┘    └─────────────────────┘
                                                                  │
                                                                  ▼
                           ┌──────────────────────┐    ┌─────────────────────┐
                           │   Optuna TPE         │◀───│  Your ML Pipeline   │
                           │   Warm-Start         │    │                     │
                           │                      │    │                     │
                           │ • Zero-shot init     │    │ • Train your model  │
                           │ • Faster convergence │    │ • Better performance│
                           │ • study.enqueue()    │    │ • Production deploy │
                           │ • Perfect baseline   │    │ • Instant results   │
                           └──────────────────────┘    └─────────────────────┘

How It Works

  1. Knowledge Base: Multi-dataset HPO experiments with 22+ meta-features extracted
  2. Model Training: RFECV feature selection + RandomForest predictor with hyperparameter optimization
  3. Zero-Shot Prediction: Instant hyperparameter prediction based on dataset characteristics
  4. Optional Warm-Start: Use predictions to initialize Optuna TPE for further optimization
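
The first three steps can be illustrated with a toy meta-learning sketch. It is not ZeroTune's implementation: the three meta-features, the helper `meta_features`, and the knowledge-base targets (a made-up formula standing in for real HPO results) are all assumptions, and RFECV feature selection is omitted for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor


def meta_features(X, y):
    """Three simple dataset descriptors (illustrative; ZeroTune extracts 22+)."""
    n_samples, n_features = X.shape
    minority_fraction = np.bincount(y).min() / n_samples
    return [np.log10(n_samples), np.log10(n_features), minority_fraction]


# Step 1: toy knowledge base, one row per dataset, pairing meta-features with
# the max_depth that a hypothetical HPO run found best.
rng = np.random.default_rng(0)
kb_features, kb_best_depth = [], []
for seed in range(30):
    n = int(rng.integers(100, 2000))
    d = int(rng.integers(5, 40))
    X, y = make_classification(n_samples=n, n_features=d, random_state=seed)
    kb_features.append(meta_features(X, y))
    kb_best_depth.append(2 + np.log2(n) / 2)  # placeholder for a real HPO result

# Step 2: train a predictor that maps meta-features to hyperparameters.
meta_model = RandomForestRegressor(n_estimators=50, random_state=0)
meta_model.fit(kb_features, kb_best_depth)

# Step 3: zero-shot prediction for a dataset the meta-model has not seen.
X_new, y_new = make_classification(n_samples=800, n_features=12, random_state=123)
predicted_depth = int(round(meta_model.predict([meta_features(X_new, y_new)])[0]))
print({"max_depth": predicted_depth})
```

At prediction time only the meta-features are computed and passed through the trained model; no model is fitted on the new dataset, which is what makes the step fast.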

📊 Performance Summary

Quick Results Overview

| Model | Win Rate | Avg Improvement | Best Single Win | Statistical Significance |
|---|---|---|---|---|
| Decision Tree | 100% | +7.08% | +17.4% | 90% of datasets |
| Random Forest | 100% | +1.47% | +4.4% | 50% of datasets |
| XGBoost | 100% | +0.80% | +2.6% | 90% of datasets |

Key Benefits:

  • ✅ Perfect Reliability: 100% win rate across all models and test datasets
  • ✅ Instant Results: Sub-millisecond prediction vs. hours of traditional HPO
  • ✅ Statistical Rigor: 50 random seeds × 10 datasets = 500 total experiments
  • ✅ Production Ready: No training required, robust error handling

For detailed performance analysis, see PERFORMANCE_ANALYSIS.md
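
As a rough illustration of how a win rate and an average improvement can be derived from multi-seed results, consider the sketch below. The definitions are assumptions, since this README does not spell them out: a dataset counts as a win when the mean score with predicted hyperparameters exceeds the mean random-baseline score, and improvement is measured relative to that baseline. The scores are made-up placeholders, not ZeroTune results, and `summarize` is a hypothetical helper.

```python
from statistics import mean


def summarize(predicted_scores, random_scores):
    """Return (win rate in %, mean relative improvement in %) across datasets.

    Both arguments map a dataset name to a list of per-seed scores.
    """
    wins, improvements = 0, []
    for name, scores in predicted_scores.items():
        pred, base = mean(scores), mean(random_scores[name])
        wins += pred > base  # assumed win criterion: higher mean score
        improvements.append(100.0 * (pred - base) / base)
    return 100.0 * wins / len(predicted_scores), mean(improvements)


# Made-up scores for two datasets and three seeds each; not ZeroTune results.
predicted = {"dataset_a": [0.90, 0.92, 0.91], "dataset_b": [0.80, 0.82, 0.81]}
random_baseline = {"dataset_a": [0.85, 0.86, 0.84], "dataset_b": [0.79, 0.80, 0.78]}

win_rate, avg_improvement = summarize(predicted, random_baseline)
print(f"win rate: {win_rate:.0f}%  average improvement: {avg_improvement:+.2f}%")
```

Averaging over seeds before comparing keeps a single lucky seed from deciding whether a dataset counts as a win.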

📈 Research & Publication

For researchers and advanced users:

# Generate publication-ready analysis and charts
poetry run python publication_analysis.py DecisionTree --auto-detect
poetry run python publication_analysis.py RandomForest --auto-detect
poetry run python publication_analysis.py XGBoost --auto-detect

See PUBLICATION_CHARTS_GUIDE.md for detailed documentation.

🛠️ Advanced Usage

Building Custom Knowledge Bases

from zerotune import ZeroTune

# Build knowledge base from your datasets
zt = ZeroTune(model_type='xgboost', kb_path='my_knowledge_base.json')
dataset_ids = [31, 38, 44, 52, 151]  # OpenML dataset IDs
kb = zt.build_knowledge_base(dataset_ids=dataset_ids, n_iter=20)

Training New Predictors

from zerotune.core.predictor_training import train_predictor_from_knowledge_base

# Train predictor from knowledge base
model_path = train_predictor_from_knowledge_base(
    kb_path='my_knowledge_base.json',
    model_name='xgboost',
    task_type='binary',
    top_k_per_seed=3
)

🀝 Contributing

See CONTRIBUTING.md for development setup and contribution guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
