Ferrolearn brings Rust's performance to Python's machine learning ecosystem. By implementing compute-intensive algorithms in Rust, we achieve significant speedups while maintaining the familiar scikit-learn API.
- 🚀 2-10x faster than pure Python implementations
- 🔧 Scikit-learn compatible API - drop-in replacement
- 🦀 Rust-powered - memory safe and blazingly fast
- 📊 Zero-copy operations - efficient NumPy integration
- ⚡ Automatic parallelization - scales with your CPU cores
Requirements:
- Python 3.8+
- Rust 1.70+ (only needed when building from source)
- pip
The easiest way to install Ferrolearn is via pip from the Python Package Index (PyPI):
pip install ferrolearn
This downloads and installs a pre-built wheel for your platform if one is available, and otherwise builds the package from source. Building from source requires a Rust compiler.
After installation, verify it by importing the package in Python:
import ferrolearn
print(ferrolearn.__version__)  # prints the installed version, e.g. '0.1.0'
Quick start:
from ferrolearn import KMeans
import numpy as np
# Generate sample data
X = np.random.rand(10000, 50)
# Create and fit model - same API as scikit-learn
kmeans = KMeans(n_clusters=5, random_state=42)
kmeans.fit(X)
# Get predictions
labels = kmeans.predict(X)
print(f"Cluster centers shape: {kmeans.cluster_centers_.shape}")
print(f"Iterations: {kmeans.n_iter_}")
API reference:
class KMeans(n_clusters=8, max_iters=300, tol=1e-4, random_state=None)
Parameters:
- n_clusters: Number of clusters (default: 8)
- max_iters: Maximum number of iterations (default: 300)
- tol: Convergence tolerance (default: 1e-4)
- random_state: Random seed for reproducibility (default: None)
Methods:
- fit(X): Fit the model
- predict(X): Predict cluster labels
- fit_predict(X): Fit and predict in one call
Attributes:
- cluster_centers_: Cluster centroids
- n_iter_: Number of iterations run
- inertia_: Sum of squared distances of samples to their nearest cluster center
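The parameters and attributes above map onto the standard Lloyd's iteration for K-Means. The following pure-Python sketch is a rough illustration only. It mirrors the documented names (`n_clusters`, `max_iters`, `tol`, `random_state`, `cluster_centers_`, `n_iter_`, `inertia_`). Several choices are assumptions: the class name `KMeansSketch`, initialization from randomly chosen samples, and a convergence test on total squared centroid movement. ferrolearn's Rust implementation may initialize, converge, and parallelize differently.

```python
import random
from typing import List, Optional, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class KMeansSketch:
    """Hypothetical pure-Python reference for the documented K-Means API.

    Not ferrolearn's implementation; it only mirrors the parameter and
    attribute names listed above.
    """

    def __init__(self, n_clusters: int = 8, max_iters: int = 300,
                 tol: float = 1e-4, random_state: Optional[int] = None):
        self.n_clusters = n_clusters
        self.max_iters = max_iters
        self.tol = tol
        self.random_state = random_state

    def _nearest(self, x: Sequence[float]) -> int:
        # Index of the centroid closest to sample x (ties go to the lower index).
        centers = self.cluster_centers_
        return min(range(len(centers)), key=lambda k: _sq_dist(x, centers[k]))

    def fit(self, X: Sequence[Sequence[float]]) -> "KMeansSketch":
        rng = random.Random(self.random_state)
        # Assumption: initialize with n_clusters distinct samples picked at random
        # (many libraries use k-means++ initialization instead).
        picks = rng.sample(range(len(X)), self.n_clusters)
        self.cluster_centers_ = [list(X[i]) for i in picks]
        self.n_iter_ = 0
        for _ in range(self.max_iters):
            # Assignment step: label each sample with its nearest centroid.
            labels = [self._nearest(x) for x in X]
            # Update step: move each centroid to the mean of its members.
            new_centers = []
            for k, old in enumerate(self.cluster_centers_):
                members = [x for x, lab in zip(X, labels) if lab == k]
                if members:
                    new_centers.append([sum(col) / len(members) for col in zip(*members)])
                else:
                    new_centers.append(old)  # an empty cluster keeps its centroid
            # Total squared centroid movement in this iteration.
            shift = sum(_sq_dist(a, b) for a, b in zip(self.cluster_centers_, new_centers))
            self.cluster_centers_ = new_centers
            self.n_iter_ += 1
            # Assumption: converged once the movement drops to tol or below.
            if shift <= self.tol:
                break
        # inertia_: sum of squared distances from samples to their nearest centroid.
        self.inertia_ = sum(_sq_dist(x, self.cluster_centers_[self._nearest(x)]) for x in X)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [self._nearest(x) for x in X]

    def fit_predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return self.fit(X).predict(X)
```

The sketch is pure Python, so it is only useful for checking what the attributes mean on tiny inputs. It is not a substitute for the compiled estimator.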
ferrolearn leverages Rust's strengths where they matter most:
Python (API Layer) Rust (Compute Layer)
│ │
├─ KMeans.fit() ─────────────► │ Parallel distance computation
│ │ SIMD-ready operations
├─ NumPy arrays ◄────────────► │ Zero-copy array views
│ │ Cache-efficient algorithms
└─ Results ◄───────────────────┘
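The "parallel distance computation" box corresponds to the K-Means assignment step. That step is embarrassingly parallel, because each sample's nearest centroid can be found independently of every other sample. The sketch below shows only that data decomposition: it splits the rows into chunks, labels the chunks concurrently, and concatenates the results in order. It uses `concurrent.futures` as a stand-in for rayon on the Rust side. `parallel_assign`, `assign_chunk`, `chunk_size`, and `workers` are hypothetical names and defaults, not part of ferrolearn's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def assign_chunk(chunk: Sequence[Sequence[float]],
                 centers: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the nearest center for every sample in chunk."""
    labels = []
    for x in chunk:
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels


def parallel_assign(X: Sequence[Sequence[float]],
                    centers: Sequence[Sequence[float]],
                    chunk_size: int = 1024, workers: int = 4) -> List[int]:
    """Hypothetical helper: split rows into chunks, label them concurrently,
    then concatenate the per-chunk labels in the original row order."""
    chunks = [X[i:i + chunk_size] for i in range(0, len(X), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order, so labels line up with rows of X.
        results = list(pool.map(assign_chunk, chunks, [centers] * len(chunks)))
    return [label for part in results for label in part]
```

In CPython, the GIL prevents threads from speeding up this pure-Python loop. The sketch therefore shows the structure of the computation rather than its performance. The speedup comes from compiled code that runs outside the GIL.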
Development setup:
# Clone the repository
git clone https://github.com/Rafa-Gu98/ferrolearn.git
cd ferrolearn
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install in development mode
make dev-install
Running tests:
# All tests
make test
# Only Rust tests
cargo test
# Only Python tests
pytest tests/
Project structure:
ferrolearn/
├── src/ # Rust source code
│ ├── lib.rs # PyO3 bindings
│ └── kmeans.rs # K-Means implementation
├── python/ # Python package
├── tests/ # Test suite
├── Cargo.toml # Rust dependencies
└── pyproject.toml # Python packaging
Roadmap:
- ✅ K-Means clustering
- ✅ Scikit-learn compatible API
- ✅ Comprehensive benchmarks
Planned:
- DBSCAN clustering
- Mini-batch K-Means
- Random Forest
- Gradient Boosting
- GPU acceleration
- Distributed computing
- More algorithms based on user feedback
We welcome contributions! Porting an algorithm to Rust pays off most for:
- Algorithms with many iterations
- Embarrassingly parallel computations
- Memory-intensive operations
When ferrolearn shines:
- Medium to large datasets (>10k samples)
- Moderate dimensionality (20-100 features)
- Many iterations or many clusters
Current limitations:
- Small datasets may see little speedup, because fixed per-call overhead outweighs the faster computation
- Not all algorithms benefit equally from a Rust implementation
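The thresholds above are rules of thumb, so it is worth timing your own workload before switching. Below is a minimal sketch using the standard-library `timeit` module. `best_time` and `compare` are hypothetical helpers. The callables you pass in are up to you; for example, one could fit a ferrolearn `KMeans` and the other a scikit-learn `KMeans` on the same data.

```python
import timeit
from typing import Callable


def best_time(fn: Callable[[], object], repeat: int = 5, number: int = 1) -> float:
    """Best per-call wall-clock time in seconds over `repeat` rounds of `number` calls."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


def compare(baseline: Callable[[], object], candidate: Callable[[], object],
            **kwargs) -> float:
    """Speedup of candidate over baseline (> 1 means candidate is faster).

    Extra keyword arguments (repeat, number) are forwarded to best_time.
    """
    return best_time(baseline, **kwargs) / best_time(candidate, **kwargs)
```

For example, `compare(lambda: sk_model.fit(X), lambda: ferro_model.fit(X))` returns a value above 1 when the second callable is faster. Here `sk_model` and `ferro_model` stand for estimators you have already constructed. Taking the minimum over several rounds reduces noise from other processes.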
MIT License - see LICENSE file for details.
Rafa_PyRs.dev
- Email: rafagr98.dev@gmail.com
- GitHub: @rafagr98
- Built with PyO3 - Rust bindings for Python
- Inspired by scikit-learn - API design
- Powered by ndarray and rayon
ferrolearn: Where Python meets Rust for machine learning performance
Made with 🐍 and 🦀