Ferrolearn - High-performance machine learning library

Ferrolearn brings Rust's performance to Python's machine learning ecosystem. Compute-intensive algorithms are implemented in Rust and exposed through the familiar scikit-learn API.

Key Features

  • 🚀 2-10x faster than pure Python implementations
  • 🔧 Scikit-learn compatible API - drop-in replacement
  • 🦀 Rust-powered - memory safe and blazingly fast
  • 📊 Zero-copy operations - efficient NumPy integration
  • Automatic parallelization - scales with your CPU cores

Installation

Prerequisites

  • Python 3.8+
  • Rust 1.70+ (needed only when building from source)
  • pip

From PyPI

The easiest way to install Ferrolearn is via pip from the Python Package Index (PyPI):

pip install ferrolearn

pip installs a pre-built wheel when one is available for your platform and otherwise builds from source. Building from source requires a Rust compiler.

After installation, you can verify it by importing in Python:

import ferrolearn
print(ferrolearn.__version__)  # Should print '0.1.0' or your current version
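
If the import fails, it can help to check whether pip registered the package at all. The sketch below uses only the standard library. It assumes the distribution is registered under the name "ferrolearn", and `installed_version` is a hypothetical helper, not part of the library.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name matches the import name.
print(installed_version("ferrolearn"))
```

A result of None means pip has no record of the package in the active environment, which usually points to the wrong virtual environment rather than a broken build.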

Quick Start

from ferrolearn import KMeans
import numpy as np

# Generate sample data
X = np.random.rand(10000, 50)

# Create and fit model - same API as scikit-learn
kmeans = KMeans(n_clusters=5, random_state=42)
kmeans.fit(X)

# Get predictions
labels = kmeans.predict(X)
print(f"Cluster centers shape: {kmeans.cluster_centers_.shape}")
print(f"Iterations: {kmeans.n_iter_}")

API Reference

KMeans

class KMeans(n_clusters=8, max_iters=300, tol=1e-4, random_state=None)

Parameters:

  • n_clusters: Number of clusters (default: 8)
  • max_iters: Maximum iterations (default: 300). Note the spelling: scikit-learn's KMeans names this parameter max_iter.
  • tol: Convergence tolerance (default: 1e-4)
  • random_state: Random seed for reproducibility

Methods:

  • fit(X): Fit the model
  • predict(X): Predict cluster labels
  • fit_predict(X): Fit and predict in one call

Attributes:

  • cluster_centers_: Cluster centroids
  • n_iter_: Number of iterations run
  • inertia_: Sum of squared distances from each sample to its nearest cluster center
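
To make the parameters and attributes above concrete, here is a plain-Python sketch of the standard K-Means loop (Lloyd's algorithm). It is a reference for what the names mean, not ferrolearn's Rust implementation. The random initialisation and the stopping rule (total squared center shift at most tol) are assumptions, and `kmeans_reference`, `sq_dist`, and `nearest` are hypothetical names.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Return (index, squared distance) of the closest center."""
    dists = [sq_dist(point, c) for c in centers]
    best = min(range(len(centers)), key=dists.__getitem__)
    return best, dists[best]


def kmeans_reference(X, n_clusters=8, max_iters=300, tol=1e-4, random_state=None):
    """Return (cluster_centers, labels, n_iter, inertia) for a list of points."""
    rng = random.Random(random_state)
    # Assumption: initial centers are distinct samples drawn at random.
    centers = [list(p) for p in rng.sample(X, n_clusters)]
    n_iter = 0
    for n_iter in range(1, max_iters + 1):
        # Assignment step: each sample joins its nearest center.
        labels = [nearest(p, centers)[0] for p in X]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(n_clusters):
            members = [p for p, lab in zip(X, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[k])  # an empty cluster keeps its center
        shift = sum(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # assumed convergence test
            break
    # Final labels and inertia are computed against the final centers.
    labels = [nearest(p, centers)[0] for p in X]
    inertia = sum(nearest(p, centers)[1] for p in X)
    return centers, labels, n_iter, inertia
```

The four returned values correspond to cluster_centers_, the output of predict(X), n_iter_, and inertia_.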

Architecture

ferrolearn leverages Rust's strengths where they matter most:

Python (API Layer)          Rust (Compute Layer)
    │                              │
    ├─ KMeans.fit() ─────────────► │ Parallel distance computation
    │                              │ SIMD-ready operations
    ├─ NumPy arrays ◄────────────► │ Zero-copy array views
    │                              │ Cache-efficient algorithms
    └─ Results ◄───────────────────┘
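
The "parallel distance computation" box works because the assignment step is independent per sample: rows can be split into chunks and labelled separately. The sketch below shows that pattern in pure Python. It illustrates the idea only; the Rust layer may organise the work differently, and `assign_chunk` and `parallel_assign` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_chunk(chunk, centers):
    """Label every sample in one chunk with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        for p in chunk
    ]


def parallel_assign(X, centers, n_workers=4):
    """Split X into contiguous chunks and label each chunk on a worker."""
    size = max(1, -(-len(X) // n_workers))  # ceiling division
    chunks = [X[i:i + size] for i in range(0, len(X), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves chunk order, so labels line up with the input rows.
        parts = list(pool.map(assign_chunk, chunks, [centers] * len(chunks)))
    return [label for part in parts for label in part]
```

On standard CPython builds, threads do not speed up pure-Python arithmetic because of the global interpreter lock, which is one reason to move this step into compiled code.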

Development

Setup Development Environment

# Clone and setup
git clone https://github.com/Rafa-Gu98/ferrolearn.git
cd ferrolearn

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install in development mode
make dev-install

Running Tests

# All tests
make test

# Only Rust tests
cargo test

# Only Python tests
pytest tests/

Project Structure

ferrolearn/
├── src/                # Rust source code
│   ├── lib.rs          # PyO3 bindings
│   └── kmeans.rs       # K-Means implementation
├── python/             # Python package
├── tests/              # Test suite
├── Cargo.toml          # Rust dependencies
└── pyproject.toml      # Python packaging

Roadmap

Current (v0.1.0)

  • ✅ K-Means clustering
  • ✅ Scikit-learn compatible API
  • ✅ Comprehensive benchmarks

Upcoming

  • DBSCAN clustering
  • Mini-batch K-Means
  • Random Forest
  • Gradient Boosting

Future

  • GPU acceleration
  • Distributed computing
  • More algorithms based on user feedback

Contributing

We welcome contributions! Ferrolearn has the most impact on workloads with:

  • Many iterations
  • Embarrassingly parallel computations
  • Memory-intensive operations

Performance Notes

When ferrolearn shines:

  • Medium to large datasets (>10k samples)
  • Moderate dimensionality (20-100 features)
  • Multiple iterations or clusters

Current limitations:

  • Small datasets may not see significant speedup due to overhead
  • Not all algorithms benefit equally from Rust implementation
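
Since the speedup depends on dataset size, it is worth measuring on your own data before switching. The following is a minimal timing sketch using only the standard library; `best_time` and `speedup` are hypothetical helpers, not part of ferrolearn.

```python
import time


def best_time(fn, *args, repeats=5):
    """Return the fastest wall-clock time (seconds) over `repeats` calls to fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline, candidate, *args, repeats=5):
    """Ratio above 1 means `candidate` ran faster than `baseline` on these inputs."""
    t_base = best_time(baseline, *args, repeats=repeats)
    t_cand = best_time(candidate, *args, repeats=repeats)
    return t_base / max(t_cand, 1e-12)  # guard against a zero reading
```

For example, pass two callables that each fit a KMeans model on the same array, one using scikit-learn and one using ferrolearn, and compare the ratio at several sample counts.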

License

MIT License - see LICENSE file for details.

Author

Rafa_PyRs.dev

ferrolearn: Where Python meets Rust for machine learning performance

Made with 🐍 and 🦀
