
Commit e0d13da

introducing copilot instructions
1 parent 776f83c commit e0d13da

7 files changed

Lines changed: 462 additions & 0 deletions
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
# Build Configuration Files

## Core Build Files
- `setup.py`: Main build script (500+ lines, complex configuration)
- `pyproject.toml`: Python project metadata + linting configuration
- `dependencies-dev`: Build-time dependencies (Cython, numpy, pybind11, cmake)
- `requirements-test.txt`: Test dependencies with version constraints
- `conda-recipe/meta.yaml`: Conda package build configuration

## Environment Variables (Critical)
```bash
# MANDATORY for building
export DALROOT=/path/to/onedal   # oneDAL installation path (required)

# OPTIONAL but commonly needed
export MPIROOT=/path/to/mpi      # MPI for distributed features
export NO_DIST=1                 # Disable distributed mode
export NO_DPC=1                  # Disable GPU/SYCL support
export NO_STREAM=1               # Disable streaming mode
export DEBUG_BUILD=1             # Debug symbols + no optimization
export MAKEFLAGS=-j$(nproc)      # Parallel build threads
```

## Build Process (4 Stages)
1. **Code Generation**: oneDAL C++ headers → Python/Cython sources
2. **oneDAL Bindings**: cmake + pybind11 compilation
3. **Cython Processing**: .pyx files → C++ sources
4. **Final Compilation**: Link everything into Python extensions

## Dependencies
**Build Dependencies (dependencies-dev):**
- Cython==3.1.1 (exact version required)
- numpy>=2.0 (version varies by Python version)
- pybind11==2.13.6
- cmake==4.0.2
- setuptools==79.0.1

**Runtime Dependencies:**
- Intel oneDAL 2021.1+ (backwards compatible)
- numpy (version-specific, see requirements-test.txt)
- scikit-learn 1.0-1.7 (see compatibility matrix)
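The exact pins above can be sanity-checked before starting a build; a minimal sketch using only the standard library (the pinned versions are copied from the dependencies-dev list above):

```python
# Verify that the pinned build dependencies are installed at the expected
# versions (pins copied from dependencies-dev as listed above).
from importlib.metadata import PackageNotFoundError, version

PINNED = {"Cython": "3.1.1", "pybind11": "2.13.6", "cmake": "4.0.2"}

def check_pins(pins):
    """Map each package name to 'ok', 'missing', or a version-mismatch note."""
    report = {}
    for pkg, wanted in pins.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "missing"
        else:
            report[pkg] = "ok" if have == wanted else f"have {have}, want {wanted}"
    return report

print(check_pins(PINNED))
```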

## Build Commands
```bash
# Development build (RECOMMENDED)
python setup.py develop                       # Creates .egg-link, editable

# Production builds
python setup.py install                       # Full install
python setup.py build_ext --inplace --force   # Extensions only

# Special flags (Linux)
python setup.py build --abs-rpath             # Absolute RPATH for custom oneDAL

# Conda build
conda build .                                 # Uses conda-recipe/meta.yaml
```

## Common Build Issues
```bash
# oneDAL not found
RuntimeError: "Not set DALROOT variable"
→ Solution: export DALROOT=/path/to/onedal

# MPI required but missing
ValueError: "'MPIROOT' is not set, cannot build with distributed mode"
→ Solution: export NO_DIST=1 or set MPIROOT

# Cython version mismatch
→ Solution: pip install Cython==3.1.1 (exact version)

# Linking issues (Linux)
→ Solution: use the --abs-rpath flag
```
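The first two failures can also be caught before invoking `setup.py` at all. A small pre-flight sketch; the checks mirror the error messages listed above, though the exact conditions `setup.py` tests are worth verifying against the script itself:

```python
# Pre-flight check for the environment variables the build requires,
# mirroring the common failures listed above.
import os

def preflight(env=None):
    """Return a list of problems that would make the build fail."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("DALROOT"):
        problems.append("DALROOT is not set: setup.py stops with "
                        "'Not set DALROOT variable'")
    if not env.get("MPIROOT") and env.get("NO_DIST") != "1":
        problems.append("MPIROOT is not set: set it, or export NO_DIST=1 "
                        "to disable distributed mode")
    return problems

for problem in preflight():
    print(problem)
```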

## CI/CD Configuration
- **GitHub Actions**: `.github/workflows/ci.yml`
- **Azure DevOps**: `.ci/pipeline/ci.yml` (main CI system)
- **Pre-commit**: `.pre-commit-config.yaml` (code quality)

Build timeout: 120 minutes in CI (builds can be slow due to oneDAL compilation).
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# daal4py/* - Direct oneDAL Python Bindings

## Purpose
Direct Python bindings to Intel oneDAL for maximum performance, plus model builders for converting XGBoost/LightGBM/CatBoost models.

## Three Sub-APIs
1. **Native oneDAL**: `import daal4py as d4p` - Direct algorithm access
2. **sklearn-compatible**: `from daal4py.sklearn import ...` - sklearn API with oneDAL backend
3. **Model Builders**: `from daal4py.mb import convert_model` - External model conversion
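Whether all three entry points are usable in a given environment can be probed defensively; a sketch (the module paths are taken from the list above):

```python
# Probe the three daal4py entry points without assuming daal4py is installed.
import importlib

SUB_APIS = {
    "native": "daal4py",
    "sklearn-compatible": "daal4py.sklearn",
    "model builders": "daal4py.mb",
}

def probe(modules):
    """Map each label to 'available' or 'not installed'."""
    status = {}
    for label, mod in modules.items():
        try:
            importlib.import_module(mod)
            status[label] = "available"
        except ImportError:
            status[label] = "not installed"
    return status

print(probe(SUB_APIS))
```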

## Native oneDAL Pattern
```python
import daal4py as d4p
import numpy as np

# Input data as a 2D array
data = np.random.random((1000, 3))

# Create algorithm with parameters
algorithm = d4p.dbscan(epsilon=0.5, minObservations=5)

# Run computation
result = algorithm.compute(data)

# Access results (algorithm-specific attributes)
cluster_labels = result.assignments
core_indices = result.coreIndices
```

## Common Native Algorithms
```python
# Clustering
d4p.dbscan(epsilon=0.5, minObservations=5)
d4p.kmeans(nClusters=3, maxIterations=300)

# Decomposition
d4p.pca(method="defaultDense")
d4p.svd(method="defaultDense")

# Linear Models
d4p.linear_regression_training()
d4p.ridge_regression_training(ridgeParameters=1.0)
```

## Model Builders (mb/)
```python
from daal4py.mb import convert_model

# Convert external models to oneDAL format
d4p_model = convert_model(xgb_model)       # XGBoost → oneDAL
d4p_model = convert_model(lgb_model)       # LightGBM → oneDAL
d4p_model = convert_model(catboost_model)  # CatBoost → oneDAL

# Use converted model for fast inference
predictions = d4p_model.compute(test_data)
```

## Testing
```bash
# Native daal4py tests
pytest --verbose --pyargs daal4py
pytest tests/test_daal4py_examples.py   # Native API examples
pytest tests/test_model_builders.py     # Model conversion tests

# sklearn compatibility in daal4py
pytest daal4py/sklearn/tests/           # sklearn-compatible API
```

## Development Notes
- Native API provides direct oneDAL algorithm access (fastest performance)
- sklearn-compatible API in `daal4py/sklearn/` maintains full sklearn compatibility
- Model builders enable oneDAL inference for models trained with other frameworks
- See `daal4py/AGENTS.md` for detailed algorithm usage patterns
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# General Repository Instructions - Intel Extension for Scikit-learn

## Repository Overview

**Intel Extension for Scikit-learn** (scikit-learn-intelex) accelerates scikit-learn by 10-100x using Intel oneDAL. Zero code changes are required for existing sklearn applications.

- **Languages**: Python (70%), C++ (25%), Cython (5%)
- **Architecture**: 4-layer system (sklearnex → daal4py → onedal → Intel oneDAL C++)
- **Platforms**: Linux, Windows, macOS; CPU (x86_64, ARM), GPU (Intel via SYCL)
- **Python**: 3.9-3.13 supported

## Critical Build Requirements (ALWAYS REQUIRED)

```bash
# Environment variables (MANDATORY)
export DALROOT=/path/to/onedal   # Required by setup.py:53-59
export MPIROOT=/path/to/mpi      # For distributed support

# Build dependencies (INSTALL FIRST)
pip install -r dependencies-dev  # Cython==3.1.1, numpy>=2.0, pybind11==2.13.6

# Development build (RECOMMENDED)
python setup.py develop          # Creates editable install
```

## Testing & Validation (Run in Order)

```bash
# 1. Install test dependencies
pip install -r requirements-test.txt

# 2. Core test suites
pytest --verbose -s tests/           # Legacy tests
pytest --verbose --pyargs daal4py    # Native oneDAL tests
pytest --verbose --pyargs sklearnex  # sklearn compatibility

# 3. Code quality (REQUIRED before commit)
pre-commit install
pre-commit run --all-files --show-diff-on-failure
```

## Code Standards

- **Python**: Black (line-length=90) + isort
- **C++**: clang-format version ≥14
- **Commits**: Must be signed off (`git commit -s`)
- **Documentation**: numpydoc format

## Common Issues & Solutions

```bash
# Build failures
export NO_DIST=1   # Disable distributed mode if MPI issues
export NO_DPC=1    # Disable GPU if driver issues
python setup.py build_ext --inplace --force --abs-rpath   # Linux linking

# Import/path issues
export PYTHONPATH=$(pwd)   # Add repo to path
python setup.py develop    # Ensure editable install
```

For module-specific details, see the corresponding AGENTS.md files in each directory.
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# onedal/* - Low-Level C++ Bindings

## Purpose
Pybind11-based C++ bindings providing the bridge between Python and the Intel oneDAL C++ library.

## Key Components
- `datatypes/`: Memory management and array conversions (NumPy, SYCL USM, DLPack)
- `common/`: Policy management, device selection, serialization
- `*/`: Algorithm-specific implementations (cluster/, decomposition/, linear_model/, etc.)
- `spmd/`: Distributed computing interfaces

## Memory Management
```python
# Zero-copy conversions handled automatically
import numpy as np
from onedal.cluster import DBSCAN

# NumPy arrays converted to oneDAL tables without copying
X = np.random.random((1000, 10))
model = DBSCAN().fit(X)  # Automatic NumPy → oneDAL conversion
```

## Device Context
```python
# Device selection handled through dpctl integration
import dpctl
import numpy as np
from onedal.cluster import DBSCAN

X = np.random.random((1000, 10))

# GPU execution (requires Intel GPU)
device = dpctl.SyclDevice("gpu:0")
with dpctl.device_context(device):
    model = DBSCAN().fit(X)
```

## Algorithm Structure
Each algorithm module follows a consistent pattern:
- `fit()` method for training
- `predict()` method for inference (where applicable)
- Parameters match the oneDAL C++ API
- Results as Python objects with named attributes
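That pattern can be illustrated with a toy stand-in; all names here are hypothetical, and the real wrappers hand the data to the pybind11 bindings instead of computing in Python:

```python
# Toy illustration of the fit/predict pattern described above.
# ExampleEstimator and its 'param' are hypothetical, not part of onedal.
class ExampleEstimator:
    def __init__(self, param=1.0):
        self.param = param  # parameters mirror the oneDAL C++ API

    def fit(self, X):
        # a real wrapper would pass X to the pybind11 layer here
        n, cols = len(X), len(X[0])
        self.mean_ = [sum(row[j] for row in X) / n for j in range(cols)]
        return self  # results exposed as named attributes (mean_)

    def predict(self, X):
        # center each row using the fitted per-column means
        return [[x - m for x, m in zip(row, self.mean_)] for row in X]

model = ExampleEstimator().fit([[1.0, 2.0], [3.0, 4.0]])
print(model.mean_)  # → [2.0, 3.0]
```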

## Testing
```bash
# Low-level onedal tests
pytest onedal/tests/             # Core functionality
pytest onedal/datatypes/tests/   # Memory management
pytest onedal/common/tests/      # Device/policy tests

# Algorithm-specific tests
pytest onedal/cluster/tests/test_dbscan.py   # DBSCAN implementation
pytest onedal/linear_model/tests/            # Linear models
```

## Development Notes
- Direct interface to the oneDAL C++ API through pybind11
- Handles memory management between Python/C++ automatically
- Provides the foundation for both the daal4py and sklearnex layers
- SPMD module enables distributed computing with MPI
- See `onedal/AGENTS.md` for detailed technical implementation
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
# sklearnex/* - Primary sklearn-compatible Interface

## Purpose
Primary user interface for sklearn acceleration, with a patching system and device offloading.

## Key Files & Functions
- `dispatcher.py`: Patching system (`get_patch_map_core`, line 36)
- `_device_offload.py`: GPU/CPU dispatch (`dispatch` function, line 72)
- `_config.py`: Global configuration (target_offload, allow_fallback_to_host)
- `base.py`: `oneDALEstimator` base class for all accelerated algorithms

## Usage Patterns

**Global Patching (Most Common):**
```python
from sklearnex import patch_sklearn
patch_sklearn()  # All sklearn imports are now accelerated
from sklearn.cluster import DBSCAN  # Uses the oneDAL implementation
```

**Selective Patching:**
```python
from sklearnex import patch_sklearn
patch_sklearn(["DBSCAN", "KMeans"])  # Only specific algorithms
```

**Direct Import (No Patching):**
```python
from sklearnex.cluster import DBSCAN  # Always the oneDAL implementation
```

**Device Control:**
```python
from sklearnex import config_context

# GPU acceleration (requires Intel GPU + drivers)
with config_context(target_offload="gpu:0"):
    model.fit(X, y)

# Force CPU
with config_context(target_offload="cpu"):
    model.fit(X, y)
```

## Testing
```bash
# sklearnex-specific tests
pytest --verbose --pyargs sklearnex
pytest sklearnex/tests/test_patching.py   # Core patching functionality
pytest sklearnex/tests/test_config.py     # Configuration system
```

## Development Notes
- All sklearn-compatible algorithms inherit from `base.oneDALEstimator`
- Falls back to original sklearn if the oneDAL implementation is unavailable
- Device offloading requires Intel GPU drivers and the SYCL runtime
- See `sklearnex/AGENTS.md` for detailed module information
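One way to see whether a call ran accelerated or fell back to stock sklearn is to surface sklearnex's dispatch messages through the standard `logging` module; a sketch (the `"sklearnex"` logger name and message format are assumptions to verify against your install):

```python
# Surface sklearnex dispatch messages (accelerated vs. fallback) via logging.
# The "sklearnex" logger name is an assumption to confirm locally.
import logging

logging.basicConfig(format="%(name)s: %(message)s")
logging.getLogger("sklearnex").setLevel(logging.INFO)

try:
    from sklearnex import patch_sklearn
    patch_sklearn()
    from sklearn.cluster import DBSCAN
    # Dispatch decisions for this fit are logged at INFO level
    DBSCAN(eps=2.0, min_samples=2).fit([[0.0, 0.0], [0.0, 1.0], [9.0, 9.0]])
except ImportError:
    print("sklearnex (or scikit-learn) is not installed in this environment")
```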
