[enhancement] Enable Array API in ensemble algos by icfaust · Pull Request #2201 · uxlfoundation/scikit-learn-intelex

icfaust · 2024-12-02T22:31:09Z

Description

This PR refactors the Ensemble algorithms (RandomForestRegressor, RandomForestClassifier, ExtraTreesRegressor and ExtraTreesClassifier) to follow repository standards and add array API support. This reduced the code by 500+ lines and required the following changes:

Remove BaseEstimator inheritance from onedal ensemble estimators
Change estimator __init__ signatures to remove sklearn conformant kwargs in onedal ensemble estimators
Add inline code comments explaining the purpose of various components for future maintenance
Remove random_state use from onedal estimators
Add class_count kwarg to fit, since calculating it in Python is a scikit-learn conformance step (oneDAL expects it a priori)
Remove input parameter checks from the onedal estimators
Generalize return of out-of-bag values from oneDAL for use by Classifiers and Regressors
Remove unused _create_model function
Centralized predict method
Create ForestRegressor and ForestClassifier objects to minimize maintenance
Swap away from max_samples to observations_per_tree_fraction to follow oneDAL values
Modify tests for onedal to use numpy arrays (which can be consumed, where lists cannot)
Reorder warnings and errors based on type (e.g. parameter checks vs input checks etc.)
Refactor _save_attributes method to be specific to Classifiers vs Regressors
Refactor _onedal_fit_ready, _onedal_cpu_supported and _onedal_gpu_supported to reduce code duplication via inheritance and make array API enabled
Add enable_array_api decorators to public-facing estimators
Place _check_parameters function behind sklearn_check_version for future removal
Remove check for min_impurity_split which was removed in sklearn 0.25
Add array API-enabled _validate_y_class_weight method designed specifically for sklearnex estimators (missing some functionality which is irrelevant to the sklearnex estimator)
Remove check_n_features from sklearnex.utils.validation as it is no longer necessary
Enable weighted fitting support for gpu
Removed sample_weight checks for sparsity (blocked by _check_sample_weight)
Added documentation on the nature of set attributes and array API support
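
To illustrate the max_samples to observations_per_tree_fraction item above, here is a minimal sketch of how a scikit-learn style `max_samples` value could be mapped to a fraction. The helper name `to_observations_fraction` is hypothetical and is not taken from this PR; the actual conversion in the repository may differ. It assumes only the documented scikit-learn semantics of `max_samples` (None means all samples, an int is an absolute count, a float is a fraction).

```python
def to_observations_fraction(max_samples, n_samples):
    """Hypothetical helper: map sklearn-style max_samples to a fraction in (0, 1]."""
    if max_samples is None:
        # sklearn draws n_samples observations per tree when max_samples is None
        return 1.0
    if isinstance(max_samples, bool):
        # bool is a subclass of int, reject it explicitly
        raise TypeError("max_samples must be None, int or float")
    if isinstance(max_samples, int):
        if not 1 <= max_samples <= n_samples:
            raise ValueError("integer max_samples must be in [1, n_samples]")
        return max_samples / n_samples
    if isinstance(max_samples, float):
        if not 0.0 < max_samples <= 1.0:
            raise ValueError("float max_samples must be in (0.0, 1.0]")
        return max_samples
    raise TypeError("max_samples must be None, int or float")


print(to_observations_fraction(None, 100))
print(to_observations_fraction(25, 100))
print(to_observations_fraction(0.5, 100))
```

Doing this conversion on the sklearnex side would keep the onedal estimator signature limited to values oneDAL itself understands, which matches the intent stated in the list.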

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

I have reviewed my changes thoroughly before submitting this pull request.
I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have added a respective label(s) to PR if I have a permission for that.
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

ethanglaser · 2025-12-06T00:45:52Z

/intelci: run

icfaust · 2025-12-06T20:58:52Z

/intelci: run

icfaust · 2025-12-06T22:53:04Z

/intelci: run

icfaust · 2025-12-07T08:04:14Z

/intelci: run

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.


Copilot · 2025-12-07T10:02:31Z

+            for i, v in enumerate(class_weights):
+                expanded_class_weight[y_store_unique_indices == i] *= v


The comment on line 688 warns about O(n*m) complexity. This nested iteration over classes and samples could be a significant performance bottleneck for datasets with many classes. Consider adding a more explicit warning in the docstring or raising a warning at runtime when the number of classes exceeds a threshold (e.g., >100).
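
For context on the complexity concern, the sketch below contrasts a per-class masking loop with a single per-sample lookup. It uses plain Python lists so it runs without dependencies; the function names are hypothetical, and it assumes `y_store_unique_indices` holds one integer class index per sample, which the quoted diff suggests but does not confirm. With array libraries the lookup would typically be an indexing or `take` operation.

```python
def expand_by_loop(class_weights, class_indices):
    """Mirror of the reviewed pattern: one pass over all samples per class, O(n*m)."""
    expanded = [1.0] * len(class_indices)
    for i, v in enumerate(class_weights):
        for j, idx in enumerate(class_indices):
            if idx == i:
                expanded[j] *= v
    return expanded


def expand_by_lookup(class_weights, class_indices):
    """Single pass over the samples, O(n): read each sample's class weight directly."""
    return [1.0 * class_weights[idx] for idx in class_indices]


weights = [0.5, 2.0, 4.0]      # one weight per class
indices = [0, 2, 1, 1, 0]      # class index of each sample
assert expand_by_loop(weights, indices) == expand_by_lookup(weights, indices)
print(expand_by_lookup(weights, indices))
```

Whether the lookup form is usable in the PR depends on array API support for integer indexing in the targeted namespaces, so this is only an illustration of the trade-off.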

Copilot · 2025-12-07T10:02:31Z

+                dtype=[xp.float64, xp.float32],
+                ensure_all_finite=not sklearn_check_version(
+                    "1.4"
+                ),  # completed in offload check


The comment 'completed in offload check' is unclear about where and how the finite check is completed. This should reference the specific location (e.g., line numbers or function name) where the check occurs to aid future maintenance.

Suggested change

-                ),  # completed in offload check
+                ),  # finite check is performed in support_input_format() in onedal._device_offload

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.


icfaust · 2025-12-07T10:08:04Z

/intelci: run

icfaust · 2025-12-07T11:49:17Z

/intelci: run

icfaust · 2025-12-07T21:08:32Z

latest CI run: http://intel-ci.intel.com/f0d3b0ab-f859-f1ef-91c9-a4bf010d0e2d

david-cortes-intel · 2025-12-08T08:19:31Z

-  - tests/test_common.py::test_estimators[ExtraTreesClassifier()-check_sample_weights_invariance(kind=ones)]
  - tests/test_common.py::test_estimators[ExtraTreesClassifier()-check_sample_weights_invariance(kind=zeros)]
-  - tests/test_common.py::test_estimators[ExtraTreesRegressor()-check_sample_weights_invariance(kind=ones)]
+  - ensemble/tests/test_forest.py::test_min_weight_fraction_leaf


CC @Alexandr-Solovev - this test in particular is very straightforward and not expected to fail, yet it does here.

* add finiteness_checker pybind11 bindings * added finiteness checker * Update finiteness_checker.cpp * Update finiteness_checker.cpp * Update finiteness_checker.cpp * Update finiteness_checker.cpp * Update finiteness_checker.cpp * Update finiteness_checker.cpp * Rename finiteness_checker.cpp to finiteness_checker.cpp * Update finiteness_checker.cpp * add next step * follow conventions * make xtable explicit * remove comment * Update validation.py * Update __init__.py * Update validation.py * Update __init__.py * Update __init__.py * Update validation.py * Update _data_conversion.py * Update _data_conversion.py * Update policy_common.cpp * Update policy_common.cpp * Update _policy.py * Update policy_common.cpp * Rename finiteness_checker.cpp to finiteness_checker.cpp * Create finiteness_checker.py * Update validation.py * Update __init__.py * attempt at fixing circular imports again * fix isort * remove __init__ changes * last move * Update policy_common.cpp * Update policy_common.cpp * Update policy_common.cpp * Update policy_common.cpp * Update validation.py * add testing * isort * attempt to fix module error * add fptype * fix typo * Update validation.py * remove sua_ifcae from to_table * isort and black * Update test_memory_usage.py * format * Update _data_conversion.py * Update _data_conversion.py * Update test_validation.py * remove unnecessary code * make reviewer changes * make dtype check change * add sparse testing * try again * try again * try again * temporary commit * first attempt * missing change? * modify DummyEstimator for testing * generalize DummyEstimator * switch test * further testing changes * add initial validate_data test, will be refactored * fixes for CI * Update validation.py * Update validation.py * Update test_memory_usage.py * Update base.py * Update base.py * improve tests * fix logic * fix logic * fix logic again * rename file * Revert "rename file" This reverts commit 8d47744. 
* remove duplication * fix imports * Rename test_finite.py to test_validation.py * Revert "Rename test_finite.py to test_validation.py" This reverts commit ee799f6. * updates * Update validation.py * fixes for some test failures * fix text * fixes for some failures * make consistent * fix bad logic * fix in string * attempt tp see if dataframe conversion is causing the issue * fix iter problem * fix testing issues * formatting * revert change * fixes for pandas * there is a slowdown with pandas that needs to be solved * swap to transpose for speed * more clarity * add _check_sample_weight * add more testing' * rename * remove unnecessary imports * fix test slowness * focus get_dataframes_and_queues * put config_context around * Update test_validation.py * Update base.py * Update test_validation.py * generalize regex * add fixes for sklearn 1.0 and input_name * fixes for test failures * Update validation.py * Update test_validation.py * Update validation.py * formattintg * make suggested changes * follow changes made in uxlfoundation#2126 * fix future device problem * Update validation.py * finished movement * fix first error * next mistake * remove bad dtypes check * updates * remove array * solve onedal issues * solve onedal issues * updates * updates * further fixes * further fixes * fix issues to see how it goes * oops * updates * add finite checks for predict and predict_proba * updates * centralize * further reduce code * updates * remove sklearn conformance from onedal estimator init signature * remove more * fixes * change away from sklearn `max_samples` in onedal estimators * fix error * move things * Update forest.py * Update forest.py * Update _forest.py * further fixes to onedal side * further fixes to onedal side * simplifications * attempt at classifiers support * further changes * fix error on onedal side * fix error on onedal side * fixes * fix pandas related error * remove unnecessary code: * try to fix issues related to regressor data * fixes 
necessary for CI * fixes for formatting * updates * push * push * fixes * remove upon request * remove upon request * further fixes * try to fix classifiers for array API inputs * try again * Update array_api.rst * Update sklearnex/ensemble/_forest.py Co-authored-by: david-cortes-intel <david.cortes@intel.com> * Update _forest.py * Update _forest.py * Update _forest.py * Update _forest.py * Update _forest.py * Update _forest.py * Update _forest.py * Update _forest.py * Update _forest.py * Update _forest.py * Update forest.py * Update _forest.py * Update sklearnex/ensemble/_forest.py Co-authored-by: ethanglaser <42726565+ethanglaser@users.noreply.github.com> * Update _forest.py * Update _forest.py * Update array_api.rst * Update array_api.rst * remove sparse checks for sample_weight * Update deselected_tests.yaml * Update deselected_tests.yaml --------- Co-authored-by: david-cortes-intel <david.cortes@intel.com> Co-authored-by: ethanglaser <42726565+ethanglaser@users.noreply.github.com>

icfaust and others added 30 commits October 23, 2024 13:02

add finiteness_checker pybind11 bindings

32fe269

added finiteness checker

cdbf1b5

Update finiteness_checker.cpp

62674a2

Update finiteness_checker.cpp

c75c23b

Update finiteness_checker.cpp

6a20938

Update finiteness_checker.cpp

382d7a1

Update finiteness_checker.cpp

c8ffd9c

Update finiteness_checker.cpp

9aa13d5

Rename finiteness_checker.cpp to finiteness_checker.cpp

84e15d5

Update finiteness_checker.cpp

63073c6

Merge branch 'intel:main' into dev/new_assert_all_fininte

d915da5

add next step

3dddf2d

follow conventions

1e1213e

make xtable explicit

0531713

remove comment

e831167

Update validation.py

d6eb1d0

Update __init__.py

fb30d6e

Update validation.py

63a18c2

Update __init__.py

76c0856

Update __init__.py

7deb2bb

Update validation.py

ed46b29

Update _data_conversion.py

67d6273

Merge branch 'main' into dev/new_assert_all_fininte

054f0a1

Update _data_conversion.py

8abead9

Update policy_common.cpp

47d0f8b

Update policy_common.cpp

e48c2bd

Update _policy.py

c6751c4

Update policy_common.cpp

f3e4a3a

Rename finiteness_checker.cpp to finiteness_checker.cpp

39cdb5f

Create finiteness_checker.py

0f39613

icfaust added 3 commits December 5, 2025 09:19

Update _forest.py

86bf80a

Update array_api.rst

6a83091

Update array_api.rst

7c439be

ethanglaser approved these changes Dec 6, 2025


icfaust added 2 commits December 6, 2025 21:48

remove sparse checks for sample_weight

fc77422

Merge branch 'uxlfoundation:main' into dev/new_RF

772fe68

icfaust added the enhancement (New feature or request) and Array API labels Dec 6, 2025

Update deselected_tests.yaml

e1ad668

Update deselected_tests.yaml

3922290

icfaust requested review from ahuber21, Copilot, david-cortes-intel and ethanglaser December 7, 2025 10:01

Merge branch 'uxlfoundation:main' into dev/new_RF

3b6f8de

Copilot AI reviewed Dec 7, 2025


icfaust requested review from Alexandr-Solovev and Copilot December 7, 2025 10:02

Copilot AI reviewed Dec 7, 2025


Comment thread sklearnex/ensemble/_forest.py

Comment thread sklearnex/ensemble/_forest.py

Comment thread sklearnex/ensemble/_forest.py

Comment thread sklearnex/ensemble/_forest.py

Comment thread onedal/ensemble/forest.py

Comment thread onedal/ensemble/tests/test_random_forest.py

david-cortes-intel reviewed Dec 8, 2025


david-cortes-intel approved these changes Dec 8, 2025


icfaust merged commit d769d14 into uxlfoundation:main Dec 8, 2025
31 checks passed
