[enhancement] Enable Array API in ensemble algos #2201
icfaust merged 215 commits into uxlfoundation:main
Conversation
/intelci: run
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
for i, v in enumerate(class_weights):
    expanded_class_weight[y_store_unique_indices == i] *= v
There was a problem hiding this comment.
The comment on line 688 warns about O(n*m) complexity. This nested iteration over classes and samples could be a significant performance bottleneck for datasets with many classes. Consider adding a more explicit warning in the docstring or raising a warning at runtime when the number of classes exceeds a threshold (e.g., >100).
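A vectorized gather sidesteps the per-class loop entirely. Below is a minimal sketch with NumPy standing in for the array-API namespace; the function and variable names are illustrative, not the PR's actual code:

```python
import numpy as np

def expand_class_weight(class_weights, y_indices):
    # Gather the per-class weight for each sample in one O(n_samples) step,
    # replacing the O(n_samples * n_classes) loop of boolean-mask updates.
    class_weights = np.asarray(class_weights, dtype=np.float64)
    return class_weights[y_indices]

# Samples with class indices [0, 1, 1, 0] and per-class weights [0.5, 2.0]:
expanded = expand_class_weight([0.5, 2.0], np.array([0, 1, 1, 0]))
# expanded is [0.5, 2.0, 2.0, 0.5]
```

This is equivalent to the masked loop when the expanded weights start from ones, and it stays a single integer-indexing call under the array API.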
dtype=[xp.float64, xp.float32],
ensure_all_finite=not sklearn_check_version(
    "1.4"
),  # completed in offload check
The comment 'completed in offload check' is unclear about where and how the finite check is completed. This should reference the specific location (e.g., line numbers or function name) where the check occurs to aid future maintenance.
- ), # completed in offload check
+ ), # finite check is performed in support_input_format() in onedal._device_offload
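The version gate itself can be illustrated with a self-contained stand-in for `sklearn_check_version` (simplified here; the real helper lives in daal4py and reads the installed scikit-learn version):

```python
def sklearn_check_version(target, current="1.4.2"):
    # Simplified stand-in: compare (major, minor) tuples. The `current`
    # argument exists only for this illustration; the real helper inspects
    # the installed scikit-learn release.
    as_tuple = lambda v: tuple(int(p) for p in v.split(".")[:2])
    return as_tuple(current) >= as_tuple(target)

# On sklearn >= 1.4 the finite check is skipped at validation time and
# deferred to the device-offload wrapper instead.
ensure_all_finite = not sklearn_check_version("1.4")
```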
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
/intelci: run
- tests/test_common.py::test_estimators[ExtraTreesClassifier()-check_sample_weights_invariance(kind=ones)]
- tests/test_common.py::test_estimators[ExtraTreesClassifier()-check_sample_weights_invariance(kind=zeros)]
- tests/test_common.py::test_estimators[ExtraTreesRegressor()-check_sample_weights_invariance(kind=ones)]
- ensemble/tests/test_forest.py::test_min_weight_fraction_leaf
There was a problem hiding this comment.
CC @Alexandr-Solovev - this test in particular is very straightforward and not expected to fail, yet it does here.
* add finiteness_checker pybind11 bindings
* added finiteness checker
* Update finiteness_checker.cpp
* Update finiteness_checker.cpp
* Update finiteness_checker.cpp
* Update finiteness_checker.cpp
* Update finiteness_checker.cpp
* Update finiteness_checker.cpp
* Rename finiteness_checker.cpp to finiteness_checker.cpp
* Update finiteness_checker.cpp
* add next step
* follow conventions
* make xtable explicit
* remove comment
* Update validation.py
* Update __init__.py
* Update validation.py
* Update __init__.py
* Update __init__.py
* Update validation.py
* Update _data_conversion.py
* Update _data_conversion.py
* Update policy_common.cpp
* Update policy_common.cpp
* Update _policy.py
* Update policy_common.cpp
* Rename finiteness_checker.cpp to finiteness_checker.cpp
* Create finiteness_checker.py
* Update validation.py
* Update __init__.py
* attempt at fixing circular imports again
* fix isort
* remove __init__ changes
* last move
* Update policy_common.cpp
* Update policy_common.cpp
* Update policy_common.cpp
* Update policy_common.cpp
* Update validation.py
* add testing
* isort
* attempt to fix module error
* add fptype
* fix typo
* Update validation.py
* remove sua_ifcae from to_table
* isort and black
* Update test_memory_usage.py
* format
* Update _data_conversion.py
* Update _data_conversion.py
* Update test_validation.py
* remove unnecessary code
* make reviewer changes
* make dtype check change
* add sparse testing
* try again
* try again
* try again
* temporary commit
* first attempt
* missing change?
* modify DummyEstimator for testing
* generalize DummyEstimator
* switch test
* further testing changes
* add initial validate_data test, will be refactored
* fixes for CI
* Update validation.py
* Update validation.py
* Update test_memory_usage.py
* Update base.py
* Update base.py
* improve tests
* fix logic
* fix logic
* fix logic again
* rename file
* Revert "rename file" (this reverts commit 8d47744)
* remove duplication
* fix imports
* Rename test_finite.py to test_validation.py
* Revert "Rename test_finite.py to test_validation.py" (this reverts commit ee799f6)
* updates
* Update validation.py
* fixes for some test failures
* fix text
* fixes for some failures
* make consistent
* fix bad logic
* fix in string
* attempt tp see if dataframe conversion is causing the issue
* fix iter problem
* fix testing issues
* formatting
* revert change
* fixes for pandas
* there is a slowdown with pandas that needs to be solved
* swap to transpose for speed
* more clarity
* add _check_sample_weight
* add more testing'
* rename
* remove unnecessary imports
* fix test slowness
* focus get_dataframes_and_queues
* put config_context around
* Update test_validation.py
* Update base.py
* Update test_validation.py
* generalize regex
* add fixes for sklearn 1.0 and input_name
* fixes for test failures
* Update validation.py
* Update test_validation.py
* Update validation.py
* formattintg
* make suggested changes
* follow changes made in uxlfoundation#2126
* fix future device problem
* Update validation.py
* finished movement
* fix first error
* next mistake
* remove bad dtypes check
* updates
* remove array
* solve onedal issues
* solve onedal issues
* updates
* updates
* further fixes
* further fixes
* fix issues to see how it goes
* oops
* updates
* add finite checks for predict and predict_proba
* updates
* centralize
* further reduce code
* updates
* remove sklearn conformance from onedal estimator init signature
* remove more
* fixes
* change away from sklearn `max_samples` in onedal estimators
* fix error
* move things
* Update forest.py
* Update forest.py
* Update _forest.py
* further fixes to onedal side
* further fixes to onedal side
* simplifications
* attempt at classifiers support
* further changes
* fix error on onedal side
* fix error on onedal side
* fixes
* fix pandas related error
* remove unnecessary code:
* try to fix issues related to regressor data
* fixes necessary for CI
* fixes for formatting
* updates
* push
* push
* fixes
* remove upon request
* remove upon request
* further fixes
* try to fix classifiers for array API inputs
* try again
* Update array_api.rst
* Update sklearnex/ensemble/_forest.py (Co-authored-by: david-cortes-intel <david.cortes@intel.com>)
* Update _forest.py
* Update _forest.py
* Update _forest.py
* Update _forest.py
* Update _forest.py
* Update _forest.py
* Update _forest.py
* Update _forest.py
* Update _forest.py
* Update _forest.py
* Update forest.py
* Update _forest.py
* Update sklearnex/ensemble/_forest.py (Co-authored-by: ethanglaser <42726565+ethanglaser@users.noreply.github.com>)
* Update _forest.py
* Update _forest.py
* Update array_api.rst
* Update array_api.rst
* remove sparse checks for sample_weight
* Update deselected_tests.yaml
* Update deselected_tests.yaml

---------

Co-authored-by: david-cortes-intel <david.cortes@intel.com>
Co-authored-by: ethanglaser <42726565+ethanglaser@users.noreply.github.com>
Description
This PR refactors the ensemble algorithms (RandomForestRegressor, RandomForestClassifier, ExtraTreesRegressor and ExtraTreesClassifier) to follow repository standards and adds array API support. This reduced the code by 500+ lines and required the following changes:
- `BaseEstimator` inheritance from onedal ensemble estimators
- `__init__` signatures to remove sklearn conformant kwargs in onedal ensemble estimators
- `random_state` use from onedal estimators
- `class_count` kwarg to `fit`, as calculating it in python is scikit-learn conformance (oneDAL expects it a priori)
- oneDAL for use by Classifiers and Regressors
- `_create_model` function
- `predict` method
- `ForestRegressor` and `ForestClassifier` objects to minimize maintenance
- `max_samples` to `observations_per_tree_fraction` to follow oneDAL values
- `_save_attributes` method to be specific to Classifiers vs Regressors
- `_onedal_fit_ready`, `_onedal_cpu_supported` and `_onedal_gpu_supported` to reduce code duplication via inheritance and make array API enabled
- `enable_array_api` decorators to public-facing estimators
- `_check_parameters` function behind `sklearn_check_version` for future removal
- `min_impurity_split`, which was removed in sklearn 0.25
- `_validate_y_class_weight` method designed specifically for sklearnex estimators (missing some functionality which is irrelevant to the sklearnex estimator)
- `check_n_features` from `sklearnex.utils.validation`, as it is no longer necessary
- `sample_weight` checks for sparsity (blocked by `_check_sample_weight`)

PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
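The `max_samples` to `observations_per_tree_fraction` change can be sketched with a hypothetical helper (not the PR's actual code, and assuming oneDAL expects a per-tree fraction in (0, 1]):

```python
def to_observations_per_tree_fraction(max_samples, n_samples):
    # sklearn's `max_samples` may be None (use all rows), a float fraction,
    # or an absolute row count; oneDAL wants a single fraction per tree.
    if max_samples is None:
        return 1.0
    if isinstance(max_samples, float):
        if not 0.0 < max_samples <= 1.0:
            raise ValueError("max_samples must be in (0, 1]")
        return max_samples
    # integer row count: clamp to the dataset size, then normalize
    return min(max_samples, n_samples) / n_samples
```

Keeping this conversion in one place lets the onedal estimators speak oneDAL's vocabulary while the sklearnex layer keeps the sklearn-conformant parameter name.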
This approach ensures that reviewers don't spend extra time asking for regular requirements.
You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, a PR with a docs update doesn't require performance checkboxes, while a PR with any change to actual code should keep them and justify how the change is expected to affect performance (or the justification should be self-evident).
Checklist to comply with before moving PR from draft:
PR completeness and readability
Testing