[FIX] Fixes for Tabular Regression #235
Conversation
```diff
 self.model = self.build_model(input_shape=input_shape,
-                              logger_port=X['logger_port'],
+                              logger_port=X['logger_port'] if 'logger_port' in X else None,
                               output_shape=output_shape)
```
I think we should always have the logger port here.
I think in one of the tests we didn't have it
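For what it's worth, a defensive variant of the snippet above (just a sketch, assuming the standard library's default TCP logging port is an acceptable fallback) could avoid passing `None` altogether:

```python
import logging.handlers

# Sketch of the snippet above: fall back to the standard-library default
# TCP logging port when the fit dictionary has no 'logger_port' key,
# instead of passing None down to build_model.
logger_port = X.get('logger_port', logging.handlers.DEFAULT_TCP_LOGGING_PORT)
self.model = self.build_model(input_shape=input_shape,
                              logger_port=logger_port,
                              output_shape=output_shape)
```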
Resolved review threads (outdated):
- autoPyTorch/pipeline/components/setup/traditional_ml/base_model.py
- ...rch/pipeline/components/setup/traditional_ml/traditional_learner/base_traditional_learner.py
- ...rch/pipeline/components/setup/traditional_ml/traditional_learner/base_traditional_learner.py
- autoPyTorch/pipeline/components/setup/traditional_ml/traditional_learner/learners.py
```python
assert 'val_preds' in model.fit_output.keys()
assert isinstance(model.fit_output['val_preds'], list)
assert len(model.fit_output['val_preds']) == len(fit_dictionary_tabular['val_indices'])
if model.model.is_classification:
```
Can you please add a unit test that makes sure that is_classification is set properly? I was not able to find where in the code we make sure that it is properly set up...
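Something like the following could cover it (a hypothetical pytest sketch, not code from this PR; the `model` and `fit_dictionary_tabular` fixtures and the `task_type` key are assumptions based on the assertions quoted above):

```python
# Hypothetical sketch of the requested unit test; fixture names and the
# 'task_type' key are assumptions, not code taken from this PR.
def test_is_classification_set_properly(fit_dictionary_tabular, model):
    model.fit(X=fit_dictionary_tabular)
    task_type = fit_dictionary_tabular['dataset_properties']['task_type']
    # The learner's flag should mirror the task type it was fitted on.
    assert model.model.is_classification == ('classification' in task_type)
```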
```python
assert y_pred.shape[0] == len(fit_dictionary_tabular['val_indices'])
# Test if classifier can score and
# the result is same as in results
score = model.score(fit_dictionary_tabular['X_train'][fit_dictionary_tabular['val_indices']],
```
Can you check the value of the score? I think this traditional classifier should achieve a pretty good score
Unfortunately some of the classifiers fail to get a good score on some datasets; sometimes it's really low as well. In a later PR we can try to optimize the hyperparameters of the traditional classifiers to get a good score in all scenarios, but I feel that for the purpose of this PR it's fine.
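A possible compromise until the hyperparameters are tuned (a sketch; the `y_train` key and the 0.2 bound are placeholders, not values from this PR):

```python
# Sketch: assert the score only beats a weak baseline, so flat-out broken
# learners are caught without failing on hard datasets. 0.2 is arbitrary.
val_idx = fit_dictionary_tabular['val_indices']
score = model.score(fit_dictionary_tabular['X_train'][val_idx],
                    fit_dictionary_tabular['y_train'][val_idx])
assert score > 0.2
```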
Resolved review thread (outdated):
- test/test_pipeline/components/setup/test_setup_traditional_models.py
franchuterivera left a comment
Thanks a lot for the PR; with this we will be able to compare to other AutoML systems on regression.
Some minor questions/changes on this PR.
I just started running with this, but the first fix we need (so that I do not forget) is that we have to update this file: https://github.com/automl/Auto-PyTorch/blob/refactor_development/MANIFEST.in with the new json files. Also, I think we have to add the greedy portfolio here?
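For reference, MANIFEST.in entries for new json files would look roughly like this (a sketch; the actual paths and file names are assumptions):

```
# Hypothetical MANIFEST.in lines; adjust to the real location of the files.
include autoPyTorch/configs/greedy_portfolio.json
recursive-include autoPyTorch *.json
```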
The only other question that I have is what we should do with the greedy portfolio for regression. One possibility is to have in there the default configuration per neural network (default per MLP, per shaped network, and so on). I see very good performance from the default configuration on Boston, but not so good from the other configurations, because the BO model has not yet learned what to do with them. The other option is to generate the portfolio for regression. What do you think?
I think for now we can continue using the greedy portfolio json configs, and when we set up the scripts to build the portfolio ourselves, we can build one for regression as well. However, since you say the default configs are giving good results, we can compare them with the portfolio we have right now and use the one which gives the best performance boost.
Resolved review threads (outdated):
- autoPyTorch/pipeline/components/setup/traditional_ml/base_model.py
- autoPyTorch/pipeline/components/setup/traditional_ml/traditional_learner/utils.py
- autoPyTorch/pipeline/components/training/trainer/base_trainer_choice.py
```python
results in a MemoryError.
y (np.ndarray):
    Ground Truth labels
metric_name (str, default = 'r2'):
```
No, these are our names.
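That is, the `metric_name` string is resolved against Auto-PyTorch's own metrics rather than sklearn scorer names; a hypothetical call would be:

```python
# Hypothetical usage: 'r2' names an Auto-PyTorch metric, not an sklearn
# scorer string. `pipeline` is assumed to be a fitted regression pipeline.
score = pipeline.score(X_test, y_test, metric_name='r2')
```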
```python
pipeline = TabularRegressionPipeline(
    dataset_properties=fit_dictionary_tabular_dummy['dataset_properties'],
    random_state=1
```
Because we convert the seed into a random state instance with this line.
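Assuming the conversion follows the usual sklearn pattern (an illustration, not necessarily the exact call used in the code), passing the same int seed yields the same stream:

```python
import numpy as np
from sklearn.utils import check_random_state

# An int seed is turned into a np.random.RandomState; two instances built
# from the same seed produce identical draws, which is what makes
# random_state=1 reproducible.
rs_a = check_random_state(1)
rs_b = check_random_state(1)
assert isinstance(rs_a, np.random.RandomState)
assert rs_a.randint(100) == rs_b.randint(100)
```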
Force-pushed from b0e444f to a931763.
Codecov Report
```
@@           Coverage Diff            @@
##          development     #235   +/- ##
==========================================
+ Coverage      80.73%   81.14%   +0.41%
==========================================
  Files            148      150       +2
  Lines           8563     8559       -4
  Branches        1323     1331       +8
==========================================
+ Hits            6913     6945      +32
+ Misses          1173     1131      -42
- Partials         477      483       +6
```
Continue to review full report at Codecov.
…eed in regression
Force-pushed from 23210ad to 6853a13.
This PR allows reproducibility in Tabular Regression, enables traditional methods for tabular regression, and adds tests for these.
Specifically, it adds the following:
- Renames `BaseClassifier` in 'classifier_models' to `BaseTraditionalLearner` and refactors the `TraditionalLearners` to remove duplicate code.
- Adds `score` to `TabularClassificationPipeline` and improves documentation for `TabularRegressionPipeline`.
- Adds `TraditionalTabularRegressionPipeline`.
- Adds `test_tabular_classification` and `test_tabular_regression`.

P.S. I couldn't think of better names, so please feel free to suggest.
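As a rough illustration of the added `score` API (a sketch only: the constructor arguments, fit dictionary, and data below are placeholders, and the exact signatures should be checked against the repository):

```python
from autoPyTorch.pipeline.tabular_classification import TabularClassificationPipeline

# Placeholder sketch: dataset_properties, fit_dictionary, X_test and y_test
# all stand in for real objects; the import path is assumed.
pipeline = TabularClassificationPipeline(dataset_properties=dataset_properties,
                                         random_state=1)
pipeline.fit(fit_dictionary)
accuracy = pipeline.score(X_test, y_test)  # the method added in this PR
```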