Description
Product vision: users should not interact with caching; they should only know it's there and enjoy the benefits.
Hence, I suggest we:
- remove `clear_cache`: it should be our responsibility to clear the cache when needed and to keep it light in memory.
- remove `cache_predictions` and instead compute predictions once and for all at init time. You have no reason to build a report if you're not interested in predictions. And if we make reports immutable, as I suggest in "Discussion: Make reports more immutable - remove setters for `X_test`, `y_test`, `pos_label`? #2553", this becomes much easier to do.
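To make the second point concrete, here is a minimal sketch of what "compute predictions once at init time" could look like on an immutable report. `ToyReport` and `MajorityClassifier` are hypothetical names made up for this illustration; this is not the current report implementation, only one way the idea could be shaped.

```python
from dataclasses import dataclass, field
from typing import Any, List, Sequence


class MajorityClassifier:
    """Toy 'fitted' estimator that always predicts one label and counts calls."""

    def __init__(self, label: Any) -> None:
        self.label = label
        self.n_calls = 0

    def predict(self, X: Sequence[Any]) -> List[Any]:
        self.n_calls += 1
        return [self.label for _ in X]


@dataclass(frozen=True)
class ToyReport:
    """Hypothetical immutable report: predictions are computed once, at init."""

    estimator: Any
    X_test: Sequence[Any]
    y_test: Sequence[Any]
    y_pred: List[Any] = field(init=False)

    def __post_init__(self) -> None:
        # The dataclass is frozen, so regular attribute assignment is blocked;
        # object.__setattr__ is the usual way to set a derived field here.
        object.__setattr__(
            self, "y_pred", list(self.estimator.predict(self.X_test))
        )

    def accuracy(self) -> float:
        # Metrics only read the stored predictions; the estimator is not
        # called again and there is no user-facing cache to manage.
        hits = sum(p == t for p, t in zip(self.y_pred, self.y_test))
        return hits / len(self.y_test)
```

Because `X_test` and `y_test` cannot be reassigned, the stored predictions can never become stale, so nothing like `clear_cache` is needed in this sketch.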
Note that `cache_predictions` currently stores a lot of redundant information in the cache for classification, including outright duplicates (quite a lot of them for multiclass classification). Indeed, `_get_cached_response_values` is called for each possible `pos_label`, while the values returned by `predict_proba` or `decision_function` are the same for every `pos_label` in the multiclass case. I can write a quick fix for that, but I'd prefer writing a clean refactor instead. Also, the `predict` result can be deduced from `predict_proba`/`decision_function`; this is an optimization we don't do, but it could be valuable for some models like KNeighbors[...].
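A toy sketch of both observations, assuming a cache keyed by `(method, pos_label)`. Every name below (`CountingClassifier`, `fill_cache`, `naive_key`, `shared_key`, `predict_from_proba`) is hypothetical and only stands in for the real code paths; it is meant to show the shape of the duplication, not how `_get_cached_response_values` is actually written.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class CountingClassifier:
    """Toy multiclass 'estimator' that counts how often predict_proba runs."""

    classes_ = ["a", "b", "c"]

    def __init__(self) -> None:
        self.n_calls = 0

    def predict_proba(self, X: Sequence[Sequence[float]]) -> List[List[float]]:
        self.n_calls += 1
        # Normalize each row of non-negative scores into probabilities.
        return [[v / sum(row) for v in row] for row in X]


def naive_key(method: str, pos_label: str) -> Tuple:
    # One cache entry per pos_label, even though the output is identical.
    return (method, pos_label)


def shared_key(method: str, pos_label: str) -> Tuple:
    # The multiclass output does not depend on pos_label, so it is left out.
    return (method,)


def fill_cache(estimator, X, key_fn: Callable[[str, str], Tuple]) -> Dict:
    """Request predict_proba once per pos_label, as a caching loop would."""
    cache: Dict = {}
    for pos_label in estimator.classes_:
        key = key_fn("predict_proba", pos_label)
        if key not in cache:
            cache[key] = estimator.predict_proba(X)
    return cache


def predict_from_proba(proba, classes) -> list:
    """Deduce hard predictions: the class with the highest probability."""
    return [
        classes[max(range(len(row)), key=row.__getitem__)] for row in proba
    ]


X = [[1.0, 3.0, 1.0], [4.0, 1.0, 0.0]]

naive = CountingClassifier()
naive_cache = fill_cache(naive, X, naive_key)      # one call per class

shared = CountingClassifier()
shared_cache = fill_cache(shared, X, shared_key)   # a single call

labels = predict_from_proba(shared_cache[("predict_proba",)], shared.classes_)
```

With the naive key, both the number of estimator calls and the number of stored copies grow with the number of classes. Deriving `predict` from the stored probabilities then avoids one more pass over the data, which matters most for estimators whose prediction step is expensive.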
=> I feel that instead of helping users speed things up, we're slowing them down... I could craft an example where `cache_predictions` takes 10 minutes when it should have taken 20s...