RFC: Remove training harness wrappers (`learn!`, `evaluate!`) from Lighthouse.jl?

I took a recent time-boxed* attempt at deprecating `EvaluationRow` (#71), and it was Not A Fun Time™. I'm not sure what the correct thing to do here is, and I think that it is very possible that we should remove some (all?) of the training harness aspects of Lighthouse.jl altogether---or at least, we should figure out the minimal set of them that users actually depend on.

We've spent a certain amount of time maintaining the existing harness behavior, but afaik our internal usage of the harness itself is basically non-existent, as teams have combined the Lighthouse components in their own ways to support their specific use-cases. (Specifics to be left to a Beacon Slack thread!) Externally, I would be surprised if anyone was depending on them (although if someone is, please say the word!!). Additionally, https://github.com/beacon-biosignals/LighthouseFlux.jl/blob/main/src/LighthouseFlux.jl does not use `learn!` or `evaluate!`.


*technically it was also [hurtling-through-]space-boxed, huzzah for transcontinental flight....

---

The particular hole I fell down was trying to cleanly swap out the use of `evaluation_metrics_row` (and its corresponding `log_evaluation_row!`) inside `evaluate!`. 

https://github.com/beacon-biosignals/Lighthouse.jl/blob/0540cdd5fcfc7a95cdd0b8c080febc42da736dbe/src/learn.jl#L233

Why? Well, the existing usage conflates tradeoff metrics and hardened metrics, and we shouldn't be computing (and logging) both there without the caller specifying how the per-class binarization should happen (to create the hardened metrics), whether they also want to compute multirater label metrics (i.e., if `votes` exist), and whether they also want the extra metrics that (until now) have only been computed for binary classification problems.
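To make the binarization point concrete, here is a minimal sketch in plain Julia. It does not use any Lighthouse.jl API: `binarize` and `agreement` are hypothetical helpers invented for this illustration, and the numbers are made up. It only shows that the same soft labels yield different hardened metrics depending on a threshold the caller has to pick.

```julia
# Hypothetical helper (not part of Lighthouse.jl): binarize one class's soft
# scores at a caller-chosen threshold.
binarize(scores::AbstractVector{<:Real}, threshold::Real) = scores .>= threshold

# Hypothetical helper: fraction of binarized predictions that match the truth.
agreement(predicted, actual) = count(predicted .== actual) / length(actual)

scores = [0.2, 0.45, 0.55, 0.9]      # made-up soft labels for a single class
actual = [false, true, true, true]   # made-up ground truth for that class

# Same soft labels, two thresholds, two different "hardened" results.
agreement_at_half = agreement(binarize(scores, 0.5), actual)
agreement_at_low = agreement(binarize(scores, 0.4), actual)
```

With these made-up inputs the two thresholds disagree on the second sample, so any hardened metric logged by a harness silently bakes in one of those choices.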

At the point that we expose separate interfaces for the various combinations of inputs that a user might want (multiclass vs. binary, single-rater vs. multirater, binarization choice, pre-computed `predicted_hard_labels` for multiclass input), we will have created a lot more code to maintain for a lot less benefit. At the end of the day, asking the caller to implement their _own_ `evaluate!(model::AbstractClassifier, ...)` would be easier.

...which seems like a viable option. Except what inputs would a template `evaluate!` function take? What combination of the existing
```
predicted_hard_labels::AbstractVector,
predicted_soft_labels::AbstractMatrix,
elected_hard_labels::AbstractVector,
classes, logger;
logger_prefix, logger_suffix,
votes::Union{Nothing,AbstractMatrix}=nothing,
thresholds=0.0:0.01:1.0,
optimal_threshold_class::Union{Nothing,Integer}=nothing
```
? Because the choice depends on the type of metrics being computed in the first place.

If various projects were depending on this `evaluate!` function, it seems like choosing some minimal required arguments (
```
predicted_soft_labels::AbstractMatrix,
elected_hard_labels::AbstractVector,
classes, logger;
logger_prefix, logger_suffix
```
?) and passing the others as kwargs from the `learn!` function might do it. Buuuut I'm not sure how many folks use this function in the first place!
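For the sake of discussion, a rough sketch of what such a minimal signature could look like. Everything here is hypothetical: `DictLogger` and `minimal_evaluate!` are stand-ins I made up, not Lighthouse.jl types or functions, and the argmax hardening is just one possible choice. The only point is the shape: a small set of required arguments, with anything else forwarded as kwargs.

```julia
# Hypothetical stand-in for a logger; NOT a Lighthouse.jl type.
struct DictLogger
    logged::Dict{String,Any}
end
DictLogger() = DictLogger(Dict{String,Any}())

# Hypothetical minimal evaluation step (not Lighthouse.jl's `evaluate!`).
# Only the "minimal" arguments are required; everything else arrives through
# `kwargs...`, and this sketch merely records which extras were passed.
function minimal_evaluate!(predicted_soft_labels::AbstractMatrix,
                           elected_hard_labels::AbstractVector,
                           classes, logger::DictLogger;
                           logger_prefix, logger_suffix, kwargs...)
    # Harden by taking the highest-scoring class per row (an assumed choice).
    predicted_hard_labels = map(argmax, eachrow(predicted_soft_labels))
    accuracy = count(predicted_hard_labels .== elected_hard_labels) /
               length(elected_hard_labels)
    logger.logged[string(logger_prefix, "/accuracy", logger_suffix)] = accuracy
    logger.logged[string(logger_prefix, "/extra_kwargs", logger_suffix)] =
        sort!(Symbol[keys(kwargs)...])
    return logger
end

# Usage with made-up data: 3 samples, 2 classes.
soft = [0.9 0.1; 0.2 0.8; 0.6 0.4]
elected = [1, 2, 2]
logger = minimal_evaluate!(soft, elected, ["a", "b"], DictLogger();
                           logger_prefix="test", logger_suffix="",
                           thresholds=0.0:0.5:1.0)
```

Even in this toy version, the hardening rule had to be hard-coded inside the function, which is the same problem as above in miniature.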

----

Which brings me to: what is the correct thing to do here? And does anyone use the `learn!` and `evaluate!` functions, or could/should we remove them except as examples (in the docs) for how to implement a custom loop for one's own model?

To quote @ericphanson, when talking about this issue,
> I think a harness like that is just too rigid. We just need a lot of useful primitives like metrics, logging, plots, etc, and then we can combine them in different ways as makes sense for the project

I agree---and so I think losing the `evaluate!` and `learn!` calls as they currently exist would allow us to focus on those primitives and sink less time into maintaining code that isn't used. And if more flexible (or smaller-scoped) harnesses make their way back into Lighthouse.jl, I think that would be cool too, but we should let that happen from the ground up once we create training loops that actively work for us and could be shared.
