RFC: Remove training harness wrappers (learn!, evaluate!) from Lighthouse.jl? #81

@hannahilea

I took a recent time-boxed* attempt at deprecating EvaluationRow (#71), and it was Not A Fun Time™. I'm not sure what the correct thing to do here is, and I think that it is very possible that we should remove some (all?) of the training harness aspects of Lighthouse.jl altogether---or at least, we should figure out the minimal set of them that users actually depend on.

We've spent a certain amount of time maintaining the existing harness behavior, but afaik our internal usage of the harness itself is basically non-existent, as teams have combined the Lighthouse components in their own ways to support their specific use-cases. (Specifics to be left to a Beacon slack thread!) Externally, I would be surprised if anyone was depending on them (although if someone is, please say the word!!). Additionally, https://github.com/beacon-biosignals/LighthouseFlux.jl/blob/main/src/LighthouseFlux.jl does not use learn! or evaluate!.

*technically it was [hurtling-through-]space-boxed as well, huzzah for transcontinental flight....


The particular hole I fell down was trying to cleanly swap out the use of evaluation_metrics_row (and its corresponding log_evaluation_row!) inside evaluate!.

metrics = evaluation_metrics_row(predicted_hard_labels, predicted_soft_labels,

Why? Well, the existing usage conflates tradeoff metrics and hardened metrics, and we shouldn't be computing (and logging) both there unless the caller specifies (1) how the per-class binarization should happen (to create the hardened metrics), (2) whether they also want multirater label metrics (i.e., if votes exist), and (3) whether they also want the extra metrics that (until now) have only been computed for binary classification problems.

By the time we expose separate interfaces for the various combinations of inputs that a user might want (multiclass vs. binary, single rater vs. multirater, binarization choice, pre-computed predicted_hard_labels for multiclass input), we will have created a lot more code to maintain for a lot less benefit. At the end of the day, asking the caller to implement their own evaluate!(model::AbstractClassifier, ...) would be easier.
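To make the "caller implements their own evaluation" idea concrete, here is a minimal sketch of what that could look like when built from small primitives. It is self-contained and deliberately does not use Lighthouse.jl: every name in it (harden_argmax, harden_threshold, accuracy, my_evaluate) is hypothetical and invented for illustration, and soft labels are assumed to be a samples-by-classes matrix.

```julia
# Hypothetical sketch: none of these names are Lighthouse.jl API.

# Primitive 1: harden soft labels (rows = samples, columns = classes) by per-row argmax.
harden_argmax(soft::AbstractMatrix) = [argmax(row) for row in eachrow(soft)]

# Primitive 2: harden a binary problem by thresholding one class's score.
function harden_threshold(soft::AbstractMatrix, class_index::Integer, threshold::Real)
    other = class_index == 1 ? 2 : 1
    return [score >= threshold ? class_index : other for score in soft[:, class_index]]
end

# Primitive 3: a single metric (fraction of predictions matching the elected labels).
accuracy(predicted::AbstractVector, elected::AbstractVector) =
    count(predicted .== elected) / length(elected)

# Caller-owned evaluation: the project decides how to binarize and what to compute.
function my_evaluate(soft::AbstractMatrix, elected::AbstractVector; harden=harden_argmax)
    predicted = harden(soft)
    return (; accuracy=accuracy(predicted, elected))
end

soft = [0.9 0.1; 0.4 0.6; 0.55 0.45]
elected = [1, 2, 2]

# Multiclass-style hardening (argmax) vs. a binary threshold on class 2.
multiclass_result = my_evaluate(soft, elected)
binary_result = my_evaluate(soft, elected; harden=s -> harden_threshold(s, 2, 0.4))
```

The point of the sketch is only that the binarization choice lives with the caller as an ordinary function argument, rather than being baked into a shared harness.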

...which seems like a viable option. Except what inputs would a template evaluate! function take? What combination of the existing

predicted_hard_labels::AbstractVector,
predicted_soft_labels::AbstractMatrix,
elected_hard_labels::AbstractVector,
classes, logger;
logger_prefix, logger_suffix,
votes::Union{Nothing,AbstractMatrix}=nothing,
thresholds=0.0:0.01:1.0,
optimal_threshold_class::Union{Nothing,Integer}=nothing

? Because the choice depends on the type of metrics being computed in the first place.

If various projects were depending on this evaluate! function, it seems like choosing some minimal required arguments (

predicted_soft_labels::AbstractMatrix,
elected_hard_labels::AbstractVector,
classes, logger;
logger_prefix, logger_suffix

?) and passing the others as kwargs from the learn! function might do it. Buuuut I'm not sure how many folks use this function in the first place!
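For illustration, the "minimal required arguments plus forwarded kwargs" shape might look roughly like the sketch below. This is not the existing Lighthouse.jl signature: evaluate_sketch! and learn_sketch! are hypothetical names, a plain Dict stands in for the logger, a single prefix keyword stands in for logger_prefix/logger_suffix, and the logged values are placeholders rather than real metrics.

```julia
# Hypothetical sketch; not the existing Lighthouse.jl signatures.

# Minimal evaluation step: only the arguments every metric needs are positional.
# Anything metric-specific (votes, thresholds, ...) arrives as an optional keyword.
function evaluate_sketch!(store::AbstractDict, soft::AbstractMatrix,
                          elected::AbstractVector, classes;
                          prefix::AbstractString, votes=nothing,
                          thresholds=0.0:0.01:1.0)
    predicted = [argmax(row) for row in eachrow(soft)]
    store["$prefix/accuracy"] = count(predicted .== elected) / length(elected)
    store["$prefix/n_classes"] = length(classes)
    store["$prefix/n_thresholds"] = length(thresholds)
    # Multirater metrics would only be attempted when the caller supplies votes.
    store["$prefix/has_votes"] = votes !== nothing
    return store
end

# A training-loop stand-in that forwards evaluation keywords untouched.
function learn_sketch!(store, soft, elected, classes; epochs=1, evaluate_kwargs...)
    for epoch in 1:epochs
        evaluate_sketch!(store, soft, elected, classes;
                         prefix="epoch_$epoch", evaluate_kwargs...)
    end
    return store
end

store = learn_sketch!(Dict{String,Any}(), [0.9 0.1; 0.4 0.6], [1, 2], ["a", "b"];
                      epochs=2, thresholds=0.0:0.5:1.0)
```

Here thresholds is never named by learn_sketch!; it passes straight through to the evaluation step, which is the property the minimal-signature idea relies on.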


Which brings me to: what is the correct thing to do here? And does anyone use the learn! and evaluate! functions, or could/should we remove them except as examples (in the docs) for how to implement a custom loop for one's own model?

To quote @ericphanson, when talking about this issue,

I think a harness like that is just too rigid. We just need a lot of useful primitives like metrics, logging, plots, etc, and then we can combine them in different ways as makes sense for the project

I agree---and so I think losing the evaluate! and learn! calls as they currently exist would allow us to focus on those primitives and sink less time into maintaining code that isn't used. And if more flexible (or smaller scoped) harnesses make their way back into Lighthouse.jl, I think that would be cool also, but we should let that happen from the ground up once we create training loops that actively work for us and could be shared.
