Skip to content

Typed SDCA binary trainers #2506

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 14, 2019
Merged

Typed SDCA binary trainers #2506

merged 10 commits into from
Feb 14, 2019

Conversation

wschin
Copy link
Member

@wschin wschin commented Feb 11, 2019

Make SdcaBinaryTrainer strongly-typed according to the type it produces. The existing SdcaBinaryTrainer can produce either calibrated linear model or linear model as mentioned in #2469. To fix #2469, we move common functionalities used in both cases to SdcaBinaryTrainerBase<T> where T is CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator for calibrated case and LinearBinaryModelParameters otherwise. Top-level APIs are changed accordingly. For dynamic/static APIs, we have 4 SDCA binary trainers for

  1. Calibrated linear model with simple arguments.
  2. Calibrated linear model with advanced options.
  3. Uncalibrated linear model with simple arguments.
  4. Uncalibrated linear model with advanced options.

In addition, because we don't like auto-calibration, the output schema in uncalibrated cases should not contain a probability column. This PR also fixes this in the two derived trainer classes' ComputeSdcaBinaryClassifierSchemaShape; the only case with a probability column generated is training logistic regression with Sdca. We also have two separated SdcaCalibratedBinaryTrainer.Options and SdcaBinaryTrainer.Options derived from BinaryArgumentBase. Legacy command-line tool and entry points are not affected by having LegacySdcaBinaryTrainer.

@wschin wschin changed the title Typed SDCA binary trainers [WIP] Typed SDCA binary trainers Feb 11, 2019
@wschin wschin force-pushed the typed-linear-trainers branch from 028d090 to 27536c3 Compare February 11, 2019 22:27
@wschin wschin changed the title [WIP] Typed SDCA binary trainers Typed SDCA binary trainers Feb 12, 2019
@wschin wschin force-pushed the typed-linear-trainers branch 2 times, most recently from cc7766d to b6135a9 Compare February 12, 2019 19:39
@codecov
Copy link

codecov bot commented Feb 12, 2019

Codecov Report

Merging #2506 into master will decrease coverage by 0.01%.
The diff coverage is 93%.

@@            Coverage Diff             @@
##           master    #2506      +/-   ##
==========================================
- Coverage   71.26%   71.24%   -0.02%     
==========================================
  Files         798      798              
  Lines      141231   141236       +5     
  Branches    16112    16108       -4     
==========================================
- Hits       100642   100627      -15     
- Misses      36119    36143      +24     
+ Partials     4470     4466       -4
Flag Coverage Δ
#Debug 71.24% <93%> (-0.02%) ⬇️
#production 67.57% <90.74%> (-0.02%) ⬇️
#test 85.35% <100%> (ø) ⬆️
#Resolved

@codecov
Copy link

codecov bot commented Feb 12, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@fd30559). Click here to learn what that means.
The diff coverage is 93%.

@@            Coverage Diff            @@
##             master    #2506   +/-   ##
=========================================
  Coverage          ?   71.24%           
=========================================
  Files             ?      798           
  Lines             ?   141236           
  Branches          ?    16108           
=========================================
  Hits              ?   100627           
  Misses            ?    36143           
  Partials          ?     4466
Flag Coverage Δ
#Debug 71.24% <93%> (?)
#production 67.57% <90.74%> (?)
#test 85.35% <100%> (?)

@wschin wschin self-assigned this Feb 12, 2019
@wschin wschin added the API Issues pertaining the friendly API label Feb 12, 2019
Fix some tests failed because of duplicated load names
1. Remove arguments to set up calibrator in all SDCA trainer
2. Clean up entry point base on argument's changes.
3. Update test files
@wschin wschin force-pushed the typed-linear-trainers branch from 407bdde to 267c387 Compare February 14, 2019 00:12

namespace Microsoft.ML.Samples.Dynamic
{
public class SDCASupportVectorMachine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SDCASupportVectorMachine [](start = 17, length = 24)

This is a static class right? (Only if you happen to post another commit after this.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully #2548 can fix it.

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @wschin, this is definitely a strong step in the right direction. (And incidentally one of the last things I need to actually get rid of IPredictorProducing. 😄)

@wschin wschin merged commit 37acb6c into dotnet:master Feb 14, 2019
@wschin wschin deleted the typed-linear-trainers branch February 14, 2019 05:17
@wschin
Copy link
Member Author

wschin commented Feb 14, 2019

Thank you @TomFinley.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Model produced by SdcaBinaryTrainer is not and can not be strongly-typed
4 participants