[r] add poc matrix projection interface #158
Closed
Add an sklearn-like interface for creating pipelines in terms of fitting, projection, and combining operations on matrices.
Current thoughts:
I deviated from the design docs a little to make the inheritance make more sense:

- `PipelineBase` is the default base class, and `Pipeline` inherits from it.
- `PipelineStep` also inherits from `PipelineBase`, and `Estimator` and `Transformer` both inherit from `PipelineStep`.
- `PipelineBase` and `Pipeline` differ because a `Pipeline` has steps, which isn't true of a single `PipelineStep`.
- `PipelineBase` and `PipelineStep` differ to indicate that each step has a `step_name` associated with it. This also gives transformers and predictors a shared interface for how they are printed and how they can be concatenated to create a pipeline.

I had to rename `transform()` to `project()`, since base R already has a generic function with that name. Similarly, I found `predict()` in the stats package, so I named ours `estimate()` instead.
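A minimal S4 sketch of the hierarchy and renamed generics described above, run on a plain base R matrix rather than a BPCells matrix. Only the class names and the `project()`/`estimate()` generic names come from this PR. The `fit()` signature, the `steps`/`step_name`/`col_means` slots, and the toy `CenterColumns` transformer are hypothetical illustrations, not the code in this branch.

```r
library(methods)

# Hypothetical sketch: class names follow the PR text, slots are assumptions.
setClass("PipelineBase", representation("VIRTUAL"))
# A Pipeline holds an ordered list of steps.
setClass("Pipeline", contains = "PipelineBase", slots = c(steps = "list"))
# Every step carries a name; PipelineStep and its children stay virtual.
setClass("PipelineStep", contains = "PipelineBase",
         representation("VIRTUAL", step_name = "character"))
setClass("Transformer", contains = "PipelineStep", representation("VIRTUAL"))
setClass("Estimator", contains = "PipelineStep", representation("VIRTUAL"))

# project() and estimate() avoid clashing with base::transform and stats::predict.
setGeneric("fit", function(object, x, ...) standardGeneric("fit"))
setGeneric("project", function(object, x, ...) standardGeneric("project"))
setGeneric("estimate", function(object, x, ...) standardGeneric("estimate"))

# Toy concrete Transformer (hypothetical): learns column means, then subtracts them.
setClass("CenterColumns", contains = "Transformer",
         slots = c(col_means = "numeric"))
setMethod("fit", "CenterColumns", function(object, x, ...) {
  object@col_means <- colMeans(x)
  object
})
setMethod("project", "CenterColumns", function(object, x, ...) {
  if (length(object@col_means) == 0) stop("call fit() before project()")
  sweep(x, 2, object@col_means)
})

# A Pipeline fits each step on the output of the previous one
# (transformers only in this sketch; an Estimator step would need estimate()).
setMethod("fit", "Pipeline", function(object, x, ...) {
  for (i in seq_along(object@steps)) {
    object@steps[[i]] <- fit(object@steps[[i]], x)
    x <- project(object@steps[[i]], x)
  }
  object
})
setMethod("project", "Pipeline", function(object, x, ...) {
  for (step in object@steps) x <- project(step, x)
  x
})

m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
pipe <- new("Pipeline", steps = list(new("CenterColumns", step_name = "center")))
pipe <- fit(pipe, m)
project(pipe, m)  # each column now has mean 0
```

In this sketch, keeping `Transformer` and `Estimator` virtual means a user-defined step only needs a concrete subclass plus `fit()` and `project()` (or `estimate()`) methods.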
Tests will be added in another sister PR, as there are no transformers/estimators that are built to test functionality here.
I'm not sure which methods I should document in detail, given that we haven't decided how much of this we want to expose. I documented the generics themselves, to give a meta-level view of how each method is used in both `PipelineStep`s and `Pipeline`s. However, it isn't clear to me whether I also need to provide an extensive docstring for every overridden method in the child classes.

I'm also not sure which classes I should be exposing in the reference. I found that previous BPCells classes (i.e. `IterableMatrix`) aren't described in much detail in the reference. I provided some information on `Pipeline`, `Estimator`, and `Transformer`, and exposed them on the reference page. I also tried to explain in the docstrings how to create a `Transformer` or `Estimator` yourself.

I'm not completely sold on using the `show()` method as an analog of the Python `__repr__()` dunder. I think it could be more useful to make it act more like what you used to display `IterableMatrix`, i.e. still listing which steps are in a pipeline, but also showing macro information such as hyperparameters or details on what each step has been fit to. In that case, we would put the `__repr__()` analog somewhere else.

It's probably redundant to have both `project()` and `estimate()`. What do you think about combining them into one?
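A minimal sketch of the richer `show()` floated above: the pipeline prints its steps, and each step adds its hyperparameters and fit status, roughly in the spirit of the `IterableMatrix` display. The classes are a pared-down, hypothetical stand-in. The `params`/`fitted`/`scale_factors` slots, the `ScaleColumns` step, and the `step_summary()` helper are assumptions for illustration; a shorter `__repr__()`-style summary could live in a separate function.

```r
library(methods)

# Hypothetical, pared-down classes: only the class names come from the PR.
setClass("PipelineStep", representation("VIRTUAL", step_name = "character",
                                        params = "list", fitted = "logical"))
setClass("Transformer", contains = "PipelineStep", representation("VIRTUAL"))
setClass("ScaleColumns", contains = "Transformer",
         slots = c(scale_factors = "numeric"))
setClass("Pipeline", slots = c(steps = "list"))

# One line per step: name, class, hyperparameters and whether it has been fit.
step_summary <- function(step) {
  params <- if (length(step@params) > 0) {
    vals <- vapply(step@params, format, character(1))
    paste(names(step@params), vals, sep = "=", collapse = ", ")
  } else {
    "none"
  }
  status <- if (isTRUE(step@fitted)) "fitted" else "not fitted"
  sprintf("%s <%s> params: %s [%s]", step@step_name, class(step)[[1]],
          params, status)
}

setMethod("show", "PipelineStep", function(object) {
  cat(step_summary(object), "\n", sep = "")
})
setMethod("show", "Pipeline", function(object) {
  cat(sprintf("Pipeline with %d step(s):\n", length(object@steps)))
  for (i in seq_along(object@steps)) {
    cat(sprintf("  %d. %s\n", i, step_summary(object@steps[[i]])))
  }
})

pipe <- new("Pipeline", steps = list(
  new("ScaleColumns", step_name = "scale", params = list(center = TRUE))
))
pipe  # auto-printing an S4 object calls show()
```

Here `step_summary()` is shared by both methods, so a step prints the same way on its own and inside a pipeline.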