Skip to content

[r] add poc matrix projection interface #158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

immanuelazn
Copy link
Collaborator

@immanuelazn immanuelazn commented Nov 13, 2024

Add an sklearn-like interface for creating pipelines in terms of fitting, projection, and combining operations on matrices.

Current thoughts:

  • I deviated from the design docs a little bit to make the inheritance make more sense. I made a default PipelineBase, with Pipeline inheriting form it. I then made a PipelineStep that inherits from PipelineBase. Estimator and Transformer both inherit from this class.

    • PipelineBase and Pipeline differ because Pipelines should have steps. This isn't true for single PipelineSteps
    • PipelineBase and PipelineStep differ to indicate each step has a step_name associated with it. Also to allow for shared interface for transformers/predictors in how they are printed and how they can be concatenated to create a pipeline
  • I had to change transform() to project(), given I found a generic base function with the same name. Additionally, I found predict() in the stats package, and changed it to estimate()

  • Tests will be added in another sister PR, as there are no transformers/estimators that are built to test functionality here.

  • I'm not sure which methods I should provide detail to, given that we are not sure how much of this we want to expose. I provided them to the generics themselves, to allow for a meta-look on how to use methods in both PipelineSteps and Pipeline. However, it isn't clear to me whether I need to continue providing an extensive docstring for every overriden method in child classes.

  • I'm not sure which Classes I should be exposing to the reference either. I found that previous BPCells classes (ie IterableMatrix) aren't heavily described in the reference. I provided some information on Pipeline, Estimator, and Transformer, and exposed them to the reference page. I also tried to provide information on how to create a Transformer, and Estimator yourself on the docstring.

  • I don't think I'm completely sold on using the show() method as an analog of the python __repr__() dunder. I think it could be more useful to make it act more similarly to what you used to display IterableMatrix, ie where we still have information on what steps are in a pipeline, but also macro information, like hyper params or details on what the step has fit to. In this case, we would have a __repr__() analog somewhere else.

  • Probably redundant to have both project() and estimate(). What do you think for just combining them into one?

@immanuelazn immanuelazn marked this pull request as ready for review November 18, 2024 23:35
@immanuelazn
Copy link
Collaborator Author

Closed as we are changing direction to #167

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant