`visualize` verb should find all its inputs from config. (#713)
Conversation
Rather than assuming that the dataset provides its own breadcrumbs to its input, use the session-wide config to locate that data and its necessary metadata.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #713      +/-   ##
==========================================
- Coverage   64.17%   64.09%   -0.08%
==========================================
  Files          61       61
  Lines        5892     5902      +10
==========================================
+ Hits         3781     3783       +2
- Misses       2111     2119       +8
Pull request overview
This PR updates the visualize verb to stop relying on InferenceDataSet "breadcrumbs" for locating its original inputs, and instead uses the runtime/session config to find the underlying data and metadata (addressing the shift toward Lance-backed ResultDataset outputs).
Changes:
- Switch UMAP results loading from InferenceDataSet(...) to load_results_dataset(...) (auto-detecting Lance vs. .npy).
- Introduce a config-driven DataProvider to source metadata fields/values instead of reading metadata via the results dataset.
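The Lance-vs-.npy auto-detection mentioned in the first bullet can be sketched roughly as follows. This is a hypothetical illustration only: the real load_results_dataset lives in hyrax and almost certainly differs in signature, return type, and detection details.

```python
from pathlib import Path


def load_results_dataset_sketch(results_dir):
    """Illustrative sketch of auto-detecting a results format on disk.

    Hypothetical stand-in for the load_results_dataset described in
    this PR; only the Lance-vs-.npy branching is mirrored here.
    """
    results_dir = Path(results_dir)
    # A Lance-backed ResultDataset stores its table as a *.lance directory.
    lance_tables = sorted(results_dir.glob("*.lance"))
    if lance_tables:
        return "lance", lance_tables
    # Otherwise fall back to the legacy per-batch .npy files.
    npy_files = sorted(results_dir.glob("*.npy"))
    if npy_files:
        return "npy", npy_files
    raise FileNotFoundError(f"No Lance table or .npy files in {results_dir}")
```

Centralizing this choice in one loader means callers such as the visualize verb never have to know which backend wrote the results.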
drewoldag left a comment:
This seems reasonable enough to me. I agree with most of the comments here. Once addressed, should be good to go.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@gitosaurus I've opened a new pull request, #714, to work on those changes. Once the pull request is ready, I'll request review from you.

@gitosaurus I've opened a new pull request, #715, to work on those changes. Once the pull request is ready, I'll request review from you.

@gitosaurus I've opened a new pull request, #716, to work on those changes. Once the pull request is ready, I'll request review from you.
* Use setup_dataset from pytorch_ignite in visualize verb

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
…numpy() (#715)

* Replace .numpy() calls with np.asarray() at interface boundaries in visualize.py

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
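The motivation for preferring np.asarray() over .numpy() at interface boundaries can be shown with a small self-contained example. FakeTensor and to_plot_coords below are hypothetical names invented for illustration; they are not hyrax code, but they demonstrate why np.asarray() is the more tolerant conversion: it accepts ndarrays, lists, and any object implementing the __array__ protocol, while .numpy() exists only on objects that happen to define it.

```python
import numpy as np


class FakeTensor:
    """Stand-in for a framework tensor: it implements __array__ but has
    no .numpy() method, like many array-likes a verb may receive."""

    def __init__(self, data):
        self._data = np.array(data)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this when converting the object to an ndarray.
        return self._data if dtype is None else self._data.astype(dtype)


def to_plot_coords(points):
    # np.asarray works on ndarrays, nested lists, and __array__ objects
    # alike, so this boundary no longer assumes a torch.Tensor input.
    return np.asarray(points)


coords = to_plot_coords(FakeTensor([[0.0, 1.0], [2.0, 3.0]]))
print(coords.shape)  # (2, 2)
```

Calling .numpy() here would raise AttributeError; np.asarray() handles every caller uniformly.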
* Add REQUIRED_SPLITS and OPTIONAL_SPLITS to Visualize verb
* Use REQUIRED_SPLITS within Visualize.run() instead of hardcoded strings

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
Co-authored-by: Derek T. Jones <dtj1s@uw.edu>
Co-authored-by: Derek T. Jones <dtj@mac.com>
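The REQUIRED_SPLITS/OPTIONAL_SPLITS pattern named in these commits might look roughly like the sketch below. The attribute names come from the commit messages, but the class body is a hypothetical illustration, not hyrax's actual Visualize implementation.

```python
# Hypothetical sketch: declare required/optional splits as class-level
# constants instead of scattering hardcoded strings through run().
class Visualize:
    REQUIRED_SPLITS = ("infer",)             # "infer" must be in the data request
    OPTIONAL_SPLITS = ("train", "validate")  # used only when present

    def run(self, data_request):
        # Validate against the declared constants, not literal strings.
        missing = [s for s in self.REQUIRED_SPLITS if s not in data_request]
        if missing:
            raise ValueError(f"data request is missing required split(s): {missing}")
        known = set(self.REQUIRED_SPLITS + self.OPTIONAL_SPLITS)
        return sorted(set(data_request) & known)


print(Visualize().run({"infer": [], "train": []}))  # ['infer', 'train']
```

Declaring the splits once makes the verb's contract discoverable and keeps the "insist that 'infer' be in the data request" check in a single place.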
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This reverts commit b6039b9. On reflection, this is trying too hard.
`visualize` verb should find all its inputs from config. (#713)

Rather than assuming that the dataset provides its own breadcrumbs to its input, use the session-wide config to locate that data and its necessary metadata.

* Insist that "infer" be in the data request
* Use setup_dataset from pytorch_ignite in visualize verb
* Replace .numpy() calls with np.asarray() at interface boundaries in visualize.py
* Add REQUIRED_SPLITS and OPTIONAL_SPLITS to Visualize verb

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
* Adding first draft of common workflow notebook for working with ResultsDataset.
* _torch_schedulers
* initial commit
* ran notebooks and deleted irrelevant cell outputs
* addressed code review
* addressed code review
* fixed second plot not showing up
* code review addressed
* fixed pre-commit error
* fixed typo in image name
* `visualize` verb should find all its inputs from config. (#713)
  * Rather than assuming that the dataset provides its own breadcrumbs to its input, use the session-wide config to locate that data and its necessary metadata.
  * Insist that "infer" be in the data request
  * Use setup_dataset from pytorch_ignite in visualize verb
  * Replace .numpy() calls with np.asarray() at interface boundaries in visualize.py
  * Add REQUIRED_SPLITS and OPTIONAL_SPLITS to Visualize verb
* Get pre-executed notebooks running again, with needed Pydantic and Lance changes (#711)
  * Fix and run most pre-executed notebooks
  * Get visualize's h.config correct
  * Chose model.name = HyraxAutoencoder; v.0.6.1 inspection confirms
  * Smaller batch size for HSC data
  * Must suffix columns with data provider
  * Get MPR demo working
  * Quiet down Lance DB creation warnings
  * Respond to PR comments
  * Use data_location
  * export_model.ipynb needs to be rewritten
  * Restore hyrax_hats_cutout to main
  * xcxc blocks innocent images
  * Adding the notebook to the documentation.
  * Making the indexes consistent in the examples.
  * Polished with the help of an agentic notebook reviewer.
  * Fixing botched merge conflict resolution.
  * Getting Started notebook was rerun, and we did not clean up the superfluous output. I ran it again, and removed the output that might be distracting.

Co-authored-by: Samarth Venkatesh <samnsid7@uw.edu>
Co-authored-by: Derek T. Jones <dtj1s@uw.edu>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
Change Description
Closes #709. Rather than assuming that the dataset provides its own breadcrumbs to its input, use the session-wide config to locate that data and its necessary metadata.
Code Quality