Skip to content

visualize verb should find all its inputs from config.#713

Merged
gitosaurus merged 8 commits intomainfrom
dtj-visualize-config
Feb 20, 2026
Merged

visualize verb should find all its inputs from config.#713
gitosaurus merged 8 commits intomainfrom
dtj-visualize-config

Conversation

@gitosaurus
Copy link
Copy Markdown
Contributor

Rather that assuming that the dataset provides its own breadcrumbs to its input, use the session-wide config to locate that data and its necessary metadata.

Change Description

Closes #709 .

Code Quality

  • I have read the Contribution Guide and agree to the Code of Conduct
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Rather that assuming that the dataset provides its own breadcrumbs
to its input, use the session-wide config to locate that data and
its necessary metadata.
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 7.40741% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.09%. Comparing base (464960b) to head (2c177a7).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/hyrax/verbs/visualize.py 7.40% 25 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #713      +/-   ##
==========================================
- Coverage   64.17%   64.09%   -0.08%     
==========================================
  Files          61       61              
  Lines        5892     5902      +10     
==========================================
+ Hits         3781     3783       +2     
- Misses       2111     2119       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the visualize verb to stop relying on InferenceDataSet “breadcrumbs” for locating its original inputs, and instead uses the runtime/session config to find the underlying data + metadata (addressing the shift toward Lance-backed ResultDataset outputs).

Changes:

  • Switch UMAP results loading from InferenceDataSet(...) to load_results_dataset(...) (auto-detecting Lance vs .npy).
  • Introduce a config-driven DataProvider to source metadata fields/values instead of reading metadata via the results dataset.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Feb 19, 2026

Before [464960b] After [05b6a97] Ratio Benchmark (Parameter)
54.7±0.2s failed n/a vector_db_benchmarks.VectorDBInsertBenchmarks.time_load_vector_db(16384, 'qdrant')
1.94±0.09s 2.02±0.02s 1.04 benchmarks.time_prepare_help
15.735343963148088 16.151989804173663 1.03 data_cache_benchmarks.DataCacheBenchmarks.track_cache_hsc1k_hyrax_size_undercount
1.97±0.1s 2.01±0.03s 1.02 benchmarks.time_infer_help
1.98±0.1s 2.00±0.03s 1.01 benchmarks.time_lookup_help
2.01±0.07s 2.02±0.02s 1.01 benchmarks.time_rebuild_manifest_help
7.58±0.04s 7.61±0.06s 1.01 data_cache_benchmarks.DataCacheBenchmarks.time_preload_cache_hsc1k
1.2G 1.21G 1.01 vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(2048, 'chromadb')
1.46G 1.47G 1.01 vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(2048, 'qdrant')
2.02±0.07s 2.02±0.03s 1.00 benchmarks.time_save_to_database_help

Click here to view all benchmarks.

Copy link
Copy Markdown
Collaborator

@drewoldag drewoldag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable enough to me. I agree with most of the comments here. Once addressed, should be good to go.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI commented Feb 19, 2026

@gitosaurus I've opened a new pull request, #714, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Copy Markdown
Contributor

Copilot AI commented Feb 19, 2026

@gitosaurus I've opened a new pull request, #715, to work on those changes. Once the pull request is ready, I'll request review from you.

Copy link
Copy Markdown
Contributor

Copilot AI commented Feb 20, 2026

@gitosaurus I've opened a new pull request, #716, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 6 commits February 20, 2026 09:14
* Use setup_dataset from pytorch_ignite in visualize verb

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
…numpy() (#715)

* Replace .numpy() calls with np.asarray() at interface boundaries in visualize.py

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
* Add REQUIRED_SPLITS and OPTIONAL_SPLITS to Visualize verb
* Use REQUIRED_SPLITS within Visualize.run() instead of hardcoded strings

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
Co-authored-by: Derek T. Jones <dtj1s@uw.edu>
Co-authored-by: Derek T. Jones <dtj@mac.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This reverts commit b6039b9.

On reflection, this is trying too hard.
@gitosaurus gitosaurus merged commit 3847cca into main Feb 20, 2026
7 checks passed
@gitosaurus gitosaurus deleted the dtj-visualize-config branch February 20, 2026 19:54
drewoldag pushed a commit that referenced this pull request Feb 21, 2026
* `visualize` verb should find all its inputs from config.

Rather that assuming that the dataset provides its own breadcrumbs
to its input, use the session-wide config to locate that data and
its necessary metadata.

* Insist that "infer" be in the data request
* Use setup_dataset from pytorch_ignite in visualize verb
* Replace .numpy() calls with np.asarray() at interface boundaries in visualize.py
* Add REQUIRED_SPLITS and OPTIONAL_SPLITS to Visualize verb

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
drewoldag added a commit that referenced this pull request Feb 24, 2026
* Adding first draft of common workflow notebook for working with ResultsDataset.

* _torch_schedulers

* initial commit

* ran notebooks and deleted irrelevant cell outputs

* addressed code review

* addressed code review

* fixed second plot not showing up

* code review addressed

* fixed pre-commit error

* fixed typo in image name

* `visualize` verb should find all its inputs from config. (#713)

* `visualize` verb should find all its inputs from config.

Rather that assuming that the dataset provides its own breadcrumbs
to its input, use the session-wide config to locate that data and
its necessary metadata.

* Insist that "infer" be in the data request
* Use setup_dataset from pytorch_ignite in visualize verb
* Replace .numpy() calls with np.asarray() at interface boundaries in visualize.py
* Add REQUIRED_SPLITS and OPTIONAL_SPLITS to Visualize verb

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>

* Get pre-executed notebooks running again, with needed Pydantic and Lance changes (#711)

* Fix and run most pre-executed notebooks.

Get visualize's h.config correct
Chose model.name = HyraxAutoencoder; v.0.6.1 inspection confirms.

Smaller batch size for HSC data

Must suffix columns with data provider

Get MPR demo working

Quiet down Lance DB creation warnings

* Respond to PR comments

* Use data_location

* export_model.ipynb needs to be rewritten

* Restore hyrax_hats_cutout to main

* xcxc blocks innocent images

* Adding the notebook to the documentation.

* Making the indexes consistent in the examples.

* Polished with the help of an agentic notebook reviewers.

* Fixing botch merge conflict resolution.

* Getting Started notebook was rerun, and we did not clean up the superfluous output. I ran it again, and removed the output that might be distracting.

---------

Co-authored-by: Samarth Venkatesh <samnsid7@uw.edu>
Co-authored-by: Derek T. Jones <dtj1s@uw.edu>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gitosaurus <6794831+gitosaurus@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

visualize verb does not read new inference data sets

5 participants