[TRACKER]: Verify transformer_<catalog>.py output matches <catalog>.py output

This is a tracker issue for all catalog.py files.
Each catalog.py file can be picked up individually and should result in exactly one PR. The `transform_scripts/transform_<catalog>_to_parquet.py` scripts were written by Claude, so they need to be checked thoroughly since they can contain mistakes.
 
Here is a reference PR: https://github.com/UniverseTBD/mmu-hdf-to-hats/pull/53

To start, do the following:
 - choose an unchecked script (see the list below) from https://github.com/UniverseTBD/mmu-hdf-to-hats/tree/main/catalog_download_scripts
 - then check the corresponding catalog on https://users.flatironinstitute.org/~polymathic/data/MultimodalUniverse/v1 and find a small healpix (<100MB is ideal)
 - write a download script, e.g. `verification/download_sdss.sh`, using the healpix you found
 - write a `verification/process_<catalog>_using_datasets.py` and execute it with `uv run --with-requirements=verification/requirements.in python verification/process_<catalog>_using_datasets.py` (this is important, since it will run with `datasets==3.6`, which is the last version to support custom scripts)
 - write a `catalog_functions/<catalog>_transformer.py`
 - write a `transform_scripts/transform_<catalog>_to_parquet.py` and run it (simply using `python transform_scripts/transform_<catalog>_to_parquet.py` is fine here)
 - run `python verification/compare.py <path1> <path2>`, where the paths are the output paths of `process_<catalog>_using_datasets.py` and `transform_scripts/transform_<catalog>_to_parquet.py`
 - add the catalog to the CI workflow [here](https://github.com/UniverseTBD/mmu-hdf-to-hats/blob/main/.github/workflows/check_mmu_transformations.yaml#L54-L63)
 - add the corresponding files in `verify.py`, see [here](https://github.com/UniverseTBD/mmu-hdf-to-hats/blob/main/verify.py#L32-L70)
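
The command-line part of the steps above can be sketched as a small helper that builds the commands for one catalog in order. This is only an illustration: `verification_commands` is a hypothetical name, the `verification/download_<catalog>.sh` naming is assumed from the `download_sdss.sh` example, and the two output paths are placeholders that depend on what your scripts actually write.

```python
import shlex


def verification_commands(catalog, reference_out, transformed_out):
    """Return the commands for one catalog, in the order listed above.

    reference_out / transformed_out are placeholders for the output paths of
    process_<catalog>_using_datasets.py and transform_<catalog>_to_parquet.py.
    """
    return [
        # Assumed naming, following the verification/download_sdss.sh example.
        ["bash", f"verification/download_{catalog}.sh"],
        # Run through uv so that datasets==3.6 from requirements.in is used.
        ["uv", "run", "--with-requirements=verification/requirements.in",
         "python", f"verification/process_{catalog}_using_datasets.py"],
        # Plain python is fine for the transform script.
        ["python", f"transform_scripts/transform_{catalog}_to_parquet.py"],
        # Compare the two outputs.
        ["python", "verification/compare.py", reference_out, transformed_out],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is executed by accident.
    for cmd in verification_commands("sdss", "<path1>", "<path2>"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the paths before running each step by hand.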

Problems that can arise:
 - in `desi`, a negation operator for a boolean column was missing
 - float conversion can be problematic when the two outputs use different float types (e.g. float32 vs. float64)
 - the object ID matching when adding the coordinates can be off, since `object_id`s are formatted differently across the catalogs
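
The float and object ID pitfalls can be illustrated with a short standard-library sketch. It is not taken from `verification/compare.py`; the helper names, the tolerance, and the normalization rules (decode bytes, strip whitespace, drop leading zeros on purely numeric IDs) are assumptions that may not fit every catalog.

```python
import math
import struct


def as_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def floats_match(a, b, rel_tol=1e-6):
    """Compare two floats with a tolerance looser than float32 precision.

    NaN is treated as equal to NaN, which is usually what a column comparison wants.
    """
    if math.isnan(a) and math.isnan(b):
        return True
    return math.isclose(a, b, rel_tol=rel_tol)


def normalize_object_id(raw):
    """Hypothetical normalization so IDs from different sources can be matched."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else str(raw)
    text = text.strip()
    # Assumption: purely numeric IDs are equal regardless of leading zeros.
    if text.isascii() and text.isdigit():
        return str(int(text))
    return text


if __name__ == "__main__":
    value = 0.1
    # Exact equality fails after a float32 round trip, the tolerant check passes.
    print(as_float32(value) == value, floats_match(as_float32(value), value))
    # A padded bytes ID and an integer ID normalize to the same string.
    print(normalize_object_id(b"0012345 ") == normalize_object_id(12345))
```

If a catalog uses leading zeros as a meaningful part of the ID, the numeric branch should be dropped for that catalog.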

Tracker list:
- [x] sdss.py
- [x] btsbot.py
- [x] cfa.py
- [x] csp.py
- [x] chandra.py (https://users.flatironinstitute.org/~polymathic/data/MultimodalUniverse/v1/chandra/chandra.py is forbidden, but the script can be retrieved from [commit `2dcff3d` of this repo](https://github.com/UniverseTBD/mmu-hdf-to-hats/tree/2dcff3dd5fb4d5160024bb8c1d8c05ac85676de9/scripts/chandra))
- [x] des_y3_sne_ia.py
- [x] desi.py
- [x] desi_provabgs.py
- [x] foundation.py
- [x] gaia.py
- ~[ ] gui.py~ (not a catalog script)
- [x] gz10.py
- [x] hsc.py
- [x] jwst.py
- [x] legacysurvey.py
- [x] manga.py
- [x] plasticc.py
- [x] ps1_sne_ia.py
- [x] snls.py
- [x] ssl_legacysurvey.py
- ~[ ] start.py~ (not a catalog script)
- [x] swift_sne_ia.py
- [x] tess.py
- [x] vipers.py
- [x] yse.py

