Generate README.md for catalogs and collections #615

@hombit

Feature request

It would be really great if we could add the ability to generate human-readable README.md files for catalogs and collections. I also propose making generation of this file the default behavior for catalog and collection outputs from hats-import, and adding an option for it to LSDB. Markdown files are human-readable even in raw form, and they are rendered by IDEs, Jupyter, Hugging Face, GitHub, etc. In the future, we could also consider generating an index.html page for HTTP-based hosting of catalogs.

This is what the README.md might look like (my comments are in "[]"):

[This YAML metadata at the top of the README is used by Hugging Face and the datasets library, for both local and remote locations. It should be opt-in.]


---
configs:
- config_name: default
  data_dir: "ztf_dr23_lc-hats/dataset/"
- config_name: margin_10arcs
  data_dir: "ztf_dr23_lc-hats_margin_10arcsec/dataset/"
- config_name: index_objectid
  data_dir: "ztf_dr23_lc-hats_index_objectid/dataset/"
---
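
[A minimal sketch of how a generator could emit this front matter from a mapping of config names to data directories. The function `render_hf_front_matter` and its input format are hypothetical, not an existing hats-import or LSDB API; the `configs`, `config_name`, and `data_dir` keys simply follow the example above. Since this block should be opt-in, a generator would only call something like this when asked to.]

```python
def render_hf_front_matter(configs: dict[str, str]) -> str:
    """Render Hugging Face-style YAML front matter for a README.

    `configs` maps a config name to its data directory; this input format and
    the function itself are hypothetical, not an existing hats-import API.
    """
    lines = ["---", "configs:"]
    for name, data_dir in configs.items():
        lines.append(f"- config_name: {name}")
        # Quote the path so YAML always reads it as a string
        lines.append(f'  data_dir: "{data_dir}"')
    lines.append("---")
    return "\n".join(lines) + "\n"


front_matter = render_hf_front_matter({
    "default": "ztf_dr23_lc-hats/dataset/",
    "margin_10arcs": "ztf_dr23_lc-hats_margin_10arcsec/dataset/",
    "index_objectid": "ztf_dr23_lc-hats_index_objectid/dataset/",
})
print(front_matter)
```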

Zwicky Transient Facility Data Release 23 (lightcurves) catalog

This is the Zwicky Transient Facility Data Release 23 (lightcurves) catalog in HATS format. [OR USER-PROVIDED DESCRIPTION]

You can open the catalog with LSDB (see details at https://docs.lsdb.io):

import lsdb
# Replace the placeholder with the path or URL of the catalog
catalog = lsdb.open_catalog("<path-to-the-catalog>")
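
[A sketch of how the title, description, and "how to open" part could be rendered, falling back to a generated sentence when no user description is given. `render_header` is a hypothetical helper; the default sentence mirrors the example above. The code fence is built from a `FENCE` constant only to avoid nesting fences here.]

```python
from typing import Optional

FENCE = "`" * 3  # a Markdown code fence, built here to avoid nesting fences


def render_header(title: str, description: Optional[str] = None) -> str:
    """Render the title, description, and "how to open" part of the README.

    Hypothetical helper; the fallback description mirrors the example above.
    """
    if description is None:
        # No user-provided description: generate a default sentence
        description = f"This is the {title} in HATS format."
    return "\n".join([
        f"# {title}",
        "",
        description,
        "",
        "You can open the catalog with LSDB (see details at https://docs.lsdb.io):",
        "",
        FENCE + "python",
        "import lsdb",
        'catalog = lsdb.open_catalog("<path-to-the-catalog>")',
        FENCE,
        "",
    ])


header = render_header("Zwicky Transient Facility Data Release 23 (lightcurves) catalog")
print(header)
```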

Auxiliary catalogs

  • 10-arcsecond margin catalog: ztf_dr23_lc-hats_margin_10arcsec/
  • Secondary index catalog for column objectid: ztf_dr23_lc-hats_index_objectid/
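
[A sketch of rendering this list with relative links, so the auxiliary catalogs are clickable wherever the README is rendered. The helper and its (label, directory) input are hypothetical, and the relative links assume the README sits in the same directory as the auxiliary catalog folders.]

```python
def render_auxiliary(aux: list[tuple[str, str]]) -> str:
    """Render the "Auxiliary catalogs" section as a list of relative links.

    `aux` holds (label, relative directory) pairs; both the helper and the input
    format are hypothetical. Relative links assume the README sits in the same
    directory as the auxiliary catalog folders.
    """
    lines = ["## Auxiliary catalogs", ""]
    for label, path in aux:
        lines.append(f"- {label}: [{path}]({path})")
    return "\n".join(lines) + "\n"


section = render_auxiliary([
    ("10-arcsecond margin catalog", "ztf_dr23_lc-hats_margin_10arcsec/"),
    ("Secondary index catalog for column objectid", "ztf_dr23_lc-hats_index_objectid/"),
])
print(section)
```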

Catalog metadata

| Number of rows | Number of columns | Number of partitions | Size on disk |
|---|---|---|---|
| 4,973,896,193 | 13 | 9,933 | 8.0 TiB |
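
[A sketch of how the numbers in this table could be formatted. "TiB" is a binary unit (1 TiB = 2^40 bytes), so the size is divided by 1024 at each step. The helper names are hypothetical, and the byte count in the usage line is illustrative only, chosen so that it prints 8.0 TiB; it is not the real size of the catalog.]

```python
def format_count(n: int) -> str:
    """Format an integer with thousands separators, e.g. 9933 -> "9,933"."""
    return f"{n:,}"


def format_size(num_bytes: int) -> str:
    """Format a byte count with binary (IEC) prefixes and one decimal place."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(num_bytes)
    for unit in units[:-1]:
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    # Anything of at least 1024 TiB is reported in PiB
    return f"{size:.1f} {units[-1]}"


def render_metadata_table(n_rows: int, n_columns: int, n_partitions: int, size_bytes: int) -> str:
    """Render the "Catalog metadata" table (hypothetical helper)."""
    header = "| Number of rows | Number of columns | Number of partitions | Size on disk |"
    separator = "|---|---|---|---|"
    row = (
        f"| {format_count(n_rows)} | {format_count(n_columns)} "
        f"| {format_count(n_partitions)} | {format_size(size_bytes)} |"
    )
    return "\n".join([header, separator, row]) + "\n"


# Illustrative byte count (exactly 8 TiB), not the real catalog size
metadata_table = render_metadata_table(4_973_896_193, 13, 9_933, 8 * 1024**4)
print(metadata_table)
```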

Columns

Right ascension column: objra.
Declination column: objdec.
Primary HEALPix column, order 29: _healpix_29.

[Some additional columns could go here once we support descriptions, etc.]

| Name | _healpix_19 | _healpix_9 | objectid | filterid | fieldid | rcid | objra | objdec | nepochs | lightcurve.hmjd | lightcurve.mag | lightcurve.magerr | lightcurve.clrcoeff | lightcurve.catflags |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Data type | int64 | int32 | int64 | int8 | int16 | int8 | float32 | float32 | int64 | list[float64] | list[float32] | list[float32] | list[float32] | list[int32] |
| Default? | no | no | yes | yes | no | no | yes | yes | no | yes | yes | yes | yes | yes |
| Nested? | no | no | no | no | no | no | no | no | no | lightcurve | lightcurve | lightcurve | lightcurve | lightcurve |
| First row | 175802814 | 167 | 1447212400010477 | 2 | 1447 | 47 | 44.042023 | 1.264162 | 1 | [58761.42485] | [20.727491] | [0.2088] | [0.124223] | [0] |
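
[A sketch of rendering this transposed table, with one table column per catalog column. In practice the values would come from the catalog schema and metadata; here they are plain dicts with hypothetical keys ("name", "dtype", "default", "nested_in", "first"), and the helper names are hypothetical too. Pipes inside values are escaped so they do not split cells.]

```python
def cell(value) -> str:
    """Format a table cell, escaping pipes so they do not split the cell."""
    return str(value).replace("|", "\\|")


def render_columns_table(columns: list[dict]) -> str:
    """Render a transposed Markdown table with one table column per catalog column.

    Each dict uses hypothetical keys: "name", "dtype", "default" (bool),
    "nested_in" (parent column name or None), and "first" (first-row value).
    """
    rows = [
        ["Name"] + [c["name"] for c in columns],
        ["Data type"] + [c["dtype"] for c in columns],
        ["Default?"] + ["yes" if c["default"] else "no" for c in columns],
        ["Nested?"] + [c["nested_in"] or "no" for c in columns],
        ["First row"] + [c["first"] for c in columns],
    ]
    # Markdown needs a separator row right after the header row
    lines = ["| " + " | ".join(cell(v) for v in rows[0]) + " |",
             "|" + "---|" * len(rows[0])]
    lines += ["| " + " | ".join(cell(v) for v in row) + " |" for row in rows[1:]]
    return "\n".join(lines) + "\n"


columns = [
    {"name": "objectid", "dtype": "int64", "default": True,
     "nested_in": None, "first": 1447212400010477},
    {"name": "lightcurve.mag", "dtype": "list[float32]", "default": True,
     "nested_in": "lightcurve", "first": [20.727491]},
]
columns_table = render_columns_table(columns)
print(columns_table)
```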

Sky Maps

[I put the plots at the end because we are probably going to embed them; here I just uploaded them to GitHub.]

Angular density map

[Image: angular density map]

Partition map

[Image: partition map]
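
[Since the plots will probably be embedded, here is a sketch of two options under stated assumptions: inline the PNG as a base64 data URI, which keeps the README self-contained but makes it larger and is not displayed by every Markdown renderer, or link to a PNG file saved next to the README. The function names and the file names are hypothetical.]

```python
import base64
import tempfile
from pathlib import Path


def embed_png(path: Path, alt: str) -> str:
    """Markdown image with the PNG inlined as a base64 data URI (self-contained README)."""
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"![{alt}](data:image/png;base64,{encoded})"


def link_png(relative_path: str, alt: str) -> str:
    """Markdown image pointing to a PNG stored next to the README."""
    return f"![{alt}]({relative_path})"


# Demonstration with a stand-in file that holds only the 8-byte PNG signature
with tempfile.TemporaryDirectory() as tmp:
    png = Path(tmp) / "density_map.png"
    png.write_bytes(b"\x89PNG\r\n\x1a\n")
    embedded = embed_png(png, "Angular density map")

linked = link_png("partition_map.png", "Partition map")  # hypothetical file name
print(linked)
```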

Labels: enhancement (New feature or request)

Status: In Progress
