Commit ef10ac9

myandpr and bveeramani authored
[Data] Improve appearance of repr(dataset) (#59631)
## Description

Improve the UX by making the repr of a Ray Dataset look like a Polars DataFrame.

#### Test demo

```python
>>> import numpy as np
>>> import ray
>>> from ray.data import DataContext
>>>
>>> ray.init(include_dashboard=False, ignore_reinit_error=True)
2025-12-25 03:38:11,358 INFO worker.py:2010 -- Started a local Ray instance.
/Users/xxx/work/community/ray/python/ray/_private/worker.py:2049: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
RayContext(dashboard_url='', python_version='3.10.19', ray_version='3.0.0.dev0', ray_commit='{{RAY_COMMIT_SHA}}')

>>> # Configure truncation: show 4 rows total (1 from the head, the rest from the tail)
>>> # and display up to 5 columns (2 from the head, 2 from the tail, plus an ellipsis).
>>> ctx = DataContext.get_current()
>>> ctx.dataset_repr_max_rows = 4  # Display a total of 4 rows
>>> ctx.dataset_repr_head_rows = 1  # Display 1 row from the head and the remaining rows from the tail
>>> ctx.dataset_repr_max_columns = 5  # Show 5 columns in total, with middle columns truncated (represented by an ellipsis)
>>> ctx.dataset_repr_head_columns = 2  # Display the first 2 columns at the head and the remaining columns at the tail

>>> # Create a demo dataset with 10 rows and 6 columns.
>>> items = [
...     {f"col{i}": i + row for i in range(6)}
...     for row in range(10)
... ]
>>> ds = ray.data.from_items(items)
>>>
>>> print("Before materialization (schema preview only):")
Before materialization (schema preview only):
>>> print(ds)
shape: (10, 6)
╭───────┬───────┬─────┬───────┬───────╮
│ col0  ┆ col1  ┆ …   ┆ col4  ┆ col5  │
│ ---   ┆ ---   ┆ --- ┆ ---   ┆ ---   │
│ int64 ┆ int64 ┆ …   ┆ int64 ┆ int64 │
╰───────┴───────┴─────┴───────┴───────╯
(Showing 0 of 10 rows)
(Showing 4 of 6 columns)
>>> print("\nAfter materialization (shows head/tail rows):")

After materialization (shows head/tail rows):
>>> print(ds.materialize())  # Display a head/tail summary of 4 rows with truncated columns in tabular format
shape: (10, 6)
╭───────┬───────┬─────┬───────┬───────╮
│ col0  ┆ col1  ┆ …   ┆ col4  ┆ col5  │
│ ---   ┆ ---   ┆ --- ┆ ---   ┆ ---   │
│ int64 ┆ int64 ┆ …   ┆ int64 ┆ int64 │
╞═══════╪═══════╪═════╪═══════╪═══════╡
│ 0     ┆ 1     ┆ …   ┆ 4     ┆ 5     │
│ …     ┆ …     ┆ …   ┆ …     ┆ …     │
│ 7     ┆ 8     ┆ …   ┆ 11    ┆ 12    │
│ 8     ┆ 9     ┆ …   ┆ 12    ┆ 13    │
│ 9     ┆ 10    ┆ …   ┆ 13    ┆ 14    │
╰───────┴───────┴─────┴───────┴───────╯
(Showing 4 of 10 rows)
(Showing 4 of 6 columns)
```

## Related issues

Fixes #59482

---------

Signed-off-by: yaommen <myanstu@163.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
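
The head/tail selection configured in the demo can be sketched in a few lines of plain Python. This is a minimal illustration that reproduces the row and column indices visible in the demo output; it is not Ray's implementation. The helper names `head_tail_indices` and `visible_columns` are hypothetical, and the rule that the column limit counts the ellipsis column while the row limit does not count the ellipsis row is inferred from the demo, not confirmed by the source.

```python
from typing import List


def head_tail_indices(total: int, budget: int, head: int) -> List[int]:
    """Return the indices to display when at most `budget` of `total` items fit.

    Hypothetical helper: takes `head` items from the front and fills the
    rest of the budget from the back.
    """
    if total <= budget:
        # Everything fits, so nothing is truncated.
        return list(range(total))
    head = max(0, min(head, budget))
    tail = budget - head
    return list(range(head)) + list(range(total - tail, total))


def visible_columns(total: int, max_columns: int, head_columns: int) -> List[int]:
    """Assumed column rule: when truncating, one slot goes to the ellipsis column."""
    if total <= max_columns:
        return list(range(total))
    return head_tail_indices(total, max_columns - 1, head_columns)


# Settings from the demo: 10 rows, 6 columns, max_rows=4, head_rows=1,
# max_columns=5, head_columns=2.
rows = head_tail_indices(total=10, budget=4, head=1)
cols = visible_columns(total=6, max_columns=5, head_columns=2)
print(rows)  # [0, 7, 8, 9]
print(cols)  # [0, 1, 4, 5]
```

The printed indices match the demo: row 0 followed by rows 7 to 9, and columns `col0`, `col1`, `col4`, `col5` around the ellipsis.
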
1 parent 390ae76 commit ef10ac9

15 files changed (+864 -276 lines changed)

doc/source/data/loading-data.rst

Lines changed: 58 additions & 22 deletions
@@ -395,11 +395,17 @@ Ray Data interoperates with libraries like pandas, NumPy, and Arrow.

 .. testoutput::

-MaterializedDataset(
-   num_blocks=3,
-   num_rows=3,
-   schema={food: string, price: double}
-)
+shape: (3, 2)
+╭────────┬────────╮
+│ food   ┆ price  │
+│ ---    ┆ ---    │
+│ string ┆ double │
+╞════════╪════════╡
+│ spam   ┆ 9.34   │
+│ ham    ┆ 5.37   │
+│ eggs   ┆ 0.94   │
+╰────────┴────────╯
+(Showing 3 of 3 rows)

 You can also create a :class:`~ray.data.dataset.Dataset` from a list of regular
 Python objects. In the schema, the column name defaults to "item".
@@ -414,7 +420,19 @@ Ray Data interoperates with libraries like pandas, NumPy, and Arrow.

 .. testoutput::

-MaterializedDataset(num_blocks=5, num_rows=5, schema={item: int64})
+shape: (5, 1)
+╭───────╮
+│ item  │
+│ ---   │
+│ int64 │
+╞═══════╡
+│ 1     │
+│ 2     │
+│ 3     │
+│ 4     │
+│ 5     │
+╰───────╯
+(Showing 5 of 5 rows)

 .. tab-item:: NumPy

@@ -427,18 +445,24 @@ Ray Data interoperates with libraries like pandas, NumPy, and Arrow.
 import numpy as np
 import ray

-array = np.ones((3, 2, 2))
+array = np.arange(3)
 ds = ray.data.from_numpy(array)

 print(ds)

 .. testoutput::

-MaterializedDataset(
-   num_blocks=1,
-   num_rows=3,
-   schema={data: ArrowTensorTypeV2(shape=(2, 2), dtype=double)}
-)
+shape: (3, 1)
+╭───────╮
+│ data  │
+│ ---   │
+│ int64 │
+╞═══════╡
+│ 0     │
+│ 1     │
+│ 2     │
+╰───────╯
+(Showing 3 of 3 rows)

 .. tab-item:: pandas

@@ -460,11 +484,17 @@ Ray Data interoperates with libraries like pandas, NumPy, and Arrow.

 .. testoutput::

-MaterializedDataset(
-   num_blocks=1,
-   num_rows=3,
-   schema={food: object, price: float64}
-)
+shape: (3, 2)
+╭────────┬────────╮
+│ food   ┆ price  │
+│ ---    ┆ ---    │
+│ object ┆ double │
+╞════════╪════════╡
+│ spam   ┆ 9.34   │
+│ ham    ┆ 5.37   │
+│ eggs   ┆ 0.94   │
+╰────────┴────────╯
+(Showing 3 of 3 rows)

 .. tab-item:: PyArrow

@@ -485,11 +515,17 @@ Ray Data interoperates with libraries like pandas, NumPy, and Arrow.

 .. testoutput::

-MaterializedDataset(
-   num_blocks=1,
-   num_rows=3,
-   schema={food: string, price: double}
-)
+shape: (3, 2)
+╭────────┬────────╮
+│ food   ┆ price  │
+│ ---    ┆ ---    │
+│ string ┆ double │
+╞════════╪════════╡
+│ spam   ┆ 9.34   │
+│ ham    ┆ 5.37   │
+│ eggs   ┆ 0.94   │
+╰────────┴────────╯
+(Showing 3 of 3 rows)

 .. _loading_datasets_from_distributed_df:
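
The new test outputs in this file share one layout: a shape line, a rounded box with name, `---`, and dtype header rows, a double rule, the data rows, and a row-count footer. Below is a minimal sketch of a renderer that produces this layout for small, untruncated tables, assuming left-aligned cells padded to the widest entry in each column. `render_table` is a hypothetical helper written for illustration; it is not the function this commit adds to Ray.

```python
from typing import List, Sequence


def render_table(names: List[str], dtypes: List[str], rows: Sequence[Sequence[object]]) -> str:
    """Render a small table in the box style shown in the test outputs (illustrative only)."""
    cells = [[str(value) for value in row] for row in rows]
    # Each column is as wide as its longest entry; "---" sets a minimum of 3.
    widths = [
        max(len(name), len(dtype), 3, *(len(row[i]) for row in cells))
        for i, (name, dtype) in enumerate(zip(names, dtypes))
    ]

    def rule(left: str, fill: str, sep: str, right: str) -> str:
        # A horizontal border; each column gets one space of padding per side.
        return left + sep.join(fill * (w + 2) for w in widths) + right

    def line(values: Sequence[str]) -> str:
        # A content row with left-aligned, padded cells.
        return "│ " + " ┆ ".join(v.ljust(w) for v, w in zip(values, widths)) + " │"

    out = [
        f"shape: ({len(cells)}, {len(names)})",
        rule("╭", "─", "┬", "╮"),
        line(names),
        line(["---"] * len(names)),
        line(dtypes),
    ]
    if cells:
        out.append(rule("╞", "═", "╪", "╡"))
        out.extend(line(row) for row in cells)
    out.append(rule("╰", "─", "┴", "╯"))
    out.append(f"(Showing {len(cells)} of {len(cells)} rows)")
    return "\n".join(out)


table = render_table(
    ["food", "price"],
    ["string", "double"],
    [("spam", 9.34), ("ham", 5.37), ("eggs", 0.94)],
)
print(table)
```

Head/tail truncation and the ellipsis column are left out on purpose to keep the sketch short.
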

doc/source/data/quickstart.rst

Lines changed: 20 additions & 13 deletions
@@ -61,7 +61,7 @@ across your cluster for better performance.
 def transform_batch(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
     vec_a = batch["petal length (cm)"]
     vec_b = batch["petal width (cm)"]
-    batch["petal area (cm^2)"] = vec_a * vec_b
+    batch["petal area (cm^2)"] = np.round(vec_a * vec_b, 2)
     return batch

 # Apply the transformation to our dataset
@@ -74,18 +74,25 @@ across your cluster for better performance.

 .. testoutput::

-MaterializedDataset(
-   num_blocks=...,
-   num_rows=150,
-   schema={
-      sepal length (cm): double,
-      sepal width (cm): double,
-      petal length (cm): double,
-      petal width (cm): double,
-      target: int64,
-      petal area (cm^2): double
-   }
-)
+shape: (150, 6)
+╭───────────────────┬──────────────────┬───────────────────┬──────────────────┬────────┬───────────────────╮
+│ sepal length (cm) ┆ sepal width (cm) ┆ petal length (cm) ┆ petal width (cm) ┆ target ┆ petal area (cm^2) │
+│ ---               ┆ ---              ┆ ---               ┆ ---              ┆ ---    ┆ ---               │
+│ double            ┆ double           ┆ double            ┆ double           ┆ int64  ┆ double            │
+╞═══════════════════╪══════════════════╪═══════════════════╪══════════════════╪════════╪═══════════════════╡
+│ 5.1               ┆ 3.5              ┆ 1.4               ┆ 0.2              ┆ 0      ┆ 0.28              │
+│ 4.9               ┆ 3.0              ┆ 1.4               ┆ 0.2              ┆ 0      ┆ 0.28              │
+│ 4.7               ┆ 3.2              ┆ 1.3               ┆ 0.2              ┆ 0      ┆ 0.26              │
+│ 4.6               ┆ 3.1              ┆ 1.5               ┆ 0.2              ┆ 0      ┆ 0.3               │
+│ 5.0               ┆ 3.6              ┆ 1.4               ┆ 0.2              ┆ 0      ┆ 0.28              │
+│ …                 ┆ …                ┆ …                 ┆ …                ┆ …      ┆ …                 │
+│ 6.7               ┆ 3.0              ┆ 5.2               ┆ 2.3              ┆ 2      ┆ 11.96             │
+│ 6.3               ┆ 2.5              ┆ 5.0               ┆ 1.9              ┆ 2      ┆ 9.5               │
+│ 6.5               ┆ 3.0              ┆ 5.2               ┆ 2.0              ┆ 2      ┆ 10.4              │
+│ 6.2               ┆ 3.4              ┆ 5.4               ┆ 2.3              ┆ 2      ┆ 12.42             │
+│ 5.9               ┆ 3.0              ┆ 5.1               ┆ 1.8              ┆ 2      ┆ 9.18              │
+╰───────────────────┴──────────────────┴───────────────────┴──────────────────┴────────┴───────────────────╯
+(Showing 10 of 150 rows)

 To explore more transformation capabilities, read :ref:`Transforming data <transforming_data>`.
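
The quickstart change wraps the petal-area product in `np.round(..., 2)`. A plausible reason, which the commit does not state, is that the new repr prints actual values, so unrounded float products would make the documented output noisy and brittle. The sketch below shows the effect with plain Python lists and the built-in `round` as stand-ins for the NumPy arrays and `np.round` in the diff; the inputs are the first five rows of the table above.

```python
# First five rows of the petal columns shown in the new test output.
petal_length = [1.4, 1.4, 1.3, 1.5, 1.4]
petal_width = [0.2, 0.2, 0.2, 0.2, 0.2]

# Unrounded products can carry floating-point noise in their last digits.
raw_area = [length * width for length, width in zip(petal_length, petal_width)]

# Rounding to 2 decimals stands in for np.round(vec_a * vec_b, 2).
rounded_area = [round(area, 2) for area in raw_area]

print(rounded_area)  # [0.28, 0.28, 0.26, 0.3, 0.28]
```

The rounded values match the `petal area (cm^2)` column in the first five rows of the new output.
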
