(README generated with the help of ChatGPT 5)
In this repository, I followed the Pangeo OSM 2020 Tutorial
to try out Zarr-formatted scientific data stored in AWS S3 using Xarray and s3fs, directly from a Jupyter Notebook,
without downloading large files locally.
The notebook demonstrates how to:
- Connect to an AWS S3 Zarr dataset
- Inspect coordinates and metadata
- Select a latitude/longitude range
- Visualize a spatial subset using matplotlib
- Try to save a small subset locally in Zarr format (not successful at this time)
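As a rough sketch of the first two steps, the snippet below opens a public Zarr store anonymously and prints its structure. The store path `s3://mur-sst/zarr` is an assumption (this README does not name the bucket), so replace it with the path used in the notebook. The helper names `open_sst` and `describe` are hypothetical and exist only for illustration.

```python
def open_sst(store_url="s3://mur-sst/zarr"):
    """Lazily open a public Zarr store on S3 without credentials.

    The default store_url is an assumption, not taken from this README.
    Only metadata is read here; array values stay in S3 until requested.
    """
    import xarray as xr  # imported lazily so the helper can be defined anywhere

    # storage_options is forwarded to fsspec/s3fs; anon=True means no AWS keys
    return xr.open_zarr(
        store_url,
        storage_options={"anon": True},
        consolidated=True,
    )


def describe(ds):
    """Print the dataset summary and return its global attribute names."""
    print(ds)  # dimensions, coordinates, data variables
    names = sorted(ds.attrs)
    print("Global attributes:", names)
    return names


# Usage (requires network access plus xarray, zarr, and s3fs installed):
# ds = open_sst()
# describe(ds)
```

Because the dataset is opened lazily, inspecting coordinates and metadata this way should not trigger a large download.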
This work is based on the Pangeo Open Science Meeting 2020 tutorial, which introduced cloud-native data access using
the Pangeo ecosystem: xarray, zarr, and s3fs.
Acknowledgment
The original tutorial was created by the Pangeo community for the Open Science Meeting 2020.
This notebook builds on their example to practice reading and exploring AWS-hosted Sea Surface Temperature (SST) data
and to better understand cloud-native geoscience workflows.

    git clone https://github.com/<your-username>/<your-repo-name>.git
    cd <your-repo-name>
    python -m venv .venv
    source .venv/bin/activate # On Windows: .venv\Scripts\activate
    pip install -r requirements.txt
    jupyter lab

Open the notebook:

    aws_zarr_exploration.ipynb

Then run each cell step-by-step to:
- Connect to the public AWS S3 bucket that hosts Zarr SST data
- Examine dataset structure and coordinates
- Subset a region by latitude and longitude
- Plot a single time slice with matplotlib
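The subsetting and plotting steps above might look roughly like the sketch below. The coordinate names `lat`, `lon`, and `time` and the variable name `analysed_sst` are assumptions about the dataset, and `bbox_indexers` and `plot_time_slice` are hypothetical helpers, not xarray functions.

```python
def bbox_indexers(lat_min, lat_max, lon_min, lon_max):
    """Build label-based indexers for Dataset.sel() from a bounding box.

    Bounds are sorted so callers may pass them in either order. This assumes
    the coordinates are stored in ascending order; for a descending
    coordinate the slice bounds would need to be reversed.
    """
    lat_lo, lat_hi = sorted((lat_min, lat_max))
    lon_lo, lon_hi = sorted((lon_min, lon_max))
    return {"lat": slice(lat_lo, lat_hi), "lon": slice(lon_lo, lon_hi)}


def plot_time_slice(ds, var="analysed_sst", time_index=0, **bbox):
    """Plot one time step of `var` inside the bounding box given by **bbox."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    # Pick a single time step first, then cut out the region by label
    subset = ds[var].isel(time=time_index).sel(**bbox_indexers(**bbox))
    subset.plot()  # xarray draws a 2D field with matplotlib
    plt.show()
    return subset


# Usage, given a dataset `ds` opened from S3 (bounds are example values):
# plot_time_slice(ds, lat_min=30, lat_max=45, lon_min=-130, lon_max=-120)
```

Selecting the time step and region before plotting means only the chunks that overlap the box need to be fetched from S3.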
This notebook focuses on reading and subsetting cloud-hosted Zarr data.
While I explored writing small local Zarr subsets with to_zarr(), the operation can be slow or memory-intensive when working over public S3 connections.
Future updates may include examples using cached or AWS-local environments for efficient writes.
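One approach that may help with local writes is to check the subset size, load it into memory once, and only then call `to_zarr()`. The sketch below is untested against the dataset in the notebook; `save_subset` is a hypothetical helper, the 200 MB limit is an arbitrary example value, and clearing the inherited `chunks` encoding is an assumption about what a small subset needs, not a confirmed fix for the failure noted earlier.

```python
def save_subset(subset, path, max_mb=200):
    """Write a small xarray Dataset subset to a local Zarr store.

    Raises ValueError before any data is read if the subset would exceed
    max_mb in memory. Returns the subset size in megabytes.
    """
    size_mb = subset.nbytes / 1e6  # nbytes comes from metadata, no download
    if size_mb > max_mb:
        raise ValueError(
            f"Subset is about {size_mb:.0f} MB, above the {max_mb} MB limit; "
            "select a smaller region or time range first."
        )

    # Pull the data from S3 once, so the write itself is purely local
    loaded = subset.load()

    # Drop chunk sizes inherited from the source store, which were chosen
    # for the full dataset rather than for this small subset
    for variable in loaded.variables.values():
        variable.encoding.pop("chunks", None)

    loaded.to_zarr(path, mode="w")  # mode="w" overwrites an existing store
    return size_mb


# Usage, given a small Dataset `small` selected from the S3-backed dataset:
# save_subset(small, "sst_subset.zarr")
```

The size guard is there so that an accidentally large selection fails fast instead of exhausting memory during `load()`.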
- Understand how to open and explore Zarr datasets directly from AWS S3
- Practice subsetting large, chunked datasets efficiently with xarray
- Learn the basics of cloud-native scientific data workflows
This example is for educational and research purposes.
Original materials © Pangeo Project (Open Science Meeting 2020).
Adapted and expanded by Jeanne Lane for exploration of AWS-hosted Zarr datasets
and Earth observation data workflows.