🌊 Exploring Zarr Datasets in AWS with Xarray

(Readme generated with help of ChatGPT 5)

In this repository, I followed the Pangeo OSM 2020 Tutorial to explore Zarr-formatted scientific data
stored in AWS S3 using Xarray and s3fs, working directly from a Jupyter Notebook
without downloading large files locally.

The notebook demonstrates how to:

Connect to an AWS S3 Zarr dataset
Inspect coordinates and metadata
Select a latitude/longitude range
Visualize a spatial subset using matplotlib
Try to save a small subset locally in Zarr format (not successful at this time)

🧭 Background

This work is based on the Pangeo Open Science Meeting 2020 tutorial, which introduced cloud-native data access using
the Pangeo ecosystem — xarray, zarr, and s3fs.

Acknowledgment
The original tutorial was created by the Pangeo community for the Open Science Meeting 2020.
This notebook builds on their example to practice reading and exploring AWS-hosted Sea Surface Temperature (SST) data
and to better understand cloud-native geoscience workflows.

⚙️ Environment Setup

1. Clone this repository

git clone https://github.com/<your-username>/<your-repo-name>.git
cd <your-repo-name>

2. Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Launch Jupyter Lab

jupyter lab
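
Before running the notebook, it can help to confirm that the environment actually contains the libraries it relies on. The sketch below is one way to do that from a Python prompt or a notebook cell. The package list is an assumption based on the tools named in this README; the real contents of requirements.txt may differ.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: these are the packages listed in requirements.txt.
PACKAGES = ["xarray", "zarr", "s3fs", "matplotlib", "jupyterlab"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so a missing dependency is easy to spot.
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed with pip inside the activated virtual environment.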

📒 How to Use

Open the notebook:

aws_zarr_exploration.ipynb

Then run each cell step-by-step to:

Connect to the public AWS S3 bucket that hosts Zarr SST data
Examine dataset structure and coordinates
Subset a region by latitude and longitude
Plot a single time slice with matplotlib
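
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code. The store URL, the variable name analysed_sst, and the coordinate names lat, lon, and time are assumptions based on the MUR SST example from the Pangeo tutorial; adjust them to match the dataset the notebook opens.

```python
import xarray as xr

# Assumptions (not stated in this README): store URL, variable name, and
# coordinate names follow the MUR SST example from the Pangeo tutorial.
ZARR_URL = "s3://mur-sst/zarr"


def open_public_zarr(url: str = ZARR_URL) -> xr.Dataset:
    """Lazily open a public Zarr store; s3fs handles the s3:// protocol."""
    # anon=True requests unsigned access, so no AWS credentials are needed.
    return xr.open_zarr(url, storage_options={"anon": True}, consolidated=True)


def subset_region(ds, lat_range, lon_range):
    """Select a lat/lon box by label; assumes both coordinates are ascending."""
    return ds.sel(lat=slice(*lat_range), lon=slice(*lon_range))


def plot_first_time_step(da: xr.DataArray):
    """Plot the first time slice; xarray's .plot() draws with matplotlib."""
    return da.isel(time=0).plot()


# Typical notebook usage (requires network access):
# ds = open_public_zarr()
# print(ds)  # dimensions, coordinates, and attributes
# box = subset_region(ds, (30, 40), (-130, -120))
# plot_first_time_step(box["analysed_sst"])
```

Opening the store only reads metadata. Data chunks are fetched from S3 when values are actually needed, for example when plotting.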

🧩 Notes on Zarr Writing

This notebook focuses on reading and subsetting cloud-hosted Zarr data.
I also tried writing a small local subset with to_zarr(), but have not yet succeeded: the operation can be slow or memory-intensive when working over public S3 connections.
Future updates may include examples using cached or AWS-local environments for efficient writes.

🧠 Learning Goals

Understand how to open and explore Zarr datasets directly from AWS S3
Practice subsetting large, chunked datasets efficiently with xarray
Learn the basics of cloud-native scientific data workflows
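
When practicing efficient subsetting, it is useful to check how large a selection is before loading it. The helper below is a small sketch of that check. The function name is hypothetical and is not part of the notebook.

```python
import xarray as xr


def selection_size_mb(da: xr.DataArray) -> float:
    """Return the in-memory size the selection would have, in megabytes."""
    # nbytes is derived from shape and dtype, so no data is downloaded.
    return da.nbytes / 1e6


# Typical notebook usage (variable name is an assumption):
# print(selection_size_mb(box["analysed_sst"]))
```

If the reported size is larger than available memory, narrowing the latitude, longitude, or time range before plotting or loading keeps the workflow responsive.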

🖋️ License & Attribution

This example is for educational and research purposes.
Original materials © Pangeo Project (Open Science Meeting 2020).
Adapted and expanded by Jeanne Lane for exploration of AWS-hosted Zarr datasets
and Earth observation data workflows.
