jmlane8/SST_AWS_zarr


(Readme generated with help of ChatGPT 5)

🌊 Exploring Zarr Datasets in AWS with Xarray

In this repository, I followed the Pangeo OSM 2020 Tutorial
to read Zarr-formatted scientific data stored in AWS S3 using Xarray and s3fs, directly from a Jupyter Notebook
and without downloading large files locally.

The notebook demonstrates how to:

  • Connect to an AWS S3 Zarr dataset
  • Inspect coordinates and metadata
  • Select a latitude/longitude range
  • Visualize a spatial subset using matplotlib
  • Try to save a small subset locally in Zarr format (not successful at this time)
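
The first two steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's exact code: the helper names `s3_zarr_url` and `open_public_zarr` are hypothetical, and the bucket and prefix in the usage comment are assumptions, since the README does not name the store. It relies on `xarray.open_zarr` with `storage_options`, which needs `s3fs` installed to handle `s3://` URLs.

```python
def s3_zarr_url(bucket: str, prefix: str = "") -> str:
    """Build an s3:// URL from a bucket name and an optional key prefix."""
    bucket = bucket.strip("/")
    prefix = prefix.strip("/")
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"


def open_public_zarr(url: str):
    """Open a public Zarr store lazily; only metadata is read at this point."""
    # Imported here so the URL helper can be used without xarray installed.
    import xarray as xr

    return xr.open_zarr(
        url,
        # anon=True asks s3fs for unsigned access, so no AWS credentials are needed.
        storage_options={"anon": True},
        # Assumes the store ships consolidated metadata.
        consolidated=True,
    )


# Hypothetical usage; the bucket "mur-sst" and prefix "zarr" are assumptions:
# ds = open_public_zarr(s3_zarr_url("mur-sst", "zarr"))
# print(ds)         # dimensions, coordinates, and data variables
# print(ds.attrs)   # global metadata
```

Because the dataset is opened lazily, printing it only touches metadata; array values are fetched from S3 when a selection is plotted or loaded.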

🧭 Background

This work is based on the Pangeo Open Science Meeting 2020 tutorial, which introduced cloud-native data access using
the Pangeo ecosystem: xarray, zarr, and s3fs.

Acknowledgment
The original tutorial was created by the Pangeo community for the Open Science Meeting 2020.
This notebook builds on their example to practice reading and exploring AWS-hosted Sea Surface Temperature (SST) data
and to better understand cloud-native geoscience workflows.


⚙️ Environment Setup

1. Clone this repository

git clone https://github.com/jmlane8/SST_AWS_zarr.git
cd SST_AWS_zarr

2. Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Launch Jupyter Lab

jupyter lab
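
After installing, a quick check like the one below can confirm that the packages are importable from the active environment. The package list is an assumption based on the tools this README names; the actual contents of requirements.txt may differ.

```python
import importlib.util

# Assumed package list, based on the tools named in this README.
REQUIRED = ["xarray", "zarr", "s3fs", "matplotlib", "jupyterlab"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Run it with the virtual environment activated; any name it reports as missing can be installed with pip.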

📒 How to Use

Open the notebook:

aws_zarr_exploration.ipynb

Then run each cell step-by-step to:

  1. Connect to the public AWS S3 bucket that hosts Zarr SST data
  2. Examine dataset structure and coordinates
  3. Subset a region by latitude and longitude
  4. Plot a single time slice with matplotlib
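
Steps 3 and 4 can be sketched along these lines. The function names are hypothetical, and the coordinate names `lat`, `lon`, and `time` are assumptions that should be checked against the printed dataset; the sketch also assumes the coordinates are stored in ascending order.

```python
def bbox_indexers(lat_min, lat_max, lon_min, lon_max):
    """Return label-based slices for a latitude/longitude box.

    Assumes coordinates named "lat" and "lon" stored in ascending order.
    """
    lo_lat, hi_lat = sorted((lat_min, lat_max))
    lo_lon, hi_lon = sorted((lon_min, lon_max))
    return {"lat": slice(lo_lat, hi_lat), "lon": slice(lo_lon, hi_lon)}


def plot_time_slice(ds, var, time, **bbox):
    """Plot the time step of `var` nearest to `time` inside the bounding box."""
    import matplotlib.pyplot as plt

    # The box and the time step are selected in separate calls because
    # method="nearest" cannot be combined with slice indexers.
    field = ds[var].sel(**bbox_indexers(**bbox)).sel(time=time, method="nearest")
    field.plot()  # xarray's matplotlib wrapper; data is fetched here
    plt.show()
    return field


# Hypothetical usage; the variable name "analysed_sst" is an assumption:
# plot_time_slice(ds, "analysed_sst", "2019-06-01",
#                 lat_min=20, lat_max=50, lon_min=-160, lon_max=-110)
```

Selecting before plotting keeps the transfer small, since only the chunks that overlap the box are read from S3.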

🧩 Notes on Zarr Writing

This notebook focuses on reading and subsetting cloud-hosted Zarr data.
While I explored writing small local Zarr subsets with to_zarr(), the operation can be slow or memory-intensive when working over public S3 connections.
Future updates may include examples using cached or AWS-local environments for efficient writes.
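
One approach that may help with small subsets is to load the selection into memory first and drop the chunk encoding inherited from the source store before writing. The sketch below is untested against this notebook's dataset; `save_small_subset` and the `max_mb` limit are hypothetical, and `subset` is assumed to be an `xarray.Dataset`.

```python
def save_small_subset(subset, path, max_mb=500):
    """Load a small Dataset subset into memory, then write it to a local Zarr store."""
    size_mb = subset.nbytes / 1e6  # estimate; does not trigger a download
    if size_mb > max_mb:
        raise ValueError(
            f"subset is about {size_mb:.0f} MB; narrow the selection first"
        )

    local = subset.load()  # one pass over S3, then everything is in memory
    for variable in local.variables.values():
        # Drop chunking inherited from the source store so it cannot
        # conflict with the much smaller shape of the subset.
        variable.encoding.pop("chunks", None)
        variable.encoding.pop("preferred_chunks", None)

    local.to_zarr(path, mode="w")  # mode="w" overwrites an existing store
    return path


# Hypothetical usage:
# save_small_subset(ds.sel(lat=slice(20, 50), lon=slice(-160, -110)).isel(time=[0]),
#                   "sst_subset.zarr")
```

The size guard is there because calling `load()` on a large selection would pull all of it over the public S3 connection.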


🧠 Learning Goals

  • Understand how to open and explore Zarr datasets directly from AWS S3
  • Practice subsetting large, chunked datasets efficiently with xarray
  • Learn the basics of cloud-native scientific data workflows

🖋️ License & Attribution

This example is for educational and research purposes.
Original materials © Pangeo Project (Open Science Meeting 2020).
Adapted and expanded by Jeanne Lane for exploration of AWS-hosted Zarr datasets
and Earth observation data workflows.

About

Open SST data in zarr format from AWS in Jupyter from OSM 2020 Tutorial
