| license | cc0-1.0 |
|---|---|

48,592 disaster events pulled from four US government databases into one flat JSON file. Plane crashes, shipwrecks, tornadoes, earthquakes -- all geocoded and categorized.
| Category | Records | Source | Date Range |
|---|---|---|---|
| Aviation Accidents | 26,427 | NTSB | 1974-2018 |
| Severe Storms | 14,770 | NOAA Storm Events | 1950-2025 |
| Earthquakes | 3,742 | USGS | 2020-2025 |
| Shipwrecks | 3,653 | NOAA AWOIS | Historical (1600s-1970s) |
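
A quick way to check these counts against a loaded copy of the file is to tally the `category` field. This is a minimal sketch: `count_by_category` is a hypothetical helper, and the sample uses only the two category strings that appear in the example records (`storm` and `aviation_accident`).

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_category(records: Iterable[Mapping]) -> Counter:
    """Tally records by their "category" value (hypothetical helper)."""
    return Counter(r.get("category") for r in records)


# Tiny illustrative sample in the same shape as the dataset records
sample = [
    {"category": "storm"},
    {"category": "aviation_accident"},
    {"category": "storm"},
]
counts = count_by_category(sample)
print(counts.most_common())
```

Run against the full list of records, the totals should match the Records column above.
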
Every record has the core fields `category`, `latitude`, `longitude`, `name`, and `subcategory`; `date` is present on most records. Additional fields depend on the category:
| Field | Coverage | Which Categories |
|---|---|---|
| `category` | 100% | All |
| `latitude` / `longitude` | 100% | All |
| `name` | 100% | All -- event description or location |
| `subcategory` | 100% | Tornado, Flash Flood, seismic, maritime, aviation, etc. |
| `date` | 94% | All except some historical shipwrecks |
| `aircraft_type` | 59% | Aviation only |
| `event_id` | 59% | Aviation only (NTSB event IDs) |
| `magnitude` | 20% | Storms (Fujita/EF scale) + Earthquakes (Richter) |
| `fatalities` | 27% | Storms |
| `injuries` | 27% | Storms |
| `damage` | 26% | Storms (text format: "250K", "1.5M") |
| `state` | 27% | Storms |
| `vessel_type` | <1% | Shipwrecks (sparse) |
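
Because `damage` is stored as text, it has to be converted before any arithmetic. The sketch below is one possible approach, not part of the dataset tooling: `parse_damage` is a hypothetical helper, the `K` and `M` suffixes come from the table above, and the `B` suffix is an assumption that may not occur in the data.

```python
from typing import Optional

# "K" and "M" appear in the documented examples; "B" is an assumption.
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_damage(text: Optional[str]) -> Optional[float]:
    """Convert a damage string such as "250K" to a number, or None if unparseable."""
    if not text:
        return None
    cleaned = text.strip().upper()
    if not cleaned:
        return None
    multiplier = _SUFFIXES.get(cleaned[-1])
    # Strip the suffix only when the last character is a known multiplier
    number = cleaned[:-1] if multiplier else cleaned
    try:
        return float(number) * (multiplier or 1)
    except ValueError:
        return None


print(parse_damage("250K"), parse_damage("1.5M"))
```

Missing or malformed values return `None` rather than raising, which suits a field with 26% coverage.
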
Storm:

```json
{
  "category": "storm",
  "latitude": 34.88,
  "longitude": -99.28,
  "name": "Tornado in OKLAHOMA, KIOWA",
  "date": "1950-04-28",
  "subcategory": "Tornado",
  "magnitude": "0",
  "fatalities": "1",
  "injuries": "1",
  "damage": "250K",
  "state": "OKLAHOMA"
}
```

Aviation:

```json
{
  "category": "aviation_accident",
  "latitude": 20.000833,
  "longitude": -155.6675,
  "name": "Aviation Accident - SCHLEICHER ASH25M",
  "date": "2012-01-01",
  "subcategory": "aviation",
  "aircraft_type": "SCHLEICHER ASH25M",
  "event_id": "20121010X84549"
}
```

A few things worth knowing if you're working with this data:
- Aviation dates are year-only. Aviation records show `YYYY-01-01`. The actual dates are embedded in the event IDs (e.g., `20121010X84549` = Oct 10, 2012) but the date field just has the year.
- Earthquake dates are ISO format (`YYYY-MM-DD`), converted from Unix timestamps.
- Aviation records are deduplicated on `event_id` (5,983 duplicates removed from source overlaps).
- Coordinates extend beyond CONUS. Some records are in Hawaii, Alaska, territories, or international waters. Expected for aviation and maritime data.
- `depth_km` is always null. The field exists in the schema but was never populated.
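
If you need full aviation dates, they can be recovered from `event_id`. The sketch below assumes, as the note above describes, that the first eight characters of the ID encode the date as `YYYYMMDD`; `date_from_event_id` is a hypothetical helper, and IDs that do not fit that pattern yield `None`.

```python
from datetime import date, datetime
from typing import Optional


def date_from_event_id(event_id: Optional[str]) -> Optional[date]:
    """Parse the leading YYYYMMDD of an event ID, or return None if it does not fit."""
    if not event_id or len(event_id) < 8:
        return None
    try:
        return datetime.strptime(event_id[:8], "%Y%m%d").date()
    except ValueError:
        # Leading characters are not a valid calendar date
        return None


print(date_from_event_id("20121010X84549"))
```

Comparing the parsed year with the year in the record's `date` field is a cheap sanity check before overwriting anything.
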
```python
import json

with open("disasters_mashup.json") as f:
    disasters = json.load(f)

# Filter by category
storms = [d for d in disasters if d["category"] == "storm"]
aviation = [d for d in disasters if d["category"] == "aviation_accident"]

# Deduplicate aviation (optional)
seen = set()
unique_aviation = []
for d in aviation:
    event_id = d.get("event_id")
    # Keep records without an event_id; there is nothing to match them against
    if event_id is None or event_id not in seen:
        seen.add(event_id)
        unique_aviation.append(d)
```

All public domain, from US government agencies:
- NTSB Aviation Safety Data
- NOAA AWOIS Wrecks & Obstructions
- NOAA Storm Events Database
- USGS Earthquake Hazards

The dataset and a demo notebook are available here:

- GitHub: lukeslp/us-disasters-mashup
- HuggingFace: lukeslp/us-disasters-mashup
- Kaggle: lucassteuber/us-disasters-mashup
- Demo Notebook: Jupyter on GitHub Gist
CC0 1.0 (Public Domain). All source data comes from US government agencies.
Luke Steuber
- Website: lukesteuber.com
- Bluesky: @lukesteuber.com
```bibtex
@dataset{steuber2026disasters,
  title={US Disasters Mashup},
  author={Steuber, Luke},
  year={2026},
  publisher={GitHub/HuggingFace/Kaggle},
  url={https://github.com/lukeslp/us-disasters-mashup}
}
```

```json
{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "US Disasters Mashup",
  "description": "48,592 disaster events from four US government databases (NTSB aviation accidents, NOAA shipwrecks, NOAA severe storms, USGS earthquakes) unified into a single geocoded JSON file.",
  "url": "https://github.com/lukeslp/us-disasters-mashup",
  "sameAs": [
    "https://huggingface.co/datasets/lukeslp/us-disasters-mashup",
    "https://www.kaggle.com/datasets/lucassteuber/us-disasters-mashup"
  ],
  "license": "https://creativecommons.org/publicdomain/zero/1.0/",
  "creator": {
    "@type": "Person",
    "name": "Luke Steuber",
    "url": "https://lukesteuber.com"
  },
  "keywords": ["disasters", "aviation accidents", "shipwrecks", "storms", "earthquakes", "geospatial", "united states"],
  "temporalCoverage": "1600/2025",
  "spatialCoverage": {
    "@type": "Place",
    "name": "United States"
  },
  "distribution": [
    {
      "@type": "DataDownload",
      "encodingFormat": "application/json",
      "contentUrl": "https://github.com/lukeslp/us-disasters-mashup"
    }
  ]
}
```