This repository contains the analysis code for a project that develops and evaluates a wildfire ignition danger rating system based on climatic water balance variables. The system is designed to be straightforward, computationally efficient, and applicable across different ecoregions in the conterminous United States (CONUS).
This work expands upon an analysis originally conducted for the Southern Rockies (Thoma et al., 2020), which was extended to the Middle Rockies as part of the work for a Master's thesis (Huysman et al., in prep) and is generalized here to all US Level III (L3) ecoregions.
The primary goal is to identify the most effective climatic indicators and temporal scales for predicting wildfire ignition. This allows for the creation of a flexible, forecastable, and projectable fire danger rating system that can be used for both short-term management decisions and long-term conservation planning, such as identifying potential climate-resilient wildfire refugia.
The analysis follows a systematic approach for each Level III ecoregion in the CONUS:
- Data Ingestion: Historical wildfire ignition data is sourced from the Monitoring Trends in Burn Severity (MTBS) database. Climate and water balance time series (e.g., CWD, VPD, temperature) are extracted for the centroid of each fire polygon from gridded datasets (gridMET, NPS Gridded Water Balance).
- Indicator Calculation: Rolling sums (for flux variables like CWD) or rolling means (for state variables like VPD) are calculated over a range of window widths (e.g., 1 to 31 days) preceding each day in the time series (see the sketch after this list).
- Normalization: To account for local climate variability, the rolling values are converted to a percentile rank. A custom percentile rank function (`my_percent_rank`) is used for zero-inflated variables to improve model sensitivity at low-to-moderate levels of dryness.
- Classifier Evaluation: The performance of each climate indicator and rolling window width as a binary classifier of ignition (fire vs. no fire on a given day) is evaluated using Receiver Operating Characteristic (ROC) curves, which plot the trade-off between true- and false-positive rates at varying classification thresholds. The Area Under the Curve (AUC) and partial AUC (pAUC) are used to identify the optimal predictor, prioritizing performance under the driest conditions (high pAUC): a false negative under the driest conditions is potentially more costly than a misclassification under wetter conditions, where fires are likely to be less severe.
*ROC curve image: cmglee, MartinThoma, CC BY-SA 4.0, via Wikimedia Commons*
- Danger Rating System: An empirical cumulative distribution function (eCDF) is generated for the best-performing indicator. This function maps a given dryness percentile to the historical proportion of wildfires that ignited at or below that level, creating a tunable, risk-based danger rating.
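The chain from rolling indicator to danger rating can be sketched in a few lines of R. This is a minimal illustration, not the repository's code: the column names, the 14-day window, the simulated data, and the particular zero-inflation handling shown in `my_percent_rank` are all assumptions for the example.

```r
library(zoo)   # rolling-window sums
library(pROC)  # ROC, AUC, and partial AUC

# Hypothetical daily series for one site: a zero-inflated dryness flux (CWD)
# and a 0/1 ignition indicator for each day.
set.seed(1)
df <- data.frame(
  cwd  = pmax(rnorm(3650, mean = 1, sd = 2), 0),
  fire = rbinom(3650, 1, 0.01)
)

# Indicator calculation: right-aligned rolling sum over a candidate window.
df$cwd_14d <- zoo::rollsumr(df$cwd, k = 14, fill = NA)

# Normalization: percentile rank; ranking ties (the many zeros) at the
# minimum keeps the scale sensitive at low-to-moderate dryness. The
# repository's my_percent_rank may differ in detail.
my_percent_rank <- function(x) {
  rank(x, ties.method = "min", na.last = "keep") / sum(!is.na(x))
}
df$cwd_14d_pct <- my_percent_rank(df$cwd_14d)

# Classifier evaluation: full AUC plus partial AUC over the
# high-specificity (driest) end of the ROC curve.
roc_obj  <- pROC::roc(df$fire, df$cwd_14d_pct, quiet = TRUE)
auc_full <- pROC::auc(roc_obj)
auc_part <- pROC::auc(roc_obj, partial.auc = c(1, 0.9),
                      partial.auc.focus = "specificity")

# Danger rating: the eCDF of the indicator on ignition days maps a dryness
# percentile to the proportion of historical fires igniting at or below it.
fire_ecdf <- ecdf(df$cwd_14d_pct[df$fire == 1])
fire_ecdf(0.90)
```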
The "quantile raster" generated by `save_quants_lyr.R` acts as a spatial lookup table for local climate normals and can be used to rapidly assess fire ignition danger on a given historical or future date via the eCDF function.
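As a rough sketch of that lookup, assuming a multi-layer quantile raster in which layer q holds the historical q-th percentile of the indicator at each cell (the file names and layer layout here are hypothetical, not the repository's actual format):

```r
library(terra)

quants <- terra::rast("out/cwd_quantiles.tif")  # hypothetical quantile raster
today  <- terra::rast("out/cwd_14d_today.tif")  # hypothetical current rolling CWD

# Percentile lookup: the fraction of quantile layers that today's value
# meets or exceeds approximates its local percentile rank.
pct_today <- sum(today >= quants) / terra::nlyr(quants)

# Map the dryness percentile to a danger rating with the ecoregion's
# fitted eCDF (fire_ecdf, from the retrospective analysis above).
danger <- terra::app(pct_today, fire_ecdf)
```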
The operational forecast system is packaged as a Docker container, which encapsulates all the necessary scripts, libraries, and dependencies into a single, portable image. This approach ensures a consistent and reproducible environment for running the forecasts, whether on a local machine or in the cloud.
The system is composed of several distinct processes that are run as commands within the container.
- Main Forecast Generation
  - What it does: This is the most computationally intensive step. It runs the `map_forecast_danger.R` script to process raw climate data, apply the eCDF models, and generate the core forecast outputs: a NetCDF data file (`fire_danger_forecast.nc`) and multi-day forecast map images for desktop and mobile views.
- Park-Specific Visualizations
  - What it does: Runs the `generate_threshold_plots.R` script to create detailed fire danger analyses for each National Park Service unit within the ecoregion (see the sketch after this list). For each park, it generates:
    - Forecast Distribution Plot: a stacked bar chart showing how the percentage of park area in each fire danger category (Normal, Elevated, High, Very High, Extreme) changes across the 7-day forecast period.
    - Threshold Plots: three time series charts showing the percentage of park area at or above specific danger thresholds (0.25, 0.50, 0.75).
  - These visualizations provide park managers with both intuitive category-based views and precise threshold-based trends for operational decision-making.
- Lightning Map Generation
  - What it does: This process runs the `hourly_lightning_map.sh` script to provide near-real-time situational awareness. It fetches the latest lightning strike data and overlays it on the fire danger data from the main forecast, producing a self-contained, interactive HTML map (`lightning_map_{date}.html`).
- Frontend Assembly
  - What it does: This final step runs the `generate_daily_html.sh` script, which assembles the main `daily_forecast.html` page. It uses a template and populates it with the latest available assets, including the forecast maps, park-specific visualizations, and the lightning map. The script contains fallback logic to use older assets if the current day's are not yet available, preventing broken links.
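For intuition about the park threshold plots, here is a minimal sketch of the underlying summary. It assumes `terra` can read the forecast NetCDF as one layer per forecast day and uses a hypothetical park boundary file; the repository's script differs in detail.

```r
library(terra)

# Forecast cube from the main step: one layer per forecast day.
fc <- terra::rast("out/fire_danger_forecast.nc")

# Hypothetical park boundary polygon in the same CRS as the forecast.
park    <- terra::vect("data/park_boundary.shp")
fc_park <- terra::mask(terra::crop(fc, park), park)

# Percentage of park area at or above each danger threshold, per day.
thresholds <- c(0.25, 0.50, 0.75)
pct_above <- sapply(thresholds, function(th) {
  terra::global(fc_park >= th, "mean", na.rm = TRUE)[, 1] * 100
})
colnames(pct_above) <- paste0(">=", thresholds)
pct_above  # rows = forecast days, columns = thresholds
```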
```mermaid
graph TD
subgraph "Docker Container"
A[map_forecast_danger.R] -- Reads --> B[Input Data];
A -- Writes --> C[Forecast NetCDF];
A -- Writes --> D[Forecast Map PNG];
P[generate_threshold_plots.R] -- Reads --> C;
P -- Writes --> Q[Park Distribution Plots];
P -- Writes --> R[Park Threshold Plots];
E[hourly_lightning_map.sh] -- Reads --> C;
E -- Fetches --> F[Weatherbit API];
E -- Writes --> G[Lightning Map HTML];
H[generate_daily_html.sh] -- Reads --> D;
H -- Reads --> Q;
H -- Reads --> R;
H -- Reads --> G;
H -- Writes --> I[Main daily_forecast.html];
end
subgraph "Mounted Volumes"
J[Local ./data folder] -- mounted as --- B;
K[Local ./out folder] -- mounted for --- C;
K -- mounted for --- D;
K -- mounted for --- Q;
K -- mounted for --- R;
K -- mounted for --- G;
K -- mounted for --- I;
end
```
First, build the Docker image from the root of the repository:
```bash
docker build -t wildfire-forecast .
```

To run the different processes, use `docker run` with volume mounts for the `data` and `out` directories. This makes the local data available inside the container and ensures output artifacts are written back to the local filesystem.
Run the complete daily pipeline:
```bash
docker run --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/out:/app/out \
  -e ECOREGION=middle_rockies \
  wildfire-forecast bash src/daily_forecast.sh
```

This runs the full pipeline: forecast generation → validation → park visualizations → HTML assembly → COG creation.
Or run individual steps:
1. Generate the main forecast:
```bash
docker run --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/out:/app/out \
  wildfire-forecast Rscript src/map_forecast_danger.R middle_rockies
```

2. Generate park-specific visualizations:
```bash
docker run --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/out:/app/out \
  wildfire-forecast Rscript src/generate_threshold_plots.R middle_rockies
```

3. Assemble the final HTML page:
```bash
docker run --rm \
  -v $(pwd)/out:/app/out \
  wildfire-forecast bash src/generate_daily_html.sh middle_rockies
```

Update the lightning map (separate hourly process):
```bash
docker run --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/out:/app/out \
  wildfire-forecast bash src/operational/html_generation/hourly_lightning_map.sh
```

- Wildfire Data: Monitoring Trends in Burn Severity (MTBS)
- Historical Climate Data: gridMET
- Forecast Climate Data: CFSv2 metdata daily forecasts
- Water Balance Data: NPS 1-km Gridded Water Balance Product
- Vegetation Data: LANDFIRE Existing Vegetation Type (EVT)
- Ecoregions: EPA Level III Ecoregions of the Conterminous United States
- `src/retrospective/03_analysis/dryness_roc_analysis.R`: The core retrospective analysis script. It iterates through ecoregions and cover types, calculates rolling climate metrics, performs the ROC/AUC analysis, and saves the best predictors and eCDF models.
- `src/update_rotate_vpd_forecasts.sh`: A shell script for the automated daily download of forecast data, featuring retry logic and file rotation (illustrated below).
- `src/save_quants_lyr.R`: A script for the one-time pre-computation step that generates the quantile rasters from the long-term historical climate record.
- `src/map_forecast_danger.R`: The operational script that combines recent historical data, new forecast data, the pre-computed quantile rasters, and the eCDF models to generate the final daily fire danger maps.
- `data/`: Directory for input data sources such as shapefiles and pre-processed climate data.
- `out/`: Directory for all generated outputs, including plots, AUC results, and final eCDF models.
- `assets/`: Directory for static files used in this README.
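The retry-and-rotate pattern in `update_rotate_vpd_forecasts.sh` can be illustrated in R. The script itself is shell; the URL, paths, and attempt count below are hypothetical.

```r
# Download with simple retry logic, backing off between attempts.
fetch_with_retry <- function(url, dest, attempts = 3) {
  for (i in seq_len(attempts)) {
    ok <- tryCatch(
      utils::download.file(url, dest, mode = "wb", quiet = TRUE) == 0,
      error = function(e) FALSE
    )
    if (ok) return(TRUE)
    Sys.sleep(2^i)  # wait longer before each retry
  }
  FALSE
}

# Rotation: keep the previous file as a dated backup before overwriting.
dest <- "data/vpd_forecast.nc"                      # hypothetical path
if (file.exists(dest)) {
  file.rename(dest, paste0(dest, ".", Sys.Date()))
}
fetch_with_retry("https://example.org/vpd_forecast.nc", dest)
```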
- Prepare the environment by installing the required R packages using renv: `renv::install()`
- Retrieve the required climate data. The analysis requires local copies of the required gridMET and NPSWB netCDF files. `00_download_gridmet.sh` can be used to retrieve CONUS grids for the required gridMET variables (requires approximately 57 GB of disk space). A similar script to download the CONUS grids for the NPS 1 km gridded water balance variables is not currently provided (TODO).
- Retrieve the required LANDFIRE Existing Vegetation Type (EVT) layer. This file is too large (8.96 GB) to be stored with Git LFS and must be downloaded separately. EVT 2023 is used in the analysis as the most recent cover data for the entire CONUS available in LANDFIRE: https://landfire.gov/data-downloads/US_240/LF2023_EVC_240_CONUS.zip. Extract this file to `data/LF2023_EVT_240_CONUS`.
- Prepare the cover type (`01_extract_cover.R`) and climate data (`01_extract_gridmet.R` and `01_extract_npswb.R`) for each US L3 ecoregion.
- Prepare a list of bad sites based on missing or erroneous data using `02_data_qc.R`.
- Ensure input data is correctly placed in the `data/` directory. The analysis expects pre-processed Parquet files of climate data linked to MTBS fire `Event_ID`s.
- Execute the main analysis script: `Rscript src/retrospective/03_analysis/dryness_roc_analysis.R`
- Results, including CSV files of AUC scores and the RDS files containing the eCDF objects for the best predictors for each ecoregion, will be saved in the `out/` directory (see the usage sketch below).
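As a usage sketch (the RDS file name here is hypothetical), a saved eCDF object is an ordinary R function that converts a dryness percentile into the proportion of historical ignitions at or below it:

```r
# Load a fitted eCDF for one ecoregion; the path is an illustrative example.
fire_ecdf <- readRDS("out/middle_rockies_ecdf.rds")

# Proportion of historical ignitions that occurred at or below the
# 85th dryness percentile -- the tunable, risk-based danger rating.
fire_ecdf(0.85)
```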
This work was supported by funding provided by the National Park Service through an agreement with the Northern Rockies Conservation Cooperative.


