This repository contains the analysis code for a project that develops and evaluates a wildfire ignition danger rating system based on climatic water balance variables. The system is designed to be straightforward, computationally efficient, and applicable across different ecoregions in the conterminous United States (CONUS).
This work expands upon an analysis originally conducted for the Southern Rockies (Thoma et al., 2020), which was extended to the Middle Rockies as part of the work for a Master's thesis (Huysman et al., in prep) and is generalized here to all US Level III (L3) ecoregions.
The primary goal is to identify the most effective climatic indicators and temporal scales for predicting wildfire ignition. This allows for the creation of a flexible, forecastable, and projectable fire danger rating system that can be used for both short-term management decisions and long-term conservation planning, such as identifying potential climate-resilient wildfire refugia.
The analysis follows a systematic approach for each Level III ecoregion in the CONUS:
- Data Ingestion: Historical wildfire ignition data is sourced from the Monitoring Trends in Burn Severity (MTBS) database. Climate and water balance time series (e.g., CWD, VPD, temperature) are extracted for the centroid of each fire polygon from gridded datasets (gridMET, NPS Gridded Water Balance).
- Indicator Calculation: Rolling sums (for flux variables like CWD) or rolling means (for state variables like VPD) are calculated over a range of window widths (e.g., 1 to 31 days) preceding each day in the time series (see the sketch after this list).
- Normalization: To account for local climate variability, the rolling values are converted to a percentile rank. A custom percentile rank function (`my_percent_rank`) is used for zero-inflated variables to improve model sensitivity at low-to-moderate levels of dryness.
- Classifier Evaluation: The performance of each climate indicator and rolling window width as a binary classifier of ignition (fire vs. no fire on a given day) is evaluated using Receiver Operating Characteristic (ROC) curves, which plot the trade-off between true- and false-positive rates at varying classification thresholds. The Area Under the Curve (AUC) and partial AUC (pAUC) are used to identify the optimal predictor, prioritizing performance under the driest conditions (high pAUC): a false negative under the driest conditions is potentially more costly than a misclassification under wetter conditions, where fires are likely to be less severe.
*ROC curve image: cmglee, MartinThoma, CC BY-SA 4.0, via Wikimedia Commons*
- Danger Rating System: An empirical cumulative distribution function (eCDF) is generated for the best-performing indicator. This function maps a given dryness percentile to the historical proportion of wildfires that ignited at or below that level, creating a tunable, risk-based danger rating.
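The chain from rolling indicator to danger rating can be sketched in a few lines of R. This is a minimal illustration, not the repository's code: the column names, the 14-day window, the simulated data, and the particular zero-inflation handling shown in `my_percent_rank` are all assumptions for the example.

```r
library(zoo)   # rolling-window sums
library(pROC)  # ROC, AUC, and partial AUC

# Hypothetical daily series for one site: a zero-inflated dryness flux (CWD)
# and a 0/1 ignition indicator for each day.
set.seed(1)
df <- data.frame(
  cwd  = pmax(rnorm(3650, mean = 1, sd = 2), 0),
  fire = rbinom(3650, 1, 0.01)
)

# Indicator calculation: right-aligned rolling sum over a candidate window.
df$cwd_14d <- zoo::rollsumr(df$cwd, k = 14, fill = NA)

# Normalization: percentile rank; ranking ties (the many zeros) at the
# minimum keeps the scale sensitive at low-to-moderate dryness. The
# repository's my_percent_rank may differ in detail.
my_percent_rank <- function(x) {
  rank(x, ties.method = "min", na.last = "keep") / sum(!is.na(x))
}
df$cwd_14d_pct <- my_percent_rank(df$cwd_14d)

# Classifier evaluation: full AUC plus partial AUC over the
# high-specificity (driest) end of the ROC curve.
roc_obj  <- pROC::roc(df$fire, df$cwd_14d_pct, quiet = TRUE)
auc_full <- pROC::auc(roc_obj)
auc_part <- pROC::auc(roc_obj, partial.auc = c(1, 0.9),
                      partial.auc.focus = "specificity")

# Danger rating: the eCDF of the indicator on ignition days maps a dryness
# percentile to the proportion of historical fires igniting at or below it.
fire_ecdf <- ecdf(df$cwd_14d_pct[df$fire == 1])
fire_ecdf(0.90)
```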
The "quantile raster" generated by `save_quants_lyr.R` acts as a spatial lookup table for local climate normals and can be used to rapidly assess fire ignition danger on a given historical or future date via the eCDF function.
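As a rough sketch of that lookup, assuming a multi-layer quantile raster in which layer q holds the historical q-th percentile of the indicator at each cell (the file names and layer layout here are hypothetical, not the repository's actual format):

```r
library(terra)

quants <- terra::rast("out/cwd_quantiles.tif")  # hypothetical quantile raster
today  <- terra::rast("out/cwd_14d_today.tif")  # hypothetical current rolling CWD

# Percentile lookup: the fraction of quantile layers that today's value
# meets or exceeds approximates its local percentile rank.
pct_today <- sum(today >= quants) / terra::nlyr(quants)

# Map the dryness percentile to a danger rating with the ecoregion's
# fitted eCDF (fire_ecdf, from the retrospective analysis above).
danger <- terra::app(pct_today, fire_ecdf)
```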
The operational forecast system is packaged as a Docker container, which encapsulates all the necessary scripts, libraries, and dependencies into a single, portable image. This approach ensures a consistent and reproducible environment for running the forecasts, whether on a local machine or in the cloud.
The system is composed of several distinct processes that are run as commands within the container.
- Main Forecast Generation
  - What it does: This is the most computationally intensive step. It runs the `map_forecast_danger.R` script to process raw climate data, apply the eCDF models, and generate the core forecast outputs: a NetCDF data file (`fire_danger_forecast.nc`) and multi-day forecast map images for desktop and mobile views.
- Park-Specific Visualizations
  - What it does: Runs the `generate_threshold_plots.R` script to create detailed fire danger analyses for each National Park Service unit within the ecoregion (see the sketch after this list). For each park, it generates:
    - Forecast Distribution Plot: a stacked bar chart showing how the percentage of park area in each fire danger category (Normal, Elevated, High, Very High, Extreme) changes across the 7-day forecast period.
    - Threshold Plots: three time series charts showing the percentage of park area at or above specific danger thresholds (0.25, 0.50, 0.75).
  - These visualizations provide park managers with both intuitive category-based views and precise threshold-based trends for operational decision-making.
- Lightning Map Generation
  - What it does: This process runs the `hourly_lightning_map.sh` script to provide near-real-time situational awareness. It fetches the latest lightning strike data and overlays it on the fire danger data from the main forecast, producing a self-contained, interactive HTML map (`lightning_map_{date}.html`).
- Frontend Assembly
  - What it does: This final step runs the `generate_daily_html.sh` script, which assembles the main `daily_forecast.html` page. It uses a template and populates it with the latest available assets, including the forecast maps, park-specific visualizations, and the lightning map. The script contains fallback logic to use older assets if the current day's are not yet available, preventing broken links.
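For intuition about the park threshold plots, here is a minimal sketch of the underlying summary. It assumes `terra` can read the forecast NetCDF as one layer per forecast day and uses a hypothetical park boundary file; the repository's script differs in detail.

```r
library(terra)

# Forecast cube from the main step: one layer per forecast day.
fc <- terra::rast("out/fire_danger_forecast.nc")

# Hypothetical park boundary polygon in the same CRS as the forecast.
park    <- terra::vect("data/park_boundary.shp")
fc_park <- terra::mask(terra::crop(fc, park), park)

# Percentage of park area at or above each danger threshold, per day.
thresholds <- c(0.25, 0.50, 0.75)
pct_above <- sapply(thresholds, function(th) {
  terra::global(fc_park >= th, "mean", na.rm = TRUE)[, 1] * 100
})
colnames(pct_above) <- paste0(">=", thresholds)
pct_above  # rows = forecast days, columns = thresholds
```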
```mermaid
graph TD
subgraph "Docker Container"
A[map_forecast_danger.R] -- Reads --> B[Input Data];
A -- Writes --> C[Forecast NetCDF];
A -- Writes --> D[Forecast Map PNG];
P[generate_threshold_plots.R] -- Reads --> C;
P -- Writes --> Q[Park Distribution Plots];
P -- Writes --> R[Park Threshold Plots];
E[hourly_lightning_map.sh] -- Reads --> C;
E -- Fetches --> F[Weatherbit API];
E -- Writes --> G[Lightning Map HTML];
H[generate_daily_html.sh] -- Reads --> D;
H -- Reads --> Q;
H -- Reads --> R;
H -- Reads --> G;
H -- Writes --> I[Main daily_forecast.html];
end
subgraph "Mounted Volumes"
J[Local ./data folder] -- mounted as --- B;
K[Local ./out folder] -- mounted for --- C;
K -- mounted for --- D;
K -- mounted for --- Q;
K -- mounted for --- R;
K -- mounted for --- G;
K -- mounted for --- I;
end
```
First, build the Docker image from the root of the repository:
```bash
docker build -t wildfire-forecast .
```

To run the different processes, use `docker run` with volume mounts for the `data` and `out` directories. This makes the local data available inside the container and ensures output artifacts are written back to the local filesystem.
Run the complete daily pipeline:
```bash
docker run --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/out:/app/out \
  -e ECOREGION=middle_rockies \
  wildfire-forecast bash src/daily_forecast.sh
```

This runs the full pipeline: forecast generation → validation → park visualizations → HTML assembly → COG creation.
Or run individual steps:
1. Generate the main forecast:
```bash
docker run --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/out:/app/out \
  wildfire-forecast Rscript src/map_forecast_danger.R middle_rockies
```

2. Generate park-specific visualizations:
```bash
docker run --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/out:/app/out \
  wildfire-forecast Rscript src/generate_threshold_plots.R middle_rockies
```

3. Assemble the final HTML page:
```bash
docker run --rm \
  -v $(pwd)/out:/app/out \
  wildfire-forecast bash src/generate_daily_html.sh middle_rockies
```

Update the lightning map (separate hourly process):
```bash
docker run --rm \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/out:/app/out \
  wildfire-forecast bash src/operational/html_generation/hourly_lightning_map.sh
```

- Wildfire Data: Monitoring Trends in Burn Severity (MTBS)
- Historical Climate Data: gridMET
- Forecast Climate Data: CFSv2 metdata daily forecasts
- Water Balance Data: NPS 1-km Gridded Water Balance Product
- Vegetation Data: LANDFIRE Existing Vegetation Type (EVT)
- Ecoregions: EPA Level III Ecoregions of the Conterminous United States
- `src/retrospective/03_analysis/dryness_roc_analysis.R`: The core retrospective analysis script. It iterates through ecoregions and cover types, calculates rolling climate metrics, performs the ROC/AUC analysis, and saves the best predictors and eCDF models.
- `src/update_rotate_vpd_forecasts.sh`: A shell script for the automated daily download of forecast data, featuring retry logic and file rotation (illustrated below).
- `src/save_quants_lyr.R`: A script for the one-time pre-computation step that generates the quantile rasters from the long-term historical climate record.
- `src/map_forecast_danger.R`: The operational script that combines recent historical data, new forecast data, the pre-computed quantile rasters, and the eCDF models to generate the final daily fire danger maps.
- `data/`: Directory for input data sources such as shapefiles and pre-processed climate data.
- `out/`: Directory for all generated outputs, including plots, AUC results, and final eCDF models.
- `assets/`: Directory for static files used in this README.
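The retry-and-rotate pattern in `update_rotate_vpd_forecasts.sh` can be illustrated in R. The script itself is shell; the URL, paths, and attempt count below are hypothetical.

```r
# Download with simple retry logic, backing off between attempts.
fetch_with_retry <- function(url, dest, attempts = 3) {
  for (i in seq_len(attempts)) {
    ok <- tryCatch(
      utils::download.file(url, dest, mode = "wb", quiet = TRUE) == 0,
      error = function(e) FALSE
    )
    if (ok) return(TRUE)
    Sys.sleep(2^i)  # wait longer before each retry
  }
  FALSE
}

# Rotation: keep the previous file as a dated backup before overwriting.
dest <- "data/vpd_forecast.nc"                      # hypothetical path
if (file.exists(dest)) {
  file.rename(dest, paste0(dest, ".", Sys.Date()))
}
fetch_with_retry("https://example.org/vpd_forecast.nc", dest)
```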
- Prepare the environment by installing the required R packages using renv: `renv::install()`
- Retrieve the required climate data. The analysis requires local copies of the required gridMET and NPSWB netCDF files. `00_download_gridmet.sh` can be used to retrieve CONUS grids for the required gridMET variables (requires approximately 57 GB of disk space). A similar script to download the CONUS grids for the NPS 1 km gridded water balance variables is not currently provided (TODO).
- Retrieve the required LANDFIRE Existing Vegetation Type (EVT) layer. This file is too large (8.96 GB) to be stored with Git LFS and must be downloaded separately. EVT 2023 is used in the analysis as the most recent cover data for the entire CONUS available in LANDFIRE: https://landfire.gov/data-downloads/US_240/LF2023_EVC_240_CONUS.zip. Extract this file to `data/LF2023_EVT_240_CONUS`.
- Prepare the cover type (`01_extract_cover.R`) and climate data (`01_extract_gridmet.R` and `01_extract_npswb.R`) for each US L3 ecoregion.
- Prepare a list of bad sites based on missing or erroneous data using `02_data_qc.R`.
- Ensure input data is correctly placed in the `data/` directory. The analysis expects pre-processed Parquet files of climate data linked to MTBS fire `Event_ID`s.
- Execute the main analysis script: `Rscript src/retrospective/03_analysis/dryness_roc_analysis.R`
- Results, including CSV files of AUC scores and the RDS files containing the eCDF objects for the best predictors for each ecoregion, will be saved in the `out/` directory (see the usage sketch below).
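As a usage sketch (the RDS file name here is hypothetical), a saved eCDF object is an ordinary R function that converts a dryness percentile into the proportion of historical ignitions at or below it:

```r
# Load a fitted eCDF for one ecoregion; the path is an illustrative example.
fire_ecdf <- readRDS("out/middle_rockies_ecdf.rds")

# Proportion of historical ignitions that occurred at or below the
# 85th dryness percentile -- the tunable, risk-based danger rating.
fire_ecdf(0.85)
```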
This work was supported by funding provided by the National Park Service through an agreement with the Northern Rockies Conservation Cooperative.


