AZX-PBC-OSS/upmu-dataframes

upmu-dataframes

IEEE C37.118.2 microPMU data frame toolkit for synchrophasor data conversion, generation, inspection, and compression verification.

Produces exact TCP wire-format IEEE C37.118.2 binary streams from multiple data sources: LBNL microPMU event CSVs, LBNL continuous archive channel files, and a built-in synthetic generator with 23 composable power system scenarios. Parses and inspects binary streams, exports to CSV, and verifies compression integrity with bit-exact or lossy tolerance comparison.

What It Does

  • Convert LBNL microPMU CSV files to IEEE C37.118.2 binary format
  • Batch convert entire event libraries with preserved directory structure
  • Generate synthetic PMU streams with 23 composable power system scenarios (voltage, current, frequency, fault, status, timing, encoding edge cases)
  • Inspect binary files with hex dumps, metadata summaries, and CSV export
  • Verify compression integrity (bit-exact or lossy tolerance comparison)
  • Import LBNL continuous archive data (streaming gzip channel files) with time slicing and chunking
  • Download LBNL archive channel files with progress bars and resume support
  • Generate a corpus of diverse test datasets covering the full IEEE C37.118.2 parameter space

Output files are in raw TCP wire format, byte-for-byte identical to what a PDC would receive from a real microPMU over TCP port 4712:

[CFG-2 frame][Data frame @ t=0][Data frame @ t=8.33ms][Data frame @ t=16.67ms]...

With session framing flags (--header-text, --include-cfg1, --include-cfg3, --include-commands), the full IEEE C37.118.2 session structure is produced:

[HDR][CFG-1][CFG-2][CFG-3][CMD:TurnOn][Data][Data]...[CFG-2 retransmit]...[CMD:TurnOff]
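
Because the file is just frames laid end to end, it can be walked using only the SYNC and FRAMESIZE fields of each common header. The following is a minimal sketch of that idea, not the toolkit's parser: `FrameKind`, `frame_kind`, and `split_frames` are hypothetical names, and the frame-type codes are taken from bits 6-4 of the second SYNC byte as defined by IEEE C37.118.2.

```rust
/// Frame kinds encoded in bits 6-4 of the second SYNC byte.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum FrameKind {
    Data,
    Header,
    Cfg1,
    Cfg2,
    Command,
    Cfg3,
    Unknown(u8),
}

/// Decodes the frame type from the second SYNC byte (e.g. 0x02 or 0x32).
pub fn frame_kind(sync_low: u8) -> FrameKind {
    match (sync_low >> 4) & 0x07 {
        0 => FrameKind::Data,
        1 => FrameKind::Header,
        2 => FrameKind::Cfg1,
        3 => FrameKind::Cfg2,
        4 => FrameKind::Command,
        5 => FrameKind::Cfg3,
        other => FrameKind::Unknown(other),
    }
}

/// Splits a concatenated stream into (kind, frame bytes) pairs.
/// Stops at the first malformed or truncated frame.
pub fn split_frames(stream: &[u8]) -> Vec<(FrameKind, &[u8])> {
    let mut frames = Vec::new();
    let mut pos = 0;
    while pos + 4 <= stream.len() {
        // Every frame starts with the 0xAA leading SYNC byte.
        if stream[pos] != 0xAA {
            break;
        }
        // FRAMESIZE is a big-endian u16 covering the whole frame, CRC included.
        let size = u16::from_be_bytes([stream[pos + 2], stream[pos + 3]]) as usize;
        // 14-byte common header + 2-byte CRC is the smallest possible frame.
        if size < 16 || pos + size > stream.len() {
            break;
        }
        frames.push((frame_kind(stream[pos + 1]), &stream[pos..pos + size]));
        pos += size;
    }
    frames
}
```

Reading a whole file with `std::fs::read` and passing the bytes to `split_frames` would yield a CFG-2 frame first, followed by the data frames. This sketch does not check the CRC of each frame.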

Prerequisites

  • Rust 1.85+ (Edition 2024)
  • Cargo

Building

cargo build --release

The optimized binary is at target/release/upmu-dataframes.

Quick Start

Convert LBNL CSV to C37.118.2 binary

upmu-dataframes convert \
  --input event_data.csv \
  --output event_data.c37

Generate synthetic data

# Random mix of scenarios (deterministic with seed)
upmu-dataframes generate \
  --output synthetic.c37 \
  --duration 60 \
  --scenario random_mix \
  --seed 42

# Specific scenarios
upmu-dataframes generate \
  --output synthetic.c37 \
  --duration 30 \
  --scenario sag,motor_start,freq_event

Inspect a binary file

upmu-dataframes inspect --input event_data.c37
upmu-dataframes inspect --input event_data.c37 --hexdump --max-frames 10
upmu-dataframes inspect --input event_data.c37 --csv exported.csv

Verify compression integrity

# Bit-exact comparison
upmu-dataframes verify \
  --original original.c37 \
  --decompressed decompressed.c37

# Lossy comparison with tolerances
upmu-dataframes verify \
  --original original.c37 \
  --decompressed decompressed.c37 \
  --mode lossy \
  --mag-tolerance 0.01 \
  --angle-tolerance 0.001 \
  --compressed compressed.bin \
  --json

Batch convert an event library

upmu-dataframes batch \
  --input-dir /path/to/lbnl-events \
  --output-dir /path/to/output

Import LBNL continuous archive

# Download archive files for a location
upmu-dataframes download-archive \
  --location a6_bus1 \
  --output-dir ./vendor/lbnl_archive

# Import 5 minutes of archive data starting 1 hour in
upmu-dataframes import-archive \
  --input-dir ./vendor/lbnl_archive \
  --location a6_bus1 \
  --output archive.c37 \
  --offset 3600 \
  --duration 300

CLI Reference

convert

Convert a single LBNL microPMU CSV to IEEE C37.118.2 binary.

Flag            Default     Description
--input         (required)  LBNL CSV input file
--output        (required)  Output .c37 binary file
--station-name  LBNL-uPMU   PMU station identifier (max 16 chars)
--idcode        1           PMU stream ID (1-65534)
--format        polar       Phasor notation: polar or rect
--encoding      float32     Data encoding: float32 or int16
--data-rate     120         Reporting rate in frames per second

batch

Convert all CSV files in an LBNL event library directory tree.

Flag          Default     Description
--input-dir   (required)  Root of LBNL event library
--output-dir  (required)  Output directory for .c37 files
--format      polar       Phasor notation: polar or rect
--encoding    float32     Data encoding: float32 or int16

generate

Generate synthetic IEEE C37.118.2 data with composable power system scenarios.

Flag                Default            Description
--output            (required)         Output .c37 binary file
--duration          60                 Stream duration in seconds
--rate              120                Reporting rate in frames per second
--encoding          float32            Data encoding: float32 or int16
--voltage-ln        7200               Base line-to-neutral voltage (Vrms)
--current           300                Base current (Arms)
--scenario          none               Comma-separated scenario names or preset (see below)
--seed              none               RNG seed for deterministic output
--inject-sag        off                Legacy flag (equivalent to --scenario sag)
--station-name      SYNTH-PMU          Station identifier
--phasor-count      8                  Number of phasors (1-16)
--nominal-freq      60                 Nominal grid frequency (50 or 60 Hz)
--notation          polar              Phasor notation: polar or rect
--phasor-encoding   (from --encoding)  Phasor data type: float32 or int16
--analog-encoding   (from --encoding)  Analog data type: float32 or int16
--freq-encoding     (from --encoding)  Frequency data type: float32 or int16
--analog-count      0                  Number of analog channels
--analog-preset     none               Analog preset: substation
--digital-count     0                  Number of digital words
--digital-preset    none               Digital preset: breaker
--pmu-count         1                  Number of PMUs in aggregated stream (1-256)
--time-base         1000000            TIME_BASE for FRACSEC (1-16777215)
--cfg2-interval     none               Re-emit CFG-2 every N seconds
--config-count      1                  CFGCNT field in CFG-2
--header-text       none               Prepend HDR frame with ASCII station description
--include-cfg1      off                Prepend CFG-1 capabilities frame before CFG-2
--include-cfg3      off                Include CFG-3 extended config frame after CFG-2
--include-commands  off                Wrap data with CMD TurnOnData/TurnOffData

Scenario names: sag, swell, cap_switching, motor_start, pv_cloud, freq_event, angle_jump, fault_lg, fault_ll, fault_llg, near_zero_current, int16_boundary, timestamp_rollover, sync_loss, config_change, data_quality, trigger, gps_unlock, leap_second, invalid_measurement, missed_frames, duplicate_frames, timing_jitter

Presets: random_mix (3-6 random scenarios with randomized parameters), all (all scenario types placed sequentially)

inspect

Parse and display IEEE C37.118.2 binary file contents.

Flag          Default     Description
--input       (required)  Input .c37 binary file
--hexdump     off         Print hex + ASCII dump of frames
--max-frames  all         Limit display to first N frames
--csv         none        Export parsed data frames as CSV

verify

Compare original vs. decompressed C37.118.2 streams.

Flag               Default     Description
--original         (required)  Original .c37 file (pre-compression)
--decompressed     (required)  Decompressed .c37 file
--compressed       none        Compressed file (for ratio calculation)
--mode             exact       Verification mode: exact or lossy
--mag-tolerance    0.01        Magnitude tolerance (lossy mode)
--angle-tolerance  0.001       Angle tolerance in radians (lossy mode)
--freq-tolerance   0.001       Frequency tolerance in Hz (lossy mode)
--json             off         Output report as JSON
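
To make the lossy mode concrete, here is a rough sketch of a per-phasor tolerance check. It is an illustration only and does not reflect how verify is actually implemented: `Tolerances`, `angle_diff`, `phasor_within`, and `freq_within` are hypothetical names, the tolerances are assumed to be absolute differences (the flags above do not say whether they are absolute or relative), and angles are compared on the circle so that values just either side of ±π count as close.

```rust
use std::f64::consts::PI;

/// Hypothetical container mirroring the three lossy-mode flags.
#[derive(Debug, Clone, Copy)]
pub struct Tolerances {
    pub mag: f64,   // assumed absolute, in the phasor's own unit
    pub angle: f64, // radians
    pub freq: f64,  // Hz
}

impl Default for Tolerances {
    /// Defaults copied from the flag table above.
    fn default() -> Self {
        Self { mag: 0.01, angle: 0.001, freq: 0.001 }
    }
}

/// Signed angular difference a - b, wrapped into [-PI, PI).
pub fn angle_diff(a: f64, b: f64) -> f64 {
    (a - b + PI).rem_euclid(2.0 * PI) - PI
}

/// Compares two polar phasors given as (magnitude, angle in radians).
pub fn phasor_within(orig: (f64, f64), dec: (f64, f64), tol: &Tolerances) -> bool {
    (orig.0 - dec.0).abs() <= tol.mag && angle_diff(orig.1, dec.1).abs() <= tol.angle
}

/// Compares two frequency values in Hz.
pub fn freq_within(orig: f64, dec: f64, tol: &Tolerances) -> bool {
    (orig - dec).abs() <= tol.freq
}
```

With the default tolerances, a 7200 V phasor that comes back as 7200.005 V passes, while one that comes back as 7200.02 V does not.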

import-archive

Import LBNL continuous archive gzip channel files into IEEE C37.118.2 binary. Data is streamed, so whole channel files are never loaded into memory.

Flag              Default       Description
--input-dir       (required)    Directory containing .gz channel files
--output          (required)    Output .c37 file path
--location        (required)    Location: a6_bus1, bank_514, grizzly_bus1_2
--prefix          none          Custom file prefix (overrides --location)
--station-name    LBNL-ARCHIVE  Station name in CFG-2
--idcode          1             PMU stream ID (1-65534)
--format          polar         Phasor notation: polar or rect
--encoding        float32       Data encoding: float32 or int16
--data-rate       120           Reporting rate in frames per second
--offset          none          Skip this many seconds from start
--duration        none          Process this many seconds
--chunk-duration  none          Split output into files of this duration

corpus

Generate a diverse test corpus covering the full IEEE C37.118.2 parameter space.

Flag          Default     Description
--output-dir  (required)  Output directory for .c37 files and manifest.json
--seed        42          Random seed for deterministic corpus
--preset      medium      Corpus size: small, medium, large

download-archive

Download LBNL continuous archive channel files from powerdata-download.lbl.gov.

Flag          Default                Description
--output-dir  ./vendor/lbnl_archive  Directory to save .gz files
--location    (required)             Location: a6_bus1, bank_514, grizzly_bus1_2
--channels    all                    Channel set: all, voltage, current
--force       off                    Re-download existing files

Data Format

Each output .c37 file contains:

  • 1+ CFG-2 frames -- configuration metadata (station name, phasor names, data rate, encoding); optionally retransmitted periodically via --cfg2-interval
  • N data frames -- one per sample at the configured reporting rate

Per-Frame Content

Each data frame contains one data block per PMU. For example, a single-PMU frame with 8 float32 polar phasors, no analogs, and no digitals is 90 bytes (14 + 2 + 64 + 4 + 4 + 2):

Field          Bytes                         Description
Common header  14                            SYNC + FRAMESIZE + IDCODE + SOC + FRACSEC
Per PMU (repeated for each PMU in multi-PMU streams):
  STAT         2                             Status word (data error, sync, time quality, trigger reason)
  Phasors      8×N (float32) or 4×N (int16)  N phasors, polar (mag+angle) or rectangular (real+imag)
  FREQ         4 (float32) or 2 (int16)      Frequency deviation from nominal (Hz)
  DFREQ        4 (float32) or 2 (int16)      Rate of change of frequency (Hz/s)
  Analogs      4×A (float32) or 2×A (int16)  A analog measurement channels
  Digitals     2×D                           D digital status words (16 bits each)
CRC            2                             CRC-16/IBM-3740 checksum

Frame size varies with configuration. Encoding can be set independently for phasors, analogs, and frequency.
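
The size arithmetic in the field table can be written down directly. This is a small sketch for sanity-checking FRAMESIZE values, with `PmuLayout`, `pmu_block_size`, and `data_frame_size` as hypothetical names that are not part of the crate's API.

```rust
/// Hypothetical description of one PMU's channel layout.
pub struct PmuLayout {
    pub phasors: usize,
    pub analogs: usize,
    pub digitals: usize,
    pub phasor_float: bool, // true = float32, false = int16
    pub analog_float: bool,
    pub freq_float: bool,
}

/// Bytes occupied by one PMU's data block (STAT through digitals).
pub fn pmu_block_size(l: &PmuLayout) -> usize {
    let stat = 2;
    // Each phasor is two values: 2 x 4 bytes (float32) or 2 x 2 bytes (int16).
    let phasors = l.phasors * if l.phasor_float { 8 } else { 4 };
    // FREQ and DFREQ share one encoding.
    let freq_dfreq = 2 * if l.freq_float { 4 } else { 2 };
    let analogs = l.analogs * if l.analog_float { 4 } else { 2 };
    let digitals = l.digitals * 2;
    stat + phasors + freq_dfreq + analogs + digitals
}

/// Total data frame size: 14-byte common header + PMU blocks + 2-byte CRC.
pub fn data_frame_size(pmus: &[PmuLayout]) -> usize {
    14 + pmus.iter().map(pmu_block_size).sum::<usize>() + 2
}
```

For the default layout (one PMU, 8 float32 phasors, no analogs or digitals) this gives 90 bytes, matching the figure above; switching every field to int16 gives 54 bytes.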

IEEE C37.118.2 Compliance

Protocol:

  • CRC-16/IBM-3740 (poly=0x1021, init=0xFFFF) -- canonical test vector: "123456789" -> 0x29B1
  • Big-endian (network byte order) throughout
  • SYNC words: Data=0xAA02, CFG-2=0xAA32
  • Station names null-padded to 16 bytes per spec (not space-padded)
  • SOC is a 32-bit Unix timestamp; FRACSEC counts the fraction of a second in units of 1/TIME_BASE
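
The CRC parameters listed above are enough to reproduce the checksum independently. Below is a bitwise sketch (no lookup table), assuming the checksum covers every byte of the frame except the trailing two CRC bytes, which are stored big-endian; `crc16_ibm3740` and `frame_crc_ok` are hypothetical names.

```rust
/// CRC-16/IBM-3740: poly 0x1021, init 0xFFFF, no reflection, no final XOR.
pub fn crc16_ibm3740(data: &[u8]) -> u16 {
    let mut crc: u16 = 0xFFFF;
    for &byte in data {
        // Feed the next byte into the high half of the register.
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 {
                (crc << 1) ^ 0x1021
            } else {
                crc << 1
            };
        }
    }
    crc
}

/// Checks a complete frame whose last two bytes hold the big-endian CRC.
pub fn frame_crc_ok(frame: &[u8]) -> bool {
    if frame.len() < 3 {
        return false;
    }
    let (body, tail) = frame.split_at(frame.len() - 2);
    crc16_ibm3740(body) == u16::from_be_bytes([tail[0], tail[1]])
}
```

Calling `crc16_ibm3740(b"123456789")` returns 0x29B1, the canonical test vector quoted in the list.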

Configuration (CFG-2):

  • Variable phasor count (1-16) with per-channel PHUNIT type (voltage/current) and scale
  • Variable analog channel count with per-channel ANUNIT type and scale
  • Variable digital word count with DIGUNIT normal-status/valid-bits masks
  • Multi-PMU config frames (NUM_PMU 1-256, each with independent channel layout)
  • Independent encoding per type: phasors, analogs, and frequency can each be float32 or int16
  • Polar or rectangular phasor notation
  • 50 Hz or 60 Hz nominal frequency (FNOM)
  • Positive and negative DATA_RATE (positive = frames per second, negative = seconds per frame)
  • Configurable TIME_BASE (default 1,000,000; supports any value 1-16,777,215)
  • CONFIG_COUNT (CFGCNT) version counter
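
As an illustration of how DATA_RATE and TIME_BASE interact, the sketch below converts a signed DATA_RATE to a frame period and computes the SOC and fraction-of-second count for the n-th frame of a stream. The function names are hypothetical, rounding to the nearest count is an assumption (the toolkit may truncate instead), and only the 24-bit count is returned; the time quality byte in the top 8 bits of FRACSEC is left out.

```rust
/// Frame period in seconds for a signed DATA_RATE.
/// Positive values are frames per second, negative values are seconds per frame.
pub fn frame_period_secs(data_rate: i16) -> Option<f64> {
    match data_rate {
        0 => None, // zero is not a meaningful rate
        r if r > 0 => Some(1.0 / r as f64),
        r => Some(-(r as f64)),
    }
}

/// (SOC, fraction-of-second count) for frame `frame_index` of a stream
/// that starts on a whole second and reports `fps` frames per second.
pub fn frame_timestamp(start_soc: u32, frame_index: u64, fps: u32, time_base: u32) -> (u32, u32) {
    let fps = fps as u64;
    let soc = start_soc + (frame_index / fps) as u32;
    let within_second = frame_index % fps;
    // Round to the nearest 1/TIME_BASE tick (assumed; truncation is also plausible).
    let count = (within_second * time_base as u64 + fps / 2) / fps;
    (soc, count as u32)
}
```

At 120 fps with the default TIME_BASE of 1,000,000, frames 1 and 2 land on counts 8333 and 16667, which correspond to the 8.33 ms and 16.67 ms offsets in the stream layout shown earlier.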

Data frames:

  • Full 16-bit StatWord: data error, PMU sync, data sorting, trigger detected, config change pending, data modified, time quality indicator, unlocked time, trigger reason
  • FRACSEC time quality byte: 8 quality levels (Locked through Unreliable) + leap second pending/occurred flags
  • Positive sequence derived via Fortescue transformation
  • Round-trip validated: serialize → parse → compare field-by-field
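
For readers unfamiliar with the Fortescue transformation, the positive-sequence phasor is V+ = (Va + a·Vb + a²·Vc) / 3, where a = 1∠120°. The sketch below implements that formula with plain tuples; it is not the code in phasor_math.rs, and `positive_sequence` and its helpers are hypothetical names. Phasors are assumed to be (magnitude, angle in radians).

```rust
use std::f64::consts::PI;

/// Converts (magnitude, angle in radians) to rectangular (re, im).
fn from_polar(mag: f64, angle: f64) -> (f64, f64) {
    (mag * angle.cos(), mag * angle.sin())
}

/// Complex multiplication on (re, im) tuples.
fn mul(a: (f64, f64), b: (f64, f64)) -> (f64, f64) {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// Positive-sequence component of three polar phasors, returned in polar form.
pub fn positive_sequence(va: (f64, f64), vb: (f64, f64), vc: (f64, f64)) -> (f64, f64) {
    let alpha = from_polar(1.0, 2.0 * PI / 3.0); // a  = 1 at +120 degrees
    let alpha2 = from_polar(1.0, -2.0 * PI / 3.0); // a^2 = 1 at +240 = -120 degrees
    let a = from_polar(va.0, va.1);
    let b = mul(alpha, from_polar(vb.0, vb.1));
    let c = mul(alpha2, from_polar(vc.0, vc.1));
    let re = (a.0 + b.0 + c.0) / 3.0;
    let im = (a.1 + b.1 + c.1) / 3.0;
    (re.hypot(im), im.atan2(re))
}
```

For a balanced A-B-C set (unit magnitudes at 0°, -120°, and +120°) the result has magnitude 1 and angle 0; for the reversed phase order the magnitude is approximately zero.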

LBNL CSV Format

Input CSVs follow the Lawrence Berkeley National Lab microPMU event library format:

  • 120 Hz reporting rate (8.333 ms between samples)
  • Columns: timestamp (ns), 3-phase voltage/current (angle + magnitude), power, sag/swell flags
  • 8 phasors derived: VA, VB, VC, V+ (Fortescue), IA, IB, IC, I+
  • Frequency derived from voltage phase-A angle rate of change
  • ROCOF derived from frequency deviation rate of change
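
The last two derivations rest on the fact that a synchrophasor angle is measured against a reference rotating at nominal frequency, so a drifting angle indicates a frequency offset: Δf = (dθ/dt) / 2π. A minimal two-point sketch follows, assuming angles in radians (degree inputs would need converting first) and no smoothing; the function names are hypothetical and the toolkit's actual derivation may filter or window the data differently.

```rust
use std::f64::consts::PI;

/// Wraps an angle in radians into [-PI, PI).
pub fn wrap_angle(rad: f64) -> f64 {
    (rad + PI).rem_euclid(2.0 * PI) - PI
}

/// Frequency deviation from nominal (Hz) from two consecutive phase angles.
/// `fps` is the reporting rate, i.e. 1 / (time between the two samples).
pub fn freq_deviation_hz(prev_angle: f64, curr_angle: f64, fps: f64) -> f64 {
    // Wrapping handles the jump when the angle crosses +/- PI.
    wrap_angle(curr_angle - prev_angle) / (2.0 * PI) * fps
}

/// ROCOF (Hz/s) from two consecutive frequency deviations.
pub fn rocof_hz_per_s(prev_dev: f64, curr_dev: f64, fps: f64) -> f64 {
    (curr_dev - prev_dev) * fps
}
```

At 120 fps, an angle that advances by 0.01 rad per sample corresponds to a deviation of about 0.19 Hz above nominal.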

See docs/reference/lbnl_pmu_event_library_README.md for the full LBNL format specification.

LBNL Continuous Archive

The LBNL continuous archive (powerdata-download.lbl.gov) provides ~11.6 days of real-world 120 Hz data from 3 distribution locations. Unlike the event library, the archive stores each measurement channel as a separate gzip-compressed file of timestamp_ns,value rows. Each location has 12 channels (3-phase voltage and current, each with magnitude and angle).

Use download-archive to fetch the files, then import-archive to convert them. The import pipeline streams data through gzip decompression without loading whole files into memory, so it can handle the full ~3 GB per channel.
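
The streaming and time-slicing idea can be sketched with the standard library alone. The example below reads timestamp_ns,value rows one line at a time from any `BufRead` and hands only the samples inside an offset/duration window to a callback, so memory use stays constant. It is an illustration, not the archive module's code: `for_each_in_window` is a hypothetical name, gzip decoding is left out (in practice the reader would be wrapped in a decompressor such as the flate2 crate's `GzDecoder`), and the window is assumed to be measured from the first timestamp in the file.

```rust
use std::io::{self, BufRead};

/// Streams `timestamp_ns,value` rows and calls `on_sample` for each row whose
/// timestamp falls in [first + offset_ns, first + offset_ns + duration_ns).
/// Returns the number of samples passed to the callback.
pub fn for_each_in_window<R: BufRead>(
    reader: R,
    offset_ns: u64,
    duration_ns: Option<u64>,
    mut on_sample: impl FnMut(u64, f64),
) -> io::Result<u64> {
    let mut first_ts: Option<u64> = None;
    let mut emitted = 0u64;
    for line in reader.lines() {
        let line = line?;
        // Skip rows without a comma and rows that do not parse (e.g. a header).
        let Some((ts, value)) = line.trim().split_once(',') else { continue };
        let (Ok(ts), Ok(value)) = (ts.trim().parse::<u64>(), value.trim().parse::<f64>()) else {
            continue;
        };
        // The window is anchored at the first valid timestamp seen.
        let start = *first_ts.get_or_insert(ts) + offset_ns;
        if ts < start {
            continue;
        }
        if let Some(d) = duration_ns {
            if ts >= start + d {
                break; // rows are assumed time-ordered, so nothing later can match
            }
        }
        on_sample(ts, value);
        emitted += 1;
    }
    Ok(emitted)
}
```

An offset of 3600 s and a duration of 300 s, as in the Quick Start example, would correspond to `offset_ns = 3_600_000_000_000` and `duration_ns = Some(300_000_000_000)`.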

Testing

cargo test
cargo clippy --all-targets -- -D warnings

Tests cover CRC computation, frame serialization/deserialization round-trips, CSV parsing, Fortescue transformation, synthetic scenario behavior, and end-to-end CLI workflows.

Project Structure

src/
├── main.rs              # CLI entry point (clap)
├── lib.rs               # Module declarations
├── c37118/              # IEEE C37.118.2 core
│   ├── types/           # Domain types (format, status, phasor, time)
│   ├── common.rs        # CRC-16 + common header serializer
│   ├── config_frame.rs  # CFG-2 frame serializer
│   ├── data_frame.rs    # Data frame serializer
│   ├── scales.rs        # Int16 scaling constants
│   └── parser/          # Binary deserializer
├── cli/                 # Subcommand handlers
├── csv_input/           # LBNL CSV parser + enrichment
├── archive/             # LBNL continuous archive streaming reader
├── synthetic/           # Synthetic waveform generator
├── phasor_math.rs       # Fortescue transform + frequency derivation
└── verify.rs            # Compression verification framework
docs/
└── reference/           # IEEE standards, LBNL docs, reference implementations

License

BSD 3-Clause
