Skip to content

Incorrect missing value handling when converting geopotential height (gh) to geopotential (z) in GRIB files #397

@zhenkunl

Description

@zhenkunl

What happened?

Description:
I encountered an issue when converting 850hPa geopotential height (gh) to geopotential (z) using the formula z = gh * 9.80665 in NCEP GFS forecast GRIB files. After writing the converted data back to a GRIB file and reading it again, some values appear as 3.4028234663852886e+38 (IEEE 32-bit float max), while they were originally close to 9999 (likely missing values) before the conversion.

The issue persists across:

  • earthkit-data (which depends on eccodes-python)
  • eccodes-python directly
  • eccodes C library

This suggests a problem in the underlying eccodes C library when handling missing values during data transformation and write-back.

Attachments:
I've attached a sample GRIB file (gh850.grib) containing the 850hPa geopotential height field from NCEP GFS that reproduces the issue.

Additional notes:
The issue appears specifically when:

  • The original GRIB file uses a missing value marker (e.g., 9999)
  • After numerical transformation, the missing value handling seems to break
  • The output file contains IEEE max float values instead of correct values which close to 9999

I suspect this might be related to how eccodes handles missing value indicators during data modification. After further investigation, I tested earlier versions of the eccodes C library and found that version 2.26 processes the conversion correctly without generating the anomalous 3.4028234663852886e+38 values. This suggests that a regression was introduced in later versions of eccodes. Any insight or fix would be appreciated.

What are the steps to reproduce the bug?

Steps to reproduce:

  1. Read a GRIB file containing 850hPa geopotential height (gh) from NCEP GFS.
  2. Convert gh to z by multiplying by 9.80665.
  3. Write the modified field to a new GRIB file.
  4. Read the new file and observe that values originally near 9999 become 3.4028234663852886e+38.

Minimal reproducible example:
Using earthkit-data

import earthkit.data as ekd
# Read input GRIB file
gh = ekd.from_source("file", "gh850.grib")
f = gh[0]
# Convert gh to z
values = f.to_numpy() * 9.80665
f_new = f.clone(values=values, metadata=f.metadata().override(shortName="z"))
# Write back to new GRIB file
f_new.to_target("file", "z850.grib")

# Read back and check values
z = ekd.from_source("file", "z850.grib")
f = z[0]
md = f.metadata()
print(md["missingValue"], md["level"], f.values.max(), f.values.min())
# 9999 850 3.4028234663852886e+38 9277.9560546875

Version

v2.44.0

Platform (OS and architecture)

ubuntu 22.04/Mac OS 15.7.3

Relevant log output

In [1]: import earthkit.data as ekd

In [2]: import numpy as np

In [3]: z = ekd.from_source("file", "z850.grib")

In [4]: f = z[0]

In [5]: values = f.to_numpy()

In [6]: indices = np.where(values > 10000000.0)

In [7]: indices
Out[7]:
(array([140, 602, 604, 617, 618, 619, 638, 647, 668]),
 array([1245,  544,  484,  491, 1399,   12,   22, 1398,  802]))

In [8]: gh = ekd.from_source("file", "gh850.grib")

In [9]: f = gh[0]

In [10]: values_orig = f.to_numpy()

In [11]: values_new = f.to_numpy() * 9.80665

In [12]: values_orig[indices]
Out[12]:
array([1019.62425, 1019.62425, 1019.62425, 1019.62425, 1019.62425,
       1019.62425, 1019.60825, 1019.60825, 1019.62425])

In [13]: values_new[indices]
Out[13]:
array([9999.09815126, 9999.09815126, 9999.09815126, 9999.09815126,
       9999.09815126, 9999.09815126, 9998.94124486, 9998.94124486,
       9999.09815126])

Accompanying data

The sample GRIB file is gh850.tar.gz

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions