
Writing a netCDF file is unexpectedly slow #2912

Closed
@msaharia

Description

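import xarray as xr

# nclist: list of LIS output file paths (defined elsewhere)
ncdat = xr.open_mfdataset(nclist, concat_dim='time')

# lat/lon were concatenated along time as well; keep only the first time step
ncdat['lat'] = ncdat['lat'].isel(time=0).drop('time')
ncdat['lon'] = ncdat['lon'].isel(time=0).drop('time')
ncdat = ncdat.rename({'north_south': 'lat', 'east_west': 'lon'})

lat_coords = ncdat.lat[:, 0]  # Extract latitudes
lon_coords = ncdat.lon[0, :]  # Extract longitudes

ncdat = ncdat.drop(['lat', 'lon'])

# Re-attach lat/lon as 1-D dimension coordinates
reformatted_ncdat = ncdat.assign_coords(lat=lat_coords, lon=lon_coords, time=ncdat.coords['time'])

ncdat = reformatted_ncdat.sortby('time')
ncdat.to_netcdf('testing.nc')  # this is the step that takes ~10 minutes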

Problem description

After some processing, I am left with the following xarray Dataset, ncdat, which I want to export to a netCDF file:

<xarray.Dataset>
Dimensions:                 (lat: 59, lon: 75, time: 500)
Coordinates:
  * time                    (time) datetime64[ns] 2007-01-22 ... 2008-06-04
  * lat                     (lat) float32 -4.25 -4.15 ... 1.4500003 1.5500002
  * lon                     (lon) float32 29.049988 29.149994 ... 36.450012
Data variables:
    Streamflow_tavg         (time, lat, lon) float32 dask.array<shape=(500, 59, 75), chunksize=(1, 59, 75)>
    RiverDepth_tavg         (time, lat, lon) float32 dask.array<shape=(500, 59, 75), chunksize=(1, 59, 75)>
    RiverFlowVelocity_tavg  (time, lat, lon) float32 dask.array<shape=(500, 59, 75), chunksize=(1, 59, 75)>
    FloodedFrac_tavg        (time, lat, lon) float32 dask.array<shape=(500, 59, 75), chunksize=(1, 59, 75)>
    SurfElev_tavg           (time, lat, lon) float32 dask.array<shape=(500, 59, 75), chunksize=(1, 59, 75)>
    SWS_tavg                (time, lat, lon) float32 dask.array<shape=(500, 59, 75), chunksize=(1, 59, 75)>
Attributes:
    missing_value:           -9999.0
    NUM_SOIL_LAYERS:         1
    SOIL_LAYER_THICKNESSES:  1.0
    title:                   LIS land surface model output
    institution:             NASA GSFC
    source:                  model_not_specified
    history:                 created on date: 2019-04-19T09:11:12.992
    references:              Kumar_etal_EMS_2006, Peters-Lidard_etal_ISSE_2007
    conventions:             CF-1.6
    comment:                 website: http://lis.gsfc.nasa.gov/
    MAP_PROJECTION:          EQUIDISTANT CYLINDRICAL
    SOUTH_WEST_CORNER_LAT:   -4.25
    SOUTH_WEST_CORNER_LON:   29.05
    DX:                      0.1
    DY:                      0.1
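The repr shows that every data variable is a dask array with chunksize=(1, 59, 75), i.e. one chunk per time step (presumably one per input file, since open_mfdataset chunks per file by default). A quick tally of what that layout means, using only the sizes printed above and assuming 4 bytes per float32 value:

```python
# Sizes copied from the Dataset repr (coordinates ignored).
n_time, n_lat, n_lon = 500, 59, 75
chunk_shape = (1, 59, 75)      # chunksize reported for every data variable
n_vars = 6                     # Streamflow_tavg ... SWS_tavg
bytes_per_value = 4            # float32

chunks_per_var = n_time // chunk_shape[0]
chunk_bytes = chunk_shape[0] * chunk_shape[1] * chunk_shape[2] * bytes_per_value
total_chunks = chunks_per_var * n_vars
total_bytes = n_time * n_lat * n_lon * bytes_per_value * n_vars

print(chunks_per_var)  # 500
print(chunk_bytes)     # 17700
print(total_chunks)    # 3000
print(total_bytes)     # 53100000
```

So the write has to compute and store about 3000 pieces of roughly 17.7 kB each (about 53 MB of uncompressed data in total). With pieces this small, per-chunk overhead in dask and in the netCDF writer can outweigh the actual I/O, although this tally alone does not prove that is where the 10 minutes go.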

The problem is that the export takes an inordinately long time: almost 10 minutes for this particular file, which is only 35 MB.

How can I expedite this process? Is there anything wrong with the structure of ncdat?
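One workaround worth trying is sketched below. It is not taken from this thread and is untested against these files. The repr implies only about 53 MB of float32 data, so the idea is to load the whole Dataset into memory first, computing the small dask chunks once, and then let to_netcdf write plain in-memory arrays. It assumes the dataset fits comfortably in RAM. write_eagerly is a hypothetical helper name; Dataset.load() (which returns the Dataset itself) and Dataset.to_netcdf() are standard xarray methods.

```python
def write_eagerly(ds, path):
    """Hypothetical helper: load a lazily opened (dask-backed) xarray
    Dataset into memory, then write it to netCDF in one pass.

    Assumption: the whole dataset fits in RAM.
    """
    # Dataset.load() computes every dask chunk and returns the Dataset itself,
    # now backed by in-memory numpy arrays.
    in_memory = ds.load()
    # The write no longer has to evaluate the dask graph chunk by chunk.
    in_memory.to_netcdf(path)
    return in_memory
```

Usage would be ncdat = write_eagerly(ncdat, 'testing.nc'). Whether this helps depends on where the time is actually spent: reading the input files, the sortby reindexing, or the write itself. Timing ncdat.load() on its own would separate the reading cost from the writing cost.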

Expected Output

A netCDF file

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 23:01:00) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.0.101-0.47.105-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.5.0.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.2.0
distributed: 1.27.0
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.0
pip: 19.0.3
conda: None
pytest: None
IPython: 7.4.0
sphinx: None
