Save 'S1' array without the char_dim_name dimension #3407

Closed
zxdawn opened this issue Oct 16, 2019 · 2 comments

zxdawn commented Oct 16, 2019

MCVE Code Sample

import numpy as np
import xarray as xr
tstr='2019-07-25_00:00:00'
Times = xr.DataArray(np.array([" ".join(tstr).split()], dtype = 'S1'), dims = ['Time', 'DateStrLen'])
ds = xr.Dataset({'Times':Times})
ds.to_netcdf('test.nc', format='NETCDF4',encoding={'Times': {'zlib':True, 'complevel':5}}, unlimited_dims={'Time':True})

Expected Output

Because I want to use the nc file as input to the WRF model,
I need only two dimensions: Time and DateStrLen.

ncdump -h test.nc:

netcdf test {
dimensions:
        Time = UNLIMITED ; // (1 currently)
        DateStrLen = 19 ;
variables:
        char Times(Time, DateStrLen) ;
}

Although it's possible to set char_dim_name to Time as in #2895,
I need the unlimited Time dimension to be the first one.

Problem Description

This is the actual output of ncdump -h test.nc:

netcdf test {
dimensions:
        Time = UNLIMITED ; // (1 currently)
        DateStrLen = 19 ;
        string1 = 1 ;
variables:
        char Times(Time, DateStrLen, string1) ;
}

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.0.76-0.11-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.1
libnetcdf: 4.4.1.1

xarray: 0.13.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: 1.3.1
pydap: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: 2.3.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.5.0
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: None
IPython: 7.8.0
sphinx: None


DocOtak commented Oct 16, 2019

Hi @zxdawn

Does this modified version of your code do what you want?:

import numpy as np
import xarray as xr
tstr='2019-07-25_00:00:00'
Times = xr.DataArray(np.array([tstr], dtype = np.dtype(('S', 19))), dims = ['Time'])
ds = xr.Dataset({'Times':Times})
ds.to_netcdf(
   'test.nc', 
   format='NETCDF4',
   encoding={
      'Times': {
         'zlib':True, 
         'complevel':5,
         'char_dim_name':'DateStrLen'
      }
   },
   unlimited_dims={'Time':True}
)

Output of ncdump:

netcdf test {
dimensions:
	Time = UNLIMITED ; // (1 currently)
	DateStrLen = 19 ;
variables:
	char Times(Time, DateStrLen) ;
data:

 Times =
  "2019-07-25_00:00:00" ;
}

Some explanation of what is going on:
Strings in numpy aren't the most friendly thing to work with, and the data types can be a little confusing. In your code, the "S1" data type is saying "this array holds fixed-width (null-padded) byte strings of length 1". That 1 in the "S1" is the string length. This resulted in you having an array of one-character strings that was 19 elements long, so the 19 ended up as an array dimension rather than as the string length:

array([[b'2', b'0', b'1', b'9', b'-', b'0', b'7', b'-', b'2', b'5', b'_',
        b'0', b'0', b':', b'0', b'0', b':', b'0', b'0']], dtype='|S1')

vs what I think you want:

array([b'2019-07-25_00:00:00'], dtype='|S19')
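To make the difference concrete, here is a minimal sketch that builds both arrays with numpy alone and prints their shape and item size. The variable names `chars` and `whole` are only for illustration. Judging from the two ncdump outputs in this thread, the item size appears to be what becomes the trailing character dimension on write (1 for "S1", 19 for "S19").

```python
import numpy as np

tstr = "2019-07-25_00:00:00"

# One single-byte string per character: 19 elements, each of width 1.
chars = np.array([list(tstr)], dtype="S1")

# One fixed-width byte string per timestamp: 1 element of width 19.
whole = np.array([tstr], dtype=np.dtype(("S", len(tstr))))

# Print shape, dtype and per-element width (in bytes) for both layouts.
for name, arr in [("chars", chars), ("whole", whole)]:
    print(name, arr.shape, arr.dtype, arr.dtype.itemsize)
```

With the first layout the 19 lives in the array shape, with the second it lives in the dtype.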

Since you know that your string length is going to be 19, you should tell numpy about this instead of xarray by either specifying the data type as "S19" or using the data type constructor (which I prefer): np.dtype(("S", 19))
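If the data already exists as an array of single characters (for example, read from another WRF file), it does not have to be rebuilt from the original string. The following is a sketch under that assumption, using a numpy `view` to reinterpret each row of single bytes as one fixed-width string. The names `chars`, `width` and `joined` are hypothetical.

```python
import numpy as np

tstr = "2019-07-25_00:00:00"

# Existing character array with shape (Time, DateStrLen) = (1, 19).
chars = np.array([list(tstr)], dtype="S1")

# Reinterpret each row of 19 single bytes as one 19-byte string.
# The view needs C-contiguous memory, hence ascontiguousarray.
width = chars.shape[-1]
joined = np.ascontiguousarray(chars).view(f"S{width}")

# The view leaves a trailing axis of length 1; drop it so that only
# the Time axis remains.
joined = joined.reshape(chars.shape[:-1])

print(joined.shape, joined.dtype)
```

The resulting array has a single `Time` axis and could be passed to `xr.DataArray(..., dims=['Time'])` in the same way as in the modified example.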


zxdawn commented Oct 16, 2019

@DocOtak Thank you for your explanation! It works well now :)

@zxdawn zxdawn closed this as completed Oct 16, 2019