Skip to content

Redesign of the load_earth_relief() function #489

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
seisman opened this issue Jun 23, 2020 · 5 comments · Fixed by #542
Closed

Redesign of the load_earth_relief() function #489

seisman opened this issue Jun 23, 2020 · 5 comments · Fixed by #542
Labels
bug Something isn't working enhancement Improving an existing feature help wanted Helping hands are appreciated

Comments

@seisman
Copy link
Member

seisman commented Jun 23, 2020

Description of the problem

PyGMT provides the load_earth_relief() function to load GMT's earth_relief data as a xarray.DataArray object, so that users can plot it or directly manipulate the data. The function won't work in GMT 6.1.0.

Currently, the load_earth_relief() function first calls the which() function to get the full path of the data (which will download the data if it's not available locally), and then read the netCDF file in by grid = xr.open_dataarray(fname). It works well with GMT 6.0.0 because each resolution is stored as a single grid.

In GMT 6.1.0, we will have many problems:

  1. earth relief data with resolutions >= 06m are stored as a single grid, e.g., earth_relief_06m_p.grd. The function still works
  2. earth relief data with resolutions <= 05m are stored as several smaller tiles. If you want to plot an earth relief map in a 3°x3° region using the 03s data, you don't have to download the ~6.8 GB large file anymore, i.e., the command
    gmt grdimage @earth_relief_03s -R120/123/30/33 -pdf map will only need to download 8 tiles (~3 MB) instead of the old 6.8 GB single grid. However, gmt which @earth_relief_03s -Ga (-Ga means automatically putting the data in the appropriate directory, which is new in GMT 6.1.) will download all tiles (hundreds or thousands of tiles, ~6.8 GB in total). That's definitely what we want to avoid.

Solutions

AFAIK, there are no easy solutions.

  1. We can do gmt grdinfo @earth_relief_03s -R120/123/30/33 to download the tiles (ignore the output from grdinfo) and use gmt which to return the full paths that exist locally (it will return all existing tiles, including tiles outside our target region). However, since the data are stored as tiles, we have to read several files and merge them. It's more complicated for 03s and 01s data. The original 03s and 01s data don't have data in oceans and GMT fills the ocean with the 15s data. Thus we have to resample the data and merge them.

  2. GMT provides a C API function (perhaps also new in GMT 6.1.0) GMT_Get_FilePath() to return the full path of a file. As per the documentation, it can also download remote files (e.g., earth relief data) if it's not available locally. However, (1) the API is not tested by us yet; (2) it's not clear if it can accept a region argument to download tiles in a specified region; (3) even though the tiles in the specified region are correctly downloaded with full paths returned, we still have to manually resample and merge them.

We need a better and working solution here, or we have to seek support from the upstream GMT, and ask them to provide a new API function which returns the final grid (resampled and merged).

@seisman
Copy link
Member Author

seisman commented Jun 23, 2020

Ping @PaulWessel @joa-quim @leouieda @weiji14 for thoughts.

@PaulWessel
Copy link
Member

First, just some minor corrections. 6.1 does not affect how 01s and 03s are handled which were always tiled and only tiles in the region were downloaded. Presumably, you never called load_earth_relief () on those SRTM datasets. However, it is true for 15s through 05m that these are now also tiled.
Second, could you make load_earth_relief (which I assume has an implicit region of -180/80/-90/90) call gmt grdcut -Rd on the resolution remtoe data you want, and then it will assemble the global grid for you, including the resampling of 15s if 03s or 01s is used. First time it will download all tiles (if tiled), the next time it is local. I don't know how pyGMT handles grid internally so you could either have grdcut write to a netCDF file and then you read that as you did before, else you set up a virtual grid and that is what grdcut should return directly.

@seisman
Copy link
Member Author

seisman commented Jun 23, 2020

Presumably, you never called load_earth_relief () on those SRTM datasets.

Yes, 03s and 01s are never implemented or supported in PyGMT.

Second, could you make load_earth_relief (which I assume has an implicit region of -180/80/-90/90) call gmt grdcut -Rd on the resolution remtoe data you want, and then it will assemble the global grid for you, including the resampling of 15s if 03s or 01s is used. First time it will download all tiles (if tiled), the next time it is local. I don't know how pyGMT handles grid internally so you could either have grdcut write to a netCDF file and then you read that as you did before, else you set up a virtual grid and that is what grdcut should return directly.

It sounds a great solution!

@seisman seisman added bug Something isn't working enhancement Improving an existing feature help wanted Helping hands are appreciated labels Jun 23, 2020
@seisman seisman mentioned this issue Jun 24, 2020
9 tasks
@lhoupert
Copy link
Contributor

lhoupert commented Jul 9, 2020

Hi guys,
Just a comment from a new user.
It works fine for me when I run

topo=pygmt.datasets.load_earth_relief(resolution='06m')

However for a resolution of '05m', pygmt cannot find the files. I got this error message

gmtwhich [NOTICE]: Remote data courtesy of GMT data server OCEANIA [https://oceania.generic-mapping-tools.org]
gmtwhich [NOTICE]: Earth Relief at 5x5 arc minutes from Gaussian Cartesian filtering (9 km fullwidth) of SRTM15+V2.1 [Tozer et al., 2019].
gmtwhich [NOTICE]:   -> Download 180x180 degree grid tile (earth_relief_05m_p): S90W180
gmtwhich [NOTICE]:   -> Download 180x180 degree grid tile (earth_relief_05m_p): S90E000
gmtwhich [ERROR]: Tile @S90W180.earth_relief_05m_p.nc not found!
gmtwhich [ERROR]: Tile @S90E000.earth_relief_05m_p.nc not found!
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-46-49018afdaf86> in <module>
----> 1 topo=pygmt.datasets.load_earth_relief(resolution='05m')

~/anaconda2/envs/analysis_eel_data/lib/python3.8/site-packages/pygmt/datasets/earth_relief.py in load_earth_relief(resolution)
     37     """
     38     _is_valid_resolution(resolution)
---> 39     fname = which("@earth_relief_{}".format(resolution), download="u")
     40     grid = xr.open_dataarray(fname)
     41     # Add some metadata to the grid

~/anaconda2/envs/analysis_eel_data/lib/python3.8/site-packages/pygmt/helpers/decorators.py in new_module(*args, **kwargs)
    192                 if alias in kwargs:
    193                     kwargs[arg] = kwargs.pop(alias)
--> 194             return module_func(*args, **kwargs)
    195 
    196         new_module.aliases = aliases

~/anaconda2/envs/analysis_eel_data/lib/python3.8/site-packages/pygmt/modules.py in which(fname, **kwargs)
    144         path = tmpfile.read().strip()
    145     if not path:
--> 146         raise FileNotFoundError("File '{}' not found.".format(fname))
    147     return path
    148 

FileNotFoundError: File '@earth_relief_05m' not found.

I am wondering if it is because pygmt is looking for files named @*********** while my local files don't have the same exact name (they dont start with a @. See what a which command return for me:

pygmt.which('S90W180.earth_relief_05m_p.nc')
'/Users/locupe/.gmt/S90W180.earth_relief_05m_p.nc'

I will just use the 6m files right now, but I thought I would flag this up

@weiji14
Copy link
Member

weiji14 commented Jul 9, 2020

Thanks for the detailed writeup! As @seisman mentioned above, the problem with the "earth_relief_05m" grid (and higher resolution ones) is that it's actually tiled (specifically into two tiles, S90W180 and S90E000 as you've discovered), but we're trying to read it using xr.open_dataarray that only expects one NetCDF file. We'll definitely have to fix this in the redesign.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement Improving an existing feature help wanted Helping hands are appreciated
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants