Skip to content

region-id inconsistent between depressions.shp and regions.shp #33

@tlohde

Description

@tlohde

Firstly, thank you for providing an excellent package/resource.

  • lidar version: 0.7.1
  • Python version: 3.11.0
  • Operating System: Windows 10

Description

Describe what you were trying to get done.

  • short-term goal, and where the problem is happening: aligning depression regions, with individual nested depressions

Tell us what happened, what went wrong, and what you expected to happen.

  • the region-id values in regions.shp/regions_info.csv that is generated using ExtractSinks() differ from those in the depressions.shp/depressions_info.csv generated using DelineateDepressions().
  • I would expect the region-id to be consistent between these.

What I Did

outdir='tmp/'
min_size = 50           # minimum number of pixels as a depression
min_depth = 30           # minimum depth as a depression
interval = 10         # slicing interval for the level-set method
bool_shp = False        # output shapefiles for each individual level

sink_path = ExtractSinks('sample.tif', min_size, outdir)

dep_id_path, dep_level_path = DelineateDepressions(sink_path, min_size, min_depth, interval, outdir, bool_shp)

# read in output and combine info csv with geometry from shapefile
# depressions
depressions = gpd.read_file('tmp/depressions.shp')
depressions = depressions.dissolve('id').reset_index() # makes for a tidier merge with depressions_info.csv (creates multipolygons)
dep_info = pd.read_csv('tmp/depressions_info.csv')
gdf = depressions.merge(dep_info, on='id')

# regions
regions = gpd.read_file('tmp/regions.shp').sort_values(by='id')
reg_info = pd.read_csv('tmp/regions_info.csv')
regions = regions.merge(reg_info, left_on='id', right_on='region-id').sort_values(by='id').reset_index(drop=True)

at this stage I would expect the region-id in regions to match that in gdf...however, they do not. Below is an illustration, which I think also provides a way to handle the discrepency (spatial join)

# read in raster (for plotting)
region_raster = rio.open_rasterio('tmp/region.tif')

# to enable comparison of region-ids intersection of depressions and regions shapefiles
overlaid = (gdf
            .dissolve('region-id')   # aggregate by region-id
            .reset_index() # to ensure result gdf has both region-id_1 and region-id_2 (where _2 is the value that correctly corresponds with those in regions.shp)
            .overlay(regions, keep_geom_type=False)
)

###### plotting

# random region number
R = 310

fig, axs = plt.subplots(figsize=[15,6],ncols=5)

# plot raster 
(region_raster==R).plot(ax=axs[0], add_colorbar=False)

# plot from regions shapefile
regions.loc[regions['region-id']==R].plot(ax=axs[1])

# plot from depreesion shapefile
gdf.loc[gdf['region-id']==R].plot(column='level', ax=axs[2])

# plot from intersection of depressions and regions
overlaid.loc[overlaid['region-id_1']==R].plot(ax=axs[3])

# plot from intersection of depressions and regions
overlaid.loc[overlaid['region-id_2']==R].plot(ax=axs[4])

# tidy up axes limits, and labels etc...
axs[0].set_xlim(axs[1].get_xlim())
axs[0].set_ylim(axs[1].get_ylim())
axs[0].set_aspect('equal')

axs[0].set_title(f'region:{R}\nfrom region.tif')
axs[1].set_title(f'region:{R}\nfrom regions.shp')
axs[2].set_title(f'region:{R}\nfrom depressions.shp')
axs[3].set_title(f'region:{overlaid.loc[overlaid["region-id_1"]==R,"region-id_2"].values[0]}\nfrom regions.shp')
axs[4].set_title(f'region:{overlaid.loc[overlaid["region-id_2"]==R,"region-id_1"].values[0]}\nfrom depressions.shp')

plt.subplots_adjust(wspace=0.35)

print(f"region: {R} in the depressions.shp file corresponds to region: {overlaid.loc[overlaid['region-id_1']==R,'region-id_2'].values[0]} in regions.shp")
print(f"region: {R} in the regions.shp file corresponds to region: {overlaid.loc[overlaid['region-id_2']==R,'region-id_1'].values[0]} in depressions.shp")

region: 310 in the depressions.shp file corresponds to region: 394 in regions.shp
region: 310 in the regions.shp file corresponds to region: 245 in depressions.shp

image

the question

So, i think the question is... is the discrepency between the two region_ids expected/normal?
If yes, is the use of .overlay() the best way to handle it and reconcile regions.shp/regions_info.csv and depressions.shp/depressions_info.csv, or is there an even more straightforward way?
If no, have I done something wrong?

thank you

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions