Description
I was reading this DoorDash blog and came across the term isochrone.
In our context, an isochrone is a polygon of the area a driver can reach in a given amount of time.
Currently, we're querying for points in nearby k-rings at fine resolutions (which is roughly a concentric-circle approach: points in smaller rings are ranked higher). These rings are really isochores ("same space"), since they consider distance rather than duration.
What if 💭
What if we could avoid the Distance Matrix call to Google Maps (127ms) or OSRM (680ms) entirely?
What if we pre-computed and cached the distance matrix such that we could say what the duration was between any two hex cells?
Importance of indexing
The set of latitude/longitude points (real numbers) is uncountably infinite. We can use geospatial indexing to transform this space into something finite and countable. That's where something like H3 comes in. Our isochrones will be stored as a list of cell indexes, not as a polygon. The cache key is the geohash of the origin.
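To illustrate the indexing idea, here is a minimal sketch using a trivial square grid. H3 does this with hexagons at fixed resolutions (real code would call something like `h3.latlng_to_cell(lat, lng, res)` from the h3-py library); the point is just that quantization maps a continuum of lat/lng points onto a finite, countable set of cell indexes, which can then serve as cache keys.

```python
def to_cell(lat: float, lng: float, res_deg: float = 0.01) -> tuple:
    """Quantize a lat/lng point to a discrete grid cell index.

    Illustration only: H3 uses hexagonal cells, not squares, but the
    principle is the same -- nearby points collapse to the same index.
    """
    return (int(lat // res_deg), int(lng // res_deg))

# Two points ~25m apart land in the same cell, so they share one cache key.
a = to_cell(40.7081, -73.9571)
b = to_cell(40.7083, -73.9569)
assert a == b
```

Because nearby drivers resolve to the same cell, one cached isochrone serves all of them.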
Williamsburg, Brooklyn consists of ~49 hex cells at Resolution 9.
Brooklyn consists of ~2500 of those cells.
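The cell counts used throughout this post fall out of two H3 facts (per the H3 docs): each hexagonal cell has 7 children at the next-finer resolution (pentagon cells, which are rare, have 6), and a filled k-ring of radius k contains 1 + 3k(k+1) cells.

```python
def descendants(parent_res: int, child_res: int) -> int:
    """Descendant cells when refining by (child_res - parent_res) levels."""
    return 7 ** (child_res - parent_res)

def k_ring_size(k: int) -> int:
    """Cells in a filled k-ring of radius k, including the center cell."""
    return 1 + 3 * k * (k + 1)

assert descendants(7, 9) == 49                    # one res-7 cell -> 49 res-9 cells
assert k_ring_size(1) == 7                        # center + 6 neighbors
assert k_ring_size(1) * descendants(7, 9) == 343  # res-9 cells per k=1 ring at res 7
assert 343 ** 2 == 117_649                        # pairwise matrix size
```

This is where the 343 and 117,000+ figures in the next section come from.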
Algorithm for Building the Matrix
For each k=1 k-ring at Resolution 7...
Get the pairwise Distance Matrix for all 343 descendant Resolution 9 cells within the k-ring...
This results in 117,649 (343²) pairwise durations.
Finally, persist these durations in a datastore (e.g., Postgres). Also persist each hex cell in the appropriate isochrone bucket by duration (1m, 2m, 3m, 4m, 5m, etc.), e.g., "the sub-3-minute isochrone from Cell A (res 9) is the list of all of these other res-9 cells".
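The bucketing step above might look like the following sketch. It assumes we already have pairwise durations in seconds (e.g., from OSRM's `/table` service); the cell ids and shapes here are placeholders, not a real schema.

```python
from collections import defaultdict

# Hypothetical input: pairwise durations in seconds between res-9 cells.
durations = {
    ("a", "b"): 55,    # under 1 minute
    ("a", "c"): 170,   # under 3 minutes
    ("a", "d"): 260,   # under 5 minutes
}

def build_isochrones(durations, buckets_min=(1, 2, 3, 4, 5)):
    """For each origin cell, collect the cells reachable within each bucket."""
    iso = defaultdict(lambda: {m: set() for m in buckets_min})
    for (origin, dest), secs in durations.items():
        for m in buckets_min:
            if secs <= m * 60:
                iso[origin][m].add(dest)
                break  # store each dest in its smallest bucket only
    return iso

iso = build_isochrones(durations)
assert iso["a"][1] == {"b"}
assert iso["a"][3] == {"c"}
assert iso["a"][5] == {"d"}
```

Storing each destination only in its smallest bucket keeps rows small; the full sub-3-minute isochrone is then the union of the 1m, 2m, and 3m buckets.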
Query time
Take the example of querying nearby trips for a driver. In this case, the driver is the origin.
Instead of going to Google Maps for a distance matrix (which costs money and time, and isn't something we can legally cache), we first check whether we've cached an isochrone for the driver's location. If we have one, we can rank based on the isochrone time (e.g., 5m, 10m, etc.). What we need for this to work is a point-in-polygon function; since our isochrones are stored as lists of cell indexes rather than polygons, that reduces to checking whether a trip's cell is in the list.
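A sketch of that query path, assuming cached isochrones are shaped like `{origin_cell: {minutes: set_of_cells}}` (an illustrative layout, not a real schema):

```python
# "Point-in-polygon" over hex-cell isochrones is just a set membership check.
cached = {"driver_cell": {5: {"x", "y"}, 10: {"z"}}}

def rank_trips(origin_cell, trip_cells, cache):
    """Return trips sorted by the smallest isochrone bucket containing them."""
    iso = cache.get(origin_cell)
    if iso is None:
        return None  # cache miss: fall back to the Distance Matrix API
    ranked = []
    for trip in trip_cells:
        bucket = min((m for m, cells in iso.items() if trip in cells),
                     default=float("inf"))  # not reachable in any bucket
        ranked.append((bucket, trip))
    return [t for _, t in sorted(ranked)]

assert rank_trips("driver_cell", ["z", "x"], cached) == ["x", "z"]
```

On a cache miss we can still call the external API, then backfill the cache for that origin cell.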
Accounting for traffic
If we want a final sorting/ranking pass that factors in traffic, that's also possible.
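One cheap way to do that (an assumption on my part, not a worked design): keep the cached free-flow durations for the coarse isochrone filter, then apply a live congestion multiplier only to the small candidate set that survives it. Names here are illustrative.

```python
def rerank_with_traffic(candidates, multipliers):
    """candidates: [(trip_id, free_flow_secs)]; returns trips by adjusted ETA.

    multipliers would come from a live traffic feed, keyed by trip or route
    segment; anything missing defaults to free-flow (1.0).
    """
    adjusted = [(secs * multipliers.get(trip, 1.0), trip)
                for trip, secs in candidates]
    return [t for _, t in sorted(adjusted)]

# Trip "b" is nominally closer, but congestion flips the order.
order = rerank_with_traffic([("a", 200), ("b", 150)], {"b": 1.8})
assert order == ["a", "b"]
```

Since the re-rank only touches a handful of candidates, the expensive live-traffic lookup stays off the hot path.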
Running an OSRM server in Docker
The demo OSRM endpoint's terms of use limit you to 1 request per second. To actually do this preprocessing, it would be ideal to have a high-resource machine with lots of memory and compute power.
It takes significantly more RAM to process the data than it does to host it. As long as you keep the operating systems/machine architectures the same, you can rent a huge box for a short period to generate the routing files, then move them to a smaller box to actually perform route calculations on.
We can run the osrm-backend container.
First, download NY OSM data from Geofabrik. (This doesn't include any parts of NJ across the river, or any parts of CT.)
wget http://download.geofabrik.de/north-america/us/new-york-latest.osm.pbf
Pre-process the extract with the car profile:
docker run --platform linux/amd64 \
-t \
-v "${PWD}:/data" \
osrm/osrm-backend osrm-extract \
-p /opt/car.lua \
/data/new-york-latest.osm.pbf
I needed the --platform flag on an M1 Mac.
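The extract step alone doesn't start a server. OSRM also needs a graph-preparation pass before `osrm-routed` can serve requests on port 5000; a sketch using the contraction-hierarchies pipeline (file names assume the extract above; the MLD pipeline with `osrm-partition`/`osrm-customize` is the alternative):

```shell
# Build the contraction hierarchy over the extracted graph
docker run --platform linux/amd64 -t -v "${PWD}:/data" \
  osrm/osrm-backend osrm-contract /data/new-york-latest.osrm

# Start the routing engine HTTP server on port 5000
docker run --platform linux/amd64 -t -i -p 5000:5000 -v "${PWD}:/data" \
  osrm/osrm-backend osrm-routed /data/new-york-latest.osrm

# Smoke test: pairwise durations (seconds) via the table service.
# Note OSRM takes coordinates as lon,lat.
curl "http://localhost:5000/table/v1/driving/-73.9571,40.7081;-73.9496,40.7170?annotations=duration"
```

The `/table` service is the local, rate-limit-free replacement for the Distance Matrix calls in the preprocessing step.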