| 1 | +# Efficient API Design: Avoiding Costly External Requests |
| 2 | + |
| 3 | +On a recent project, I built an API that leverages OpenStreetMap data, currently |
| 4 | +focusing on the United States and Canada. The API helps users locate |
| 5 | +neighborhoods they might want to live in. Getting there took several |
| 6 | +iterations of building a read-only web application, a self-imposed |
| 7 | +constraint to avoid relying on external services. Everything operates within a |
| 8 | +contained environment, such as a container or VM. |
| 9 | + |
| 10 | +This constraint led me to optimize an SQLite database containing OpenStreetMap |
| 11 | +data. I’ve written separately about |
| 12 | +[compressing SQLite data for read-only databases](/posts/2024-07-02-optimizing-large-scale-openstreetmap-data-with-sqlite); |
| 13 | +the main challenge in this project was enabling users |
| 14 | +to query and correlate geospatial data efficiently. |
| 15 | + |
| 16 | +Initially, my front-end implementation included an endpoint that allowed users |
| 17 | +to query OpenStreetMap data in a manner similar to other OpenStreetMap systems. |
| 18 | +The queries focused on nodes, ways, or relations and their associated tags. |
| 19 | + |
| 20 | +For example, to find all the Starbucks locations in Colorado, you could use the |
| 21 | +following endpoint and query parameter: |
| 22 | + |
| 23 | +**Endpoint:** `/api/search`\ |
| 24 | +**Query parameter:** `search=nwr[name=Starbucks](area=colorado)` |
| 25 | + |
| 26 | +The response is a GeoJSON FeatureCollection; the coordinates below are placeholders: |
| 27 | + |
| 28 | +```json |
| 29 | +{ |
| 30 | + "type": "FeatureCollection", |
| 31 | + "features": [ |
| 32 | + { |
| 33 | + "type": "Feature", |
| 34 | + "geometry": { |
| 35 | + "type": "Point", |
| 36 | + "coordinates": [125.6, 10.1] |
| 37 | + }, |
| 38 | + "properties": { |
| 39 | + "name": "Starbucks" |
| 40 | + } |
| 41 | + } |
| 42 | + ] |
| 43 | +} |
| 44 | +``` |
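As a rough sketch of how a client might call this endpoint, the snippet below builds the request URL with the query safely encoded. The base URL `https://example.invalid` and the helper name `buildSearchUrl` are assumptions for illustration; only the `/api/search` path and the `search` parameter come from the description above.

```javascript
// Hypothetical helper: builds a URL for the /api/search endpoint.
// The base URL is a placeholder, not the project's real host.
function buildSearchUrl(search, base = "https://example.invalid") {
  const url = new URL("/api/search", base);
  // searchParams.set percent-encodes brackets, parentheses, and "=".
  url.searchParams.set("search", search);
  return url.toString();
}

const searchUrl = buildSearchUrl("nwr[name=Starbucks](area=colorado)");
console.log(searchUrl);

// With a real host, the FeatureCollection could then be read like this:
// const collection = await (await fetch(searchUrl)).json();
// console.log(collection.features.length);
```

Letting `URL` and `searchParams` do the encoding avoids hand-escaping the brackets and parentheses that the query syntax relies on.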
| 45 | + |
| 46 | +While this worked, the objective was to allow correlations between entities, |
| 47 | +such as identifying schools near coffee shops or places not within a certain |
| 48 | +distance of each other. I used TurfJS for these operations in the first |
| 49 | +iteration. |
| 50 | + |
| 51 | +```js |
| 52 | +import * as turf from "@turf/turf"; |
| 53 | +const coffeeShops = await query(`nwr[amenity=cafe][name](area=colorado)`); |
| 54 | +const highSchools = await query(`nwr[amenity=school][name](area=colorado)`); |
| 55 | + |
| 56 | +const nearbySchools = []; |
| 57 | +coffeeShops.features.forEach((coffeeShop) => { |
| 58 | + highSchools.features.forEach((school) => { |
| 59 | + const distance = turf.distance(coffeeShop, school, { units: "kilometers" }); |
| 60 | + if (distance < 1) { // assuming 1 kilometer as the proximity threshold |
| 61 | + nearbySchools.push({ |
| 62 | + coffeeShop: coffeeShop.properties.name, |
| 63 | + school: school.properties.name, |
| 64 | + distance: distance, |
| 65 | + }); |
| 66 | + } |
| 67 | + }); |
| 68 | +}); |
| 69 | +``` |
| 70 | + |
| 71 | +The initial approach was functional but slow. For example, querying for schools |
| 72 | +in a large state like California could return thousands of results. Transferring |
| 73 | +this data from the server to the client and performing geospatial analyses in |
| 74 | +the browser with TurfJS was inefficient. Queries often took between 5 and 10 |
| 75 | +seconds, depending on complexity. |
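To make the cost concrete: the nested loops above compare every coffee shop with every school, so the work grows with the product of the two result sizes. The sketch below counts those comparisons using a hand-written haversine distance, the formula the TurfJS documentation names for `turf.distance`. The function names, the Earth-radius constant, and the sample sizes are illustrative assumptions, not measurements from the project.

```javascript
// Mean Earth radius in kilometers (an assumed constant for this sketch).
const EARTH_RADIUS_KM = 6371.0088;

const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Great-circle distance between two [longitude, latitude] positions.
function haversineKm([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Brute-force pairing: one distance calculation per (a, b) pair.
function pairsWithin(pointsA, pointsB, maxKm) {
  let comparisons = 0;
  const pairs = [];
  for (const a of pointsA) {
    for (const b of pointsB) {
      comparisons += 1;
      const distance = haversineKm(a, b);
      if (distance < maxKm) pairs.push({ a, b, distance });
    }
  }
  return { pairs, comparisons };
}

// Hypothetical sizes: 2,000 cafes and 3,000 schools on a synthetic grid.
const cafes = Array.from({ length: 2000 }, (_, i) => [
  -105 + (i % 50) * 0.01,
  39 + Math.floor(i / 50) * 0.01,
]);
const schools = Array.from({ length: 3000 }, (_, i) => [
  -105 + (i % 60) * 0.01,
  39 + Math.floor(i / 60) * 0.01,
]);

const { comparisons } = pairsWithin(cafes, schools, 1);
console.log(comparisons); // 6000000 distance calculations
```

Even before any network transfer, two modest result sets already require millions of distance calculations, which is the work the later sections move to the server.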
| 76 | + |
| 77 | +The request-response cycle looked like this: |
| 78 | + |
| 79 | +```mermaid |
| 80 | +sequenceDiagram |
| 81 | + participant Client |
| 82 | + participant Server |
| 83 | + Client->>Server: GET /api/search?search=nwr[amenity=cafe][name](area=colorado) |
| 84 | + Note right of Server: Returns large GeoJSON<br>FeatureCollection for coffeeShops |
| 85 | + Server-->>Client: GeoJSON (size: ~500KB) |
| 86 | + Client->>Server: GET /api/search?search=nwr[amenity=school][name](area=colorado) |
| 87 | + Note right of Server: Returns large GeoJSON<br>FeatureCollection for highSchools |
| 88 | + Server-->>Client: GeoJSON (size: ~1MB) |
| 89 | + Note left of Client: Client processes data<br>with TurfJS |
| 90 | + Client->>Client: Calculate distances and filter |
| 91 | + Note left of Client: Result: nearbySchools |
| 92 | +``` |
| 93 | + |
| 94 | +While this was acceptable for a prototype, I wanted to improve performance. The |
| 95 | +idea of moving correlation and geospatial analysis to the server side intrigued |
| 96 | +me, inspired by conversations with a former coworker who extended their Go |
| 97 | +runtime to support scripting. This led to an exploration of embedding JavaScript |
| 98 | +execution on the server using [Goja](https://github.com/dop251/goja). |
| 99 | + |
| 100 | +Goja is a lightweight JavaScript interpreter written in Go. |
| 101 | +[Embedding it into my project was straightforward](/posts/2024-08-30-exploring-goja-a-golang-javascript-runtime), |
| 102 | +allowing me to expose application infrastructure for server-side JavaScript |
| 103 | +execution. This approach let users write JavaScript code to query and correlate |
| 104 | +data without sending large datasets over the wire. Instead, the server executed |
| 105 | +these queries in a sandboxed environment and returned only the necessary |
| 106 | +results. |
| 107 | + |
| 108 | +Initially, I attempted to reuse the same client-side logic, including TurfJS, on |
| 109 | +the server. However, TurfJS, being computationally intensive, struggled within |
| 110 | +Goja's interpreted environment, leading to query times of 15 to 20 seconds. |
| 111 | +Clearly, this was not a viable solution. |
| 112 | + |
| 113 | +To address this, I offloaded the heavy computations to optimized Go functions. |
| 114 | +My project already used the Orb library for geospatial operations, so I exposed |
| 115 | +Orb's functionality to Goja. This allowed JavaScript code to call underlying Go |
| 116 | +functions for tasks like clustering and distance calculations. The result was a |
| 117 | +dramatic improvement in performance, reducing query times to sub-second ranges |
| 118 | +while minimizing data transfer. |
| 119 | + |
| 120 | +**Endpoint:** `/api/runtime`\ |
| 121 | +**Query parameter:** `source=<URL-encoded JavaScript file>` |
| 122 | + |
| 123 | +```js |
| 124 | +const coffeeShops = query.execute(`nwr[amenity=cafe][name](area=colorado)`); |
| 125 | +const highSchools = query.execute(`nwr[amenity=school][name](area=colorado)`); |
| 126 | + |
| 127 | +const clusteredShops = coffeeShops.cluster(10); // find coffee shops within 10m of each other |
| 128 | + |
| 129 | +const results = clusteredShops.flatMap((shop) => { |
| 130 | + // find nearby high schools within 1km, returning at most 1 entry |
| 131 | + const nearbyHighSchools = shop.overlap(highSchools, 1_000, 0, 1); |
| 132 | + return [shop, nearbyHighSchools]; |
| 133 | +}); |
| 134 | + |
| 135 | +const payload = results.asGeoJSON(); |
| 136 | + |
| 137 | +export { payload }; |
| 138 | +``` |
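A script like this has to be URL-encoded before it can travel in the `source` parameter. As a minimal sketch, the encoding step might look like the following. The host `https://example.invalid` and the function name `buildRuntimeUrl` are placeholders; only the `/api/runtime` path and the `source` parameter come from the text above, and the sample script is an abbreviated copy of the one shown.

```javascript
// Hypothetical helper: encodes a script for the `source` query parameter.
function buildRuntimeUrl(script, base = "https://example.invalid") {
  return `${base}/api/runtime?source=${encodeURIComponent(script)}`;
}

// Two lines taken from the script above, joined with a newline.
const script = [
  "const coffeeShops = query.execute(`nwr[amenity=cafe][name](area=colorado)`);",
  "const clusteredShops = coffeeShops.cluster(10);",
].join("\n");

const runtimeUrl = buildRuntimeUrl(script);

// Decoding the parameter yields the original script, newlines included.
const decoded = decodeURIComponent(runtimeUrl.split("source=")[1]);
console.log(decoded === script); // true
```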
| 139 | + |
| 140 | +The system now supports submitting JavaScript code as a parameter to an API |
| 141 | +endpoint. This code runs in a sandboxed environment with strict timeouts and |
| 142 | +safely interacts with the underlying data. The data remains read-only, and the |
| 143 | +platform operates entirely on ephemeral infrastructure hosted by me. |
| 144 | + |
| 145 | +The current platform enables users to test and experiment with running |
| 146 | +JavaScript in a sandboxed environment for geospatial queries. It draws parallels |
| 147 | +to Overpass QL (OQL), as used in Overpass Turbo, but I find JavaScript easier |
| 148 | +to reason about, especially with TypeScript support. While the API transpiles |
| 149 | +TypeScript to JavaScript without type checking, it includes a TypeScript |
| 150 | +definition for the sandbox environment. This combination provides a flexible and |
| 151 | +user-friendly way to explore geospatial data. |
| 152 | + |
| 153 | +There is more information in the [documentation](https://knowhere.live/docs) for |
| 154 | +this project. It should be open to more people soon. |