Commit dbc3900

add blog post
1 parent 4c9a316 commit dbc3900

2 files changed: 192 additions, 0 deletions

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
# Voice transcript to blog post

I am providing you with a voice transcript intended to become a blog post about
code. Please organize the content into a clear and polished markdown document.
Follow these guidelines strictly:

1. **Structure:**
   - Include a title as the only heading in the markdown file, using `#` (H1).
   - Do not add subheadings (no H2, H3, etc.).

2. **Editing Rules:**
   - Correct grammar and vocabulary while preserving my tone and voice as much
     as possible.
   - Reorganize and optimize the content to ensure clarity and flow, avoiding
     redundancy.
   - Do not invent or add content that isn't implied or directly stated in the
     transcript.

3. **Placeholders:**
   - If I mention a placeholder for code (e.g., “Insert a placeholder here for
     code”), include a clear markdown comment (e.g.,
     `<!-- Code placeholder: Add example here -->`) in the relevant part of the
     text.

4. **Conclusion:**
   - End the blog post with a simple, conversational concluding sentence. For
     example: “That's it. Let me know if you have any questions. Thanks for
     reading.” Do not add additional automated-sounding conclusions.

5. **Markdown Format:**
   - Ensure the output is in valid markdown with clean formatting.

Please focus on keeping the content organized, improving readability, and
aligning with the tone and intent of my original voice transcript. Do not add
any subheadings or embellishments beyond what is specified.

**Transcript:**\
[Insert the transcript here]

written/posts/2025-01-26.md

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
# Efficient API Design: Avoiding Costly External Requests

On a recent project, I built an API that leverages OpenStreetMap data, currently
focusing on the United States and Canada. The API helps users locate
neighborhoods they might want to live in. This journey involved several
iterations of building a web application in read-only mode, a self-imposed
constraint to avoid relying on external services. Everything operates within a
contained environment, such as a container or VM.

This constraint led me to optimize an SQLite database containing OpenStreetMap
data. While I’ve written about
[compressing SQLite data for read-only databases](/posts/2024-07-02-optimizing-large-scale-openstreetmap-data-with-sqlite)
in another blog post, the main challenge was enabling users to query and
correlate geospatial data efficiently.

Initially, my front-end implementation included an endpoint that allowed users
to query OpenStreetMap data in a manner similar to other OpenStreetMap systems.
The queries focused on nodes, ways, or relations and their associated tags.

For example, to find all the Starbucks locations in Colorado, you could use the
following endpoint and query parameter:

**Endpoint:** `/api/search`\
**Query parameter:** `search=nwr[name=Starbucks](area=colorado)`

The response would return a GeoJSON FeatureCollection, as shown below:

```json
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [125.6, 10.1]
      },
      "properties": {
        "name": "Starbucks"
      }
    }
  ]
}
```
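As a minimal sketch of how a client might consume a response shaped like the one above, the helper below flattens Point features into plain records. The function name `summarizeFeatures` and the record shape are assumptions made for illustration and are not part of the API described in this post. The only external fact relied on is that GeoJSON positions are ordered `[longitude, latitude]`.

```javascript
// Hypothetical helper (not part of the post's API): flatten a GeoJSON
// FeatureCollection of Point features into simple records.
function summarizeFeatures(collection) {
  if (!collection || collection.type !== "FeatureCollection") {
    throw new TypeError("expected a GeoJSON FeatureCollection");
  }
  return collection.features
    // Keep only Point geometries; ways and relations may carry other types.
    .filter((feature) => feature.geometry && feature.geometry.type === "Point")
    .map((feature) => {
      // GeoJSON positions are [longitude, latitude], in that order.
      const [longitude, latitude] = feature.geometry.coordinates;
      const name = feature.properties?.name ?? null;
      return { name, longitude, latitude };
    });
}

// Sample input copied from the response shown above.
const sample = {
  type: "FeatureCollection",
  features: [
    {
      type: "Feature",
      geometry: { type: "Point", coordinates: [125.6, 10.1] },
      properties: { name: "Starbucks" },
    },
  ],
};

const summary = summarizeFeatures(sample);
console.log(summary);
```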

While this worked, the objective was to allow correlations between entities,
such as identifying schools near coffee shops or places not within a certain
distance of each other. I used TurfJS for these operations in the first
iteration.

```js
// @turf/turf exposes named exports, so import the namespace
import * as turf from "@turf/turf";

// `query` is the client helper that calls /api/search and resolves to GeoJSON
const coffeeShops = await query(`nwr[amenity=cafe][name](area=colorado)`);
const highSchools = await query(`nwr[amenity=school][name](area=colorado)`);

// Compare every coffee shop with every school and keep the close pairs
const nearbySchools = [];
coffeeShops.features.forEach((coffeeShop) => {
  highSchools.features.forEach((school) => {
    const distance = turf.distance(coffeeShop, school, { units: "kilometers" });
    if (distance < 1) { // assuming 1 kilometer as the proximity threshold
      nearbySchools.push({
        coffeeShop: coffeeShop.properties.name,
        school: school.properties.name,
        distance: distance,
      });
    }
  });
});
```
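To make the cost of the nested loop above concrete, here is a self-contained sketch that swaps `turf.distance` for a hand-written Haversine function (the formula Turf documents for its distance calculation) and counts how many comparisons are made. The function names, the sample coordinates, and the counter are illustrative assumptions, not code from the project. The point is that the number of distance calculations is the product of the two list sizes, so thousands of features on each side quickly turn into millions of calls.

```javascript
const EARTH_RADIUS_KM = 6371.0088; // mean Earth radius

const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Great-circle distance in kilometers between two [longitude, latitude] positions.
function haversineKm([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// Brute-force pairing: every position in A is compared with every position in B.
function pairsWithin(positionsA, positionsB, maxKm) {
  let comparisons = 0;
  const pairs = [];
  for (const a of positionsA) {
    for (const b of positionsB) {
      comparisons += 1;
      const distance = haversineKm(a, b);
      if (distance < maxKm) pairs.push({ a, b, distance });
    }
  }
  return { pairs, comparisons };
}

// Made-up sample positions, roughly in Colorado, for illustration only.
const shops = [[-104.99, 39.74], [-105.27, 40.01]];
const schools = [[-104.98, 39.74], [-104.82, 38.83], [-105.275, 40.012]];

const { pairs, comparisons } = pairsWithin(shops, schools, 1);
console.log(`${pairs.length} close pairs found after ${comparisons} comparisons`);
```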

The initial approach was functional but slow. For example, querying for schools
in a large state like California could return thousands of results. Transferring
this data from the server to the client and performing geospatial analyses in
the browser with TurfJS was inefficient. Queries often took between 5 and 10
seconds, depending on complexity.

The request-response cycle looked like this:

```mermaid
sequenceDiagram
    participant Client
    participant Server
    Client->>Server: GET /api/search?search=nwr[amenity=cafe][name](area=colorado)
    Note right of Server: Returns large GeoJSON<br>FeatureCollection for coffeeShops
    Server-->>Client: GeoJSON (size: ~500KB)
    Client->>Server: GET /api/search?search=nwr[amenity=school][name](area=colorado)
    Note right of Server: Returns large GeoJSON<br>FeatureCollection for highSchools
    Server-->>Client: GeoJSON (size: ~1MB)
    Note left of Client: Client processes data<br>with TurfJS
    Client->>Client: Calculate distances and filter
    Note left of Client: Result: nearbySchools
```

While this was acceptable for a prototype, I wanted to improve performance. The
idea of moving correlation and geospatial analysis to the server side intrigued
me, inspired by conversations with a former coworker who extended their Go
runtime to support scripting. This led to an exploration of embedding JavaScript
execution on the server using [Goja](https://github.com/dop251/goja).

Goja is a lightweight JavaScript interpreter written in Go.
[Embedding it into my project was straightforward](/posts/2024-08-30-exploring-goja-a-golang-javascript-runtime),
allowing me to expose application infrastructure for server-side JavaScript
execution. This approach let users write JavaScript code to query and correlate
data without sending large datasets over the wire. Instead, the server executed
these queries in a sandboxed environment and returned only the necessary
results.

Initially, I attempted to reuse the same client-side logic, including TurfJS, on
the server. However, TurfJS, being computationally intensive, struggled within
Goja's interpreted environment, leading to query times of 15 to 20 seconds.
Clearly, this was not a viable solution.

To address this, I offloaded the heavy computations to optimized Go functions.
My project already used the Orb library for geospatial operations, so I exposed
Orb's functionality to Goja. This allowed JavaScript code to call underlying Go
functions for tasks like clustering and distance calculations. The result was a
dramatic improvement in performance, reducing query times to sub-second ranges
while minimizing data transfer.

**Endpoint:** `/api/runtime`\
**Query parameter:** `source=<url encoded JavaScript file>`

```js
const coffeeShops = query.execute(`nwr[amenity=cafe][name](area=colorado)`);
const highSchools = query.execute(`nwr[amenity=school][name](area=colorado)`);

const clusteredShops = coffeeShops.cluster(10); // find coffee shops within 10m of each other

const results = clusteredShops.flatMap((shop) => {
  // find nearby high schools within 1km, returning at most 1 entry
  const nearbyHighSchools = clusteredShops.overlap(highSchools, 1_000, 0, 1);
  return [shop, nearbyHighSchools];
});

const payload = results.asGeoJSON();

export { payload };
```
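Since the script travels in the `source` query parameter, it has to be URL encoded before the request is sent. The sketch below shows one way a client could build that URL with the standard `URL` and `URLSearchParams` APIs. The helper name `buildRuntimeUrl` and the base URL `https://example.invalid` are placeholders assumed for illustration, and the script is only a one-line fragment of the example above. Note that `URLSearchParams` encodes spaces as `+`, which assumes the server decodes the query string as form data.

```javascript
// Hypothetical helper: builds a GET URL for the /api/runtime endpoint.
// The base URL is a placeholder, not the project's real host.
function buildRuntimeUrl(baseUrl, source) {
  const url = new URL("/api/runtime", baseUrl);
  // searchParams.set percent-encodes brackets, backticks, and newlines.
  url.searchParams.set("source", source);
  return url.toString();
}

// A one-line fragment of the script shown above, used as sample input.
const script =
  "const coffeeShops = query.execute(`nwr[amenity=cafe][name](area=colorado)`);";

const runtimeUrl = buildRuntimeUrl("https://example.invalid", script);
console.log(runtimeUrl);
```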

The system now supports submitting JavaScript code as a parameter to an API
endpoint. This code runs in a sandboxed environment with strict timeouts and
safely interacts with the underlying data. The data remains read-only, and the
platform operates entirely on ephemeral infrastructure hosted by me.

The current platform enables users to test and experiment with running
JavaScript in a sandboxed environment for geospatial queries. It draws parallels
to Overpass QL, the query language used in Overpass Turbo for OpenStreetMap
data, but I find JavaScript easier to reason about, especially with TypeScript
support. While the API transpiles TypeScript to JavaScript without type
checking, it includes a TypeScript definition for the sandbox environment. This
combination provides a flexible and user-friendly way to explore geospatial
data.

There is more information in the [documentation](https://knowhere.live/docs) for
this project. It should be open to more people soon.
