Skip to content

Redis #15

Open
Open
@kevinmichaelchen

Description

@kevinmichaelchen

Why Redis?

Redis is more performant than Postgres.

Redis will be faster than Postgres because Pg offers reliability guarantees on your data (when the transaction is committed, it is guaranteed to be on disk), whereas Redis has a concept of writing to disk when it feels like it, so shouldn't be used for critical data.

How does our SQL query currently work?

Currently we do separate queries for all k-values and resolutions we store:

SELECT id, latitude, longitude, scheduled_for, expected_pay 
FROM trip 
WHERE 
  $1 = ANY (r%d_k%d_neighbors)

The $1 is the driver location's cell index. Basically this query is saying: "Get me all trips where the driver's cell is one of the trip's k-ring neighbors".

TODO: measure performance of GIN-indexed text array columns.

How could Redis help?

At the end of the day, (I assume) a Redis lookup has to beat a SQL query over a text array (even one that has a GIN index).

I'm imagining the use of reverse indexes in Redis to map from cell index to a list of trip IDs.

If a driver wants to see nearby trips, first we get the k-ring cells.

Suppose we have all cells in the k=2 k-rings; that's 19 (1+6+12) cells total.

That means we could do 19 parallel look-ups to Redis.

Each lookup would return a list of trips in that cell.

lookup(tripsIn-cell1) => { trip34, trip87, trip49 }

Technically, you'd do this at multiple resolutions (7 through 10) — so really it would be 76 (19*4) parallel look-ups.

Keeping the cache fresh

For trips

The harder part is keeping the cache fresh (e.g., removing a trip ID from those reverse indexes when a trip reaches a terminal state: no show, canceled, completed, etc). These operations would need to happen eventually, but not necessarily quickly since our ranking algorithm that does the final sorting of results would filter out any terminal-state trips.

For drivers

For the driver cache, it's slightly more involved/convoluted. It means keeping track of a driver's cells at all 4 resolutions:

lookup(driverCells-driver1) => { cell43, cell65, cell768, cell7394 }

When a driver location ping is received, that index gets updated, as well as all of the reverse indexes in which the driver's ID appears. (Redis doesn't support expiration on the element-level, so we have to implement that cleanup ourselves).

Note: I can see this leading to a lot of concurrent reads and writes. ("We thought the trip was near the driver, but the driver has since moved!") I think that should be fine. Like a bartender who is able to look after multiple patrons at once (concurrent) but only able to prepare one drink at a time (non-parallel), Redis is able to support concurrent IO.

Ranking

As hinted at before, the persistence layer would only know about geospatial information. It wouldn't know about terminal-state trips, trip payment, trip start times, driver eligibility, driver seniority, etc. A call to a Trip service or Driver service would allow the ranker to enrich models and make ranking decisions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions