
Shapes storage: add option to keep shape data #318

Merged: felixguendling merged 2 commits into master from shapes-cache on Feb 18, 2026

@pablohoch
Member

This adds an option keep_shape_data to shapes_storage. If set to true and the data files already exist, the existing shapes data files are reused and new shapes are appended. This only applies to the shapes themselves (shapes_data.bin, shapes_idx.bin, shape_sources.bin), as the other files depend on route or trip indices.

The goal of this change is to enable the reuse of existing routed shapes from previous timetable imports (via an additional cache file that maps stop sequences to shape indices - implemented in motis).
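As a rough illustration of the cache idea described above, a mapping from stop sequences to shape indices might look like the following sketch. The names `routed_shape_cache`, `stop_id`, and `shape_idx` are hypothetical and not taken from motis or nigiri; the actual cache file format is not shown in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

using stop_id = std::uint32_t;  // hypothetical stop identifier
using shape_idx = std::size_t;  // index into the kept shape data

// Hypothetical in-memory cache: stop sequence -> index of a routed shape.
struct routed_shape_cache {
  std::map<std::vector<stop_id>, shape_idx> entries;

  // Returns the cached shape index for this exact stop sequence, if any.
  std::optional<shape_idx> find(std::vector<stop_id> const& stops) const {
    auto const it = entries.find(stops);
    if (it == entries.end()) {
      return std::nullopt;
    }
    return it->second;
  }

  // Remembers which shape was routed for this stop sequence.
  void put(std::vector<stop_id> const& stops, shape_idx const idx) {
    assert(!stops.empty());
    entries[stops] = idx;
  }
};
```

On a later import, a hit in such a cache would mean the shape at that index can be reused instead of being routed again, which only works if the shape data files are kept.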

Note that shapes from the timetable (shapes.txt) are always appended, so the shapes data files grow with every import if this flag is enabled.
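The reuse-and-append behavior can be sketched with an in-memory stand-in for the data and index files. This is a minimal sketch under assumptions: `shape_store`, `point`, and `open_store` are hypothetical names, and the real implementation works on memory-mapped files rather than vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct point {
  double lat, lng;
};

// Hypothetical stand-in for the shape files: `points` plays the role of the
// data file, `offsets` the role of the index file.
struct shape_store {
  std::vector<point> points;
  std::vector<std::size_t> offsets{0};  // shape i spans offsets[i]..offsets[i+1]

  std::size_t size() const { return offsets.size() - 1; }

  // Appends a shape at the end and returns its index.
  std::size_t add(std::vector<point> const& shape) {
    points.insert(points.end(), shape.begin(), shape.end());
    offsets.push_back(points.size());
    return size() - 1;
  }

  std::vector<point> get(std::size_t const idx) const {
    assert(idx < size());
    auto const first = points.begin() + static_cast<std::ptrdiff_t>(offsets[idx]);
    auto const last = points.begin() + static_cast<std::ptrdiff_t>(offsets[idx + 1]);
    return std::vector<point>(first, last);
  }
};

// With keep_shape_data the previous contents are reused, so old indices stay
// valid and new shapes are appended; otherwise the store starts empty.
shape_store open_store(shape_store previous, bool const keep_shape_data) {
  return keep_shape_data ? std::move(previous) : shape_store{};
}
```

Because `add` never deduplicates, every import that re-adds the timetable shapes makes the store larger, which is the growth noted above.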

@MichaelKutzner
Contributor

While the code looks fine, I'm worried about the constantly growing file size in the long run, especially if the import runs daily.
My first thought was ordering the shapes, so we could easily drop all shapes from timetables. But there's probably a lot that can go wrong, and it will be expensive on the first reuse, when the data is not sorted at all.
A possibly better option might be using different files for shapes from timetables and computed shapes. I assume that having a few more files is a better trade-off than spending a lot of disk space on unused data. But these are just my first thoughts on that.

@felixguendling
Member

Garbage collection via "copy everything that's still referenced" should be more or less trivial to implement?
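A copying collection of this kind could look roughly like the sketch below. It is not the project's implementation: `compact`, `compaction_result`, and the `shape` placeholder type are hypothetical, and it assumes the caller can list every shape index that is still referenced.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

using shape = std::vector<int>;  // placeholder payload; real shapes hold coordinates

struct compaction_result {
  std::vector<shape> shapes;                           // surviving shapes
  std::unordered_map<std::size_t, std::size_t> remap;  // old index -> new index
};

// Copies every shape whose old index appears in `referenced` into a new
// store and records the new index so callers can rewrite their references.
compaction_result compact(std::vector<shape> const& old_shapes,
                          std::vector<std::size_t> const& referenced) {
  compaction_result out;
  for (auto const old_idx : referenced) {
    assert(old_idx < old_shapes.size());
    if (out.remap.count(old_idx) != 0) {
      continue;  // already copied
    }
    out.remap.emplace(old_idx, out.shapes.size());
    out.shapes.push_back(old_shapes[old_idx]);
  }
  return out;
}
```

Anything that stores old shape indices, such as a stop-sequence cache, would need to be rewritten through `remap` after the copy.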

@MichaelKutzner
Contributor

I think that should be a viable solution. While copying will temporarily require some additional space, it's probably less than what old, unused shapes from timetables would take up, depending on the feeds and configuration.
If there are hardly any shapes from timetables, it might be fine to keep them anyway, or at least for some time before a cleanup happens. So it really depends on the use case. But as I can currently only guess how the disk usage will change over time, we can start simple and improve it later, when we have more data.

@felixguendling merged commit d0fb57c into master on Feb 18, 2026
8 checks passed
@felixguendling deleted the shapes-cache branch on February 18, 2026 13:30

3 participants