Description
Filestream's state store is scoped per input ID, which gives each input full control over all of its entries. This allows different inputs to ingest the same file without interfering with each other, but it also causes data re-ingestion if the input ID changes.
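To make the re-ingestion problem concrete, here is a minimal sketch of a store whose keys are scoped by input ID. The key layout `filestream::<id>::<identity>` and the helper names `stateKey` and `lookup` are assumptions for illustration, not Filebeat's actual implementation.

```go
package main

import "fmt"

// stateKey builds a store key scoped by input ID.
// The layout used here is an assumption made for this sketch.
func stateKey(inputID, fileIdentity string) string {
	return "filestream::" + inputID + "::" + fileIdentity
}

// lookup returns the stored offset for a file as seen by one input.
func lookup(store map[string]int64, inputID, fileIdentity string) (int64, bool) {
	off, ok := store[stateKey(inputID, fileIdentity)]
	return off, ok
}

func main() {
	store := map[string]int64{}
	store[stateKey("app-logs", "native::1234-2049")] = 4096

	// Same file, same input ID: the offset is found and reading resumes.
	off, ok := lookup(store, "app-logs", "native::1234-2049")
	fmt.Println(off, ok)

	// Same file, changed input ID: no entry exists, so the input
	// would start from offset 0 and re-ingest the file.
	off, ok = lookup(store, "app-logs-v2", "native::1234-2049")
	fmt.Println(off, ok)
}
```

Because the input ID is part of the key, two inputs never collide on the same file, and for the same reason a renamed input cannot see its own previous state.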
Currently we have an option to take over states from the log input and from other Filestream inputs; however, for Filestream inputs the previous IDs need to be known and explicitly set.
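The need to list the previous IDs explicitly follows from the key scoping. The sketch below shows one way a take-over could move states from known old IDs to a new one; the function `takeOver`, the key layout, and the choice to delete the old entries are all assumptions for illustration, not Filebeat's actual behaviour.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixFor returns the hypothetical key prefix for one input ID.
func prefixFor(inputID string) string {
	return "filestream::" + inputID + "::"
}

// takeOver moves states from the listed old input IDs to newID and
// returns how many entries were migrated. Entries belonging to IDs
// that are not listed are left untouched.
func takeOver(store map[string]int64, newID string, fromIDs []string) int {
	migrated := 0
	for _, oldID := range fromIDs {
		oldPrefix := prefixFor(oldID)

		// Collect matching keys first so the map is not grown while ranging over it.
		var keys []string
		for key := range store {
			if strings.HasPrefix(key, oldPrefix) {
				keys = append(keys, key)
			}
		}

		for _, key := range keys {
			newKey := prefixFor(newID) + strings.TrimPrefix(key, oldPrefix)
			// Do not overwrite a state the new input already owns.
			if _, exists := store[newKey]; !exists {
				store[newKey] = store[key]
				migrated++
			}
			delete(store, key)
		}
	}
	return migrated
}

func main() {
	store := map[string]int64{
		"filestream::old-id::native::1234-2049":   4096,
		"filestream::other-id::native::5678-2049": 128,
	}
	n := takeOver(store, "new-id", []string{"old-id"})
	fmt.Println("migrated:", n)
	fmt.Println(store)
}
```

An ID that is not listed in `fromIDs` is never migrated, which is why an unknown or forgotten previous ID still leads to re-ingestion.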
These constraints hinder development whenever the ID of a Filestream input instance has to change, or when multiple input instances have to be aggregated into one.
Another problem is that, by default, Filestream eagerly removes the states of files it can no longer see: at startup it removes all entries for deleted files, and whenever a file is deleted it also removes that file's state from the store. Keeping the states for a longer period of time can be beneficial.
The goal of this issue is to aggregate those individual discussions and issues into a more holistic investigation of the features and constraints of the state store and of how Filebeat/Filestream tracks the state of files. This could even include changes to how we represent file identity.
Related issues:
- filestream: track state for deleted files #46834
- [Filebeat] Clean up Log input registry entries for removed inputs #46738
- [Filestream] Take over and file identity migration can re-ingest rotated log files #43650
- [Filestream] Files can be re-ingested on start up because of clean_removed: true (that's the default) #43649