
Rethink Filebeat/Filestream store requirements and constraints #46939

@belimawr

Description


Filestream's state store is scoped per input ID, which gives each input full control over all of its entries. While this allows different inputs to ingest the same file without interfering with each other, it also causes data to be re-ingested whenever the input ID changes.
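To make the re-ingestion problem concrete, here is a toy sketch of a store scoped per input ID. It is not Filebeat's implementation: the `store` and `fileState` types, the `key` format and the `native::42-1` identity are hypothetical simplifications. It only shows that, once the input ID is part of the key, an input whose ID changed cannot find the offsets recorded under its old ID and starts reading from the beginning.

```go
package main

import "fmt"

// fileState is a hypothetical, simplified record of how far a file has been read.
type fileState struct {
	Offset int64
}

// store is a toy key/value registry used only for illustration.
type store map[string]fileState

// key builds a store key scoped by input ID. The exact format is an assumption.
func key(inputID, fileIdentity string) string {
	return "filestream::" + inputID + "::" + fileIdentity
}

// resumeOffset returns where an input would resume reading a file.
// Without an entry under this input's own key, reading starts at offset 0.
func resumeOffset(s store, inputID, fileIdentity string) int64 {
	if st, ok := s[key(inputID, fileIdentity)]; ok {
		return st.Offset
	}
	return 0
}

func main() {
	s := store{}
	// Input "logs-a" has already read 4096 bytes of the file identified as "native::42-1".
	s[key("logs-a", "native::42-1")] = fileState{Offset: 4096}

	// Same input ID: reading resumes where it left off.
	fmt.Println(resumeOffset(s, "logs-a", "native::42-1")) // 4096

	// Input ID changed: the old entry is invisible, so the file is read again from the start.
	fmt.Println(resumeOffset(s, "logs-b", "native::42-1")) // 0
}
```

The same scoping is what lets two inputs read one file independently: each simply owns a different key for it.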

Currently there is an option to take over states from the log input and from other Filestream inputs; however, when taking over from Filestream inputs, their IDs must be known and explicitly set.
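A similarly hypothetical sketch shows why taking over from Filestream inputs needs the old IDs up front: `takeOver` only moves entries whose key belongs to one of the explicitly listed IDs, so states owned by any ID that is not listed stay behind. The function, types and key format are illustrative assumptions rather than Filebeat's API, and this sketch simply overwrites any entry the new ID already had for the same file.

```go
package main

import (
	"fmt"
	"strings"
)

// fileState and store are hypothetical simplifications of a per-input state store.
type fileState struct{ Offset int64 }

type store map[string]fileState

const prefix = "filestream::"

// key builds a store key scoped by input ID (assumed format).
func key(inputID, fileIdentity string) string {
	return prefix + inputID + "::" + fileIdentity
}

// takeOver moves every entry owned by one of fromIDs into newID's namespace
// and returns how many entries were moved. IDs that are not listed are ignored.
func takeOver(s store, newID string, fromIDs []string) int {
	type move struct{ from, to string }
	var moves []move
	for _, oldID := range fromIDs {
		if oldID == newID {
			continue // nothing to take over from ourselves
		}
		oldPrefix := prefix + oldID + "::"
		for k := range s {
			if strings.HasPrefix(k, oldPrefix) {
				moves = append(moves, move{k, key(newID, strings.TrimPrefix(k, oldPrefix))})
			}
		}
	}
	// Apply the moves after scanning so the map is not mutated while ranging over it.
	moved := 0
	for _, m := range moves {
		st, ok := s[m.from]
		if !ok {
			continue // already moved, e.g. a duplicate ID in fromIDs
		}
		s[m.to] = st
		delete(s, m.from)
		moved++
	}
	return moved
}

func main() {
	s := store{
		key("old-a", "native::1-1"): {Offset: 100},
		key("old-b", "native::2-1"): {Offset: 200},
		key("other", "native::3-1"): {Offset: 300},
	}
	n := takeOver(s, "merged", []string{"old-a", "old-b"})
	fmt.Println(n)                                      // 2
	fmt.Println(s[key("merged", "native::1-1")].Offset) // 100

	// "other" was not listed, so its state is not taken over.
	_, stillThere := s[key("other", "native::3-1")]
	fmt.Println(stillThere) // true
}
```

Merging several inputs into one therefore works only if every previous ID is enumerated, which is the constraint described above.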

These constraints hinder development whenever the ID of a Filestream input instance needs to change, or when multiple input instances need to be merged into one.

Another problem is that, by default, Filestream eagerly removes the states of files it can no longer see. This happens at startup, when it removes all entries for deleted files, and while running, when a file is deleted and its state is removed from the store. Keeping states for a longer period of time can be beneficial.

The goal of this issue is to aggregate those individual discussions and issues into a more holistic investigation of the features and constraints the state store should have, and of how Filebeat/Filestream tracks the state of files. This could even include changes to how we represent file identity.

Related issues:
