Thesis project: continuous streaming #5

@amosr

Icicle is a streaming query language for machine-learning feature generation. Currently, Icicle must be run in batch mode over a day's or a week's data set. We would like to be able to run Icicle on real-time streaming data. Ideally, we could point Icicle at an input stream, and it would run online, continuously consuming and processing data from that stream.

One possibility is to write a Haskell program that consumes an input stream and passes each new input to the C code generated by Icicle. However, our generated C code currently executes in batches, and each batch ends with a potentially expensive 'aggregation' step. It may be beneficial to modify the code generation to split the aggregation step out into a separate function, so that it is applied only when necessary. For the implementation of streaming, Apache Kafka [1] may be a suitable streaming platform.
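To make the proposed split concrete, here is a minimal sketch in C of what separating the per-input step from the aggregation step could look like for a single query (a running mean). This is not Icicle's actual generated code: the type `mean_state_t` and the functions `mean_state_init`, `mean_state_update` and `mean_state_aggregate` are hypothetical names chosen for illustration, and real queries would carry more state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-entity state for one query: the mean of a value.
 * None of these names come from Icicle's real code generator. */
typedef struct {
    int64_t count; /* number of inputs seen so far */
    double  sum;   /* running sum of the input values */
} mean_state_t;

/* Reset the state before any input has been consumed. */
void mean_state_init(mean_state_t *state) {
    assert(state != NULL);
    state->count = 0;
    state->sum = 0.0;
}

/* Cheap incremental step: called once for every new input
 * read from the stream. */
void mean_state_update(mean_state_t *state, double value) {
    assert(state != NULL);
    state->count += 1;
    state->sum += value;
}

/* Aggregation step, split out so the caller decides when to pay for it.
 * It only reads the state, so it can be called repeatedly between updates.
 * Returns 0.0 when no input has been seen (an arbitrary choice here). */
double mean_state_aggregate(const mean_state_t *state) {
    assert(state != NULL);
    if (state->count == 0) {
        return 0.0;
    }
    return state->sum / (double)state->count;
}
```

In this arrangement, a streaming driver would call `mean_state_update` for each incoming input and call `mean_state_aggregate` only when a feature value is actually requested, instead of at the end of every batch.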

This project would involve some low-level compiler engineering and code generation. There is a video of a talk by Jacob Stanley [2] about some of the code generation internals.

[1] https://kafka.apache.org/, https://hackage.haskell.org/package/milena

[2] https://www.youtube.com/watch?v=ZuCRgghVR1Q
