Thesis project: continuous streaming #5

Icicle is a streaming query language for machine-learning feature generation. Currently, Icicle must be run in batch mode over a day's or a week's data set. We would like to be able to run Icicle on real-time streaming data. Ideally, we could point Icicle at an input stream, and it would run online, continuously consuming and processing data from the stream.

One possibility is to write a Haskell program that consumes an input stream and passes each new input to the C code generated by Icicle. However, the generated C code currently executes in batches and performs a potentially expensive 'aggregation' step at the end of each batch. It may be beneficial to modify the code generator to split the aggregation step out into a separate function, so that it is applied only when necessary. For the streaming implementation, Apache Kafka [1] may be a suitable platform.
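As a rough illustration of that split, here is a minimal C sketch for a single feature (a running mean). It is not Icicle's actual generated code: the type `feature_state_t` and the functions `feature_init`, `feature_update`, and `feature_aggregate` are hypothetical names chosen for this example. The assumption is that the per-input work can be kept cheap, and that the costly step can read the accumulated state without modifying it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical running state for one feature (here, a mean).
   This is an assumed layout, not what Icicle actually generates. */
typedef struct {
    int64_t count; /* number of inputs consumed so far */
    double  sum;   /* running sum of the input values  */
} feature_state_t;

/* Reset the state before the stream starts. */
void feature_init(feature_state_t *st)
{
    assert(st != NULL);
    st->count = 0;
    st->sum   = 0.0;
}

/* Per-input step: cheap, called once for every value read from the stream. */
void feature_update(feature_state_t *st, double value)
{
    assert(st != NULL);
    st->count += 1;
    st->sum   += value;
}

/* Aggregation step: called only when a feature value is needed.
   It takes the state as const, so streaming can continue afterwards. */
double feature_aggregate(const feature_state_t *st)
{
    assert(st != NULL);
    if (st->count == 0) {
        return 0.0; /* assumed default for an empty stream */
    }
    return st->sum / (double)st->count;
}
```

In this shape, a stream consumer (for example the Haskell program calling in through the FFI) would call `feature_update` for each message and `feature_aggregate` only when a result is requested, instead of paying the aggregation cost at the end of every batch.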

This project would involve some low-level compiler engineering and code generation. There is a video of a talk by Jacob Stanley [2] about some of the code generation internals.

[1] https://kafka.apache.org/ (Haskell client: https://hackage.haskell.org/package/milena)

[2] https://www.youtube.com/watch?v=ZuCRgghVR1Q
