GroupBy "time gap" Issue

The `groupBy` operator has a "time gap" issue when used with `subscribeOn` and `observeOn`. The same issue exists in Rx.Net and is described at http://blogs.msdn.com/b/rxteam/archive/2012/06/14/testing-rx-queries-using-virtual-time-scheduling.aspx

> However, if you introduce asynchrony in the pipeline – e.g. by adding an ObserveOn operator to the mix – you’re effectively introducing a time gap during which we’ve handed out the sequence to you, control has been released on the OnNext channel, but subscription happens at a later point in time, causing you to miss elements. We can’t do any caching of elements because we don’t know when – if ever – someone will subscribe to the inner sequence, so the cache could grow in an unbounded fashion.
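To make the quoted "time gap" concrete, here is a minimal single-threaded sketch in plain Java. It is not RxJava code: `HotGroup` and `receivedWhenSubscribingAfter` are hypothetical names invented for illustration, and the late subscription that a scheduler would cause is simulated by subscribing only after a fixed number of items have been emitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class TimeGapSketch {

    /** Hypothetical stand-in for a group that does no caching. */
    static final class HotGroup<T> {
        private Consumer<T> subscriber; // null until someone subscribes

        void subscribe(Consumer<T> s) {
            subscriber = s;
        }

        void onNext(T item) {
            if (subscriber != null) {
                subscriber.accept(item);
            }
            // No subscriber yet: the item is dropped, nothing is buffered.
        }
    }

    /**
     * Emits 1..total into a group, subscribing only after
     * emittedBeforeSubscribe items have already gone by.
     */
    static List<Integer> receivedWhenSubscribingAfter(int emittedBeforeSubscribe, int total) {
        HotGroup<Integer> group = new HotGroup<>();
        List<Integer> received = new ArrayList<>();
        for (int i = 1; i <= total; i++) {
            if (i == emittedBeforeSubscribe + 1) {
                group.subscribe(received::add); // the delayed subscription
            }
            group.onNext(i);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(receivedWhenSubscribingAfter(0, 5)); // [1, 2, 3, 4, 5]
        System.out.println(receivedWhenSubscribingAfter(2, 5)); // [3, 4, 5]
    }
}
```

In the real operator the size of the gap depends on thread scheduling rather than on a fixed count, which is what makes the loss non-deterministic.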

In discussion with @headinthebox I have decided to alter the behavior to remove this "time gap" issue so that non-deterministic data loss does not happen for the common use cases of using `observeOn` and `subscribeOn` with `GroupedObservables` from `groupBy`. 

Why? It is common to want to use `observeOn` or `subscribeOn` with `GroupedObservable` to process different groups in parallel.

It comes with a trade-off though: all `GroupedObservable` instances emitted by `groupBy` **must** be subscribed to; otherwise the upstream `onNext` calls will block. The reason is that solving the "time gap" requires one of two things:

a) use unbounded buffering (such as `ReplaySubject`)
b) block the `onNext` calls until `GroupedObservable` is subscribed to and receiving the data

We cannot choose (a), for the reasons given in the Rx.Net blog post: it breaks backpressure, and the buffer could grow until the system fails.
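Option (b) can be sketched in plain Java as follows. This is an illustration of the blocking idea only, not the actual `groupBy` implementation: `BlockingGroup` and `emitWithLateSubscriber` are hypothetical names, and a `CountDownLatch` is assumed as the mechanism that holds `onNext` until a subscriber is attached.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

final class BlockingGroupSketch {

    /** Hypothetical group whose onNext waits for a subscriber. */
    static final class BlockingGroup<T> {
        private final CountDownLatch subscribed = new CountDownLatch(1);
        private volatile Consumer<T> subscriber;

        void subscribe(Consumer<T> s) {
            subscriber = s;
            subscribed.countDown(); // releases any blocked onNext call
        }

        void onNext(T item) throws InterruptedException {
            subscribed.await(); // blocks indefinitely if nobody ever subscribes
            subscriber.accept(item);
        }
    }

    /** Emits 1..total on a producer thread while the subscriber attaches late. */
    static List<Integer> emitWithLateSubscriber(int total, long subscribeDelayMillis) {
        BlockingGroup<Integer> group = new BlockingGroup<>();
        List<Integer> received = new CopyOnWriteArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= total; i++) {
                    group.onNext(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        try {
            Thread.sleep(subscribeDelayMillis); // simulated "time gap"
            group.subscribe(received::add);
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(emitWithLateSubscriber(5, 50)); // [1, 2, 3, 4, 5]
    }
}
```

No items are lost regardless of how long the subscriber takes to attach, and nothing is buffered. The cost is visible in `onNext`: if `subscribe` is never called, the producer thread never gets past `await()`.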

In general it is reasonable to expect people to subscribe to all groups, except in one case where skipping groups will be expected to work: using `filter`.

In this case we can solve the common case by special-casing `filter` to be aware of `GroupedObservable`. It's not decoupled or elegant, but it solves the common problem.

Thus, the trade-offs are:

1) Allow for non-deterministic data loss if `observeOn`/`subscribeOn` are used and expect people to learn about this by reading docs.

2) Behave deterministically when `observeOn`/`subscribeOn` are used but block if groups are manually skipped. 

Option 2 seems to be easier for developers to run into during dev and solve than option 1, which could often show up randomly – in prod – and be difficult to figure out and solve.

