Description
The groupBy operator has a "time gap" issue when used with subscribeOn and observeOn. This exists in Rx.Net as well and was written about at http://blogs.msdn.com/b/rxteam/archive/2012/06/14/testing-rx-queries-using-virtual-time-scheduling.aspx:
However, if you introduce asynchrony in the pipeline – e.g. by adding an ObserveOn operator to the mix – you’re effectively introducing a time gap during which we’ve handed out the sequence to you, control has been released on the OnNext channel, but subscription happens at a later point in time, causing you to miss elements. We can’t do any caching of elements because we don’t know when – if ever – someone will subscribe to the inner sequence, so the cache could grow in an unbounded fashion.
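The time gap described in the quote can be modelled without any Rx library. The sketch below is only an illustration: HotGroup and TimeGapSketch are hypothetical names, not RxJava or Rx.Net classes, and the late subscribe() call stands in for the delay that observeOn or subscribeOn would introduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class TimeGapSketch {

    /** Hypothetical hot group: values arriving before subscribe() are dropped. */
    static final class HotGroup<T> {
        private Consumer<T> subscriber; // null until someone subscribes

        void subscribe(Consumer<T> consumer) {
            this.subscriber = consumer;
        }

        void onNext(T value) {
            if (subscriber != null) {
                subscriber.accept(value);
            }
            // Otherwise nobody is listening and nothing is cached, so the value is lost.
        }
    }

    /** Emits 1 and 2 before subscription and 3 after it. */
    static List<Integer> run() {
        HotGroup<Integer> group = new HotGroup<>();
        List<Integer> received = new ArrayList<>();
        group.onNext(1);                // emitted during the time gap
        group.onNext(2);                // emitted during the time gap
        group.subscribe(received::add); // subscription happens "later"
        group.onNext(3);
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [3]
    }
}
```

In a real pipeline the length of the gap depends on thread scheduling, which is why the loss is non-deterministic there even though this model is deterministic.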
In discussion with @headinthebox I have decided to alter the behavior to remove this "time gap" issue so that non-deterministic data loss does not happen for the common use cases of using observeOn and subscribeOn with GroupedObservables from groupBy.
Why? It is common to want to use observeOn or subscribeOn with GroupedObservable to process different groups in parallel.
It comes with a trade-off though: all GroupedObservable instances emitted by groupBy must be subscribed to; otherwise groupBy will block. The reason for this is that to solve the "time gap" one of two things must be done:
a) use unbounded buffering (such as ReplaySubject)
b) block the onNext calls until the GroupedObservable is subscribed to and receiving the data
We cannot choose (a) for the reasons given in the Rx.Net blog post: it breaks backpressure, and the buffer could grow until the system fails.
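Option (b) can be sketched with a latch that holds the producer until a subscriber has attached. This is a minimal model under stated assumptions, not the actual groupBy implementation: BlockingGroup and BlockingGroupSketch are hypothetical names, and the 50 ms sleep stands in for the scheduling delay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

class BlockingGroupSketch {

    /** Hypothetical stand-in for a GroupedObservable; not the RxJava class. */
    static final class BlockingGroup<T> {
        private final CountDownLatch subscribed = new CountDownLatch(1);
        private volatile Consumer<T> subscriber;

        /** Attaches the consumer and releases a producer waiting in onNext. */
        void subscribe(Consumer<T> consumer) {
            subscriber = consumer;
            subscribed.countDown();
        }

        /** Option (b): wait for a subscriber instead of dropping or buffering. */
        void onNext(T value) {
            try {
                subscribed.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            subscriber.accept(value);
        }
    }

    /** Emits 1..3 on the calling thread while subscription happens later on another thread. */
    static List<Integer> run() {
        BlockingGroup<Integer> group = new BlockingGroup<>();
        List<Integer> received = new ArrayList<>();
        Thread lateSubscriber = new Thread(() -> {
            try {
                Thread.sleep(50); // the "time gap"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            group.subscribe(received::add);
        });
        lateSubscriber.start();
        for (int i = 1; i <= 3; i++) {
            group.onNext(i); // the first call blocks until subscribe() has run
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [1, 2, 3]
    }
}
```

If subscribe() were never called, the first onNext would never return, which is the blocking side of the trade-off.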
In general it is appropriate to expect people to subscribe to all groups, except in one case where skipping groups will be expected to work: using filter.
In this case we can solve the common case by special-casing filter to be aware of GroupedObservable. It's not decoupled or elegant, but it solves the common problem.
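The text does not say how filter is made aware of groups. One possible approach, assumed here purely for illustration, is for the filter to tell a rejected group that nobody will subscribe, so the producer is released instead of blocking. Group, discard, and filterGroup are hypothetical names, not RxJava APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;
import java.util.function.Predicate;

class GroupAwareFilterSketch {

    /** Hypothetical group whose onNext blocks until it is subscribed to or discarded. */
    static final class Group<K, T> {
        final K key;
        private final CountDownLatch ready = new CountDownLatch(1);
        private volatile Consumer<T> subscriber = v -> { }; // no-op if discarded

        Group(K key) {
            this.key = key;
        }

        void subscribe(Consumer<T> consumer) {
            subscriber = consumer;
            ready.countDown();
        }

        /** Signals that nobody wants this group, so its values can be dropped. */
        void discard() {
            ready.countDown();
        }

        void onNext(T value) {
            try {
                ready.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            subscriber.accept(value);
        }
    }

    /** Assumed special case: a rejected group is discarded rather than left unsubscribed. */
    static <K, T> boolean filterGroup(Group<K, T> group, Predicate<K> keep) {
        if (keep.test(group.key)) {
            return true;
        }
        group.discard();
        return false;
    }

    /** Keeps only the "even" group, then emits 1..4 into the matching groups. */
    static List<Integer> run() {
        Group<String, Integer> even = new Group<>("even");
        Group<String, Integer> odd = new Group<>("odd");
        List<Integer> received = new ArrayList<>();
        for (Group<String, Integer> g : Arrays.asList(even, odd)) {
            if (filterGroup(g, "even"::equals)) {
                g.subscribe(received::add);
            }
        }
        for (int i = 1; i <= 4; i++) {
            (i % 2 == 0 ? even : odd).onNext(i); // odd values are dropped, not blocked on
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [2, 4]
    }
}
```

Without the discard() call in filterGroup, the first onNext on the "odd" group would block forever in this model.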
Thus, the trade-offs are:
1) Allow for non-deterministic data loss if observeOn/subscribeOn are used and expect people to learn about this by reading docs.
2) Behave deterministically when observeOn/subscribeOn are used but block if groups are manually skipped.
The blocking in option 2 is something developers are likely to run into during dev and solve, whereas the data loss in option 1 could often show up randomly, in prod, and be difficult to figure out and solve.