This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Abstractions for constrained collection types #45

Merged
merged 8 commits into scala:master on Mar 23, 2017

Conversation

@szeiger
Contributor

szeiger commented Mar 8, 2017

This PR takes the result of #39 and the discussion there and implements the necessary abstractions properly. It integrates the `() => Builder` implicits that were added to master in the meantime and unifies these builder factories with type-constrained and unconstrained collection factories, and with monomorphic collection factories that no longer need hacks like `BitSet with Iterable[E]`. (A sketch of the resulting trait shapes follows the list below.)

- Add an upper bound in `FromIterable` and `IterableFactory`. This
  allows us to express type-constrained iterables like `BitSet` without
  requiring an implicit evidence.

- Drop the `C[X] <: Iterable[X]` constraint in `FromIterable`.
  Guaranteeing an `Iterable[X]` result is not necessary anywhere and
  it makes monomorphic collection types awkward, for example producing
  a `BitSet with Iterable[X]` for any `X <: Int`. What we really want
  is to be able to drop the precise element type in `C[X] = BitSet`.

- Add `FromIterableBase` which contains the target type constructor as
  a type member. It is extended by `FromIterable` which pulls the
  type constructor out into a type parameter. This allows us to write
  polymorphic code for arbitrary `FromIterableBase` subtypes without
  having to infer the type constructor (which is not possible when it
  is not a real type constructor but a type lambda computation like
  `C[X] = BitSet` via `MonomorphicIterableFactory`).

- Add `ConstrainedFromIterable` and `ConstrainedIterableFactory` to
  mirror `FromIterable` and `IterableFactory` but with an additional
  implicit evidence for the element type. An upper bound and a
  monomorphic version are not necessary: Constrained collection types
  are always polymorphic (otherwise they would already have the
  correct evidence value for the element type) and any type bound can
  be rolled into the evidence type.

- Overload `to` for `FromIterable` and `ConstrainedFromIterable`.

- Add `ConstrainedIterablePolyTransforms` which mirrors
  `IterablePolyTransforms` with an additional implicit evidence (so it
  is based on `ConstrainedFromIterable` instead of `FromIterable`). It
  provides an additional method `unconstrained` to perform a widening
  conversion to the most specific unconstrained collection type.

- Replace the implicit `() => Builder` with a more generic implicit
  `MonomorphicIterableFactory`. It works via implicit lookup for all
  unconstrained, type-constrained and value-constrained unary type
  constructors.

- Add `ConstrainedMapFactory` which mirrors `MapFactory` with an
  additional implicit evidence for the key type. I considered using
  a binary type constructor that allows the evidence to be based on
  both key and value types, but it would complicate the API with a
  projection type for the key-only constraints and I do not expect that
  we will need it for any collection.

- Implement a builder-based `fromIterable` in `MapFactory` and
  `ConstrainedMapFactory` and add implicit `MonomorphicIterableFactory`
  implementations to both.
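
For orientation, here is a condensed sketch of how these abstractions relate to each other (variance annotations and several details omitted, and the exact signatures are partly assumed; the diff is authoritative):

trait FromIterableBase[B] {
  type To[_]                                       // target type constructor as a type member
  def fromIterable[E <: B](it: Iterable[E]): To[E]
}

trait FromIterable[B, C[_]] extends FromIterableBase[B] {
  type To[E] = C[E]                                // the type constructor pulled out into a parameter
}

trait IterableFactory[B, C[_]] extends FromIterable[B, C] {
  def newBuilder[A <: B]: Builder[A, C[A]]
}

// A monomorphic factory fixes the result type, e.g. To[X] = BitSet for any X <: Int:
trait MonomorphicIterableFactory[B, Repr] extends IterableFactory[B, ({ type L[X] = Repr })#L]

// The constrained mirror images require an implicit evidence for the element type:
trait ConstrainedFromIterable[CC[_], Ev[_]] {
  def constrainedFromIterable[E : Ev](it: Iterable[E]): CC[E]
}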
monomorphicIterableFactoryProto.asInstanceOf[MonomorphicIterableFactory[E, C[E]]]
}

trait MonomorphicIterableFactory[-B, +Repr] extends IterableFactory[B, ({ type L[X] = Repr })#L]
Contributor Author

MonomorphicIterableFactory makes sense from the POV of collection implementations. Calling it CanBuild would better convey the meaning to users.

@@ -32,14 +42,11 @@ final class TreeSet[A]()(implicit val ordering: Ordering[A])

// From SortedLike
def range(from: A, until: A): TreeSet[A] = ???

// From SortedPolyTransforms
def map[B](f: (A) => B)(implicit ordering: Ordering[B]): TreeSet[B] = ???
Contributor Author

ConstrainedIterablePolyTransforms extends IterablePolyTransforms in order to get the linearization right for the purpose of overload resolution, so this override is no longer required.

Contributor

This was not an override, just an implementation, FYI.

Contributor Author

Trying to compile master without this line:

[info] Compiling 37 Scala sources to /Users/szeiger/code/collection-strawman/target/scala-2.12/classes...
[error] /Users/szeiger/code/collection-strawman/src/main/scala/strawman/collection/immutable/TreeSet.scala:10: class TreeSet needs to be abstract, since method map in trait SortedPolyTransforms of type [B](f: A => B)(implicit ordering: Ordering[B])strawman.collection.immutable.TreeSet[B] is not defined
[error] (Note that A => B does not match A => B: their type parameters differ)
[error] final class TreeSet[A]()(implicit val ordering: Ordering[A])
[error]             ^
[error] one error found

Contributor

Yes, that’s what I meant by “this was an implementation”.

Contributor Author

It's not, though. The error message is misleading. I haven't tried to track this down further, but there is no abstract map method anywhere in the inheritance hierarchy of TreeSet. It inherits two concrete implementations, but without enforcing CIPS <: IPS (i.e. ConstrainedIterablePolyTransforms <: IterablePolyTransforms) the types don't line up.

@@ -29,14 +30,21 @@ class TraverseTest {
def traverseTest: Unit = {

val xs1 = immutable.List(Some(1), None, Some(2))
optionSequence(xs1)(immutable.List.canBuild[Int]) // TODO Remove explicit CanBuild parameter after https://issues.scala-lang.org/browse/SI-10081 is fixed?
Contributor Author

This would still require an explicit parameter even with SI-10081 fixed.


val xs2 = immutable.TreeSet(Some("foo"), Some("bar"), None)
optionSequence(xs2)(immutable.TreeSet.canBuild[String])
val o2 = optionSequence(xs2)
Contributor Author

No explicit factory required, even for constrained collections. Both the previous implementation here and my old attempt in #39 needed it.


// This use case from https://github.com/scala/scala/pull/5233 still eludes us
Contributor Author

I'm considering resurrecting the idea of path-dependent types for CanBuild scoping, essentially unifying the new way I implement to (which doesn't need to infer a unary type constructor anymore) with the implicit builder factories. This would not enable a type-driven breakOut but a value-driven approach with an explicit factory, similar to the syntax below (like in to).

Contributor

Indeed, I agree that we should try something like this.

@szeiger
Contributor Author

szeiger commented Mar 9, 2017

I tried this trick for a CanBuildFrom-like abstraction:

trait Iterable[+A] extends IterableOnce[A] with IterableLike[A, Iterable] {
  final type Build[-B, +Repr] = MonomorphicIterableFactory[B, Repr]
  ...
}

// Testing:
def pair[A, Repr <: Iterable[A], To](xs: Repr with Iterable[A])(implicit fi: Repr#Build[(A, A), To]): To = ???

val xs5 = immutable.List(1, 2, 3)
val p1 = pair(xs5)
val p1t: immutable.List[(Int, Int)] = p1

My understanding of the implicit scopes in 7.2 (http://www.scala-lang.org/files/archive/spec/2.11/07-implicits.html) is that Repr#Build[(A, A), To] first puts the companion object of Repr into the implicit scope and then dealiases Build but scalac disagrees. The Repr companion is not part of the scope. Am I misunderstanding the spec or is this a bug?

@Ichoran
Contributor

Ichoran commented Mar 10, 2017

You know, I'm getting worried that this level of abstraction is no better than and possibly even worse than CanBuildFrom. ConstrainedIterablePolyTransforms is particularly bad--I can't imagine that anyone who had trouble with CBF would find CIPT simple. (And the various self-types and factories are not terribly transparent either, I don't think, to a less expert user.)

I think if this rewrite is going to be a success on the simplification dimension, we're going to have to rely less on clever and elaborate type encodings and more on the compiler to do the thing we want by default. If this means altering the scope or search strategy of implicits, I think that's fine. If we drop the simplification part--that is, understanding how collections work remains a dark art of the high wizards of Scala--then the existing design I think is quite good. And I'm not totally opposed to dropping that part; having better behavior and better implementation improves things quite nicely. But perhaps we do want the simplification, and then I think this is getting to be too weighty.

In particular, I would take a completely orthogonal approach to constraints: a constrained collection does not even live in the same hierarchy, but requires boxing to become a full-fledged collection member. We're partway there with ConstrainedIterablePolyTransforms, and I think if you don't care to understand the four type parameters very carefully that using CIPT is actually quite straightforward. But having coexisting constrained and unconstrained methods makes things especially tricky to implement.

I'll try to come up with a sketch this weekend where the hierarchy is held together more loosely, and where to get the full power of collections you may need to box into them. It's more of a breaking change, so maybe it's not going to work. But this experiment is convincing me that you can't really get CBF-like functionality without CBF-like complexity.

@lrytz
Member

lrytz commented Mar 10, 2017

In its defense, CIPT almost doesn't leak into the parts of the collections hierarchy which have nothing to do with constrained collections. This is very different from CBF, which is everywhere.

It does add complexity related to overriding order and overloading resolution. But this looks good to me:
[screenshot]

Compared to the 2.12 version:

[screenshot]

/** Transforms over iterables that can return collections of different element types for which an
* implicit evidence is required.
*/
trait ConstrainedIterablePolyTransforms[+A, +C[A], +CC[X] <: C[X], Ev[A]] extends ConstrainedFromIterable[CC, Ev] with IterablePolyTransforms[A, C] {
Member

+C[A] could be +C[_] (also in IterablePolyTransforms)

Contributor Author

I already removed a bunch of unnecessary names. I must have overlooked these ones.

@szeiger
Contributor Author

szeiger commented Mar 10, 2017

The simplification vs CanBuildFrom is two-fold:

  1. You only see the constraint-related complexity in the collection types that need them (i.e. arrays and sorted collections).

  2. Compiler errors and auto-complete results look the way you expect despite abstracting over the evidence type because you're always dealing with a concrete instantiation where the evidence type is known.

For users I think this is vastly simpler than CanBuildFrom. For collection implementors it makes things somewhat easier because they can reuse ConstrainedIterablePolyTransforms for new evidence types. I agree that implementing collections is still difficult but that seems to be inherent complexity that we cannot remove. Either we abstract over possible constraints everywhere (CanBuildFrom) or we have two versions of all poly-transforms for constrained and unconstrained collections.

Note that adding the unconstrained collection type constructor as another new type parameter is independent of abstracting over the evidence type. SortedPolyTransforms should also have done it so it can extend IterablePolyTransforms, otherwise you need to override all methods in all implementations to get the linearization right.

But this experiment is convincing me that you can't really get CBF-like functionality without CBF-like complexity.

We're already sacrificing functionality in this design: There's no automatic fallback to building an unconstrained collection and you can't implement a breakOut equivalent. Maybe more importantly, while we can implement methods like Future.sequence with the current level of functionality we cannot implement the improved version from scala/scala#5233, which is what I was trying to do with the Repr#Build trick above. This requires something similar to breakOut, not because we want to build a target type that is different from the source type but because the target type may require a binary type constructor.

@Ichoran
Contributor

Ichoran commented Mar 10, 2017

@lrytz - Yes, I like the improvement of usage you get from CIPT, as I alluded to earlier! And I do like that it's kept away from many of the collections.

@szeiger - Actually I view the lack of automatic fallback an advantage, and I'm not saying that we shouldn't have constrained and unconstrained collections inherit different things. I'm actually wondering whether we can additionally simplify things by allowing the inheritance to vary even more. I'm reasonably happy with the existing design. I'm just looking for additional ways to simplify.

For instance, with CIPT we have a fourth type parameter, C[_] or C[A], that exists solely to implement a single widening method. Nothing in CIPT cares (yet) that this method exists, but it complicates the type signature of the trait (both by having it and by requiring CC[X] <: C[X]). Can we just drop it? Or do we need it as a bridge to the less constrained methods? If we need it as a bridge, can we instead require a box? (Maybe this has too big a performance hit.)

I think actually a good bit of the problem I sense intuitively is the VeryLongAndExplicitClassNames combined with [A, C[_], X]. It feels like there are these giant things you must focus on (why else would they be so long?) and then there's cryptic type muddle where you can't tell what is for what and why. Maybe shorter trait names and longer generic parameter names would make it more self-documenting. (But I'm trying to put myself in the shoes of someone less familiar, which is a difficult thing to do accurately.)

@szeiger
Contributor Author

szeiger commented Mar 10, 2017

Actually I view the lack of automatic fallback an advantage

I don't have strong feelings either way, but I'm OK with removing automatic fallback.

For instance, with CIPT we have a fourth type parameter, C[_] or C[A], that exists solely to implement a single widening method.

Implementing the unconstrained method there is just a side effect of having C available. It is needed by the with IterablePolyTransforms[A, C] clause to get the linearization right.

It feels like there are these giant things you must focus on (why else would they be so long?) and then there's cryptic type muddle where you can't tell what is for what and why

I wrote this code in the established Scala style of one-letter type parameters (already deviating a bit with CC and Ev because I didn't just want to pick some random single letter), but in general I would often prefer more expressive names (see also scala/scala#5265).

@Ichoran
Contributor

Ichoran commented Mar 10, 2017

Ah, right. So, if you didn't need to inherit IterablePolyTransforms, you wouldn't need that C. Which gets back to my thought of making things less tied into the hierarchy.

There is precedent for longer type parameter names in CanBuildFrom.

@plokhotnyuk

Could you please state performance as one of the main requirements for the new collections?
Currently, in hot paths, we are replacing usage of Scala's collections with Java's because clients want to pay lower AWS bills.

@SethTisue
Member

@plokhotnyuk use a separate ticket, please

@Ichoran
Contributor

Ichoran commented Mar 13, 2017

@szeiger - I played with a few variants of CIPT that have only three type parameters, and though I didn't get all the way through the necessary changes they were promising enough that I am now happy with this approach: I think there is a round or two of simplification that could be applied, probably without any compiler changes. So that makes going with this exactly and then slightly refactoring a pretty safe bet. (Without getting everything compiling I couldn't tell whether type inference would work on the new case, but we can tweak type inference, I imagine, as long as the inference is a sensible thing to do in general.)

Contributor

@Ichoran left a comment

This strategy looks a little imposing for less-expert users but is nicely usable, and I think there are likely to be refactorings that somewhat reduce how imposing it is. This is working, so let's go with it, absent good ideas for how to improve it.

*/
trait ConstrainedIterablePolyTransforms[+A, +C[A], +CC[X] <: C[X], Ev[A]] extends ConstrainedFromIterable[CC, Ev] with IterablePolyTransforms[A, C] {

protected def coll: Iterable[A]
Contributor

What’s the advantage of that compared to having a self type annotation?

Contributor Author

This is the same as in IterablePolyTransforms. It allows use with pseudo-collection types like String and Array.
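
A minimal, self-contained illustration of that point (written against the standard library's Iterable for brevity; the strawman defines its own, and the names here are made up):

trait PolyTransformsSketch[A, C[_]] {
  protected def coll: Iterable[A]                // the underlying elements, supplied by the implementor
  def fromIterable[B](it: Iterable[B]): C[B]     // how to build the target collection
  def map[B](f: A => B): C[B] = fromIterable(coll.map(f))
}

// A wrapper over String is not itself an Iterable but can still provide `coll`,
// which a self type `this: Iterable[A] =>` would not allow:
class StringPolyOps(s: String) extends PolyTransformsSketch[Char, List] {
  protected def coll: Iterable[Char] = s         // via Predef.wrapString
  def fromIterable[B](it: Iterable[B]): List[B] = it.toList
}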

*/
def to[C[X] <: Iterable[X]](fi: FromIterable[C]): C[A @uncheckedVariance] =
def to[F <: FromIterableBase[A]](fi: F): F#To[A @uncheckedVariance] =
Contributor

Why not use fi.To[A] as the return type instead of the F#To[A] projection?

Contributor Author

In this case it shouldn't matter but I generally try to avoid path-dependent types when type projections are sufficient. With path-dependent types you always need to take care to get the paths right. It is easy to get into a situation where two types are "obviously" the same but the compiler doesn't agree. This is easier with type projections. Everything's fine as long as they dealias to the same type.

Contributor

Note that Dotty will not accept F#To because it might be unsound. So we should definitely stick with fi.To[A].

Contributor

I think this kind of type projection is going to be removed from Dotty because it is unsound. Since in our case the fi parameter would always be a static object, I think a path-dependent type would not suffer from the problems you are mentioning.

Contributor Author

Ah, right, this looked safe enough to me but I haven't tested it on Dotty. It's pretty much the same as scala/scala3#1051 (comment) though, so it won't work. I'll change it to use the path-dependent type instead.


implicit def canBuild[A]: () => Builder[A, C[A]] = () => newBuilder[A] // TODO Reuse the same instance
def constrainedNewBuilder[A : Ev]: Builder[A, CC[A]]
Contributor

I would name it newConstrainedBuilder instead of constrainedNewBuilder.

Contributor Author

I considered that, too, but it sounds like a method that creates a ConstrainedBuilder, whereas constrainedNewBuilder is consistent with constrainedFromIterable.

// variance seems sound because `to` could just as well have been added
// as a decorator. We should investigate this further to be sure.
fi.fromIterable(coll)

def to[C[_], Ev[_]](fi: ConstrainedFromIterable[C, Ev])(implicit ev: Ev[A @uncheckedVariance]): C[A @uncheckedVariance] =
fi.constrainedFromIterable(coll)
Contributor

Maybe that’s unrelated to this PR, but I guess this method does not work with collections having binary type constructors (e.g. TreeMap), right? BTW, does the unconstrained one work with HashMap?

Contributor Author

@szeiger Mar 13, 2017

Right, this still needs to be improved. The unconstrained version cannot work for binary type constructors. We only have an element type A in Iterable. While we can restrict it to work only for (A1, A2), there is no way to compute these types A1 and A2 through FromIterable's upper bound alone. Conversions to maps either need another special case or a ConstrainedFromIterable. Since we're special-casing Map in other places we should probably overload to with two additional cases for unconstrained and constrained maps.
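
A hedged sketch of what such a map-specific to overload could look like (MapFactory is from this PR, but the shape shown here and the use of <:< evidence are assumptions):

trait MapFactorySketch[M[_, _]] {
  def fromIterable[K, V](it: Iterable[(K, V)]): M[K, V]
}

// Only available when the element type is statically known to be a pair:
def toMap[A, K, V, M[_, _]](coll: Iterable[A], fact: MapFactorySketch[M])(implicit ev: A <:< (K, V)): M[K, V] =
  fact.fromIterable(coll.map(ev))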

Contributor

A CanBuildFrom-like would nicely solve these issues. It’s probably OK if we have it only at this place instead of having it everywhere as in the current collections.

@julienrf
Contributor

👍 to this work.

Contributor

@odersky left a comment

Overall, I think this goes in the right direction. But we need to keep the complexity in check. For me a good rule of thumb is that complexity grows with the product of the number of type parameters on a given class or method. So 4 type parameters for CIPT is a lot! It would be great if we could get rid of at least one. This seems possible since we seem to agree that automatic widening is not needed.

I also dislike the B parameter on FromIterable. We should be particularly careful to minimize type parameter lists of widely used base traits like it. It seems in most cases the parameter is instantiated to Any. Which makes me wonder:

  • can we find a less visible way to represent the parameter, e.g. as a type member?
  • or, should we not bother at all about this? How much code duplication would we get if we special cased those cases where the bound is not Any?

Let's also try to not go overboard with long names.


def newBuilder[A <: B]: Builder[A, C[A]]

protected[this] lazy val monomorphicIterableFactoryProto: MonomorphicIterableFactory[B, C[B]] = new MonomorphicIterableFactory[B, C[B]] {
Contributor

I would try to shorten this name, unless we want to give a great opportunity for ridicule. So my vote goes to CanBuild and canBuildProto (or maybe we can even leave out the proto?)

*/
def to[C[X] <: Iterable[X]](fi: FromIterable[C]): C[A @uncheckedVariance] =
def to[F <: FromIterableBase[A]](fi: F): F#To[A @uncheckedVariance] =
Contributor

Note that Dotty will not accept F#To because it might be unsound. So we should definitely stick with fi.To[A].


/** Base trait for instances that can construct a collection from an iterable */
trait FromIterable[+C[X] <: Iterable[X]] {
def fromIterable[B](it: Iterable[B]): C[B]
trait FromIterable[-B, +C[_]] extends FromIterableBase[B] {
Contributor

Why do we need the bound here? It would be great if we could avoid the additional B parameter.

@szeiger
Contributor Author

szeiger commented Mar 13, 2017

@odersky

It would be great if we could get rid of at least one. This seems to be possible since we seem to agree that automatic widening is not needed.

I don't see how. We need the unconstrained type constructor for manual widening and to get the linearization right. There is no automatic widening in this version.

can we find a less visible way to represent the parameter, e.g. a as a type member.

I would have preferred that, too, but due to variance the default would be Nothing if you don't specify anything else, whereas we need Any. We could have a type alias for the Any case, but I think it would only complicate things further. OTOH, how often do you need to use FromIterable? Most users will only see it in to and in the factory methods (if they look at the actual source; scaladoc and IDEs show the instantiated Any bound in most cases).

or, should we not bother at all about this? How much code duplication would we get if we special cased those cases where the bound is not Any

String and BitSet are the cases that we know about. They would either need special-casing or be modeled as constrained types with implicit evidence. I would like to avoid the latter due to the performance implications.

@odersky
Contributor

odersky commented Mar 13, 2017

@szeiger My tendency would be to special case String and BitSet. The reasoning is that it's better to require some duplication for implementers of non-parametric collection types than to complicate the framework to a point where it is too hard for them to extend it anyway. The existing collections were a lesson in that respect.

@Ichoran
Contributor

Ichoran commented Mar 13, 2017

Also in favor of special-casing: it is important that String especially but also BitSet be really fast, which means that many things will be written as a special case anyway.

I'll try to finish up some of my 3-parameter CIPTs tonight or the next day. The basic idea is to drop the inheritance of IPT in CIPT and push responsibility for putting something sensible for C[A] down to the leaf class implementations and/or use protected methods to enforce sensible linearization. I got multiple things that compiled with very little work, but haven't had time to test them all in enough depth to know what flaws they might have.

@szeiger
Contributor Author

szeiger commented Mar 19, 2017

I've just pushed a new commit. Ever since the implicit builders were merged I had this nagging feeling that we had the whole thing backwards. In the new design the source collection is responsible for building a target collection, we do not rely on an implicit CanBuildFrom for that, so why should we do that for implicit builders? It comes with the same old problem of getting different results depending on the static type (because we rely on implicit search in companion objects to decide which builder to use) and it also caused me all kinds of problems trying to unify implicit builders and companion objects.

So I went back to the drawing board and started with the assumption that the source collection has to decide what to build. We already had Buildable for collections that allow eager building of the same collection with the same element type. A more consistent name for it is MonoBuildable and the newBuilder method is now called newBuilderWithSameElemType. This is consistent with IterableMonoTransforms and fromIterableWithSameElemType. What we were missing though was the poly version of it, so I added PolyBuildable. Unlike MonoBuildable it is not limited by variance, so its newBuilder method can be public. This allows anyone who has a strict unconstrained collection (i.e. a PolyBuildable) to build the same collection type with any element type. For constrained collection types I added ConstrainedPolyBuildable with an extra implicit evidence to mirror ConstrainedFromIterable.

Now we have everything in place to write methods outside the collection framework (like Future.sequence) in the same style as methods within, i.e. by overloading for constrained and unconstrained collection types:

  def optionSequence1[C[X] <: Iterable[X] with PolyBuildable[X, C], A](xs: C[Option[A]]): Option[C[A]] =
    xs.foldLeft[Option[Builder[A, C[A]]]](Some(xs.newBuilder)) {
      case (Some(builder), Some(a)) => Some(builder += a)
      case _ => None
    }.map(_.result)

  def optionSequence1[Ev[_], CC[_], A](xs: ConstrainedPolyBuildable[Option[A], CC, Ev] with Iterable[Option[A]])(implicit ev: Ev[A]): Option[CC[A]] =
    xs.foldLeft[Option[Builder[A, CC[A]]]](Some(xs.newConstrainedBuilder)) {
      case (Some(builder), Some(a)) => Some(builder += a)
      case _ => None
    }.map(_.result)

Since having to overload everything (if you want to support constrained collection types) is rather awkward, the next step was to unify both through an implicit value. The result is essentially CanBuildFrom, but I changed the To type from a type parameter to a type member because it is always a computed result type and not a type that guides the computation:

trait BuildFrom[-From, -Elem] {
  type To
  def newBuilder(from: From): Builder[Elem, To]
  def fromIterable(from: From)(it: Iterable[Elem]): To
}

Note that there is no method to get a Builder without a From instance. The source collection is still responsible for building!

While usage is essentially the same as for CanBuildFrom (we could even define a type alias to make it look the same) the way the implicits are materialized is very different. The companion object of BuildFrom has two methods to get a BuildFrom from a PolyBuildable and from a ConstrainedPolyBuildable (if an evidence value is available). This is defined in a self-contained way on top of the collection framework. No collection type has to define its own BuildFrom instances.
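
A sketch of what those two derivations could look like (the exact members of PolyBuildable and ConstrainedPolyBuildable are assumed here, based on the examples above; the real companion object may differ):

object BuildFrom {
  // An unconstrained strict collection can rebuild itself with any element type:
  implicit def buildFromPolyBuildable[E, C[_]]: BuildFrom[PolyBuildable[_, C], E] { type To = C[E] } =
    new BuildFrom[PolyBuildable[_, C], E] {
      type To = C[E]
      def newBuilder(from: PolyBuildable[_, C]): Builder[E, C[E]] = from.newBuilder[E]
      def fromIterable(from: PolyBuildable[_, C])(it: Iterable[E]): C[E] = {
        val b = from.newBuilder[E]; it.foreach(b += _); b.result
      }
    }

  // A constrained collection additionally requires the evidence for the new element type:
  implicit def buildFromConstrainedPolyBuildable[E, CC[_], Ev[_]](implicit ev: Ev[E]): BuildFrom[ConstrainedPolyBuildable[_, CC, Ev], E] { type To = CC[E] } =
    new BuildFrom[ConstrainedPolyBuildable[_, CC, Ev], E] {
      type To = CC[E]
      def newBuilder(from: ConstrainedPolyBuildable[_, CC, Ev]): Builder[E, CC[E]] = from.newConstrainedBuilder[E]
      def fromIterable(from: ConstrainedPolyBuildable[_, CC, Ev])(it: Iterable[E]): CC[E] = {
        val b = from.newConstrainedBuilder[E]; it.foreach(b += _); b.result
      }
    }
}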

All implicit builders in companion objects are gone and so are the BuildStrict, BuildConstrained and BuildStrictConstrained abstractions. None of that is needed anymore.

Now we can write a single method like this that abstracts over PolyBuildable and ConstrainedPolyBuildable:

  def optionSequence[CC[X] <: Iterable[X], A](xs: CC[Option[A]])(implicit bf: BuildFrom[CC[Option[A]], A]): Option[bf.To] = ...

We can even do breakOut, and in a way that is much more consistent with the new design than my previous attempt. In addition to the implicit values that BuildFrom provides for buildables, it also defines implicit conversions from all four basic collection factory types (IterableFactory, ConstrainedIterableFactory, MapFactory and ConstrainedMapFactory) to BuildFrom. This allows you to pass any collection factory that you could use in a .to call to a method that expects a BuildFrom:

    val xs4 = immutable.List[Option[(Int, String)]](Some((1 -> "a")), Some((2 -> "b")))
    val o4 = optionSequence(xs4)(immutable.TreeMap)

In fact, we can use it to define to:

  def to[F <: TypeConstrainedFromIterable[A]](fi: F): fi.To[A @uncheckedVariance] =
    fi.fromIterable(coll)

  // Generic version of the method above that can build anything with BuildFrom. Note that `bf` is not implicit.
  // We never want it to be inferred (because a collection could only rebuild itself that way) but we do rely on
  // the implicit conversions from the various factory types to BuildFrom.
  def to(bf: BuildFrom[Iterable[A], A]): bf.To =
    bf.fromIterable(coll)(coll)

The first version is purely for performance: It allows collection types without an evidence value to be built in a more direct way. Everything else uses the implicit conversion to BuildFrom.

FromIterable still extends TypeConstrainedFromIterable (which was called Build in the previous version). We could easily change that so that most collection types do not have to extend TypeConstrainedFromIterable. It would require an extra overload for to to have efficient implementations for both.

@Ichoran
Contributor

Ichoran commented Mar 20, 2017

Aha! I like this one. I think it still needs a couple of rounds of simplification and renaming to make it comprehensible and to get rid of some of the veryLongNamesEspeciallyWhenTheyRepeat: LongNameEspeciallyWhenTheyRepeat = new LongNameEspeciallyWhenTheyRepeat things.

But this seems like a better design overall. I have to think through and test cases where the source collection being more in-charge might be a problem. But from first principles, as you say, this seems better: if the source is in charge, dynamic dispatch takes care of loss of type information. Conceptually, then, the issue would be creating collections from scratch. I need to look into that in more detail.

@julienrf
Contributor

Note that there is no method to get a Builder without a From instance.

That could be useful to implement unfolds, though.

type To = C[E]
def newBuilder(from: Any): Builder[E, To] = fact.newBuilder
def fromIterable(from: Any)(it: Iterable[E]): To = fact.fromIterable(it)
}
Contributor

This sounds like a smell: we should not have to introduce this Any parameter if we don't need it.

Contributor Author

We do need it for the "proper" cases (implicit values for poly builders). When you do a manual breakOut, the from value is ignored.

Contributor Author

Alternatively, we could have a subclass like BuildFromAny[-Elem] extends BuildFrom[Any, Elem] with overloads that do not take a from value. This would be enough for the non-implicit use case in to. Is getting rid of the Any parameter here worth the extra complexity of a new class with two new methods?
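
For illustration, such a subclass might look roughly like this (member names assumed):

trait BuildFromAny[-Elem] extends BuildFrom[Any, Elem] {
  // overloads that do not need a `from` value
  def newBuilder(): Builder[Elem, To]
  def fromIterable(it: Iterable[Elem]): To
  // the inherited methods simply ignore `from`
  def newBuilder(from: Any): Builder[Elem, To] = newBuilder()
  def fromIterable(from: Any)(it: Iterable[Elem]): To = fromIterable(it)
}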

trait IterableFactories[+C[X] <: Iterable[X]] extends FromIterable[C] {
/** Base trait for instances that can construct a collection from an iterable for certain types
* (but without needing an evidence value). */
trait TypeConstrainedFromIterable[-B] extends Any {
Contributor

@julienrf Mar 20, 2017

UpperBoundedFromIterable?

Contributor

I would say BoundedFromIterable. The upper is implied if left out.

trait SortedSet[A]
extends collection.SortedSet[A]
with Set[A]
with SortedSetLike[A, SortedSet]

trait SortedSetLike[A, +C[X] <: SortedSet[X]]
extends collection.SortedSetLike[A, C]
with ConstrainedIterablePolyTransforms[A, Set, SortedSet]
Contributor

with ConstrainedIterablePolyTransforms[A, Set, C]

Contributor

By the way, would it be possible to pull up this line at the level of collection.SortedSet so that we don’t repeat it for collection.mutable.SortedSet and collection.immutable.SortedSet?

Contributor Author

Yes, it's possible.

@@ -331,6 +336,8 @@ class StrawmanTest {
// val xs7 = xs.map((k: String, v: Int) => (new Foo, v)) Error: No implicit Ordering defined for Foo
val xs7 = (xs: immutable.Map[String, Int]).map((k, v) => (new Foo, v))
val xs8: immutable.Map[Foo, Int] = xs7
val xs9 = xs6.to(List).to(mutable.HashMap)
Contributor

👍

// Breakout-like use case from https://github.com/scala/scala/pull/5233:
val xs4 = immutable.List[Option[(Int, String)]](Some((1 -> "a")), Some((2 -> "b")))
val o4 = optionSequence(xs4)(immutable.TreeMap) // same syntax as in `.to`
val o4t: Option[immutable.TreeMap[Int, String]] = o4
Contributor

👍

@lrytz
Member

lrytz commented Mar 20, 2017

Also like the latest change!

I'm a bit worried about keeping / promoting the "implicit buildFrom" pattern. We are moving away from it for the internal collections implementation and use overloading instead; why should we recommend it for external methods? I see that collection types don't have to define their own BuildFrom instances, which certainly makes things simpler. But there are other issues with implicit BuildFrom, like the complex signatures, Scaladoc, and IDE discovery. For example, I defined your optionSequence methods as extension methods; here's how it looks:

[screenshots]

@szeiger
Contributor Author

szeiger commented Mar 20, 2017

@Ichoran @julienrf Building an arbitrary target collection will require a factory object. Just as in the case of to it is not type-driven. You have to pass the actual object. You can write it as a single method by using a non-implicit BuildFrom the same way the new to overload does. The alternative is to overload it for the different factory kinds. In fact, this is something I also considered for to but you have to support four factory kinds (unconstrained iterable/map, constrained iterable/map) if you want it to be comprehensive (unlike the to method in the current collections library which does not support map types).

@lrytz I tried it with To as a type parameter instead of a type member but it didn't help, either. I don't think there's much we can do other than using BuildFrom-based methods sparingly. If you want a single definition that can abstract over kinds, implicits are the only way to do it.

@julienrf
Contributor

julienrf commented Mar 20, 2017

Building an arbitrary target collection will require a factory object.

Then we will not be able to write generic code, as you did with optionSequence, in cases where we have no "from" collection. A typical use case would be unfolds.

With the current collections we can write the following:

  class Unfolder[That] {
    def apply[S, A](init: S)(next: S => Option[(A, S)])(implicit cbf: CanBuildFrom[Nothing, A, That]): That = {
      val builder = cbf.apply()
      …
    }
  }

And then That can be List[X] as well as TreeMap[X, Y].

@szeiger
Contributor Author

szeiger commented Mar 20, 2017

Yes, that's correct. It is consistent with to and with the new way of doing breakOut-like operations and it doesn't look like a huge loss to me. In most cases you want to build a specific collection type, so you just use the correct factory object. In my experience with using breakOut the result is usually assigned to a variable, in which case the new syntax is actually shorter because you don't have to spell out the element type (in addition to importing and using breakOut itself). If you want to abstract over collection types you have to pass a CanBuildFrom around and repeat the necessary type parameters for that everywhere. This was non-trivial in the old design and doesn't change much here (it may be slightly simpler now with To being a type member).
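
For illustration, an unfold in this style would take the factory explicitly (converted to a BuildFrom via the existing implicit conversions), much like the new to overload; the exact shape below is an assumption:

def unfold[S, A](init: S)(next: S => Option[(A, S)])(bf: BuildFrom[Any, A]): bf.To = {
  val b = bf.newBuilder(())            // the `from` value is irrelevant for factory-derived instances
  @annotation.tailrec def loop(s: S): Unit = next(s) match {
    case Some((a, s2)) => b += a; loop(s2)
    case None          => ()
  }
  loop(init)
  b.result
}

// hypothetical usage: unfold(0)(i => if (i < 10) Some((i * i, i + 1)) else None)(immutable.List)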

@julienrf
Contributor

My global feeling about this PR: I definitely think we need a CanBuildFrom-like abstraction as you proposed, because it makes it possible to write powerful generic methods such as def sequence[F[X] <: Iterable[X], G[_], A](xs: F[G[A]])(implicit bf: BuildFrom[F[G[A]], A]): G[bf.To]. However, the hierarchy really becomes more complex with this PR. Maybe we should keep the CanBuildFrom-like thing but keep a simpler hierarchy (at the price of duplicating more things between the four kinds of factories).

@Ichoran
Contributor

Ichoran commented Mar 21, 2017

@julienrf - I concur. That was one of the "rounds of simplification" I envisioned, along with one more attempt to check whether any more type parameters can be made type members instead. (Along with a renaming to things that might technically be a little less accurate but are more intuitive, e.g. IterableOfSame instead of IterableMonoTransforms. Same/Other or Alike/Unlike can be useful here.)

Contributor

@odersky left a comment

This looks pretty good! I am happy for this to go in after some polishings.

def to[C[X] <: Iterable[X]](fi: FromIterable[C]): C[A @uncheckedVariance] =
// variance seems sound because `to` could just as well have been added
// as a decorator. We should investigate this further to be sure.
def to[F <: TypeConstrainedFromIterable[A]](fi: F): fi.To[A @uncheckedVariance] =
fi.fromIterable(coll)
Contributor

I don't think you need the F parameter here.

def to(fi: TypeConstrainedFromIterable[A]): fi.To[A]

should work, no?

Contributor Author

I'll give it a try. (I'm used to writing type projections where this wouldn't work)

// the implicit conversions from the various factory types to BuildFrom.
def to(bf: BuildFrom[Iterable[A], A]): bf.To =
bf.fromIterable(coll)(coll)

/** Convert collection to array. */
Contributor

I am not sure about overloading to. Why do we need both versions? Also, naming conventions vary a lot: TypeConstrainedFromIterable and BuildFrom seem to have nothing in common.

Contributor Author

We don't need both. The TypeConstrainedFromIterable version covers the most common case and doesn't require an implicit conversion from the factory type to a BuildFrom. It's only there for performance (but I haven't benchmarked it). The BuildFrom version alone is sufficient.

Contributor Author

Hm, actually BitSet doesn't currently work with the BuildFrom version. I'll try to fix it.


/** Transforms over iterables that can return collections of different element types for which an
* implicit evidence is required.
*/
Contributor

@odersky Mar 22, 2017

We should explain in a comment, with examples, why we need both C and CC. It looked nonsensical to me at first, and I only started to understand when I looked at instantiations of the class. (Of course, if we found a way to merge C and CC that would be even better.)

trait IterableFactories[+C[X] <: Iterable[X]] extends FromIterable[C] {
/** Base trait for instances that can construct a collection from an iterable for certain types
* (but without needing an evidence value). */
trait TypeConstrainedFromIterable[-B] extends Any {
Contributor

I would say BoundedFromIterable. The upper is implied if left out.

@odersky
Contributor

odersky commented Mar 22, 2017

Should we try to merge Bounded (aka TypeConstrained) and Constrained? We can always express Bounded as a Constraint where we demand an instance of <:<. How many examples are there where we need Bounded? I know of String and BitSet; are there others?
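
As a small, self-contained illustration of that encoding (standard library types, made-up names): an upper bound X <: Int can instead be demanded as evidence X <:< Int, which fits the Constrained machinery:

def intSetFrom[X](xs: Iterable[X])(implicit ev: X <:< Int): Set[Int] =
  xs.map(ev).toSet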

@szeiger
Contributor Author

szeiger commented Mar 22, 2017

I pushed another commit to address @odersky's review comments.

  • Added more docs to ConstrainedIterablePolyTransforms

  • Removed unnecessary type parameter in to(TypeConstrainedFromIterable)

  • Renamed TypeConstrainedFromIterable to BoundedFromIterable

The problem with BitSet was caused by the companion object not extending IterableFactory because it is type-constrained. I fixed it by adding a supertype BoundedIterableFactory that covers all cases. OTOH we don't really need BoundedFromIterable so I removed that in exchange. This means that the "bounded" abstraction is only visible in companion objects but not in the collection types.

@szeiger szeiger merged commit a324440 into scala:master Mar 23, 2017
@julienrf
Contributor

julienrf commented Apr 13, 2017

@lrytz I noticed that I don’t have the same result as you:

[screenshot: bitset-map]

What bothers me is this BitSet.this.Ev thing (instead of Ordering)… Which version of IntelliJ are you using? (I’m using 2017.1.1)

@lrytz
Member

lrytz commented Apr 13, 2017

@julienrf this changed at some point between my comment (#45 (comment)) and now. On the current master branch I get the same as you.

@julienrf
Contributor

@szeiger In practice, do you think we will have other instantiations of Ev than Ordering? The current design leads to some issues, notably when used with Dotty: this method, by taking an unconstrained type parameter Ev, makes the implicit search explode. Also, there is the IDE issue that I reported above.

What would be the impact of removing this abstract layer ConstrainedPolyBuildable and having just an OrderingConstrainedPolyBuildable (with no Ev type parameter)?

@szeiger
Contributor Author

szeiger commented Apr 28, 2017

I can't think of any other instantiations in the core collection library. Apart from some reusable interfaces we wouldn't lose much by not abstracting over Ev[_]. People could still define collection types with other constraints by duplicating the interfaces. The only additional work required in these cases is implementing the CanBuild implicits (but as you point out abstracting over Ev[_] for these implicits doesn't work correctly in Dotty anyway).

szeiger added a commit to szeiger/collection-strawman that referenced this pull request Jun 12, 2017
`BuildFrom` is like `FromSpecificIterable` but with an extra `From`
argument, like in the final version of scala#45. `FromSpecificIterable`
existed conceptually in that version as `BuildFrom[Any, …]` but didn’t
have a separate type.

This new version has separate abstractions for buildable (strict)
collection types in the form of `StrictBuildFrom` and
`FromSpecificIterableWithBuilder`. Since we can get a surrogate builder
(through one of the new `Builder.from` methods) for any lazy collection
and we can restrict code to work only with strict collections via the
`Buildable` trait, this is not strictly necessary. The only reason for
separating the `Builder` abstractions is to avoid exposing them in
`FromSpecificIterable`. Even though everything can be built in a strict
way, these abstractions sit on top of the lazy ones, not below them.
julienrf pushed a commit that referenced this pull request Jun 29, 2017