This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Abstractions for constrained collection types #45

Merged
merged 8 commits into scala:master on Mar 23, 2017

Conversation

@szeiger
Contributor

szeiger commented Mar 8, 2017

This PR takes the result of #39 and the discussion there and implements the necessary abstractions properly. It integrates the `() => Builder` implicits that were added to master in the meantime and unifies these builder factories with type-constrained and unconstrained collection factories, and with monomorphic collection factories that no longer need hacks like `BitSet with Iterable[E]`. (A sketch of the resulting trait shapes follows the list below.)

- Add an upper bound in `FromIterable` and `IterableFactory`. This
  allows us to express type-constrained iterables like `BitSet` without
  requiring an implicit evidence.

- Drop the `C[X] <: Iterable[X]` constraint in `FromIterable`.
  Guaranteeing an `Iterable[X]` result is not necessary anywhere and
  it makes monomorphic collection types awkward, for example producing
  a `BitSet with Iterable[X]` for any `X <: Int`. What we really want
  is to be able to drop the precise element type in `C[X] = BitSet`.

- Add `FromIterableBase` which contains the target type constructor as
  a type member. It is extended by `FromIterable` which pulls the
  type constructor out into a type parameter. This allows us to write
  polymorphic code for arbitrary `FromIterableBase` subtypes without
  having to infer the type constructor (which is not possible when it
  is not a real type constructor but a type lambda computation like
  `C[X] = BitSet` via `MonomorphicIterableFactory`).

- Add `ConstrainedFromIterable` and `ConstrainedIterableFactory` to
  mirror `FromIterable` and `IterableFactory` but with an additional
  implicit evidence for the element type. An upper bound and a
  monomorphic version are not necessary: Constrained collection types
  are always polymorphic (otherwise they would already have the
  correct evidence value for the element type) and any type bound can
  be rolled into the evidence type.

- Overload `to` for `FromIterable` and `ConstrainedFromIterable`.

- Add `ConstrainedIterablePolyTransforms` which mirrors
  `IterablePolyTransforms` with an additional implicit evidence (so it
  is based on `ConstrainedFromIterable` instead of `FromIterable`). It
  provides an additional method `unconstrained` to perform a widening
  conversion to the most specific unconstrained collection type.

- Replace the implicit `() => Builder` with a more generic implicit
  `MonomorphicIterableFactory`. It works via implicit lookup for all
  unconstrained, type-constrained and value-constrained unary type
  constructors.

- Add `ConstrainedMapFactory` which mirrors `MapFactory` with an
  additional implicit evidence for the key type. I considered using
  a binary type constructor that allows the evidence to be based on
  both key and value types, but it would complicate the API with a
  projection type for the key-only constraints and I do not expect that
  we will need it for any collection.

- Implement a builder-based `fromIterable` in `MapFactory` and
  `ConstrainedMapFactory` and add implicit `MonomorphicIterableFactory`
  implementations to both.
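
For orientation, here is a condensed sketch of how these abstractions relate to each other (variance annotations and several details omitted, and the exact signatures are partly assumed; the diff is authoritative):

trait FromIterableBase[B] {
  type To[_]                                       // target type constructor as a type member
  def fromIterable[E <: B](it: Iterable[E]): To[E]
}

trait FromIterable[B, C[_]] extends FromIterableBase[B] {
  type To[E] = C[E]                                // the type constructor pulled out into a parameter
}

trait IterableFactory[B, C[_]] extends FromIterable[B, C] {
  def newBuilder[A <: B]: Builder[A, C[A]]
}

// A monomorphic factory fixes the result type, e.g. To[X] = BitSet for any X <: Int:
trait MonomorphicIterableFactory[B, Repr] extends IterableFactory[B, ({ type L[X] = Repr })#L]

// The constrained mirror images require an implicit evidence for the element type:
trait ConstrainedFromIterable[CC[_], Ev[_]] {
  def constrainedFromIterable[E : Ev](it: Iterable[E]): CC[E]
}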
monomorphicIterableFactoryProto.asInstanceOf[MonomorphicIterableFactory[E, C[E]]]
}

trait MonomorphicIterableFactory[-B, +Repr] extends IterableFactory[B, ({ type L[X] = Repr })#L]
Contributor Author

MonomorphicIterableFactory makes sense from the POV of collection implementations. Calling it CanBuild would better convey the meaning to users.

@@ -32,14 +42,11 @@ final class TreeSet[A]()(implicit val ordering: Ordering[A])

// From SortedLike
def range(from: A, until: A): TreeSet[A] = ???

// From SortedPolyTransforms
def map[B](f: (A) => B)(implicit ordering: Ordering[B]): TreeSet[B] = ???
Contributor Author

ConstrainedIterablePolyTransforms extends IterablePolyTransforms in order to get the linearization right for the purpose of overload resolution, so this override is no longer required.

Contributor

This was not an override, just an implementation, FYI.

Contributor Author

Trying to compile master without this line:

[info] Compiling 37 Scala sources to /Users/szeiger/code/collection-strawman/target/scala-2.12/classes...
[error] /Users/szeiger/code/collection-strawman/src/main/scala/strawman/collection/immutable/TreeSet.scala:10: class TreeSet needs to be abstract, since method map in trait SortedPolyTransforms of type [B](f: A => B)(implicit ordering: Ordering[B])strawman.collection.immutable.TreeSet[B] is not defined
[error] (Note that A => B does not match A => B: their type parameters differ)
[error] final class TreeSet[A]()(implicit val ordering: Ordering[A])
[error]             ^
[error] one error found

Contributor

Yes, that’s what I meant by “this was an implementation”.

Contributor Author

It's not, though. The error message is misleading. I haven't tried to track this down further, but there is no abstract map method anywhere in the inheritance hierarchy of TreeSet. It inherits two concrete implementations, but without enforcing CIPS <: IPS (i.e. ConstrainedIterablePolyTransforms <: IterablePolyTransforms) the types don't line up.

@@ -29,14 +30,21 @@ class TraverseTest {
def traverseTest: Unit = {

val xs1 = immutable.List(Some(1), None, Some(2))
optionSequence(xs1)(immutable.List.canBuild[Int]) // TODO Remove explicit CanBuild parameter after https://issues.scala-lang.org/browse/SI-10081 is fixed?
Contributor Author

This would still require an explicit parameter even with SI-10081 fixed.


val xs2 = immutable.TreeSet(Some("foo"), Some("bar"), None)
optionSequence(xs2)(immutable.TreeSet.canBuild[String])
val o2 = optionSequence(xs2)
Contributor Author

No explicit factory required, even for constrained collections. Both the previous implementation here and my old attempt in #39 needed it.


// This use case from https://github.com/scala/scala/pull/5233 still eludes us
Contributor Author

I'm considering resurrecting the idea of path-dependent types for CanBuild scoping, essentially unifying the new way I implement to (which doesn't need to infer a unary type constructor anymore) with the implicit builder factories. This would not enable a type-driven breakOut but a value-driven approach with an explicit factory, similar to the syntax below (like in to).

Contributor

Indeed, I agree that we should try something like this.

@szeiger
Contributor Author

szeiger commented Mar 9, 2017

I tried this trick for a CanBuildFrom-like abstraction:

trait Iterable[+A] extends IterableOnce[A] with IterableLike[A, Iterable] {
  final type Build[-B, +Repr] = MonomorphicIterableFactory[B, Repr]
  ...
}

// Testing:
def pair[A, Repr <: Iterable[A], To](xs: Repr with Iterable[A])(implicit fi: Repr#Build[(A, A), To]): To = ???

val xs5 = immutable.List(1, 2, 3)
val p1 = pair(xs5)
val p1t: immutable.List[(Int, Int)] = p1

My understanding of the implicit scopes in 7.2 (http://www.scala-lang.org/files/archive/spec/2.11/07-implicits.html) is that Repr#Build[(A, A), To] first puts the companion object of Repr into the implicit scope and then dealiases Build but scalac disagrees. The Repr companion is not part of the scope. Am I misunderstanding the spec or is this a bug?

@Ichoran
Contributor

Ichoran commented Mar 10, 2017

You know, I'm getting worried that this level of abstraction is no better than and possibly even worse than CanBuildFrom. ConstrainedIterablePolyTransforms is particularly bad--I can't imagine that anyone who had trouble with CBF would find CIPT simple. (And the various self-types and factories are not terribly transparent either, I don't think, to a less expert user.)

I think if this rewrite is going to be a success on the simplification dimension, we're going to have to rely less on clever and elaborate type encodings and more on the compiler to do the thing we want by default. If this means altering the scope or search strategy of implicits, I think that's fine. If we drop the simplification part--that is, understanding how collections work remains a dark art of the high wizards of Scala--then the existing design I think is quite good. And I'm not totally opposed to dropping that part; having better behavior and better implementation improves things quite nicely. But perhaps we do want the simplification, and then I think this is getting to be too weighty.

In particular, I would take a completely orthogonal approach to constraints: a constrained collection does not even live in the same hierarchy, but requires boxing to become a full-fledged collection member. We're partway there with ConstrainedIterablePolyTransforms, and I think if you don't care to understand the four type parameters very carefully that using CIPT is actually quite straightforward. But having coexisting constrained and unconstrained methods makes things especially tricky to implement.

I'll try to come up with a sketch this weekend where the hierarchy is held together more loosely, and where to get the full power of collections you may need to box into them. It's more of a breaking change, so maybe it's not going to work. But this experiment is convincing me that you can't really get CBF-like functionality without CBF-like complexity.

@lrytz
Member

lrytz commented Mar 10, 2017

In its defense, CIPT almost doesn't leak into the parts of the collections hierarchy which have nothing to do with constrained collections. This is very different from CBF, which is everywhere.

It does add complexity related to overriding order and overloading resolution. But this looks good to me:
[screenshot]

Compared to the 2.12 version:

[screenshot]

/** Transforms over iterables that can return collections of different element types for which an
* implicit evidence is required.
*/
trait ConstrainedIterablePolyTransforms[+A, +C[A], +CC[X] <: C[X], Ev[A]] extends ConstrainedFromIterable[CC, Ev] with IterablePolyTransforms[A, C] {
Member

+C[A] could be +C[_] (also in IterablePolyTransforms)

Contributor Author

I already removed a bunch of unnecessary names. I must have overlooked these ones.

@szeiger
Contributor Author

szeiger commented Mar 10, 2017

The simplification vs CanBuildFrom is two-fold:

  1. You only see the constraint-related complexity in the collection types that need them (i.e. arrays and sorted collections).

  2. Compiler errors and auto-complete results look the way you expect despite abstracting over the evidence type because you're always dealing with a concrete instantiation where the evidence type is known.

For users I think this is vastly simpler than CanBuildFrom. For collection implementors it makes things somewhat easier because they can reuse ConstrainedIterablePolyTransforms for new evidence types. I agree that implementing collections is still difficult but that seems to be inherent complexity that we cannot remove. Either we abstract over possible constraints everywhere (CanBuildFrom) or we have two versions of all poly-transforms for constrained and unconstrained collections.

Note that adding the unconstrained collection type constructor as another new type parameter is independent of abstracting over the evidence type. SortedPolyTransforms should also have done it so it can extend IterablePolyTransforms, otherwise you need to override all methods in all implementations to get the linearization right.

But this experiment is convincing me that you can't really get CBF-like functionality without CBF-like complexity.

We're already sacrificing functionality in this design: There's no automatic fallback to building an unconstrained collection and you can't implement a breakOut equivalent. Maybe more importantly, while we can implement methods like Future.sequence with the current level of functionality we cannot implement the improved version from scala/scala#5233, which is what I was trying to do with the Repr#Build trick above. This requires something similar to breakOut, not because we want to build a target type that is different from the source type but because the target type may require a binary type constructor.

@Ichoran
Contributor

Ichoran commented Mar 10, 2017

@lrytz - Yes, I like the improvement of usage you get from CIPT, as I alluded to earlier! And I do like that it's kept away from many of the collections.

@szeiger - Actually I view the lack of automatic fallback an advantage, and I'm not saying that we shouldn't have constrained and unconstrained collections inherit different things. I'm actually wondering whether we can additionally simplify things by allowing the inheritance to vary even more. I'm reasonably happy with the existing design. I'm just looking for additional ways to simplify.

For instance, with CIPT we have a fourth type parameter, C[_] or C[A], that exists solely to implement a single widening method. Nothing in CIPT cares (yet) that this method exists, but it complicates the type signature of the trait (both by having it and by requiring CC[X] <: C[X]). Can we just drop it? Or do we need it as a bridge to the less constrained methods? If we need it as a bridge, can we instead require a box? (Maybe this has too big a performance hit.)

I think actually a good bit of the problem I sense intuitively is the VeryLongAndExplicitClassNames combined with [A, C[_], X]. It feels like there are these giant things you must focus on (why else would they be so long?) and then there's cryptic type muddle where you can't tell what is for what and why. Maybe shorter trait names and longer generic parameter names would make it more self-documenting. (But I'm trying to put myself in the shoes of someone less familiar, which is a difficult thing to do accurately.)

@szeiger
Contributor Author

szeiger commented Mar 10, 2017

Actually I view the lack of automatic fallback an advantage

I don't have strong feelings either way, but I'm OK with removing automatic fallback.

For instance, with CIPT we have a fourth type parameter, C[_] or C[A], that exists solely to implement a single widening method.

Implementing the unconstrained method there is just a side effect of having C available. It is needed by the with IterablePolyTransforms[A, C] clause to get the linearization right.

It feels like there are these giant things you must focus on (why else would they be so long?) and then there's cryptic type muddle where you can't tell what is for what and why

I wrote this code in the established Scala style of one-letter type parameters (already deviating a bit with CC and Ev because I didn't just want to pick some random single letter), but in general I would often prefer more expressive names (see also scala/scala#5265).

@Ichoran
Contributor

Ichoran commented Mar 10, 2017

Ah, right. So, if you didn't need to inherit IterablePolyTransforms, you wouldn't need that C. Which gets back to my thought of making things less tied into the hierarchy.

There is precedent for longer type parameter names in CanBuildFrom.

@plokhotnyuk

Could you please state performance as one of the main requirements for the new collections?
Currently, in hot paths, we are replacing usage of Scala's collections with Java's because clients want to pay lower AWS bills.

@SethTisue
Member

@plokhotnyuk use a separate ticket, please

@Ichoran
Contributor

Ichoran commented Mar 13, 2017

@szeiger - I played with a few variants of CIPT that have only three type parameters, and though I didn't get all the way through the necessary changes they were promising enough that I am now happy with this approach: I think there is a round or two of simplification that could be applied, probably without any compiler changes. So that makes going with this exactly and then slightly refactoring a pretty safe bet. (Without getting everything compiling I couldn't tell whether type inference would work on the new case, but we can tweak type inference, I imagine, as long as the inference is a sensible thing to do in general.)

Contributor

@Ichoran left a comment

This strategy looks a little imposing for less-expert users but is nicely usable, and I think there are likely to be refactorings that somewhat reduce how imposing it is. This is working, so let's go with it, absent good ideas for how to improve it.

*/
trait ConstrainedIterablePolyTransforms[+A, +C[A], +CC[X] <: C[X], Ev[A]] extends ConstrainedFromIterable[CC, Ev] with IterablePolyTransforms[A, C] {

protected def coll: Iterable[A]
Contributor

What’s the advantage of that compared to having a self type annotation?

Contributor Author

This is the same as in IterablePolyTransforms. It allows use with pseudo-collection types like String and Array.
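
A minimal, self-contained illustration of that point (written against the standard library's Iterable for brevity; the strawman defines its own, and the names here are made up):

trait PolyTransformsSketch[A, C[_]] {
  protected def coll: Iterable[A]                // the underlying elements, supplied by the implementor
  def fromIterable[B](it: Iterable[B]): C[B]     // how to build the target collection
  def map[B](f: A => B): C[B] = fromIterable(coll.map(f))
}

// A wrapper over String is not itself an Iterable but can still provide `coll`,
// which a self type `this: Iterable[A] =>` would not allow:
class StringPolyOps(s: String) extends PolyTransformsSketch[Char, List] {
  protected def coll: Iterable[Char] = s         // via Predef.wrapString
  def fromIterable[B](it: Iterable[B]): List[B] = it.toList
}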

*/
def to[C[X] <: Iterable[X]](fi: FromIterable[C]): C[A @uncheckedVariance] =
def to[F <: FromIterableBase[A]](fi: F): F#To[A @uncheckedVariance] =
Contributor

Why not use fi.To[A] as the return type instead of the F#To[A] projection?

Contributor Author

In this case it shouldn't matter but I generally try to avoid path-dependent types when type projections are sufficient. With path-dependent types you always need to take care to get the paths right. It is easy to get into a situation where two types are "obviously" the same but the compiler doesn't agree. This is easier with type projections. Everything's fine as long as they dealias to the same type.

Contributor

Note that Dotty will not accept F#To because it might be unsound. So we should definitely stick with fi.To[A].

Contributor

I think this kind of type projection is going to be removed from Dotty because it is unsound. Since in our case the fi parameter would always be a static object, I think a path-dependent type would not suffer from the problems you are mentioning.

Contributor Author

Ah, right, this looked safe enough to me but I haven't tested it on Dotty. It's pretty much the same as scala/scala3#1051 (comment) though, so it won't work. I'll change it to use the path-dependent type instead.


implicit def canBuild[A]: () => Builder[A, C[A]] = () => newBuilder[A] // TODO Reuse the same instance
def constrainedNewBuilder[A : Ev]: Builder[A, CC[A]]
Contributor

I would name it newConstrainedBuilder instead of constrainedNewBuilder.

Contributor Author

I considered that, too, but it sounds like a method that creates a ConstrainedBuilder, whereas constrainedNewBuilder is consistent with constrainedFromIterable.

// variance seems sound because `to` could just as well have been added
// as a decorator. We should investigate this further to be sure.
fi.fromIterable(coll)

def to[C[_], Ev[_]](fi: ConstrainedFromIterable[C, Ev])(implicit ev: Ev[A @uncheckedVariance]): C[A @uncheckedVariance] =
fi.constrainedFromIterable(coll)
Contributor

Maybe that’s unrelated to this PR, but I guess this method does not work with collections having binary type constructors (e.g. TreeMap), right? BTW, does the unconstrained one work with HashMap?

Contributor Author

@szeiger Mar 13, 2017

Right, this still needs to be improved. The unconstrained version cannot work for binary type constructors. We only have an element type A in Iterable. While we can restrict it to work only for (A1, A2), there is no way to compute these types A1 and A2 through FromIterable's upper bound alone. Conversions to maps either need another special case or a ConstrainedFromIterable. Since we're special-casing Map in other places we should probably overload to with two additional cases for unconstrained and constrained maps.
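
A hedged sketch of what such a map-specific to overload could look like (MapFactory is from this PR, but the shape shown here and the use of <:< evidence are assumptions):

trait MapFactorySketch[M[_, _]] {
  def fromIterable[K, V](it: Iterable[(K, V)]): M[K, V]
}

// Only available when the element type is statically known to be a pair:
def toMap[A, K, V, M[_, _]](coll: Iterable[A], fact: MapFactorySketch[M])(implicit ev: A <:< (K, V)): M[K, V] =
  fact.fromIterable(coll.map(ev))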

Contributor

A CanBuildFrom-like would nicely solve these issues. It’s probably OK if we have it only at this place instead of having it everywhere as in the current collections.

@julienrf
Contributor

👍 to this work.

Contributor

@odersky left a comment

Overall, I think this goes in the right direction. But we need to keep the complexity in check. For me a good rule of thumb is that complexity grows with the product of the number of type parameters on a given class or method. So 4 type parameters for CIPT is a lot! It would be great if we could get rid of at least one. This seems possible since we seem to agree that automatic widening is not needed.

I also dislike the B parameter on FromIterable. We should be particularly careful to minimize type parameter lists of widely used base traits like it. It seems in most cases the parameter is instantiated to Any. Which makes me wonder:

  • can we find a less visible way to represent the parameter, e.g. as a type member?
  • or, should we not bother at all about this? How much code duplication would we get if we special cased those cases where the bound is not Any?

Let's also try to not go overboard with long names.


def newBuilder[A <: B]: Builder[A, C[A]]

protected[this] lazy val monomorphicIterableFactoryProto: MonomorphicIterableFactory[B, C[B]] = new MonomorphicIterableFactory[B, C[B]] {
Contributor

I would try to shorten this name, unless we want to give a great opportunity for ridicule. So my vote goes to CanBuild and canBuildProto (or maybe we can even leave out the proto?)

*/
def to[C[X] <: Iterable[X]](fi: FromIterable[C]): C[A @uncheckedVariance] =
def to[F <: FromIterableBase[A]](fi: F): F#To[A @uncheckedVariance] =
Contributor

Note that Dotty will not accept F#To because it might be unsound. So we should definitely stick with fi.To[A].


/** Base trait for instances that can construct a collection from an iterable */
trait FromIterable[+C[X] <: Iterable[X]] {
def fromIterable[B](it: Iterable[B]): C[B]
trait FromIterable[-B, +C[_]] extends FromIterableBase[B] {
Contributor

Why do we need the bound here? It would be great if we could avoid the additional B parameter.

@szeiger
Contributor Author

szeiger commented Mar 13, 2017

@odersky

It would be great if we could get rid of at least one. This seems to be possible since we seem to agree that automatic widening is not needed.

I don't see how. We need the unconstrained type constructor for manual widening and to get the linearization right. There is no automatic widening in this version.

can we find a less visible way to represent the parameter, e.g. a as a type member.

I would have preferred that, too, but due to variance the default would be Nothing if you don't specify anything else, whereas we need Any. We could have a type alias for the Any case, but I think it would only complicate things further. OTOH, how often do you need to use FromIterable? Most users will only see it in to and in the factory methods (if they look at the actual source; scaladoc and IDEs show the instantiated Any bound in most cases).

or, should we not bother at all about this? How much code duplication would we get if we special cased those cases where the bound is not Any

String and BitSet are the cases that we know about. They would either need special-casing or be modeled as constrained types with implicit evidence. I would like to avoid the latter due to the performance implications.

@odersky
Contributor

odersky commented Mar 13, 2017

@szeiger My tendency would be to special case String and BitSet. The reasoning is that it's better to require some duplication for implementers of non-parametric collection types than to complicate the framework to a point where it is too hard for them to extend it anyway. The existing collections were a lesson in that respect.

@Ichoran
Contributor

Ichoran commented Mar 13, 2017

Also in favor of special-casing: it is important that String especially but also BitSet be really fast, which means that many things will be written as a special case anyway.

I'll try to finish up some of my 3-parameter CIPTs tonight or the next day. The basic idea is to drop the inheritance of IPT in CIPT and push responsibility for putting something sensible for C[A] down to the leaf class implementations and/or use protected methods to enforce sensible linearization. I got multiple things that compiled with very little work, but haven't had time to test them all in enough depth to know what flaws they might have.

@szeiger
Contributor Author

szeiger commented Mar 19, 2017

I've just pushed a new commit. Ever since the implicit builders were merged I had this nagging feeling that we had the whole thing backwards. In the new design the source collection is responsible for building a target collection, we do not rely on an implicit CanBuildFrom for that, so why should we do that for implicit builders? It comes with the same old problem of getting different results depending on the static type (because we rely on implicit search in companion objects to decide which builder to use) and it also caused me all kinds of problems trying to unify implicit builders and companion objects.

So I went back to the drawing board and started with the assumption that the source collection has to decide what to build. We already had Buildable for collections that allow eager building of the same collection with the same element type. A more consistent name for it is MonoBuildable and the newBuilder method is now called newBuilderWithSameElemType. This is consistent with IterableMonoTransforms and fromIterableWithSameElemType. What we were missing though was the poly version of it, so I added PolyBuildable. Unlike MonoBuildable it is not limited by variance, so its newBuilder method can be public. This allows anyone who has a strict unconstrained collection (i.e. a PolyBuildable) to build the same collection type with any element type. For constrained collection types I added ConstrainedPolyBuildable with an extra implicit evidence to mirror ConstrainedFromIterable.

Now we have everything in place to write methods outside the collection framework (like Future.sequence) in the same style as methods within, i.e. by overloading for constrained and unconstrained collection types:

  def optionSequence1[C[X] <: Iterable[X] with PolyBuildable[X, C], A](xs: C[Option[A]]): Option[C[A]] =
    xs.foldLeft[Option[Builder[A, C[A]]]](Some(xs.newBuilder)) {
      case (Some(builder), Some(a)) => Some(builder += a)
      case _ => None
    }.map(_.result)

  def optionSequence1[Ev[_], CC[_], A](xs: ConstrainedPolyBuildable[Option[A], CC, Ev] with Iterable[Option[A]])(implicit ev: Ev[A]): Option[CC[A]] =
    xs.foldLeft[Option[Builder[A, CC[A]]]](Some(xs.newConstrainedBuilder)) {
      case (Some(builder), Some(a)) => Some(builder += a)
      case _ => None
    }.map(_.result)

Since having to overload everything (if you want to support constrained collection types) is rather awkward, the next step was to unify both through an implicit value. The result is essentially CanBuildFrom, but I changed the To type from a type parameter to a type member because it is always a computed result type and not a type that guides the computation:

trait BuildFrom[-From, -Elem] {
  type To
  def newBuilder(from: From): Builder[Elem, To]
  def fromIterable(from: From)(it: Iterable[Elem]): To
}

Note that there is no method to get a Builder without a From instance. The source collection is still responsible for building!

While usage is essentially the same as for CanBuildFrom (we could even define a type alias to make it look the same) the way the implicits are materialized is very different. The companion object of BuildFrom has two methods to get a BuildFrom from a PolyBuildable and from a ConstrainedPolyBuildable (if an evidence value is available). This is defined in a self-contained way on top of the collection framework. No collection type has to define its own BuildFrom instances.
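
A sketch of what those two derivations could look like (the exact members of PolyBuildable and ConstrainedPolyBuildable are assumed here, based on the examples above; the real companion object may differ):

object BuildFrom {
  // An unconstrained strict collection can rebuild itself with any element type:
  implicit def buildFromPolyBuildable[E, C[_]]: BuildFrom[PolyBuildable[_, C], E] { type To = C[E] } =
    new BuildFrom[PolyBuildable[_, C], E] {
      type To = C[E]
      def newBuilder(from: PolyBuildable[_, C]): Builder[E, C[E]] = from.newBuilder[E]
      def fromIterable(from: PolyBuildable[_, C])(it: Iterable[E]): C[E] = {
        val b = from.newBuilder[E]; it.foreach(b += _); b.result
      }
    }

  // A constrained collection additionally requires the evidence for the new element type:
  implicit def buildFromConstrainedPolyBuildable[E, CC[_], Ev[_]](implicit ev: Ev[E]): BuildFrom[ConstrainedPolyBuildable[_, CC, Ev], E] { type To = CC[E] } =
    new BuildFrom[ConstrainedPolyBuildable[_, CC, Ev], E] {
      type To = CC[E]
      def newBuilder(from: ConstrainedPolyBuildable[_, CC, Ev]): Builder[E, CC[E]] = from.newConstrainedBuilder[E]
      def fromIterable(from: ConstrainedPolyBuildable[_, CC, Ev])(it: Iterable[E]): CC[E] = {
        val b = from.newConstrainedBuilder[E]; it.foreach(b += _); b.result
      }
    }
}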

All implicit builders in companion objects are gone and so are the BuildStrict, BuildConstrained and BuildStrictConstrained abstractions. None of that is needed anymore.

Now we can write a single method like this that abstracts over PolyBuildable and ConstrainedPolyBuildable:

  def optionSequence[CC[X] <: Iterable[X], A](xs: CC[Option[A]])(implicit bf: BuildFrom[CC[Option[A]], A]): Option[bf.To] = ...

We can even do breakOut, and in a way that is much more consistent with the new design than my previous attempt. In addition to the implicit values that BuildFrom provides for buildables, it also defines implicit conversions from all four basic collection factory types (IterableFactory, ConstrainedIterableFactory, MapFactory and ConstrainedMapFactory) to BuildFrom. This allows you to pass any collection factory that you could use in a .to call to a method that expects a BuildFrom:

    val xs4 = immutable.List[Option[(Int, String)]](Some((1 -> "a")), Some((2 -> "b")))
    val o4 = optionSequence(xs4)(immutable.TreeMap)

In fact, we can use it to define to:

  def to[F <: TypeConstrainedFromIterable[A]](fi: F): fi.To[A @uncheckedVariance] =
    fi.fromIterable(coll)

  // Generic version of the method above that can build anything with BuildFrom. Note that `bf` is not implicit.
  // We never want it to be inferred (because a collection could only rebuild itself that way) but we do rely on
  // the implicit conversions from the various factory types to BuildFrom.
  def to(bf: BuildFrom[Iterable[A], A]): bf.To =
    bf.fromIterable(coll)(coll)

The first version is purely for performance: It allows collection types without an evidence value to be built in a more direct way. Everything else uses the implicit conversion to BuildFrom.

FromIterable still extends TypeConstrainedFromIterable (which was called Build in the previous version). We could easily change that so that most collection types do not have to extend TypeConstrainedFromIterable. It would require an extra overload for to to have efficient implementations for both.

@Ichoran
Contributor

Ichoran commented Mar 20, 2017

Aha! I like this one. I think it still needs a couple of rounds of simplification and renaming to make it comprehensible and to get rid of some of the veryLongNamesEspeciallyWhenTheyRepeat: LongNameEspeciallyWhenTheyRepeat = new LongNameEspeciallyWhenTheyRepeat things.

But this seems like a better design overall. I have to think through and test cases where the source collection being more in-charge might be a problem. But from first principles, as you say, this seems better: if the source is in charge, dynamic dispatch takes care of loss of type information. Conceptually, then, the issue would be creating collections from scratch. I need to look into that in more detail.

@julienrf
Contributor

Note that there is no method to get a Builder without a From instance.

That could be useful to implement unfolds, though.

type To = C[E]
def newBuilder(from: Any): Builder[E, To] = fact.newBuilder
def fromIterable(from: Any)(it: Iterable[E]): To = fact.fromIterable(it)
}
Contributor

This sounds like a smell: we should not have to introduce this Any parameter if we don't need it.

Contributor Author

We do need it for the "proper" cases (implicit values for poly builders). When you do a manual breakOut, the from value is ignored.

Contributor Author

Alternatively, we could have a subclass like BuildFromAny[-Elem] extends BuildFrom[Any, Elem] with overloads that do not take a from value. This would be enough for the non-implicit use case in to. Is getting rid of the Any parameter here worth the extra complexity of a new class with two new methods?
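
For illustration, such a subclass might look roughly like this (member names assumed):

trait BuildFromAny[-Elem] extends BuildFrom[Any, Elem] {
  // overloads that do not need a `from` value
  def newBuilder(): Builder[Elem, To]
  def fromIterable(it: Iterable[Elem]): To
  // the inherited methods simply ignore `from`
  def newBuilder(from: Any): Builder[Elem, To] = newBuilder()
  def fromIterable(from: Any)(it: Iterable[Elem]): To = fromIterable(it)
}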

trait IterableFactories[+C[X] <: Iterable[X]] extends FromIterable[C] {
/** Base trait for instances that can construct a collection from an iterable for certain types
* (but without needing an evidence value). */
trait TypeConstrainedFromIterable[-B] extends Any {
Contributor

@julienrf Mar 20, 2017

UpperBoundedFromIterable?

Contributor

I would say BoundedFromIterable. The upper is implied if left out.

trait SortedSet[A]
extends collection.SortedSet[A]
with Set[A]
with SortedSetLike[A, SortedSet]

trait SortedSetLike[A, +C[X] <: SortedSet[X]]
extends collection.SortedSetLike[A, C]
with ConstrainedIterablePolyTransforms[A, Set, SortedSet]
Contributor

with ConstrainedIterablePolyTransforms[A, Set, C]

Contributor

By the way, would it be possible to pull up this line at the level of collection.SortedSet so that we don’t repeat it for collection.mutable.SortedSet and collection.immutable.SortedSet?

Contributor Author

Yes, it's possible.

@@ -331,6 +336,8 @@ class StrawmanTest {
// val xs7 = xs.map((k: String, v: Int) => (new Foo, v)) Error: No implicit Ordering defined for Foo
val xs7 = (xs: immutable.Map[String, Int]).map((k, v) => (new Foo, v))
val xs8: immutable.Map[Foo, Int] = xs7
val xs9 = xs6.to(List).to(mutable.HashMap)
Contributor

👍

// Breakout-like use case from https://github.com/scala/scala/pull/5233:
val xs4 = immutable.List[Option[(Int, String)]](Some((1 -> "a")), Some((2 -> "b")))
val o4 = optionSequence(xs4)(immutable.TreeMap) // same syntax as in `.to`
val o4t: Option[immutable.TreeMap[Int, String]] = o4
Contributor

👍

@lrytz
Member

lrytz commented Mar 20, 2017

Also like the latest change!

I'm a bit worried about keeping / promoting the "implicit buildFrom" pattern. We are moving away from it for the internal collections implementation and use overloading instead; why should we recommend it for external methods? I see that collection types don't have to define their own BuildFrom instances, which certainly makes things simpler. But there are other issues with implicit BuildFrom, like the complex signatures, Scaladoc, and IDE discovery. For example, I defined your optionSequence methods as extension methods; here's how it looks:

[screenshots]

@szeiger
Contributor Author

szeiger commented Mar 20, 2017

@Ichoran @julienrf Building an arbitrary target collection will require a factory object. Just as in the case of to it is not type-driven. You have to pass the actual object. You can write it as a single method by using a non-implicit BuildFrom the same way the new to overload does. The alternative is to overload it for the different factory kinds. In fact, this is something I also considered for to but you have to support four factory kinds (unconstrained iterable/map, constrained iterable/map) if you want it to be comprehensive (unlike the to method in the current collections library which does not support map types).

@lrytz I tried it with To as a type parameter instead of a type member but it didn't help, either. I don't think there's much we can do other than using BuildFrom-based methods sparingly. If you want a single definition that can abstract over kinds, implicits are the only way to do it.

@julienrf
Contributor

julienrf commented Mar 20, 2017

Building an arbitrary target collection will require a factory object.

Then we will not be able to write generic code, as you did with optionSequence, in cases where we have no "from" collection. A typical use case would be unfolds.

With the current collections we can write the following:

  class Unfolder[That] {
    def apply[S, A](init: S)(next: S => Option[(A, S)])(implicit cbf: CanBuildFrom[Nothing, A, That]): That = {
      val builder = cbf.apply()
      …
    }
  }

And then That can be List[X] as well as TreeMap[X, Y].

@szeiger
Contributor Author

szeiger commented Mar 20, 2017

Yes, that's correct. It is consistent with to and with the new way of doing breakOut-like operations and it doesn't look like a huge loss to me. In most cases you want to build a specific collection type, so you just use the correct factory object. In my experience with using breakOut the result is usually assigned to a variable, in which case the new syntax is actually shorter because you don't have to spell out the element type (in addition to importing and using breakOut itself). If you want to abstract over collection types you have to pass a CanBuildFrom around and repeat the necessary type parameters for that everywhere. This was non-trivial in the old design and doesn't change much here (it may be slightly simpler now with To being a type member).
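
For illustration, an unfold in this style would take the factory explicitly (converted to a BuildFrom via the existing implicit conversions), much like the new to overload; the exact shape below is an assumption:

def unfold[S, A](init: S)(next: S => Option[(A, S)])(bf: BuildFrom[Any, A]): bf.To = {
  val b = bf.newBuilder(())            // the `from` value is irrelevant for factory-derived instances
  @annotation.tailrec def loop(s: S): Unit = next(s) match {
    case Some((a, s2)) => b += a; loop(s2)
    case None          => ()
  }
  loop(init)
  b.result
}

// hypothetical usage: unfold(0)(i => if (i < 10) Some((i * i, i + 1)) else None)(immutable.List)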

@julienrf
Contributor

My global feeling about this PR: I definitely think we need a CanBuildFrom-like abstraction as you proposed, because it makes it possible to write powerful generic methods such as def sequence[F[X] <: Iterable[X], G[_], A](xs: F[G[A]])(implicit bf: BuildFrom[F[G[A]], A]): G[bf.To]. However, the hierarchy really becomes more complex with this PR. Maybe we should keep the CanBuildFrom-like thing but keep a simpler hierarchy (at the price of duplicating more things between the four kinds of factories).

@Ichoran
Contributor

Ichoran commented Mar 21, 2017

@julienrf - I concur. That was one of the "rounds of simplification" I envisioned, along with one more attempt to check whether any more type parameters can be made type members instead. (Along with a renaming to things that might technically be a little less accurate but are more intuitive, e.g. IterableOfSame instead of IterableMonoTransforms. Same/Other or Alike/Unlike can be useful here.)

Contributor

@odersky left a comment

This looks pretty good! I am happy for this to go in after some polishings.

def to[C[X] <: Iterable[X]](fi: FromIterable[C]): C[A @uncheckedVariance] =
// variance seems sound because `to` could just as well have been added
// as a decorator. We should investigate this further to be sure.
def to[F <: TypeConstrainedFromIterable[A]](fi: F): fi.To[A @uncheckedVariance] =
fi.fromIterable(coll)
Contributor

I don't think you need the F parameter here.

def to(fi: TypeConstrainedFromIterable[A]): fi.To[A]

should work, no?

Contributor Author

I'll give it a try. (I'm used to writing type projections where this wouldn't work)

// the implicit conversions from the various factory types to BuildFrom.
def to(bf: BuildFrom[Iterable[A], A]): bf.To =
bf.fromIterable(coll)(coll)

/** Convert collection to array. */
Contributor

I am not sure about overloading to. Why do we need both versions? Also, naming conventions vary a lot: TypeConstrainedFromIterable and BuildFrom seem to have nothing in common.

Contributor Author

We don't need both. The TypeConstrainedFromIterable version covers the most common case and doesn't require an implicit conversion from the factory type to a BuildFrom. It's only there for performance (but I haven't benchmarked it). The BuildFrom version alone is sufficient.

Contributor Author

Hm, actually BitSet doesn't currently work with the BuildFrom version. I'll try to fix it.


/** Transforms over iterables that can return collections of different element types for which an
* implicit evidence is required.
*/
Contributor

@odersky Mar 22, 2017

We should explain in a comment, with examples, why we need both C and CC. It looked nonsensical to me at first, and I only started to understand when I looked at instantiations of the class. (Of course, if we found a way to merge C and CC that would be even better.)

trait IterableFactories[+C[X] <: Iterable[X]] extends FromIterable[C] {
/** Base trait for instances that can construct a collection from an iterable for certain types
* (but without needing an evidence value). */
trait TypeConstrainedFromIterable[-B] extends Any {
Contributor

I would say BoundedFromIterable. The upper is implied if left out.

@odersky
Contributor

odersky commented Mar 22, 2017

Should we try to merge Bounded (aka TypeConstrained) and Constrained? We can always express Bounded as a Constraint where we demand an instance of <:<. How many examples are there where we need Bounded? I know of String and BitSet; are there others?
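
As a small, self-contained illustration of that encoding (standard library types, made-up names): an upper bound X <: Int can instead be demanded as evidence X <:< Int, which fits the Constrained machinery:

def intSetFrom[X](xs: Iterable[X])(implicit ev: X <:< Int): Set[Int] =
  xs.map(ev).toSet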

@szeiger
Contributor Author

szeiger commented Mar 22, 2017

I pushed another commit to address @odersky's review comments.

  • Added more docs to ConstrainedIterablePolyTransforms

  • Removed unnecessary type parameter in to(TypeConstrainedFromIterable)

  • Renamed TypeConstrainedFromIterable to BoundedFromIterable

The problem with BitSet was caused by the companion object not extending IterableFactory because it is type-constrained. I fixed it by adding a supertype BoundedIterableFactory that covers all cases. OTOH we don't really need BoundedFromIterable so I removed that in exchange. This means that the "bounded" abstraction is only visible in companion objects but not in the collection types.

@szeiger szeiger merged commit a324440 into scala:master Mar 23, 2017
@julienrf
Contributor

julienrf commented Apr 13, 2017

@lrytz I noticed that I don’t have the same result as you:

[screenshot: bitset-map]

What bothers me is this BitSet.this.Ev thing (instead of Ordering)… Which version of IntelliJ are you using? (I’m using 2017.1.1)

@lrytz
Member

lrytz commented Apr 13, 2017

@julienrf this changed at some point between my comment (#45 (comment)) and now. On the current master branch I get the same as you.

@julienrf
Contributor

@szeiger In practice, do you think we will have other instantiations of Ev than Ordering? The current design leads to some issues, notably when used with Dotty: this method, by taking an unconstrained type parameter Ev, makes the implicit search explode. Also, there is the IDE issue that I reported above.

What would be the impact of removing this abstract layer ConstrainedPolyBuildable and having just an OrderingConstrainedPolyBuildable (with no Ev type parameter)?

@szeiger
Contributor Author

szeiger commented Apr 28, 2017

I can't think of any other instantiations in the core collection library. Apart from some reusable interfaces we wouldn't lose much by not abstracting over Ev[_]. People could still define collection types with other constraints by duplicating the interfaces. The only additional work required in these cases is implementing the CanBuild implicits (but as you point out abstracting over Ev[_] for these implicits doesn't work correctly in Dotty anyway).

szeiger added a commit to szeiger/collection-strawman that referenced this pull request Jun 12, 2017
`BuildFrom` is like `FromSpecificIterable` but with an extra `From`
argument, like in the final version of scala#45. `FromSpecificIterable`
existed conceptually in that version as `BuildFrom[Any, …]` but didn’t
have a separate type.

This new version has separate abstractions for buildable (strict)
collection types in the form of `StrictBuildFrom` and
`FromSpecificIterableWithBuilder`. Since we can get a surrogate builder
(through one of the new `Builder.from` methods) for any lazy collection
and we can restrict code to work only with strict collections via the
`Buildable` trait, this is not strictly necessary. The only reason for
separating the `Builder` abstractions is to avoid exposing them in
`FromSpecificIterable`. Even though everything can be built in a strict
way, these abstractions sit on top of the lazy ones, not below them.
julienrf pushed a commit that referenced this pull request Jun 29, 2017