Smarter handling of "block" and "headline" formatting. #1687

munificent · 2025-03-28T01:45:50Z

The formatter has a notion of "block formatting" that comes into play in places like:

// No indentation needed for RHS here:
variable = [
  block,
  formatted,
];

// Indentation needed here:
variable =
    notBlock +
    formatted;

A hard design question in the formatter is whether we can determine if an AST node is "block-formatted" or "block-formattable" eagerly based just on the AST or whether it has to be figured out later during solving.

Eagerly is appealing because we have access to the full AST and know what the actual source code the user wrote is before it gets lowered to a Piece tree that might obscure that. (In other words, there might be two ASTs that we want to format differently that get lowered to Pieces that are hard to distinguish.) It's also generally good for performance to make as many formatting decisions as we can before going into the solver.

But it turns out that there are cases where we really don't know if a AST node will end up block shaped or not. Consider:

variable = (target + expression).method(argument, another);

If we're able to fit the method target on one line and instead split in the argument list, then the RHS of = does end up block-shaped:

variable = (target + expression).method(
  argument,
  another,
);

Nice. But if the target is too long for that to work, then we should not consider the method call block-shaped. We don't want to allow:

variable = (target +
      expression).method(
  argument,
  another,
);

But as you can see, we can't tell that just looking at the AST. It depends on how the formatter formats that target expression.

Alas, I didn't realize this problem early on in the rewrite and by the time it became clear to me, it was too late to start over again. Since then, I've tried several times to refactor towards a model where we can express these shaped-based formatting constraints during solving without (hopefully) completely tanking performance.

This PR is that. I've broken it into a series of commits to try to make it more digestible, but it's still a pretty big change. But it significantly improves the output and I think also builds a foundation to support other more sophisticated style rules we might want.

Here's the basic idea. Much of formatting is about resolving constraints between parent and child nodes. For example, we want to avoid output like:

operand + (anotherOne *
    thingOperand) + fourth;

Here, there is a split in the * expression but it doesn't cause the surrounding + expressions to split. That makes for very hard to read code. The way the formatter handles this is by having constraints where it says "if a newline occurs inside child X, then parent Y can't be in state Z." In the case of InfixPiece, the constraint is simply that a newline in any operand forces the InfixPiece to fully split.

It's a really simple model. Each Piece defines a list of states, and then for each state, it defines which children may contain newlines. That gets you surprisingly far. But it struggles with code like:

variable = (target + expression).method(
  argument,
  another,
);

// Versus:
variable = (target +
    expression).method(argument, another);

Here, there is a newline in the RHS child of the assignment in both cases, but they're different kinds of newlines. We want the first one to be OK and allow the = on the same line as the target, but not the second one.

To support this, we expand the constraint model a bit. Instead of just "newline or not", the presence of newlines in a child piece puts it into one of a few shapes. The shapes are:

// Shape.inline:
all + one + line;

// Shape.block:
[
  delimited,
  andBlockIndented,
]

// Shape.headline:
somethingOnFirstLine
    .moreStuff()
    .afterThat();

// Shape.other:
anything +
    else;

Then we allow pieces to constrain their children not just to disallow newlines but to permit certain shapes and prohibit others. In practice, basically AssignPiece is the only one that really cares about this. But given how prevalent =, :, and => are (which all use AssignPiece), that ends up having a large impact on formatting quality.

In order to get this working, I also had to add some other machinery like a slightly smarter system for handling indentation. And it let me get rid of some of the other ways of modelling constraints (like most uses of the applyConstraints() API).

The main goal is to finally be able to fix #1465 and #1466 in a non-hacky robust way. But I hope this will make it easier to implement other style changes too. By pushing more of the formatting choices into the back end where we know how the child actually formatted, the parent can take that into account.

Aside from fixing those issues, these changes also tweak the output around assignments with complex operands in ways that are hard to describe precisely. Given a piece of code like:

var constructedRequest = new Request()
  ..campaignId = (rollup.campaignId == null ? new Int64(0) : rollup.campaignId);

It could be formatted a variety of ways with no clear best:

var constructedRequest = new Request()
  ..campaignId = (rollup.campaignId == null ? new Int64(0) : rollup.campaignId);

var constructedRequest = new Request()..campaignId =
    (rollup.campaignId == null ? new Int64(0) : rollup.campaignId);

var constructedRequest =
    new Request()
        ..campaignId = (rollup.campaignId == null
            ? new Int64(0)
            : rollup.campaignId);

And with the scale of this refactoring, sometimes the formatter makes different choices. I plan to tweak it a little more going forward (now that it's easier to do so). But, overall, I don't think it makes any of these cases worse, and supporting headline-style formatting makes a lot of code much better.

Unfortunately, this PR regresses performance. Roughly ~20% on the microbenchmarks. As far as I can tell, there are no new pathological performance corners, it's just a little uniformly slow everywhere. I have some ideas on how to claw the performance back but if not, I can live with it. So far, users have complained about the output but not performance.

Currently, a parent can only indicate whether a child can contain newlines or not when determining whether a solution is invalid. This works fine for most pieces, but isn't sophisticated enough to allow fixing #1465 and #1466 where we want to say that the child can have newlines but only in certain styles. This is a step towards supporting those more sophisticated constraints. It doesn't change the underlying semantics at all, but it replaces the basic boolean newlines yes/no, with a Set<Shape>.

Instead of pushing a raw integer for how much indentation a piece adds, use an enum with different values for each kind of indentation. Then, instead of a separate "canCollapse" parameter, have some logic that understands that certain pairs of semantic indentation can be merged. The hope is that this should then simplify some of the complex indentation handling in AssignPiece (and maybe elsewhere). There's no behavioral change or deep refactoring in this commit. It just replaces the int indentation and "canCollapse" parameter with an enum. Note that the old short style code also has a class named Indent which has static constants for various indentation levels. The tall code also used that same class. Now it uses its own Indent class which is an actual enum. That's why much of the change in this commit is just removing an import. The existing uses of Indent silently switch over to the new enum in code_writer.dart.

It looks nice if we align binary operands when the first operand is on its own line, which is often the case after `=`, `:`, and `=>`: ```dart variable = operand + another; ``` This is also true for adjacent strings and conditional expressions. Previously, we had some somewhat hacky code that looked at the AST node surrounding an expression to determine if it should indent itself or not. This removes that in favor of using the new merging ability of Indent. We use a separate Indent type for assignment and infix operators and then merge those two together. This simplifies a bunch of things and lends itself to more uniform formatting since every AST node that lowers to a piece using Indent.infix will get this behavior. In fact, this commit highlighted that `is` and `as` expressions currently *aren't* formatted the same way as other binary operators, which I think was just an oversight. This commit still formats them differently, but I'll probably change that later. For now, it just leaves TODOs. This commit does change the formatter behavior very slightly, but only in obscure edge cases.

This is (hopefully) the big commit. It's largely a refactoring but also has some style changes. Most of the style changes in places where the prior formatting was unintentional or a compromise to the old architecture. The changes in this commit are: - Add a new "block" shape. Pieces that split into delimited lists (collection literals, blocks, argument lists) have block shape. - Change AssignPiece, CasePiece, ChainPiece, and ForInPiece to use that for handling their block-formatted child pieces. Previous commits already let a piece constrain its children to a subset of the allowed shapes, and now we use that to let them say not just whether not a newline is allowed, but what resulting shape the child must have. - Add some machinery to CodeWriter to control the shape of the resulting piece when a newline is written. Coming up with a nice API for that is pretty hard, but I think what I have here works OK. There is also some machinery to determine how a child piece's shape affects the resulting shape of the parent. It isn't used heavily yet but will probably be used more in future changes to tweak the style for more complex nested expressions. This commit has two negative consequences: - Perf is regressed roughly 15% across the board. That's not great, but it's not catastrophic. To avoid a new pathological corner case, I added a hack where nested `=>` expressions (which occur when someone really wants to write Haskell in Dart) always force the outer ones to split. Without that, curried code gets catastrophically slow. I have some ideas on how to get some performance back but I'll save that for later. - Some "headline"-shaped output is no longer supported. It's always been a goal of the formatter to support output like: ```dart variable = target .method() .another(); ``` Prior to this commit, that *sort of* worked for most expressions, but really only incidentally and you could run into places where it should work but didn't. Moving to shape-based constraints makes it uniformly *not* work. In a future commit, I plan to add another "headline" shape that will get code like this working again, reliably. On the plus side, the behavior of the formatter is overall more consistent with this change.

We now allow some non-block shaped pieces on the right side of an assignment without splitting if it those pieces are "headline"-shaped. That means they have a single first line that makes sense on the `=` line followed by other subordinate code. There are three kinds of expressions that support that: ```dart // Conditionals: variable = condition ? thenBranch : elseBranch; // Split method chains: variable = target .method() .another(); // (Also with unsplit properties): variable = target.some.props .method() .another(); // Multi-line strings: variable = ''' A very long string that is multiple lines.'''; ```

kallentu

I'm happy with the structural changes and the commits were nice to go through. Thanks for working on this, Bob.

lib/src/back_end/code_writer.dart

test/tall/invocation/chain_comment.stmt

lib/src/piece/assign.dart

(Most were fixed by #1687, but some were already better.) Close #900 (!). Close #1525. Close #1526. Close #1603. Close #1685.

I was trying out the new formatter on a corpus after #1687 and saw a bunch of diffs like: ```dart // Before: Widget( name: some.property.chain, ) // After: Widget( name: some .property .chain, ) ``` While the new headline style looks nice for most method call chains, it looks weird if it's only just a series of properties. In particular, a single `foo.bar` looks really silly when split. This PR bumps the split cost of a chain up if it only contains properties. That leads the solver to prefer splitting the surrounding construct when possible.

(Most were fixed by #1687, but some were already better.) Close #900 (!). Close #1525. Close #1526. Close #1603. Close #1685.

* Avoid splitting chains containing only properties. I was trying out the new formatter on a corpus after #1687 and saw a bunch of diffs like: ```dart // Before: Widget( name: some.property.chain, ) // After: Widget( name: some .property .chain, ) ``` While the new headline style looks nice for most method call chains, it looks weird if it's only just a series of properties. In particular, a single `foo.bar` looks really silly when split. This PR bumps the split cost of a chain up if it only contains properties. That leads the solver to prefer splitting the surrounding construct when possible. * Add test for property chain that still doesn't fit.

passsy · 2025-04-01T19:41:29Z

Interesting read, thanks for taking the extra time!

This is already fixed (probably by #1687). Close #1645.

This was already fixed by #1687. Close #1712.

munificent added 6 commits March 19, 2025 18:46

Fix double indentation of const pattern.

ccce7e8

munificent requested review from kallentu and natebosch March 28, 2025 01:45

Fix TODO label.

f68f173

kallentu approved these changes Mar 28, 2025

View reviewed changes

lib/src/back_end/code_writer.dart Show resolved Hide resolved

test/tall/invocation/chain_comment.stmt Show resolved Hide resolved

lib/src/piece/assign.dart Outdated Show resolved Hide resolved

Remove debug code.

9e48c35

munificent merged commit da2d0fa into main Mar 28, 2025
5 checks passed

munificent deleted the shape branch March 28, 2025 21:46

munificent added a commit that referenced this pull request Mar 28, 2025

Add regression tests for issues that are fixed now.

81519c9

(Most were fixed by #1687, but some were already better.) Close #900 (!). Close #1525. Close #1526. Close #1603. Close #1685.

munificent mentioned this pull request Mar 28, 2025

Add regression tests for issues that are fixed now. #1688

Merged

munificent mentioned this pull request Mar 28, 2025

Avoid splitting chains containing only properties. #1689

Merged

munificent added a commit that referenced this pull request Apr 1, 2025

Add regression tests for issues that are fixed now. (#1688)

496740a

(Most were fixed by #1687, but some were already better.) Close #900 (!). Close #1525. Close #1526. Close #1603. Close #1685.

This was referenced Apr 1, 2025

Keep qualified names tighter than the containing expression #1557

Open

Ability to use short style in newer Dart versions #1684

Closed

munificent added a commit that referenced this pull request Apr 2, 2025

Add regression test for #1645.

a75242f

This is already fixed (probably by #1687). Close #1645.

munificent mentioned this pull request Apr 2, 2025

Add regression test for #1645. #1697

Merged

munificent added a commit that referenced this pull request Apr 2, 2025

Add regression test for #1645. (#1697)

cd73f3e

This is already fixed (probably by #1687). Close #1645.

munificent added a commit that referenced this pull request Apr 24, 2025

Add regression test for #1712.

9727fb8

This was already fixed by #1687. Close #1712.

This was referenced Apr 24, 2025

Add regression test for #1712. #1715

Merged

Unnecessary split in parameter that receives an anonymous function #1712

Closed

munificent added a commit that referenced this pull request Apr 24, 2025

Add regression test for #1712. (#1715)

e7264fc

This was already fixed by #1687. Close #1712.

munificent added a commit that referenced this pull request Jul 28, 2025

Add regression test for #1712. (#1715)

226c90e

This was already fixed by #1687. Close #1712.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Smarter handling of "block" and "headline" formatting. #1687

Smarter handling of "block" and "headline" formatting. #1687

Uh oh!

munificent commented Mar 28, 2025

Uh oh!

kallentu left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

passsy commented Apr 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Smarter handling of "block" and "headline" formatting. #1687

Smarter handling of "block" and "headline" formatting. #1687

Uh oh!

Conversation

munificent commented Mar 28, 2025

Uh oh!

kallentu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

passsy commented Apr 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants