Skip to content

[SwiftLexicalScopes][GSoC] Add simple scope queries and initial name lookup API structure #2696

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jul 5, 2024

Conversation

MAJKFL
Copy link
Contributor

@MAJKFL MAJKFL commented Jun 23, 2024

This is an initial PR for the SwiftLexicalScopes, developed as one of the GSoC 2024 projects.

In this PR:

  • Introduced three basic queries lookupLabeledStmts, lookupFallthroughSourceAndDest and lookupCatchNode.
  • Added test functions that will guide development of the full scope name lookup.
  • Created initial structure of the lexical scope API hierarchy with FileScope as the root.

This is my first contribution to the project, so I would greatly appreciate if someone could double-check everything. Thank you!

@MAJKFL MAJKFL requested review from ahoppen and bnbarham as code owners June 23, 2024 17:30
@ahoppen ahoppen requested a review from DougGregor June 24, 2024 17:25
Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wohooo. Exciting to see this happen. I’ve left a few comments inline. The overall direction that we’re going in is great. And thanks for all the test cases.


var sourceSyntax: SyntaxProtocol { get }

func getDeclarationFor(name: String, at syntax: SyntaxProtocol) -> Syntax?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should call this lookup and have it return an array of DeclSyntax nodes. It is possible to shadow declarations across Scopes in Swift, and I would expect a lookup in a nested scope to be able to see both declarations. That makes it the client's responsibility to overlay a tie-breaking policy (e.g. deepest decl wins)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I think it could be a great idea to include both of them, since when looking up with shadowing we could stop as soon as we find a name match adding a bit of optimization. Something like lookup and lookupWithShadowing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even without considering shadowing, it's possible to find multiple declarations that would need to be returned. This can happen with overloaded local functions, like this:

func f() {
  func g(x: Int) { }
  func g(x: Double) { }

  g(x: 3)
  g(x: 3.14159)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I haven’t thought about this case. You’re right, thanks for pointing this out. I adjusted the Scope protocol accordingly.

@CodaFi
Copy link
Contributor

CodaFi commented Jun 24, 2024

I think that this API would emphasize syntax over scope and add entrypoints as such.

My suggestion would be

protocol ScopeSyntax: SyntaxProtocol {
  func parent() -> (any ScopeSyntax)?
  func lookup(_ name: String) -> [DeclSyntax]
}

extension SourceFileSyntax: ScopeSyntax { /**/ }
extension CodeBlockSyntax: ScopeSyntax { /**/ }
/*etc*/

Internally, we can project a Scope tree over the syntax node and perform the lookups against that tree. But I'm not sure it's a good idea to expose the scope tree to clients.

@MAJKFL
Copy link
Contributor Author

MAJKFL commented Jun 25, 2024

I think that this API would emphasize syntax over scope and add entrypoints as such.

My suggestion would be

protocol ScopeSyntax: SyntaxProtocol {
  func parent() -> (any ScopeSyntax)?
  func lookup(_ name: String) -> [DeclSyntax]
}

extension SourceFileSyntax: ScopeSyntax { /**/ }
extension CodeBlockSyntax: ScopeSyntax { /**/ }
/*etc*/

Internally, we can project a Scope tree over the syntax node and perform the lookups against that tree. But I'm not sure it's a good idea to expose the scope tree to clients.

I think it would be a great idea to make syntax in the center of those APIs, but I’m not entirely sure how would this work with more complex scopes like with function declarations. If we’d like to project a scope onto the FunctionDeclSyntax we’d have to include some more complex logic handling name lookup for e.g. generic parameters.

Compiler right now for this invalid snippet:

func foo<T1, T2: T1>(a: T1, b: T2) {
    print(a, b)
}

Produces this scope map:

***Complete scope map***
ASTSourceFileScope 0x14d13d000, [1:1 - 4:1] 'test.swift'
`-AbstractFunctionDeclScope 0x14d170020, [1:1 - 3:1] 'foo(a:b:)'
  `-GenericParamScope 0x14d139378, [1:1 - 3:1] param 0 'T1'
    `-GenericParamScope 0x14d139378, [1:1 - 3:1] param 1 'T2 : T1'
      |-ParameterListScope 0x14d139580, [1:21 - 1:34]
      `-FunctionBodyScope 0x14d170020, [1:36 - 3:1]
        `-BraceStmtScope 0x14d1701e8, [1:36 - 3:1]

The parameter T1 should be recognized during lookup as the first generic parameter.

I assume that’s why there are separate AbstractFunctionDeclScope and FunctionBodyScope so there’s a clear path for recursive lookup i.e. BraceStmtScope -> FunctionBodyScope -> GenericParamScope -> GenericParamScope -> AbstractFunctionScope -> …. If we’d consolidate this functionality into one SyntaxScope this would require some complex logic for handling lookup at generic parameters differently than in e.g. function body. I started prototyping how a structure resembling the compiler implementation could work in this case with inner (FunctionBodyScope) and outer (AbstractFunctionDeclScope) scopes both associated to FunctionDeclSyntax here.

What about a hybrid approach? We could still use the custom structure mapped over the existing syntax nodes internally and additionally introduce a new ScopeSyntax protocol that would add entry points for the API at the nodes themselves (that is move the functionality of LexicalScopes into the syntax nodes). We would retain the flexible scope structure and still emphasize syntax nodes as the entry points for the API.

It’s possible I’m wrong and I missed some detail so I’d appreciate to hear your take on this. I’d also like to hear from @DougGregor what he thinks about it.

@MAJKFL MAJKFL requested a review from ahoppen June 25, 2024 13:32
@CodaFi
Copy link
Contributor

CodaFi commented Jun 28, 2024

I assume that’s why there are separate AbstractFunctionDeclScope and FunctionBodyScope so there’s a clear path for recursive lookup i.e. BraceStmtScope -> FunctionBodyScope -> GenericParamScope -> GenericParamScope -> AbstractFunctionScope ->

This is rather because ASTScope is designed in a decidedly much more object-oriented fashion than the rest of the compiler. I consider this to be an abstraction leak of the tree, but the structure of the model is nonetheless sound to your point. My preference is still for a Syntax-oriented API.*

What about a hybrid approach?

I agree! Consider the following:

struct Scope {
  var parent: Scope?
  /* etc. */
}

protocol ScopeSyntax: SyntaxProtocol {
  var scope: Scope { get } // project this node into a `Scope`
  func lookup(_ name: String, in scope: Scope) -> [DeclSyntax]
}

extension ScopeSyntax {
  func lookup(_ name: String) -> [DeclSyntax] {
    self.lookup(name, in: self.scope)
  }
}

That would handle a client that e.g. wants to perform unqualified lookup for a generic parameter by

  1. Walking from the FunctionDeclSyntax node to the GenericParameterListSyntax node
  2. Projecting its generic parameter list
  3. Calling functionDecl.lookup("x", in: genericParameterList.scope)

As for capturing lookup from a particular part of the generic parameter list, you could implement the projection of the syntax for a generic parameter to a Scope by storing an index into the enclosing generic parameter list and the generic parameter list itself.

*To recover the behavioral dispatch points that the object orientation is really needed for, we just need to have Scopes carry a Syntax node. Then you can branch on the kind. If you want to get fancy about it, you can create a sub-enum capturing all of the scope-aware syntax nodes, store that as a field, and branch on that.

@CodaFi
Copy link
Contributor

CodaFi commented Jun 28, 2024

Just a note while this is on my mind: We also have to be careful here to look at ASTScope as a model because its projection function is top-down (root to leaf), where as a Scope is (presumably) projected bottom-up (leaf to root). There are interesting things to consider in the bottom-up case like

guard let x = foo() else {
  return bar
}

Projecting the scope of the else brace statement has to be careful to walk through the scopes defined in its binding condition list and demand the scope of the next-most syntactic enclosing scope. Another way to put it - you could implement bottom-up scope projection by linking the else branch to its binders and marking them as "holes" that a recursive lookup procedure should merely traverse but not collect declarations from.

@CodaFi
Copy link
Contributor

CodaFi commented Jun 28, 2024

Another passing thought: If a Scope is sufficiently opaque, you can avoid allocating a Scope tree at all/modeling it with separate class references by having the parent node point to a ScopeSyntax. A Scope would then just be a value-typed handle returned as the result of projecting up the ScopeSyntax parts of one particular branching path of the syntax tree. Heck, you could even have the Scope value carry the set of local declarations along with its parent pointer. The implementation of lookup falls out neatly as a tree walk that joins results from each Scope it encounters.

@MAJKFL
Copy link
Contributor Author

MAJKFL commented Jul 1, 2024

Another passing thought: If a Scope is sufficiently opaque, you can avoid allocating a Scope tree at all/modeling it with separate class references by having the parent node point to a ScopeSyntax.

I tried remodeling the name lookup with shadowing for function declaration with just ScopeSyntax nodes and I think it could be a good idea to follow this lead. Specifying lookup caller fixes the problem with name lookup at generic parameters and function parameter list. Additionally, it could also potentially serve as a good starting point for lookup in CodeBlockSyntax to filter the declarations according to their relative position.

As for capturing lookup from a particular part of the generic parameter list, you could implement the projection of the syntax for a generic parameter to a Scope by storing an index into the enclosing generic parameter list and the generic parameter list itself.

I tried a bit different approach with generic parameter scopes. I followed the hierarchical structure of ASTScope to make the API more uniform across the lookup and not worry about keeping the indexes of generic parameters. This way, each GenericParameterSyntax implements ScopeSyntax. On the other hand, it complicates a bit more the lookup on the FunctionDeclSyntax side, but I think it’s a great tradeoff.

Also, as for now, I’m just returning the values as TokenSyntax nodes. Since the lookup could also result in variable declarations, function parameters, function name, generic parameters etc. Is there a way to represent them uniformly as DeclSyntax as you suggested?

I’d love to hear your feedback @CodaFi about the adjusted structure. I tried to follow your suggestions with these two adjustments, so I hope I haven’t missed anything.


var sourceSyntax: SyntaxProtocol { get }

func getDeclarationFor(name: String, at syntax: SyntaxProtocol) -> Syntax?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even without considering shadowing, it's possible to find multiple declarations that would need to be returned. This can happen with overloaded local functions, like this:

func f() {
  func g(x: Int) { }
  func g(x: Double) { }

  g(x: 3)
  g(x: 3.14159)
}


/// Helper method to recursively collect labeled statements from the syntax node's parents.
private func lookupLabeledStmtsHelper(at syntax: Syntax?) -> [LabeledStmtSyntax] {
guard let syntax, !syntax.is(MemberBlockSyntax.self) else { return [] }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to avoid walking up past closure or function bodies here, e.g.,

func f() {
  label1:
    for x in y {
      z = x.map {
        // label1 is not visible from here
      }
   }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I added them with respective test cases.


/// Given syntax node position, returns the current switch case and it's fallthrough destination.
func lookupFallthroughSourceAndDestination(at syntax: SyntaxProtocol) -> (SwitchCaseSyntax?, SwitchCaseSyntax?) {
guard let originalSwitchCase = syntax.ancestorOrSelf(mapping: { $0.as(SwitchCaseSyntax.self) }) else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, this also should probably stop at closure and function boundaries. There might be a more general "walk up the parent tree until you get out of the current function" operation you'll want to use both here and for looking up labeled statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a generic helper method walkParentTree. It takes an array of syntax node types the lookup should terminate at and specifies type of nodes to collect to make it more reusable in the future. We can then use it to create more specialized functions like walkParentTreeUpToFunctionBoundary.

Comment on lines 93 to 99
if let thisCase = child.as(SwitchCaseSyntax.self) {
if thisCase.id == switchCaseSyntax.id {
visitedOriginalCase = true
} else if visitedOriginalCase {
return thisCase
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had in fact not considered how to handle #if clauses. I don't want to rely on any kind of syntactic preprocessing of the syntax tree to remove the #ifs: we need this query to work on the original, unmodified syntax tree.

So, it seems like we will need something like #1816 to have any chance of making lookupNextSwitchCase(at:) work when there's an #if around the case. When we do get #1816, we'll need to parameterize this operation by a BuildConfiguration. I'm okay with waiting until #1816 lands to make that change, and just... pick the first one or something... if there are #ifs around the following case.

} else {
return lookupCatchNodeHelper(at: syntax.parent, traversedCatchClause: traversedCatchClause)
}
case .functionDecl(let functionDecl):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to handle initializers, closures, and accessors, I presume?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I added those cases. Should we also include source file? Does compiler expect that as a result when we have a try in top level code?

Comment on lines 27 to 32
func assertLexicalScopeQuery(
source: String,
methodUnderTest: (SyntaxProtocol) -> ([SyntaxProtocol?]),
expected: [String: [String?]]
) {
// Extract markers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wonderful for writing tests, thank you!

}
case .functionDecl(let functionDecl):
if functionDecl.signature.effectSpecifiers?.throwsClause != nil {
return syntax
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have the throwsClosure != nil check here: the catch node is always the function, regardless of whether it currently specifies that it throws or not. The type checker can figure out whether it should have been throws.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for letting me know. I removed the condition and adjusted one of the tests.

Comment on lines 18 to 21
@_spi(Compiler) @_spi(Testing) public static func lookupLabeledStmts(at syntax: SyntaxProtocol) -> [LabeledStmtSyntax] {
guard let scope = syntax.scope else { return [] }
return scope.lookupLabeledStmts(at: syntax)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than the LexicalScopes as a namespace, how about putting these operations directly on SyntaxProtocol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not entirely sure how much I can adjust the functions already used by the compiler, that’s why I put it in LexicalScopes with similar parameter that indicated location to start lookup. I moved them now to SyntaxProtocol so clients can just use syntaxNode.lookupLabeledStmts().

@MAJKFL MAJKFL requested a review from DougGregor July 2, 2024 16:18

extension SyntaxProtocol {
/// Scope at the syntax node. Could be inherited from parent or introduced at the node.
var scope: Scope? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property is effectively always returning a FileScope for the outermost SourceFileSyntax.

} else {
return lookupCatchNodeHelper(at: syntax.parent, traversedCatchClause: traversedCatchClause)
}
case .functionDecl, .accessorDecl, .initializerDecl:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also deinits and closures.

import Foundation
import SwiftSyntax

class FileScope: Scope {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileScope isn't doing anything. I think it should be removed.

Comment on lines 29 to 35
protocol Scope {
/// The parent of this scope.
var parent: Scope? { get }

/// Syntax node that introduces this protocol.
var sourceSyntax: SyntaxProtocol { get }
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, Scope isn't doing anything any more. Can we remove it, and put the various operations on SyntaxProtocol directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a leftover from the original API structure I wanted to implement. I moved now all the query methods to SyntaxProtocol and removed parts related to the Scope protocol.

Package.swift Outdated
@@ -28,6 +28,7 @@ let package = Package(
.library(name: "SwiftSyntaxMacros", targets: ["SwiftSyntaxMacros"]),
.library(name: "SwiftSyntaxMacroExpansion", targets: ["SwiftSyntaxMacroExpansion"]),
.library(name: "SwiftSyntaxMacrosTestSupport", targets: ["SwiftSyntaxMacrosTestSupport"]),
.library(name: "SwiftLexicalScopes", targets: ["SwiftLexicalScopes"]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should talk about the name here a little more. I think I suggested SwiftLexicalScopes initially, but given that the "scope" part is actually getting somewhat downplayed... should we call this SwiftLexicalLookup? All of these operations are, essentially, lexical lookups for various kinds of things---lookups for declarations, labeled statements, where a fallthrough ends up---and most will have "lookup" in their names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I also like the idea. I changed it to SwiftLexicalLookup. It’s more broad and not limiting the library to just scope lookup could pave the way for including a broader range of APIs, like the break and continue lookup you’ve mentioned yesterday that is not necessarily part of ASTScope as far as I remember. Maybe it would be a good idea to use this library in the long run to consolidate this sort of logic in one codebase?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is the right place to consolidate all of the lexical lookup logic over time.

1️⃣func foo() rethrows {
2️⃣do {
3️⃣do {
try 4️⃣f()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth testing try x + y and try! x + y, where we look up from the y, to make sure we get the appropriate catch node. We just tripped over this same issue with the C++ implementation .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a bug here indeed. Thanks for pointing this out! I added a new case for ExprListSyntax checking if the first expression is a TryExprSyntax with question or exclamation mark. I also added an appropriate test case.

…xicalLookup`. Fix bug with expression list after `try` keyword.
@MAJKFL MAJKFL requested a review from DougGregor July 3, 2024 09:26
Copy link
Member

@DougGregor DougGregor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! We are very, very close, but I'd like to polish this up just a little bit more before we land this PR.

@DougGregor
Copy link
Member

@MAJKFL looks like Package.swift has a conflict. If you're able to resolve that and run formatting locally (if you haven't already), I think we're ready to kick off testing

@DougGregor
Copy link
Member

@swift-ci please test

@DougGregor
Copy link
Member

@swift-ci please test

1 similar comment
@DougGregor
Copy link
Member

@swift-ci please test

@DougGregor
Copy link
Member

@swift-ci please test Windows

@DougGregor
Copy link
Member

macOS failure is this:

Error: .spi.yml did not contain the same libraries as Package.swift:
 - Missing in .spi.yml: SwiftLexicalLookup

@DougGregor
Copy link
Member

@swift-ci please test

@MAJKFL
Copy link
Contributor Author

MAJKFL commented Jul 4, 2024

macOS failure is this:

Error: .spi.yml did not contain the same libraries as Package.swift:
 - Missing in .spi.yml: SwiftLexicalLookup

I added SwiftLexicalLookup to the .spi.yml file. I hope it's fixed now.

@DougGregor
Copy link
Member

@swift-ci please test

@DougGregor
Copy link
Member

That looks right to me. Kicked off CI and let's hope for the green checks

@DougGregor
Copy link
Member

Feel free to merge if it passes

Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the new API. I have a few more comments, mostly around clarity.

@MAJKFL MAJKFL requested review from ahoppen and DougGregor July 4, 2024 14:06
@DougGregor
Copy link
Member

@swift-ci please test

@DougGregor
Copy link
Member

@swift-ci please test Windows

Copy link
Member

@DougGregor DougGregor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say "let's merge!"

)
}

/// Callect syntax nodes matching the collection type up until encountering one of the specified syntax nodes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Callect syntax nodes matching the collection type up until encountering one of the specified syntax nodes.
/// Collect syntax nodes matching the collection type up until encountering one of the specified syntax nodes.

collectNodesOfTypeUpToFunctionBoundary(type, stopWithFirstMatch: true).first
}

/// Collect syntax nodes matching the collection type up until encountering one of the specified syntax nodes. The nodes in the array are inside out, with the innermost node being the first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also wrap this comment at 120 columns?


// MARK: - lookupCatchNode

/// Given syntax node location, finds where an error could be caught. If `traverseCatchClause` is set to `true` lookup will skip the next do statement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, 120 columns

import XCTest
import _SwiftSyntaxTestSupport

/// `methodUnderTest` is called with the token at every position marker in the keys of `expected`. It then asserts that the positions of the syntax nodes returned by `methodUnderTest` are the values in `expected`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this comment would be good to keep at 120 columns.

Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more nitpicky format comments, otherwise looks good to me. Thank you @MAJKFL!

@MAJKFL
Copy link
Contributor Author

MAJKFL commented Jul 5, 2024

Is it ok if we postpone the formatting changes to the next PR to not trigger CI for a couple of comments? I'll remember to include them. If you think the PR is good enough, could you please merge it? I see I would need write access for this repo to do so.

@DougGregor
Copy link
Member

I'll merge and we can pick up the formatting tweaks in another PR

@DougGregor DougGregor merged commit 9cc77ac into swiftlang:main Jul 5, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants