Skip to content

AST Printer vs Schema Printer? #1478

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
craigsmitham opened this issue Aug 24, 2018 · 13 comments
Closed

AST Printer vs Schema Printer? #1478

craigsmitham opened this issue Aug 24, 2018 · 13 comments

Comments

@craigsmitham
Copy link

The schema printer renders description block strings differently than the AST printer renders block string descriptions. I imagine there are a few other differences between the printers as well.

I'm implementing a GraphQL server based on GraphQL JS. For printing the schema, I first convert the schema to an AST, then re-use the AST printer. This means I have to modify some of the test cases because the implementation of the schema printer differs slightly (as in the case of descriptions) from the AST printer.

What is the rationale in GraphQL JS of having both a schema printer and an AST printer, and should GraphQL JS consider migrating to the use of the single AST printer (assuming there was a function to convert a schema to an AST)?

@IvanGoncharov IvanGoncharov added bug and removed bug labels Aug 24, 2018
@IvanGoncharov
Copy link
Member

@craigsmitham Can you please give concrete examples of such difference?

@craigsmitham
Copy link
Author

@IvanGoncharov specifically, printBlockString (AST printer), has different behavior than printDescription.

I discovered this by implementing the buildASTSchema tests. The "supports descriptions" test case renders descriptions in a single line (e.g. """description""" because it uses the schema printer, but if you use the AST printer's printBlockString function, these comments will be rendered with newlines wrapping the description value rendering like so:

"""
description
"""

Reviewing the printDescription function in the schemaPrinter, I can see additional logic (such as line length, etc) that is not being considered in the printBlockString function.

It seems like dead weight to maintain two different printer implementations, is there a reason the schema printer doesn't just convert the schema to an AST and use the schema printer?

@craigsmitham
Copy link
Author

To be clear, I'm not making a feature request - just trying to understand the rationale for having two different printers. My guess is that two different printers are a historical artifact because converting to an AST from a schema was not feasible, or if feasible, may not be desirable. If so - I'd just like to know why. If it is feasible and desirable to consolidate printer implementations - I'd like to know that too so I can know better which route to take with my GraphQL implementation.

@IvanGoncharov
Copy link
Member

IvanGoncharov commented Aug 24, 2018

To be clear, I'm not making a feature request - just trying to understand the rationale for having two different printers.

Disclaimer: I'm not an author of this code.
For me, it's a question of KISS vs DRY. I work a lot with AST so my estimate is that it will be 1.5-2x increase in code size to switch schema printer to AST + it will introduce a bunch of utility functions to create nodes and this will affect code readability.

Another factor is that Relay and other tools use SDL for storing schema and since some schemas are huge (e.g. Facebook has 10000+ types) it's not very efficient to create AST as an intermediate step since it will waste a lot of CPU and memory.

That said if someone show schema printer implementation with AST output that as fast and simple as ours I don't see any reason to refuse this approach.

@craigsmitham
Copy link
Author

craigsmitham commented Aug 24, 2018

@IvanGoncharov I don't think there would be any code size increase, as printer.js already supports printing SDL. So, it would be a net decrease. With regard to performance, because printer.js supports SDL, the KISS and DRY approach would be to use it for printing. Even if it did cause a slowdown in performance, wouldn't the KISS and DRY gains be worth it? I don't think schema printing is on the critical paths for any tools or execution.

@IvanGoncharov
Copy link
Member

I don't think there would be any code size increase, as printer.js already supports printing SDL

printer.js supports printing AST nodes to SDL but you need to generate those AST nodes so you need to write code that creates AST from GraphQL* classes.

@craigsmitham
Copy link
Author

craigsmitham commented Aug 24, 2018

@IvanGoncharov good point, forgot that. In my case I've found that code to be pretty trivial and I think having the schema -> AST capability seems reasonable so I've wondered why it wasn't already in GraphQL JS. My guess is that since SDL only became part of the spec recently, approaches to dealing with SDL were bifurcated early on but there is a lot of opportunity to consolidate and re-use existing mechanisms now that SDL is part of the spec.

I have the same question about validation. It seems some validation rules are being implemented for SDL using AST visitors, but there is a seperate schema validator as well. It appears validating an SDL based on the AST is where the codebase is heading. If so, there will again likely be a lot of overlap with the existing schema validator. Now, with validation I can see that there will be some schema validation functions that would require a schema instance, but a lot of them could be validated using an SDL AST.

I think the direction I'm going to head with my implementation is to use the AST for printing the SDL and validating the schema. It will be interesting to see what % of cases I run into any issues with this approach and need to fall back to validating the schema instance directly.

At this point, I'm really just looking for validation/understanding that this is the right direction to go in or if I'm missing out on some critical considerations.

I wonder if @leebyron can add any additional insight or historical perspective.

@IvanGoncharov
Copy link
Member

IvanGoncharov commented Aug 24, 2018

@craigsmitham I took a closer look at printSchema implementation and it looks like you are right its pretty easy to generate AST since it's structure is very similar to GraphQL* classes. We will still have a significant memory overhead but I agree that it's not a very critical path.
I think I found a real reason why it wasn't done previously:

* - commentDescriptions:
* Provide true to use preceding comments as the description.

You can't put comments into AST so in the pre-release version of SDL you will lose all descriptions.
We decided to support pre-release SDL until 16.0.0 so until then it's not an option 😞

@craigsmitham
Copy link
Author

@IvanGoncharov ah, that makes sense. I wasn't planning on implementing support for legacy features like comments as descriptions, so that's why that case didn't occur to me. OK, so I'm more confidant now in using the AST for printing now that I know that one of the reasons the schemaPrinter is needed is for legacy features, but is less likely to be needed post 16.0.0.

Any thoughts on validation? It seems the same pattern applies to SDL validation. Some rules have been introduced for SDL AST validaitons, and it seems most test cases for the schema validator could be translated into AST SDL validation rules. The big advantage (and current gap) to this approach is that you will have more validations available when analyzing the AST - and a single source for schema validation logic.

@IvanGoncharov
Copy link
Member

IvanGoncharov commented Aug 24, 2018

Any thoughts on validation?

@craigsmitham You can split validation rules into a few different groups:

  1. SDL specific. For example:
type Query {
  foo: String
  foo: Float
}

It's not affecting GraphQLObjectType since you provide fields as a map:

new GraphQLSchema({
  name: 'Query',
  fields: {
    foo: {
      type: GraphQLString,
    }
  }
})

Another such example is that GraphQLSchema doesn't allow duplicating types:

'Schema must contain unique named types but contains multiple ' +
`types named "${type.name}".`,

Same goes for validating unknow types, defaultValues, etc.

  1. Directive related validations. Since directives are scoped to SDL document and not presented in GraphQLSchema you can validate them only in SDL.

  2. Checks that make sense only on a fully constructed schema. For example, the following is valid SDL:

type Query

But if you do:

let schema = buildSchema('type Query');

buildSchema will succeed but schema will fail validation
But if you then do:

schema = extendSchema(schema, parse(`
  extend type Query {
    foo: String
  }
`));

schema become valid.
That's why validateSchema is only called before execute or other function which expects a fully constructed schema for the same reason we can't apply the same checks on SDL level.

  1. Checks that can be done both on SDL and GraphQLSchema. It's actually the smallest category but even for them you need to keep validating GraphQLSchema because:

Here is a potential workflow that explains why we need final validation done on GraphQLSchema:

  1. validate SDL to ensure that directives are used correctly (known directives in correct locations, all required args, etc.) and buildSchema will succeed (no duplicating fields, correct defaultValues, etc.)
  2. buildSchema
  3. apply transformations based on directives
    ...
  4. validate extension SDL
  5. extend the schema
  6. apply transformations based on directives in extension SDL
    ...
  7. schema is passed to graphql or similar function
  8. validate schema to ensure it's fully constructed and valid after all transformations

@craigsmitham
Copy link
Author

craigsmitham commented Aug 24, 2018

@IvanGoncharov this all kind of makes sense assuming you don't have a schema -> AST function. If you did, your final example, you would simply construct the SDL AST from the final schema (or at whatever step in the process you need), and validate it with the appropriate SDL rules (using the AST validator).

It seems the current approach favors using the schema validator unless AST validation is necessary. What I'm saying is that if you want to apply all the schema validation rules (not just a minimally valid SDL) to an AST, it's not currently possible. So, when you say "Checks that make sense only on a fully constructed schema", I can imagine a few tooling scenarios where you would want these checks to run on SDL prior to schema construction.

The bottom line question for me regarding validation (as it is with printing), is whether there are any downsides (other than additional memory overhead) for using the AST/visitors as the primary source for for all validation logic? My current working hypothesis is that this is indeed the desirable approach, given the increased flexibility and ease-of-use of validating/printing ASTs using the same mechanisms, and that there is little need for validation of the schema instance itself - especially as the schema object model has mechanisms (like maps, invariant guards, etcs) that inherently validate itself.

@OlegIlyenko - curious if you have any perspective on these questions given the more type-safe nature of Scala. The more type-safe I make the schema implementation, the less need for validation - but I still want it for the AST.

@mjmahone
Copy link
Contributor

@craigsmitham I've had this same inkling: if you are operating with an AST schema and AST executables, then it would be nice to be able to do all validation without building up a GraphQLSchema class.

It's certainly not necessary, but might be a nice feature addition. Personally I would prefer us doing all validation on pure AST, but we'd probably want to improve our internal validation contexts a bit to handle, and cache, lookups better. Without this, we'll be doing O(n) lookups in many places, repeatedly, causing some of our validation rules to grow to O(n^2) (or O(n*m), where n and m are certain types of AST nodes).

@craigsmitham
Copy link
Author

@mjmahone the approach I'm taking right now is to implement schema validation in the exact same pattern as the AST SDL validators. So far everything seems to be going well from a functional perspective, although it will take some time to see what the performance implications and remedies are. @mjmahone and @IvanGoncharov thank you for your comments and feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants