Skip to content

Improve JSON/YAML parsing errors #1090

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
JordanMartinez opened this issue Oct 16, 2023 · 15 comments · Fixed by #1227
Closed

Improve JSON/YAML parsing errors #1090

JordanMartinez opened this issue Oct 16, 2023 · 15 comments · Fixed by #1227
Assignees

Comments

@JordanMartinez
Copy link
Contributor

JordanMartinez commented Oct 16, 2023

See #1057 and #1086 for context.

json-codecs (in its current latest version, which is not the same version of the library used by language-purescript at the moment) differs from codec-argonaut/argonaut-codec in the following ways:

  1. it allows custom error messages unlike both libraries JsonDecodeError
  2. it allows one to easily define the order in which fields are encoded to an object (I think codec-arognaut supports this but not argonaut-codec) using record syntax
  3. It allows for field renaming both when encoding and decoding
  4. unlike argonaut-codec but similar to codec-argonaut, it's value-based codecs
  5. it's not bidirectional like codec-argonaut, but I could add bidirectional codecs

See the test code and its current output

The library isn't the most well-tested it could be (e.g. are the otherwise unidirectional encoder/decoders for a given type actually bidirectional if used together). But would there be interest in this?

@f-f
Copy link
Member

f-f commented Oct 16, 2023

I'll first note that we are using codec-argonaut mostly because it's what the Registry uses - that allows us to reuse a lot of codecs. It would be easier to move if the Registry did as well.

Now - I thought about switching JSON library to improve error messages, but I'm not convinced this move would buy us much?
The error messages that are coming out of your library are a bit nicer out of the box, but not an order of magnitude better - I have the feeling we could achieve something similar by writing a custom printer for JsonDecodeError, or pattern matching on some known failures to return nice errors, both of which should be considerably less work than migrating to a new library and adding custom things on top of that.

To sum up: I am not opposed to switching to something better, but I'd like to see us try to improve things with the current library first, and if these attempts look like failures then the new thing should have some clear way to address them.

@JordanMartinez
Copy link
Contributor Author

Makes sense. I think a custom printJsonDecodeError could work. I'm not sure about the 'pattern matching on some known failures' part because that seems like it'll be hacky one way or another. But definitely a much stopgap solution than updating all the codecs to json-codecs.

@f-f
Copy link
Member

f-f commented Oct 17, 2023

I think we'll need both (the matching, and the custom printer) to handle this stuff. E.g. for #1057, we get this JsonDecodeError if there's a publish field, but it doesn't contain the right stuff:

  Under 'Config':
  At object key package:
  Under 'PackageConfig':
  At object key publish:
  Under 'PublishConfig':
  At object key license:
  No value was found.

That's a Named "Config" (AtKey "package" (Named "PackageConfig" (AtKey "publish" (Named "PublishConfig" (AtKey "license" MissingValue))))).

If we'd match on the Named "Config" we could have a nice Your spago.yaml configuration is not correct: , to which we could append the result of the custom printer the object 'package.publish' does not contain the key 'license'


Then I think #1086 is a little harder - we get a cryptic

Under 'Config':
At object key package:
Under 'PackageConfig':
At object key dependencies:
At array index 2:
Expected value of type 'Object'.

This is not legible for two reasons (as @finnhodgkin noted):

  1. the codec for PackageName swallows the parsing error - see here
  2. we parse with an alternative since we can find either a string or an object in there:

    spago/core/src/Config.purs

    Lines 266 to 268 in d398cba

    decode json =
    map Left (CA.decode PackageName.codec json)
    <|> map Right (CA.decode packageSingletonCodec json)

    This means that even if PackageName would return a sensible parsing error we'd still lose it since codecs-argonaut does not accumulate errors.

I think for (1) we could wrap the parse error in a TypeMismatch, but I am not entirely sure how to tackle (2). Maybe there's a way to restructure our parser so that we get better errors?

I wonder if @garyb or @thomashoneyman have any insight on this?

@f-f f-f changed the title Migrate codecs to json-codecs? Improve JSON errors Oct 17, 2023
@f-f f-f changed the title Improve JSON errors Improve JSON/YAML parsing errors Oct 17, 2023
@garyb
Copy link
Member

garyb commented Oct 17, 2023

Hmm... to accumulate the errors you'd have to write a manual codec that tries both branches and then checks if they both fail, and if so constructs a new error message. Even after that, with the current structure of the error message type there's not really a good fit for the constructor that message would go into unfortunately. Re-error-typing the codecs is possible, but it would mean essentially reimplementing them all, or having to call a wrapper function around them all to make them compatible.

I do intend to change things a little bit for the purescript-json version of the Codec-based stuff, I was thinking of just having the errors be a path paired with a plain string, so the structure is there if needed but the message itself can be whatever. Migrating to that would probably be easier than json-codecs because I intend to keep everything else the same as the current codecs, so it should just be a case of swapping the package names, and only code that produced or dealt with errors would need changing.

I've been working on my house lately so the JSON stuff has been on the backburner, but if it's something that you're interested in I could set aside some time this weekend to wrap that up instead.

@CharlesTaylor7
Copy link
Collaborator

CharlesTaylor7 commented Oct 17, 2023

PackageName is a newtype around string.

So can we use our own codec for that?

Something like wrapIso PackageName (CA.string) wouldn't swallow errors, right?

@garyb
Copy link
Member

garyb commented Oct 17, 2023

The error swallowing in question is due to the <|> between PackageName.codec and packageSingletonCodec - it'll only be producing the error from packageSingletonCodec.

@f-f
Copy link
Member

f-f commented Oct 17, 2023

I've been working on my house lately so the JSON stuff has been on the backburner, but if it's something that you're interested in I could set aside some time this weekend to wrap that up instead.

That would be lovely! ❤️ Wish I could lend you a hand on the house works 😄

@garyb
Copy link
Member

garyb commented Oct 22, 2023

It lives: https://github.com/garyb/purescript-codec-json

There are some changes from codec-argonaut:

  • it uses purescript-json (perhaps a bit redundant to point that out smile:)
  • inside the codec is ExceptT rather than Either now
    • so we can use the error-accumulating Alternative instance (!)
    • if there are places that were importing decode from Data.Codec it can be switched to the decode from Data.Codec.JSON to avoid the need to runExcept the result
  • I didn't bring the Compat or Migration modules across (but could do if we end up needing them)
  • You're not forced to supply a name in codecs for objects/records and indexed arrays now (you can still use named to introduce one though). Not forcing names for every level of structure lets the errors combine better, and I've encountered a few cases in my own code where I don't always want to name a thing if it's a record nested in another, things like that.

The revised error structure I mentioned ended up gaining a little more complexity than my "just path + message" proposal, there's also a "causes" that allows errors to be given context (like type names, or when grouping errors for failed alternatives). Here's an example of the kind of thing it prints for an error now where I used <|> to combine 3 codecs:

Could not decode IntTuple:
  Failed to decode alternatives:
    - Expected value of type Array
    - $.fst: No value found
    - $.a.value: No value found
    - $.a.val: No value found

The setup that produces that is in the tests.

The print function for errors does some work to try and make the output reasonably pleasant to interpret. Each line can be prefixed by a JSONPath style expression that points to the point in the JSON that the error arose at, with the paths being made relative to the outer error they're showing for. If a path is empty / JP.Tip / $ rendering it is skipped. etc. You can see the structure of the error in the tests and compare with the rendering here to get a feel for what it's doing 🙂.

Let me know what you think!

@f-f
Copy link
Member

f-f commented Oct 23, 2023

Thanks @garyb, this looks great! ❤️
And it sounds like it would fix all of our current issues. I think the first step would be to port the Registry codecs if @thomashoneyman is on board with this (we'll need Compat.maybe, but we could just put it in the registry-lib for now)

@thomashoneyman
Copy link
Member

I'm on board. Can do a review of the registry codecs and update their errors after I finish a couple other things, unless this is urgent.

@garyb
Copy link
Member

garyb commented Oct 23, 2023

Ah yeah, I was thinking I might add a Compat.maybe style codec to Common under a different name, since I think that's almost the only thing anyone uses from Compat anyway. 😉

@f-f
Copy link
Member

f-f commented Oct 23, 2023

Common.nullable?

@garyb
Copy link
Member

garyb commented Oct 23, 2023

👍 exactly what I had in mind!

@garyb
Copy link
Member

garyb commented Oct 23, 2023

I've pushed a new version with that added now.

@f-f f-f added this to the spago-next beta bugs milestone Dec 10, 2023
@f-f f-f self-assigned this Jun 3, 2024
@f-f
Copy link
Member

f-f commented Jun 3, 2024

I have this almost ready, with patches for codec-json, the registry, and spago itself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants