Strict whitespace adherence option for JSON #165
Replies: 2 comments
-
Hi @CoffeeVampir3! See the docs here for info on how to specify whitespace constraints on JSON outputs: [...]

Note that we do not (yet!) support "pretty-print" style constraints (i.e. setting some fixed indentation strategy like what you'd get from [...]).

Now for some unsolicited thoughts / opinions: I'd actually wager that forcing whitespace in a way that disagrees with "what the LM wants to do on its own" is going to degrade your output quality. Hopefully any such effect would be marginal, but any time your constraints disagree with the LM itself, you risk throwing it off-distribution. That being said, if you are using [...]

Aside from quality, pretty-printed JSON is going to have much higher token usage than what your LM already wants to do. At the other extreme, you can force an even more "compact" output with no whitespace at all. But again... this will probably degrade quality.

All in all, though, I just want to point out that llguidance guarantees that your JSON will be well-formed. This means you can do things like "force one output format for the LM (or let it make that decision itself) but parse it as JSON and dump it to a format of your choice before displaying it to a user"!
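To make that last suggestion concrete, here is a minimal sketch using only Python's standard `json` module. The helper name `reformat_json` and the sample string are made up for illustration and are not part of llguidance; the sketch assumes the model output is already well-formed JSON, which is the guarantee mentioned above.

```python
import json
from typing import Optional


def reformat_json(raw: str, indent: Optional[int] = 2) -> str:
    """Parse well-formed JSON text and re-serialize it with chosen whitespace.

    `reformat_json` is a hypothetical helper, not an llguidance API.
    """
    data = json.loads(raw)
    if indent is None:
        # Most compact form: no spaces after "," or ":".
        return json.dumps(data, separators=(",", ":"))
    # Pretty-printed form with a fixed indentation width.
    return json.dumps(data, indent=indent)


# Whatever spacing the LM chose, the displayed form is under your control.
raw = '{"name":  "Ada","tags":["a", "b"]}'
pretty = reformat_json(raw)
compact = reformat_json(raw, indent=None)
```

The LM generates in whatever spacing it prefers, and the pretty-printing cost is paid after generation rather than in tokens.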
-
Also, while for non-recursive schemas it's possible to hard-code whitespace in the grammar the way you did, for recursive ones we need special support in the lexer. This has been started in #107 but is far from complete...
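As a rough illustration of the non-recursive case, the sketch below hard-codes the whitespace for a flat object whose values are all strings. It uses a plain Python regular expression rather than llguidance grammar syntax, and the helper `flat_object_regex`, the simplified `STRING` pattern, and the sample keys are all assumptions made for the example.

```python
import json
import re
from typing import List

# Simplified JSON string literal (does not reject raw control characters).
STRING = r'"(?:[^"\\]|\\.)*"'


def flat_object_regex(keys: List[str], indent: int = 2) -> str:
    """Build a regex for a flat object with hard-coded pretty-print whitespace.

    Hypothetical helper: one `"key": "value"` pair per line, fixed indentation.
    """
    pad = " " * indent
    lines = [f"{pad}{re.escape(json.dumps(k))}: {STRING}" for k in keys]
    return r"\{" + "\n" + ",\n".join(lines) + "\n" + r"\}"


pattern = re.compile(flat_object_regex(["name", "city"]))
record = {"name": "Ada", "city": "Paris"}

# The indent=2 layout matches; the compact layout of the same data does not.
matches_pretty = pattern.fullmatch(json.dumps(record, indent=2)) is not None
matches_compact = pattern.fullmatch(json.dumps(record, separators=(",", ":"))) is not None
```

A template like this only works because the nesting depth is known in advance. With a recursive schema the indentation would presumably have to grow with each nesting level, which a fixed template cannot express.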
-
An example JSON grammar, e.g.:
This does not strictly enforce whitespace, and it doesn't seem to be possible to do so with this style of grammar.
Here's an example of literal output from an LLM:
Though this is technically correct, I'd like to propose a strict whitespace mode that enforces what you might consider "human-readable JSON".
This is a grammar that produces such JSON, but the grammar itself is significantly less readable for humans:
Example output:
I think a standard strict-whitespace mode would be a great feature to have.
Cheers, really love the library!