Skip to content

[Proposal] Remove TreatUnmatchedTokensAsErrors #2350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
KathleenDollard opened this issue Mar 9, 2024 · 8 comments
Open

[Proposal] Remove TreatUnmatchedTokensAsErrors #2350

KathleenDollard opened this issue Mar 9, 2024 · 8 comments
Labels
Design Powderhouse Work to isolate parser and features

Comments

@KathleenDollard
Copy link
Contributor

In the same spirit as asking "why" on custom parsers, we are asking "why" on TreatUnmatchedTokensAsErrors.

The major reason, and the one used by dotnet new and dotnet msbuild is to compose CLIs. The args not used by the .NET CLI are passed to the additional parser. In the case of dotnet new it builds a dynamic parser based on the template chosen (very cool stuff). msbuild still has its own parser.

We've found that this is better handled by creating an CliArgument that is an array of strings and passing that array along to the other parser.

One of the reasons TreatUnmatchedTokensAsErrors feels uncomfortable because it leads to exposing the Token type outside the parser layer. It seems preferable to make this an internal type. There also may be an implication that you can have unmatched tokens anywhere in the CLI, but in general, once unmatched tokens appear, parsing can become complicated and problematic.

The proposal is to remove it.

We look forward to hearing about any scenarios that are not covered by a collection of strings as an argument on the command that should route to a composed CLI.

@KathleenDollard KathleenDollard added Design Powderhouse Work to isolate parser and features labels Mar 9, 2024
@KalleOlaviNiemitalo
Copy link

The dotnet msbuild strategy of creating an CliArgument that is an array of strings, fails to distinguish between these commands:

  • dotnet msbuild -- -interactive should search for a project or solution in a directory named -interactive.
  • dotnet msbuild -interactive should search for a project or solution in the current directory, and build that interactively.

In dotnet new, I don't think templates can take non-option arguments, so the ambiguity is less of a problem there.

It could still become a problem in other applications that dynamically build parsers. But that doesn't mean TreatUnmatchedTokensAsErrors is the best way to fix it. Some kind of parser hook that lets the application create a subcommand on demand could also work.

@KathleenDollard
Copy link
Contributor Author

@baronfel @rainersigwald

Can you comment on this - how we do it, how we want to do it, and whether we could make a change without breaking the world.

@KathleenDollard
Copy link
Contributor Author

Does anyone besides dotnet .. have the need to modify the CLI tree based on commands within the tree?

Background:

  • dotnet new alters the CLI tree based to make the template you choose effectively a command. It has unique options and --help refers to the template.
  • dotnet build and dotnet msbuild pass arguments on to another system that currently has its own parser and non-Posix syntax

@jonsequitur
Copy link
Contributor

The usual use case for this that I've seen is that after a certain point on the command line, a second parser will be taking over, possibly in another process.

dotnet (incorrectly 🤓) used -- as the signal for this hand-off but not all command lines make it explicit. More often than not, it's associated with a subcommand (e.g. dotnet new which can hand off to an effectively infinite number of different parsers for the different templates.)

@baronfel
Copy link
Member

baronfel commented Mar 21, 2024

FWIW we've long since fixed our incorrect usage of -- and moved to standard handling of -- for the most relevant commands of new and run. I think the driver for this was a pre-beta-4 S.CL processing change that kind of forced us to get correct! (Pay no attention to the fact that several tools contributed by other teams (watch, razor, dotnet-test) are still doing things incorrectly).

@tmat
Copy link
Member

tmat commented Mar 27, 2024

@KathleenDollard We need this API in dotnet watch: See dotnet/sdk#39618.

There we are parsing a command line with unknown structure and passing some arguments through to an arbitrary subcommand that dotnet-watch invokes.

I'd prefer better APIs to do so, one that would be easier to use and handle -- properly (same issue as #2350 (comment)).

Perhaps having a special CliCommand that servers as a placeholder for an arbitrary command and all options and arguments following it get associated with it might work.

@KathleenDollard
Copy link
Contributor Author

The alternative approach to unmatched tokens is to create a command with an argument that is a collection of strings. Those strings would then be the args passed to the command.

I believe that -- could be treated as a command for this purpose if that is sensible.

I am looking for cases where that approach would fail. We will definitely support handing off to other parsers.

Perhaps having a special CliCommand that servers as a placeholder for an arbitrary command and all options and arguments following it get associated with it might work.

Just want to clarify whether this is just making it easy to create CliCommand with an argument that is a collection of strings, or something fancier.

@fourpastmidnight
Copy link

fourpastmidnight commented Aug 27, 2024

I'm a bit new here but I have been a big fan of SCL since I first found it about two years ago. I think I agree with getting rid of TreatUnmatchedTokensAsErrors. But I want to make sure that I understand this:

The major reason, and the one used by dotnet new and dotnet msbuild is to compose CLIs. The args not used by the .NET CLI are passed to the additional parser. In the case of dotnet new it builds a dynamic parser based on the template chosen (very cool stuff). msbuild still has its own parser.

We've found that this is better handled by creating an CliArgument that is an array of strings and passing that array along to the other parser.

If I understand correctly, let's say I have a CLI program foo and a sub-command bar. Then, if the issued command is foo bar --option1 value1 -- arg1 --arg2 value2, what you're proposing is that all the tokens after the -- are aggregated together into a CliArgument of type string[], and that that argument is passed to bar (since that's the active command) and that bar would either have its own parser to parse the argument, or it could choose to treat these "unmatched tokens" as errors (if bar doesn't support the use of -- and the tokens following it)?

If what I wrote above is accurate in my understanding, then I would advocate for -- being another option. Why an option? I believe that when you see CLIs that use --, the arguments following -- are called "extra arguments", and are treated somewhat similarly as C/C++ varargs or C# params string[] args.

Now, my current understanding is that it is proposed that this is all somewhat implicitly done by the parser? But I wonder if this should be a more explicit choice by the CLI author? So, when defining bar, I might do something like:

var bar = new Command("bar");
bar.AddExtraArgumentOption(required: false); // the default value of required is false

OK, I don't quite like that name, but why do I propose this? Because now you have no unmatched tokens at all. It's expected that there could be a -- option followed by one or more tokens accumulated into an argument as a string[]. So the whole concept of "unmatched tokens" (as it pertains to tokens following a --) goes away. CLI authors make an explicit choice to opt-in to processing of these "extra" arguments.

One thing that might gum up the works here is if -- really belongs to the root command foo and not the sub-command bar. IIRC, POSIX standards say that the -- token MUST be the last option/argument in a command line invocation after all other options/arguments. (In this way, when -- is associated with a command higher in the command tree hierarchy than the "active" executing command, you could consider these extra arguments to be "ambient" to all the subcommands below the command to which the extra arguments are registered.) So the suggested function AddExtraArgumentOption(bool) is useful to help the parser know to whom these "extra arguments" belong. Once a command uses AddExtraArgumentOption(bool) to "register" that it can accept extra arguments, only commands at that same level or above in the command tree hierarchy can also register acceptance of extra arguments. Or conversely said, once a command registers support for extra argument processing, no commands below it in the command hierarchy may register explicit support for extra argument processing.

So, imagine a tree of commands:

graph LR
  A([foo]) --> B([bar]);
  A --> C([baz]);
  C --> D([qux]);
Loading

If foo.AddExtraArgumentOption() is called, then the tree looks like this:

graph LR
    A([foo]) --> B([bar])
    A --> C([baz])
    A --> D[--]
    C --> E([qux])
Loading

The values supplied to the argument of the -- option would be available to foo as well as ambient to all sub-commands of foo all the way down the command hierarchy, including qux. And bar and baz would not be able to register for extra argument support because that would conflict with its parent command foo.

However, if foo does not register support for processing extra arguments via --, then either or both of bar and baz are free to (because only one sub-command can be run at a time during a CLI invocation):

var bar = new Command('bar').AddExtraArgumentOption();
var baz = new Command('baz').AddExtraARgumentOption();

And again, of course, qux would not itself be able to register for "extra argument" support (but would ambiently have access to the extra argument values from it's parent command baz).

graph LR
    A([foo]) --> B([bar])
    A --> C([baz])
    B --> D[--]
    C --> E([qux])
    C --> F[--]
Loading

It should also be noted that there are some CLI applications that will happily run to successful completion even with extra "unmatched" (and unused) tokens (though, those are usually superfluous and don't usually follow a -- because they are entered by mistake by the CLI user). I don't quite like those, personally, but they exist (unfortunately, I can't recall one off the top of my head, but I've seen them)! So the parser should probably keep an array of unmatched tokens that are accessible to the active command. But again, it is still up to the CLI author whether to treat these unmatched tokens as an error, or simply ignore them.

graph LR
    A([foo]) --> B([bar])
    A --> C([baz])
    A --> D[--]
    D --> E[/arguments/]
    A --> F[/"`_*unmatched*_ arguments`"/]
Loading

So that's why I think that I agree with removing TreatUnusedTokensAsErrors—the parser doesn't need to raise an error as a function of parsing. In fact, the CLI author ought to "opt-in" to the -- functionality so that the parser actually knows who is the "owner" of those arguments. And the parser should still parse "unmatched" tokens and provide access to those tokens globally to all command handlers, allowing the CLI author to decide whether that's an error or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Design Powderhouse Work to isolate parser and features
Projects
None yet
Development

No branches or pull requests

6 participants