Skip to content

Parser.String.regex #153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 14, 2022
Merged

Parser.String.regex #153

merged 1 commit into from
Mar 14, 2022

Conversation

jamesdbrock
Copy link
Member

Add a regex parser.

Resolves #141


Checklist:

  • Added the change to the changelog's "Unreleased" section with a link to this PR and your username
  • Linked any existing issues or proposals that this pull request should close
  • Updated or added relevant documentation in the README and/or documentation directory
  • Added a test for the contribution (if applicable)

@jamesdbrock jamesdbrock merged commit aa1e2b3 into main Mar 14, 2022
@jamesdbrock jamesdbrock deleted the regex1 branch March 14, 2022 17:21
@natefaubion
Copy link
Contributor

What was the reason for taking an options object instead of using the RegexFlags Semigroup instance?

@jamesdbrock
Copy link
Member Author

What was the reason for taking an options object instead of using the RegexFlags Semigroup instance?

Yeah I see what you mean. If the type of this parser were regex :: RegexFlags -> String -> ParserT s m a then we could call it like regex (dotAll <> ignoreCase <> unicode) "x*". That’s a pretty nice API, and clearly that’s what the Semigroup instance is for.

But I’m not sure that it handles default flags well. Two of the regex parser flags default to true. In the documentation for Parser.String.regex I hint that the users might want to turn off the dotAll flag. (And possibly the unicode flag?) In which case it’s nice to be able to write regex {dotAll: false} "x*".

But I don’t know, that’s a pretty rare case, and it may never come up, and if it does come up then the user could set the dotAll flag to false by building the flags like regex unicode "x*". Though when we write it that way the author’s intent to set dotAll to false is less clear than regex {dotAll: false} "x*".

Another problem with the Semigroup API is that if the user wants to turn on ignoreCase then they might write regex ignoreCase "x*", which would accidentally turn off the dotAll and unicode flags.

The part that troubles me the most is that I defined a RegexFlagsRow in the parsing package. I would prefer to use only definitions from Data.String.Regex.

I would like to hear your advice about this.

@chtenb
Copy link
Member

chtenb commented Mar 21, 2022

regex (dotAll <> ignoreCase <> unicode) "x*"

How would you turn off flags given a semigroup flags value? Or is that not possible?

@natefaubion
Copy link
Contributor

The docs don't specify why the defaults are the way that they are, so I'm left to guess a bit. For example, in string-parsers it always uses noFlags and you have no choice in the matter. The reason I asked is that I'd consider it a bit unidiomatic as far as regex APIs go. I was surprised that the newtype was actually exported. My first guess upon seeing it is that maybe someone is unfamiliar with the Semigroup instance. The defaults consideration clears up a few things.

If you feel very strongly that these defaults are correct for 95%+ of use cases, maybe the default regex should provide no configuration? You can then provide a regexWithFlags version, and partially apply it for the default case regex = flip regexWithFlags (unicode <> dotAll). Right now you have to worry about the defaults at every call (do I really want to pass in {} or something else?) and I'm unlikely to remember what the defaults actually are. Unfortunately the constraints makes it difficult to parse from the type signature alone, and the defaults are noted fairly far down in the comment string so I won't see it right away. Defaults are tricky, because it's often a question of perception. If still have to pass a {} argument and consider what it means, am I actually gaining a convenience?

@jamesdbrock
Copy link
Member Author

If you feel very strongly that these defaults are correct for 95%+ of use cases,

I'm not at all sure that these defaults are correct.

maybe the default regex should provide no configuration? You can then provide a regexWithFlags version, and partially apply it for the default case regex = flip regexWithFlags (unicode <> dotAll).

Ok I like that, I'll change it so it works that way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

regex primitive parser
3 participants