Skip to content

diamond: Unspecified behavior on invalid input #626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rbasso opened this issue Nov 20, 2017 · 14 comments
Closed

diamond: Unspecified behavior on invalid input #626

rbasso opened this issue Nov 20, 2017 · 14 comments

Comments

@rbasso
Copy link
Contributor

rbasso commented Nov 20, 2017

The type signature for diamond...

diamond :: Char -> [String]

... demands the user to either:

  • Implement a partial function.
  • Define a behavior when the input isn't in ['A'..'Z'].

It should probably be changed to...

diamond :: Char -> Maybe [String]

... making it total and also signalling invalid input in a more idiomatic way.

@Average-user
Copy link
Member

Average-user commented Nov 20, 2017

Or maybe, just demands the user to use the function with a correct input, but I see what you mean. Either way, I've seen this problem in many other exercises, usually when it is not specify what to do with incorrect inputs in the tests cases.

@rbasso
Copy link
Contributor Author

rbasso commented Nov 21, 2017

Or maybe, just demands the user to use the function with a correct input...

While that could be considered reasonable in many other languages, we strive to avoid partial functions or encoding errors as special values in Haskell, in favor of Maybe and Either types.

To expose students to idiomatic Haskell code, we started to rewrite some exercises to use Maybe in #115. Later, we took that practice further in #194.

Either way, I've seen this problem in many other exercises, usually when it is not specify what to do with incorrect inputs in the tests cases.

I don't think we should specify how the functions should behave for each possible input as a rule, because that would limit the diversity of solutions. Sometimes people surprise us with interesting solutions that wouldn't pass a unnecessarily strict test suite.

The main problem here is that we don't allow the diligent student to signal invalid inputs in a idiomatic way, so I think we should change the signature. But I don't think we need to include additional tests to check if the solution correctly handles those cases.

@Average-user
Copy link
Member

In that case I think it would be better to do a new type data Abc = A | B | C... or something like that

@rbasso
Copy link
Contributor Author

rbasso commented Nov 21, 2017

In that case I think it would be better to do a new type data Abc = A | B | C... or something like that

That would force the user to pattern-match against all the constructors, unless it derives something like Enum or Ix, which is certainly possible. Interesting.

I guess these are the alternatives when implementing exercises that would normally allow invalid inputs:

  1. Expect the user to encode the error as a special value or implement a partial function.
  2. Signal the invalid input returning a Maybe or Either.
  3. Use a datatype that represents exactly the set of valid inputs.

I'm definitely against (1), because it promotes bad programming practices.
With (2) we get stronger typing, and also push to the student dealing with bad input data.
Finally, with (3) we have the strongest typing, and the student doesn't have to deal with invalid input and things like Maybe and Either.

Between (2) and (3), I think the choice depends mostly on what the exercise is supposed to teach, and also on how convenient it is to use a custom data type.

@Average-user
Copy link
Member

That would force the user to pattern-match against all the constructors, unless it derives something like Enum or Ix, which is certainly possible. Interesting.

Yhea, I was thinking on something like this.
I think that if you want to turn a partial function into a total one, it is better to use the right set, that to treat irrelevant elements of another set.

@Average-user
Copy link
Member

So. Should I make a PR? You want to make the PR?

@Average-user
Copy link
Member

Average-user commented Nov 21, 2017

What about this for example:

module Diamond (diamond) where

data ABC = A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q |
           R | S | T | U | V | W | X | Y | Z deriving(Enum)

pad :: Int -> String
pad = flip replicate ' '

oneRow :: Char -> (Int, Int) -> String
oneRow c (0, y) = pad y ++ [c] ++ pad y
oneRow c (x, y) = pad y ++ [c] ++ pad x ++ [c] ++ pad y

diamond :: ABC -> [String]
diamond = (\x -> x ++ tail (reverse x)) . mkTop . succ . fromEnum
  where rows x = zip (0 : take (x-1) [1, 3..]) [x-1, x-2..0]
        mkTop  = zipWith oneRow ['A'..'Z'] . rows

@rbasso
Copy link
Contributor Author

rbasso commented Nov 22, 2017

What about this for example:

It would add at least Eq and Show instances too, making interactive use in GHCi more convenient . I guess Ord, Bounded and Ix would not hurt, but the user can add them if needed.

Instead of ABC I would probably use Letter or something more descriptive.

Ok...let me try a solution:

module Diamond (Letter(..), diamond) where

data Letter = A | B | C | D | E | F | G | H | I | J | K | L | M
            | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
            deriving (Eq, Show, Enum)

diamond :: Letter -> [String]
diamond x = matrix (diamondChar radius) [0..2*radius] [0..2*radius]
  where
    diamondChar r i j = if abs (i - r) + abs (j - r) == r
                        then toEnum $ r - abs (i - r) + fromEnum 'A'
                        else ' '
    radius = fromEnum x

-- Build a lazy, list-based matrix based on a formula and two ranges.
matrix :: (a -> b -> c) -> [a] -> [b] -> [[c]]
matrix f xs ys = [[ f x y
                  | y <- ys ]
                  | x <- xs ]

It compiles and passes the changed test suite, but I'm still not happy about it...

It feels a little weird to have an ADT input and output a [String]. Also, I could have used join to avoid repeating [0..2*radius]...

Anyway...

So. Should I make a PR? You want to make the PR?

Currently I don't have too much spare time, so I'll pass. 😄

Considering that using a big custom datatype isn't very common in the track, I think it would be better to first ask @petertseng what he thinks about it. He is probably very busy, but I'm sure he'll take a look at this issue soon.

@Average-user
Copy link
Member

Ok...let me try a solution:

I think mine is cleaner, (with the proper little changes like ABC -> Letters) but thats probably because I'm already used to it.

I've already done custom datatypes under his "watch". But I also think it is a good Idea to know what are his thoughts about this.

@petertseng
Copy link
Member

It feels a little weird to have an ADT input and output a [String]

It does... but I suppose I wouldn't object.

I can't help but notice that, for example, https://hackage.haskell.org/package/base-4.10.0.0/docs/Data-Char.html#v:digitToInt doesn't attempt to define a large datatype. It can indeed be interesting to use Enum, but there are other exercises where using it is already possible (try food-chain or something).

I'll take the large datatype if you insist on it, but I was thinking just adding the Maybe would be fine.

I'm sorry to bring the discussion to other exercises, but any thoughts on what this would mean for such exercises as nucleotide-count? Note that https://github.com/exercism/haskell/blob/master/exercises/nucleotide-count/test/Tests.hs#L46 asks to validate the strand against invalid characters in it; if we prevent invalid inputs in diamond, will we want to potentially do the same with exercises such as nucleotide-count? Does that change the answer to what we want to do with diamond?

You should not interpret this comment comment as saying it is required for this track to be consistent.

It can be noted that the Idris track did in fact restrict the valid inputs to hamming: https://github.com/exercism/idris/blob/master/exercises/hamming/src/example.idr

@rbasso
Copy link
Contributor Author

rbasso commented Nov 26, 2017

I'll take the large datatype if you insist on it, but I was thinking just adding the Maybe would be fine.

Agreed!

Also, I have the feeling now that creating a sum-type with all possible input values, while not a big problem in this case, is an anti-pattern. How would this scale if we had a range of a thousand values? What if we had digits in the range?

When I think about it, what I want is a character constrained to the ['A', 'Z'] range, not another data type. Maybe that would be easier in Idris.

I'm sorry to bring the discussion to other exercises, but any thoughts on what this would mean for such exercises as nucleotide-count?

While I think that enforcing invariables in the type system is a great idea, sometimes it is awkward to do that in Haskell. I don't think we should push all the exercises in this direction when it feels artificial in the language.

About nucleotide-count, using length-parameterized types is idiomatic in Idris, but it is still considered type-hackery in Haskell, which is an advanced topic and would confuse students.

It can be noted that the Idris track did in fact restrict the valid inputs to hamming: https://github.com/exercism/idris/blob/master/exercises/hamming/src/example.idr

I think using an ADT makes more sense in hamming, because it is about counting differences in sequences, so it could work for any Eq a.

diamond seems designed as a character exercise, because it necessarily contain spaces. If I where to rewrite it, I would consider changing its type to...

diamong :: Char -> Maybe Text

... punishing the use of String.

That wouldn't prevent the user of using an ADT and solve it:

data Letter = A | B | C | D | E | F | G | H | I | J | K | L | M
            | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
            deriving (Eq, Show, Enum)

toLetter :: Char -> Maybe Letter
toLetter = ...

diamond :: Char -> Maybe Text
diamond = fmap diamond'  . toLetter
  where
    diamond' :: Letter -> Text
    diamond' = ...

ps: Sorry for the long post!

@Average-user
Copy link
Member

I'll take the large datatype if you insist on it, but I was thinking just adding the Maybe would be fine.

So. Whats the conclusion ADT or Maybe Char ?

@petertseng
Copy link
Member

I say this. If you asked me to write the PR, I would use Maybe. But I will not be sufficiently motivated to write the PR. Therefore, the one who writes it gets to choose. Based on the proposals here, I find either acceptable.

@Average-user
Copy link
Member

If I you asked me to write the PR, I would use what I already commented, but I was thinking to wait for the Property based tests

petertseng pushed a commit that referenced this issue Feb 14, 2018
Previously, the function had undefined behaviour on non-letter inputs.
Now we can specify that it should return Nothing.
For now, intentionally no tests are added for Nothing.

Closes #626.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants