Skip to content

nucleotide-count: Nucleotides have a misleading type #694

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
isovector opened this issue Jul 21, 2018 · 6 comments
Closed

nucleotide-count: Nucleotides have a misleading type #694

isovector opened this issue Jul 21, 2018 · 6 comments

Comments

@isovector
Copy link

The function nucleotideCounts :: String -> Either String (Map Char Int) I'd suggest has the wrong type. Nucleotides are certainly an enum and not a Char. I worry the exercise as it stands today discourages people from thinking about data modeling and instead plays into a bias to use strings more often than they should be.

@petertseng petertseng changed the title Nucleotides have a misleading type nucleotide-count: Nucleotides have a misleading type Jul 31, 2018
@petertseng
Copy link
Member

I agree -- an enum would make sense so the type system can help make clear there are only four possible values at each position.

I would also take this opportunity to address what was said in #626 (comment)

While I think that enforcing invariables in the type system is a great idea, sometimes it is awkward to do that in Haskell. I don't think we should push all the exercises in this direction when it feels artificial in the language.

About nucleotide-count, using length-parameterized types is idiomatic in Idris, but it is still considered type-hackery in Haskell, which is an advanced topic and would confuse students.

I still think there would be value in using a data for nucleotide-count; we don't have to do the Idris-style length-parameterisation.

I believe the files needed to edit to resolve this issue are the following:

@sshine
Copy link
Contributor

sshine commented Sep 8, 2018

I agree. But if we use a

data Nucleotide = A | C | G | T

then what's the point in having a Map Nucleotide Int when you could just have a

data NucleotideCount = NC { _a :: Int, _c :: Int, _g :: Int, _t :: Int }

but then you wouldn't need the Nucleotide data type if the original input is a String.

Perhaps one should keep the Map Nucleotide Int since that adds complexity.

@sshine
Copy link
Contributor

sshine commented Sep 8, 2018

Should the tests keep asserting that the map contains all four nucleotides when a nucleotide occurred zero times?

@isovector
Copy link
Author

@sshine I personally wouldn't let the NucleotideCount described above through in a real code review---it doesn't compose well with the ecosystem. Eg, writing Nucelotide -> NucelotideCount -> Int requires a pattern match that the compiler won't help you write correctly. When in doubt, use the definitions that are the hardest to get wrong.

@sshine
Copy link
Contributor

sshine commented Sep 8, 2018

@isovector Do you mean something other than what I did in PR #714?

@isovector
Copy link
Author

@sshine nope, #714 looks great to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants