parallel-letter-frequency: add canonical data #2209

ErikSchierboom · 2023-02-24T14:58:57Z

Closes #574

junedev

Something to consider regarding the canonical data for this exercise: we previously had the data proposed here in the Go track as well, and students were always very disappointed or confused that the concurrent version was not actually faster than the sequential one. In Go this is easy for a student to see, as all tests include benchmarks. I would assume the problem also exists in other languages, as concurrency primitives usually have a cost that only pays off if the parallel processing runs for a while.

While "it's not faster" is a useful lesson, I am not sure it is the intention of the exercise, which is why I am bringing it up here.
cc @kytrinyx

Here is the data we use in the Go track now, where you actually see an improvement with the concurrent version:
https://github.com/exercism/go/blob/main/exercises/practice/parallel-letter-frequency/parallel_letter_frequency_test.go
I imagine for some languages you would need an even bigger input to see any benefit.
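The overhead described above can be observed with a rough timing comparison. The following is a minimal sketch, not the Go track's actual solution or test data: the names `Frequency`, `SequentialFrequency` and `ConcurrentFrequency` and both inputs are assumptions made up for illustration, and the measured times will vary by machine.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"time"
	"unicode"
)

// FreqMap maps a lowercase letter to the number of times it occurs.
type FreqMap map[rune]int

// Frequency counts the letters of one text, ignoring case and non-letters.
func Frequency(text string) FreqMap {
	freq := FreqMap{}
	for _, r := range text {
		if unicode.IsLetter(r) {
			freq[unicode.ToLower(r)]++
		}
	}
	return freq
}

// SequentialFrequency counts all texts one after another.
func SequentialFrequency(texts []string) FreqMap {
	total := FreqMap{}
	for _, t := range texts {
		for r, n := range Frequency(t) {
			total[r] += n
		}
	}
	return total
}

// ConcurrentFrequency starts one goroutine per text and merges the partial results.
func ConcurrentFrequency(texts []string) FreqMap {
	// Buffered so that no goroutine blocks while sending its result.
	results := make(chan FreqMap, len(texts))
	var wg sync.WaitGroup
	for _, t := range texts {
		wg.Add(1)
		go func(t string) {
			defer wg.Done()
			results <- Frequency(t)
		}(t)
	}
	wg.Wait()
	close(results)

	total := FreqMap{}
	for partial := range results {
		for r, n := range partial {
			total[r] += n
		}
	}
	return total
}

func main() {
	// Hypothetical inputs: a few tiny texts versus a few very large ones.
	big := strings.Repeat("The quick brown fox jumps over the lazy dog. ", 100000)
	inputs := []struct {
		name  string
		texts []string
	}{
		{"small", []string{"abc", "bcd", "cde"}},
		{"large", []string{big, big, big, big}},
	}
	for _, in := range inputs {
		start := time.Now()
		SequentialFrequency(in.texts)
		seq := time.Since(start)

		start = time.Now()
		ConcurrentFrequency(in.texts)
		con := time.Since(start)

		fmt.Printf("%-5s sequential=%v concurrent=%v\n", in.name, seq, con)
	}
}
```

A single timed run like this is noisy; Go's `testing` benchmarks would give steadier numbers, but the sketch is enough to show why tiny inputs rarely favour the concurrent version.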

ErikSchierboom · 2023-02-27T12:01:01Z

> Here is the data we use in the Go track now, where you actually see an improvement with the concurrent version:
> https://github.com/exercism/go/blob/main/exercises/practice/parallel-letter-frequency/parallel_letter_frequency_test.go
> I imagine for some languages you would need an even bigger input to see any benefit.

I did consider including a very large input, but was unsure what people would think. Thoughts?

ErikSchierboom · 2023-03-01T07:38:48Z

I've updated the large texts test case to match the data used in the Go exercise.

andrerfcsantos · 2023-03-01T19:55:43Z

This is exciting!

I'm OK with this being merged as is, but here are some thoughts on this:

- Right now, the tests have either really small strings or really big ones. I think we should have more cases in between, for example cases where each text is a sentence or a short paragraph with punctuation. That way, both the counting logic and the "ignore non-letter characters" logic could be tested simultaneously, in addition to the tests that check these things separately. It would make for a more interesting debugging experience in general. One idea for a test case would be the famous "The quick brown fox jumps over the lazy dog." It includes all letters of the English alphabet, punctuation, and a capital letter, but any other sentence would do too.
- I see a test case with non-ASCII characters. Should this test case be put behind a scenario/property so tracks can more easily filter it out? I'm thinking of languages that can't handle UTF-8 out of the box and could have a problem with this test case.
- The way I see it, there are three main definitions of what a "large input" is for this exercise: few big strings, many small strings, and many big strings. In Go, we were more interested in the first definition, since it was the one that would allow us to make the concurrent version faster. But I think there's value in exploring the second option too, even if the concurrent version isn't faster. What happens when you have 100 strings that might or might not be small and give one string to each thread/coroutine/goroutine? While having 100 goroutines is not a problem for Go, having 100 threads may be a problem for Java or other languages, and this would allow exploring those scenarios too. Maybe it's worth having these different kinds of large inputs as different scenarios/properties too?
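For the "many small strings" case, one alternative to giving every string its own thread is a fixed-size worker pool. The sketch below is only an illustration under assumed names (`countLetters`, `BoundedFrequency`, the worker count of 4 and the sample text are all invented here); it is not taken from any track's solution.

```go
package main

import (
	"fmt"
	"sync"
	"unicode"
)

// countLetters counts the letters of one text, ignoring case and non-letters.
func countLetters(text string) map[rune]int {
	freq := map[rune]int{}
	for _, r := range text {
		if unicode.IsLetter(r) {
			freq[unicode.ToLower(r)]++
		}
	}
	return freq
}

// BoundedFrequency counts letters using at most `workers` goroutines,
// no matter how many texts are passed in.
func BoundedFrequency(texts []string, workers int) map[rune]int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan map[rune]int)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for text := range jobs {
				results <- countLetters(text)
			}
		}()
	}

	// Feed the jobs, then signal that no more are coming.
	go func() {
		for _, t := range texts {
			jobs <- t
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := map[rune]int{}
	for partial := range results {
		for r, n := range partial {
			total[r] += n
		}
	}
	return total
}

func main() {
	// 100 small texts, processed by only 4 workers.
	texts := make([]string, 100)
	for i := range texts {
		texts[i] = "abbccc"
	}
	freq := BoundedFrequency(texts, 4)
	fmt.Println(freq['a'], freq['b'], freq['c']) // prints: 100 200 300
}
```

Capping the number of workers matters more in languages where each unit of concurrency maps to an operating-system thread; in Go the cap mainly limits scheduling and channel overhead.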

petertseng

please amend the commit message to include "closes https://github.com/exercism/problem-specifications/issues/574" or any equivalent string, thank you

ErikSchierboom · 2023-03-02T08:42:04Z

I've changed a couple of things:

- Added the unicode scenario to the unicode test case.
- Added a test with 50 small texts ("abbccc").
- Added a test that has some sentences, which have a combination of lowercase and uppercase letters, whitespace and punctuation.
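To make the three kinds of input concrete, here is a small sketch of how they behave against a straightforward counter. The helper `CountLetters` and every input string except "abbccc" are assumptions chosen for illustration; they are not the contents of canonical-data.json.

```go
package main

import (
	"fmt"
	"unicode"
)

// CountLetters is a hypothetical reference counter: it lowercases letters
// and skips whitespace, digits and punctuation.
func CountLetters(texts []string) map[rune]int {
	freq := map[rune]int{}
	for _, text := range texts {
		for _, r := range text {
			if unicode.IsLetter(r) {
				freq[unicode.ToLower(r)]++
			}
		}
	}
	return freq
}

func main() {
	// 50 copies of the same small text.
	small := make([]string, 50)
	for i := range small {
		small[i] = "abbccc"
	}

	cases := []struct {
		name  string
		texts []string
	}{
		{"unicode letters", []string{"äöü", "ÄÖ"}},
		{"many small texts", small},
		{"sentences", []string{"Hello, World!", "Go, go, GO."}},
	}
	for _, c := range cases {
		freq := CountLetters(c.texts)
		fmt.Printf("%s: %d distinct letters\n", c.name, len(freq))
	}
}
```

Tracks whose language lacks built-in Unicode case handling could skip the first case, which is the purpose of putting it behind a scenario.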

ErikSchierboom · 2023-03-02T08:42:52Z

> please amend the commit message to include "closes #574" or any equivalent string, thank you

Done. I've also added this to the PR description.

ErikSchierboom · 2023-03-02T13:22:24Z

@junedev @petertseng Are you happy with the changes I made?

junedev · 2023-03-02T21:56:50Z

@ErikSchierboom I preferred the version without the last test about the many small inputs. I always saw this exercise as being a good starter to practice concurrency primitives for the first time. I would have left the "tuning how much you do concurrently" part for another exercise. I'm still ok waving this through, just my personal opinion.

ErikSchierboom · 2023-03-03T07:35:34Z

I'm fine with removing it. Let's hear what @petertseng thinks.

petertseng

Hmm, I think that rather depends on the teaching goals of this exercise versus that future exercise. But that exercise doesn't exist yet and this one does now. So what I would think to do is: Take the test with many small texts for now. Once that future exercise is made, stop recommending the test with many small texts, if it's better suited for that exercise.

(Of course, we've discussed in the past we don't have a really good way to say that a test is no longer recommended since reimplements is the only mechanism, but I don't think that should be considered fatal to this idea)

exercises/parallel-letter-frequency/canonical-data.json

ErikSchierboom · 2023-03-07T07:49:20Z

> Hmm, I think that rather depends on the teaching goals of this exercise versus that future exercise. But that exercise doesn't exist yet and this one does now. So what I would think to do is: Take the test with many small texts for now. Once that future exercise is made, stop recommending the test with many small texts, if it's better suited for that exercise.

@junedev would you be okay with that?

> (Of course, we've discussed in the past we don't have a really good way to say that a test is no longer recommended since reimplements is the only mechanism, but I don't think that should be considered fatal to this idea)

I think we might need some way to deprecate a test case without it being reimplemented.

junedev · 2023-03-07T07:56:27Z

@ErikSchierboom Whatever you/others think is best is fine for me. I just wanted to mention it has a small drawback, that was all.

ErikSchierboom · 2023-03-07T08:07:14Z

I'm not entirely sure. CC @exercism/reviewers, I'd be curious to hear your thoughts.

ErikSchierboom · 2023-03-14T19:06:22Z

Thanks everyone for chiming in! I've decided to leave the many texts test case in there, as I think it is interesting.

ErikSchierboom mentioned this pull request Feb 24, 2023

parallel-letter-frequency: add exercise exercism/gleam#238

Closed

junedev reviewed Feb 25, 2023

junedev approved these changes Mar 1, 2023

petertseng approved these changes Mar 1, 2023

ErikSchierboom force-pushed the parallel-letter-frequency-canonical-data branch 4 times, most recently from b61d450 to 88fd5ee Compare March 2, 2023 08:41

andrerfcsantos approved these changes Mar 2, 2023

ErikSchierboom mentioned this pull request Mar 3, 2023

scenarios: add concurrent scenario #2218

Merged

petertseng approved these changes Mar 6, 2023

ErikSchierboom added 2 commits March 7, 2023 09:10

parallel-letter-frequency: add canonical data

f4ec140

Closes #574

Add concurrent scenario

9fc4e72

ErikSchierboom force-pushed the parallel-letter-frequency-canonical-data branch from 32b659f to 9fc4e72 Compare March 7, 2023 08:11

ErikSchierboom merged commit 822d524 into main Mar 14, 2023

ErikSchierboom deleted the parallel-letter-frequency-canonical-data branch March 14, 2023 19:05
