-
-
Notifications
You must be signed in to change notification settings - Fork 556
simple-cipher: Add canonical-data.json #1241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simple-cipher: Add canonical-data.json #1241
Conversation
Okay, I figured out: The building output was not so clear, I was missing some required properties and the output was saying too much. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting one to deal with, for a few reasons. I thank you for rising to the challenge.
There are a few cases where encode
only has plaintext
as an input, and other times when it has key
and plaintext
. I would think that in all cases both are needed, otherwise the test would seem to be underspecified. How would we be able to determine the ciphertext if we don't have both a plaintext and a key?
Same comment with decode.
I considered explicitly testing plaintexts with non-letter characters (which I assume would just get passed through?) but since they are already done in https://github.com/exercism/problem-specifications/blob/master/exercises/atbash-cipher/canonical-data.json and https://github.com/exercism/problem-specifications/blob/master/exercises/rotational-cipher/canonical-data.json I don't feel a particular need to include them in this exercise. I understand that there may be tracks that implement this exercise but neither Atbash nor rotational cipher; I guess I can't defend my position in these cases
"input": { | ||
"plaintext": "aaaaaaaaaa" | ||
}, | ||
"expected": "cipher.key" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is interesting because we are to understand that this is not the literal string "cipher.key"
but instead the key of the cipher, whatever that may be.
This therefore falls under one of the categories of #1225 .
This suggests some possible courses of action:
- Make no change to this PR. It's up to readers of this JSON file to understand that certain fields are symbolic instead of literal strings:
- The
expected
of everyencode
- The
input.ciphertext
of everydecode
- I'm not even sure that this list is correct
- The
- Make no change to this PR. It's up to readers of this JSON file to understand that all strings beginning with the seven characters
cipher.
are symbolic instead of literal strings. - Make a change to this PR that uses a different type (perhaps some JSON object) for such symbolic inputs and/or outputs.
- Add your suggestion here
I will not yet express an preference ordering between these options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Make a change to this PR that uses a different type (perhaps some JSON object) for such symbolic inputs and/or outputs.
- Add your suggestion here
As a sort of combination of the above, what about the following:
"expected": {
"match": "^{cipher.key}$"
}
Where {symbol}
is to be interpolated with local variables before matching is applied.
On the surface, this approach seems to simply add complication to the current syntax. However, if this is to act as a precedent for future situations, the interpolation approach allows multiple variable values to be used.
Examples of such use:
"^{name.first} {name.last}$"
"^{a + b / c}$"
However, if my fellow maintainers think this is too complicated, my vote would be option 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is very specific to the canonical-data specification/syntax. I believe you guys know way more than me about where we are and where we should go. I'm completely open to suggestions.
"expected": "aaaaaaaaaa" | ||
}, | ||
{ | ||
"description": "It is reversible. I.e., if you apply decode in a encoded result, you must see the same plaintext encode parameter as a result of the decode method", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is useful to show the student what is possible. However, it is not possible for the test to reject any implementation that the previous two tests have not already rejected, since it just composes the previous two tests together.
Thus, the question is: Is the advantage of showing the student the possibilities worth the disadvantage of the test that does not help catch any more mistakes?
I personally thought it was not worth keeping this test. Let me know your thoughts on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first test guarantee that if we pass aaaaaaaaaa
as argument to the encode
, the result should be the same as the key. And the second one guarantee that if we pass key
as argument to the decode
we expect a aaaaaaaaaa
as result. This is the first test that uses a different plaintext
.
I believe that is good to explicitly verify that the algorithm is reversible, since is a characteristic of it. But you're right, if the encode and the decode are already verified with valid "reversibles"
outputs, the nesting of the encode and decode must be reversible.
I personally like this reversibles test cases. But I see your point
"description": "Incorrect key cipher", | ||
"cases": [ | ||
{ | ||
"description": "It throws an error with an all uppercase key", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since not all languages throw errors, I am considering asking for more language-agnostic wording. For example, "it rejects an all-uppercase key" or "it treats an all-uppercase key as an error" or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I think it will be all wrong if I had used it throws an exception
, which is very specific. But throw a program error is quite universal for me, it's language agnostic and operating systems oriented.
For sure there is room for improvement, I liked your second suggestion more.
What do you think about my points?
"cases": [ | ||
{ | ||
"description": "It throws an error with an all uppercase key", | ||
"property": "new", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting since new
might imply too much an object-oriented interface, which would raise questions as to how this file is to be understood by languages less amenable to object-oriented interfaces.
I don't have a concrete suggestion yet. Let's see if there are any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, this is a tricky one.
I think the problem is that we're not testing the constructor method itself, but verifying that assigning a invalid key to key
must cause a error, what is done in the constructor here.
One option for me would be set the property to key
or maybe something like key=
will be also valid for me. I don't have any better/concrete suggestion too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also use the encode
property and name the case something like Encode using uppercase key
.
"expected": { "error": "Bad key" } | ||
}, | ||
{ | ||
"description": "It throws an error with empty key", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that we can immediately know, upon providing an empty key, that we will never be able to encipher or decipher any texts, so we can immediately reject the empty key.
What about cases where the key is shorter than the text? How are they handled?
For anyone asking that question, the answer is in the currently last test case, entitled "It can handle messages longer than the key"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right
"expected": "aaaaaaaaaa" | ||
}, | ||
{ | ||
"description": "It is reversible. I.e., if you apply decode in a encoded result, you must see the same plaintext encode parameter as a result of the decode method", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same q as above re: whether showing the possibilities is worth the test which catches no extra mistakes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please look at my reply in the other review.
Thank you so much for your extensive feedback @petertseng! I will get back to it as soon as I can |
If we pass So you don't necessarily need both values for testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general remark: I'm not a huge fan of starting every test case description with "it". To me, that does not add anything of value to the description. As an example, the second test case is named "It can encode". What does "it" refer to? I would much prefer it to be named something like "Encode using random key". Same goes for the other descriptions that start with "it".
"version": "1.0.0", | ||
"cases": [ | ||
{ | ||
"description": "Random key cipher", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the exercise's description, the random key part is actually the third and last step in the description. To me, this implies that the test cases involving the random key should also be moved to the bottom of the test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The third step is making it fault tolerant. The first step is the cipher without key assignment, that's the random key. But yeah, I will organize the order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gustavosobral Is is correct that you have not yet applied the re-ordering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first step is the Cipher without key, so that's the Random key
set of cases. The second is step is the cypher accepting keys, this is the Substitution
set of cases. The third step is making it fault tolerant, that's the Incorrect key
cases. I believe they are in order now, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I think I did not explain it correctly. What I meant was not to re-order the individual test cases with the "Random key cipher" case, but the "Random key cipher" case itself. At the moment, the order is:
- "Random key cipher"
- "Substitution cipher"
- "Incorrect key cipher"
I would argue that the first case should actually be the last, as it is a special case of the "Substitution cipher" case. Furthermore, the README also lists it as the third case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see what you mean. So if I understood correctly, your suggestion is to change the order to:
- Substitution cipher
- Random key cipher
- Incorrect key cipher
Right? I think we all agree that the last step is making it fault tolerant against incorrect keys?
I see your point and believe that we can really see the Random key as a expansion of the Substitution part, but the thing is that in the exercise description.md it explicitly say that the step 2 is the random key cipher:
Step 2
Shift ciphers are no fun though when your kid sister figures it out. Try amending the code to allow us to specify a key and use that for the shift distance. This is called a substitution cipher.
So, the exercise description sees the functionality of accepting a key parameter as a expansion of having the random generated case. Which I also believe is correct. Did you see the point here?
Would you still suggest the re-order in this case?
Thank you so much for the feedback and sorry for the late reply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I then misread the description. It's fine to leave it as-is then.
"cases": [ | ||
{ | ||
"description": "It throws an error with an all uppercase key", | ||
"property": "new", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also use the encode
property and name the case something like Encode using uppercase key
.
"description": "Substitution cipher", | ||
"cases": [ | ||
{ | ||
"description": "It keeps submitted key", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a useful test? The current format subtly implies state (keeping track of the key). Why not modify the input for the other test cases to have the key explicitly passed as an input value? This is how virtually canonical data cases are usually setup: all input is specified within the test case itself. As an example, the next case would then look like this:
{
"description": "It can encode",
"property": "encode",
"input": {
"plaintext": "aaaaaaaaaa",
"key": "abcdefghij"
},
"expected": "abcdefghij"
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point. Thank you
Is not only the
I totally agree. Thanks for the feedback @ErikSchierboom |
Thank you guys for the extensive feedback! Please check my comments and my changes @petertseng @cmccandless and @ErikSchierboom |
Great set of improvements! I've just added a small question. |
@petertseng @rpottsoh Would you mind reviewing this PR in the current state? |
Overall I think this is great! Initially I was confused by the use of symbolic and literal strings. I don't think I have seen this used before in the testdata. It caught me be surprise. I didn't review all the comments above before reviewing the JSON. I just dove in. I think it could be worthwhile to include a comment of sorts at the beginning of the JSON to indicate that some of the strings are symbolic and perhaps to indicate which of those are symbolic. Once I finally reviewed the comments and the README the JSON made perfect sense. Thanks @gustavosobral for working on this. 👍 |
Thank you for the feedback @rpottsoh. It makes total sense to add comments in the file. I just did it, thanks. |
"exercise": "simple-cipher", | ||
"version": "1.0.0", | ||
"comments": | ||
["Some of the strings used in this file are symbolic and doesn't represent their literal value. They are:", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not
instead of doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the comment at the beginning.
@petertseng Are you interested in reviewing this PR? If not, not worries of course. |
With three approvers, I think we are now ready to merge this. Thanks a lot @gustavosobral! |
Can anyone clarify the CI building output for me? I see that is failing but I didn't understand why.
As I understood from the canonical-data.json specification, I should be able to nest
cases
keywords.For this problem in specific, there is a test case where we don't assert, but we use a match test to check a property against a regex, so I used a
match
key for it.Closes #586