
canonical-data.json needs standardisation #376



Closed
catb0t opened this issue Sep 12, 2016 · 5 comments

@catb0t
Contributor

catb0t commented Sep 12, 2016

Hello,

I maintain the Factor track, and I'd like to automate generation of unit tests for exercises in my language.

Looking at exercises/leap/canonical-data.json, it would seem to be quite simple. However, many of the canonical-data.json files don't use the standard set of keys found in leap's JSON, and this makes them difficult to automate around.

There are, as far as I can tell, two solutions to the problems introduced by the inconsistencies.

  • Rather than hardcoding the description, input and expected keys, use a regex / fuzzy find to
    group keys into description, input and output. The disadvantages of this are twofold: not
    only would my code be flimsy, but so would everyone else's, and all of it would be liable to break on anyone's whim.
  • Standardise on a fixed, predictable set of keys and what their values represent. This makes the jobs of track maintainers easier, simplifies interacting code, and future-proofs the API and the code.
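To make the two options concrete, here is a minimal Python sketch (Python is used only for illustration; it is not the Factor track's tooling). It assumes each file holds a top-level "cases" array of objects; the regex patterns and function names are hypothetical, and the sample case is leap-shaped but not copied from the real file.

```python
import json
import re

# Assumption: each canonical-data.json holds a top-level "cases" list of objects.

def read_fixed(case):
    """Standardised keys: read description, input and expected directly."""
    return case["description"], case["input"], case["expected"]

# Fuzzy grouping: guess each key's role from its name. These patterns are
# illustrative guesses, not anything the repository defines.
DESCRIPTION_RE = re.compile(r"desc", re.IGNORECASE)
OUTPUT_RE = re.compile(r"expect|output|result", re.IGNORECASE)

def read_fuzzy(case):
    """Group keys by name; anything unrecognised is assumed to be an input."""
    description, inputs, outputs = None, {}, {}
    for key, value in case.items():
        if DESCRIPTION_RE.search(key):
            description = value
        elif OUTPUT_RE.search(key):
            outputs[key] = value
        else:
            inputs[key] = value
    return description, inputs, outputs

# Illustrative leap-style case (not copied from the real file).
data = json.loads('{"cases": [{"description": "year not divisible by 4", '
                  '"input": 2015, "expected": false}]}')
for case in data["cases"]:
    print(read_fixed(case))
    print(read_fuzzy(case))
```

The fuzzy version copes with this case but guesses wrong as soon as names drift: a key such as output_base would be filed as an output even in an exercise where it is really an input.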

I think standardisation would be greatly beneficial, and if we make the API more accessible, perhaps more tracks will automate generation / regeneration of tests, which would be positive.

But before I open a pull request with structural changes to hundreds of lines of data, I'd like some feedback.

First, does anyone object to changing the names of the keys? They're rather haphazard (nearly as if it had been written for humans to read ): ), some exercises are missing canonical-data.json altogether, and consequently I have difficulty believing there are programs reading this stuff.
Second, what keys should be used? I'm thinking something like:

  • For exercises with one input translating to one output, description, input and output.
  • For exercises with multiple inputs / multiple outputs, description, input_N, output_N.
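A reader for the proposed key scheme could look roughly like the Python sketch below. It assumes the bare "input"/"output" keys for one-to-one exercises and "input_N"/"output_N" otherwise; the function name is hypothetical. One detail worth noting is that the numbered keys must be ordered numerically, since a plain string sort puts "input_10" before "input_2".

```python
import re

# Hypothetical key scheme from the proposal: "input"/"output" for a single
# value, "input_1", "input_2", ... / "output_1", ... for several values.
NUMBERED_KEY = re.compile(r"^(input|output)_(\d+)$")

def values_for(case, role):
    """Collect the value(s) for role ("input" or "output") as a list."""
    if role in case:  # one-to-one exercises use the bare key
        return [case[role]]
    numbered = []
    for key, value in case.items():
        match = NUMBERED_KEY.match(key)
        if match and match.group(1) == role:
            numbered.append((int(match.group(2)), value))
    # Sort on the integer suffix only, so input_10 comes after input_2.
    numbered.sort(key=lambda pair: pair[0])
    return [value for _, value in numbered]
```

Because the result is always a list of arguments, a single array-valued input stays one element (for example [[1, 2]]) and is not confused with two separate inputs.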

Note that it would be disadvantageous to use an array for multiple inputs / outputs where an array is not part of the exercise because it would be hard or impossible to tell the difference between multiple inputs and an actual array. We could have keys like input_multi which is an array of inputs, I suppose?

Thoughts?

@catb0t
Contributor Author

catb0t commented Sep 12, 2016

Also, while I'm talking about this API: are the canonical-data.json files hosted somewhere other than https://raw.githubusercontent.com/exercism/x-common/master/exercises/${EXERCISE}/canonical-data.json, or is that where I should grab them from?
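Fetching a file from that URL pattern takes only the standard library. A minimal Python sketch, assuming the raw.githubusercontent.com path quoted above (EXERCISE being the exercise slug); the function names are hypothetical, and the fetch needs network access and will fail if the repository path changes.

```python
import json
import urllib.request

# URL pattern quoted in the comment above; {exercise} is the exercise slug.
URL_TEMPLATE = ("https://raw.githubusercontent.com/exercism/x-common/master/"
                "exercises/{exercise}/canonical-data.json")

def canonical_data_url(exercise):
    """Build the raw-file URL for one exercise."""
    return URL_TEMPLATE.format(exercise=exercise)

def fetch_canonical_data(exercise, timeout=10):
    """Download and parse one exercise's canonical-data.json (requires network)."""
    with urllib.request.urlopen(canonical_data_url(exercise), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```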

@catb0t
Contributor Author

catb0t commented Sep 12, 2016

@kytrinyx Idk if you get these notifications :(

@petertseng
Member

petertseng commented Sep 13, 2016

Duplicates #336


> Also while I'm talking about this API, are the canonical-data.json hosted somewhere other than https://raw.githubusercontent.com/exercism/x-common/master/exercises/${EXERCISE}/canonical-data.json, or is that where I should grab it from?

I believe that is the place; at least I'm not aware of any other places!

> nearly as if it had been written for humans to read

I think this may not be that far from the truth, though I will argue later on: "why not both?"

> I have difficulty believing there are programs reading this stuff.

I documented a few examples in https://github.com/exercism/x-api/issues/113

Go:

Ruby:

Together, the fields referenced in example.tt and lib define the structure that the JSON is supposed to have.

Scala: https://github.com/exercism/xscala/tree/master/testgen/src/main/scala

There are case classes defining the expected structure.

So what does this all mean? It means that, currently, these tracks have to define the expected structure on a per-exercise basis! Standardisation could allow them to have less custom logic per exercise. I'm not sure it's avoidable for some statically typed languages, though, since they may still have to define the types of the values (some exercises have integer inputs, some have string inputs, etc.).

> For exercises with multiple inputs / multiple outputs, description, input_N, output_N.

I see that this is easy for a machine to read. Can we simultaneously make it easy for a human to read as well? Consider, for example, https://github.com/exercism/x-common/blob/master/exercises/all-your-base/canonical-data.json: I imagine that many tracks will pass in three inputs (input_base, input_digits and output_base) and then check that the output digits are as specified in output_digits. If the data instead simply looked like "input_1": 2, "input_2": [1], "input_3": 10, "output": [1], I think it might not be clear to a human what the difference between input_1 and input_3 is, and I consider this important for being able to understand PRs that propose to change the test cases.
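One way to see the "why not both?" idea: descriptive names can stay if a generic reader is told which key holds the checked value and treats every other non-description key as an input. The Python sketch below assumes that hypothetical convention; the function name and the sample case (shaped like the all-your-base example, not copied from it) are illustrative.

```python
import json

def named_inputs(case, expected_key="expected"):
    """Hypothetical convention: every key other than "description" and the
    expected-value key is an input, taken in the order the file lists it."""
    return [(key, value) for key, value in case.items()
            if key not in ("description", expected_key)]

# Illustrative case shaped like the all-your-base example above.
raw = """{"description": "binary to decimal",
          "input_base": 2, "input_digits": [1], "output_base": 10,
          "output_digits": [1]}"""
case = json.loads(raw)
print(named_inputs(case, expected_key="output_digits"))
```

The catch is ordering: Python's json.loads keeps the file's key order, but RFC 8259 describes JSON objects as unordered, so not every parser will preserve it, and a real convention would need to state the argument order some other way.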

@catb0t
Contributor Author

catb0t commented Sep 13, 2016

I didn't think this really was a dupe of #336 because I'd read that before, but perhaps you're right.

I believe we can simultaneously make the JSON easier for humans and programs to read, but the way it is now makes it very hard to make a generalising program.

The examples you've linked to share a common problem: because each exercise has a different structure, each exercise needs its own separate, different test generator program.

This is, to put it plainly, an insane amount of unnecessary work: my goal with exercism.autogen-exercises is to generate all the tests for all the exercises at once, which should be trivially possible. I don't want a different ${exercisename}-testgen.factor for each different JSON structure.
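With fixed keys, a single generator can cover every exercise. A rough Python sketch of the idea (not the actual exercism.autogen-exercises code, which would be written in Factor) that emits Factor unit-test forms of the shape { expected } [ input word ] unit-test. It assumes every file uses description/input/expected under a top-level "cases" array; the vocabulary and word names are hypothetical parameters, and the literal mapping is simplified (only backslash and double quote are escaped in strings).

```python
def factor_literal(value):
    """Render a JSON value as Factor source (a simplified, assumed mapping)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "t" if value else "f"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    if isinstance(value, list):  # Factor array literal, e.g. { 1 2 }
        return "{ " + "".join(factor_literal(v) + " " for v in value) + "}"
    raise TypeError(f"no Factor rendering for {value!r}")

def unit_test(word, case):
    """One test form per case, assuming keys description/input/expected."""
    return (f"! {case['description']}\n"
            f"{{ {factor_literal(case['expected'])} }} "
            f"[ {factor_literal(case['input'])} {word} ] unit-test")

def generate(vocab, word, data):
    """Same code path for every exercise; only vocab and word change."""
    header = [f"USING: {vocab} tools.test ;", f"IN: {vocab}.tests", ""]
    return "\n".join(header + [unit_test(word, c) for c in data["cases"]]) + "\n"

# Hypothetical names: vocabulary "leap", word "leap-year?".
print(generate("leap", "leap-year?",
               {"cases": [{"description": "year not divisible by 4",
                           "input": 2015, "expected": False}]}))
```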

@kytrinyx
Member

> Idk if you get these notifications :(

I do, but I've been traveling for the past week and am only just catching up.

I would like this to be merged with #336, which covers the same topic. The goal is the same in both of these threads: to be able to generate the test suites.

@catb0t would you mind collecting your arguments and observations from this thread and adding them to the other one? That would let you and @zenspider and @devonestes get on the same page about what the problem and potential solutions are, and others could chime in to help sort it out as well.
