canonical-data.json needs standardisation #376

catb0t · 2016-09-12T14:34:17Z

Hello,

I maintain the Factor track, and I'd like to automate generation of unit tests for exercises in my language.

Looking at exercises/leap/canonical-data.json, it would seem to be quite simple. However, many of the canonical-data.json files don't have the standard set of keys found in leap's JSON, and this makes them difficult to automate around.

There are, as far as I can tell, two solutions to the problems introduced by the inconsistencies.

1. Rather than hardcoding the description, input and expected keys, use a regex / fuzzy find to group keys into description, input and output. The disadvantage of this is twofold: not only must my code be flimsy, but so must everyone else's, and all of it is liable to break on the whims of anyone.
2. Standardise on a fixed, predictable set of keys and what their values represent. This makes the jobs of track maintainers easier, simplifies interacting code, and future-proofs the API and the code.
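To make the first option concrete, here is a minimal sketch of fuzzy key grouping. The patterns and the `group_keys` helper are assumptions made up for illustration, not something any track is known to use.

```python
import re

# Hypothetical patterns: real canonical-data files may use names these miss.
# Order matters, because the first matching role wins.
ROLE_PATTERNS = {
    "description": re.compile(r"desc|name|title", re.IGNORECASE),
    "output": re.compile(r"expect|output|result", re.IGNORECASE),
    "input": re.compile(r"input|arg", re.IGNORECASE),
}


def group_keys(case):
    """Guess the role of every key in one test case by matching its name."""
    grouped = {"description": {}, "input": {}, "output": {}, "unknown": {}}
    for key, value in case.items():
        for role, pattern in ROLE_PATTERNS.items():
            if pattern.search(key):
                grouped[role][key] = value
                break
        else:
            # No pattern matched, so the generator cannot use this key.
            grouped["unknown"][key] = value
    return grouped


# Sample case using the three key names from leap; the values are made up.
print(group_keys({"description": "leap year", "input": 1996, "expected": True}))
```

The guess works for leap's three keys, but a key such as output_base, which an exercise may well pass in as an input, would land in the output group. That is the kind of flimsiness the first option bakes into every track's generator.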

I think standardisation would be greatly beneficial, and if we make an API more accessible, perhaps more tracks will automate generation / regeneration of tests, which would be positive.

But before I open a pull request with structural changes to hundreds of lines of data, I'd like some feedback.

First, does anyone object to changing the names of the keys? They're rather haphazard (nearly as if it had been written for humans to read ): ) and some exercises are missing canonical-data.json altogether, and consequently I have difficulty believing there are programs reading this stuff.
Second, what keys should be used? I'm thinking something like:

For exercises with one input translating to one output, description, input and output.
For exercises with multiple inputs / multiple outputs, description, input_N, output_N.

Note that it would be disadvantageous to use an array for multiple inputs / outputs where an array is not part of the exercise because it would be hard or impossible to tell the difference between multiple inputs and an actual array. We could have keys like input_multi which is an array of inputs, I suppose?
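A rough sketch of how a generator might tell these shapes apart, assuming the key names proposed above (input, input_N, input_multi). The `call_arguments` helper is hypothetical.

```python
import re


def call_arguments(case):
    """Return the positional arguments a generated test would pass."""
    if "input" in case:
        # A single input, even if it is an array, stays one argument.
        return [case["input"]]
    if "input_multi" in case:
        # An explicit array of inputs: each element is its own argument.
        return list(case["input_multi"])
    # Otherwise collect input_1, input_2, ... in numeric order.
    numbered = []
    for key, value in case.items():
        match = re.fullmatch(r"input_(\d+)", key)
        if match:
            numbered.append((int(match.group(1)), value))
    numbered.sort(key=lambda pair: pair[0])
    return [value for _, value in numbered]


print(call_arguments({"input": [1, 2]}))        # one argument: the array
print(call_arguments({"input_multi": [1, 2]}))  # two arguments
```

Because the key name carries the intent, the same JSON value `[1, 2]` is unambiguous in both cases.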

Thoughts?


catb0t · 2016-09-12T14:48:57Z

Also while I'm talking about this API, are the canonical-data.json hosted somewhere other than https://raw.githubusercontent.com/exercism/x-common/master/exercises/${EXERCISE}/canonical-data.json, or is that where I should grab it from?

catb0t · 2016-09-12T20:24:29Z

@kytrinyx Idk if you get these notifications :(

petertseng · 2016-09-13T01:40:10Z

Duplicates #336

> Also while I'm talking about this API, are the canonical-data.json hosted somewhere other than https://raw.githubusercontent.com/exercism/x-common/master/exercises/${EXERCISE}/canonical-data.json, or is that where I should grab it from?

I believe that is the place; at least I'm not aware of any other places!
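For reference, a small sketch of fetching one file from that raw URL. The URL template is the one quoted in this thread; the helper names are made up, and nothing here guarantees that the location stays stable.

```python
import json
import urllib.request

# URL template as quoted in this thread, with {exercise} in place of ${EXERCISE}.
RAW_URL = (
    "https://raw.githubusercontent.com/exercism/x-common/master/"
    "exercises/{exercise}/canonical-data.json"
)


def canonical_data_url(exercise):
    """Build the raw URL for one exercise's canonical data."""
    return RAW_URL.format(exercise=exercise)


def fetch_canonical_data(exercise):
    """Download and parse the JSON; raises if the exercise has no such file."""
    with urllib.request.urlopen(canonical_data_url(exercise)) as response:
        return json.load(response)


print(canonical_data_url("leap"))
```

Some exercises have no canonical-data.json at all, so a caller would want to handle `urllib.error.HTTPError` from `fetch_canonical_data`.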

> nearly as if it had been written for humans to read

I think this may not be that far from the truth, though I will argue later on: "why not both?"

> I have difficulty believing there are programs reading this stuff.

I documented a few examples in https://github.com/exercism/x-api/issues/113

Go:

An example_gen.go in each exercise directory e.g. https://github.com/exercism/xgo/blob/master/exercises/leap/example_gen.go - it defines the structure that the file is expected to have.
Common code at https://github.com/exercism/xgo/blob/master/gen/gen.go

Ruby:

A small script in the bin directory e.g. https://github.com/exercism/xruby/blob/master/bin/generate-leap
A file in lib adding convenience functions on each case e.g. https://github.com/exercism/xruby/blob/master/lib/leap_cases.rb
An example.tt in each exercise directory e.g. https://github.com/exercism/xruby/blob/master/exercises/leap/example.tt

Between the fields referenced in example.tt and lib, that defines the structure that the JSON is supposed to have.

Scala: https://github.com/exercism/xscala/tree/master/testgen/src/main/scala

There are case classes defining the expected structure.

So what does this all mean? It means that currently these tracks have to define the expected structure on a per-exercise basis! Standardisation could allow them to have less custom logic per exercise. I'm not sure it's avoidable for some statically typed languages, though, since they may still have to define the types of the values (some exercises have integer inputs, some exercises have string inputs, etc.).

> For exercises with multiple inputs / multiple outputs, description, input_N, output_N.

I see that this is easy for a machine to read. Can we simultaneously make it easy for a human to read as well? Consider e.g. https://github.com/exercism/x-common/blob/master/exercises/all-your-base/canonical-data.json: I imagine that many tracks will pass in three inputs (input_base, input_digits, output_base) and then check that the output digits are as specified in output_digits. If the data instead simply looked like "input_1": 2, "input_2": [1], "input_3": 10, "output": [1], I think it might not be clear to a human what the difference between input_1 and input_3 is, and I consider this important for being able to understand PRs that propose to change the test cases.
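A small sketch of the readability difference, using the values from the example above. The `render_call` helper and the function name `rebase` are hypothetical; the per-exercise list of input keys is an assumption about how a generator could stay generic while keeping descriptive names.

```python
# The same test case written with numbered keys and with descriptive keys.
numbered = {"input_1": 2, "input_2": [1], "input_3": 10, "output": [1]}
named = {
    "input_base": 2,
    "input_digits": [1],
    "output_base": 10,
    "output_digits": [1],
}


def render_call(function_name, case, input_keys):
    """Render a call for a human reader, labelling each argument by its key."""
    args = ", ".join(f"{key}={case[key]!r}" for key in input_keys)
    return f"{function_name}({args})"


print(render_call("rebase", numbered, ["input_1", "input_2", "input_3"]))
print(render_call("rebase", named, ["input_base", "input_digits", "output_base"]))
```

The second line states which base is which without opening the exercise description, which is what a reviewer of a test-case PR needs.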

catb0t · 2016-09-13T12:29:00Z

I didn't think this really was a dupe of #336 because I'd read that before, but perhaps you're right.

I believe we can simultaneously make the JSON easier for humans and programs to read, but the way it is now makes it very hard to make a generalising program.

The examples you've linked to share a common problem: because each exercise has a different structure, each exercise needs its own separate, different test generator program.

This is, to me, to put it plainly, an insane amount of unnecessary work -- my goal with exercism.autogen-exercises is to generate all the tests for all the exercises at once, which should be trivially possible. I don't want a different ${exercisename}-testgen.factor for each different JSON structure.
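As a rough illustration of that goal, sketched in Python rather than Factor: one generator function can serve every exercise if each file shares a structure. The top-level "cases" list and the emitted assertion format are assumptions; description, input and expected are the key names mentioned for leap.

```python
def generate_tests(exercise, data):
    """Emit one comment and one assertion per case, for any exercise."""
    lines = []
    for case in data["cases"]:
        lines.append(f"# {case['description']}")
        lines.append(
            f"assert {exercise}({case['input']!r}) == {case['expected']!r}"
        )
    return "\n".join(lines)


# Made-up sample data in the assumed shape.
sample = {
    "cases": [
        {"description": "year divisible by 4", "input": 1996, "expected": True},
    ]
}
print(generate_tests("leap", sample))
```

Nothing in `generate_tests` mentions leap; the exercise name and the data are the only things that change between exercises.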

kytrinyx · 2016-09-19T15:44:47Z

> Idk if you get these notifications :(

I do, but I've been traveling for the past week and am only just catching up.

I would like this to be merged with #336 which is the same topic. The goal is the same in both of these threads: to be able to generate the test suites.

@catb0t would you mind collecting your arguments and observations from this thread and adding them to the other one? That would let you and @zenspider and @devonestes get on the same page about what the problem and potential solutions are, and others could chime in to help sort it out as well.

catb0t mentioned this issue Sep 12, 2016

Short-term TODO exercism/factor#19

Open

5 tasks

catb0t closed this as completed Sep 21, 2016

catb0t mentioned this issue Sep 21, 2016

canonical-data.json standardisation discussion (was: Malformed data?) #336

Closed
