etl: Add canonical data #507

bunnymatic · 2017-01-24T01:43:50Z

After scouring the existing test suites, I noticed that a few of them ([Python, for example](https://github.com/exercism/xpython/blob/master/exercises/etl/etl_test.py)) include keys that are whole words rather than single letters. I've added one extra case at the bottom that covers this. I'm not sure whether it's something we want to include here.

kytrinyx · 2017-01-24T04:01:41Z

My initial thought is "no", but I'm willing to be convinced otherwise.

The way I'm thinking of this is that we have a scrabble alphabet, and we're changing the structure of the tile => point mapping.

If there are words (or multiple letters), that seems like a different problem.
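To make the restructuring concrete, here is a minimal Python sketch of the tile => point transformation described above. The function name `transform`, the integer score keys, and the lowercase output letters are assumptions for illustration, loosely following the cases proposed in this PR; individual tracks may shape this differently.

```python
def transform(legacy):
    """Invert a score -> [letters] mapping into a letter -> score mapping."""
    return {
        letter.lower(): score
        for score, letters in legacy.items()
        for letter in letters
    }


# One score shared by several tiles becomes one entry per tile.
print(transform({1: ["A", "E", "I", "O", "U"]}))
# {'a': 1, 'e': 1, 'i': 1, 'o': 1, 'u': 1}
```

Because every tile is a single letter, each input letter maps to exactly one output key, which is why word keys read as a different problem.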

bunnymatic · 2017-01-24T05:09:11Z

Sounds good to me; I think I agree. The way the problem is stated is tied to the scrabble board, so I'll update accordingly. Also, if it lends any more support for leaving out the "word" tests, the majority of the existing tests *don't* test with words, only with letters.

Insti · 2017-01-24T08:15:11Z

@bunnymatic, some Pull Request tips:

- Title format is "problem-slug: description", in this case "etl: Add canonical data".
- It's helpful to link to the description of the problem in the PR for canonical data files, as this often needs to be referred to for context: https://github.com/exercism/x-common/blob/master/exercises/etl/description.md

Insti · 2017-01-24T08:17:29Z

Word tests should not be included.

Insti · 2017-01-24T08:22:51Z

exercises/etl/canonical-data.json

+        }
+      },
+      {
+        "description": "tramsforms more values",


"transforms" is spelled incorrectly (multiple times).

I'd rather see more useful test descriptions.
"Transforms more values" is a possible description for all but one of these tests. What is this test actually testing?

I'll fix the typos. Interestingly, the test descriptions are thin, but consistently thin, across the existing suites. I can beef them up. I was mimicking what I'd seen in the existing...

Do you have an idea for the test descriptions? I started with something more like "transforms a map with a single key and an array of one value into a map with one key and one value". That works for the first one, but the next ones get pretty wordy. Is that where you thought that might go?

If this file acted like I think of rspec describe/context/it blocks, the outer description could be kind of the prefix to the inner descriptions with something like

    {
      "transform": {
        "description": "transforms the input map",
        "cases": [
          {
            "description": "with one key that has an array with one value into a map with one key and one value",
            "input": { "1": ["A"] },
            "expected": { "a": 1 }
          },
          {
            "description": "with one key and multiple values into a map with multiple keys each with a single value",
            "input": { "1": ["A", "E", "I", "O", "U"] },
            "expected": { "a": 1, ...

I'm happy to rewrite the descriptions, but i'd love a little direction or suggestion
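The describe/context/it idea above can be sketched roughly as follows: the outer description acts as a prefix for each inner case description. This is only an illustration in Python; `full_descriptions` is a hypothetical helper, the case descriptions are shortened, and nothing here reflects how any track's generator actually works.

```python
def full_descriptions(group):
    """Prefix each case description with the description of its enclosing group."""
    prefix = group["description"]
    return [f"{prefix} {case['description']}" for case in group["cases"]]


# Shortened, illustrative descriptions in the shape proposed above.
group = {
    "description": "transforms the input map",
    "cases": [
        {"description": "with one key and one value"},
        {"description": "with one key and multiple values"},
    ],
}

for name in full_descriptions(group):
    print(name)
```

With this kind of composition the inner descriptions only need to say what differs between cases, which keeps them short.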

See below for some suggestions.

Insti · 2017-01-25T18:26:28Z

exercises/etl/canonical-data.json

+    "description": "transforms the input map from a map of value: [ keys ] to a map of key: value for each of the keys",
+    "cases": [
+      {
+        "description": "transforms one value",


"single letter"

Could also be, "single score with a single letter"

Thanks @Insti for all these. I think i'm getting it. I'll make updates accordingly

Insti · 2017-01-25T18:26:54Z

exercises/etl/canonical-data.json

+      {
+        "description": "transforms more values",
+        "input": {
+          "1": ["A", "E", "I", "O", "U" ]


"multiple letters with the same score"

Insti · 2017-01-25T18:27:50Z

exercises/etl/canonical-data.json

+        }
+      },
+      {
+        "description": "transforms more keys",


"multiple scores with multiple letters"

Why is there no test for "multiple scores with a single letter" ?

Though it could be an interesting error/edge scenario, it doesn't fit in with the problem description which says we're dealing with scrabble tile scores. Because of scrabble rules, there is no case where one A tile is worth 1 and another is worth 2.
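As a rough illustration of why that scenario sits outside the problem, the sketch below shows what a naive inversion does when the same letter appears under two scores. The `transform` function here is an assumed, simplified implementation written for this example, not code from any track.

```python
def transform(legacy):
    """Naively invert score -> [letters]; duplicate letters are not detected."""
    result = {}
    for score, letters in legacy.items():
        for letter in letters:
            # A later score silently overwrites an earlier one for the same letter.
            result[letter.lower()] = score
    return result


conflicting = {1: ["A"], 2: ["A"]}
print(transform(conflicting))  # {'a': 2}
```

The result depends on iteration order rather than on anything in the problem statement, so a shared test case would have no well-defined expected value.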

Insti · 2017-01-25T18:29:06Z

exercises/etl/canonical-data.json

+        }
+      },
+      {
+        "description": "transforms a full data set",


"multiple scores with differing numbers of letters"

Insti · 2017-01-25T18:43:44Z

exercises/etl/canonical-data.json

@@ -0,0 +1,62 @@
+{
+  "transform": {
+    "description": "transforms the input map from a map of value: [ keys ] to a map of key: value for each of the keys",


Prompting for a potential method name like this is not necessary; see circular-buffer, which has a comment and then the list of cases.

Despite having added the circular-buffer file, now I get to play devil's advocate: when the time comes to do #336, maybe someone will be unhappy that JSON files with only a single function-under-test have "cases" as a key under the top-level object, whereas those with multiple functions-under-test have "cases" in second-level objects.

But I have no particular reason to favour one way or the other until we come up with a proposed unified schema for all the JSON files.

If there is the idea that this file could be used to generate a test suite, wouldn't you kind of need the method name to know how to invoke the method in the test call?

The way the function is invoked is very language specific and will need to be specified by the language anyway.

@petertseng makes some good points above and I am sympathetic to the idea that some kind of identifier for the name of the problem (or section of the problem) can be specified in the JSON.

> until we come up with a proposed unified schema for all the JSON files.

👍

So I guess what you're doing here is fine for now.
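One way to picture the point about invocation being language specific: the JSON only supplies inputs and expected outputs, and each track passes in its own function. The sketch below assumes a file with a top-level "cases" list (the circular-buffer style mentioned above); `run_cases` and `transform` are hypothetical names, and the embedded JSON is a trimmed stand-in rather than the real file.

```python
import json

# Trimmed stand-in for a canonical data file with a top-level "cases" list.
CANONICAL = json.loads("""
{
  "cases": [
    {
      "description": "a single letter",
      "input": {"1": ["A"]},
      "expected": {"a": 1}
    }
  ]
}
""")


def transform(legacy):
    """Track-specific implementation; JSON keys arrive as strings, so convert them."""
    return {
        letter.lower(): int(score)
        for score, letters in legacy.items()
        for letter in letters
    }


def run_cases(data, function_under_test):
    """Run every case against the callable chosen by the track; return failing descriptions."""
    failures = []
    for case in data["cases"]:
        if function_under_test(case["input"]) != case["expected"]:
            failures.append(case["description"])
    return failures


print(run_cases(CANONICAL, transform))  # []
```

Here the data file never names the function; the caller decides what to invoke and how.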

Insti · 2017-01-26T08:55:38Z

exercises/etl/canonical-data.json

    "cases": [
      {
-        "description": "transforms one value",
+        "description": "transforms a single letter",


'transforms ' is redundant in all these descriptions.

bunnymatic · 2017-01-26T21:21:40Z

This may look like a pile of new commits, but it was a rebase onto the most current master.

Only the last commit is truly new.

Just FYI

Insti · 2017-01-26T23:45:12Z

exercises/etl/canonical-data.json

@@ -38,7 +38,7 @@
        }
      },
      {
-        "description": "transforms the full set of scrabble tiles (multiple scores with differing numbers of letters)",
+        "description": "the full set of scrabble tiles (multiple scores with differing numbers of letters)",


That they are scrabble tiles is irrelevant and leads to overly verbose test names since many tracks construct the test name from the description.
"multiple scores with differing numbers of letters" should be sufficient to describe the test.

If any additional description about the problem is required it should be in the description.md file
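To illustrate how a description can end up as a test name, here is a small Python sketch. The `to_test_name` helper and its snake_case convention are assumptions for this example; each track has its own way of deriving names.

```python
import re


def to_test_name(description):
    """Turn a case description into a snake_case test method name."""
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"test_{slug}"


verbose = "the full set of scrabble tiles (multiple scores with differing numbers of letters)"
concise = "multiple scores with differing numbers of letters"

print(to_test_name(verbose))
print(to_test_name(concise))
```

The concise description already yields a fairly long identifier, and the scrabble prefix only adds length without telling the cases apart.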

bunnymatic · 2017-01-27T00:29:02Z

Updated that description and squashed this into one commit.

petertseng

seems OK. One interesting observation about the keys of the input.

petertseng · 2017-01-27T04:50:35Z

exercises/etl/canonical-data.json

+      {
+        "description": "a single letter",
+        "input": {
+          "1": ["A"]


I know that this is a "1": (rather than 1:) because JSON would only allow string keys in their objects.

Is it the case that languages will usually use integers as their keys? If so, perhaps add a note saying so? It can go on the same level as description and cases. There is no standard key for that note; for some reason people have used "#": in the past, but maybe "notes": is better.

I also thought about this when I started this PR but then quickly got distracted with other changes. I'll add a note. I think it's an important thing to call out.

I looked at several data files. Not a single one uses notes. Though that is probably a better key (long term) I'm not sure this commit should be the one to start that. So I went with #.

If there is some documentation that could be updated, or if you want me to build a script that renames all # entries to notes to lay down the "new" convention, I'd be happy to help. But for now, I'm sticking with #.
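For a track that prefers integer scores, the string keys can be converted while loading a case. The following Python sketch assumes the case layout shown in the diff above; `with_integer_scores` is a hypothetical helper, not part of any existing tooling.

```python
import json

# A single case in the shape used by this PR; JSON object keys must be strings.
CASE_JSON = '{"description": "a single letter", "input": {"1": ["A"]}, "expected": {"a": 1}}'


def with_integer_scores(case):
    """Return a copy of the case whose input keys are integers instead of strings."""
    converted = dict(case)
    converted["input"] = {
        int(score): letters for score, letters in case["input"].items()
    }
    return converted


case = with_integer_scores(json.loads(CASE_JSON))
print(case["input"])  # {1: ['A']}
```

The expected values need no conversion, since JSON numbers already load as integers.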

Insti

Looks good @bunnymatic thanks for all the work on this and all the other ones you're working on.
I hope I'm not coming across as constantly critical.
I appreciate the work you're doing. ❤️

bunnymatic · 2017-01-27T16:22:16Z

Nope. I get it. I can see how with a large open source project like this, you want to lay down the rules and patterns as much as possible and that's hard given the open source ness and the scope (all the different languages).

I'm happy to help out. Hopefully the next ones will be closer to correct at the first commit.

petertseng

understood about "#" vs "notes", if we decide notes is the way of the future we can deal with it later.

Insti · 2017-01-27T18:40:36Z

Thanks @bunnymatic ❤️

bunnymatic · 2017-01-28T02:22:33Z

Someone can probably close https://github.com/exercism/todo/issues/90 now

Insti changed the title ~~Add ETL exercise canonical data~~ etl: Add canonical data Jan 24, 2017

Insti reviewed Jan 24, 2017

Insti reviewed Jan 25, 2017

Insti reviewed Jan 26, 2017

Insti mentioned this pull request Jan 26, 2017

prime-factors: add canonical data #513

Merged

bunnymatic force-pushed the etl-shared-test-data branch from 375f814 to 734121f Compare January 26, 2017 21:21

Insti reviewed Jan 26, 2017

Add ETL exercise canonical data

ae2e713

bunnymatic force-pushed the etl-shared-test-data branch from c7eedc7 to ae2e713 Compare January 27, 2017 00:28

petertseng approved these changes Jan 27, 2017

Insti approved these changes Jan 27, 2017

Add comment about integer keys

c93afc5

petertseng approved these changes Jan 27, 2017

Insti merged commit 040a24f into exercism:master Jan 27, 2017

rpottsoh added a commit to rpottsoh/exercism-problem-specifications that referenced this pull request Jan 27, 2017

Updating Fork (#2)

d810bd2

* prime-factors: add canonical data (exercism#513) * etl: Add canonical data (exercism#507) Add ETL exercise canonical data

petertseng mentioned this pull request Jan 31, 2017

etl: use cases from x-common exercism/haskell#483

Merged

petertseng added a commit to exercism/haskell that referenced this pull request Feb 1, 2017

etl: use cases from x-common (#483)

68d44cf

exercism/problem-specifications#507
