|
1 | 1 | # Arquero
|
2 | 2 |
|
| 3 | +Arquero is a JavaScript library for query processing and transformation of array-backed data tables. |
| 4 | + |
| 5 | +Arquero (version <strong>${aq.version ?? "nope"}</strong>) is available by default as the **aq** symbol from Observable’s stdlib: |
| 6 | + |
3 | 7 | ```js echo
|
4 | 8 | aq
|
5 | 9 | ```
|
| 10 | + |
| 11 | +Following the documentation website’s [introduction](https://uwdata.github.io/arquero/), let’s extract some methods: |
| 12 | + |
| 13 | +```js echo |
| 14 | +const { all, desc, op, table } = aq; |
| 15 | +``` |
| 16 | + |
| 17 | +We can then create a table of the average hours of sunshine per month, from [usclimatedata.com](https://usclimatedata.com/). |
| 18 | + |
| 19 | +```js echo |
| 20 | +const dt = table({
| 21 | +  'Seattle': [69, 108, 178, 207, 253, 268, 312, 281, 221, 142, 72, 52],
| 22 | +  'Chicago': [135, 136, 187, 215, 281, 311, 318, 283, 226, 193, 113, 106],
| 23 | +  'San Francisco': [165, 182, 251, 281, 314, 330, 300, 272, 267, 243, 189, 156]
| 24 | +}); |
| 25 | +``` |
| 26 | + |
| 27 | +As we see, Arquero is column-oriented: each column is an array of values of a given type (here, numbers representing hours of sunshine per month). |
| 28 | + |
| 29 | +A table is also iterable, so its contents can be displayed with [Inputs.table](/lib/inputs#table). |
| 30 | + |
| 31 | +```js echo |
| 32 | +Inputs.table(dt, {width: 370}) |
| 33 | +``` |
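
To make the relationship between the column-oriented layout and row iteration concrete, here is a minimal sketch in plain JavaScript. It is not Arquero's implementation; the `toRows` helper and the two-month sample are hypothetical and only illustrate how equal-length columns map to one object per row.

```js
// Hypothetical helper (not part of Arquero): turn {column: values[]} into row objects.
function toRows(columns) {
  const names = Object.keys(columns);
  const length = names.length ? columns[names[0]].length : 0;
  return Array.from({length}, (_, i) =>
    Object.fromEntries(names.map((name) => [name, columns[name][i]]))
  );
}

// First two months of the sunshine data, kept short for illustration.
const sampleRows = toRows({
  Seattle: [69, 108],
  Chicago: [135, 136]
});
```

Each element of `sampleRows` carries one value per column, which is the shape a row-based consumer expects.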
| 34 | + |
| 35 | +An Arquero table can be used as a data source to make happy charts with [Observable Plot](/lib/plot): |
| 36 | + |
| 37 | +```js echo |
| 38 | +Plot.plot({
| 39 | +  height: 150,
| 40 | +  x: {label: "month"},
| 41 | +  y: {zero: true, grid: true, label: "hours of ☀️"},
| 42 | +  marks: [
| 43 | +    Plot.lineY(dt, {y: "Seattle", marker: true, stroke: "red"}),
| 44 | +    Plot.lineY(dt, {y: "Chicago", marker: true, stroke: "turquoise"}),
| 45 | +    Plot.lineY(dt, {y: "San Francisco", marker: true, stroke: "orange"})
| 46 | +  ]
| 47 | +}) |
| 48 | +``` |
| 49 | + |
| 50 | +Arquero supports a range of data transformation tasks, including filter, sample, aggregation, window, join, and reshaping operations. For example, the following operation derives the difference in sunshine hours between Seattle and Chicago for each month, then sorts the months by that difference in descending order. |
| 51 | + |
| 52 | +```js |
| 53 | +Inputs.table(diffs, {width: 250}) |
| 54 | +``` |
| 55 | + |
| 56 | +```js echo |
| 57 | +const diffs = dt.derive({
| 58 | +    month: d => op.row_number(),
| 59 | +    diff: d => d.Seattle - d.Chicago
| 60 | +  })
| 61 | +  .select('month', 'diff')
| 62 | +  .orderby(desc('diff')); |
| 63 | +``` |
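
For comparison, the derive, select, and orderby chain above can be approximated in plain JavaScript. This is a sketch of what the query computes, not how Arquero evaluates it; the names `seattleHours`, `chicagoHours`, and `plainDiffs` are hypothetical, and the 1-based month index assumes `op.row_number()` starts counting at 1.

```js
// The Seattle and Chicago columns from the table above.
const seattleHours = [69, 108, 178, 207, 253, 268, 312, 281, 221, 142, 72, 52];
const chicagoHours = [135, 136, 187, 215, 281, 311, 318, 283, 226, 193, 113, 106];

const plainDiffs = seattleHours
  // derive: a 1-based month number and the Seattle minus Chicago difference
  .map((hours, i) => ({month: i + 1, diff: hours - chicagoHours[i]}))
  // orderby(desc('diff')): largest difference first
  .sort((a, b) => b.diff - a.diff);
```

Every difference is negative because Chicago records more sunshine than Seattle in each month, so the first row is the month in which Seattle comes closest.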
| 64 | + |
| 65 | +Is Seattle more correlated with San Francisco or Chicago? |
| 66 | + |
| 67 | +```js |
| 68 | +Inputs.table(correlations, {width: 250}) |
| 69 | +``` |
| 70 | + |
| 71 | +```js echo |
| 72 | +const correlations = dt.rollup({
| 73 | +  corr_sf: op.corr('Seattle', 'San Francisco'),
| 74 | +  corr_chi: op.corr('Seattle', 'Chicago')
| 75 | +}) |
| 76 | +``` |
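
As a reference point for the numbers above, here is a minimal sketch of a correlation coefficient in plain JavaScript. It assumes `op.corr` computes the Pearson product-moment correlation, which is worth confirming against Arquero's API reference; the `pearson` helper is hypothetical and is not Arquero's code.

```js
// Hypothetical helper: Pearson correlation of two equal-length numeric arrays.
function pearson(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((sum, x) => sum + x, 0) / n;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;
  let sxy = 0; // sum of products of deviations
  let sxx = 0; // sum of squared deviations of xs
  let syy = 0; // sum of squared deviations of ys
  for (let i = 0; i < n; ++i) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}
```

Passing two of the city columns to `pearson` should give a value comparable to the corresponding entry of `correlations`, if the assumption about `op.corr` holds.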
| 77 | + |
| 78 | +We can aggregate statistics per city. The following reshapes (folds) the data into a two-column layout (city, sun), groups the rows by city, and returns the result as an array of objects: |
| 79 | + |
| 80 | +```js echo |
| 81 | +dt.fold(all(), { as: ['city', 'sun'] })
| 82 | +  .groupby('city')
| 83 | +  .rollup({
| 84 | +    min: d => op.min(d.sun), // functional form of op.min('sun')
| 85 | +    max: d => op.max(d.sun),
| 86 | +    avg: d => op.average(d.sun),
| 87 | +    med: d => op.median(d.sun),
| 88 | +    // functional forms permit flexible table expressions
| 89 | +    skew: ({sun: s}) => (op.mean(s) - op.median(s)) / op.stdev(s) || 0
| 90 | +  })
| 91 | +  .objects() |
| 92 | +``` |
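
The fold and groupby steps above can be sketched in plain JavaScript to show the intermediate long-format shape. This is an illustration only: `foldColumns`, `folded`, and `statsByCity` are hypothetical names, the sample is cut to two months, and the row order may differ from what Arquero produces.

```js
// Hypothetical helper mirroring fold: one output row per (column name, value) pair.
function foldColumns(columns) {
  return Object.entries(columns).flatMap(([city, values]) =>
    values.map((sun) => ({city, sun}))
  );
}

const folded = foldColumns({Seattle: [69, 108], Chicago: [135, 136]});

// Group by city, then reduce each group to a few summary statistics.
const statsByCity = new Map();
for (const {city, sun} of folded) {
  const stats = statsByCity.get(city) ?? {min: Infinity, max: -Infinity, sum: 0, count: 0};
  stats.min = Math.min(stats.min, sun);
  stats.max = Math.max(stats.max, sun);
  stats.sum += sun;
  stats.count += 1;
  statsByCity.set(city, stats);
}
```

Dividing `sum` by `count` gives the per-city average; the median and skew in the Arquero version need the full list of values per group rather than running totals.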
| 93 | + |
| 94 | +For more, see [Arquero’s official documentation](https://uwdata.github.io/arquero/). |