Commit 518d1d1

Cristine Guadelupe and josevalim authored
Data module (#49)
Co-authored-by: José Valim <[email protected]>
1 parent 4f9b9c3 commit 518d1d1

4 files changed: +807 -1 lines changed

guides/data.livemd

Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
1+
# VegaLite Data
2+
3+
```elixir
4+
Mix.install([
5+
{:explorer, "~> 0.6.1"},
6+
{:kino, "~> 0.10.0"},
7+
{:vega_lite,
8+
git: "https://github.com/cristineguadelupe/vega_lite", branch: "cg-stats", override: true},
9+
{:kino_vega_lite, "~> 0.1.9"}
10+
])
11+
```
12+
13+
## Introduction
14+
15+
The `VegaLite.Data` module is designed to provide a shorthand API to plot commonly used charts and high-level abstractions for specialized plots.
16+
17+
The API can be combined with the main `VegaLite` module at any level and at any point, providing flexibility to achieve the same results in a more concise way without compromising expressiveness.
18+
19+
Throughout this guide, we will look at how to use the API alone, in combination with the `VegaLite` module, and also show some comparisons between all the possible paths to achieve the same plotting results.
20+
21+
**Limitations**: `VegaLite.Data` relies on internal type inference, and although all options may be overridden, only data that implements the `Table.Reader` protocol is supported.
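To make the idea of type inference concrete, here is a minimal sketch in plain Elixir. The module `InferSketch` and its rules are hypothetical and exist only to illustrate the concept; the rules actually used by `VegaLite.Data` may differ.

```elixir
defmodule InferSketch do
  # Hypothetical helper: guesses a Vega-Lite type from a column's values.
  # This is an illustration, not the rules used by VegaLite.Data.
  def infer_type(values) do
    cond do
      Enum.all?(values, &is_number/1) -> :quantitative
      Enum.all?(values, &temporal?/1) -> :temporal
      true -> :nominal
    end
  end

  defp temporal?(%Date{}), do: true
  defp temporal?(%DateTime{}), do: true
  defp temporal?(%NaiveDateTime{}), do: true
  defp temporal?(_), do: false
end

# A tiny row-based dataset, shaped like the `data` list used in this guide.
rows = [
  %{"category" => "A", "score" => 28},
  %{"category" => "B", "score" => 50}
]

# Infer one type per column from the values found in that column.
inferred =
  for column <- ["category", "score"] do
    {column, InferSketch.infer_type(Enum.map(rows, & &1[column]))}
  end
```

Under these assumed rules, a column of strings falls back to `:nominal` and a column of numbers becomes `:quantitative`, which is why the shorthand examples in this guide can omit `type:` for such fields.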
22+
23+
For meaningful examples, we will use the *fossil fuels* dataset that ships with `Explorer`, along with a small hand-written list of category scores.
24+
25+
```elixir
26+
alias Explorer.DataFrame, as: DF
27+
alias VegaLite, as: Vl
28+
alias VegaLite.Data
29+
30+
fuels = Explorer.Datasets.fossil_fuels()
31+
32+
data = [
33+
%{"category" => "A", "score" => 28},
34+
%{"category" => "B", "score" => 50},
35+
%{"category" => "C", "score" => 34},
36+
%{"category" => "D", "score" => 42},
37+
%{"category" => "E", "score" => 39}
38+
]
39+
```
40+
41+
## Chart - the shorthand API
42+
43+
`VegaLite.Data.chart/3` and `VegaLite.Data.chart/4` are the shorthand API. We will use these functions to get quick and concise plots. They shine for plots that don't require a lot of configuration or customization.
44+
45+
`VegaLite.Data.chart/3` takes three arguments: the data, the mark, and a list of fields to encode. `VegaLite.Data.chart/4` works the same way but takes a valid `VegaLite` specification as its first argument.
46+
47+
```elixir
48+
# A simple bar plot
49+
Vl.new()
50+
|> Vl.data_from_values(data)
51+
|> Vl.mark(:bar)
52+
|> Vl.encode_field(:y, "score", type: :quantitative)
53+
|> Vl.encode_field(:x, "category", type: :nominal)
54+
```
55+
56+
```elixir
57+
# The same chart with the shorthand api
58+
Data.chart(data, :bar, x: "category", y: "score")
59+
```
60+
61+
Plotting a simple chart is a breeze! As the comparison above shows, the code becomes much leaner and easier to handle. The API also accepts a list of options in place of each argument, which allows more complex results.
62+
63+
```elixir
64+
# A line plot with point: true without the shorthand api
65+
Vl.new()
66+
|> Vl.data_from_values(fuels, only: ["total", "solid_fuel"])
67+
|> Vl.mark(:line, point: true)
68+
|> Vl.encode_field(:x, "total", type: :quantitative)
69+
|> Vl.encode_field(:y, "solid_fuel", type: :quantitative)
70+
```
71+
72+
```elixir
73+
# A line plot with point: true using the shorthand api
74+
Data.chart(fuels, [type: :line, point: true], x: "total", y: "solid_fuel")
75+
```
76+
77+
Now let's see a bit of interoperability between the API and the main module. We'll plot the same line chart, but now with a title and a custom width.
78+
79+
```elixir
80+
# Without the shorthand api
81+
Vl.new(title: "Fuels", width: 400)
82+
|> Vl.data_from_values(fuels, only: ["total", "solid_fuel"])
83+
|> Vl.mark(:line, point: true)
84+
|> Vl.encode_field(:x, "total", type: :quantitative)
85+
|> Vl.encode_field(:y, "solid_fuel", type: :quantitative)
86+
```
87+
88+
```elixir
89+
# With the shorthand api
90+
Vl.new(title: "Fuels", width: 400)
91+
|> Data.chart(fuels, [type: :line, point: true], x: "total", y: "solid_fuel")
92+
```
93+
94+
If a channel requires more configuration, the flexibility of the API comes into play.
95+
96+
```elixir
97+
Vl.new(width: 500, height: 300, title: "Fuels")
98+
|> Vl.data_from_values(fuels, only: ["total", "solid_fuel"])
99+
|> Vl.mark(:point)
100+
|> Vl.encode_field(:x, "total", type: :quantitative)
101+
|> Vl.encode_field(:y, "solid_fuel", type: :quantitative)
102+
|> Vl.encode_field(:color, "total", type: :quantitative, scale: [scheme: "category10"])
103+
```
104+
105+
In the example above, the color channel requires more customization. While it's possible to get the exact same plot using only the shorthand API, expressiveness may suffer. It's precisely in these cases that using the API together with the main module will probably result in more readable code. Let's compare the possible combinations of the API and the `VegaLite` module.
106+
107+
```elixir
108+
# Using mainly the shorthand api
109+
Vl.new(width: 500, height: 300, title: "Fuels")
110+
|> Data.chart(fuels, :point,
111+
x: "total",
112+
y: "solid_fuel",
113+
color: [field: "total", type: :quantitative, scale: [scheme: "category10"]]
114+
)
115+
```
116+
117+
```elixir
118+
# Piping the shorthand api into an encode_field
119+
Vl.new(width: 500, height: 300, title: "Fuels")
120+
|> Data.chart(fuels, :point, x: "total", y: "solid_fuel")
121+
|> Vl.encode_field(:color, "total", type: :quantitative, scale: [scheme: "category10"])
122+
```
123+
124+
As we can see, the API is flexible enough to be piped from `VegaLite`, piped into `VegaLite`, or both! You are free to choose the code that best suits your needs, ideally aiming for a balance between conciseness and expressiveness.
125+
126+
## Specialized plots
127+
128+
Specialized plots provide high-level abstractions for commonly used complex charts.
129+
130+
### Heatmap
131+
132+
Plotting heatmaps directly from VegaLite requires a lot of code.
133+
134+
For a more concrete example, we will use precomputed data from the correlation matrix of the wine dataset.
135+
136+
<!-- livebook:{"disable_formatting":true} -->
137+
138+
```elixir
139+
corr_to_plot = %{
140+
"corr_val" => [1.0, -0.02, 0.29, 0.09, 0.02, -0.05, 0.09, 0.27, -0.43, -0.02,
141+
-0.12, -0.11, -0.02, 1.0, -0.15, 0.06, 0.07, -0.1, 0.09, 0.03, -0.03, -0.04,
142+
0.07, -0.19, 0.29, -0.15, 1.0, 0.09, 0.11, 0.09, 0.12, 0.15, -0.16, 0.06,
143+
-0.08, -0.01, 0.09, 0.06, 0.09, 1.0, 0.09, 0.3, 0.4, 0.84, -0.19, -0.03,
144+
-0.45, -0.1, 0.02, 0.07, 0.11, 0.09, 1.0, 0.1, 0.2, 0.26, -0.09, 0.02, -0.36,
145+
-0.21, -0.05, -0.1, 0.09, 0.3, 0.1, 1.0, 0.62, 0.29, 0.0, 0.06, -0.25, 0.01,
146+
0.09, 0.09, 0.12, 0.4, 0.2, 0.62, 1.0, 0.53, 0.0, 0.13, -0.45, -0.17, 0.27,
147+
0.03, 0.15, 0.84, 0.26, 0.29, 0.53, 1.0, -0.09, 0.07, -0.78, -0.31, -0.43,
148+
-0.03, -0.16, -0.19, -0.09, 0.0, 0.0, -0.09, 1.0, 0.16, 0.12, 0.1, -0.02,
149+
-0.04, 0.06, -0.03, 0.02, 0.06, 0.13, 0.07, 0.16, 1.0, -0.02, 0.05, -0.12,
150+
0.07, -0.08, -0.45, -0.36, -0.25, -0.45, -0.78, 0.12, -0.02, 1.0, 0.44,
151+
-0.11, -0.19, -0.01, -0.1, -0.21, 0.01, -0.17, -0.31, 0.1, 0.05, 0.44, 1.0],
152+
"x" => ["fixed acidity", "volatile acidity", "citric acid", "residual sugar",
153+
"chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH",
154+
"sulphates", "alcohol", "quality", "fixed acidity", "volatile acidity",
155+
"citric acid", "residual sugar", "chlorides", "free sulfur dioxide",
156+
"total sulfur dioxide", "density", "pH", "sulphates", "alcohol", "quality",
157+
"fixed acidity", "volatile acidity", "citric acid", "residual sugar",
158+
"chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH",
159+
"sulphates", "alcohol", "quality", "fixed acidity", "volatile acidity",
160+
"citric acid", "residual sugar", "chlorides", "free sulfur dioxide",
161+
"total sulfur dioxide", "density", "pH", "sulphates", "alcohol", "quality",
162+
"fixed acidity", "volatile acidity", "citric acid", "residual sugar",
163+
"chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH",
164+
"sulphates", "alcohol", "quality", "fixed acidity", "volatile acidity",
165+
"citric acid", "residual sugar", "chlorides", "free sulfur dioxide",
166+
"total sulfur dioxide", "density", "pH", "sulphates", "alcohol", "quality",
167+
"fixed acidity", "volatile acidity", "citric acid", "residual sugar",
168+
"chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH",
169+
"sulphates", "alcohol", "quality", "fixed acidity", "volatile acidity",
170+
"citric acid", "residual sugar", "chlorides", "free sulfur dioxide",
171+
"total sulfur dioxide", "density", "pH", "sulphates", "alcohol", "quality",
172+
"fixed acidity", "volatile acidity", "citric acid", "residual sugar",
173+
"chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH",
174+
"sulphates", "alcohol", "quality", "fixed acidity", "volatile acidity",
175+
"citric acid", "residual sugar", "chlorides", "free sulfur dioxide",
176+
"total sulfur dioxide", "density", "pH", "sulphates", "alcohol", "quality",
177+
"fixed acidity", "volatile acidity", "citric acid", "residual sugar",
178+
"chlorides", "free sulfur dioxide", "total sulfur dioxide", "density", "pH",
179+
"sulphates", "alcohol", "quality", "fixed acidity", "volatile acidity",
180+
"citric acid", "residual sugar", "chlorides", "free sulfur dioxide",
181+
"total sulfur dioxide", "density", "pH", "sulphates", "alcohol", "quality"],
182+
"y" => ["fixed acidity", "fixed acidity", "fixed acidity", "fixed acidity",
183+
"fixed acidity", "fixed acidity", "fixed acidity", "fixed acidity",
184+
"fixed acidity", "fixed acidity", "fixed acidity", "fixed acidity",
185+
"volatile acidity", "volatile acidity", "volatile acidity",
186+
"volatile acidity", "volatile acidity", "volatile acidity",
187+
"volatile acidity", "volatile acidity", "volatile acidity",
188+
"volatile acidity", "volatile acidity", "volatile acidity", "citric acid",
189+
"citric acid", "citric acid", "citric acid", "citric acid", "citric acid",
190+
"citric acid", "citric acid", "citric acid", "citric acid", "citric acid",
191+
"citric acid", "residual sugar", "residual sugar", "residual sugar",
192+
"residual sugar", "residual sugar", "residual sugar", "residual sugar",
193+
"residual sugar", "residual sugar", "residual sugar", "residual sugar",
194+
"residual sugar", "chlorides", "chlorides", "chlorides", "chlorides",
195+
"chlorides", "chlorides", "chlorides", "chlorides", "chlorides", "chlorides",
196+
"chlorides", "chlorides", "free sulfur dioxide", "free sulfur dioxide",
197+
"free sulfur dioxide", "free sulfur dioxide", "free sulfur dioxide",
198+
"free sulfur dioxide", "free sulfur dioxide", "free sulfur dioxide",
199+
"free sulfur dioxide", "free sulfur dioxide", "free sulfur dioxide",
200+
"free sulfur dioxide", "total sulfur dioxide", "total sulfur dioxide",
201+
"total sulfur dioxide", "total sulfur dioxide", "total sulfur dioxide",
202+
"total sulfur dioxide", "total sulfur dioxide", "total sulfur dioxide",
203+
"total sulfur dioxide", "total sulfur dioxide", "total sulfur dioxide",
204+
"total sulfur dioxide", "density", "density", "density", "density",
205+
"density", "density", "density", "density", "density", "density", "density",
206+
"density", "pH", "pH", "pH", "pH", "pH", "pH", "pH", "pH", "pH", "pH", "pH",
207+
"pH", "sulphates", "sulphates", "sulphates", "sulphates", "sulphates",
208+
"sulphates", "sulphates", "sulphates", "sulphates", "sulphates", "sulphates",
209+
"sulphates", "alcohol", "alcohol", "alcohol", "alcohol", "alcohol",
210+
"alcohol", "alcohol", "alcohol", "alcohol", "alcohol", "alcohol", "alcohol",
211+
"quality", "quality", "quality", "quality", "quality", "quality", "quality",
212+
"quality", "quality", "quality", "quality", "quality"]
213+
}
214+
|> Explorer.DataFrame.new()
215+
```
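The long format used above (one row per pair of variables) could be produced from a square matrix with a comprehension. The following is a minimal sketch in plain Elixir; the names `names`, `matrix`, `cells` and `long` are assumptions for illustration, the 2×2 matrix only reuses the alcohol/quality values from the data above, and the guide does not state how `corr_to_plot` was actually generated.

```elixir
# Hypothetical inputs: variable names and a small symmetric correlation matrix.
names = ["alcohol", "quality"]

matrix = [
  [1.0, 0.44],
  [0.44, 1.0]
]

# One entry per (row, column) pair; "x" varies fastest, as in corr_to_plot.
cells =
  for {y, row} <- Enum.zip(names, matrix),
      {x, value} <- Enum.zip(names, row) do
    %{"x" => x, "y" => y, "corr_val" => value}
  end

# Turn the list of rows into a map of columns.
long = %{
  "x" => Enum.map(cells, & &1["x"]),
  "y" => Enum.map(cells, & &1["y"]),
  "corr_val" => Enum.map(cells, & &1["corr_val"])
}
```

A map of columns shaped like `long` has the same layout as the map piped into `Explorer.DataFrame.new()` in the cell above.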
216+
217+
```elixir
218+
Vl.new(title: "Correlation matrix", width: 600, height: 600)
219+
|> Vl.layers([
220+
Vl.new()
221+
|> Vl.data_from_values(corr_to_plot)
222+
|> Vl.mark(:rect)
223+
|> Vl.encode_field(:x, "x", type: :nominal)
224+
|> Vl.encode_field(:y, "y", type: :nominal)
225+
|> Vl.encode_field(:color, "corr_val", type: :quantitative),
226+
Vl.new()
227+
|> Vl.data_from_values(corr_to_plot)
228+
|> Vl.mark(:text)
229+
|> Vl.encode_field(:x, "x", type: :nominal)
230+
|> Vl.encode_field(:y, "y", type: :nominal)
231+
|> Vl.encode_field(:text, "corr_val", type: :quantitative)
232+
])
233+
```
234+
235+
We can simplify it with the shorthand API we have already explored.
236+
237+
```elixir
238+
Vl.new(title: "Correlation matrix", width: 600, height: 600)
239+
|> Vl.layers([
240+
Data.chart(corr_to_plot, :rect,
241+
x: [field: "x", type: :nominal],
242+
y: [field: "y", type: :nominal],
243+
color: "corr_val"
244+
),
245+
Data.chart(corr_to_plot, :text,
246+
x: [field: "x", type: :nominal],
247+
y: [field: "y", type: :nominal],
248+
text: "corr_val"
249+
)
250+
])
251+
```
252+
253+
Or we can go even further and use the `VegaLite.Data.heatmap/2` function alone or the `VegaLite.Data.heatmap/3` function in combination with `VegaLite`.
254+
255+
The specialized plots follow the same principle as the shorthand API: they can be combined with the main module, and each argument can also take a list of options to override the defaults.
256+
257+
```elixir
258+
Vl.new(title: "Correlation matrix", width: 600, height: 600)
259+
|> Data.heatmap(corr_to_plot,
260+
x: "x",
261+
y: "y",
262+
color: "corr_val",
263+
text: "corr_val"
264+
)
265+
```
