
Commit 840802f

version bump to 0.2.0
Parent: 124f659

2 files changed, 61 additions, 1 deletion

CHANGELOG

Lines changed: 60 additions & 0 deletions
@@ -1,3 +1,63 @@
0.2.0
=====
Materialized view for `play`. `Query` demolition.

ATTN: This introduces a breaking change. The `team` and `player_id` fields
can no longer be used in the `play` method. Instead, you should use the new
`play_player` method to select individual player statistics belonging to
a specific team or player.

Otherwise, there are very few public-facing changes, but the entire
guts of `nfldb.Query` have been ripped out and replaced with more
robust SQL generation code. Moreover, several idiosyncrasies have been
fixed and some unit tests have finally been added.

1. Previously, the `Query` class was doing some very clever things to
   perform parts of a JOIN in Python code. The general flow was that
   filtering was applied to find primary keys---never using any JOINs---and
   once all criteria had been applied, those ids were used in a simple
   SELECT to fetch the actual rows.

   Now all of that cruft has been removed and replaced with intelligent
   SQL generation that constructs one query with all the proper JOINs.
   For whatever reason, I thought this approach was slower when I
   experimented with it early on in nfldb's development. Perhaps my
   indexes weren't configured properly then. In any case, I can't really
   see much of a performance difference now.

2. The SQL generation code is very smart. Although it is not part of
   nfldb's public API, I imagine it would be very useful if you have some
   special needs. See the unexported but documented `nfldb.sql` module.

3. Many idiosyncrasies resulting from doing a join in Python are now
   completely gone. For example, if you tried to apply a `sort` and a
   `limit` along with complex search criteria, you were bound to get wrong
   answers. In particular, if you tried sorting by both a column on the
   `play` table (like `down`) and a column on `play_player` (like
   `passing_tds`) and applied a limit, the results would be completely
   wonky because the pure Python join couldn't cope with that efficiently.
   A regular SQL join? Piece of cake.

4. I have added a materialized view `agg_play`. This is a fancy term for
   "a table that automatically updates itself." In essence, whenever a
   new row is added to `play_player`, aggregate statistics for that play
   are recomputed. This makes adding data slower (which doesn't happen
   very frequently), but it makes querying data much faster and easier.
   For example, plays can be queried for `passing_yds` without ever
   joining with `play_player`, which is wonky because of the one-to-many
   relationship between a play and its player statistics.
   To reflect this clearer separation of concerns, the `Query.play`
   method will no longer add criteria that hit the `play_player` table.
   Instead, if you really want the `play_player` table, then you can use
   the new `play_player` method. The only fields that were accepted by
   `play` but are no longer allowed are `team` and `player_id`. This is
   because there is no sensible way to aggregate these values into a
   single play.

To the best of my knowledge, that is the only possible breaking
change here.


0.1.6
=====
- Add better error message when config file can't be found.
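Items 1 and 3 of the 0.2.0 notes contrast a join done in Python with a single SQL query. The sketch below makes the `sort` plus `limit` problem concrete using Python's built-in `sqlite3` and two toy tables. It rests on stated assumptions: the schema is a heavily simplified, hypothetical stand-in for nfldb's `play` and `play_player` tables, and `python_side_join` is a hypothetical reconstruction of an "ids first, join in Python" strategy that pushes the limit down to one table to avoid reading every row. It is not nfldb's old code.

```python
import sqlite3

# Hypothetical toy schema: a heavily simplified stand-in for nfldb's `play`
# and `play_player` tables (the real tables have many more columns).
SCHEMA = """
CREATE TABLE play (play_id INTEGER PRIMARY KEY, down INTEGER);
CREATE TABLE play_player (play_id INTEGER, player_id TEXT, passing_tds INTEGER);
INSERT INTO play VALUES (1, 3), (2, 3), (3, 2);
INSERT INTO play_player VALUES (1, 'A', 0), (2, 'B', 1), (3, 'C', 2);
"""


def make_db():
    """Build the toy database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def python_side_join(conn, limit):
    """Hypothetical 'ids first, join in Python' strategy.

    To avoid reading every row, the sort and limit are pushed down to the
    `play` table alone, before `passing_tds` (on `play_player`) is consulted.
    """
    downs = dict(conn.execute(
        "SELECT play_id, down FROM play ORDER BY down DESC, play_id LIMIT ?",
        (limit,)))
    marks = ",".join("?" * len(downs))
    stats = conn.execute(
        f"SELECT play_id, passing_tds FROM play_player WHERE play_id IN ({marks})",
        list(downs))
    # The join itself happens here, in Python, on the already-truncated ids.
    joined = [(pid, downs[pid], tds) for pid, tds in stats]
    joined.sort(key=lambda row: (-row[1], -row[2]))
    return joined[:limit]


def sql_join(conn, limit):
    """One query: JOIN first, then sort and limit the joined rows."""
    return conn.execute(
        "SELECT p.play_id, p.down, pp.passing_tds "
        "FROM play AS p JOIN play_player AS pp ON pp.play_id = p.play_id "
        "ORDER BY p.down DESC, pp.passing_tds DESC "
        "LIMIT ?",
        (limit,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(python_side_join(conn, 1))  # [(1, 3, 0)]: play 2 was cut too early
    print(sql_join(conn, 1))          # [(2, 3, 1)]
```

Plays 1 and 2 both have `down = 3`, but with a limit of 1 the Python-side version truncates to a single play before `passing_tds` is ever seen and returns play 1, while the single JOIN returns play 2. Dropping the early cut (read everything, join, then sort) would fix the answer at the cost of performance, which is the trade-off item 3 describes.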
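Item 4 of the 0.2.0 notes describes `agg_play` as a table that recomputes a play's aggregate statistics whenever a row is added to `play_player`. One way to get that behavior is an `AFTER INSERT` trigger, sketched below with Python's built-in `sqlite3` so it runs without a server. This is an illustration under assumptions, not nfldb's schema or code: nfldb targets PostgreSQL, and the table layout, the trigger name `play_player_agg`, and the helper `plays_with_passing_yds` are all hypothetical. Note also that a PostgreSQL `MATERIALIZED VIEW` is only updated by an explicit `REFRESH MATERIALIZED VIEW`, so the self-updating behavior described needs something like triggers.

```python
import sqlite3

# Hypothetical, simplified schema; nfldb itself targets PostgreSQL and its
# real `agg_play` table carries many more aggregated columns.
SCHEMA = """
CREATE TABLE play (play_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE play_player (
    play_id INTEGER NOT NULL,
    player_id TEXT NOT NULL,
    passing_yds INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (play_id, player_id)
);
-- One row per play, holding statistics summed over its play_player rows.
CREATE TABLE agg_play (
    play_id INTEGER PRIMARY KEY,
    passing_yds INTEGER NOT NULL DEFAULT 0
);
-- On every insert: create the play's aggregate row if it is missing,
-- then recompute the sum from scratch. (Updates and deletes on
-- play_player would need their own triggers; they are not handled here.)
CREATE TRIGGER play_player_agg AFTER INSERT ON play_player
BEGIN
    INSERT INTO agg_play (play_id)
    SELECT NEW.play_id
    WHERE NOT EXISTS (SELECT 1 FROM agg_play WHERE play_id = NEW.play_id);
    UPDATE agg_play
    SET passing_yds = (SELECT COALESCE(SUM(pp.passing_yds), 0)
                       FROM play_player AS pp
                       WHERE pp.play_id = NEW.play_id)
    WHERE play_id = NEW.play_id;
END;
"""


def make_db():
    """Build the toy database (tables plus trigger) in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def plays_with_passing_yds(conn, min_yds):
    """Filter plays on an aggregate stat via the one-row-per-play table.

    The join is one-to-one with agg_play; play_player is never touched.
    """
    return conn.execute(
        "SELECT p.play_id, a.passing_yds "
        "FROM play AS p JOIN agg_play AS a ON a.play_id = p.play_id "
        "WHERE a.passing_yds >= ? ORDER BY p.play_id",
        (min_yds,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    conn.executemany("INSERT INTO play VALUES (?, ?)",
                     [(1, "short pass"), (2, "run up the middle")])
    conn.executemany("INSERT INTO play_player VALUES (?, ?, ?)",
                     [(1, "QB", 25), (1, "WR", 0), (2, "RB", 0)])
    print(plays_with_passing_yds(conn, 20))  # [(1, 25)]
```

The cost lands on writes, as item 4 says: every insert into `play_player` re-sums that play's rows, while reads filter on `agg_play.passing_yds` directly instead of fanning out over the one-to-many `play_player` relationship.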

nfldb/version.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-__version__ = '0.1.6'
+__version__ = '0.2.0'

 __pdoc__ = {
     '__version__': "The version of the installed nfldb module.",
