| 1 | +0.2.0 |
| 2 | +===== |
| 3 | +Materialized view for `play`. `Query` demolition. |
| 4 | + |
| 5 | +ATTN: This introduces a breaking change. The `team` and `player_id` fields
| 6 | +can no longer be used in the `play` method. Instead, use the new
| 7 | +`play_player` method to select individual player statistics belonging to
| 8 | +a specific team.
| 9 | + |
| 10 | +Otherwise, there are very few public facing changes, but the entire |
| 11 | +guts of `nfldb.Query` have been ripped out and replaced with more |
| 12 | +robust SQL generation code. Moreover, several idiosyncrasies have been
| 13 | +fixed and some unit tests have finally been added. |
| 14 | + |
| 15 | +1. Previously, the `Query` class used some very clever tricks to perform
| 16 | + parts of a JOIN in Python code. The general flow was that filtering
| 17 | + was applied to find primary keys---never using any JOINs---and once |
| 18 | + all criteria had been applied, those ids were used in a simple SELECT |
| 19 | + to fetch the actual rows. |
| 20 | + |
| 21 | + Now all of that cruft has been removed and replaced with intelligent |
| 22 | + SQL generation that constructs one query with all the proper JOINs. |
| 23 | + For whatever reason, I thought this was slower when experimenting |
| 24 | + with it when I first started nfldb. Perhaps my indexes weren't |
| 25 | + configured properly then. In any case, I can't really see much |
| 26 | + performance difference. |
| 27 | + |
| 28 | +2. The SQL generation code is very smart. Although it is not part of |
| 29 | + nfldb's public API, I imagine it would be very useful if you had some |
| 30 | + special needs. See the unexported but documented `nfldb.sql` module. |
| 31 | + |
| 32 | +3. Many idiosyncrasies resulting from doing a join in Python are now
| 33 | + completely gone. For example, if you tried to apply a `sort` with a |
| 34 | + `limit` with complex search criteria, you were bound to get wrong |
| 35 | + answers. Specifically, if you tried sorting by both a column on the
| 36 | + `play` table (like `down`) and a column on `play_player` (like
| 37 | + `passing_tds`) and applied a limit to it, the results would be
| 38 | + completely wonky because the pure Python join can't cope with it
| 39 | + efficiently. A regular SQL join? Piece of cake.
| 40 | + |
| 41 | +4. I have added a materialized view `agg_play`. This is a fancy word for |
| 42 | + "a table that automatically updates itself." In essence, whenever a |
| 43 | + new row is added to `play_player`, aggregate statistics for that play |
| 44 | + are re-computed. This makes adding data slower (which doesn't happen |
| 45 | + very frequently), but it makes querying data much faster and easier. |
| 46 | + For example, plays can be queried for `passing_yds` without ever |
| 47 | + joining with `play_player`. (Which is wonky because of the |
| 48 | + one-to-many relationship.) |
| 49 | + To reflect this clearer separation of concerns, the `Query.play` |
| 50 | + method will no longer add criteria that hit the `play_player` table.
| 51 | + Instead, if you really want the `play_player` table, then you can use
| 52 | + the new `play_player` method. The only fields that were accepted in
| 53 | + `play` and are no longer allowed are the `team` and `player_id`
| 54 | + fields. This is because there is no sensible way to aggregate these |
| 55 | + values into a single play. |
| 56 | + |
| 57 | + To the best of my knowledge, that is the only possible breaking |
| 58 | + change here. |
| 59 | + |
| 60 | + |
1 | 61 | 0.1.6 |
2 | 62 | ===== |
3 | 63 | - Add better error message when config file can't be found. |
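
The sort-plus-limit problem described in item 3 of the 0.2.0 notes can be reproduced with a toy example. What follows is only a sketch, not nfldb's actual code: it uses Python's built-in `sqlite3` module rather than PostgreSQL, and the two tiny tables, their rows, and the helper names `python_side_join` and `sql_join` are all made up for illustration. The first helper mimics the old flow (collect primary keys without a JOIN, then run a simple SELECT with the limit), so the limit is applied before `down` can take part in the ordering. The second helper issues one query with a real JOIN.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the `play` and
# `play_player` tables. The rows are invented test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE play (play_id INTEGER PRIMARY KEY, down INTEGER);
    CREATE TABLE play_player (
        play_id INTEGER REFERENCES play (play_id),
        player_id TEXT,
        passing_tds INTEGER
    );
    INSERT INTO play VALUES (1, 3), (2, 1), (3, 2);
    INSERT INTO play_player VALUES
        (1, 'A', 2), (2, 'B', 0), (2, 'C', 1), (3, 'D', 2);
""")


def python_side_join(limit):
    """Mimic a join done in Python: ids first, then a simple SELECT."""
    # Step 1: fetch primary keys (and `down`) from `play`, with no JOIN.
    downs = dict(conn.execute("SELECT play_id, down FROM play"))
    # Step 2: a simple SELECT on `play_player` restricted to those ids.
    # It can only order by its own columns, so LIMIT cuts rows before
    # `down` is ever considered.
    placeholders = ",".join("?" * len(downs))
    rows = conn.execute(
        "SELECT play_id, player_id, passing_tds FROM play_player "
        "WHERE play_id IN (%s) "
        "ORDER BY passing_tds DESC, player_id LIMIT ?" % placeholders,
        list(downs) + [limit],
    ).fetchall()
    # Step 3: sort the already truncated rows by `down` in Python.
    rows.sort(key=lambda r: downs[r[0]])
    return [r[1] for r in rows]


def sql_join(limit):
    """One query with a real JOIN: sort on both tables, then limit."""
    rows = conn.execute(
        "SELECT pp.player_id FROM play_player AS pp "
        "JOIN play AS p ON p.play_id = pp.play_id "
        "ORDER BY p.down, pp.passing_tds DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [r[0] for r in rows]


print(python_side_join(2))
print(sql_join(2))
```

With this data, the two helpers return different player ids for the same sort and limit, because only the JOIN version orders by `down` before truncating the result.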