training.py: two tweaks to feature selection (#226)
1. Include posting amounts as a feature. This allows us to distinguish
different classes of payments to the same payee (e.g. recurring membership
fees, which often have a constant amount, from individual purchases).
2. For example key/value pairs, include the key by itself (with no substring
of the value) as a feature. This is useful because different account types
often have non-overlapping sets of example keys, and including the bare key as
a feature allows the decision tree to be effectively segmented by account type
fairly close to the root.
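As a rough illustration of the two changes above, here is a minimal sketch of
a feature extractor. It is not the actual training.py code: the function name
extract_features, its parameters, the feature string formats, and the choice
of whitespace-separated words as the value substrings are all assumptions made
for the example.

```python
from typing import Dict, List, Optional


def extract_features(amount: Optional[str],
                     key_value_pairs: Dict[str, str]) -> List[str]:
    """Hypothetical feature extractor; names and formats are assumptions."""
    features = []
    if amount is not None:
        # Change 1: the posting amount becomes a feature, so a constant
        # recurring fee can be told apart from one-off purchases.
        features.append('amount:%s' % amount)
    for key, value in sorted(key_value_pairs.items()):
        # Change 2: the bare key, with no part of the value attached.
        features.append(key)
        # Assumed pre-existing behaviour: key combined with value substrings
        # (approximated here by lowercase words).
        for word in value.lower().split():
            features.append('%s:%s' % (key, word))
    return features


features = extract_features('-9.99 USD', {'desc': 'GYM Membership'})
print(features)
```

Because the bare key carries no value text, every example from an account
type that uses that key shares the feature, which is what would let a
decision tree split on it early.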
These two very small changes significantly improve training accuracy on my
journal, from 94.81% to 99.32% (roughly an 87% reduction in error rate!).
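The error-rate figure follows from the two accuracies alone. A quick check,
assuming the error rate is simply one minus accuracy (the helper name
relative_error_reduction is made up for this example):

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Fraction by which the error rate (1 - accuracy) shrinks."""
    err_before = 1.0 - acc_before  # 5.19% error before the change
    err_after = 1.0 - acc_after    # 0.68% error after the change
    return (err_before - err_after) / err_before


reduction = relative_error_reduction(0.9481, 0.9932)
print(f"{reduction:.1%}")  # prints 86.9%
```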