| 1 | +# Quamina |
| 2 | + |
| 3 | +### Fast pattern-matching library |
| 4 | + |
| 5 | +Quamina provides APIs to create an interface called |
| 6 | +a **Matcher**, |
| 7 | +add multiple **patterns** to it, and then query JSON blobs |
| 8 | +called **events** to discover which of the patterns match |
| 9 | +the fields in the event. |
| 10 | + |
| 11 | +### Patterns |
| 12 | + |
| 13 | +Consider the following JSON event, taken from the example |
| 14 | +in RFC 8259: |
| 15 | + |
| 16 | +```json |
| 17 | +{ |
| 18 | + "Image": { |
| 19 | + "Width": 800, |
| 20 | + "Height": 600, |
| 21 | + "Title": "View from 15th Floor", |
| 22 | + "Thumbnail": { |
| 23 | + "Url": "http://www.example.com/image/481989943", |
| 24 | + "Height": 125, |
| 25 | + "Width": 100 |
| 26 | + }, |
| 27 | + "Animated" : false, |
| 28 | + "IDs": [116, 943, 234, 38793] |
| 29 | + } |
| 30 | +} |
| 31 | +``` |
| 32 | + |
| 33 | +The following patterns would match it: |
| 34 | + |
| 35 | +```json |
| 36 | +{"Image": {"Width": [800]}} |
| 37 | +``` |
| 38 | +```json |
| 39 | +{ |
| 40 | + "Image": { |
| 41 | + "Animated": [ false ], |
| 42 | + "Thumbnail": { |
| 43 | + "Height": [ 125 ] |
| 44 | + }, |
| 45 | + "IDs": [ 943 ] |
| 46 | + } |
| 47 | +} |
| 48 | +``` |
| 49 | +```json |
| 50 | +{"Image": { "Title": [ { "exists": true } ] } } |
| 51 | +``` |
| 52 | +```json |
| 53 | +{ |
| 54 | + "Image": { |
| 55 | + "Width": [800], |
| 56 | + "Title": [ { "exists": true } ], |
| 57 | + "Animated": [ false ] |
| 58 | + } |
| 59 | +} |
| 60 | +``` |
| 61 | +```json |
| 62 | +{"Image": { "Width": [800], "IDs": [ { "exists": true } ] } } |
| 63 | +``` |
| 64 | +```json |
| 65 | +{"Foo": [ { "exists": false } ] } |
| 66 | +``` |
| 67 | +The structure of the pattern, in terms of field names |
| 68 | +and nesting, must be the same as the structure of the event |
| 69 | +to be matched. The field values are always given |
| 70 | +as an array; if any element of the array matches |
| 71 | +the value in the event, the match is good. If the |
| 72 | +field in the event is array-valued, matching is true |
| 73 | +if the intersection of the arrays is non-empty. |
| 74 | + |
| 75 | +Fields which are not mentioned in the pattern are |
| 76 | +assumed to match, but every field that is mentioned must match. So the |
| 77 | +semantics are effectively an OR on each field's values, |
| 78 | +but an AND on the field names. |
| 79 | + |
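The matching rules above can be sketched as a naive recursive check. This is only an illustration of the semantics (OR across the values listed for a field, AND across the fields named in the pattern), not how Quamina is implemented; the names `decode`, `matches`, and `anyMatch` are hypothetical. Numbers are decoded with `UseNumber` so that they are compared by their literal text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// decode parses a JSON object, keeping numbers as their literal text
// (json.Number) rather than converting them to float64.
func decode(s string) (map[string]interface{}, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	dec.UseNumber()
	var v map[string]interface{}
	err := dec.Decode(&v)
	return v, err
}

// matches reports whether event satisfies pattern: every field named in the
// pattern must match (AND), and a leaf matches if any listed value does (OR).
func matches(pattern, event map[string]interface{}) bool {
	for name, want := range pattern {
		got, present := event[name]
		switch w := want.(type) {
		case map[string]interface{}: // nested object: recurse into the event
			sub, ok := got.(map[string]interface{})
			if !ok || !matches(w, sub) {
				return false
			}
		case []interface{}: // leaf: any one alternative is enough
			if !anyMatch(w, got, present) {
				return false
			}
		default:
			return false // leaf values must be given as arrays
		}
	}
	return true
}

// anyMatch checks the alternatives for one field against the event's value.
func anyMatch(alternatives []interface{}, got interface{}, present bool) bool {
	for _, alt := range alternatives {
		if m, ok := alt.(map[string]interface{}); ok {
			// {"exists": true|false} tests only for the field's presence.
			if want, isBool := m["exists"].(bool); isBool && want == present {
				return true
			}
			continue
		}
		if !present {
			continue
		}
		if arr, isArr := got.([]interface{}); isArr {
			// Array-valued field: match if the two arrays intersect.
			for _, elem := range arr {
				if reflect.DeepEqual(elem, alt) {
					return true
				}
			}
		} else if reflect.DeepEqual(got, alt) {
			return true
		}
	}
	return false
}

func main() {
	event, err := decode(`{"Image": {"Width": 800, "Animated": false, "IDs": [116, 943, 234, 38793]}}`)
	if err != nil {
		panic(err)
	}
	for _, p := range []string{
		`{"Image": {"Width": [640, 800], "IDs": [943]}}`,
		`{"Image": {"Width": [800], "Animated": [true]}}`,
		`{"Foo": [{"exists": false}]}`,
	} {
		pattern, err := decode(p)
		if err != nil {
			panic(err)
		}
		fmt.Println(p, "=>", matches(pattern, event))
	}
}
```

A check like this costs time proportional to the number of patterns, which is the cost the automaton-based approach described under Performance is meant to avoid.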
| 80 | +Number matching is weak: the number has to appear |
| 81 | +exactly the same in the pattern and the event. For example, |
| 82 | +Quamina doesn't know that 35, 35.000, and 3.5e1 are the |
| 83 | +same number. |
| 84 | + |
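To make the distinction concrete, the sketch below compares number literals in two ways: as raw text, which is the behavior described above, and as parsed `float64` values. The helper names `sameLiteral` and `sameValue` are hypothetical and are not part of Quamina.

```go
package main

import (
	"fmt"
	"strconv"
)

// sameLiteral compares two JSON number literals as raw text, with no
// numeric interpretation.
func sameLiteral(a, b string) bool {
	return a == b
}

// sameValue parses both literals as float64 and compares the results.
func sameValue(a, b string) (bool, error) {
	x, err := strconv.ParseFloat(a, 64)
	if err != nil {
		return false, err
	}
	y, err := strconv.ParseFloat(b, 64)
	if err != nil {
		return false, err
	}
	return x == y, nil
}

func main() {
	for _, lit := range []string{"35", "35.000", "3.5e1"} {
		numeric, err := sameValue("35", lit)
		if err != nil {
			panic(err)
		}
		fmt.Printf("35 vs %s: literal=%v numeric=%v\n", lit, sameLiteral("35", lit), numeric)
	}
}
```

One way a caller might work around this, under the assumption that they control both sides, is to write numbers in a single canonical form in patterns and events alike.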
| 85 | +## APIs |
| 86 | + |
| 87 | +```go |
| 88 | +func NewMatcher() *Matcher |
| 89 | +``` |
| 90 | +Creates a new Matcher. Takes no arguments. |
| 91 | +```go |
| 92 | +func (m *Matcher) AddPattern(x X, patternJSON string) error |
| 93 | +``` |
| 94 | + |
| 95 | +The first argument identifies the pattern and will be |
| 96 | +returned by a Matcher when asked to match against events. |
| 97 | +X is currently `interface{}` and should become a generic |
| 98 | +when Go has them. |
| 99 | + |
| 100 | +The pattern must be provided as a string which is a |
| 101 | +JSON object as exemplified above in this document. |
| 102 | + |
| 103 | +The `error` return is used to signal invalid pattern |
| 104 | +structure, which could be malformed JSON or leaf values |
| 105 | +which are not provided as arrays. |
| 106 | + |
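The two failure cases named above (malformed JSON, and leaf values that are not arrays) can be illustrated with a standalone check. This is a hedged sketch, not Quamina's validation code: `checkPattern` and `checkObject` are hypothetical names, and the dotted path used in error messages is an assumption made for readability.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// checkPattern returns an error if patternJSON is not a JSON object or if
// any leaf value in it is not an array.
func checkPattern(patternJSON string) error {
	var root map[string]interface{}
	if err := json.Unmarshal([]byte(patternJSON), &root); err != nil {
		return fmt.Errorf("malformed pattern JSON: %w", err)
	}
	if root == nil {
		return errors.New("pattern must be a JSON object")
	}
	return checkObject(root, "")
}

// checkObject walks nested objects and rejects scalar leaves.
func checkObject(obj map[string]interface{}, prefix string) error {
	for name, value := range obj {
		switch v := value.(type) {
		case map[string]interface{}:
			if err := checkObject(v, prefix+name+"."); err != nil {
				return err
			}
		case []interface{}:
			// Leaf given as an array of alternatives: acceptable.
		default:
			return fmt.Errorf("field %q: leaf value must be an array", prefix+name)
		}
	}
	return nil
}

func main() {
	for _, p := range []string{
		`{"Image": {"Width": [800]}}`,
		`{"Image": {"Width": 800}}`,
		`{"Image": `,
	} {
		fmt.Printf("%-30s error: %v\n", p, checkPattern(p))
	}
}
```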
| 107 | +As many patterns as desired can be added to a Matcher |
| 108 | +but at this time there is no capability of removing any. |
| 109 | + |
| 110 | +The `AddPattern` call is single-threaded; if multiple |
| 111 | +threads call it, they will block and execute sequentially. |
| 112 | + |
| 113 | +```go |
| 114 | +func (m *Matcher) MatchesForJSONEvent(event []byte) ([]X, error) |
| 115 | +``` |
| 116 | + |
| 117 | +The `event` argument must be a JSON object. It would be |
| 118 | +easy to extend Matcher to handle other data formats; see the |
| 119 | +`Flattener` interface and its implementation in `FJ`. |
| 120 | + |
| 121 | +The `error` return value is nil unless there was a syntax |
| 122 | +error in the event JSON. |
| 123 | + |
| 124 | +The `[]X` return slice may be empty if none of the patterns |
| 125 | +match the provided event. |
| 126 | + |
| 127 | +`MatchesForJSONEvent` is thread-safe and many threads may |
| 128 | +be executing it concurrently, even while `AddPattern` is |
| 129 | +also executing. |
| 130 | + |
| 131 | +### Performance |
| 132 | + |
| 133 | +The performance of `MatchesForJSONEvent` is strongly |
| 134 | +sublinear in the number of patterns. It’s not quite `O(1)`; |
| 135 | +it does vary somewhat as a function of the number of |
| 136 | +unique fields that appear in all the patterns that have |
| 137 | +been added to the matcher, but remains sublinear in that |
| 138 | +variation. |
| 139 | + |
| 140 | +A word of explanation is in order. Quamina compiles the |
| 141 | +patterns into a somewhat-decorated DFA and uses that to |
| 142 | +find matches in events; that DFA-based matching process is |
| 143 | +O(1) in the number of patterns. |
| 144 | + |
| 145 | +However, for this to work, the incoming event must be |
| 146 | +flattened into a list of pathname/value pairs and |
| 147 | +sorted. This process accounts for more than 50% of execution time, |
| 148 | +and is optimized by discarding any fields that |
| 149 | +do not appear in one or more of the patterns added |
| 150 | +to the matcher. Thus, adding a new pattern that only |
| 151 | +mentions fields mentioned in previous patterns is |
| 152 | +effectively free in terms of run-time performance. |