Commit 8480d84

Initial upload
1 parent 22ab512 commit 8480d84

19 files changed: +2502 -0 lines changed

README.md

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
# Quamina

### Fast pattern-matching library

Quamina provides APIs to create an interface called
a **Matcher**,
add multiple **patterns** to it, and then query JSON blobs
called **events** to discover which of the patterns match
the fields in the event.

### Patterns

Consider the following JSON event, taken from the example
in RFC 8259:

```json
{
  "Image": {
    "Width": 800,
    "Height": 600,
    "Title": "View from 15th Floor",
    "Thumbnail": {
      "Url": "http://www.example.com/image/481989943",
      "Height": 125,
      "Width": 100
    },
    "Animated" : false,
    "IDs": [116, 943, 234, 38793]
  }
}
```

The following patterns would match it:

```json
{"Image": {"Width": [800]}}
```
```json
{
  "Image": {
    "Animated": [ false ],
    "Thumbnail": {
      "Height": [ 125 ]
    },
    "IDs": [ 943 ]
  }
}
```
```json
{"Image": { "Title": [ { "exists": true } ] } }
```
```json
{
  "Image": {
    "Width": [800],
    "Title": [ { "exists": true } ],
    "Animated": [ false ]
  }
}
```
```json
{"Image": { "Width": [800], "IDs": [ { "exists": true } ] } }
```
```json
{"Foo": [ { "exists": false } ] }
```
The structure of the pattern, in terms of field names
and nesting, must be the same as the structure of the event
to be matched. The field values are always given
as an array; if any element of the array matches
the value in the event, the match is good. If the
field in the event is array-valued, matching is true
if the intersection of the arrays is non-empty.

Fields which are not mentioned in the pattern are
assumed to match, but every field that is mentioned must
match. So the semantics are effectively an OR on each
field's values, but an AND on the field names.
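
The OR-within-a-field, AND-across-fields rule can be sketched in a few lines of Go. This is only an illustration of the semantics, not Quamina's implementation: the `fieldPattern` type, the dotted path names, and the `matches` and `anyShared` helpers are all hypothetical, and the sketch ignores `exists` and array-boundary handling.

```go
package main

// fieldPattern maps a flattened field path to the values the pattern accepts.
// Hypothetical toy representation, not Quamina's internal data structure.
type fieldPattern map[string][]string

// matches reports whether every field named in the pattern shares at least
// one value with the event. Fields the pattern does not mention are ignored.
func matches(p fieldPattern, event map[string][]string) bool {
	for path, allowed := range p {
		if !anyShared(allowed, event[path]) {
			return false // AND: one failing field rejects the event
		}
	}
	return true
}

// anyShared reports whether the two slices have a non-empty intersection.
func anyShared(allowed, actual []string) bool {
	for _, a := range allowed {
		for _, v := range actual {
			if a == v {
				return true // OR: any one shared value is enough
			}
		}
	}
	return false
}
```

With the event above flattened to `Image.Width`, `Image.Animated` and `Image.IDs`, a pattern listing `["800", "1024"]` for `Image.Width` matches, while a pattern that also requires `Image.Height` to be `601` does not.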

Number matching is weak: the number has to appear
exactly the same in the pattern and the event. That is,
Quamina doesn't know that 35, 35.000, and 3.5e1 are the
same number.
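
To make the limitation concrete, the sketch below contrasts textual comparison, which is the behaviour described above, with numeric comparison, which Quamina does not do at this point. Both helper names are hypothetical and exist only for this illustration.

```go
package main

import "strconv"

// sameText compares two number literals byte for byte, which is the
// "weak" matching described above.
func sameText(a, b string) bool {
	return a == b
}

// sameNumber parses both literals as float64 and compares the values.
// It is shown only for contrast; it is not part of Quamina.
func sameNumber(a, b string) (bool, error) {
	x, err := strconv.ParseFloat(a, 64)
	if err != nil {
		return false, err
	}
	y, err := strconv.ParseFloat(b, 64)
	if err != nil {
		return false, err
	}
	return x == y, nil
}
```

Here `sameText("35", "35.000")` is false even though `sameNumber("35", "35.000")` reports true, so a pattern containing `35` would miss an event that spells the value `35.000`.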

## APIs

```go
func NewMatcher() *Matcher
```
Creates a new Matcher; it takes no arguments.
```go
func (m *Matcher) AddPattern(x X, patternJSON string) error
```

The first argument identifies the pattern and will be
returned by a Matcher when asked to match against events.
X is currently `interface{}` and should become a generic
type parameter when Go has them.

The pattern must be provided as a string which is a
JSON object as exemplified above in this document.

The `error` return is used to signal invalid pattern
structure, which could be malformed JSON or leaf values
which are not provided as arrays.

As many patterns as desired can be added to a Matcher,
but at this time there is no way to remove any.

The `AddPattern` call is single-threaded; if multiple
threads call it, they will block and execute sequentially.

```go
func (m *Matcher) MatchesForJSONEvent(event []byte) ([]X, error)
```

The `event` argument must be a JSON object. It would be
easy to extend Matcher to handle other data formats; see the
`Flattener` interface and its implementation in `FJ`.

The `error` return value is nil unless there was a syntax
error in the event JSON.

The `[]X` return slice is empty if none of the patterns
match the provided event.

`MatchesForJSONEvent` is thread-safe and many threads may
be executing it concurrently, even while `AddPattern` is
also executing.
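
One common way to get this combination (serialized writers, readers that never block) is copy-on-write behind an atomic pointer. The sketch below shows that general technique under the assumption that a list of names stands in for the compiled patterns. It is not a description of how Quamina's `Matcher` is actually built, and `toyMatcher`, `add` and `snapshot` are hypothetical names.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// toyMatcher is a hypothetical stand-in for a matcher; it only stores names.
type toyMatcher struct {
	mu       sync.Mutex   // serializes writers, like AddPattern
	patterns atomic.Value // always holds an immutable []string snapshot
}

func newToyMatcher() *toyMatcher {
	m := &toyMatcher{}
	m.patterns.Store([]string{})
	return m
}

// add copies the current snapshot, appends to the copy, and publishes it.
// Concurrent callers queue on the mutex and run one after another.
func (m *toyMatcher) add(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	old := m.patterns.Load().([]string)
	next := make([]string, len(old), len(old)+1)
	copy(next, old)
	m.patterns.Store(append(next, name))
}

// snapshot returns the latest published slice without taking the lock,
// so readers are never blocked by a writer.
func (m *toyMatcher) snapshot() []string {
	return m.patterns.Load().([]string)
}
```

A reader that calls `snapshot` while `add` is running simply sees the previous, still-consistent list; the price is one copy of the list per write.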

### Performance

The performance of `MatchesForJSONEvent` is strongly
sublinear in the number of patterns. It’s not quite `O(1)`:
it varies somewhat as a function of the number of
unique fields that appear in all the patterns that have
been added to the matcher, but remains sublinear in that
variation.

A word of explanation is in order. Quamina compiles the
patterns into a somewhat-decorated DFA and uses that to
find matches in events; that DFA-based matching process is
`O(1)` in the number of patterns.

However, for this to work, the incoming event must be
flattened into a list of pathname/value pairs and
sorted. This process exceeds 50% of execution time,
and is optimized by discarding any fields that
do not appear in one or more of the patterns added
to the matcher. Thus, adding a new pattern that only
mentions fields mentioned in previous patterns is
effectively free in terms of run-time performance.
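
The flatten, filter and sort step can be pictured with the sketch below. It is a simplified illustration built on `encoding/json`, not Quamina's `Flattener` or `FJ` code: the `pair` type, the `flatten` and `walk` functions, the `used` set of field paths, and the dotted path separator are all assumptions made for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sort"
)

// pair is one flattened pathname/value entry (hypothetical type).
type pair struct{ Path, Value string }

// flatten turns a JSON object into sorted path/value pairs, keeping only
// the paths present in used (the fields mentioned by some pattern).
func flatten(event []byte, used map[string]bool) ([]pair, error) {
	dec := json.NewDecoder(bytes.NewReader(event))
	dec.UseNumber() // keep numbers as their original text
	var root map[string]interface{}
	if err := dec.Decode(&root); err != nil {
		return nil, err
	}
	var out []pair
	walk("", root, used, &out)
	sort.Slice(out, func(i, j int) bool {
		if out[i].Path != out[j].Path {
			return out[i].Path < out[j].Path
		}
		return out[i].Value < out[j].Value
	})
	return out, nil
}

// walk descends through objects and arrays and emits leaf values.
func walk(prefix string, v interface{}, used map[string]bool, out *[]pair) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			p := k
			if prefix != "" {
				p = prefix + "." + k
			}
			walk(p, child, used, out)
		}
	case []interface{}:
		for _, child := range t {
			walk(prefix, child, used, out) // array elements share the path
		}
	case nil:
		if used[prefix] {
			*out = append(*out, pair{prefix, "null"})
		}
	default:
		if used[prefix] { // discard fields no pattern mentions
			*out = append(*out, pair{prefix, fmt.Sprint(t)})
		}
	}
}
```

Because unused fields are dropped before sorting, an event with hundreds of fields costs little more than one with only the handful of fields the patterns care about.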

go.mod

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
module quamina

go 1.17

require (
)

lib/arrays_test.go

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
package quamina

import (
	"testing"
)

const bands = `{
  "bands": [
    {
      "name": "The Clash",
      "members": [
        {
          "given": "Joe",
          "surname": "Strummer",
          "role": [
            "guitar",
            "vocals"
          ]
        },
        {
          "given": "Mick",
          "surname": "Jones",
          "role": [
            "guitar",
            "vocals"
          ]
        },
        {
          "given": "Paul",
          "surname": "Simonon",
          "role": [
            "bass"
          ]
        },
        {
          "given": "Topper",
          "surname": "Headon",
          "role": [
            "drums"
          ]
        }
      ]
    },
    {
      "name": "Boris",
      "members": [
        {
          "given": "Wata",
          "role": [
            "guitar",
            "vocals"
          ]
        },
        {
          "given": "Atsuo",
          "role": [
            "drums"
          ]
        },
        {
          "given": "Takeshi",
          "role": [
            "bass",
            "vocals"
          ]
        }
      ]
    }
  ]
}`

func TestArrayCorrectness(t *testing.T) {
	// only pattern3 should match
	pattern1 := `{"bands": { "members": { "given": [ "Mick" ], "surname": [ "Strummer" ] } } }`
	pattern2 := `{"bands": { "members": { "given": [ "Wata" ], "role": [ "drums" ] } } }`
	pattern3 := `{"bands": { "members": { "given": [ "Wata" ], "role": [ "guitar" ] } } }`
	m := NewMatcher()
	err := m.AddPattern("Mick strummer", pattern1)
	if err != nil {
		t.Error(err.Error())
	}
	err = m.AddPattern("Wata drums", pattern2)
	if err != nil {
		t.Error(err.Error())
	}
	err = m.AddPattern("Wata guitar", pattern3)
	if err != nil {
		t.Error(err.Error())
	}

	matches, err := m.MatchesForJSONEvent([]byte(bands))
	if err != nil {
		t.Error(err.Error())
	}

	if len(matches) != 1 || matches[0].(string) != "Wata guitar" {
		t.Error("Matches across array boundaries")
	}
}
