1010
1111### Fast pattern-matching library
1212
13- Quamina provides APIs to create an interface called
14- a ** Matcher ** ,
15- add multiple ** Patterns ** to it, and then query JSON blobs
16- called ** Events ** to discover which of the patterns match
13+ **Quamina** implements a data type that has APIs to
14+ create an instance and add multiple **Patterns** to it,
15+ and then query data objects called **Events** to
16+ discover which of the patterns match
1717the fields in the event.
1818
19+ Quamina [welcomes contributions](CONTRIBUTING.md).
20+
1921### Status
2022
2123As of late May 2022, Quamina has a lot of unit tests and
22- they're all passing. We are working on getting the
23- GitHub-based CI/CD nailed down and stable. We have not
24- pressed the “release” button, so we reserve the right
25- to change APIs.
24+ they're all passing. It has a reasonable basis of
25+ GitHub-based CI/CD working. We intend to press the
26+ “release” button any day now, but for the moment
27+ we reserve the right to change APIs.
2628
2729### Patterns
2830
@@ -128,87 +130,79 @@ Number matching is weak - the number has to appear
128130exactly the same in the pattern and the event. I.e.,
129131Quamina doesn't know that 35, 35.000, and 3.5e1 are the
130132same number. There's a fix for this in the code which
131- is commented out because it causes a
132- significant performance penalty.
133+ is not yet activated because it causes a
134+ significant performance penalty, so the API needs to
135+ be enhanced to only ask for it when you need it.
133136
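To make the number-matching limitation concrete, here is a minimal sketch that does not use Quamina at all. It contrasts a byte-for-byte comparison of number literals (the behavior described above) with a comparison of parsed values. The helper names `sameText` and `sameValue` are invented for this illustration, and parsing with `strconv.ParseFloat` is only one possible approach, not necessarily the fix that is waiting in the code.

```go
package main

import (
	"fmt"
	"strconv"
)

// sameText reports whether two number literals are byte-for-byte identical.
// This mirrors the "exactly the same" rule described in the text.
func sameText(a, b string) bool {
	return a == b
}

// sameValue parses both literals as float64 and compares the values.
// Hypothetical helper for illustration only; not part of Quamina.
func sameValue(a, b string) (bool, error) {
	x, err := strconv.ParseFloat(a, 64)
	if err != nil {
		return false, err
	}
	y, err := strconv.ParseFloat(b, 64)
	if err != nil {
		return false, err
	}
	return x == y, nil
}

func main() {
	for _, lit := range []string{"35", "35.000", "3.5e1"} {
		same, err := sameValue("35", lit)
		fmt.Println(lit, sameText("35", lit), same, err)
	}
}
```

Parsing every number on both the pattern and the event side is extra work per field, which gives a feel for why such a fix might carry a performance cost.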
134137## Flattening and Matching
135138
136139The first step in finding matches for an Event is
137140“flattening” it, which is to say turning it
138141into a list of pathname/value pairs called Fields. Quamina
139- defines a ` Flattener ` interface type and provides a
140- JSON-specific implementation in the ` FJ ` type .
142+ defines a `Flattener` interface type and has a built-in
143+ `Flattener` for JSON.
141144
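As a rough illustration of what flattening means, the sketch below turns a JSON object into a sorted list of pathname/value pairs using only the standard library. It is not Quamina's flattener: the `field` type, the `flatten` and `walk` functions, and the use of `.` as a path separator are all assumptions made for this example, and decoding through `encoding/json` is far slower than a purpose-built flattener would be.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// field is a hypothetical stand-in for a pathname/value pair.
type field struct {
	Path string
	Val  string
}

// flatten decodes a JSON object and returns its leaf values
// as pathname/value pairs, sorted by pathname.
func flatten(event []byte) ([]field, error) {
	var root map[string]any
	if err := json.Unmarshal(event, &root); err != nil {
		return nil, err
	}
	var out []field
	walk("", root, &out)
	// A stable sort keeps array elements under one path in document order.
	sort.SliceStable(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out, nil
}

// walk descends into objects and arrays and records each leaf value.
func walk(path string, v any, out *[]field) {
	switch t := v.(type) {
	case map[string]any:
		for k, child := range t {
			p := k
			if path != "" {
				p = path + "." + k
			}
			walk(p, child, out)
		}
	case []any:
		// Array elements share the path of the array itself.
		for _, child := range t {
			walk(path, child, out)
		}
	default:
		b, _ := json.Marshal(t) // re-encode the leaf as JSON text
		*out = append(*out, field{Path: path, Val: string(b)})
	}
}

func main() {
	fields, err := flatten([]byte(`{"b":{"c":"x"},"a":[1,2]}`))
	fmt.Println(fields, err)
}
```

Note that this sketch re-encodes numbers after decoding them, so it does not preserve the original text of a number literal the way a byte-level flattener could.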
142145`Flattener` implementations in general will have
143146internal state and thus not be thread-safe.
144147
145- The ` MatchesForJSONEvent ` API must create a new
146- ` FJ ` instance for each event so that it
147- can be thread-safe. This works fine, but creating a
148- new ` FJ ` instance is expensive enough to slow the
149- rate at which events can be matched by 15% or so.
150-
151- For maximum performance in matching JSON events,
152- you should create your own ` FJ ` instance with the
153- ` NewFJ(Matcher) ` method. You can then use
154- ` FJ.Flatten(event) ` API to turn multiple successive
155- JSON events into ` Field ` lists and pass them to
156- ` Matcher ` 's ` MatchesForFields() ` API, but ` FJ `
157- includes a convenience method ` FlattenAndMatch(event) `
158- which will call the ` Matcher ` for you. As long as
159- each thread has its own ` Flattener ` instance,
160- everything will remain thread-safe.
161-
162- Also note that should you wish to process events
148+ Note that should you wish to process events
163149in a format other than JSON, you can implement
164- the ` Flattener ` interface and use that to process
165- events in whatever format into Field lists.
150+ the `Flattener` interface yourself.
166151
167152## APIs
168-
169153** Note** : In all the APIs below, field names and values in both
170154Patterns and Events must be valid UTF-8. Unescaped characters
171155smaller than 0x1F (illegal per JSON), and bytes with value
172156greater than 0XF4 (can't occur in correctly composed UTF-8)
173157are rejected by the APIs.
174-
158+ ### Control APIs
175159``` go
176- type Matcher interface {
177- AddPattern (x X , pat string ) error
178- MatchesForJSONEvent (event []byte ) ([]X, error )
179- MatchesForFields (fields []Field ) []X
180- DeletePattern (x X ) error
181- }
182- ```
160+ func New(...Option) (*Quamina, error)
183161
184- Above are the operations provided by a Matcher. Quamina
185- includes an implementation called ` CoreMatcher ` which
186- implements ` Matcher ` . In a forthcoming release it will
187- provider alternate implementations that offer extra
188- features.
189-
190- ``` go
191- func NewCoreMatcher () *Matcher
162+ func WithMediaType(mediaType string) Option
163+ func WithFlattener(f Flattener) Option
164+ func WithPatternDeletion(b bool) Option
165+ func WithPatternStorage(ps LivePatternsState) Option
192166```
167+ For example:
168+
193169```go
194- func pruner.NewMatcher() *Matcher
170+ q, err := quamina.New(quamina.WithMediaType("application/json"))
195171```
196-
197- Create new Matchers, take no arguments. The difference
198- is that the `pruner.NewMatcher` version supports the
199- `DeletePattern()` API. Be careful: It occasionally
200- rebuilds the Matcher in stop-the-world fashion, so if you
201- delete lots of Patterns in a large Matcher you may
202- encounter occasional elevated latencies.
172+ The meanings of the `Option` functions are:
173+
174+ `WithMediaType`: In the future, Quamina will support
175+ Events not just in JSON but in other formats such as
176+ Avro, Protobufs, and so on. This option will make sure
177+ to invoke the correct Flattener. At the moment, the only
178+ supported value is `application/json`, the default.
179+
180+ `WithFlattener`: Requests that Quamina flatten events with
181+ the provided (presumably user-written) Flattener.
182+
183+ `WithPatternDeletion`: If true, arranges that Quamina
184+ allows Patterns to be deleted from an instance. This is
185+ not free; it can incur extra costs in memory and
186+ occasional stop-the-world Quamina rebuilds. (We plan
187+ to improve this.)
188+
189+ `WithPatternStorage`: If you provide an argument that
190+ supports the `LivePatternsState` API, Quamina will
191+ use it to
192+ maintain a list of which patterns have currently been
193+ added but not deleted. This could be useful if you
194+ wanted to rebuild Quamina instances for sharded
195+ processing or after a system failure. ***Note: Not
196+ yet implemented.***
197+
198+ ### Data APIs
203199
204200```go
205- func (m *Matcher ) AddPattern(x X, patternJSON string) error
201+ func (q *Quamina) AddPattern(x X, patternJSON string) error
206202```
207-
208203The first argument identifies the Pattern and will be
209- returned by a Matcher when asked to match against Events.
210- X is currently `interface{}` . Should it be a generic now
211- that Go has them?
204+ returned by Quamina when asked to match against Events.
205+ X is defined as `any`.
212206
213207The Pattern must be provided as a string which is a
214208JSON object as exemplified above in this document.
@@ -217,67 +211,47 @@ The `error` return is used to signal invalid Pattern
217211structure, which could be bad UTF-8 or malformed JSON
218212or leaf values which are not provided as arrays.
219213
220- As many Patterns as desired can be added to a Matcher.
221- The ` CoreMatcher` type does not support ` DeletePattern ()`
222- but ` pruner.Matcher ` does.
214+ As many Patterns as desired can be added to a Quamina
215+ instance.
223216
224217The `AddPattern` call is single-threaded; if multiple
225218threads call it, they will block and execute sequentially.
226219
227220```go
228- func (m * Matcher ) MatchesForJSONEvent (event []byte ) ([]X , error )
221+ func (q *Quamina) MatchesForEvent(event []byte) ([]X, error)
229222```
230223
231- The `event` argument must be a JSON object encoded in
232- correct UTF-8.
233-
234224The `error` return value is nil unless there was an
235- error in the Event JSON .
225+ error in the encoding of the Event.
236226
237227The `[]X` return slice may be empty if none of the Patterns
238228match the provided Event.
239229
240- ```go
241- func (m *Matcher) MatchesForFields([]Field) ([]X, error)
242- ```
243- Performs the functions of `MatchesForJSON` on an
244- Event which has been flattened into a list of `Field`
245- instances. At the moment, `CoreMatcher` only returns
246- an error if the `[]Field` argument is nil. `pruner.Matcher`
247- can return an error if it suffers a failure in its
248- Pattern storage.
249-
250- These matching calls are thread-safe. Many threads may
251- be executing it concurrently, even while `AddPattern` is
252- also executing. There is a significant performance
253- penalty if there is a high rate of `AddPattern` in
254- combination with matching.
255-
256- ```go
257- func NewFJ(*Matcher) Flattener
258- ```
259- Creates a new JSON-specific Flattener.
260- ```go
261- func (fj *FJ) Flatten([]byte event) []Field
262- ```
263- Transforms an event, which must be JSON object
264- encoded in UTF-8, into a list of `Field` instances.
230+ A single Quamina instance is not thread-safe. But
231+ instances can share the underlying data structures
232+ in a safe way.
265233
266- ```go
267- func (fj *FJ) FlattenAndMatch([]byte event) ([]X, error)
234+ ```go
235+ func (q *Quamina) Copy() *Quamina
268236```
269- Utility function which combines the functions of the
270- `FJ.Flatten` and `Matcher.MatchesForFields` APIs.
237+ This generates a copy of the target instance
238+ which may be used in parallel on another thread,
239+ while sharing the underlying data structure. Many
240+ instances can execute `MatchesForEvent()` calls
241+ concurrently, even while one or more of them are
242+ also executing `AddPattern()`. There is a
243+ significant performance penalty if there is a high
244+ rate of `AddPattern` in parallel with matching.
271245
272246### Performance
273247
274248I used to say that the performance of
275- `MatchesForJSONEvent ` was `O(1)` in the number of
249+ `MatchesForEvent` was `O(1)` in the number of
276250Patterns. While that’s probably the right way to think
277251about it, it’s not *quite* true,
278252as it varies somewhat as a function of the number of
279253unique fields that appear in all the patterns that have
280- been added to the matcher , but still remains sublinear
254+ been added to Quamina, but still remains sublinear
281255in that number.
282256
283257A word of explanation: Quamina compiles the
@@ -290,7 +264,7 @@ flattened into a list of pathname/value pairs and
290264sorted. This process exceeds 50% of execution time,
291265and is optimized by discarding any fields that
292266do not appear in one or more of the patterns added
293- to the matcher . Thus, adding a new pattern that only
267+ to Quamina. Thus, adding a new pattern that only
294268mentions fields mentioned in previous patterns is
295269effectively free i.e. `O(1)` in terms of run-time
296270performance.
@@ -306,7 +280,7 @@ colonies before slavery was abolished.
306280
307281### Credits
308282
309- @timbray: v0.1 and patches.
283+ @timbray: v0.0 and patches.
310284
311285@jsmorph: `Pruner` and concurrency testing.
312286