@@ -32,7 +32,9 @@ In the example below, we use the
[`huggingfaceembedder`](https://pkg.go.dev/github.com/henomis/[email protected]/embedder/huggingface)
package from the [`LinGoose`](https://pkg.go.dev/github.com/henomis/[email protected])
framework to generate vector embeddings to store and index with
- Redis Query Engine.
+ Redis Query Engine. The code is first demonstrated for hash documents with a
+ separate section to explain the
+ [differences with JSON documents](#differences-with-json-documents).

## Initialize

@@ -80,10 +82,10 @@ the embeddings for this example are both available for free.

The `huggingfaceembedder` model outputs the embeddings as a
`[]float32` array. If you are storing your documents as
- [hash]({{< relref "/develop/data-types/hashes" >}}) objects
- (as we are in this example), then you must convert this array
- to a `byte` string before adding it as a hash field. In this example,
- we will use the function below to produce the `byte` string:
+ [hash]({{< relref "/develop/data-types/hashes" >}}) objects, then you
+ must convert this array to a `byte` string before adding it as a hash field.
+ The function shown below uses Go's [`binary`](https://pkg.go.dev/encoding/binary)
+ package to produce the `byte` string:

```go
func floatsToBytes(fs []float32) []byte {
@@ -101,7 +103,8 @@ func floatsToBytes(fs []float32) []byte {
Note that if you are using [JSON]({{< relref "/develop/data-types/json" >}})
objects to store your documents instead of hashes, then you should store
the `[]float32` array directly without first converting it to a `byte`
- string.
+ string (see [Differences with JSON documents](#differences-with-json-documents)
+ below).

## Create the index

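The hunks above show only the signature of `floatsToBytes()`, not its body. As a rough illustration of the conversion the text describes, the sketch below packs a `[]float32` into a byte slice with `encoding/binary`. The helper name `encodeFloat32s` and the choice of little-endian byte order are assumptions made for this example and are not taken from the PR, so the documented function may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeFloat32s is a hypothetical helper (not the function from the PR).
// It writes each float32 as 4 bytes in little-endian order, which is an
// assumption chosen for this sketch.
func encodeFloat32s(fs []float32) []byte {
	buf := make([]byte, len(fs)*4)

	for i, f := range fs {
		// Float32bits returns the IEEE 754 bit pattern of f as a uint32.
		binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(f))
	}

	return buf
}

func main() {
	b := encodeFloat32s([]float32{1.0, -2.5})

	fmt.Println(len(b))    // 8 (four bytes per float32)
	fmt.Printf("% x\n", b) // 00 00 80 3f 00 00 20 c0
}
```

A 384-dimension embedding encoded this way would occupy 1536 bytes.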
@@ -187,7 +190,7 @@ hf := huggingfaceembedder.New().
## Add data

You can now supply the data objects, which will be indexed automatically
- when you add them with [`hset()`]({{< relref "/commands/hset" >}}), as long as
+ when you add them with [`HSet()`]({{< relref "/commands/hset" >}}), as long as
you use the `doc:` prefix specified in the index definition.

Use the `Embed()` method of `huggingfacetransformer`
@@ -310,6 +313,120 @@ As you would expect, the result for `doc:0` with the content text *"That is a ve
is the result that is most similar in meaning to the query text
*"That is a happy person"*.

+ ## Differences with JSON documents
+
+ Indexing JSON documents is similar to hash indexing, but there are some
+ important differences. JSON allows much richer data modelling with nested fields, so
+ you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
+ to identify each field you want to index. However, you can declare a short alias for each
+ of these paths (using the `As` option) to avoid typing it in full for
+ every query. Also, you must set `OnJSON` to `true` when you create the index.
+
+ The code below shows these differences, but the index is otherwise very similar to
+ the one created previously for hashes:
+
+ ```go
+ _, err = rdb.FTCreate(ctx,
+ 	"vector_json_idx",
+ 	&redis.FTCreateOptions{
+ 		OnJSON: true,
+ 		Prefix: []any{"jdoc:"},
+ 	},
+ 	&redis.FieldSchema{
+ 		FieldName: "$.content",
+ 		As:        "content",
+ 		FieldType: redis.SearchFieldTypeText,
+ 	},
+ 	&redis.FieldSchema{
+ 		FieldName: "$.genre",
+ 		As:        "genre",
+ 		FieldType: redis.SearchFieldTypeTag,
+ 	},
+ 	&redis.FieldSchema{
+ 		FieldName: "$.embedding",
+ 		As:        "embedding",
+ 		FieldType: redis.SearchFieldTypeVector,
+ 		VectorArgs: &redis.FTVectorArgs{
+ 			HNSWOptions: &redis.FTHNSWOptions{
+ 				Dim:            384,
+ 				DistanceMetric: "L2",
+ 				Type:           "FLOAT32",
+ 			},
+ 		},
+ 	},
+ ).Result()
+ ```
+
+ Use [`JSONSet()`]({{< relref "/commands/json.set" >}}) to add the data
+ instead of [`HSet()`]({{< relref "/commands/hset" >}}). The maps
+ that specify the fields have the same structure as the ones used for `HSet()`.
+
+ An important difference with JSON indexing is that the vectors are
+ specified using lists instead of binary strings. The loop below is similar
+ to the one used previously to add the hash data, but it doesn't use the
+ `floatsToBytes()` function to encode the `float32` array.
+
+ ```go
+ for i, emb := range embeddings {
+ 	_, err = rdb.JSONSet(ctx,
+ 		fmt.Sprintf("jdoc:%v", i),
+ 		"$",
+ 		map[string]any{
+ 			"content":   sentences[i],
+ 			"genre":     tags[i],
+ 			"embedding": emb.ToFloat32(),
+ 		},
+ 	).Result()
+
+ 	if err != nil {
+ 		panic(err)
+ 	}
+ }
+ ```
+
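To illustrate why no byte conversion is needed in the loop above, the sketch below uses only the standard `encoding/json` package to show how a map with the same field names serializes: the `[]float32` value becomes a plain JSON array of numbers. It assumes the client library JSON-encodes the map before sending it to the server. The `toJSONDoc` helper name and the sample values are invented for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toJSONDoc is a hypothetical helper that builds a document with the same
// field names as the loop above and encodes it as JSON text.
func toJSONDoc(content, genre string, embedding []float32) (string, error) {
	doc := map[string]any{
		"content":   content,
		"genre":     genre,
		"embedding": embedding,
	}

	b, err := json.Marshal(doc)
	if err != nil {
		return "", err
	}

	return string(b), nil
}

func main() {
	// A two-element vector stands in for the 384-dimension embedding.
	s, err := toJSONDoc("That is a happy dog", "pets", []float32{0.5, -0.25})
	if err != nil {
		panic(err)
	}

	fmt.Println(s)
	// {"content":"That is a happy dog","embedding":[0.5,-0.25],"genre":"pets"}
}
```

The array under `embedding` is what the `$.embedding` path in the schema would point at.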
+ The query is almost identical to the one for the hash documents. This
+ demonstrates how the right choice of aliases for the JSON paths can
+ save you having to write complex queries. An important thing to notice
+ is that the vector parameter for the query is still specified as a
+ binary string (using the `floatsToBytes()` method), even though the data for
+ the `embedding` field of the JSON was specified as an array.
+
+ ```go
+ jsonQueryEmbedding, err := hf.Embed(ctx, []string{
+ 	"That is a happy person",
+ })
+
+ if err != nil {
+ 	panic(err)
+ }
+
+ jsonBuffer := floatsToBytes(jsonQueryEmbedding[0].ToFloat32())
+
+ jsonResults, err := rdb.FTSearchWithArgs(ctx,
+ 	"vector_json_idx",
+ 	"*=>[KNN 3 @embedding $vec AS vector_distance]",
+ 	&redis.FTSearchOptions{
+ 		Return: []redis.FTSearchReturn{
+ 			{FieldName: "vector_distance"},
+ 			{FieldName: "content"},
+ 		},
+ 		DialectVersion: 2,
+ 		Params: map[string]any{
+ 			"vec": jsonBuffer,
+ 		},
+ 	},
+ ).Result()
+ ```
+
+ Apart from the `jdoc:` prefixes for the keys, the result from the JSON
+ query is the same as for hash:
+
+ ```
+ ID: jdoc:0, Distance:0.114169843495, Content:'That is a very happy person'
+ ID: jdoc:1, Distance:0.610845327377, Content:'That is a happy dog'
+ ID: jdoc:2, Distance:1.48624765873, Content:'Today is a sunny day'
+ ```
+
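The hunk does not include the code that prints the JSON results. As a minimal sketch of how lines in the layout above could be produced, the example below formats one result from its key and returned fields. The `resultDoc` type is a hypothetical stand-in defined here so the example runs without a Redis connection, and it is not the client library's own type. The field names `vector_distance` and `content` are the ones listed in the query's `Return` option.

```go
package main

import "fmt"

// resultDoc is a hypothetical stand-in for a search result document:
// a key plus the returned fields as strings.
type resultDoc struct {
	ID     string
	Fields map[string]string
}

// formatResult renders one document in the same layout as the output above.
func formatResult(d resultDoc) string {
	return fmt.Sprintf("ID: %v, Distance:%v, Content:'%v'",
		d.ID, d.Fields["vector_distance"], d.Fields["content"])
}

func main() {
	// Sample values copied from the first line of the output above.
	docs := []resultDoc{
		{ID: "jdoc:0", Fields: map[string]string{
			"vector_distance": "0.114169843495",
			"content":         "That is a very happy person",
		}},
	}

	for _, d := range docs {
		fmt.Println(formatResult(d))
	}
}
```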
## Learn more

See