Merge pull request #1466 from redis/DOC-5149-python-vec-json-examples

andy-stark-redis · web-flow · commit d321153faf0e · 2025-04-25T14:41:32.000+01:00
DOC-5149 and DOC-5153 added Python and Go vector JSON examples
diff --git a/content/develop/clients/go/vecsearch.md b/content/develop/clients/go/vecsearch.md
@@ -32,7 +32,9 @@ In the example below, we use the
 [`huggingfaceembedder`](https://pkg.go.dev/github.com/henomis/lingoose@v0.3.0/embedder/huggingface)
 package from the [`LinGoose`](https://pkg.go.dev/github.com/henomis/lingoose@v0.3.0)
 framework to generate vector embeddings to store and index with
-Redis Query Engine.
+Redis Query Engine. The code is first demonstrated for hash documents, with a
+separate section to explain the
+[differences with JSON documents](#differences-with-json-documents).
 
 ## Initialize
 
@@ -80,10 +82,10 @@ the embeddings for this example are both available for free.
 
 The `huggingfaceembedder` model outputs the embeddings as a
 `[]float32` array. If you are storing your documents as
-[hash]({{< relref "/develop/data-types/hashes" >}}) objects
-(as we are in this example), then you must convert this array
-to a `byte` string before adding it as a hash field. In this example,
-we will use the function below to produce the `byte` string:
+[hash]({{< relref "/develop/data-types/hashes" >}}) objects, then you
+must convert this array to a `byte` string before adding it as a hash field.
+The function shown below uses Go's [`binary`](https://pkg.go.dev/encoding/binary)
+package to produce the `byte` string:
 
 ```go
 func floatsToBytes(fs []float32) []byte {
@@ -101,7 +103,8 @@ func floatsToBytes(fs []float32) []byte {
 Note that if you are using [JSON]({{< relref "/develop/data-types/json" >}})
 objects to store your documents instead of hashes, then you should store
 the `[]float32` array directly without first converting it to a `byte`
-string.
+string (see [Differences with JSON documents](#differences-with-json-documents)
+below).
 
 ## Create the index
 
@@ -187,7 +190,7 @@ hf := huggingfaceembedder.New().
 ## Add data
 
 You can now supply the data objects, which will be indexed automatically
-when you add them with [`hset()`]({{< relref "/commands/hset" >}}), as long as
+when you add them with [`HSet()`]({{< relref "/commands/hset" >}}), as long as
 you use the `doc:` prefix specified in the index definition.
 
 Use the `Embed()` method of `huggingfaceembedder`
@@ -310,6 +313,120 @@ As you would expect, the result for `doc:0` with the content text *"That is a ve
 is the result that is most similar in meaning to the query text
 *"That is a happy person"*.
 
+## Differences with JSON documents
+
+Indexing JSON documents is similar to hash indexing, but there are some
+important differences. JSON allows much richer data modelling with nested fields, so
+you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
+to identify each field you want to index. However, you can declare a short alias for each
+of these paths (using the `As` option) to avoid typing it in full for
+every query. Also, you must set `OnJSON` to `true` when you create the index.
+
+The code below shows these differences, but the index is otherwise very similar to
+the one created previously for hashes:
+
+```go
+_, err = rdb.FTCreate(ctx,
+    "vector_json_idx",
+    &redis.FTCreateOptions{
+        OnJSON: true,
+        Prefix: []any{"jdoc:"},
+    },
+    &redis.FieldSchema{
+        FieldName: "$.content",
+        As:        "content",
+        FieldType: redis.SearchFieldTypeText,
+    },
+    &redis.FieldSchema{
+        FieldName: "$.genre",
+        As:        "genre",
+        FieldType: redis.SearchFieldTypeTag,
+    },
+    &redis.FieldSchema{
+        FieldName: "$.embedding",
+        As:        "embedding",
+        FieldType: redis.SearchFieldTypeVector,
+        VectorArgs: &redis.FTVectorArgs{
+            HNSWOptions: &redis.FTHNSWOptions{
+                Dim:            384,
+                DistanceMetric: "L2",
+                Type:           "FLOAT32",
+            },
+        },
+    },
+).Result()
+```
+
+Use [`JSONSet()`]({{< relref "/commands/json.set" >}}) to add the data
+instead of [`HSet()`]({{< relref "/commands/hset" >}}). The maps
+that specify the fields have the same structure as the ones used for `HSet()`.
+
+An important difference with JSON indexing is that the vectors are
+specified using arrays of numbers instead of binary strings. The loop below
+is similar to the one used previously to add the hash data, but it doesn't
+use the `floatsToBytes()` function to encode the `float32` array.
+
+```go
+for i, emb := range embeddings {
+    _, err = rdb.JSONSet(ctx,
+        fmt.Sprintf("jdoc:%v", i),
+        "$",
+        map[string]any{
+            "content":   sentences[i],
+            "genre":     tags[i],
+            "embedding": emb.ToFloat32(),
+        },
+    ).Result()
+
+    if err != nil {
+        panic(err)
+    }
+}
+```
+
+The query is almost identical to the one for the hash documents. This
+demonstrates how the right choice of aliases for the JSON paths can
+save you having to write complex queries. An important thing to notice
+is that the vector parameter for the query is still specified as a
+binary string (using the `floatsToBytes()` function), even though the data for
+the `embedding` field of the JSON was specified as an array.
+
+```go
+jsonQueryEmbedding, err := hf.Embed(ctx, []string{
+    "That is a happy person",
+})
+
+if err != nil {
+    panic(err)
+}
+
+jsonBuffer := floatsToBytes(jsonQueryEmbedding[0].ToFloat32())
+
+jsonResults, err := rdb.FTSearchWithArgs(ctx,
+    "vector_json_idx",
+    "*=>[KNN 3 @embedding $vec AS vector_distance]",
+    &redis.FTSearchOptions{
+        Return: []redis.FTSearchReturn{
+            {FieldName: "vector_distance"},
+            {FieldName: "content"},
+        },
+        DialectVersion: 2,
+        Params: map[string]any{
+            "vec": jsonBuffer,
+        },
+    },
+).Result()
+```
+
+Apart from the `jdoc:` prefixes for the keys, the result from the JSON
+query is the same as for the hash query:
+
+```
+ID: jdoc:0, Distance:0.114169843495, Content:'That is a very happy person'
+ID: jdoc:1, Distance:0.610845327377, Content:'That is a happy dog'
+ID: jdoc:2, Distance:1.48624765873, Content:'Today is a sunny day'
+```
+
 ## Learn more
 
 See
diff --git a/content/develop/clients/redis-py/vecsearch.md b/content/develop/clients/redis-py/vecsearch.md
@@ -28,10 +28,12 @@ similarity of an embedding generated from some query text with embeddings stored
 or JSON fields, Redis can retrieve documents that closely match the query in terms
 of their meaning.
 
-In the example below, we use the
+The example below uses the
 [`sentence-transformers`](https://pypi.org/project/sentence-transformers/)
 library to generate vector embeddings to store and index with
-Redis Query Engine.
+Redis Query Engine. The code is first demonstrated for hash documents, with a
+separate section to explain the
+[differences with JSON documents](#differences-with-json-documents).
 
 ## Initialize
 
@@ -50,6 +52,7 @@ from sentence_transformers import SentenceTransformer
 from redis.commands.search.query import Query
 from redis.commands.search.field import TextField, TagField, VectorField
 from redis.commands.search.indexDefinition import IndexDefinition, IndexType
+from redis.commands.json.path import Path
 
 import numpy as np
 import redis
@@ -86,7 +89,7 @@ except redis.exceptions.ResponseError:
     pass
 ```
 
-Next, we create the index.
+Next, create the index.
 The schema in the example below specifies hash objects for storage and includes
 three fields: the text content to index, a
 [tag]({{< relref "/develop/interact/search-and-query/advanced-concepts/tags" >}})
@@ -127,10 +130,10 @@ Use the `model.encode()` method of `SentenceTransformer`
 as shown below to create the embedding that represents the `content` field.
 The `astype()` option that follows the `model.encode()` call specifies that
 we want a vector of `float32` values. The `tobytes()` option encodes the
-vector components together as a single binary string rather than the
-default Python list of `float` values.
-Use the binary string representation when you are indexing hash objects
-(as we are here), but use the default list of `float` for JSON objects.
+vector components together as a single binary string.
+Use the binary string representation when you are indexing hashes
+or running a query (but use a list of `float` for
+[JSON documents](#differences-with-json-documents)).
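
To make the two representations concrete, the sketch below packs a short vector into the binary string form using only Python's standard `struct` module. The helper names `floats_to_bytes()` and `bytes_to_floats()` are hypothetical and are not part of `redis-py` or NumPy. The sketch assumes little-endian byte order, which is what NumPy's `tobytes()` would also produce for a `float32` array on a little-endian machine.

```python
import struct


def floats_to_bytes(values):
    """Pack floats as consecutive little-endian 32-bit values (hypothetical helper)."""
    return struct.pack(f"<{len(values)}f", *values)


def bytes_to_floats(blob):
    """Unpack a little-endian float32 binary string back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


# Values chosen to be exactly representable as float32.
vec = [0.5, -1.25, 2.0]

# Binary string form: used for hash fields and for query parameters.
blob = floats_to_bytes(vec)

print(len(blob))                     # 4 bytes per component
print(bytes_to_floats(blob) == vec)  # the round trip preserves the values
```

Each `FLOAT32` component occupies 4 bytes, so a 384-dimensional embedding like the ones in this example would be a 1536-byte string in the binary form, while the JSON form keeps the same values as a plain list of numbers.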
 
 ```python
 content = "That is a very happy person"
@@ -226,6 +229,116 @@ As you would expect, the result for `doc:0` with the content text *"That is a ve
 is the result that is most similar in meaning to the query text
 *"That is a happy person"*.
 
+## Differences with JSON documents
+
+Indexing JSON documents is similar to hash indexing, but there are some
+important differences. JSON allows much richer data modelling with nested fields, so
+you must supply a [path]({{< relref "/develop/data-types/json/path" >}}) in the schema
+to identify each field you want to index. However, you can declare a short alias for each
+of these paths (using the `as_name` keyword argument) to avoid typing it in full for
+every query. Also, you must specify `IndexType.JSON` when you create the index.
+
+The code below shows these differences, but the index is otherwise very similar to
+the one created previously for hashes:
+
+```py
+schema = (
+    TextField("$.content", as_name="content"),
+    TagField("$.genre", as_name="genre"),
+    VectorField(
+        "$.embedding", "HNSW", {
+            "TYPE": "FLOAT32",
+            "DIM": 384,
+            "DISTANCE_METRIC": "L2"
+        },
+        as_name="embedding"
+    )
+)
+
+r.ft("vector_json_idx").create_index(
+    schema,
+    definition=IndexDefinition(
+        prefix=["jdoc:"], index_type=IndexType.JSON
+    )
+)
+```
+
+Use [`json().set()`]({{< relref "/commands/json.set" >}}) to add the data
+instead of [`hset()`]({{< relref "/commands/hset" >}}). The dictionaries
+that specify the fields have the same structure as the ones used for `hset()`,
+but `json().set()` receives them in a positional argument instead of
+the `mapping` keyword argument.
+
+An important difference with JSON indexing is that the vectors are
+specified using lists instead of binary strings. Generate the list
+using the `tolist()` method instead of the `tobytes()` method that you
+would use with a hash.
+
+```py
+content = "That is a very happy person"
+
+r.json().set("jdoc:0", Path.root_path(), {
+    "content": content,
+    "genre": "persons",
+    "embedding": model.encode(content).astype(np.float32).tolist(),
+})
+
+content = "That is a happy dog"
+
+r.json().set("jdoc:1", Path.root_path(), {
+    "content": content,
+    "genre": "pets",
+    "embedding": model.encode(content).astype(np.float32).tolist(),
+})
+
+content = "Today is a sunny day"
+
+r.json().set("jdoc:2", Path.root_path(), {
+    "content": content,
+    "genre": "weather",
+    "embedding": model.encode(content).astype(np.float32).tolist(),
+})
+```
+
+The query is almost identical to the one for the hash documents. This
+demonstrates how the right choice of aliases for the JSON paths can
+save you having to write complex queries. An important thing to notice
+is that the vector parameter for the query is still specified as a
+binary string (using the `tobytes()` method), even though the data for
+the `embedding` field of the JSON was specified as a list.
+
+```py
+q = Query(
+    "*=>[KNN 3 @embedding $vec AS vector_distance]"
+).return_field("vector_distance").return_field("content").dialect(2)
+
+query_text = "That is a happy person"
+
+res = r.ft("vector_json_idx").search(
+    q, query_params={
+        "vec": model.encode(query_text).astype(np.float32).tobytes()
+    }
+)
+```
+
+Apart from the `jdoc:` prefixes for the keys, the result from the JSON
+query is the same as for the hash query:
+
+```
+Result{
+    3 total,
+    docs: [
+        Document {
+            'id': 'jdoc:0',
+            'payload': None,
+            'vector_distance': '0.114169985056',
+            'content': 'That is a very happy person'
+        },
+            .
+            .
+            .
+```
+
 ## Learn more
 
 See