
Commit 3a3a31b

feat: optimize elasticsearch ai-search plugin and update related docs (alibaba#2100)
1 parent 9456ae8 commit 3a3a31b

4 files changed, +44 -33 lines changed

plugins/wasm-go/extensions/ai-search/README.md

Lines changed: 13 additions & 13 deletions
@@ -75,18 +75,22 @@ description: higress supports integrating search engines (Google/Bing/Arxiv/Elastics

 ## Elasticsearch Specific Configuration

-| Name | Data Type | Requirement | Default Value | Description |
-|------|----------|----------|--------|-----------------------|
-| index | string | Required | - | Elasticsearch index name to search |
-| contentField | string | Required | - | Content field name to query |
-| semanticTextField | string | Required | - | Embedding field name to query |
-| linkField | string | Required | - | Result link field name |
-| titleField | string | Required | - | Result title field name |
-| username | string | Optional | - | Elasticsearch username |
-| password | string | Optional | - | Elasticsearch password |
+| Name | Data Type | Requirement | Default Value | Description |
+|------|----------|------|--------|------------------------------------|
+| index | string | Required | - | Elasticsearch index name to search |
+| contentField | string | Required | - | Content field name to query |
+| semanticTextField | string | Required | - | Embedding field name to query |
+| linkField | string | Optional | - | Result link field name, needed when `needReference` is configured |
+| titleField | string | Optional | - | Result title field name, needed when `needReference` is configured |
+| username | string | Optional | - | Elasticsearch username |
+| password | string | Optional | - | Elasticsearch password |

 The [Reciprocal Rank Fusion (RRF)](https://www.elastic.co/guide/en/elasticsearch/reference/8.17/rrf.html) query used in hybrid search requires Elasticsearch version 8.8 or higher.

+Currently, document vectorization relies on Elasticsearch's embedding model, which requires an Elasticsearch Enterprise license or a 30-day Trial license. To install Elasticsearch's built-in embedding model, refer to [this documentation](https://www.elastic.co/docs/explore-analyze/machine-learning/nlp/ml-nlp-elser#alternative-download-deploy); to install a third-party embedding model, refer to [this documentation](https://www.elastic.co/docs/explore-analyze/machine-learning/nlp/ml-nlp-text-emb-vector-search-example).
+
+For a complete tutorial on integrating the ai-search plugin with Elasticsearch, see: [Building a RAG Application with LangChain + Higress + Elasticsearch](https://cr7258.github.io/blogs/original/2025/15-rag-higress-es-langchain)
+
 ## Quark Specific Configuration

 | Name | Data Type | Requirement | Default Value | Description |
@@ -204,13 +208,9 @@ searchFrom:
 searchFrom:
 - type: elasticsearch
   serviceName: "es-svc.static"
-  # The port of a static-address service defaults to 80
-  servicePort: 80
   index: "knowledge_base"
   contentField: "content"
   semanticTextField: "semantic_text"
-  linkField: "url"
-  titleField: "title"
   # username: "elastic"
   # password: "password"
 ```

plugins/wasm-go/extensions/ai-search/README_EN.md

Lines changed: 6 additions & 6 deletions
@@ -80,13 +80,17 @@ It is strongly recommended to enable this feature when using Arxiv or Elasticsea
 | index | string | Required | - | Elasticsearch index name to search |
 | contentField | string | Required | - | Content field name to query |
 | semanticTextField | string | Required | - | Embedding field name to query |
-| linkField | string | Required | - | Result link field name |
-| titleField | string | Required | - | Result title field name |
+| linkField | string | Optional | - | Result link field name, needed when `needReference` is configured |
+| titleField | string | Optional | - | Result title field name, needed when `needReference` is configured |
 | username | string | Optional | - | Elasticsearch username |
 | password | string | Optional | - | Elasticsearch password |

 The [Reciprocal Rank Fusion (RRF)](https://www.elastic.co/guide/en/elasticsearch/reference/8.17/rrf.html) query used in hybrid search requires Elasticsearch version 8.8 or higher.

+Currently, document vectorization relies on Elasticsearch's embedding model, which requires an Elasticsearch Enterprise license or a 30-day Trial license. To install the built-in embedding model in Elasticsearch, please refer to [this documentation](https://www.elastic.co/docs/explore-analyze/machine-learning/nlp/ml-nlp-elser#alternative-download-deploy). If you want to install a third-party embedding model, please refer to [this guide](https://www.elastic.co/docs/explore-analyze/machine-learning/nlp/ml-nlp-text-emb-vector-search-example).
+
+For a complete tutorial on integrating the ai-search plugin with Elasticsearch, please refer to: [Building a RAG Application with LangChain + Higress + Elasticsearch](https://cr7258.github.io/blogs/original/2025/15-rag-higress-es-langchain).
+
 ## Quark Specific Configuration

 | Name | Data Type | Requirement | Default Value | Description |
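
The RRF query referenced in both READMEs merges the lexical result list (from `contentField`) and the semantic result list (from `semanticTextField`) by rank rather than by raw score. As a rough illustration of the formula only, not of how Elasticsearch implements it internally, the sketch below fuses ranked ID lists with score(d) = Σ 1/(k + rank(d)), where rank is 1-based. Here `k` plays the role of Elasticsearch's `rank_constant` (documented default 60); the function name `rrfFuse` and the document IDs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse combines ranked lists of document IDs with Reciprocal Rank Fusion:
// each list adds 1/(k + rank) to a document's score, with 1-based ranks.
// A document absent from a list gets no contribution from that list.
func rrfFuse(k float64, rankings ...[]string) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first; ties broken by ID for deterministic output.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	lexical := []string{"doc-a", "doc-b", "doc-c"}  // e.g. match on the content field
	semantic := []string{"doc-c", "doc-a", "doc-d"} // e.g. semantic query on semantic_text
	fmt.Println(rrfFuse(60, lexical, semantic))     // prints [doc-a doc-c doc-b doc-d]
}
```

A document ranked near the top of both lists (`doc-a`) beats one that is first in only a single list, which is the property that makes RRF useful for hybrid search without normalizing BM25 and vector scores.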
@@ -203,13 +207,9 @@ Note that excessive concurrency may lead to rate limiting, adjust according to a
 searchFrom:
 - type: elasticsearch
   serviceName: "es-svc.static"
-  # static ip service use 80 as default port
-  servicePort: 80
   index: "knowledge_base"
   contentField: "content"
   semanticTextField: "semantic_text"
-  linkField: "url"
-  titleField: "title"
   # username: "elastic"
   # password: "password"
 ```

plugins/wasm-go/extensions/ai-search/engine/elasticsearch/elasticsearch.go

Lines changed: 24 additions & 13 deletions
@@ -27,15 +27,21 @@ type ElasticsearchSearch struct {
 	password string
 }
 
-func NewElasticsearchSearch(config *gjson.Result) (*ElasticsearchSearch, error) {
+func NewElasticsearchSearch(config *gjson.Result, needReference bool) (*ElasticsearchSearch, error) {
 	engine := &ElasticsearchSearch{}
 	serviceName := config.Get("serviceName").String()
 	if serviceName == "" {
 		return nil, errors.New("serviceName not found")
 	}
 	servicePort := config.Get("servicePort").Int()
 	if servicePort == 0 {
-		return nil, errors.New("servicePort not found")
+		if strings.HasSuffix(serviceName, ".static") {
+			servicePort = 80
+		} else if strings.HasSuffix(serviceName, ".dns") {
+			servicePort = 443
+		} else {
+			return nil, errors.New("servicePort not found")
+		}
 	}
 	engine.client = wrapper.NewClusterClient(wrapper.FQDNCluster{
 		FQDN: serviceName,
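
With this hunk, `servicePort` may be omitted for services whose names end in `.static` (defaulting to 80) or `.dns` (defaulting to 443). A minimal sketch of that rule pulled out into a standalone function, assuming a hypothetical helper name `resolveServicePort` and taking the already-parsed port as an `int64` (the type `gjson.Result.Int()` returns):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// resolveServicePort mirrors the defaulting rule in NewElasticsearchSearch
// (hypothetical helper): an explicit, non-zero port always wins; otherwise
// ".static" services get 80, ".dns" services get 443, and any other service
// name is a configuration error.
func resolveServicePort(serviceName string, configured int64) (int64, error) {
	if configured != 0 {
		return configured, nil
	}
	switch {
	case strings.HasSuffix(serviceName, ".static"):
		return 80, nil
	case strings.HasSuffix(serviceName, ".dns"):
		return 443, nil
	default:
		return 0, errors.New("servicePort not found")
	}
}

func main() {
	names := []string{"es-svc.static", "es.example.com.dns", "es-svc.default.svc.cluster.local"}
	for _, name := range names {
		port, err := resolveServicePort(name, 0)
		fmt.Println(name, port, err)
	}
}
```

The default only applies when `servicePort` is absent or 0, so a `.static` service listening on another port (for example 9200) still needs an explicit `servicePort`.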
@@ -54,14 +60,18 @@ func NewElasticsearchSearch(config *gjson.Result) (*ElasticsearchSearch, error)
 	if engine.semanticTextField == "" {
 		return nil, errors.New("semanticTextField not found")
 	}
-	engine.linkField = config.Get("linkField").String()
-	if engine.linkField == "" {
-		return nil, errors.New("linkField not found")
-	}
-	engine.titleField = config.Get("titleField").String()
-	if engine.titleField == "" {
-		return nil, errors.New("titleField not found")
+
+	if needReference {
+		engine.linkField = config.Get("linkField").String()
+		if engine.linkField == "" {
+			return nil, errors.New("linkField not found")
+		}
+		engine.titleField = config.Get("titleField").String()
+		if engine.titleField == "" {
+			return nil, errors.New("titleField not found")
+		}
 	}
+
 	engine.timeoutMillisecond = uint32(config.Get("timeoutMillisecond").Uint())
 	if engine.timeoutMillisecond == 0 {
 		engine.timeoutMillisecond = 5000
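
This hunk makes `linkField` and `titleField` conditional on `needReference`. The sketch below reproduces the same validation order, with a plain `map[string]string` standing in for the plugin's `gjson.Result` config (an assumption made to keep it dependency-free); `loadReferenceFields` and `referenceFields` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// referenceFields holds the fields only needed to build citations (hypothetical type).
type referenceFields struct {
	linkField  string
	titleField string
}

// loadReferenceFields reads and validates linkField and titleField only when
// needReference is true, checking linkField first, as in the diff.
// cfg stands in for the gjson-based plugin config.
func loadReferenceFields(cfg map[string]string, needReference bool) (referenceFields, error) {
	var f referenceFields
	if !needReference {
		return f, nil // fields stay empty and are never validated
	}
	f.linkField = cfg["linkField"]
	if f.linkField == "" {
		return f, errors.New("linkField not found")
	}
	f.titleField = cfg["titleField"]
	if f.titleField == "" {
		return f, errors.New("titleField not found")
	}
	return f, nil
}

func main() {
	cfg := map[string]string{"contentField": "content"}
	_, err := loadReferenceFields(cfg, false)
	fmt.Println(err) // prints <nil>
	_, err = loadReferenceFields(cfg, true)
	fmt.Println(err) // prints linkField not found
}
```

As in the diff, when `needReference` is false the two fields are not read at all, so any values configured for them are ignored rather than validated.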
@@ -93,6 +103,9 @@ func (e ElasticsearchSearch) generateAuthorizationHeader() string {
 func (e ElasticsearchSearch) generateQueryBody(ctx engine.SearchContext) string {
 	queryText := strings.Join(ctx.Querys, " ")
 	return fmt.Sprintf(`{
+		"_source":{
+			"excludes": "%s"
+		},
 		"retriever": {
 			"rrf": {
 				"retrievers": [
@@ -118,7 +131,7 @@ func (e ElasticsearchSearch) generateQueryBody(ctx engine.SearchContext) string
 				]
 			}
 		}
-	}`, e.contentField, queryText, e.semanticTextField, queryText)
+	}`, e.semanticTextField, e.contentField, queryText, e.semanticTextField, queryText)
 }
 
 func (e ElasticsearchSearch) CallArgs(ctx engine.SearchContext) engine.CallArgs {
@@ -145,9 +158,7 @@ func (e ElasticsearchSearch) ParseResult(ctx engine.SearchContext, response []by
 			Link:    source.Get(e.linkField).String(),
 			Content: source.Get(e.contentField).String(),
 		}
-		if result.Valid() {
-			results = append(results, result)
-		}
+		results = append(results, result)
 	}
 	return results
 }
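
The last hunk drops the `result.Valid()` filter in `ParseResult`. `Valid()` itself is not shown in the diff; if it required a non-empty title or link, every hit would have been discarded once `linkField` and `titleField` became optional. The sketch below parses `hits.hits[]._source` from a standard Elasticsearch search response and, like the updated code, keeps every hit, leaving `Title` and `Link` empty when no field is configured. `searchResult` and `parseHits` are hypothetical stand-ins for the plugin's own types, and the `Title` field is assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResult is a hypothetical stand-in for the plugin's result type.
type searchResult struct {
	Title   string
	Link    string
	Content string
}

// parseHits reads hits.hits[]._source and keeps every hit. An empty field
// name (reference fields not configured) yields an empty string.
func parseHits(response []byte, titleField, linkField, contentField string) ([]searchResult, error) {
	var resp struct {
		Hits struct {
			Hits []struct {
				Source map[string]any `json:"_source"`
			} `json:"hits"`
		} `json:"hits"`
	}
	if err := json.Unmarshal(response, &resp); err != nil {
		return nil, err
	}
	// str returns the string value of field, or "" if unset or not a string.
	str := func(src map[string]any, field string) string {
		if field == "" {
			return ""
		}
		s, _ := src[field].(string)
		return s
	}
	results := make([]searchResult, 0, len(resp.Hits.Hits))
	for _, hit := range resp.Hits.Hits {
		results = append(results, searchResult{
			Title:   str(hit.Source, titleField),
			Link:    str(hit.Source, linkField),
			Content: str(hit.Source, contentField),
		})
	}
	return results, nil
}

func main() {
	resp := []byte(`{"hits":{"hits":[{"_source":{"content":"Higress is a cloud-native API gateway."}}]}}`)
	results, err := parseHits(resp, "", "", "content")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d result(s), link=%q\n", len(results), results[0].Link)
}
```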

plugins/wasm-go/extensions/ai-search/main.go

Lines changed: 1 addition & 1 deletion
@@ -185,7 +185,7 @@ func parseConfig(json gjson.Result, config *Config, log wrapper.Log) error {
 			arxivExists = true
 			onlyQuark = false
 		case "elasticsearch":
-			searchEngine, err := elasticsearch.NewElasticsearchSearch(&e)
+			searchEngine, err := elasticsearch.NewElasticsearchSearch(&e, config.needReference)
 			if err != nil {
 				return fmt.Errorf("elasticsearch search engine init failed:%s", err)
 			}
