feat: ai-token-ratelimit support setting global rate limit thresholds for routes #2667

hanxiantao · 2025-07-27T02:00:09Z

Ⅰ. Describe what this PR did

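1) The AI token rate-limit plugin supports setting a rate-limit threshold for an entire route
2) The cluster-key-rate-limit and ai-token-ratelimit plugins log a warning when multiple rule items of the same limit type are configured
3) Unify the base logic of the cluster-key-rate-limit and ai-token-ratelimit plugins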

Ⅱ. Does this pull request fix one issue?

fixes #2659
fixes #2592

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

cluster-key-rate-limit: warning log for identical rule_item entries:

1) The same type + key produces a warn log

apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: cluster-key-rate-limit-1.0.0
  namespace: higress-system
spec:
  defaultConfigDisable: true
  failStrategy: FAIL_OPEN
  imagePullPolicy: UNSPECIFIED_POLICY
  imagePullSecret: aliyun
  matchRules:
    - config:
        redis:
          service_name: redis.default.svc.cluster.local
          service_port: 6379
        rule_items:
          - limit_by_param: apikey
            limit_keys:
              - key: OPO_allan.zhou_proj_contract-analysis_access
                query_per_minute: 2000
          - limit_by_param: apikey
            limit_keys:
              - key: OPO_allan.zhou_proj_contract-analysis_access
                query_per_minute: 20
        rule_name: routeA-consumer-limit-rule
        show_limit_quota_header: true
      configDisable: false
      ingress:
        - foo
  phase: UNSPECIFIED_PHASE
  priority: 20
  url: >-
    oci://registry.cn-hangzhou.aliyuncs.com/wasm-plugin/wasm-plugin:cluster-key-rate-limit-072603

2) The same type with a different key produces no warn log

apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: cluster-key-rate-limit-1.0.0
  namespace: higress-system
spec:
  defaultConfigDisable: true
  failStrategy: FAIL_OPEN
  imagePullPolicy: UNSPECIFIED_POLICY
  imagePullSecret: aliyun
  matchRules:
    - config:
        redis:
          service_name: redis.default.svc.cluster.local
          service_port: 6379
        rule_items:
          - limit_by_param: apikey
            limit_keys:
              - key: OPO_allan.zhou_proj_contract-analysis_access
                query_per_minute: 2000
          - limit_by_param: apikey
            limit_keys:
              - key: OPO_allan.zhou_proj_contract-analysis_access
                query_per_minute: 20
        rule_name: routeA-consumer-limit-rule
        show_limit_quota_header: true
      configDisable: false
      ingress:
        - foo
  phase: UNSPECIFIED_PHASE
  priority: 20
  url: >-
    oci://registry.cn-hangzhou.aliyuncs.com/wasm-plugin/wasm-plugin:cluster-key-rate-limit-072603

ai-token-ratelimit plugin: setting a rate-limit value for the entire route:

1) Regression test of existing functionality

apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-token-ratelimit-1.0.0
  namespace: higress-system
spec:
  defaultConfigDisable: true
  failStrategy: FAIL_OPEN
  imagePullPolicy: UNSPECIFIED_POLICY
  imagePullSecret: aliyun
  matchRules:
    - config:
        redis:
          service_name: redis.default.svc.cluster.local
          service_port: 6379
        rule_items:
          - limit_by_per_ip: from-remote-addr
            limit_keys:
              - key: 0.0.0.0/0
                token_per_minute: 100
        rule_name: default_rule
      configDisable: false
      ingress:
        - ai-route-qwen.internal
  phase: UNSPECIFIED_PHASE
  priority: 600
  url: >-
    oci://registry.cn-hangzhou.aliyuncs.com/wasm-plugin/wasm-plugin:ai-token-ratelimit-072701

2) Verification of global rate limiting

apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-token-ratelimit-1.0.0
  namespace: higress-system
spec:
  defaultConfigDisable: true
  failStrategy: FAIL_OPEN
  imagePullPolicy: UNSPECIFIED_POLICY
  imagePullSecret: aliyun
  matchRules:
    - config:
        global_threshold:
          token_per_hour: 1000
        redis:
          service_name: redis.default.svc.cluster.local
          service_port: 6379
        rule_name: default_rule
      configDisable: false
      ingress:
        - ai-route-qwen.internal
  phase: UNSPECIFIED_PHASE
  priority: 600
  url: >-
    oci://registry.cn-hangzhou.aliyuncs.com/wasm-plugin/wasm-plugin:ai-token-ratelimit-072701

Ⅴ. Special notes for reviews

lingma-agents · 2025-07-27T02:00:42Z

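AI token rate-limit plugin: global rate limiting support and configuration improvements

Change overview

New features
- Added the global_threshold configuration item, which sets a global token rate-limit threshold for an entire route rule group. It suits scenarios that need uniform rate limiting across a whole service
- rule_name can be combined with global_threshold to implement rule-level global rate limiting, which makes the plugin more flexible and more widely applicable
- Added duplicate-rule detection in rule_items: when the same limit_by_* type and key combination appears more than once, a warning is logged, which makes configurations easier to maintain
Refactoring
- Split the former single-file configuration parsing logic into a separate config package containing config.go and config_test.go, improving modularity and testability
- Renamed the main configuration struct to AiTokenRateLimitConfig and added a GlobalThreshold struct dedicated to the global rate-limit configuration, giving a clearer structure
Test updates
- Added unit tests for the ParseAiTokenRateLimitConfig function, covering a missing rule_name, a configuration with only global_threshold, and a configuration with rule_items, to verify that configuration parsing is correct
Documentation
- Updated the Chinese and English README files to describe the new global_threshold configuration item and how to use it
- Added a configuration example for "custom rule group global rate limiting" that shows how to use the new feature
- Improved the configuration item table so that it is easier to read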

Changed files

File path	Description
plugins/wasm-go/extensions/ai-token-ratelimit/README.md	Updated the function description to include rule-level global rate limiting; expanded the configuration descriptions with details for `global_threshold` and its sub-fields; added a configuration example for global rate limiting.
plugins/wasm-go/extensions/ai-token-ratelimit/README_EN.md	Updated the function description to include rule-level global rate limiting; expanded configuration descriptions with details for `global_threshold` and its sub-fields; added a configuration example for global rate limiting.
plugins/wasm-go/extensions/ai-token-ratelimit/config/config.go	New configuration parsing file that defines the `AiTokenRateLimitConfig` and `GlobalThreshold` structs, implements parsing of `global_threshold` and `rule_items`, and adds detection of and log warnings for duplicate rules in `rule_items`.
plugins/wasm-go/extensions/ai-token-ratelimit/config/config_test.go	New unit tests for configuration parsing, covering a missing `rule_name`, a configuration with only `global_threshold`, and a configuration with `rule_items`.

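💡 Tips

How to interact with lingma-agents

📜 Reply to the comment directly
Reply to this comment directly and lingma-agents will handle your request automatically. For example:

Add detailed explanatory comments to the current code.
Describe the LRU refactoring approach you mentioned in detail, with pseudocode.

📜 Comment on a code line
Create a comment at a specific location in a file and mention @lingma-agents. For example:

@lingma-agents Analyze the performance bottlenecks of this method and suggest optimizations.
@lingma-agents Generate optimized code for this method.

📜 Ask in a discussion
Mention @lingma-agents in any discussion to get help. For example:

@lingma-agents Summarize the discussion above and propose a solution.
@lingma-agents Generate optimized code based on the discussion.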

lingma-agents

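🔎 Code review report

🎯 Review summary

Severity	Count	Description
🔴 Blocker	0	Blocking issue that must be fixed immediately, e.g. a system crash, unavailable key functionality, or a serious security vulnerability.
🟠 Critical	3	Serious issue to fix with high priority, e.g. broken core functionality or a performance bottleneck that affects user experience.
🟡 Major	2	Significant issue that should be fixed, e.g. a defect in non-core functionality or poor code maintainability.
🟢 Minor	0	Minor issue to improve at your discretion, e.g. inconsistent code formatting or missing comments.

Total: 5 issues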

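📋 Review details

💡 Code implementation suggestions

The following are file-level code suggestions, focused on readability, maintainability, and potential problems.

📖 plugins/wasm-go/extensions/ai-token-ratelimit/README.md (1 💬)

The field names in the configuration example do not match the actual implementation, which may lead users to write incorrect configurations. (L83)

📖 plugins/wasm-go/extensions/ai-token-ratelimit/README_EN.md (1 💬)

The field names in the configuration example of the English documentation do not match the actual implementation, which may lead users to write incorrect configurations. (L89)

🔹 plugins/wasm-go/extensions/ai-token-ratelimit/config/config.go (3 💬)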

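🚀 Architecture and design suggestions

The following is an overall analysis of the code architecture and design, focused on cross-file interactions, system consistency, and room for optimization.

🔍1. The mutual exclusion between global and rule-based rate limiting may lead to configuration errors

In the current implementation, global_threshold and rule_items are mutually exclusive: only one of the two may be set. This simplifies the rate-limiting logic, but in practice users may expect to apply global and rule-specific rate limiting together (for example, a global limit first, then fine-grained limits). Consider stating this restriction explicitly in the documentation and supporting a combination of the two modes in the future.

📌 Key code

plugins/wasm-go/extensions/ai-token-ratelimit/config/config.go (180-186)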

if !hasGlobal && !hasRule {
	return errors.New("at least one of 'global_threshold' or 'rule_items' must be set")
} else if hasGlobal && hasRule {
	return errors.New("'global_threshold' and 'rule_items' cannot be set at the same time")
}

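⚠️ Potential risk

Users who are unaware of the mutual exclusion may write an invalid configuration, so the rate-limiting policy does not take effect as expected, which affects system stability or user experience.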
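To make the either-or rule concrete, here is a minimal standard-library sketch of the same check applied to a raw JSON config. The function name `validateExclusive` and the use of `encoding/json` are assumptions for illustration only; the plugin itself reads its configuration through gjson, and only the two error messages are taken from the quoted code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// validateExclusive is a hypothetical helper: it accepts a config only when
// exactly one of global_threshold or rule_items is present.
func validateExclusive(raw []byte) error {
	var cfg map[string]json.RawMessage
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("invalid config: %w", err)
	}
	_, hasGlobal := cfg["global_threshold"]
	_, hasRule := cfg["rule_items"]
	switch {
	case !hasGlobal && !hasRule:
		return errors.New("at least one of 'global_threshold' or 'rule_items' must be set")
	case hasGlobal && hasRule:
		return errors.New("'global_threshold' and 'rule_items' cannot be set at the same time")
	}
	return nil
}

func main() {
	// Setting both keys is rejected, mirroring the quoted check.
	both := []byte(`{"rule_name":"default_rule","global_threshold":{"token_per_hour":1000},"rule_items":[]}`)
	fmt.Println(validateExclusive(both))
}
```

With a check of this shape, a configuration such as the global rate-limiting example in the PR description (only `global_threshold` set) passes, while one that also lists `rule_items` is rejected at parse time rather than silently ignoring one of the two.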

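🔍2. The global rate-limit threshold is not validated as positive, so invalid configurations are possible

When global_threshold is parsed, the code checks that a time-window field exists and is greater than 0, but it does not validate that the Count field is positive. If a user configures a non-positive Count, the rate-limiting logic may misbehave or Redis may return errors. Consider adding a positive-value check on Count to guarantee a valid configuration.

📌 Key code

plugins/wasm-go/extensions/ai-token-ratelimit/config/config.go (230-240)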

func parseGlobalThreshold(item gjson.Result) (*GlobalThreshold, error) {
	for timeWindowKey, duration := range timeWindows {
		q := item.Get(timeWindowKey)
		if q.Exists() && q.Int() > 0 {
			return &GlobalThreshold{
				Count:      q.Int(),
				TimeWindow: duration,
			}, nil
		}
	}
	return nil, errors.New("one of 'token_per_second', 'token_per_minute', 'token_per_hour', or 'token_per_day' must be set for global_threshold")
}

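⚠️ Potential risk

An invalid global rate-limit threshold may disable rate limiting or cause Redis operations to fail, which affects system stability and reliability.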
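The quoted `parseGlobalThreshold` ranges over `timeWindows`. If that is a Go map (an assumption, since its definition is not shown here), iteration order is unspecified, so a config that sets two windows could resolve to either one. Below is a minimal sketch of an order-stable variant. The `window` and `threshold` types, the `orderedWindows` slice, and the window lengths in seconds are hypothetical stand-ins, and the input is a plain map rather than a `gjson.Result` to keep the example standard-library only.

```go
package main

import "fmt"

// window pairs a config key with an assumed window length in seconds.
type window struct {
	key     string
	seconds int64
}

// orderedWindows is scanned in a fixed order, shortest window first,
// so the result does not depend on map iteration order.
var orderedWindows = []window{
	{"token_per_second", 1},
	{"token_per_minute", 60},
	{"token_per_hour", 3600},
	{"token_per_day", 86400},
}

// threshold is a simplified stand-in for the plugin's GlobalThreshold.
type threshold struct {
	Count      int64
	TimeWindow int64
}

// parseThreshold returns the first window whose value is set and positive.
func parseThreshold(item map[string]int64) (*threshold, error) {
	for _, w := range orderedWindows {
		if v, ok := item[w.key]; ok && v > 0 {
			return &threshold{Count: v, TimeWindow: w.seconds}, nil
		}
	}
	return nil, fmt.Errorf("one of 'token_per_second', 'token_per_minute', 'token_per_hour', or 'token_per_day' must be set to a positive value")
}

func main() {
	got, err := parseThreshold(map[string]int64{"token_per_hour": 1000})
	fmt.Println(got, err)
}
```

Note the `v > 0` guard: as in the quoted code, a zero or negative value never reaches Count and instead falls through to the error return.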

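🔍3. The rate-limit key matching logic is flawed, and regular expression matching may fail

When limit_by_per_* rules are processed, the regular expression matching logic has potential defects. If a user configures an invalid regular expression, or the matching logic is wrong, rate-limit keys may not be matched correctly, which weakens rate limiting. Consider making regular expression validation and matching more robust.

📌 Key code

plugins/wasm-go/extensions/ai-token-ratelimit/config/config.go (342-354)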

if itemKey == "*" {
	itemType = AllType
} else if strings.HasPrefix(itemKey, "regexp:") {
	regexpStr := itemKey[len("regexp:"):]
	var err error
	regexp, err = re.Compile(regexpStr)
	if err != nil {
		return fmt.Errorf("failed to compile regex for key '%s': %w", itemKey, err)
	}
	itemType = RegexpType
} else {
	return fmt.Errorf("the '%s' restriction must start with 'regexp:' or be exactly '*'", rule.LimitType)
}

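⚠️ Potential risk

A failed regular expression match may leave rate-limit keys unrecognized, letting some requests bypass rate limiting, which affects fairness and system stability.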
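The key handling quoted above can be exercised in isolation. The following is a minimal sketch that uses the standard `regexp` package in place of the `re` alias from the quoted code (which package that alias refers to is not shown here). The `keyKind` type and the `classifyKey` function are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keyKind is a simplified stand-in for the plugin's item type.
type keyKind int

const (
	kindAll keyKind = iota
	kindRegexp
)

// classifyKey accepts "*" or "regexp:<pattern>" and rejects anything else.
// An invalid pattern is reported at parse time instead of at match time.
func classifyKey(itemKey string) (keyKind, *regexp.Regexp, error) {
	if itemKey == "*" {
		return kindAll, nil, nil
	}
	if strings.HasPrefix(itemKey, "regexp:") {
		pattern := strings.TrimPrefix(itemKey, "regexp:")
		re, err := regexp.Compile(pattern)
		if err != nil {
			return 0, nil, fmt.Errorf("failed to compile regex for key %q: %w", itemKey, err)
		}
		return kindRegexp, re, nil
	}
	return 0, nil, fmt.Errorf("key %q must start with 'regexp:' or be exactly '*'", itemKey)
}

func main() {
	_, _, err := classifyKey("regexp:(")
	fmt.Println(err)
}
```

Because the pattern is compiled while the configuration is parsed, a malformed expression surfaces as a configuration error rather than as requests that silently fail to match.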

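🔍4. Threshold parsing for rule items does not validate that the threshold is positive, so invalid configurations are possible

When the threshold of a rule item is parsed, the code checks that a time-window field exists and is greater than 0, but it does not validate that the Count field is positive. If a user configures a non-positive Count, the rate-limiting logic may misbehave or Redis may return errors. Consider adding a positive-value check on Count to guarantee a valid configuration.

📌 Key code

plugins/wasm-go/extensions/ai-token-ratelimit/config/config.go (370-383)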

func createConfigItemFromRate(item gjson.Result, itemType LimitConfigItemType, key string, ipNet *iptree.IPTree, regexp *re.Regexp) (*LimitConfigItem, error) {
	for timeWindowKey, duration := range timeWindows {
		q := item.Get(timeWindowKey)
		if q.Exists() && q.Int() > 0 {
			return &LimitConfigItem{
				ConfigType: itemType,
				Key:        key,
				IpNet:      ipNet,
				Regexp:     regexp,
				Count:      q.Int(),
				TimeWindow: duration,
			}, nil
		}
	}
	return nil, errors.New("one of 'token_per_second', 'token_per_minute', 'token_per_hour', or 'token_per_day' must be set for key: " + key)
}

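⚠️ Potential risk

An invalid rate-limit threshold may disable rate limiting or cause Redis operations to fail, which affects system stability and reliability.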

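🔍5. Duplicate rate-limit rules are not strictly validated, which may cause configuration conflicts

The current implementation records each LimitType and Key combination it has seen and logs a warning when a duplicate is found, but it does not stop the configuration from loading. If a user configures several identical rate-limit rules, rate-limiting behavior may be inconsistent or hard to debug. Consider stricter validation of duplicate rules to avoid configuration conflicts.

📌 Key code

plugins/wasm-go/extensions/ai-token-ratelimit/config/config.go (214-222)

		// Build a unique identifier from LimitType and Key
		ruleKey := string(ruleItem.LimitType) + ":" + ruleItem.Key

		// Check for a duplicate LimitType and Key combination
		if seenLimitRules[ruleKey] {
			log.Warnf("duplicate rule found: %s='%s' in rule_items", ruleItem.LimitType, ruleItem.Key)
		} else {
			seenLimitRules[ruleKey] = true
		}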

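⚠️ Potential risk

Duplicate rate-limit rules may cause inconsistent rate-limiting behavior and make debugging harder, which affects maintainability and system stability.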
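The duplicate check quoted above can be tried out on its own. The sketch below collects the duplicated identifiers instead of logging them, so that a caller could choose between warning (as the PR does) and rejecting the configuration (as the review suggests). The `ruleItem` type and the `findDuplicates` function are hypothetical names for this example; only the `LimitType + ":" + Key` identifier format comes from the quoted code.

```go
package main

import "fmt"

// ruleItem is a simplified stand-in for one entry of rule_items.
type ruleItem struct {
	LimitType string // e.g. "limit_by_param"
	Key       string // e.g. "apikey"
}

// findDuplicates returns the "LimitType:Key" identifier of every rule item
// whose combination was already seen earlier in the list.
func findDuplicates(items []ruleItem) []string {
	seen := make(map[string]bool)
	var dups []string
	for _, it := range items {
		id := it.LimitType + ":" + it.Key
		if seen[id] {
			dups = append(dups, id)
			continue
		}
		seen[id] = true
	}
	return dups
}

func main() {
	// Two items with the same type and key, as in the first verification config.
	items := []ruleItem{
		{"limit_by_param", "apikey"},
		{"limit_by_param", "apikey"},
	}
	fmt.Println(findDuplicates(items))
}
```

Returning the list keeps the policy decision with the caller: logging each entry reproduces the warn-only behavior, while treating a non-empty result as an error would implement the stricter validation.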

Review details

📒 File list (13 files)

✅ Added: 3 files
❌ Deleted: 2 files
📝 Modified: 8 files

✅ Added files:

plugins/wasm-go/extensions/ai-token-ratelimit/config/config.go
plugins/wasm-go/extensions/ai-token-ratelimit/config/config_test.go
plugins/wasm-go/extensions/ai-token-ratelimit/util/utils.go

❌ Deleted files:

plugins/wasm-go/extensions/ai-token-ratelimit/config.go
plugins/wasm-go/extensions/ai-token-ratelimit/utils.go

📝 Modified files:

plugins/wasm-go/extensions/ai-token-ratelimit/README.md
plugins/wasm-go/extensions/ai-token-ratelimit/README_EN.md
plugins/wasm-go/extensions/ai-token-ratelimit/go.mod
plugins/wasm-go/extensions/ai-token-ratelimit/go.sum
plugins/wasm-go/extensions/ai-token-ratelimit/main.go
plugins/wasm-go/extensions/cluster-key-rate-limit/config/config.go
plugins/wasm-go/extensions/cluster-key-rate-limit/go.sum
plugins/wasm-go/extensions/cluster-key-rate-limit/main.go



codecov-commenter · 2025-07-27T05:59:20Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.02%. Comparing base (ef31e09) to head (15a1633).
⚠️ Report is 624 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #2667       +/-   ##
===========================================
+ Coverage   35.91%   46.02%   +10.11%     
===========================================
  Files          69       81       +12     
  Lines       11576    13020     +1444     
===========================================
+ Hits         4157     5993     +1836     
+ Misses       7104     6681      -423     
- Partials      315      346       +31

see 78 files with indirect coverage changes


hanxiantao added 4 commits July 26, 2025 14:57

cluster-key-rate-limit: log a warning when a duplicate LimitType+key is configured

78faa7b

ai-token-ratelimit: restructure the directory layout

c10bc2f

ai-token-ratelimit: support setting a rate-limit threshold for the entire route

c8376d5

update README

9af778b

hanxiantao requested review from CH3CHO, erasernoob, johnlanni and rinfx as code owners July 27, 2025 02:00

lingma-agents bot reviewed Jul 27, 2025

View reviewed changes

hanxiantao added 3 commits July 27, 2025 10:03

update README

cffa3c7

Fix issues raised in the lingma-agents review

54323ee

Update README

15a1633

hanxiantao mentioned this pull request Jul 27, 2025

doc: Sync ai-token-ratelimit docs higress-group/higress-group.github.io#453

Merged

erasernoob approved these changes Jul 27, 2025

View reviewed changes

hanxiantao merged commit 6a1557f into alibaba:main Jul 28, 2025
12 checks passed

hanxiantao deleted the ai-token-ratelimit-full-route-threshold branch July 28, 2025 00:14

ink-hz pushed a commit to ink-hz/higress-ai-capability-auth that referenced this pull request Nov 5, 2025

feat: ai-token-ratelimit support setting global rate limit thresholds…

57e91fe

… for routes (alibaba#2667)

hanxiantao mentioned this pull request Nov 26, 2025

官网文档错误，token限流没有show_limit_quota_header配置项 || Official website document error, token current limit does not have show_limit_quota_header configuration item #3165

Open
