feature(xjy): add multi-task learning pipeline in jericho environment #365

xiongjyu · 2025-05-27T12:22:06Z

No description provided.

puyuan1996 · 2025-05-30T03:43:38Z

lzero/entry/train_unizero_multitask.py

+from ding.config import compile_config
+from ding.envs import create_env_manager, get_vec_env_setting
+from ding.policy import create_policy, Policy
+# from ding.rl_utils import get_epsilon_greedy_fn # get_epsilon_greedy_fn 已被弃用，如果需要需要从 ding.exploration 导入


Please write the comments in English.

puyuan1996 · 2025-05-30T03:46:00Z

lzero/entry/train_unizero_multitask_ddp.py

+
+    return weights
+
+def train_unizero_multitask_ddp(


Please write the comments in English. The utility functions above can be moved into entry/utils.py.

puyuan1996 · 2025-05-30T03:48:15Z

lzero/model/unizero_world_models/moe.py

-    def __init__(self, experts: List[nn.Module], gate: nn.Module, num_experts_per_tok=1):
+class MoELayer(nn.Module):
+    """
+    Mixture-of-Experts (MoE) 层的实现，参考了如下的设计：


Please write the comments in English.
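
For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of top-k expert routing, using the constructor arguments from the removed signature above (`experts`, `gate`, `num_experts_per_tok`). It is not the `MoELayer` in this PR: `ToyMoELayer`, `softmax`, and the toy experts are hypothetical names, and the sketch assumes that each expert returns a vector of the same length as its input and that the gate returns one logit per expert.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoELayer:
    """Illustrative top-k Mixture-of-Experts routing for a single token."""

    def __init__(
        self,
        experts: Sequence[Callable[[Vector], Vector]],
        gate: Callable[[Vector], List[float]],
        num_experts_per_tok: int = 1,
    ) -> None:
        if not 1 <= num_experts_per_tok <= len(experts):
            raise ValueError("num_experts_per_tok must be in [1, len(experts)]")
        self.experts = list(experts)
        self.gate = gate
        self.num_experts_per_tok = num_experts_per_tok

    def __call__(self, token: Vector) -> Vector:
        # The gate scores every expert for this token.
        logits = self.gate(token)
        # Keep only the indices of the k highest-scoring experts.
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        top = ranked[: self.num_experts_per_tok]
        # Renormalise the selected scores so the mixture weights sum to 1.
        weights = softmax([logits[i] for i in top])
        # Assumption: experts preserve the input dimension.
        output = [0.0] * len(token)
        for weight, idx in zip(weights, top):
            expert_out = self.experts[idx](token)
            output = [o + weight * e for o, e in zip(output, expert_out)]
        return output


# Toy experts and a gate with fixed logits, purely for illustration.
def double(x: Vector) -> Vector:
    return [2.0 * v for v in x]


def negate(x: Vector) -> Vector:
    return [-v for v in x]


layer = ToyMoELayer([double, negate], gate=lambda x: [0.0, 5.0], num_experts_per_tok=1)
routed = layer([1.0, 2.0])  # only the second expert (negate) is selected
```

With `num_experts_per_tok=1` the softmax over a single selected logit is 1.0, so the output is exactly the chosen expert's output; larger values blend several experts.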

puyuan1996 · 2025-05-30T03:49:35Z

lzero/model/unizero_world_models/transformer.py

+        # 若使用 Register Token，则将其拼到序列最前面
+        # 训练阶段和推理阶段都统一处理
+        if self.use_register_token:
+            sequences = self.add_register_tokens(sequences, task_id)


Please pull the latest opendilab:dev-multitask-balance-clean branch. Features that are already implemented there, such as rotary_emb, should not be removed.

puyuan1996 · 2025-05-30T07:21:47Z

lzero/model/unizero_world_models/transformer.py

+            # self.feed_forward = MoELayer(moe_cfg)
+            # print("=" * 20)
+            # print(f"Use MoE feed_forward, num_experts={moe_cfg.num_experts_total}")
+            # print("=" * 20)


Please delete the unused commented-out code.

puyuan1996 · 2025-05-30T07:26:29Z

lzero/policy/unizero_multitask.py

@@ -869,10 +858,12 @@ def _monitor_vars_learn(self, num_tasks=2) -> List[str]:
        # self.task_num_for_current_rank 作为当前rank的base_index
        num_tasks = self.task_num_for_current_rank
        # If the number of tasks is provided, extend the monitored variables list with task-specific variables
+        # TODO xiongjyu: 以下代码感觉有问题，如果num_tasks != 1（例如2）, 4个任务的self.task_id分别是0， 1， 2， 3；


Could you post a screenshot of this specific problem in the group chat so we can take a look?
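
The TODO is cut off in the quoted hunk, so the exact concern is not visible here. One plausible reading is that task-specific variable names built by counting from zero on each rank would not match the global task ids (0, 1, 2, 3) once a rank holds more than one task. The sketch below only illustrates that reading. The function names, the `_task{id}` naming format, and the assumption that each rank holds the same number of contiguously assigned tasks are all hypothetical and are not taken from the PR.

```python
from typing import List, Sequence


def local_task_ids(num_tasks_on_rank: int) -> List[int]:
    """Ids obtained by counting from zero on every rank."""
    return list(range(num_tasks_on_rank))


def global_task_ids(rank: int, num_tasks_on_rank: int) -> List[int]:
    """Ids obtained by offsetting with a per-rank base index.

    Assumption: every rank holds the same number of tasks and tasks are
    assigned to ranks in contiguous blocks.
    """
    base_index = rank * num_tasks_on_rank
    return [base_index + i for i in range(num_tasks_on_rank)]


def monitored_names(base_vars: Sequence[str], task_ids: Sequence[int]) -> List[str]:
    """Build task-specific variable names (hypothetical naming format)."""
    return [f"{var}_task{tid}" for var in base_vars for tid in task_ids]


# 4 tasks spread over 2 ranks, 2 tasks per rank.
for rank in range(2):
    print(
        rank,
        monitored_names(["loss"], local_task_ids(2)),
        monitored_names(["loss"], global_task_ids(rank, 2)),
    )
```

Under these assumptions, local counting gives both ranks the same names for different tasks, while the offset version gives rank 1 the names for tasks 2 and 3.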

puyuan1996 · 2025-05-30T07:28:46Z

zoo/jericho/configs/jericho_unizero_multitask_ddp_config.py

+            manual_temperature_decay=False,
+            num_simulations=num_simulations,
+            n_episode=n_episode,
+            train_start_after_envsteps=int(0), # TODO: ===== only for debug =====


Please remove the unused comments.

puyuan1996 · 2025-05-30T07:28:54Z

zoo/jericho/configs/jericho_unizero_multitask_config.py

+            import_names=['zoo.jericho.envs.jericho_env'],
+        ),
+        env_manager=dict(type='base'),
+        # env_manager=dict(type='subprocess'), # subprocess在jericho环境下不支持


Please write the comments in English.

puyuan1996 · 2025-05-30T07:29:27Z

zoo/jericho/configs/jericho_unizero_multitask_config.py

+                    num_heads=24,
+                    obs_type="text",  # TODO: Modify as needed.
+                    env_num=max(collector_env_num, evaluator_env_num),              
+                    task_embed_option=None,   # ==============TODO: none ==============


Please check the comment formatting across all of the changes, and delete any unused TODOs.

puyuan1996 · 2025-05-30T07:30:42Z

zoo/jericho/configs/jericho_unizero_multitask_ddp_config.py

+            max_action_num=max_action_num,
+            tokenizer_path=model_name,
+            max_seq_len=512,
+            game_path=f"./zoo/jericho/envs/z-machine-games-master/jericho-game-suite/{env_id}",


Were the latest results obtained with this setting?

xiongjyu added 2 commits May 4, 2025 16:15

feature(xjy): add multi-task learning pipeline in jericho environment

d002719

Standardized the format and added the ability to use moe in unizero

4adb3dc

puyuan1996 added the enhancement (New feature or request) and config (New or improved configuration) labels May 30, 2025

puyuan1996 requested changes May 30, 2025

xiongjyu added 2 commits June 4, 2025 01:10

Merge remote-tracking branch 'origin/dev-multitask-balance-clean' int…

285cd77

…o pr-353

fixed a bug in calculating dormant

a820eca
