@Fiona-Waters Fiona-Waters commented Sep 22, 2025

While working on creating a runtime image that incorporates the Kubeflow Training and Training Hub related dependencies, I came across the following error when using the image to run osft_llama_example.py:

[rank0]: ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
[rank0]: │ /opt/app-root/lib64/python3.12/site-packages/mini_trainer/train.py:691 in    │
[rank0]: │ main                                                                         │
[rank0]: │                                                                              │
[rank0]: │   688 │   # If Orthogonal Subspace Learning is enabled, loads a model with d │
[rank0]: │   689 │   # Convert user-facing osft_unfreeze_rank_ratio to internal osft_ra │
[rank0]: │   690 │   osft_rank_ratio = None if osft_unfreeze_rank_ratio is None else (1 │
[rank0]: │ ❱ 691 │   model = setup_model(                                               │
[rank0]: │   692 │   │   model_name_or_path=model_name_or_path,                         │
[rank0]: │   693 │   │   save_dtype=save_dtype,                                         │
[rank0]: │   694 │   │   use_liger_kernels=use_liger_kernels,                           │
[rank0]: │                                                                              │
[rank0]: │ /opt/app-root/lib64/python3.12/site-packages/mini_trainer/setup_model_for_tr │
[rank0]: │ aining.py:197 in setup_model                                                 │
[rank0]: │                                                                              │
[rank0]: │   194 │   │   │   liger_fixed_fused_linear_cross_entropy_none_reduction,     │
[rank0]: │   195 │   │   )                                                              │
[rank0]: │   196 │   │                                                                  │
[rank0]: │ ❱ 197 │   │   patch_target_module(                                           │
[rank0]: │   198 │   │   │   "liger_kernel.transformers.model.loss_utils.fixed_fused_li │
[rank0]: │   199 │   │   │   liger_fixed_fused_linear_cross_entropy_none_reduction,     │
[rank0]: │   200 │   │   )                                                              │
[rank0]: │                                                                              │
[rank0]: │ /opt/app-root/lib64/python3.12/site-packages/mini_trainer/utils.py:61 in     │
[rank0]: │ patch_target_module                                                          │
[rank0]: │                                                                              │
[rank0]: │    58 │                                                                      │
[rank0]: │    59 │   to_patch, obj_name_to_patch = to_patch[:-1], to_patch[-1]          │
[rank0]: │    60 │   to_patch = ".".join(to_patch)                                      │
[rank0]: │ ❱  61 │   source = importlib.import_module(to_patch)                         │
[rank0]: │    62 │   setattr(source, obj_name_to_patch, replace_with)                   │
[rank0]: │    63                                                                        │
[rank0]: │    64                                                                        │
[rank0]: │                                                                              │
[rank0]: │ /usr/lib64/python3.12/importlib/__init__.py:90 in import_module              │
[rank0]: │                                                                              │
[rank0]: │    87 │   │   │   if character != '.':                                       │
[rank0]: │    88 │   │   │   │   break                                                  │
[rank0]: │    89 │   │   │   level += 1                                                 │
[rank0]: │ ❱  90 │   return _bootstrap._gcd_import(name[level:], package, level)        │
[rank0]: │    91                                                                        │
[rank0]: │    92                                                                        │
[rank0]: │    93 _RELOADING = {}                                                        │
[rank0]: │ in _gcd_import:1387                                                          │
[rank0]: │ in _find_and_load:1360                                                       │
[rank0]: │ in _find_and_load_unlocked:1310                                              │
[rank0]: │ in _call_with_frames_removed:488                                             │
[rank0]: │ in _gcd_import:1387                                                          │
[rank0]: │ in _find_and_load:1360                                                       │
[rank0]: │ in _find_and_load_unlocked:1310                                              │
[rank0]: │ in _call_with_frames_removed:488                                             │
[rank0]: │ in _gcd_import:1387                                                          │
[rank0]: │ in _find_and_load:1360                                                       │
[rank0]: │ in _find_and_load_unlocked:1331                                              │
[rank0]: │ in _load_unlocked:935                                                        │
[rank0]: │ in exec_module:999                                                           │
[rank0]: │ in _call_with_frames_removed:488                                             │
[rank0]: │                                                                              │
[rank0]: │ /opt/app-root/lib64/python3.12/site-packages/liger_kernel/transformers/__ini │
[rank0]: │ t__.py:1 in <module>                                                         │
[rank0]: │                                                                              │
[rank0]: │ ❱  1 from liger_kernel.transformers.auto_model import AutoLigerKernelForCaus │
[rank0]: │    2 from liger_kernel.transformers.cross_entropy import LigerCrossEntropyLo │
[rank0]: │    3 from liger_kernel.transformers.fused_linear_cross_entropy import LigerF │
[rank0]: │    4 from liger_kernel.transformers.fused_linear_jsd import LigerFusedLinear │
[rank0]: │                                                                              │
[rank0]: │ /opt/app-root/lib64/python3.12/site-packages/liger_kernel/transformers/auto_ │
[rank0]: │ model.py:6 in <module>                                                       │
[rank0]: │                                                                              │
[rank0]: │    3 from transformers import AutoConfig                                     │
[rank0]: │    4 from transformers import AutoModelForCausalLM                           │
[rank0]: │    5                                                                         │
[rank0]: │ ❱  6 from liger_kernel.transformers.monkey_patch import MODEL_TYPE_TO_APPLY_ │
[rank0]: │    7 from liger_kernel.transformers.monkey_patch import _apply_liger_kernel  │
[rank0]: │    8                                                                         │
[rank0]: │    9                                                                         │
[rank0]: │                                                                              │
[rank0]: │ /opt/app-root/lib64/python3.12/site-packages/liger_kernel/transformers/monke │
[rank0]: │ y_patch.py:16 in <module>                                                    │
[rank0]: │                                                                              │
[rank0]: │    13 from liger_kernel.transformers.functional import liger_cross_entropy   │
[rank0]: │    14 from liger_kernel.transformers.geglu import LigerGEGLUMLP              │
[rank0]: │    15 from liger_kernel.transformers.layer_norm import LigerLayerNorm        │
[rank0]: │ ❱  16 from liger_kernel.transformers.model.gemma import lce_forward as gemma │
[rank0]: │    17 from liger_kernel.transformers.model.gemma import lce_forward_deprecat │
[rank0]: │    18 from liger_kernel.transformers.model.gemma2 import lce_forward as gemm │
[rank0]: │    19 from liger_kernel.transformers.model.gemma2 import lce_forward_depreca │
[rank0]: │                                                                              │
[rank0]: │ /opt/app-root/lib64/python3.12/site-packages/liger_kernel/transformers/model │
[rank0]: │ /gemma.py:11 in <module>                                                     │
[rank0]: │                                                                              │
[rank0]: │     8 from torch.nn import CrossEntropyLoss                                  │
[rank0]: │     9 from transformers.cache_utils import Cache                             │
[rank0]: │    10 from transformers.modeling_outputs import CausalLMOutputWithPast       │
[rank0]: │ ❱  11 from transformers.models.gemma.modeling_gemma import _CONFIG_FOR_DOC   │
[rank0]: │    12 from transformers.models.gemma.modeling_gemma import GEMMA_INPUTS_DOCS │
[rank0]: │    13 from transformers.utils import add_start_docstrings_to_model_forward   │
[rank0]: │    14 from transformers.utils import replace_return_docstrings               │
[rank0]: ╰──────────────────────────────────────────────────────────────────────────────╯
[rank0]: ImportError: cannot import name '_CONFIG_FOR_DOC' from 
[rank0]: 'transformers.models.gemma.modeling_gemma' 
[rank0]: (/opt/app-root/lib64/python3.12/site-packages/transformers/models/gemma/modeling
[rank0]: _gemma.py)
[rank0]:[W919 14:11:25.274921520 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
E0919 14:11:26.582000 502 torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 567) of binary: /opt/app-root/bin/python3.12
Traceback (most recent call last):
  File "/opt/app-root/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/distributed/run.py", line 901, in main
    run(args)
  File "/opt/app-root/lib64/python3.12/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/opt/app-root/lib64/python3.12/site-packages/torch/distributed/launcher/api.py", line 143, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/distributed/launcher/api.py", line 277, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/opt/app-root/lib64/python3.12/site-packages/mini_trainer/train.py FAILED
------------------------------------------------------------

Updating the liger-kernel dependency to 0.5.10 fixes this issue, as per huggingface/trl#3480. This is an InstructLab dependency, so I am not sure where it would be best to fix it.

Summary by CodeRabbit

  • Chores
    • Added an optional CUDA-related dependency to enable GPU-accelerated functionality when selected.
    • No functional behavior changes for users who do not opt into CUDA.
    • Users opting into GPU support may need to update their environments; installation size/time may increase.
    • Minor packaging/formatting tweak to dependency declarations (no impact on functionality).


coderabbitai bot commented Sep 22, 2025

Walkthrough

Added an optional CUDA dependency liger-kernel>=0.5.10 under [project.optional-dependencies].cuda in pyproject.toml.

Changes

Cohort / File(s) Summary
Project metadata
pyproject.toml
Added optional CUDA dependency liger-kernel>=0.5.10 under [project.optional-dependencies].cuda.
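In pyproject.toml terms, the change described above would look roughly like this (a sketch; the comment and any surrounding entries are illustrative, not taken from the repository):

```toml
[project.optional-dependencies]
cuda = [
    # Fixes the Gemma _CONFIG_FOR_DOC ImportError; see huggingface/trl#3480
    "liger-kernel>=0.5.10",
]
```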

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble lines of toml tonight,
A kernel tiptoes into view,
Optional CUDA, tucked in tight,
Quiet hops of version new.
—🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Pinning liger-kernal version" concisely and accurately reflects the primary change (pinning the liger-kernel dependency) and is relevant to the changeset, but it contains a spelling error ("kernal" instead of "kernel").
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
pyproject.toml (3)

49-49: Tighten the “pin”: use a bounded range (or exact pin) and annotate the reason.

If the intent is to lock the working fix, prefer >=0.5.10,<0.5.11 (or ==0.5.10) to avoid future breakage; also add a brief comment referencing the upstream issue.

Apply one of the following:

Option A (bounded minor):

-    "liger-kernel>=0.5.10"
+    "liger-kernel>=0.5.10,<0.5.11",  # Fix Gemma _CONFIG_FOR_DOC import error; see HF TRL issue 3480

Option B (exact pin for apps):

-    "liger-kernel>=0.5.10"
+    "liger-kernel==0.5.10",  # Fix Gemma _CONFIG_FOR_DOC import error; see HF TRL issue 3480

39-47: Duplicate dependency: filelock listed twice with different minima.

You have both filelock>=3.0 (Line 39) and filelock>=3.19.1 (Line 46). Keep one, preferably the stricter one.

Apply this diff:

-    "filelock>=3.0",
@@
-    "filelock>=3.19.1",
+    "filelock>=3.19.1",

48-50: Confirm whether liger-kernel must be core or should live under a CUDA/kernels extra.

If Training Hub flows exist that don’t require liger-kernel, consider moving it to [project.optional-dependencies].cuda (or a new [project.optional-dependencies].kernels) to reduce base install weight. Otherwise, keeping it core is fine.

I can draft the extras rewire and minimal import smoke test for CI to validate Gemma import. Want me to push that?
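If liger-kernel does move to an extra, code that uses it would typically be guarded at runtime so the base install keeps working. A hypothetical standard-library sketch (the helper name and minimum version are illustrative):

```python
from importlib.metadata import PackageNotFoundError, version


def liger_available(minimum: str = "0.5.10") -> bool:
    """Return True if liger-kernel is installed at or above `minimum`."""
    try:
        installed = version("liger-kernel")
    except PackageNotFoundError:
        # Base install without the cuda extra: fall back to plain kernels.
        return False
    # Naive numeric comparison for the sketch; a real project should use
    # packaging.version for correct PEP 440 ordering.
    as_tuple = lambda v: tuple(int(p) for p in v.split(".")[:3] if p.isdigit())
    return as_tuple(installed) >= as_tuple(minimum)
```

A guard like this is what makes the extras split safe: CPU-only users never import liger_kernel, so the transformers-compatibility problem above cannot reach them.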

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8164824 and 7086176.

📒 Files selected for processing (1)
  • pyproject.toml (1 hunks)

Contributor

@Maxusmusti Maxusmusti left a comment


Could you move this to the cuda dependencies, rather than the base dependencies? This package should be installed when installing via pip install training-hub[cuda].

@Fiona-Waters
Contributor Author

Could you move this to the cuda dependencies, rather than the base dependencies? This package should be installed when installing via pip install training-hub[cuda].

Updated. Thanks!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pyproject.toml (2)

39-47: Duplicate dependency: filelock listed twice with conflicting bounds.

Keep the stricter one once to avoid resolver ambiguity.

Apply this diff:

@@
-    "filelock>=3.0",
@@
-    "filelock>=3.19.1",
+    "filelock>=3.19.1",

45-45: Replace 'attr' with 'attrs' in pyproject.toml

PyPI shows 'attr' is a different project (v0.3.2); the intended runtime library is 'attrs' (latest v25.3.0). Update pyproject.toml (line 45): replace
"attr>=0.3.2"
with
"attrs>=23.2.0"

🧹 Nitpick comments (1)
pyproject.toml (1)

14-16: Remove build tools from runtime dependencies (setuptools, wheel).

They’re already in [build-system] and shouldn’t be required at runtime.

Apply this diff:

@@
-    "setuptools>=80.0",
@@
-    "wheel>=0.43",
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 870674a and d0815f2.

📒 Files selected for processing (1)
  • pyproject.toml (1 hunks)
🔇 Additional comments (3)
pyproject.toml (3)

48-49: Do not ship pytest as a runtime dependency; move to dev extra.

Keep pytest out of end-user installs; include it under [project.optional-dependencies].dev.

Apply this diff:

@@
-    "pytest>=8.0"
+    "pytest>=8.0",
@@
 dev = [
     "ipykernel",
-    "ipython"
+    "ipython",
+    "pytest>=8.0"
 ]

And remove pytest from [project].dependencies:

@@
-    "pytest>=8.0"

32-37: Constraints are satisfiable — PyPI publishes 2025+ releases

fsspec latest: 2025.9.0; regex latest: 2025.9.18 — the >=2025.0 floors are satisfiable.


61-62: LGTM: CUDA extra now pins liger-kernel to a fixed-good range.

Sandbox couldn't fetch PyPI metadata (SSL certificate verification failed), so verification couldn't be completed here — confirm instructlab/instructlab-training doesn't force a conflicting liger-kernel version and that both CPU- and CUDA-only Training Hub images resolve correctly.

Contributor

@Maxusmusti Maxusmusti left a comment


LGTM, thanks!

@Maxusmusti Maxusmusti merged commit fc2175d into Red-Hat-AI-Innovation-Team:main Sep 23, 2025
1 check passed