Fix ESM2 + Geneformer regression #863

Open · wants to merge 3 commits into main

Conversation

dorotat-nv (Collaborator)

Description

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

Note

By default, the notebook validation tests are skipped unless they are explicitly enabled.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Usage

TODO: Add code snippet
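Since the snippet is still a TODO, the following is a minimal, illustrative sketch (not part of this PR) of one way to exercise the affected code locally by running the sub-package test suites. The `sub-packages/bionemo-esm2/tests` and `sub-packages/bionemo-geneformer/tests` paths are assumptions based on the repository layout; only the `sub-packages/bionemo-evo2/tests` path appears verbatim in the CI report below.

```python
# Illustrative sketch only -- the PR author has not yet provided a usage snippet.
# Runs the ESM2 and Geneformer test suites through pytest's programmatic entry point.
# The sub-package paths are assumptions; adjust them to the actual repository layout.
import sys

import pytest

if __name__ == "__main__":
    exit_code = pytest.main(
        [
            "sub-packages/bionemo-esm2/tests",        # assumed ESM2 test location
            "sub-packages/bionemo-geneformer/tests",  # assumed Geneformer test location
            "-x",                                     # stop at the first failure
        ]
    )
    sys.exit(exit_code)
```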

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

codecov-commenter commented May 8, 2025

❌ 10 Tests Failed:

Tests completed | Failed | Passed | Skipped
1002            | 10     | 992    | 22
View the top 3 failed test(s) by shortest run time
sub-packages/bionemo-evo2/tests/bionemo/evo2/test_hyena_operators.py::TestParallelHyenaOperator::test_initialization
Stack Traces | 0.063s run time
self = <test_hyena_operators.TestParallelHyenaOperator object at 0x7f12b3f08440>
transformer_config = TransformerConfig(tensor_model_parallel_size=1, pipeline_model_parallel_comm_backend=None, pipeline_model_parallel_siz...groups=8, mamba_num_heads=None, use_mamba_mem_eff_path=True, mlp_chunks_for_prefill=1, heterogeneous_block_specs=False)
hyena_config = HyenaConfig(tie_projection_weights=False, to_upper='normalized_weighted', lowercase_loss_reweighting=0.1, short_conv_L...onv_mixer=False, hyena_short_conv_pregate=True, hyena_short_conv_postgate=True, proj_groups=1, grouped_attention=False)

    @pytest.fixture
    def operator(self, transformer_config: TransformerConfig, hyena_config: HyenaConfig) -> ParallelHyenaOperator:
        with megatron_parallel_state_utils.distributed_model_parallel_state():
>           yield ParallelHyenaOperator(
                hidden_size=transformer_config.hidden_size,
                transformer_config=transformer_config,
                hyena_config=hyena_config,
                max_sequence_length=1024,
                operator_type="hyena_medium_conv",
                init_method="small_init",
            )

.../bionemo/evo2/test_hyena_operators.py:47: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../local/lib/python3.12.../megatron/hyena/hyena_utils.py:797: in __init__
    self.conv_bias.data = conv_init_method(self.conv_bias.data)
.../local/lib/python3.12.../torch/nn/init.py:166: in uniform_
    return _no_grad_uniform_(tensor, a, b, generator)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

tensor = tensor([-0.0000e+00, -7.8740e-05, -1.5748e-04, -2.3622e-04, -3.1496e-04,
        -3.9370e-04, -4.7244e-04, -5.5118e-04...3, -7.4713e-03, -7.5552e-03, -7.6392e-03,
        -7.7231e-03, -7.8071e-03, -7.8910e-03, -7.9750e-03], device='cuda:0')
a = -0.08838834764831845, b = 0.08838834764831845, generator = None

    def _no_grad_uniform_(tensor, a, b, generator=None):
        with torch.no_grad():
>           return tensor.uniform_(a, b, generator=generator)
E           RuntimeError: Offset increment outside graph capture encountered unexpectedly.

.../local/lib/python3.12.../torch/nn/init.py:17: RuntimeError
sub-packages/bionemo-evo2/tests/bionemo/evo2/test_hyena_operators.py::TestParallelCausalDepthwiseConv1d::test_gpu_forward
Stack Traces | 0.064s run time
self = <test_hyena_operators.TestParallelCausalDepthwiseConv1d object at 0x7f12b3f0a870>
transformer_config = TransformerConfig(tensor_model_parallel_size=1, pipeline_model_parallel_comm_backend=None, pipeline_model_parallel_siz...groups=8, mamba_num_heads=None, use_mamba_mem_eff_path=True, mlp_chunks_for_prefill=1, heterogeneous_block_specs=False)
hyena_config = HyenaConfig(tie_projection_weights=False, to_upper='normalized_weighted', lowercase_loss_reweighting=0.1, short_conv_L...onv_mixer=False, hyena_short_conv_pregate=True, hyena_short_conv_postgate=True, proj_groups=1, grouped_attention=False)

    @pytest.fixture
    def operator(
        self, transformer_config: TransformerConfig, hyena_config: HyenaConfig
    ) -> ParallelCausalDepthwiseConv1d:
        with megatron_parallel_state_utils.distributed_model_parallel_state():
>           yield ParallelCausalDepthwiseConv1d(
                d_model=transformer_config.hidden_size,
                transformer_config=transformer_config,
                hyena_config=hyena_config,
                kernel_size=hyena_config.short_conv_L,
                init_method=transformer_config.init_method,
                bias=hyena_config.conv_proj_bias,
                use_fast_causal_conv=hyena_config.fast_conv_proj,
            )

.../bionemo/evo2/test_hyena_operators.py:167: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../local/lib/python3.12.../megatron/hyena/hyena_utils.py:1070: in __init__
    initialize_affine_weight_gpu(self.short_conv_weight, conv_init_method, partition_dim=0)
.../local/lib/python3.12.../megatron/hyena/hyena_utils.py:688: in initialize_affine_weight_gpu
    init_method(weight.data)  # modify the data in place
.../local/lib/python3.12.../torch/nn/init.py:166: in uniform_
    return _no_grad_uniform_(tensor, a, b, generator)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

tensor = tensor([[-0.0000, -0.0002, -0.0003],
        [-0.0005, -0.0006, -0.0008],
        [-0.0009, -0.0011, -0.0012],
       ...-0.0043, -0.0045, -0.0047],
        [-0.0049, -0.0051, -0.0053],
        [-0.0055, -0.0057, -0.0059]], device='cuda:0')
a = -0.5773502691896257, b = 0.5773502691896257, generator = None

    def _no_grad_uniform_(tensor, a, b, generator=None):
        with torch.no_grad():
>           return tensor.uniform_(a, b, generator=generator)
E           RuntimeError: Offset increment outside graph capture encountered unexpectedly.

.../local/lib/python3.12.../torch/nn/init.py:17: RuntimeError
sub-packages/bionemo-evo2/tests/bionemo/evo2/test_hyena_operators.py::TestParallelShortHyenaOperatorWithConvBias::test_gpu_forward
Stack Traces | 0.065s run time
self = <test_hyena_operators.TestParallelShortHyenaOperatorWithConvBias object at 0x7f12b3f08aa0>
transformer_config = TransformerConfig(tensor_model_parallel_size=1, pipeline_model_parallel_comm_backend=None, pipeline_model_parallel_siz...groups=8, mamba_num_heads=None, use_mamba_mem_eff_path=True, mlp_chunks_for_prefill=1, heterogeneous_block_specs=False)
hyena_config = HyenaConfig(tie_projection_weights=False, to_upper='normalized_weighted', lowercase_loss_reweighting=0.1, short_conv_L...onv_mixer=False, hyena_short_conv_pregate=True, hyena_short_conv_postgate=True, proj_groups=1, grouped_attention=False)

    @pytest.fixture
    def operator(self, transformer_config: TransformerConfig, hyena_config: HyenaConfig) -> ParallelShortHyenaOperator:
        with megatron_parallel_state_utils.distributed_model_parallel_state():
>           yield ParallelShortHyenaOperator(
                hidden_size=transformer_config.hidden_size,
                transformer_config=transformer_config,
                hyena_config=hyena_config,
                init_method="small_init",
                short_conv_class=ParallelCausalDepthwiseConv1d,
                use_fast_causal_conv=False,
                local_init=False,
                use_conv_bias=True,
            )

.../bionemo/evo2/test_hyena_operators.py:125: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../local/lib/python3.12.../megatron/hyena/hyena_utils.py:934: in __init__
    self.short_conv = short_conv_class(
.../local/lib/python3.12.../megatron/hyena/hyena_utils.py:1070: in __init__
    initialize_affine_weight_gpu(self.short_conv_weight, conv_init_method, partition_dim=0)
.../local/lib/python3.12.../megatron/hyena/hyena_utils.py:688: in initialize_affine_weight_gpu
    init_method(weight.data)  # modify the data in place
.../local/lib/python3.12.../torch/nn/init.py:166: in uniform_
    return _no_grad_uniform_(tensor, a, b, generator)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

tensor = tensor([[[-0.0000, -0.0004, -0.0009,  ..., -0.0017, -0.0021, -0.0026]],

        [[-0.0030, -0.0034, -0.0038,  ..., -0..., -0.0161, -0.0168]],

        [[-0.0176, -0.0183, -0.0190,  ..., -0.0204, -0.0211, -0.0218]]],
       device='cuda:0')
a = -0.5773502691896257, b = 0.5773502691896257, generator = None

    def _no_grad_uniform_(tensor, a, b, generator=None):
        with torch.no_grad():
>           return tensor.uniform_(a, b, generator=generator)
E           RuntimeError: Offset increment outside graph capture encountered unexpectedly.

.../local/lib/python3.12.../torch/nn/init.py:17: RuntimeError

To view more test analytics, go to the Test Analytics Dashboard
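All ten failures raise the same error from an in-place `tensor.uniform_` call during weight initialization: `RuntimeError: Offset increment outside graph capture encountered unexpectedly`. This check lives in PyTorch's CUDA RNG and suggests the generator's graph-capture bookkeeping got out of sync with the current stream, for example when an earlier test left capture state behind. The sketch below is illustrative only and is not the fix in this PR; it shows one defensive pattern, checking `torch.cuda.is_current_stream_capturing()` before GPU-side random initialization, and the helper name is hypothetical.

```python
# Illustrative sketch only -- not the fix applied in this PR.
# The repeated failure is an in-place RNG call (tensor.uniform_) that trips
# PyTorch's CUDA graph-capture consistency check. One defensive pattern is to
# confirm that no stream capture is in progress before GPU-side random init,
# turning a confusing downstream RuntimeError into an explicit message.
import torch


def init_uniform_outside_capture(tensor: torch.Tensor, a: float, b: float) -> torch.Tensor:
    """Hypothetical helper: in-place uniform init that refuses to run during CUDA graph capture."""
    if tensor.is_cuda and torch.cuda.is_current_stream_capturing():
        raise RuntimeError(
            "Random initialization requested during an active CUDA graph capture; "
            "initialize weights before starting (or after ending) the capture."
        )
    with torch.no_grad():
        return tensor.uniform_(a, b)


if __name__ == "__main__":
    if torch.cuda.is_available():
        # Bounds taken from the failing trace above (a = -0.577..., b = 0.577...).
        weight = torch.empty(16, 3, device="cuda")
        init_uniform_outside_capture(weight, -0.5773502691896257, 0.5773502691896257)
```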
