TypeError: Conv2D.__init__() takes from 4 to 8 positional arguments but 12 were given #4129

Hi,

When I run the [tutorial_transformer.ipynb](https://colab.research.google.com/github/PaddlePaddle/PaddleSpeech/blob/master/docs/tutorial/asr/tutorial_transformer.ipynb) notebook, the line `model = U2Model.from_config(model_conf)` raises `TypeError: Conv2D.__init__() takes from 4 to 8 positional arguments but 12 were given`. The printed `model_conf` and the full traceback are:
```text
model_conf cmvn_file: None
cmvn_file_type: json
decoder: transformer
decoder_conf:
  attention_heads: 4
  dropout_rate: 0.1
  linear_units: 2048
  num_blocks: 6
  positional_dropout_rate: 0.1
  self_attention_dropout_rate: 0.0
  src_attention_dropout_rate: 0.0
encoder: transformer
encoder_conf:
  attention_dropout_rate: 0.0
  attention_heads: 4
  dropout_rate: 0.1
  input_layer: conv2d
  linear_units: 2048
  normalize_before: True
  num_blocks: 12
  output_size: 256
  positional_dropout_rate: 0.1
input_dim: 80
model_conf:
  ctc_weight: 0.3
  length_normalized_loss: False
  lsm_weight: 0.1
output_dim: 4233
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[/tmp/ipython-input-1986631129.py](https://localhost:8080/#) in <cell line: 0>()
      5 model_conf.output_dim = 4233
      6 print ("model_conf", model_conf)
----> 7 model = U2Model.from_config(model_conf)

6 frames
[/usr/local/lib/python3.12/dist-packages/paddlespeech/s2t/models/u2/u2.py](https://localhost:8080/#) in from_config(cls, configs)
    960             nn.Layer: U2Model
    961         """
--> 962         model = cls(configs)
    963         return model
    964 

[/usr/local/lib/python3.12/dist-packages/paddlespeech/s2t/models/u2/u2.py](https://localhost:8080/#) in __init__(self, configs)
    862         init_type = model_conf.get("init_type", None)
    863         with DefaultInitializerContext(init_type):
--> 864             vocab_size, encoder, decoder, ctc = U2Model._init_from_config(
    865                 configs)
    866         super().__init__(

[/usr/local/lib/python3.12/dist-packages/paddlespeech/s2t/models/u2/u2.py](https://localhost:8080/#) in _init_from_config(cls, configs)
    904         logger.debug(f"U2 Encoder type: {encoder_type}")
    905         if encoder_type == 'transformer':
--> 906             encoder = TransformerEncoder(
    907                 input_dim, global_cmvn=global_cmvn, **configs['encoder_conf'])
    908         elif encoder_type == 'conformer':

[/usr/local/lib/python3.12/dist-packages/paddlespeech/s2t/modules/encoder.py](https://localhost:8080/#) in __init__(self, input_size, output_size, attention_heads, linear_units, num_blocks, dropout_rate, positional_dropout_rate, attention_dropout_rate, input_layer, pos_enc_layer_type, normalize_before, concat_after, static_chunk_size, use_dynamic_chunk, global_cmvn, use_dynamic_left_chunk)
    372         See Encoder for the meaning of each parameter.
    373         """
--> 374         super().__init__(input_size, output_size, attention_heads, linear_units,
    375                          num_blocks, dropout_rate, positional_dropout_rate,
    376                          attention_dropout_rate, input_layer,

[/usr/local/lib/python3.12/dist-packages/paddlespeech/s2t/modules/encoder.py](https://localhost:8080/#) in __init__(self, input_size, output_size, attention_heads, linear_units, num_blocks, dropout_rate, positional_dropout_rate, attention_dropout_rate, input_layer, pos_enc_layer_type, normalize_before, concat_after, static_chunk_size, use_dynamic_chunk, global_cmvn, use_dynamic_left_chunk, max_len)
    136 
    137         self.global_cmvn = global_cmvn
--> 138         self.embed = subsampling_class(
    139             idim=input_size,
    140             odim=output_size,

[/usr/local/lib/python3.12/dist-packages/paddlespeech/s2t/modules/subsampling.py](https://localhost:8080/#) in __init__(self, idim, odim, dropout_rate, pos_enc_class)
    112         super().__init__(pos_enc_class)
    113         self.conv = nn.Sequential(
--> 114             Conv2D(1, odim, 3, 2),
    115             nn.ReLU(),
    116             Conv2D(odim, odim, 3, 2),

[/usr/local/lib/python3.12/dist-packages/paddlespeech/s2t/modules/align.py](https://localhost:8080/#) in __init__(self, in_channels, out_channels, kernel_size, stride, padding, dilation, groups, padding_mode, weight_attr, bias_attr, data_format)
    163                         negative_slope=math.sqrt(5),
    164                         nonlinearity='leaky_relu'))
--> 165         super(Conv2D, self).__init__(
    166             in_channels, out_channels, kernel_size, stride, padding, dilation,
    167             groups, padding_mode, weight_attr, bias_attr, data_format)

TypeError: Conv2D.__init__() takes from 4 to 8 positional arguments but 12 were given
```

<img width="1265" height="705" alt="Image" src="https://github.com/user-attachments/assets/5bb2910f-293e-434e-96cc-2a15e66d9f74" />
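
Reading the last frame of the traceback, my guess is that the `Conv2D` wrapper in `paddlespeech/s2t/modules/align.py` forwards all eleven constructor arguments positionally, while the parent `__init__` in the installed Paddle version accepts at most seven positional arguments after `self`. I have not checked the actual Paddle signature, so the sketch below only reproduces the same failure mode in plain Python. `BaseConv2D`, `PositionalConv2D` and `KeywordConv2D` are hypothetical stand-ins, and the keyword-only split after `groups` is an assumption inferred from the "from 4 to 8" wording of the error.

```python
class BaseConv2D:
    """Hypothetical stand-in for the parent layer (NOT Paddle's real class).

    Assumption: everything after `groups` is keyword-only, which is what
    would produce the "takes from 4 to 8 positional arguments" message.
    """

    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1, *, padding_mode="zeros",
                 weight_attr=None, bias_attr=None, data_format="NCHW"):
        self.stride = stride
        self.padding_mode = padding_mode
        self.data_format = data_format


class PositionalConv2D(BaseConv2D):
    """Forwards every argument positionally, like the failing frame."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1, padding_mode="zeros",
                 weight_attr=None, bias_attr=None, data_format="NCHW"):
        # self + 11 positional arguments = 12, more than the parent accepts.
        super().__init__(in_channels, out_channels, kernel_size, stride,
                         padding, dilation, groups, padding_mode,
                         weight_attr, bias_attr, data_format)


class KeywordConv2D(BaseConv2D):
    """Same wrapper, but forwards the optional arguments by keyword."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1, padding_mode="zeros",
                 weight_attr=None, bias_attr=None, data_format="NCHW"):
        super().__init__(in_channels, out_channels, kernel_size,
                         stride=stride, padding=padding, dilation=dilation,
                         groups=groups, padding_mode=padding_mode,
                         weight_attr=weight_attr, bias_attr=bias_attr,
                         data_format=data_format)


error_message = None
try:
    PositionalConv2D(1, 256, 3, 2)  # same call shape as Conv2D(1, odim, 3, 2)
except TypeError as exc:
    error_message = str(exc)

layer = KeywordConv2D(1, 256, 3, 2)  # succeeds with keyword forwarding
print(error_message)
```

If the real signature matches this assumption, forwarding the optional arguments by keyword in `align.py` might be enough to fix it, since keyword forwarding works whether or not the parent marks those parameters as keyword-only.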