A network with three maxout.MaxoutLocalC01B, a maxout.Maxout and a mlp.Softmax prints Parameter and initial learning rate summary like so:
W: 0.0025
b: 0.0025
W: 0.0025
b: 0.0025
W: 0.0025
b: 0.0025
layer_4_W: 0.05
layer_4_b: 0.05
softmax_b: 0.05
softmax_W: 0.05
This is all but consistent. I like the layer.name + '_' + W/b notation.
It is already used in most parts of the library, as grep -R "\.name = " shows.
Would you accept a PR with that? If not, please suggest other naming convention and I would be happy to write a PR.