dynamic routing #1

@InnerPeace-Wu

Thanks for the amazing work you've done. I adapted dynamic routing from your code, and I'd like to share some of my ideas about it. Here is my repo, implemented in TensorFlow.
bias updating
You mentioned that you fix the bias to 0, but during dynamic routing you are updating it. Is that so? Code: here and here.
In my opinion, the bias should not be updated, because it's just the initial value for dynamic routing. With your implementation, the bias gets updated every time you feed in a batch, even though the Variable is created with trainable=False. That flag only keeps the optimizer from changing the Variable; an explicit assign still overwrites it. The same thing happens during testing, of course. I think the easiest fix is to make a temporary variable, temp_bias = bias, and use it for dynamic routing.
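Here is a minimal NumPy sketch of the fix proposed above. It is not the repo's TensorFlow code. It runs the routing loop from the CapsNet paper (softmax over output capsules, squash, agreement update) on a temporary copy of the logits, so the stored prior is only read and never written. The names dynamic_routing and prior_logits are hypothetical, and so are the sizes (6 input capsules, 10 output capsules, 8-D outputs). In a TensorFlow graph you get the same effect by building the logits as ordinary tensors (bias + agreement) instead of calling tf.assign on the Variable.

```python
import numpy as np

def squash(s, eps=1e-9):
    # CapsNet squashing: keeps the direction, maps the length into [0, 1).
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def dynamic_routing(u_hat, prior_logits, num_iters=3):
    """u_hat: [batch, num_caps, 10, dim] predictions from the layer below.
    prior_logits: [num_caps, 10] initial routing logits (e.g. all zeros).
    Returns output capsules v: [batch, 10, dim] and this call's final logits."""
    batch = u_hat.shape[0]
    # Temporary copy (the temp_bias idea): prior_logits itself is only read.
    b = np.broadcast_to(prior_logits, (batch,) + prior_logits.shape).copy()
    for _ in range(num_iters):
        c = softmax(b, axis=2)                             # coupling coefficients over the 10 output capsules
        v = squash(np.sum(c[..., None] * u_hat, axis=1))   # weighted sum over input capsules -> [batch, 10, dim]
        b = b + np.sum(u_hat * v[:, None], axis=-1)        # agreement update, local to this call
    return v, b

rng = np.random.default_rng(0)
prior = np.zeros((6, 10))                # hypothetical: 6 input capsules, 10 classes
u_hat = rng.normal(size=(4, 6, 10, 8))
v1, b1 = dynamic_routing(u_hat, prior)
v2, _ = dynamic_routing(u_hat, prior)    # a second call, e.g. at test time
print(np.all(prior == 0))                # True: the prior was never modified
print(np.allclose(v1, v2))               # True: every call starts from the same prior
```

Because every call starts from the same prior, training batches cannot leak routing state into later batches or into evaluation.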
bias summing
Code here: it seems that you are trying to keep the shape of the bias as [num_caps, 10] by summing over all the training examples in the batch. I think that's problematic. The paper says the initial bias is independent of the image. During routing, however, the capsule predictions from the layer below vary from image to image, so the updated bias should differ too. After the bias is updated, its shape should be [batch_size, num_caps, 10].
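To make the shape argument concrete, here is a NumPy sketch. Again, it is not the repo's TensorFlow code. It runs the same routing loop two ways. In the first, the logits are kept per example with shape [batch_size, num_caps, 10]. In the second, the agreement update is summed over the batch into one [num_caps, 10] tensor that every example shares. The names route and sum_over_batch are hypothetical, and so are the sizes.

```python
import numpy as np

def squash(s, eps=1e-9):
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def route(u_hat, num_iters=3, sum_over_batch=False):
    """u_hat: [batch, num_caps, 10, dim]; logits start at zero.
    sum_over_batch=False: per-example logits, shape [batch, num_caps, 10].
    sum_over_batch=True: the update is summed over the batch into one
    [num_caps, 10] tensor shared by every example (the behaviour questioned above)."""
    batch, num_caps, num_out, _ = u_hat.shape
    b = np.zeros((batch, num_caps, num_out))
    for _ in range(num_iters):
        c = softmax(b, axis=2)
        v = squash(np.sum(c[..., None] * u_hat, axis=1))   # [batch, 10, dim]
        delta = np.sum(u_hat * v[:, None], axis=-1)        # [batch, num_caps, 10]
        if sum_over_batch:
            # Collapse to [1, num_caps, 10]; broadcasting adds it to every example.
            delta = np.sum(delta, axis=0, keepdims=True)
        b = b + delta
    return v, b

rng = np.random.default_rng(1)
u_hat = rng.normal(size=(5, 6, 10, 8))     # hypothetical sizes
v_alone, _ = route(u_hat[:1])              # example 0 routed on its own
v_batch, b = route(u_hat)                  # example 0 routed inside a batch of 5
v_shared, _ = route(u_hat, sum_over_batch=True)
print(b.shape)                             # (5, 6, 10)
print(np.allclose(v_alone[0], v_batch[0])) # True
print(np.allclose(v_alone[0], v_shared[0]))
```

With per-example logits, an image's routing result does not depend on which other images share its batch. With the summed update it does.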

I tried 3 iterations of dynamic routing. After fewer than 4 epochs (2k iterations) the validation accuracy is 99.16%, so it seems to be working, though it is still not as efficient as the paper reports.
But I have a big problem: training is slow, almost 2 s per iteration with batch_size 100 on an NVIDIA GTX 1060, which is way more than yours.

Just some of my ideas; glad to discuss them with you.
Best.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions