Model Architecture Configuration specification #9
I feel the same dilemma.
I think this can be useful if users want to build an un-trained model.
I think you can add a custom attribute like |
That's right. An alternative would be to somehow inform the function not to load the weights (in which case the architecture config would still construct the right model, just with uninitialized weights). Though this might create confusion in users' minds, because the idea behind Weights is really to serve multiple pre-trained checkpoints. One alternative would be to call it ModelConstructs or something more generic instead of weights. Then it might be safe to provide additional keyword arguments in the factory function like
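A minimal sketch of the "don't load the weights" option. All names here (`ResNet`, `ResNet50Weights`, `resnet50`) are hypothetical stand-ins, not the RFC's actual API; the real model class and checkpoint loading are stubbed out:

```python
from enum import Enum

class ResNet:
    """Stand-in for the real model class; it just records its config."""
    def __init__(self, layers):
        self.layers = layers
        self.pretrained = False

class ResNet50Weights(Enum):
    # Hypothetical member; real enum entries carry checkpoint URLs and metadata.
    ImageNet1K_V1 = "https://example.com/resnet50-v1.pth"

def resnet50(weights=None):
    # The builder always knows the architecture config, so weights=None
    # yields the same architecture with uninitialized parameters.
    model = ResNet(layers=[3, 4, 6, 3])
    if weights is not None:
        model.pretrained = True  # real code would download and load the state dict
    return model
```

With `weights=None` the user still gets the correct architecture, which addresses the un-trained-model use case without overloading the Weights concept.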
In Vision we have no choice but to have a builder per model variant: we need to maintain BC, so that's not something we can change. In your case, you might be able to keep things in a single builder method, but that's an implementation detail that is in your control and beyond the scope of this proposal. Note that having one builder has pros and cons.
We don't actually have much redundant code. We typically end up calling a single private builder method. We just have multiple public interfaces for each supported method. See here.
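The pattern described above can be sketched roughly as follows. The names and layer configs are illustrative, not torchvision's actual code:

```python
def _resnet(layers, weights=None):
    # Single private builder: all shared construction logic lives here.
    return {"layers": layers, "weights": weights}

def resnet18(weights=None):
    # Thin public interface per variant; only the config differs.
    return _resnet([2, 2, 2, 2], weights)

def resnet50(weights=None):
    return _resnet([3, 4, 6, 3], weights)
```

The public builders stay trivial, so the per-variant "redundancy" is only a few lines of interface, not duplicated construction logic.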
That is correct. If you have
On the other hand, introducing a separate
Let's take a step back. The introduction of |
Since the topic of attaching the model building config inside the weights keeps coming up in our discussions, I think it's worth writing in detail why I think this is not a good idea. Here by model building config, I refer to all the params passed to the constructor of the Model class to build it. It includes things like model hyper-parameters, layer configuration, etc. (see dapi-model-versioning/dapi_lib/models/resnet.py, lines 21 to 28 at c7f9302).
As you recall, each model builder method expects a specific Weights Enum for the specific model, and that's how the two are associated. So instead of the weights data class storing the building config, we "store" it in the model builder method and just link to it (see dapi-model-versioning/dapi_lib/models/resnet.py, lines 46 to 47 at c7f9302).
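The linking can be sketched like this, with hypothetical names: the builder hard-codes the building config and accepts only its own Weights Enum, which is what ties the two together:

```python
from enum import Enum

class ResNet50Weights(Enum):
    ImageNet1K_V1 = "resnet50-v1.pth"  # hypothetical checkpoint identifier

def resnet50(weights=None):
    # The association is by type: this builder only accepts its own enum.
    if weights is not None and not isinstance(weights, ResNet50Weights):
        raise TypeError(f"expected ResNet50Weights, got {type(weights).__name__}")
    # The building config stays in the builder, not in the weights:
    config = {"block": "Bottleneck", "layers": [3, 4, 6, 3]}  # illustrative
    return config
```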
This is useful because:
If you want to remove the model building config from the model builder method, one option is to store this information on its own in a separate Enum, data class, or dictionary. Here is what audio is currently doing (see dapi-model-versioning/dapi_lib/models/tacotron2.py, lines 17 to 39 at c7f9302).
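A standalone-config approach along those lines could look roughly like the sketch below. The field names and values are hypothetical, not audio's actual config:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tacotron2Config:
    # Hypothetical subset of the constructor params, for illustration only
    n_symbols: int
    encoder_embedding_dim: int

# Standalone config table, decoupled from any Weights enum:
CONFIGS = {
    "base": Tacotron2Config(n_symbols=148, encoder_embedding_dim=512),
}

def tacotron2(variant="base", weights=None):
    cfg = CONFIGS[variant]
    # real code would instantiate the model from cfg, then optionally load weights
    return {"config": cfg, "weights": weights}
```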
Though this can be an interesting idea, I think it is beyond the scope of this RFC and should be handled separately. IMO this is something that is in the control of the Domain libraries and, as long as this info is not dumped in the weights (for the reasons I explained above), they should be able to use a solution that meets their needs.
One of the common cases in text is to define a base model architecture and create bigger versions just by increasing the number of parameters, in terms of number of layers, hidden dimensions, etc. Take the XLMR model, for instance. There are four variations of the model, dubbed "xlmr.base", "xlmr.large", "xlmr.xl", and "xlmr.xxl".
One way to provide these models to users is to have four different factory functions, one for each. But the code is highly redundant, since the only difference here is the input configuration. A better way would be to encode this information directly inside the Weights Enum, so that the user-facing function only needs to specify which weights to use, and internally the model factory function will create the corresponding architecture for the user.
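A rough sketch of that idea, with hypothetical hyper-parameter values and URLs: each enum member bundles the architecture config with its checkpoint, so one factory serves every variant:

```python
from enum import Enum

class XLMRWeights(Enum):
    # Hypothetical configs; the real variants have different hyper-params.
    BASE = {"layers": 12, "hidden_dim": 768, "url": "xlmr.base.pt"}
    LARGE = {"layers": 24, "hidden_dim": 1024, "url": "xlmr.large.pt"}

def xlmr(weights):
    # Single user-facing factory: the variant is implied by the chosen weights.
    cfg = weights.value
    # real code would build the model from cfg and load the checkpoint at cfg["url"]
    return {"layers": cfg["layers"], "hidden_dim": cfg["hidden_dim"]}
```

The trade-off, as discussed above, is that this couples architecture definition to checkpoints, which conflicts with the idea of Weights serving multiple pre-trained checkpoints per architecture.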
I wonder if conceptually the `meta` argument is the right place to specify model configuration, or is it only reserved for informative attributes?