Optimize the grid dimensionality during KANLayer initialization to reduce memory/GPU usage significantly and greatly reduce the initialization time of KANLayer. #378
Open
congyue1977 wants to merge 12 commits into
…ate grid to run on GPUs.
During KANLayer initialization, the B-spline knot vector is built from the grid_range parameter alone, so it is identical for every input dimension (in_dim). Storing one copy per input dimension is therefore redundant, and it suffices to set the size of the grid's first dimension to 1. Subsequent calculations rely on tensor broadcasting to expand the grid where needed, and the grid update process is not affected.
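The broadcasting argument can be checked with a small sketch. This is not the code from this PR: NumPy stands in for PyTorch (the broadcasting rules are the same for this case), and `extended_uniform_knots` and `degree0_basis` are hypothetical helpers written only for illustration. It compares the order-0 B-spline indicators computed from a full (in_dim, G+2k+1) grid with those computed from a shared (1, G+2k+1) grid.

```python
import numpy as np

def extended_uniform_knots(grid_range, G, k):
    """Uniform knot vector with k extra knots on each side: G + 2k + 1 values."""
    lo, hi = grid_range
    h = (hi - lo) / G
    return lo + h * np.arange(-k, G + k + 1)

def degree0_basis(x, grid):
    """Order-0 B-spline indicators.

    x has shape (batch, in_dim); grid has shape (rows, n_knots), where rows
    is either in_dim (one copy per input) or 1 (shared copy, broadcast).
    """
    x = x[:, :, None]        # (batch, in_dim, 1)
    g = grid[None, :, :]     # (1, rows, n_knots)
    return ((x >= g[:, :, :-1]) & (x < g[:, :, 1:])).astype(x.dtype)

in_dim, G, k = 4, 5, 3
knots = extended_uniform_knots((-1.0, 1.0), G, k)

full_grid = np.tile(knots, (in_dim, 1))   # shape (in_dim, G + 2k + 1)
shared_grid = knots[None, :]              # shape (1, G + 2k + 1)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(8, in_dim))

# Broadcasting expands the single row across in_dim, so both grids agree.
same = np.array_equal(degree0_basis(x, full_grid), degree0_basis(x, shared_grid))
print(same)
```

Because the shared row is only expanded virtually during the comparison, the result has the same shape and values as with the full grid, while the stored grid is in_dim times smaller.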
This optimization reduces host or GPU memory usage. Each KANLayer saves (in_dim-1) * (G+2k+1) grid entries, where G is the number of grid intervals and k is the spline order. For a network of depth N whose layers all have the same input dimension, the total saving is N * (in_dim-1) * (G+2k+1) entries.
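As a worked instance of the formula above, the following sketch sums the per-layer saving over a network whose layers may have different input dimensions. The function name `grid_entries_saved` is hypothetical, and the count covers grid entries only (not bytes, which depend on the dtype).

```python
def grid_entries_saved(width, G, k):
    """Total grid entries saved when each layer stores one shared knot row.

    width lists the layer sizes, e.g. [4, 100, 100, 100, 1]; every layer's
    in_dim is the size of the layer feeding it, so the last entry is skipped.
    """
    knots = G + 2 * k + 1
    return sum((in_dim - 1) * knots for in_dim in width[:-1])

saved = grid_entries_saved([4, 100, 100, 100, 1], G=100, k=3)
print(saved)  # (3 + 99 + 99 + 99) * 107 = 32100
```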
The optimization also drastically reduces KANLayer initialization time. In testing with a large grid (G=100), k=3, and a KAN of width [4,100,100,100,1], training took nearly 30 s to start on an Intel i9-12900K before the change and less than 1 s after it.