Speed up coh_tmm in tmm_core_vec; introduced versions with cpu and gpu parallelization #271
Open
griddler-j wants to merge 12 commits into qpv-research-group:develop
added 7 commits
March 24, 2024 00:37
Member
Sorry for my lack of input on this. I was wondering, for the parallelisation, what do you think the best way to incorporate this would be? I guess there are now three options: no parallelisation (i.e. just the old implementation), GPU parallelisation, or CPU parallelisation. Then there's the detailed vs. non-detailed mode. The user should be able to choose which one they want to use, but it would be better not to have three different tmm_core_vec files, since most of the content is the same anyway.
s and p polarization, 10000 wavelength x angles, 6 layers, calculate coh_tmm:
before speed increase: 4.178s
after speed increase: 2.399s
after speed increase, non-detailed mode: 1.948s
CPU parallelization with 24 cores
CPU parallelization: 0.844s
CPU parallelization, non-detailed mode: 0.719s
GPU parallelization with NVIDIA GeForce RTX 4060
GPU parallelization: 0.296s
GPU parallelization, non-detailed mode: 0.118s
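To make the timings above easier to compare, the short sketch below computes each configuration's speed-up relative to the "before speed increase" run. It uses only the numbers reported above; the dictionary layout and the `speedups` helper are illustrative assumptions, not part of the PR.

```python
# Timings in seconds, copied from the benchmark listed above.
timings = {
    "before speed increase": 4.178,
    "after speed increase": 2.399,
    "after speed increase, non-detailed mode": 1.948,
    "CPU parallelization": 0.844,
    "CPU parallelization, non-detailed mode": 0.719,
    "GPU parallelization": 0.296,
    "GPU parallelization, non-detailed mode": 0.118,
}


def speedups(results, baseline="before speed increase"):
    """Return baseline_time / time for every configuration."""
    base = results[baseline]
    return {name: base / seconds for name, seconds in results.items()}


if __name__ == "__main__":
    for name, factor in speedups(timings).items():
        print(f"{name}: {factor:.1f}x")
```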