@benfred benfred commented Jan 14, 2022

On the CPU, we were not achieving full thread utilization. The problem seems to be that
the 'guided' schedule in OpenMP assigns a large contiguous range of ids to a single thread,
and it's pretty common to have ids sorted by frequency. This caused one thread to process
all the high-activity users/items, and left the others starved for work once they finished
early.

Fix by switching to a dynamic OpenMP schedule. For the ALS model on CPU, this change is 3.8x
faster training movielens-20m and 2.2x faster training the GitHub Stars dataset,
while being neutral on lastfm. For the cosine model, this change is 2.2x faster training movielens-20m.
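To make the imbalance concrete, here is a rough simulation of the two schedules. It is a simplified model, not the actual OpenMP runtime or the code in this PR: it assumes guided chunks of about `remaining / n_threads` iterations (real runtimes differ in the details), dynamic chunks of one iteration, and a made-up Zipf-like cost per id sorted by frequency. All function names and numbers are hypothetical.

```python
import heapq


def guided_chunks(n_items, n_threads, min_chunk=1):
    """Chunks in the style of a guided schedule (simplified assumption):
    each chunk is about remaining / n_threads, so early chunks are large."""
    chunks, start = [], 0
    while start < n_items:
        remaining = n_items - start
        size = max(min_chunk, -(-remaining // n_threads))  # ceiling division
        size = min(size, remaining)
        chunks.append((start, start + size))
        start += size
    return chunks


def dynamic_chunks(n_items, chunk=1):
    """Fixed-size chunks, handed out one at a time (dynamic schedule)."""
    return [(s, min(s + chunk, n_items)) for s in range(0, n_items, chunk)]


def makespan(costs, chunks, n_threads):
    """Give each chunk, in order, to whichever thread becomes free first,
    and return the time at which the last thread finishes."""
    finish = [0.0] * n_threads
    heapq.heapify(finish)
    for start, end in chunks:
        free_at = heapq.heappop(finish)
        heapq.heappush(finish, free_at + sum(costs[start:end]))
    return max(finish)


# Hypothetical workload: ids sorted by frequency, so id 0 is the most expensive.
n_items, n_threads = 10_000, 8
costs = [1000.0 / (i + 1) for i in range(n_items)]

guided_time = makespan(costs, guided_chunks(n_items, n_threads), n_threads)
dynamic_time = makespan(costs, dynamic_chunks(n_items), n_threads)
ideal_time = sum(costs) / n_threads

print(f"ideal:   {ideal_time:.0f}")
print(f"guided:  {guided_time:.0f}")
print(f"dynamic: {dynamic_time:.0f}")
```

In this model the first guided chunk covers the 1250 most active ids, so the thread that picks it up carries most of the total cost while the rest go idle. With one-iteration dynamic chunks, the heavy ids are spread across threads.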

  • 'schedule=dynamic' times:
    • ALS model:
      • GitHub: 60.96 s/it
      • Movielens: 1.05 it/s (about 0.95 s/it)
      • LastFM: 2.19 s/it
    • Cosine model:
      • Movielens: 49603.48 it/s
  • 'schedule=guided' OpenMP times (before this change):
    • ALS model:
      • GitHub: 133 s/it
      • Movielens: 3.62 s/it
      • LastFM: 2.19 s/it
    • Cosine model:
      • Movielens: 22804.29 it/s
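The timings above mix `s/it` and `it/s`, so the quoted speedups are easier to check after converting everything to seconds per iteration. A small sketch of that arithmetic, using only the numbers listed above (the helper name and dictionary layout are just for illustration):

```python
def seconds_per_iteration(value, unit):
    """Convert a progress-bar style rate to seconds per iteration."""
    if unit == "s/it":
        return value
    if unit == "it/s":
        return 1.0 / value
    raise ValueError(f"unknown unit: {unit}")


# (guided, dynamic) timings as reported in this PR description
timings = {
    "ALS / GitHub": ((133, "s/it"), (60.96, "s/it")),
    "ALS / Movielens": ((3.62, "s/it"), (1.05, "it/s")),
    "ALS / LastFM": ((2.19, "s/it"), (2.19, "s/it")),
    "Cosine / Movielens": ((22804.29, "it/s"), (49603.48, "it/s")),
}

speedups = {
    name: seconds_per_iteration(*guided) / seconds_per_iteration(*dynamic)
    for name, (guided, dynamic) in timings.items()
}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.1f}x")
```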

@benfred benfred merged commit 45c85a1 into main Jan 14, 2022
@benfred benfred deleted the cpu_speedups branch January 14, 2022 21:54