Skip to content

Simplify wait in slices logic. NFC. #15739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jan 20, 2022

Conversation

kleisauke
Copy link
Collaborator

Split from PR #13007.

@@ -88,17 +88,16 @@ int pthread_barrier_wait(pthread_barrier_t *b)
a_spin();
a_inc(&inst->finished);
#ifdef __EMSCRIPTEN__
int is_runtime_thread = emscripten_is_main_runtime_thread();
const int is_runtime_thread = emscripten_is_main_runtime_thread();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think think this make any difference for basic types like int.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, reverted with commit 4de581a.

// If we have less than this many msecs left to wait, busy spin that instead.
const double minimumTimeSliceToSleep = 0.1;
const double minTimeSliceToSleep = 0.1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets use snake case since we are in C and drop the const. e.g. double min_time_slice_to_sleep. Lets make that into its own cleanup PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll split that into its own PR. Although, this file contains many camelCase variables. Maybe we should do that only for the code we touch in musl?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this camelCase exists here because it was ported from JS where we do use camelCase. How about doing a naming-only PR.. either before or after this one is fine with.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming-only PR sounds good to me.

@@ -640,7 +634,7 @@ void emscripten_current_thread_process_queued_calls() {
pthread_mutex_unlock(&call_queue_lock);

// If the queue was full and we had waiters pending to get to put data to queue, wake them up.
emscripten_futex_wake((void*)&q->call_queue_head, 0x7FFFFFFF);
emscripten_futex_wake((void*)&q->call_queue_head, INT_MAX);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be part of the cleanup PR too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted with commit 1650785, I'll make a PR with these changes only.

timeoutMSecs = timeoutUntilTime - emscripten_get_now();
} while (!done && timeoutMSecs > 0);

emscripten_set_current_thread_status(EM_THREAD_STATUS_RUNNING);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the something else going on here other than some re-ordering of the code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidation of the if-statements makes this diff difficult to read. Reverted with commit d78e59f for now.


emscripten_set_current_thread_status(EM_THREAD_STATUS_RUNNING);

return done ? EMSCRIPTEN_RESULT_SUCCESS : EMSCRIPTEN_RESULT_TIMED_OUT;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can go in the cleanup PR too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted with commit d78e59f for now, I'll make a PR later with this change.

@@ -95,11 +94,13 @@ int __timedwait_cp(volatile int *addr, int val,
r = -__futex4_cp(addr, FUTEX_WAIT|priv, val, top);
#endif
if (r != EINTR && r != ETIMEDOUT && r != ECANCELED) r = 0;
#ifndef __EMSCRIPTEN__ // XXX Emscripten revert musl commit a63c0104e496f7ba78b64be3cd299b41e8cd427f
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this change doing? Can/should it split into its own PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was excluding code that is not related to Emscripten. Reverted with commit 6ea3ad3 for now.


int __timedwait_cp(volatile int *addr, int val,
clockid_t clk, const struct timespec *at, int priv)
{
int r;
struct timespec to, *top=0;

#ifndef __EMSCRIPTEN__
if (priv) priv = FUTEX_PRIVATE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary code size optimization change, reverted with commit 6ea3ad3.

const int is_runtime_thread = emscripten_is_main_runtime_thread();

// Main runtime thread may need to run proxied calls, so sleep in very small slices to be responsive.
const double maxMsecsToSleep = is_runtime_thread ? 1 : 100;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split out hoisting of self into a separate PR? (combine with use of self in
system/lib/libc/musl/src/thread/__wait.c)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted with commit a9550ff. I'll make a new PR with this.

const int is_runtime_thread = emscripten_is_main_runtime_thread();

// Main runtime thread may need to run proxied calls, so sleep in very small slices to be responsive.
const double maxMsecsToSleep = is_runtime_thread ? 1 : 100;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use snake_case in C code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 190b6b9 fixes this for the changed lines.

@kleisauke kleisauke force-pushed the simplify-wait-slices branch 3 times, most recently from fbdfa0c to 648a20e Compare January 12, 2022 13:59
@sbc100
Copy link
Collaborator

sbc100 commented Jan 12, 2022

LGTM in general.

I have been thinking we can probable followup by removing a lot of this code... since emscripten_futex_wait itself already does time slicing on the main thread, I'm not sure its need at all in these places!

@sbc100
Copy link
Collaborator

sbc100 commented Jan 12, 2022

@juj, what do you think about this? If emscripten_futex_wait already does time slicing do you know why we are also doing it in all these higher level places?

@kleisauke
Copy link
Collaborator Author

kleisauke commented Jan 12, 2022

I think it's still needed for cancellation points (i.e. for cases where the thread is cancelled in between). This is done in __timedwait_cp, __wait and emscripten_thread_sleep. We might experiment with calling emscripten_futex_wait directly in pthread_barrier_wait.

@sbc100
Copy link
Collaborator

sbc100 commented Jan 12, 2022

I think it's still needed for cancellation points (i.e. for cases where the thread is cancelled in between). This is done in __timedwait_cp, __wait and emscripten_thread_sleep. We might experiment with calling emscripten_futex_wait directly in pthread_barrier_wait.

What if we adding a cancellation point to the emscripten_futex_wait? Would that be a good idea anyway?

The alternative, would be to remove the time slicing from the emscripten_futex_wait loop. I tried that here but decided it would be probably be better to keep this one and drop the others: #15975

continue;

emscripten_futex_wait(&dummyZeroAddress, 0,
ms_to_sleep > max_ms_slice_to_sleep ? max_ms_slice_to_sleep : ms_to_sleep);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the code used to be of form

emscripten_futex_wait(..., ms_to_sleep);

but it gets refactored to form

emscripten_futex_wait(..., ms_to_sleep > max_ms_slice_to_sleep ? max_ms_slice_to_sleep : ms_to_sleep);

Above in file system/lib/libc/musl/src/thread/__wait.c the code used to be

emscripten_futex_wait(..., is_runtime_thread ? 1 : 100);

but it gets refactored to

emscripten_futex_wait(..., max_ms_to_sleep);

The PR looks to me like it is massaging the code back and forth, I am not sure if I see the guiding idea here. (sorry!)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, it's a bit confusing. The difference between system/lib/libc/musl/src/thread/__wait.c and system/lib/pthread/library_pthread.c is that the former needs to wait indefinitely in slices, so it's doing this:

emscripten_futex_wait(..., max_ms_slice_to_sleep);

and the later must wait in slices for a certain time, so it does this:

emscripten_futex_wait(..., MAX(ms_to_sleep, max_ms_slice_to_sleep));

I've renamed the max_ms_to_sleep variable to max_ms_slice_to_sleep with commit 5b77bd9.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather avoid the ternary operation here. I think it makes the code easier to read if the calculation for how long to sleep is doing before the line where the sleep occurs. It also make it easier for debugging since you have a name for the actual amount of time that is we are going to sleep for.

Don't you find "ms_to_sleep > max_ms_slice_to_sleep ? max_ms_slice_to_sleep : ms_to_sleep" in addition to need to function call split over two lines harder to read?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, those long variable names make it hard to read when being used within a ternary operator. Split over 2 lines with commit f84985c to improve readability.

} while(r == ETIMEDOUT);

msecsToSleep = sleepUntilTime - emscripten_get_now();
} while (r == ETIMEDOUT && msecsToSleep > 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has the subtle difference that before, if one had a wait time of zero, then the wait loop would process emscripten_main_thread_process_queued_calls() and exit, but after this change, there is now an unconditional call out to emscripten_futex_wait(). It is uncertain what that will do, it could yield back the remaining time slice at minimum, or it could do a full kernel wait that gets triggered via the general timeout mechanism, so performance could theoretically be affected.

Copy link
Collaborator Author

@kleisauke kleisauke Jan 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I avoided calls to emscripten_futex_wait() when msecsToSleep <= 0 with commit 54f145c and c4dfaf3.

Copy link
Collaborator

@juj juj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@juj
Copy link
Collaborator

juj commented Jan 13, 2022

What if we adding a cancellation point to the emscripten_futex_wait?

I think it would be best not to have emscripten_futex_wait depend on the pthreads API. That was probably the guiding principle - the Emscripten API resides on a lower level that is not tied to POSIX pthreads itself.

@kleisauke kleisauke force-pushed the simplify-wait-slices branch from c4dfaf3 to 9477d68 Compare January 15, 2022 10:44
@kleisauke
Copy link
Collaborator Author

Test failures in https://circleci.com/gh/emscripten-core/emscripten/492229 looks real, I could reproduce that sometimes locally. It seems that emscripten_main_thread_process_queued_calls() must always called before emscripten_futex_wait() in order to fix that (see commit 9477d68).

@kleisauke kleisauke force-pushed the simplify-wait-slices branch from f84985c to 70bfc3b Compare January 20, 2022 12:34
@kleisauke
Copy link
Collaborator Author

Rebased to fix code size test failures in test_minimal_runtime_code_size_hello_webgl{,2}_wasm2js (commit f1fbd4c).

@sbc100
Copy link
Collaborator

sbc100 commented Jan 20, 2022

So is this change now completely non-functional? How sure are you that there are no observable changes?

Its not clear to me exactly what we gain from landing this PR? Is the idea just to make variable call sites be more consistent?
Or is there a codesize or readability improvement here?

I'm not against landing this if it really is NFC.

As it's essentially a bug fix.
@kleisauke
Copy link
Collaborator Author

So is this change now completely non-functional? How sure are you that there are no observable changes?

I think all changes are completely non-functional expect for the pthread_barrier_wait.c change (which unifies the emscripten_main_thread_process_queued_calls and emscripten_futex_wait call order). That change looks like a bug fix, as the commit prior to 40064d6 where causing test failures.

Its not clear to me exactly what we gain from landing this PR? Is the idea just to make variable call sites be more consistent?
Or is there a codesize or readability improvement here?

You're right, the current state of this PR is that there is not much to gain other than readability and consistency. The reason for this PR was mainly the now-reverted 21c0823 commit, which I'll open a PR for later.

I'm not against landing this if it really is NFC.

I completely understand that, I just reverted the pthread_barrier_wait.c change with commit ecbd5fa.

@kleisauke kleisauke changed the title Simplify/unify wait in slices logic. NFC. Simplify wait in slices logic. NFC. Jan 20, 2022
}
now = emscripten_get_now();
};
emscripten_futex_wait(&dummyZeroAddress, 0, ms_to_sleep);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop condition here seems a little odd to me. It doesn't seem to take into account that time has elapsed during emscripten_futex_wait. Perhaps adding ms_to_sleep = target - now; as the very last thing line of the loop would fix it? But then we are kind of re-using the single ms_to_sleep to mean two different things: How long to sleep for this loop cylcle, and the total remain time to sleep.. Maybe we should keep those things logically separate for readability?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 7d74c53 takes the elapsed time during emscripten_futex_wait into account.

@sbc100 sbc100 enabled auto-merge (squash) January 20, 2022 19:47
@sbc100 sbc100 merged commit ba84d40 into emscripten-core:main Jan 20, 2022
@kleisauke kleisauke deleted the simplify-wait-slices branch January 29, 2022 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants