Skip to content

Task.sleep(nanoseconds:) causes a crash after resuming #3646

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Tracked by #3604
kateinoigakukun opened this issue Oct 3, 2021 · 3 comments
Closed
Tracked by #3604

Task.sleep(nanoseconds:) causes a crash after resuming #3646

kateinoigakukun opened this issue Oct 3, 2021 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@kateinoigakukun
Copy link
Member

kateinoigakukun commented Oct 3, 2021

Somehow it doesn't work due to a mysterious unreachable instruction at the end of thunk function.

The minimal reproducible code and steps are here.

@main
struct Main {
    static func main() async throws {
        try await Task.sleep(nanoseconds: 1_000_000_000)
    }
}
$ /Library/Developer/Toolchains/swift-wasm-5.5-SNAPSHOT-2021-10-02-a.xctoolchain/usr/bin/swiftc -target wasm32-unknown-wasi main.swift -parse-as-library
$ wasminspect  ./main
(wasminspect) run
Function exec failure unreachable
(wasminspect) bt
0: $sScTss5NeverORszABRs_rlE5sleep11nanosecondsys6UInt64V_tYaKFZyyYaKXEfU_ySccyyts5Error_pGXEfU_yyYacfU_TA
1: $sIegH_s5Error_pIegHzo_TRTA
2: $sIeghH_ytIeghHr_TRTATm
3: $ss5Error_pIegHzo_ytsAA_pIegHrzo_TRTA
4: future_adapter(swift::AsyncContext*)
5: swift::runJobInEstablishedExecutorContext(swift::Job*)
6: swift_job_run
7: swift::donateThreadToGlobalExecutorUntil(bool (*)(void*), void*)
8: swift_task_asyncMainDrainQueue
9: $ss13_runAsyncMainyyyyYaKcF
10: $s4main4MainV5$mainyyKFZ
11: main
12: main
13: __main_void
14: __original_main
15: _start

The unreachable instruction is located here.

37f3eb func[21885] <$sScTss5NeverORszABRs_rlE5sleep11nanosecondsys6UInt64V_tYaKFZyyYaKXEfU_ySccyyts5Error_pGXEfU_yyYacfU_TA>:
 37f3ec: 02 7f                      | local[0..1] type=i32
 37f3ee: 20 01                      | local.get 1
 37f3f0: 28 02 08                   | i32.load 2 8
 37f3f3: 21 04                      | local.set 4
 37f3f5: 20 00                      | local.get 0
 37f3f7: 41 00                      | i32.const 0
 37f3f9: 28 02 fc b6 8a 80 00       | i32.load 2 170876
 37f400: 20 01                      | local.get 1
 37f402: 20 01                      | local.get 1
 37f404: 10 f1 ad 81 80 00          | call 22257 <swift_task_alloc>
 37f40a: 22 01                      | local.tee 1
 37f40c: 36 02 0c                   | i32.store 2 12
 37f40f: 20 01                      | local.get 1
 37f411: 41 ca dc 80 80 00          | i32.const 11850
 37f417: 36 02 04                   | i32.store 2 4
 37f41a: 20 01                      | local.get 1
 37f41c: 20 00                      | local.get 0
 37f41e: 36 02 00                   | i32.store 2 0
 37f421: 20 01                      | local.get 1
 37f423: 20 04                      | local.get 4
 37f425: 20 01                      | local.get 1
 37f427: 20 01                      | local.get 1
 37f429: 10 fc a5 81 80 00          | call 21244 <$sScTss5NeverORszABRs_rlE5sleep11nanosecondsys6UInt64V_tYaKFZyyYaKXEfU_ySccyyts5Error_pGXEfU_yyYacfU_>
 37f42f: 00                         | unreachable <<< Why unreachable here?
 37f430: 0b                         | end

At first, we should check if this could happen on Linux with coop global executor also. I've confirmed it doesn't happen on the main channel on Linux with coop, but have not confirmed for 5.5 release channel yet.

Update (2021-10-04)

The unreachable instruction is inserted here. But I don't understand the reason and LLVM coroutine intrinsic, so I'll learn around it https://github.com/apple/swift/blob/7eb0ca1d47d921d50631b1d74f2670b323d085ed/lib/IRGen/GenCall.cpp#L4932

@kateinoigakukun
Copy link
Member Author

kateinoigakukun commented Oct 10, 2021

Investigation Update (2021-10-10)

I created a minimum reproducible LLVM IR code based on TaskSleep.swift. (At first, I used llvm-extract, but it doesn't help me because coroutine lowering assumes some inlinability and constantability of global metadata, so I did it by hand...)

Here is the IR snippet to reproduce and simple test case to assert unreachable inst at the end of the function.
https://gist.github.com/kateinoigakukun/8da2e2a8c257e9855f15b001e5c9c3d6

From this investigation, I guess there is a root reason in LLVM.

Based on print debugging, before and after of this line, unreachable basic blocks in the CFG are removed and also unreachable inst is added at the end of the function

https://github.com/swiftwasm/llvm-project/blob/f17a3166ba4f34e7a34c90b0e31c55d7828a00b1/llvm/lib/Transforms/Coroutines/CoroSplit.cpp#L1819

before after
sScTss5NeverORszABRs_rlE5sleep11nanosecondsys6UInt64V_tYaKFZyyYaKXEfU_ySccyyts5Error_pGXEfU_yyYacfU_TA after_removeUnreachableBlocks_sScTss5NeverORszABRs_rlE5sleep11nanosecondsys6UInt64V_tYaKFZyyYaKXEfU_ySccyyts5Error_pGXEfU_yyYacfU_TA

I'll dig into the algorithm of removeUnreachableBlocks

@kateinoigakukun
Copy link
Member Author

Investigation Update (2021-10-11)

At first, I forgot to mention that this issue is only reproduced with -O.

Based on print debugging, before and after of this line, unreachable basic blocks in the CFG are removed and also unreachable inst is added at the end of the function

From further investigation, that's because the call instruction at the end of BB calls a noreturn function directly, so it's ok to put unreachable here.

In no-optimization mode, the callee function is not marked as noreturn, so removeUnreachableBlocks doesn't put unreachable, and it works well.

Then, why the function is marked as noreturn only under optimization? That's because inlining and constant propagation help influencing the effect of the unreachable instruction emitted here

The unreachable instruction has right semantics when it's put before musttail call instruction because it doesn't return control to the caller. But some architectures like WebAssembly don't support such tail call, so the frontend emit the call with tail instead of musttail and it's lowered into a normal call instruction. In such case, unreachable instruction after the call is ill formed.

I fixed this issue by putting ret void instead of unreachable when the target doesn't support tail call.
#3688

@MaxDesiatov
Copy link

closing this as resolved for now, or should we wait until upstream PR is merged and then propagated to the swiftwasm branch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants