Segfault on macOS and Linux, threading or asyncio related #820

@timoffex

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I have a PR in a different repo that rewrites some code to run an asyncio loop in a separate thread, and that PR hit surprisingly consistent segfaults in CI. I managed to reproduce it locally a couple of times and extracted a coredump as well as the Python tracebacks (locally it is less consistent). The segfaults stopped completely when I disabled memray in CI.

The Python stacktrace is always like this, with the "Current thread" somewhere in asyncio.run():

Python faulthandler traceback from CI
tests/unit_tests/test_lib/test_progress.py::test_remaining_operations Fatal Python error: Segmentation fault

Current thread 0x000000017d78b000 (most recent call first):
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/selectors.py", line 517 in register
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/selector_events.py", line 284 in _add_reader
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/selector_events.py", line 124 in _make_self_pipe
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/selector_events.py", line 66 in __init__
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/unix_events.py", line 64 in __init__
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/events.py", line 720 in new_event_loop
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/events.py", line 823 in new_event_loop
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 137 in _lazy_init
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 58 in __enter__
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194 in run
  File "/Users/distiller/project/wandb/sdk/lib/asyncio_compat.py", line 75 in run
  File "/Users/distiller/project/wandb/sdk/lib/asyncio_manager.py", line 233 in _main
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1012 in run
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1032 in _bootstrap

Thread 0x000000016c74f000 (most recent call first):
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/execnet/gateway_base.py", line 534 in read
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/execnet/gateway_base.py", line 567 in from_io
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/execnet/gateway_base.py", line 1160 in _thread_receiver
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/execnet/gateway_base.py", line 341 in run
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/execnet/gateway_base.py", line 411 in _perform_spawn

Thread 0x00000001e41d5ec0 (most recent call first):
  File "/opt/homebrew/Cellar/python@3.12/3.12.11/Frameworks/Python.framework/Versions/3.12/lib/python3.12/contextlib.py", line 137 in __enter__
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pytest_memray/plugin.py", line 212 in wrapper
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/_pytest/python.py", line 157 in pytest_pyfunc_call
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/_pytest/python.py", line 1671 in runtest
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/_pytest/runner.py", line 178 in pytest_runtest_call
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/_pytest/runner.py", line 246 in <lambda>
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/_pytest/runner.py", line 344 in from_call
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/_pytest/runner.py", line 245 in call_and_report
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/_pytest/runner.py", line 136 in runtestprotocol
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/_pytest/runner.py", line 117 in pytest_runtest_protocol
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/xdist/remote.py", line 227 in run_one_test
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/xdist/remote.py", line 206 in pytest_runtestloop
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/_pytest/main.py", line 343 in _main
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/_pytest/main.py", line 289 in wrap_session
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/_pytest/main.py", line 336 in pytest_cmdline_main
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_callers.py", line 121 in _multicall
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/pluggy/_hooks.py", line 512 in __call__
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/xdist/remote.py", line 427 in <module>
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/execnet/gateway_base.py", line 1291 in executetask
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/execnet/gateway_base.py", line 341 in run
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/execnet/gateway_base.py", line 411 in _perform_spawn
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/execnet/gateway_base.py", line 389 in integrate_as_primary_thread
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/execnet/gateway_base.py", line 1273 in serve
  File "/Users/distiller/project/.nox/unit_tests-3-12/lib/python3.12/site-packages/execnet/gateway_base.py", line 1806 in serve
  File "<string>", line 8 in <module>
  File "<string>", line 1 in <module>
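
For anyone trying to capture a similar per-thread dump outside of CI, the standard-library `faulthandler` module can be enabled explicitly. The following is a minimal sketch, not the configuration used in the CI run above; the helper name `enable_crash_tracebacks` and the log path are hypothetical.

```python
import faulthandler


def enable_crash_tracebacks(path):
    """Dump Python tracebacks for all threads to `path` on a fatal signal.

    Hypothetical helper for illustration. The returned file object must stay
    open for as long as faulthandler is enabled.
    """
    log = open(path, "w")
    # all_threads=True produces one "Thread 0x..." section per thread,
    # like the dump shown above.
    faulthandler.enable(file=log, all_threads=True)
    return log


if __name__ == "__main__":
    log_file = enable_crash_tracebacks("crash-traceback.log")
    # ... run the code under investigation here ...
    faulthandler.disable()
    log_file.close()
```

The file is returned rather than closed because `faulthandler` writes to its file descriptor at crash time.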

The C tracebacks I managed to get a couple of times are all essentially the same, with Thread 3 crashing in memray::tracking_api::PythonStackTracker::emitPendingPushesAndPops():

C traceback from local run
Thread 0::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	       0x187afd3cc __psynch_cvwait + 8
1   libsystem_pthread.dylib       	       0x187b3c0e0 _pthread_cond_wait + 984
2   Python                        	       0x1039a5e5c take_gil + 456
3   Python                        	       0x1039a65fc PyEval_RestoreThread + 24
4   Python                        	       0x103a00b54 posix_do_stat + 160
5   Python                        	       0x1039f9ce8 os_lstat + 168
6   Python                        	       0x1038d369c cfunction_vectorcall_FASTCALL_KEYWORDS + 92
7   Python                        	       0x103977414 _PyEval_EvalFrameDefault + 42244
8   Python                        	       0x10388527c method_vectorcall + 184
9   Python                        	       0x1038842f0 object_vacall + 228
10  Python                        	       0x103884538 PyObject_CallFunctionObjArgs + 56
11  tracer.cpython-312-darwin.so  	       0x1031e6cd4 CTracer_trace + 1228
12  Python                        	       0x1039bbefc call_trace_func + 116
13  Python                        	       0x1039b8548 call_one_instrument + 132
14  Python                        	       0x1039b7dcc call_instrumentation_vector + 288
15  Python                        	       0x10396d2a4 _PyEval_EvalFrameDefault + 916
16  Python                        	       0x103882944 _PyVectorcall_Call + 152
17  Python                        	       0x103979288 _PyEval_EvalFrameDefault + 50040
18  Python                        	       0x103881e60 _PyObject_FastCallDictTstate + 208
19  Python                        	       0x10388331c _PyObject_Call_Prepend + 136
20  Python                        	       0x1038f7ff8 slot_tp_call + 144
21  Python                        	       0x103881fcc _PyObject_MakeTpCall + 128
22  Python                        	       0x103977560 _PyEval_EvalFrameDefault + 42576
23  Python                        	       0x103881e60 _PyObject_FastCallDictTstate + 208
24  Python                        	       0x10388331c _PyObject_Call_Prepend + 136
25  Python                        	       0x1038f7ff8 slot_tp_call + 144
26  Python                        	       0x103882cdc _PyObject_Call + 124
27  Python                        	       0x103978e74 _PyEval_EvalFrameDefault + 48996
28  Python                        	       0x103881e60 _PyObject_FastCallDictTstate + 208
29  Python                        	       0x10388331c _PyObject_Call_Prepend + 136
30  Python                        	       0x1038f7ff8 slot_tp_call + 144
31  Python                        	       0x103881fcc _PyObject_MakeTpCall + 128
32  Python                        	       0x103977560 _PyEval_EvalFrameDefault + 42576
33  Python                        	       0x10388527c method_vectorcall + 184
34  Python                        	       0x103978e74 _PyEval_EvalFrameDefault + 48996
35  Python                        	       0x103881e60 _PyObject_FastCallDictTstate + 208
36  Python                        	       0x10388331c _PyObject_Call_Prepend + 136
37  Python                        	       0x1038f7ff8 slot_tp_call + 144
38  Python                        	       0x103881fcc _PyObject_MakeTpCall + 128
39  Python                        	       0x103977560 _PyEval_EvalFrameDefault + 42576
40  Python                        	       0x103881e60 _PyObject_FastCallDictTstate + 208
41  Python                        	       0x10388331c _PyObject_Call_Prepend + 136
42  Python                        	       0x1038f7ff8 slot_tp_call + 144
43  Python                        	       0x103881fcc _PyObject_MakeTpCall + 128
44  Python                        	       0x103977560 _PyEval_EvalFrameDefault + 42576
45  Python                        	       0x10396cca0 PyEval_EvalCode + 184
46  Python                        	       0x103968f20 builtin_exec + 448
47  Python                        	       0x103978258 _PyEval_EvalFrameDefault + 45896
48  Python                        	       0x10388527c method_vectorcall + 184
49  Python                        	       0x103978e74 _PyEval_EvalFrameDefault + 48996
50  Python                        	       0x10396cca0 PyEval_EvalCode + 184
51  Python                        	       0x1039cfccc run_eval_code_obj + 88
52  Python                        	       0x1039cddac run_mod + 132
53  Python                        	       0x1039cd3f4 PyRun_StringFlags + 124
54  Python                        	       0x103969058 builtin_exec + 760
55  Python                        	       0x1038d369c cfunction_vectorcall_FASTCALL_KEYWORDS + 92
56  Python                        	       0x103977414 _PyEval_EvalFrameDefault + 42244
57  Python                        	       0x10396cca0 PyEval_EvalCode + 184
58  Python                        	       0x1039cfccc run_eval_code_obj + 88
59  Python                        	       0x1039cddac run_mod + 132
60  Python                        	       0x1039cd3f4 PyRun_StringFlags + 124
61  Python                        	       0x1039cd320 PyRun_SimpleStringFlags + 64
62  Python                        	       0x1039f1754 Py_RunMain + 720
63  Python                        	       0x1039f1c40 pymain_main + 304
64  Python                        	       0x1039f1ce0 Py_BytesMain + 40
65  dyld                          	       0x18779ab98 start + 6076

Thread 1:
0   libsystem_kernel.dylib        	       0x187afa7dc read + 8
1   Python                        	       0x1039efab0 _Py_read + 76
2   Python                        	       0x103a1a398 _io_FileIO_readinto + 172
3   Python                        	       0x10388e240 method_vectorcall_FASTCALL_KEYWORDS_METHOD + 136
4   Python                        	       0x103884058 PyObject_VectorcallMethod + 148
5   Python                        	       0x103a1fd24 _bufferedreader_raw_read + 156
6   Python                        	       0x103a1f5e0 _bufferedreader_fill_buffer + 64
7   Python                        	       0x103a20834 _io__Buffered_read + 936
8   Python                        	       0x103978108 _PyEval_EvalFrameDefault + 45560
9   Python                        	       0x103885338 method_vectorcall + 372
10  Python                        	       0x103978e74 _PyEval_EvalFrameDefault + 48996
11  Python                        	       0x10388527c method_vectorcall + 184
12  Python                        	       0x103a4dad8 thread_run + 144
13  Python                        	       0x1039e1e10 pythread_wrapper + 48
14  libsystem_pthread.dylib       	       0x187b3bc0c _pthread_start + 136
15  libsystem_pthread.dylib       	       0x187b36b80 thread_start + 8

Thread 2:
0   libsystem_pthread.dylib       	       0x187b36b6c start_wqthread + 0

Thread 3 Crashed:
0   libsystem_kernel.dylib        	       0x187b02388 __pthread_kill + 8
1   libsystem_pthread.dylib       	       0x187b3b88c pthread_kill + 296
2   libsystem_c.dylib             	       0x187a0cd04 raise + 32
3   Python                        	       0x1039f73a8 faulthandler_fatal_error + 416
4   libsystem_platform.dylib      	       0x187b756a4 _sigtramp + 56
5   _memray.cpython-312-darwin.so 	       0x104f676ac memray::tracking_api::PythonStackTracker::emitPendingPushesAndPops() + 2544
6   _memray.cpython-312-darwin.so 	       0x104f676ac memray::tracking_api::PythonStackTracker::emitPendingPushesAndPops() + 2544
7   _memray.cpython-312-darwin.so 	       0x104f69b84 memray::tracking_api::Tracker::trackAllocationImpl(void*, unsigned long, memray::hooks::Allocator, std::__1::optional<memray::tracking_api::NativeTrace> const&) + 100
8   _memray.cpython-312-darwin.so 	       0x104f3e42c memray::intercept::malloc(unsigned long) + 376
9   Python                        	       0x1038dbdac _PyObject_Malloc + 112
10  Python                        	       0x1039b9058 _Py_Instrument + 1932
11  Python                        	       0x10396d230 _PyEval_EvalFrameDefault + 800
12  Python                        	       0x103885338 method_vectorcall + 372
13  Python                        	       0x103978e74 _PyEval_EvalFrameDefault + 48996
14  Python                        	       0x103885338 method_vectorcall + 372
15  Python                        	       0x103a4dad8 thread_run + 144
16  Python                        	       0x1039e1e10 pythread_wrapper + 48
17  libsystem_pthread.dylib       	       0x187b3bc0c _pthread_start + 136
18  libsystem_pthread.dylib       	       0x187b36b80 thread_start + 8

Thread 4:
0   libsystem_kernel.dylib        	       0x187afd3cc __psynch_cvwait + 8
1   libsystem_pthread.dylib       	       0x187b3c0e0 _pthread_cond_wait + 984
2   libc++.1.dylib                	       0x187a6c300 std::__1::condition_variable::__do_timed_wait(std::__1::unique_lock<std::__1::mutex>&, std::__1::chrono::time_point<std::__1::chrono::system_clock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>) + 104
3   _memray.cpython-312-darwin.so 	       0x104f69634 void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, memray::tracking_api::Tracker::BackgroundThread::start()::$_3>>(void*) + 200
4   libsystem_pthread.dylib       	       0x187b3bc0c _pthread_start + 136
5   libsystem_pthread.dylib       	       0x187b36b80 thread_start + 8


Thread 3 crashed with ARM Thread State (64-bit):
    x0: 0x0000000000000000   x1: 0x0000000000000000   x2: 0x0000000000000001   x3: 0x0000000000000000
    x4: 0x0000000000000073   x5: 0x0000000000000069   x6: 0x0000000000000100   x7: 0x000000039f9061f8
    x8: 0xfca33e065c7d4e37   x9: 0xfca33e05c3ed3e37  x10: 0xcccccccccccccccd  x11: 0x000000000000000a
   x12: 0x000000039f9061d1  x13: 0x0000000000000000  x14: 0x0000000000000032  x15: 0x0000000000000001
   x16: 0x0000000000000148  x17: 0x00000001f6b25558  x18: 0x0000000000000000  x19: 0x000000000000000b
   x20: 0x0000000000001843  x21: 0x000000039f9070e0  x22: 0x0000000103cb0e90  x23: 0x0000000000000000
   x24: 0x0000000000000004  x25: 0x0000000000000000  x26: 0x000000000000015c  x27: 0x000000039e45b3c0
   x28: 0x0000000103d20778   fp: 0x000000039f906250   lr: 0x0000000187b3b88c
    sp: 0x000000039f906230   pc: 0x0000000187b02388 cpsr: 0x40001000
   far: 0x0000000000000000  esr: 0x56000080  Address size fault

So far I have only seen this with Python 3.12. We also run CI with Python 3.8 and it never segfaulted.

Expected Behavior

Shouldn't segfault.

Steps To Reproduce

See details; I haven't figured out a minimal repro yet, but I'm hoping something can be figured out from the C traceback.
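
As a possible starting point for a minimal repro, here is a sketch of the general pattern described above: a thread that calls `asyncio.run()` while memray tracking is active. It is not the code from the PR and is not confirmed to trigger the segfault. The function names, iteration count, and output file name are made up for illustration, and memray is imported optionally so the sketch also runs without it.

```python
import asyncio
import os
import tempfile
import threading

try:
    import memray  # optional; without it only the threading pattern runs
except ImportError:
    memray = None


async def _work() -> int:
    # Placeholder coroutine; the real workload is unknown.
    await asyncio.sleep(0)
    return 42


def run_loop_in_thread(results: list) -> None:
    # asyncio.run() creates a new event loop inside this thread, which is
    # the call path shown for "Current thread" in the faulthandler dump.
    results.append(asyncio.run(_work()))


def exercise(iterations: int = 20) -> list:
    """Start `iterations` short-lived threads, each running its own loop."""
    results: list = []
    for _ in range(iterations):
        thread = threading.Thread(target=run_loop_in_thread, args=(results,))
        thread.start()
        thread.join()
    return results


def main() -> list:
    if memray is None:
        return exercise()
    with tempfile.TemporaryDirectory() as tmp:
        # Write to a fresh path inside a temporary directory so the output
        # file never exists beforehand.
        with memray.Tracker(os.path.join(tmp, "sketch.bin")):
            return exercise()


if __name__ == "__main__":
    print(len(main()), "loops completed")
```

In the faulthandler dump, the main thread is inside the pytest-memray wrapper's `__enter__` at the time of the crash, so a variant that starts the thread before or during tracker startup may be worth trying as well.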

Memray Version

1.18.0

Python Version

3.12

Operating System

Linux, macOS

Anything else?

No response

Labels

bug