Bump zenoh to 1.8.0 - 2nd attempt #964
- zenoh-c main: 102df1a3 (2026-04-10)
- zenoh-c ROS/rust-1.75: 0193595c (2026-04-07)
- zenoh-cpp main: af381b42 (2026-04-10)
zenoh commit e5db0ce changed session.close() to call wait_callbacks(), which blocks until all in-flight callbacks finish. With the older teardown order, session_.reset() was called while node-level entities (publishers, subscriptions, etc.) still held shared_ptr<Session> refs, so the session wasn't actually destroyed until ~Data() called nodes_.clear() — at which point wait_callbacks() would deadlock against callbacks being concurrently destroyed on Windows. Fix: call session_->close() explicitly in shutdown() before session_.reset(). At shutdown time the spin loop has already exited, so no callbacks are in-flight and wait_callbacks() returns immediately. The session is then marked closed; when the shared_ptr refcount eventually drops to zero during normal rcl teardown, the session destructor finds is_closed()==true and skips the blocking close().
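The teardown order described above can be sketched with a small mock. This is a minimal illustration, not the rmw_zenoh code: `MockSession`, `Data`, `entity_refs_`, and `simulate_teardown()` are hypothetical names, and the mock only counts calls to `close()` where the real session would block in `wait_callbacks()`.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a Zenoh session; this is NOT the zenoh-cpp API.
class MockSession {
public:
  explicit MockSession(int * close_calls) : close_calls_(close_calls) {}

  ~MockSession() {
    // Mirrors the behavior described above: the destructor only closes
    // the session if nobody closed it explicitly beforehand.
    if (!is_closed()) {
      close();
    }
  }

  void close() {
    // In the real library this is where wait_callbacks() would block.
    ++(*close_calls_);
    closed_ = true;
  }

  bool is_closed() const { return closed_; }

private:
  int * close_calls_;
  bool closed_ = false;
};

// Hypothetical context data holding the session plus entity references.
struct Data {
  std::shared_ptr<MockSession> session_;
  // Stands in for publishers, subscriptions, etc. that share the session.
  std::vector<std::shared_ptr<MockSession>> entity_refs_;

  void shutdown() {
    // Close explicitly while no callbacks are in flight, then drop our ref.
    if (session_ && !session_->is_closed()) {
      session_->close();
    }
    session_.reset();
  }
};

struct TeardownResult {
  bool closed_after_shutdown;
  int close_calls;
};

TeardownResult simulate_teardown() {
  int close_calls = 0;
  bool closed_after_shutdown = false;
  {
    Data data;
    data.session_ = std::make_shared<MockSession>(&close_calls);
    data.entity_refs_.push_back(data.session_);  // an entity keeps it alive
    data.shutdown();
    // The session object still exists (refcount > 0) but is marked closed.
    closed_after_shutdown = data.entity_refs_.front()->is_closed();
    // Leaving this scope drops the last reference; the destructor sees
    // is_closed() == true and does not call close() a second time.
  }
  return {closed_after_shutdown, close_calls};
}
```

Calling `simulate_teardown()` shows that the session is already closed while an entity still holds a reference, and that `close()` runs exactly once over the whole teardown.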
Extract cargo version detection into a reusable CMake function instead of inlining execute_process, matching the approach from PR ros2#945.
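The version check itself lives in CMake, but the decision it feeds can be sketched in C++ for illustration. This assumes that `cargo --version` prints a line starting with `cargo <major>.<minor>.<patch>`, and it uses the branch names and the Rust < 1.88 threshold mentioned in this PR's summary. `CargoVersion`, `parse_cargo_version()`, and `select_zenoh_c_branch()` are hypothetical names, not functions from zenoh_cpp_vendor.

```cpp
#include <sstream>
#include <string>

struct CargoVersion {
  int major = 0;
  int minor = 0;
  int patch = 0;
};

// Parses text shaped like "cargo 1.75.0 ..." (assumed output format).
// Returns false if the text does not match that shape.
bool parse_cargo_version(const std::string & output, CargoVersion & version) {
  std::istringstream in(output);
  std::string tool;
  char dot1 = 0;
  char dot2 = 0;
  CargoVersion parsed;
  if (!(in >> tool >> parsed.major >> dot1 >> parsed.minor >> dot2 >> parsed.patch)) {
    return false;
  }
  if (tool != "cargo" || dot1 != '.' || dot2 != '.') {
    return false;
  }
  version = parsed;
  return true;
}

// Picks the zenoh-c branch: the rust-1.75 fallback for toolchains older
// than 1.88, the regular branch otherwise (names taken from this PR).
std::string select_zenoh_c_branch(const CargoVersion & version) {
  const bool older_than_1_88 =
    version.major < 1 || (version.major == 1 && version.minor < 88);
  return older_than_1_88 ? "rust-1.75-zenoh-2687c5135" : "ROS/zenoh-2687c5135";
}
```

Keeping the parsing in one function, as the CMake refactoring does, means the threshold comparison is written once instead of being repeated at each call site.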
Set RUST_LOG_STYLE=never before initializing the Zenoh logger so that color escape sequences do not leak into captured command output. This fixes YAML parsing failures in ros2param tests where the ESC character was treated as an unacceptable character. The env var is set with overwrite=0 so callers can still override it.
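A minimal sketch of the "default without overriding" behavior, assuming a POSIX platform where `setenv()` is available (Windows would need a different call). `apply_default_log_style()` is a hypothetical helper name, not a function from rmw_zenoh. Note that later commits in this PR revert this change in favor of removing the offending logs in Zenoh itself.

```cpp
#include <stdlib.h>
#include <string>

// Hypothetical helper: applies the default and returns the value of
// RUST_LOG_STYLE that is in effect afterwards.
std::string apply_default_log_style() {
  // The third argument (overwrite = 0) keeps any value the caller
  // already exported, so the default only applies when it is unset.
  setenv("RUST_LOG_STYLE", "never", 0);
  const char * value = getenv("RUST_LOG_STYLE");
  return value ? std::string(value) : std::string();
}
```

With the variable unset, the helper yields `never`; if the caller exported `RUST_LOG_STYLE=always` beforehand, that value is left in place.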
This commit re-applies changes made in ros2#935, while keeping the explicit call to session_->close() added in rmw_context_impl_s::shutdown()
Pulls: #964
eclipse-zenoh/zenoh@2687c51 from branch https://github.com/eclipse-zenoh/zenoh/tree/suppress-admin-err-message-on-session-close, based on 1.8.0 plus a few fixes, including removal of an error log at closure that caused a ros2cli test to fail
The fix for ANSI color escape was not working and not sufficient for
Those error logs in Zenoh are not legitimate anyway, since they occur at Session closure when it tries to call an already removed callback on a link closure event.
In a branch based on version 1.8.0 I removed those error logs in this commit:
In distinct branches, zenoh-c is using this commit:
This PR is now using those branches.
CI is now green on all Linux platforms. @sloretz OK to merge before the RMW freeze?
Will leave the final call to @sloretz, but this is much better than where we were a week ago.
sloretz
left a comment
Thank you for resolving those two issues! The Windows CI results LGTM
https://github.com/Mergifyio backport kilted jazzy humble
✅ Backports have been created
* chore(zenoh_cpp_vendor): bump to latest zenoh-c and zenoh-cpp
  - zenoh-c main: 102df1a3 (2026-04-10)
  - zenoh-c ROS/rust-1.75: 0193595c (2026-04-07)
  - zenoh-cpp main: af381b42 (2026-04-10)
* fix: close session explicitly in shutdown() to prevent hang on Windows
  zenoh commit e5db0ce changed session.close() to call wait_callbacks(), which blocks until all in-flight callbacks finish. With the older teardown order, session_.reset() was called while node-level entities (publishers, subscriptions, etc.) still held shared_ptr<Session> refs, so the session wasn't actually destroyed until ~Data() called nodes_.clear(), at which point wait_callbacks() would deadlock against callbacks being concurrently destroyed on Windows.
  Fix: call session_->close() explicitly in shutdown() before session_.reset(). At shutdown time the spin loop has already exited, so no callbacks are in-flight and wait_callbacks() returns immediately. The session is then marked closed; when the shared_ptr refcount eventually drops to zero during normal rcl teardown, the session destructor finds is_closed()==true and skips the blocking close().
* chore(zenoh_cpp_vendor): restore get_cargo_version.cmake from #945
  Extract cargo version detection into a reusable CMake function instead of inlining execute_process, matching the approach from PR #945.
* fix: disable ANSI color codes in Zenoh log output (#951)
  Set RUST_LOG_STYLE=never before initializing the Zenoh logger so that color escape sequences do not leak into captured command output. This fixes YAML parsing failures in ros2param tests where the ESC character was treated as an unacceptable character. The env var is set with overwrite=0 so callers can still override it.
* Use zenoh-c commits for Zenoh 1.8.0 + #2493
* Fix synchronization due to changes in undeclare in zenoh 1.8.0
  This commit re-applies changes made in #935, while keeping the explicit call to session_.close() added in rmw_context_impl_s::shutdown()
* Use zenoh 2687c5135
  eclipse-zenoh/zenoh@2687c51 from branch https://github.com/eclipse-zenoh/zenoh/tree/suppress-admin-err-message-on-session-close based on 1.8.0 plus few fixes, including removal of a error log at closure causing failure of a ros2cli test
* revert disable ANSI color codes in Zenoh log output
---------
Co-authored-by: Julien Enoch <julien.e@zettascale.tech>
(cherry picked from commit ba1ab30)
* chore(zenoh_cpp_vendor): bump to latest zenoh-c and zenoh-cpp - zenoh-c main: 102df1a3 (2026-04-10) - zenoh-c ROS/rust-1.75: 0193595c (2026-04-07) - zenoh-cpp main: af381b42 (2026-04-10) * fix: close session explicitly in shutdown() to prevent hang on Windows zenoh commit e5db0ce changed session.close() to call wait_callbacks(), which blocks until all in-flight callbacks finish. With the older teardown order, session_.reset() was called while node-level entities (publishers, subscriptions, etc.) still held shared_ptr<Session> refs, so the session wasn't actually destroyed until ~Data() called nodes_.clear() — at which point wait_callbacks() would deadlock against callbacks being concurrently destroyed on Windows. Fix: call session_->close() explicitly in shutdown() before session_.reset(). At shutdown time the spin loop has already exited, so no callbacks are in-flight and wait_callbacks() returns immediately. The session is then marked closed; when the shared_ptr refcount eventually drops to zero during normal rcl teardown, the session destructor finds is_closed()==true and skips the blocking close(). * chore(zenoh_cpp_vendor): restore get_cargo_version.cmake from #945 Extract cargo version detection into a reusable CMake function instead of inlining execute_process, matching the approach from PR #945. * fix: disable ANSI color codes in Zenoh log output (#951) Set RUST_LOG_STYLE=never before initializing the Zenoh logger so that color escape sequences do not leak into captured command output. This fixes YAML parsing failures in ros2param tests where the ESC character was treated as an unacceptable character. The env var is set with overwrite=0 so callers can still override it. 
* Use zenoh-c commits for Zenoh 1.8.0 + #2493 * Fix synchronization due to changes in undeclare in zenoh 1.8.0 This commit re-applies changes made in #935 , while keeping the explicit call to session_.close() added in rmw_context_impl_s::shutdown() * Use zenoh 2687c5135 eclipse-zenoh/zenoh@2687c51 from branch https://github.com/eclipse-zenoh/zenoh/tree/suppress-admin-err-message-on-session-close based on 1.8.0 plus few fixes, including removal of a error log at closure causing failure of a ros2cli test * revert disable ANSI color codes in Zenoh log output --------- (cherry picked from commit ba1ab30) Co-authored-by: Yuyuan Yuan <az6980522@gmail.com> Co-authored-by: Julien Enoch <julien.e@zettascale.tech>
* Bump zenoh to 1.8.0 - 2nd attempt (#964) * chore(zenoh_cpp_vendor): bump to latest zenoh-c and zenoh-cpp - zenoh-c main: 102df1a3 (2026-04-10) - zenoh-c ROS/rust-1.75: 0193595c (2026-04-07) - zenoh-cpp main: af381b42 (2026-04-10) * fix: close session explicitly in shutdown() to prevent hang on Windows zenoh commit e5db0ce changed session.close() to call wait_callbacks(), which blocks until all in-flight callbacks finish. With the older teardown order, session_.reset() was called while node-level entities (publishers, subscriptions, etc.) still held shared_ptr<Session> refs, so the session wasn't actually destroyed until ~Data() called nodes_.clear() — at which point wait_callbacks() would deadlock against callbacks being concurrently destroyed on Windows. Fix: call session_->close() explicitly in shutdown() before session_.reset(). At shutdown time the spin loop has already exited, so no callbacks are in-flight and wait_callbacks() returns immediately. The session is then marked closed; when the shared_ptr refcount eventually drops to zero during normal rcl teardown, the session destructor finds is_closed()==true and skips the blocking close(). * chore(zenoh_cpp_vendor): restore get_cargo_version.cmake from #945 Extract cargo version detection into a reusable CMake function instead of inlining execute_process, matching the approach from PR #945. * fix: disable ANSI color codes in Zenoh log output (#951) Set RUST_LOG_STYLE=never before initializing the Zenoh logger so that color escape sequences do not leak into captured command output. This fixes YAML parsing failures in ros2param tests where the ESC character was treated as an unacceptable character. The env var is set with overwrite=0 so callers can still override it. 
* Use zenoh-c commits for Zenoh 1.8.0 + #2493 * Fix synchronization due to changes in undeclare in zenoh 1.8.0 This commit re-applies changes made in #935 , while keeping the explicit call to session_.close() added in rmw_context_impl_s::shutdown() * Use zenoh 2687c5135 eclipse-zenoh/zenoh@2687c51 from branch https://github.com/eclipse-zenoh/zenoh/tree/suppress-admin-err-message-on-session-close based on 1.8.0 plus few fixes, including removal of a error log at closure causing failure of a ros2cli test * revert disable ANSI color codes in Zenoh log output --------- Co-authored-by: Julien Enoch <julien.e@zettascale.tech> (cherry picked from commit ba1ab30) * Make uncrustify happy --------- Co-authored-by: Yuyuan Yuan <az6980522@gmail.com> Co-authored-by: Julien Enoch <julien.e@zettascale.tech>
Summary
- Close the session explicitly in shutdown() before releasing the shared_ptr reference
- Bump zenoh_cpp_vendor to latest zenoh-c and zenoh-cpp, restoring get_cargo_version.cmake from #945 (Build against rust >= 1.75 for ROS Lyrical)
Key Changes
- rmw_context_impl_s.cpp: call session_->close() before session_.reset() in Data::shutdown()
- zenoh_cpp_vendor/CMakeLists.txt: update to zenoh-c commit from ROS/zenoh-2687c5135 branch; add fallback to rust-1.75-zenoh-2687c5135 branch for Rust < 1.88
- zenoh_cpp_vendor/get_cargo_version.cmake: restored from #945 (Build against rust >= 1.75 for ROS Lyrical)
Root Cause (hang)
eclipse-zenoh/zenoh@e5db0ce changed Session::close() to call wait_callbacks() internally, blocking until all in-flight callbacks finish. The old teardown order let session_.reset() run while rmw entities (nodes, subscriptions) still held shared_ptr references. The session was only destroyed later inside ~Data() during nodes_.clear(), at which point callback handlers were being torn down simultaneously, causing a deadlock or STATUS_STACK_BUFFER_OVERRUN on Windows.
The fix calls session_->close() explicitly in shutdown(), at which point rclcpp::shutdown() has already exited the spin loop so no callbacks are in-flight. wait_callbacks() returns immediately, and the subsequent destructor path finds is_closed() == true and skips the blocking call.
Root Cause (ANSI codes, #951)
Zenoh 1.8.0 emits a new error log at Session shutdown when a TCP link is closed at the same time and Zenoh fails to send an event to an already removed callback.
The Rust logger (env_logger) emits ANSI color escape sequences by default. These bled into captured output from ros2 param commands, causing yaml.reader.ReaderError when the output was parsed as YAML.
ros2topic.ros2topic.test.test_cli.test_cli also parses the test output and fails on this error log.
The fix is in Zenoh (commit eclipse-zenoh/zenoh@2687c51), which removes those logs.
This PR makes rmw_zenoh use this commit.
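To illustrate the failure mode only (this is not what the PR does, since the fix removes the logs in Zenoh): a colored log line carries the ESC byte (0x1B), which is outside the set of printable characters YAML accepts, so a parser reading the captured output rejects it. The sketch below detects and strips CSI color sequences. `contains_ansi_escape()` and `strip_ansi_csi()` are hypothetical helpers, and the sample log text in the usage note is made up.

```cpp
#include <cstddef>
#include <string>

// Returns true if the text contains the ESC byte (0x1B) that introduces
// ANSI escape sequences.
bool contains_ansi_escape(const std::string & text) {
  return text.find('\x1b') != std::string::npos;
}

// Removes CSI sequences of the form ESC '[' <parameters> <final byte>,
// where the final byte lies in the range 0x40..0x7E (e.g. 'm' for colors).
std::string strip_ansi_csi(const std::string & text) {
  std::string out;
  std::size_t i = 0;
  while (i < text.size()) {
    if (text[i] == '\x1b' && i + 1 < text.size() && text[i + 1] == '[') {
      i += 2;  // skip ESC and '['
      while (i < text.size() && !(text[i] >= 0x40 && text[i] <= 0x7e)) {
        ++i;  // skip parameter and intermediate bytes
      }
      if (i < text.size()) {
        ++i;  // consume the final byte
      }
    } else {
      out += text[i++];
    }
  }
  return out;
}
```

For example, `strip_ansi_csi("\x1b[31mERROR\x1b[0m link closed")` yields `ERROR link closed`. The plain-text log line would still be present in the captured output, which is in line with the PR's remark that disabling colors alone was not sufficient.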
Related
get_cargo_version.cmake)
Breaking Changes
None
Did you use Generative AI?
Yes. Claude (claude-sonnet-4-6) via Claude Code was used to assist with root cause analysis, reproducing the bug on Windows, and creating an initial prototype of the changes in this PR.