Bump zenoh to 1.8.0 - 2nd attempt (backport #964) #965

Merged
JEnoch merged 1 commit into kilted from mergify/bp/kilted/pr-964 on Apr 13, 2026

@mergify mergify Bot commented Apr 13, 2026

Summary

Key Changes

Root Cause (hang)

eclipse-zenoh/zenoh@e5db0ce changed Session::close() to call wait_callbacks() internally, blocking until all in-flight callbacks finish. The old teardown order let session_.reset() run while rmw entities (nodes, subscriptions) still held shared_ptr references. The session was only destroyed later inside ~Data() during nodes_.clear() — at which point callback handlers were being torn down simultaneously, causing a deadlock or STATUS_STACK_BUFFER_OVERRUN on Windows.

The fix calls session_->close() explicitly in shutdown(), at which point rclcpp::shutdown() has already exited the spin loop so no callbacks are in-flight. wait_callbacks() returns immediately, and the subsequent destructor path finds is_closed() == true and skips the blocking call.
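
The teardown order described above can be illustrated with a minimal sketch. It uses hypothetical stand-in types (`MockSession`, `MockEntity`, `MockContext`), not the zenoh-cpp or rmw_zenoh classes, and a counter stands in for the point where the real `close()` would block in `wait_callbacks()`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a Zenoh session; NOT the zenoh-cpp API.
class MockSession {
public:
  explicit MockSession(int * blocking_closes) : blocking_closes_(blocking_closes) {}

  ~MockSession() {
    // Mirrors the destructor path described above: skip close() if already closed.
    if (!closed_) {
      close();
    }
  }

  void close() {
    if (closed_) {
      return;
    }
    // In the real library this is where wait_callbacks() would block.
    ++(*blocking_closes_);
    closed_ = true;
  }

  bool is_closed() const { return closed_; }

private:
  int * blocking_closes_;
  bool closed_ = false;
};

// Hypothetical rmw entity (node, subscription, ...) sharing the session.
struct MockEntity {
  std::shared_ptr<MockSession> session;
};

// Hypothetical context that owns the primary session reference.
class MockContext {
public:
  explicit MockContext(int * counter)
  : session_(std::make_shared<MockSession>(counter)) {}

  std::shared_ptr<MockSession> session() const { return session_; }

  void shutdown() {
    if (session_) {
      session_->close();  // explicit close while no callbacks are in flight
      session_.reset();   // entities may still hold references after this
    }
  }

private:
  std::shared_ptr<MockSession> session_;
};

// Runs the scenario and returns how many times the "blocking" close ran.
inline int teardown_close_count() {
  int closes = 0;
  {
    MockContext ctx(&closes);
    MockEntity entity{ctx.session()};
    ctx.shutdown();
    assert(entity.session->is_closed());  // closed, but still alive via the entity
  }  // last reference dropped here; the destructor skips close()
  return closes;
}
```

In this sketch the blocking step runs exactly once, inside `shutdown()`, even though the session object outlives the context's own reference.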

Root Cause (ANSI codes, #951)

Zenoh 1.8.0 emits a new error log at Session shutdown when a TCP link is closed at the same time and Zenoh fails to send an event to an already removed callback.
The Rust logger (env_logger) emits ANSI color escape sequences by default. These bled into captured output from ros2 param commands, causing yaml.reader.ReaderError when the output was parsed as YAML.
ros2topic.ros2topic.test.test_cli.test_cli also parses the test output and fails on this error log.

The fix is in Zenoh (commit eclipse-zenoh/zenoh@2687c51), which removes those logs.
This PR makes rmw_zenoh use this commit.
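
To illustrate the failure mode only (this is not code from the PR), the sketch below checks captured output for bytes that a YAML reader rejects as non-printable. The helper name and the sample log line are hypothetical, and the check is simplified: it covers only ASCII control characters and does not validate multi-byte UTF-8 input.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns true if `text` contains an ASCII control byte
// outside YAML's printable set (tab, LF and CR are allowed; ESC and DEL are not).
inline bool has_yaml_unacceptable_byte(const std::string & text) {
  for (unsigned char c : text) {
    const bool allowed_control = (c == '\t' || c == '\n' || c == '\r');
    if ((c < 0x20 && !allowed_control) || c == 0x7F) {
      return true;
    }
  }
  return false;
}
```

A colored log line such as `"\x1b[31mERROR\x1b[0m ..."` prepended to otherwise valid parameter output makes this check return true, which matches the ESC character being reported as unacceptable by the YAML reader.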

Related

Breaking Changes

None


Did you use Generative AI?

Yes. Claude (claude-sonnet-4-6) via Claude Code was used to assist with root cause analysis, reproducing the bug on Windows, and creating an initial prototype of the changes in this PR.


This is an automatic backport of pull request #964 done by Mergify.

* chore(zenoh_cpp_vendor): bump to latest zenoh-c and zenoh-cpp

- zenoh-c main: 102df1a3 (2026-04-10)
- zenoh-c ROS/rust-1.75: 0193595c (2026-04-07)
- zenoh-cpp main: af381b42 (2026-04-10)

* fix: close session explicitly in shutdown() to prevent hang on Windows

zenoh commit e5db0ce changed session.close() to call wait_callbacks(),
which blocks until all in-flight callbacks finish. With the older
teardown order, session_.reset() was called while node-level entities
(publishers, subscriptions, etc.) still held shared_ptr<Session> refs,
so the session wasn't actually destroyed until ~Data() called
nodes_.clear() — at which point wait_callbacks() would deadlock against
callbacks being concurrently destroyed on Windows.

Fix: call session_->close() explicitly in shutdown() before
session_.reset(). At shutdown time the spin loop has already exited,
so no callbacks are in-flight and wait_callbacks() returns immediately.
The session is then marked closed; when the shared_ptr refcount
eventually drops to zero during normal rcl teardown, the session
destructor finds is_closed()==true and skips the blocking close().

* chore(zenoh_cpp_vendor): restore get_cargo_version.cmake from #945

Extract cargo version detection into a reusable CMake function instead
of inlining execute_process, matching the approach from PR #945.

* fix: disable ANSI color codes in Zenoh log output (#951)

Set RUST_LOG_STYLE=never before initializing the Zenoh logger so that
color escape sequences do not leak into captured command output. This
fixes YAML parsing failures in ros2param tests where the ESC character
was treated as an unacceptable character.

The env var is set with overwrite=0 so callers can still override it.

* Use zenoh-c commits for Zenoh 1.8.0 + #2493

* Fix synchronization due to changes in undeclare in zenoh 1.8.0

This commit re-applies changes made in #935, while keeping the explicit call to session_->close() added in rmw_context_impl_s::shutdown()

* Use zenoh 2687c5135

eclipse-zenoh/zenoh@2687c51

from branch https://github.com/eclipse-zenoh/zenoh/tree/suppress-admin-err-message-on-session-close

based on 1.8.0 plus a few fixes, including removal of an error log at closure that caused a ros2cli test to fail

* revert disable ANSI color codes in Zenoh log output

---------

Co-authored-by: Julien Enoch <julien.e@zettascale.tech>
(cherry picked from commit ba1ab30)
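
The `RUST_LOG_STYLE` commit in the list above (reverted by the last commit of this PR) relies on setting an environment variable without overwriting a caller's choice. A minimal sketch of that pattern, assuming a POSIX platform with `setenv` and using hypothetical helper names that are not taken from rmw_zenoh:

```cpp
#include <cassert>
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // setenv, unsetenv (POSIX; not available on Windows)
#include <string>

// Hypothetical helper: default RUST_LOG_STYLE to "never" unless already set.
inline bool set_default_log_style() {
  // overwrite = 0 keeps any value the caller has already exported.
  return setenv("RUST_LOG_STYLE", "never", 0) == 0;
}

// Hypothetical helper: read back the current value (empty string if unset).
inline std::string current_log_style() {
  const char * value = std::getenv("RUST_LOG_STYLE");
  return value ? std::string(value) : std::string();
}
```

With the variable unset, calling `set_default_log_style()` leaves it at `never`; if the caller exported `RUST_LOG_STYLE=always` beforehand, that value is preserved.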

JEnoch commented Apr 13, 2026

Pulls: #965
Gist: https://gist.githubusercontent.com/JEnoch/c8228f53dd1f71dfaefffc877eb705ca/raw/4a5ce77ce1a782935ba8a273862d65f110f58ab6/ros2.repos
BUILD args: "--continue-on-error" --packages-above-and-dependencies zenoh_cpp_vendor zenoh_security_tools rmw_zenoh_cpp
TEST args: --packages-above zenoh_cpp_vendor zenoh_security_tools rmw_zenoh_cpp
ROS Distro: kilted
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/18938

  • Linux Build Status (Jenkins disconnection; rebuild: Build Status)
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status


JEnoch commented Apr 13, 2026

CI failures on Linux-rhel are related to rmw_cyclonedds.
CI failures on Windows are due to missing test isolation.
Those have been fixed in rolling by the PRs listed here: #881 (comment), but those PRs have not yet been backported to the kilted branches.

@JEnoch JEnoch merged commit a364db9 into kilted Apr 13, 2026
6 checks passed
@JEnoch JEnoch deleted the mergify/bp/kilted/pr-964 branch April 13, 2026 17:54