Description
During multi-sled deployment testing with falcon, the static unique local underlay0/bootstrap6 address fails to initialize properly, and we end up with the following (note the question mark in place of the address object name):
underlay0/linklocal addrconf ok fe80::8:20ff:fe95:d1b7/10
underlay0/? static ok fdb0:208:20c7:ea0f::1/64
The following error message gets written to the sled-agent svc log:
sled-agent: Failed to initialize bootstrap address: Failed to create address fdb0:208:20c7:ea0f::1 with name bootstrap6 in the GZ on "underlay0": Failed to create address Static(V6(Ipv6Network { addr: fdb0:208:20c7:ea0f::1, prefix: 64 })) with name underlay0/bootstrap6 in global: Zone execution error: Command [/usr/sbin/ipadm create-addr -t -T static -a fdb0:208:20c7:ea0f::1/64 underlay0/bootstrap6] executed and failed with status: exit status: 1. Stdout:
The creation of the static IPv6 address must occur after the creation of the link-local address on the same interface (underlay0 in this case). After some testing, and based on prior discoveries, @rcgoodfellow realized that there appears to be a race condition. The call to ipadm to create the link-local address is not fully synchronous: it returns before the kernel has finished initializing the address. When the call to create the static address occurs too soon after the creation of the link-local address, we get the failure above.
This theory was further verified by inserting a 10-second sleep in zone::ensure_has_global_zone_v6_address, between the call to create the link-local address and the call to create the static address. With that sleep inserted, the static address is created successfully as underlay0/bootstrap6.
The ideal fix for this is to make the call to ipadm synchronous inside illumos. As a workaround, though, we can confirm that the link-local address has been successfully created before we try to create the static underlay0 address.
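
A minimal sketch of what that workaround could look like, replacing the fixed sleep with a bounded poll. This is not the sled-agent implementation: the helper names (state_is_ok, addrobj_is_ok, wait_until) are hypothetical, and it assumes that a STATE of "ok" in the parsable output of ipadm show-addr is a sufficient signal that the link-local address is ready, which would need to be confirmed against the race described above.

```rust
use std::process::Command;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Returns true if the parsable output of `ipadm show-addr -p -o state`
/// contains a line that is exactly "ok".
/// Assumption: "ok" means the kernel has finished setting up the address.
fn state_is_ok(stdout: &str) -> bool {
    stdout.lines().any(|line| line.trim() == "ok")
}

/// Hypothetical helper: asks ipadm for the state of a single address
/// object, for example "underlay0/linklocal". Any failure to run the
/// command or a non-zero exit status is treated as "not ready yet".
fn addrobj_is_ok(addrobj: &str) -> bool {
    Command::new("/usr/sbin/ipadm")
        .args(["show-addr", "-p", "-o", "state", addrobj])
        .output()
        .map(|out| {
            out.status.success()
                && state_is_ok(&String::from_utf8_lossy(&out.stdout))
        })
        .unwrap_or(false)
}

/// Calls `ready` repeatedly until it returns true or `timeout` elapses.
/// Returns true on success and false on timeout.
fn wait_until(
    mut ready: impl FnMut() -> bool,
    interval: Duration,
    timeout: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

Under these assumptions, the code path that creates underlay0/bootstrap6 would first call something like wait_until(|| addrobj_is_ok("underlay0/linklocal"), Duration::from_millis(100), Duration::from_secs(10)) and only run ipadm create-addr for the static address once that returns true. The interval and timeout values here are placeholders, not measured numbers.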