I recently ran into an issue with a single-node deployment of Cassandra in a Docker Swarm stack. When I used a lightweight transaction (LWT, INSERT ... IF NOT EXISTS), the statement would block and then time out.
This appears in the logs when the error occurs:
DEBUG [MessagingService-Outgoing-/10.0.0.9-Small] 2018-06-22 20:52:02,656 OutboundTcpConnection.java:545 - Unable to connect to /10.0.0.9
java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_171]
at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_171]
at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_171]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_171]
at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:146) ~[apache-cassandra-3.11.2.jar:3.11.2]
at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:132) ~[apache-cassandra-3.11.2.jar:3.11.2]
at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:433) [apache-cassandra-3.11.2.jar:3.11.2]
at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:262) [apache-cassandra-3.11.2.jar:3.11.2]
After some diagnosis, I concluded that Cassandra was trying to coordinate the LWT using the wrong IP address. The refused connection above is to 10.0.0.9, which the interface listing below shows is the VIP bound to the loopback interface (lo), not the container's own eth0 address (10.0.0.10). This configuration was auto-generated by the docker-entrypoint.sh script, which has a comment saying that _ip_address should pick the container IP.
root@87a29d37bdd9:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 10.0.0.9/32 brd 10.0.0.9 scope global lo
valid_lft forever preferred_lft forever
12: eth0@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether 02:42:0a:00:00:0a brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.0.10/24 brd 10.0.0.255 scope global eth0
valid_lft forever preferred_lft forever
14: eth1@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 172.18.0.3/16 brd 172.18.255.255 scope global eth1
valid_lft forever preferred_lft forever
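To see why the VIP wins, the original awk filter from _ip_address can be run against the inet lines of the listing above. This is a minimal sketch. The helper name original_filter is hypothetical. The awk body is the pre-patch filter from docker-entrypoint.sh. The printf input is an abbreviated copy of the listing, and the filter only looks at the inet lines anyway.

```shell
#!/bin/sh
# Hypothetical helper wrapping the pre-patch awk filter from _ip_address.
# Reads `ip address` output on stdin and prints the first IPv4 address
# that does not start with 127.
original_filter() {
    awk '
        $1 == "inet" && $2 !~ /^127[.]/ {
            gsub(/\/.+$/, "", $2)   # strip the /prefix length
            print $2
            exit
        }
    '
}

# The inet lines from the `ip addr` listing above, in the same order.
printf '%s\n' \
    '    inet 127.0.0.1/8 scope host lo' \
    '    inet 10.0.0.9/32 brd 10.0.0.9 scope global lo' \
    '    inet 10.0.0.10/24 brd 10.0.0.255 scope global eth0' \
    '    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth1' |
    original_filter   # prints 10.0.0.9: the /32 VIP on lo is listed first
```

The "container IP should always be first" assumption in the script's comment does not hold here, because the VIP lives on lo, and lo is always listed before eth0.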
In this case, I fixed the issue with the following change to docker-entrypoint.sh:
diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh
index 871f7f4..77e6212 100644
--- a/docker-entrypoint.sh
+++ b/docker-entrypoint.sh
@@ -17,7 +17,7 @@ _ip_address() {
# scrape the first non-localhost IP address of the container
# in Swarm Mode, we often get two IPs -- the container IP, and the (shared) VIP, and the container IP should always be first
ip address | awk '
- $1 == "inet" && $2 !~ /^127[.]/ {
+ $1 == "inet" && $2 !~ /^127[.]/ && $NF != "lo" {
gsub(/\/.+$/, "", $2)
print $2
exit
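The patched condition can be checked the same way. This minimal sketch assumes that `ip address` prints the interface label as the last field ($NF) of each inet line, as it does in both listings in this issue. The helper name patched_filter is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper wrapping the patched awk filter: same as before, but
# any address whose interface label (last field) is "lo" is skipped too.
patched_filter() {
    awk '
        $1 == "inet" && $2 !~ /^127[.]/ && $NF != "lo" {
            gsub(/\/.+$/, "", $2)   # strip the /prefix length
            print $2
            exit
        }
    '
}

# The inet lines from the first `ip addr` listing in this issue.
printf '%s\n' \
    '    inet 127.0.0.1/8 scope host lo' \
    '    inet 10.0.0.9/32 brd 10.0.0.9 scope global lo' \
    '    inet 10.0.0.10/24 brd 10.0.0.255 scope global eth0' \
    '    inet 172.18.0.3/16 brd 172.18.255.255 scope global eth1' |
    patched_filter   # prints 10.0.0.10, the eth0 address
```

Keying on the interface label keeps the script's "first address wins" rule. It also makes the result independent of where Swarm attaches the VIP, as long as that VIP stays on lo.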
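This makes things look a lot saner. In a fresh container, listen_address and broadcast_address now point at the eth0 address (10.0.1.6) rather than the VIP on lo (10.0.1.3):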
root@8a3a665ee17f:/# grep _address: /etc/cassandra/cassandra.yaml
listen_address: 10.0.1.6
broadcast_address: 10.0.1.6
# listen_on_broadcast_address: false
rpc_address: 0.0.0.0
broadcast_rpc_address: 10.0.1.6
root@8a3a665ee17f:/# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 10.0.1.3/32 brd 10.0.1.3 scope global lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
3: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN group default qlen 1
link/tunnel6 :: brd ::
71: eth0@if72: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
link/ether 02:42:0a:00:01:06 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.1.6/24 brd 10.0.1.255 scope global eth0
valid_lft forever preferred_lft forever
93: eth1@if94: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:ac:18:00:12 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 172.24.0.18/16 brd 172.24.255.255 scope global eth1
valid_lft forever preferred_lft forever
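To double-check a running container, one can look up which interface carries the configured listen_address. This minimal sketch rests on three assumptions: cassandra.yaml is at the path shown above, listen_address appears on its own uncommented line, and `ip address` prints the interface label as the last field of each inet line. The function name listen_iface is hypothetical.

```shell
#!/bin/sh
# Hypothetical sanity check: print the interface that carries the
# listen_address configured in cassandra.yaml.
#   $1     path to cassandra.yaml
#   stdin  output of `ip address`
listen_iface() {
    # Commented lines start with "#", so $1 never matches them.
    want=$(awk '$1 == "listen_address:" { print $2; exit }' "$1")
    awk -v want="$want" '
        $1 == "inet" {
            split($2, parts, "/")   # drop the /prefix length
            if (parts[1] == want) { print $NF; exit }
        }
    '
}

# Example, inside the container:
#   ip address | listen_iface /etc/cassandra/cassandra.yaml
# In the listing above this would print eth0; "lo" would mean the VIP was chosen.
```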
Ignoring lo entirely makes a ton of sense -- do you want to make a PR to update all the versions/variants with that change, or would you prefer I carry your change from here? 🙏