Routing not properly working in DC/OS #233
Hello @neurofoo, thanks for reporting this problem. Attached code works fine for me with a local cluster. I'm able to execute both queries and routing table looks properly updated. You are right about Unfortunately we do not have debug logging in JS driver right now. Do you have a possibility to modify driver's source code in
I hope with this logging we can get more insight into what is wrong. |
Hello @lutovich, Here's the console output:
|
@lutovich I don't think I specified above, the DC/OS cluster is 1.9 EE. Is your local cluster 1.8 or 1.9? |
@neurofoo so it looks like DNS resolution did not do anything wrong with the provided IP address and the connection was just closed by the server. Could you please attach Sorry, I do not get your question about versions. My local cluster is 3.1.2 EE, JS driver 1.2.0, node v6.7.0. Update: now I understand what you mean by 1.8 and 1.9. I did not use DC/OS locally, I just started 3 separate processes. I have never used DC/OS before, actually. Do you have a script or tool to set up such a DC/OS & Neo4j cluster locally? One more thing you could try is to turn off encryption. This can be done like:
|
@lutovich turning encryption off didn't work. I haven't installed DC/OS locally in quite a while, but this guide should work: https://dcos.io/docs/1.9/installing/local/ I'm working with unterstein on the slack #neo4j-dcos channel right now. It looks like it might be related to having upgraded from DC/OS 1.8 to 1.9. We're working through a few issues. I will post back here with updates. |
@lutovich I can confirm that on a new DC/OS cluster with a clean installation of neo4j from the Mesosphere universe, the driver appears to work as expected. But, I haven't finished all tests. I'm investigating what differences there are between the new cluster (1.9; no upgrades) and the others that were upgraded from 1.8 to 1.9 that had the driver issues. |
@neurofoo thanks for the update! |
@lutovich welcome. It looks like the problem was due to user roles. In the examples that produced the errors above, I was using a user that had a publisher role. If I switch to an admin user, then I don't have the issues. I would have expected, based upon this: https://neo4j.com/docs/operations-manual/current/security/authentication-authorization/native-user-role-management/native-roles/ that I ought to be able to use a user with the publisher role, and that this is best practice. I don't want my webapps running around with admin privileges. Can you confirm that if you use a user with just a publisher role, you get the errors that I describe? |
@lutovich okay. I think I have the source of the error and this issue can be closed. The original user I was using had the publisher role, but that user was not properly propagated through the cluster (it only showed up on the leader). So, when I tried connecting to a non-leader, I got the errors described above. However, those errors are pretty cryptic because there was no indication that the user didn't exist on the node. Perhaps add something to the error message saying that the user wasn't found? |
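The root cause described above (a user that exists on the leader but not on the other core members) can be checked directly. The following is a minimal sketch, not code from this thread: it assumes the 1.x JS driver API (`require('neo4j-driver').v1`), the default Bolt port 7687, an admin account that does exist on every member, and the `dbms.security.listUsers()` procedure of Neo4j 3.1 EE. The helper names `listUsersOnNode` and `nodesMissingUser` are hypothetical.

```javascript
// Hypothetical helper: given { host: [usernames...] }, return the hosts
// on which `username` is absent.
function nodesMissingUser(usersByNode, username) {
  return Object.keys(usersByNode).filter(
    node => usersByNode[node].indexOf(username) === -1
  );
}

// Hypothetical helper: list the usernames known to ONE cluster member.
// Uses the plain 'bolt' scheme on purpose so that no routing happens and
// the query runs on exactly the host given.
// `neo4j` is assumed to be require('neo4j-driver').v1.
function listUsersOnNode(neo4j, host, adminUser, adminPassword) {
  const driver = neo4j.driver(
    'bolt://' + host + ':7687', // assumed default Bolt port
    neo4j.auth.basic(adminUser, adminPassword)
  );
  const session = driver.session();
  return session.run('CALL dbms.security.listUsers()').then(
    result => {
      session.close();
      driver.close();
      return result.records.map(record => record.get('username'));
    },
    err => {
      session.close();
      driver.close();
      throw err;
    }
  );
}
```

Calling `listUsersOnNode` once per core member and passing the collected map to `nodesMissingUser` shows which members lack the application user. The application user can then be created on those members before retrying with `bolt+routing`.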
@neurofoo yes, I can reproduce the same issue when using This is the error I see in
You are right, error messages should be much better. I'll keep this issue open until we decide how to improve error messages. Thanks a lot for tracking this issue this far! |
@lutovich fantastic! many thanks for the confirmation! Might I also suggest noting this in the docs (http://neo4j.com/docs/operations-manual/current/clustering/causal-clustering/) as a giant !!NB!! for users. Again, thanks! |
Hi @neurofoo, Authentication error propagation was released with 1.3.0 driver. |
Hello all. I was requested to raise this as an issue from the neo4j-dcos slack channel.
tl;dr routing doesn't appear to work using the dcos neo4j ee universe package.
Below is the longer writeup.
We have an issue that we hope that you might be able to help us solve. The main issue is that the javascript driver's bolt routing doesn't appear to work. It is our understanding that by specifying the scheme "bolt+routing" in the javascript driver we can connect to any node and the driver will take care of discovering the network and selecting which nodes to use for different operations.
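For readers unfamiliar with the scheme, the expectation described above looks roughly like the sketch below. It is an illustration under assumptions, not the test app used in this report: it assumes the 1.x JS driver API (`require('neo4j-driver').v1`), the default Bolt port 7687, and placeholder host and credentials. `routingUri` and `writeThenRead` are hypothetical names.

```javascript
// Hypothetical helper: build a routing URI for any cluster member.
function routingUri(host, port) {
  return 'bolt+routing://' + host + ':' + (port || 7687);
}

// `neo4j` is assumed to be require('neo4j-driver').v1.
// With 'bolt+routing' the driver is expected to fetch a routing table from
// the given member and send WRITE sessions to the leader and READ sessions
// to other members, whichever member the URI points at.
function writeThenRead(neo4j, host, user, password) {
  const driver = neo4j.driver(routingUri(host), neo4j.auth.basic(user, password));
  const writeSession = driver.session(neo4j.session.WRITE);
  return writeSession
    .run('CREATE (n:ConnectivityCheck {at: timestamp()})')
    .then(() => {
      writeSession.close();
      const readSession = driver.session(neo4j.session.READ);
      return readSession
        .run('MATCH (n:ConnectivityCheck) RETURN count(n) AS total')
        .then(result => {
          readSession.close();
          return result.records[0].get('total');
        });
    })
    .then(
      total => { driver.close(); return total; },
      err => { driver.close(); throw err; }
    );
}

// Usage (requires the neo4j-driver package and a reachable cluster):
//   const neo4j = require('neo4j-driver').v1;
//   writeThenRead(neo4j, '9.0.0.10', 'neo4j', 'secret')   // placeholders
//     .then(total => console.log('nodes seen:', total))
//     .catch(err => console.error(err));
```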
We have neo4j-ee running in a Mesosphere DC/OS cluster in Google Cloud.
We used the neo4j Mesosphere Universe package to launch a basic three node cluster (just core nodes). The neo4j cluster is using the standard private dcos network (9.0.0.0/8).
To confirm that we are running the EE version, we can check the version:
The cluster also looks like it was installed and booted correctly. The output logs show that each node found all the other nodes. E.g.,
We have a docker node container in which we are running a few tests. In the container we load the neo4j js drivers as:
(We have also tried the neo4j-driver and neo4j-driver@latest)
We have a simple node app for checking connectivity (write/read):
This works so long as the ip specified is that for the leader. If we give it an ip for one of the followers, we get the following error message:
If we use just the 'bolt' scheme instead of 'bolt+routing', we can't write to non-leader nodes and receive a 'not a leader' error message.
As a sanity check, we checked to make sure that all the nodes have route roles. They do:
We also checked to make sure that the general cluster routes were correct (from inside the neo4j running containers):
We also ran the above node app during an interactive session to look at the routing tables of the driver/session.
When we use the leader ip, we get:
When we use either of the follower ips, we get:
and the ip just changes based upon the one that we used. It appears that the routing table hasn't been properly loaded.
So, at this point we are a little stuck and not sure what the issue is. Everything appears to be configured correctly, but no dice on connecting to a non-leader node and using the routing, which is a feature we would love to use.