Skip to content

Fix: regarding PR #110 edge cases #113

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 4, 2016
Merged

Fix: regarding PR #110 edge cases #113

merged 3 commits into from
Aug 4, 2016

Conversation

legraphista
Copy link
Contributor

@legraphista legraphista commented Jul 26, 2016

Regarding PR #110, in some cases, the file would still duplicate lines and here's an attempt to explain why:

Node Timeline:

Javascript handler:
driver.session() 
    loadFingerprint() 
        fs.existsSync()
        fs.createReadStream  I/O Task #1 queued

driver.session() 
    loadFingerprint() 
        fs.existsSync()
        fs.createReadStream  I/O Task #2 queued

I/O Handler:
READ_FILE_CHUNK(knownHostsPath)
READ_FILE_CHUNK(knownHostsPath)

Javascript handler:
on('line') 
    line.startsWith( serverId )
    cb( line.split(" ")[1] );
        storeFingerprint 
            !!_lockFingerprintFromAppending[serverId] // check lock
            _lockFingerprintFromAppending[serverId] = 1; // lock
            fs.appendFile  I/O Task #3 queued
            setImmediate  Immediate Task #1 (case 1)
            setTimeout  Timeout Task #1 (case 2)

[Case 1] Sometimes, if the disk is too slow, The second on('line') will be slower than setImmediate but still faster than setTimeout

Bad
Javascript Handler:
Immediate Task #1 
    delete _lockFingerprintFromAppending[serverId]; // Bad

on('line') 
    line.startsWith( serverId )
    cb( line.split(" ")[1] );
        storeFingerprint 
            !!_lockFingerprintFromAppending[serverId] // check lock
            _lockFingerprintFromAppending[serverId] = 1; // lock
            fs.appendFile  I/O Task #3 queued
            setImmediate  Immediate Task #2

[Case 2] Set timeout is low priority and is ran after all queued execution blocks for that tick

Good
Javascript Handler:
on('line') 
    line.startsWith( serverId )
    cb( line.split(" ")[1] );
        storeFingerprint 
            !!_lockFingerprintFromAppending[serverId] // check lock

CHECK_TIMEOUT_QUEUE 
     Task #1 is due to fire
Timeout Task #1 
    delete _lockFingerprintFromAppending[serverId]; // Good

This can still duplicate if the disk is really slow at reading the I/O Task #2 chunk but setting a timeout of 0 should cover most of the cases.

@legraphista legraphista reopened this Jul 26, 2016
@swist
Copy link

swist commented Jul 26, 2016

Apologies if I am being thick here, buy why can't you just remove the lock in the callback from fs.readFile?

@legraphista
Copy link
Contributor Author

legraphista commented Jul 26, 2016

I'm not sure I understand, are you talking about readline in loadFingerprint?

@swist
Copy link

swist commented Jul 26, 2016

My bad, a bit distracted. I meant fs.appendFile in storeFingerprint and also pass in the onSuccess callback in loadFingerprint to storeFingerprint. Something like this:

function storeFingerprint( serverId, knownHostsPath, fingerprint, cb) {

  ... snip ...
  // we append to file
  fs.appendFile(knownHostsPath, serverId + " " + fingerprint + EOL, "utf8", (err) => {
    delete _lockFingerprintFromAppending[serverId];
    if (err) {
      console.log(err);
    }
    cb(err)
  });
}

Later on in TRUST_ON_FIRST_USE

loadFingerprint(serverId, knownHostsPath, (knownFingerprint) => {
       ... 
  } else if( knownFingerprint == null ) {
    storeFingerprint( serverId, knownHostsPath, serverFingerprint, (err) => {
      if (err) {
        return err
      }
      return onSuccess()
    } );
  } ...
});

Keeps read/write asynchronous, but at a cost of being a bit uglier.

@legraphista
Copy link
Contributor Author

legraphista commented Jul 26, 2016

Oh, now I see. You are write, that's actually an even better idea, and since the driver.session() calls are not in series, I believe that there is no reason for awaiting storeFingerprint. I think a got a bit blinded by the fact that it was a one tick issue that I didn't see the callback in fs.append. :D

@swist
Copy link

swist commented Jul 26, 2016

Better to actually handle the error though:

loadFingerprint(serverId, knownHostsPath, (knownFingerprint) => {
       ... 
  } else if( knownFingerprint == null ) {
    storeFingerprint( serverId, knownHostsPath, serverFingerprint, (err) => {
      if (err) {
        return onFailure(err)
      }
      return onSuccess()
    } );
  } ...
});

@legraphista
Copy link
Contributor Author

Yes, I do agree, error handling is a must.

@legraphista legraphista reopened this Jul 26, 2016
and lock the server ID while I/O event, thanks to @swist
@zhenlineo
Copy link
Contributor

Most of the tests actually failed this time. Except the FFFFF..., the fail error message is

[08:06:53][Step 4/4] /opt/teamcity-agent/work/724ebdf14baddd87/test/v1/examples.test.js:195
[08:06:53][Step 4/4]       expect(out[0].length).toBe(3);
[08:06:53][Step 4/4]                    ^
[08:06:53][Step 4/4] 
[08:06:53][Step 4/4] TypeError: Cannot read property 'length' of undefined
[08:06:53][Step 4/4]     at null._onTimeout (/opt/teamcity-agent/work/724ebdf14baddd87/test/v1/examples.test.js:195:20)
[08:06:53][Step 4/4]     at Timer.listOnTimeout (timers.js:92:15)
[08:06:53][Step 4/4] npm ERR! Test failed.  See above for more details.

I guess you will also see the similar error if you run locally too.

I reverted the commits to protect from duplicates in the known_host file to let the test system avoid redness everywhere. So pls do a rebase and add the reverted commits back to solve the conflicts.

This PR is too hard for me to understand, so I am calling help from our JS export :)

@legraphista
Copy link
Contributor Author

legraphista commented Jul 27, 2016

@zhenlineo I find this really weird as all the test pass except No operations allowed until you send an INIT message successfully. tck one that you were talking about a couple of days ago.
http://pastebin.com/ZnbZZNM1

@swist
Copy link

swist commented Jul 28, 2016

@legraphista @zhenlineo what do I need to do to have a look at team city builds? I have sent the CLA over.

Been trying to reproduce the error locally, but I am getting only the same error as @legraphista

@zhenlineo
Copy link
Contributor

zhenlineo commented Jul 28, 2016

@legraphista @swist Pls rebase your PR. But as I reverted your previous change for the duplicates in known_host file. So using the following git code to fix the conflicts with 1.0:

  • checkout a new branch based on new 1.0, e.g.
git checkout 1.0
git pull 
git checkout -b 1.0-fix-duplicates-in-known-host
  • revert the revert I did
git revert c64c727d942958b6b10d956e22f0fd143a90311d
  • cherry-pick your current commit
git cherry-pick b1aa869bfbde0990fd1c665968e11519ebe048af

FYI: The "fix" for the INIT message error is 1e27eb5

Let me know if you feel hard to rebase, then I will bring a new PR with your new commit for you.
I already asked our JS star @oskarhane to help us review this change. He will be back to you guys soon :)

@zhenlineo
Copy link
Contributor

Or maybe git rebase 1.0 and fix conflicts if you want to keep this PR thread :/ I usually find fixing conflicts is harder.

@oskarhane
Copy link
Member

Sorry for the delay @legraphista and @swist.

With #116 cherry picked on top of this pr, all tests passes locally for me where they used to fail just like they did for @zhenlineo (because of dir ~/.neo4j didn't exist).

I'll have a deeper look at the actual contents of this PR later today.

@oskarhane
Copy link
Member

I had to rewrite parts of #116 to make it work on windows, so don't look too much on that code :)
Anyway, this pr including the changes in #110 looks good to me.

So if you could rebase or solve the conflicts like @zhenlineo described we'll happily merge this.

…erprint-duplication-error/1.0

# Conflicts:
#	src/v1/internal/ch-node.js
@legraphista
Copy link
Contributor Author

@oskarhane Rebasing would delete chunks of PR #116, so I did a merge from upstream 1.0 instead.

@oskarhane
Copy link
Member

Thanks @legraphista, but I think your test should not duplicate fingerprint entries in tls.test.js got lost in the merge process.
Could you please add it again?

@legraphista
Copy link
Contributor Author

@oskarhane I'm sorry for the delay, i've re-added the missing test.

@oskarhane
Copy link
Member

No worries @legraphista, thanks a lot.
lgtm @zhenlineo

@zhenlineo zhenlineo merged commit 20c702a into neo4j:1.0 Aug 4, 2016
oskarhane added a commit that referenced this pull request Aug 10, 2016
This code was in PR #113, but got lost in merge process
@legraphista legraphista deleted the fix/known_hosts-fingerprint-duplication-error/1.0 branch January 6, 2017 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants