Skip to content

Support for streaming results for queries via new option (stream=true) #321

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 18 commits into from
Closed

Conversation

sagiegurari
Copy link

This pull requests provides ability to stream query results but also provides 2 more additional items:

  1. ability to extend the connection object via javascript. most nodejs developers won't want to develop in C++ i believe and with this change now they can contribute to the oracledb library via js.
    This could help a lot to this project and should provide a lot more contributions.
  2. editorconfig file since the standards in this project is a bit different than most js projects.

@cjbj
Copy link
Member

cjbj commented Jan 12, 2016

@sagiegurari thanks for submitting. Can you fill in the OCA as mentioned in CONTRIBUTING.md?

@sagiegurari
Copy link
Author

done

@dmcghan
Copy link

dmcghan commented Jan 13, 2016

@sagiegurari I agree that the driver could really benefit from more JavaScript code! :) To lay a solid foundation down in this area, I suggest an architecture where we create a JavaScript equivalent for each of the driver's base classes. End users of the driver would only work with the JavaScript classes which would wrap the underlying C classes. This would allow us to add our JavaScript code in a more organized fashion.

To show you what I'm talking about I've put together together this example:
https://github.com/dmcghan/node-oracledb/tree/javascript_classes_demo/lib

Keep in mind it's not fully baked, but it should get the idea across. In addition to demonstrating the architecture I'm suggesting, I've also added a connection request queuing feature for the Pool class. You can see how this works looking at pool.js and connection.js.

I'd be really interested to hear what you think about this architecture. Could you see yourself adding the streaming feature there? I've created a resultSet (resultset.js) class which may be used in the execute method of the connection class. I would imagine you'd create a similar "ReadStream" class that would be used instead based on the options object...

@sagiegurari
Copy link
Author

I couldn't agree more with the need and I see what you did in your fork.
You definitely put some much needed order in the code (talking about the lob and oracledb objects).

Basically it is similar approach in terms of extending oracledb like I did (take a look at another project of mine called simple-oracledb which I provide a lot more extensions) to provide additional logic in js.
However, while I extend the C++ object with JS, your approach is to create a wrapper object.
This which leads to the need of having every public c++ api also defined in the js.
My approach only requires the actual js modified api to be defined, and if a certain api there is no need to extend (lets say the connection.commit) than there is no need to define it.

The wrapper approach leads to more maintenance work, theoretically more bugs and in some cases might not work.
So i'm definitely for the goal you set, but the way it is set I think might be high maintenance.
Take a look at https://github.com/sagiegurari/simple-oracledb and you can see how much you can push via js to this lib without really wrapping it, and I have pushed a lot of features there.
It allows me to support any version of oracledb without doing any changes, even if oracledb add new functions to connection or pool and so on....

@dmcghan
Copy link

dmcghan commented Jan 13, 2016

That's a fair point regarding the need to manually add whatever needs to be exposed to the JS layer. Though for simple pass throughs, this is quite simple so I'm not so worried about bugs there. When you say, "and in some cases might not work" what did you mean?

Another issue I ran into with this approach is that properties like "pool.connectionsInUse" can't be mapped through very easily (at least I don't believe they can be as they are not like). Instead, getter methods would need to be introduced.

On the other hand, this allows us to use JavaScript prototypes for all of the methods we'll be overriding. In the long term, as the code builds up, this could yield some memory benefits as there'll only be a single instance of the function on the prototype. Additionally, in one sweep, we could introduce a well laid out JS layer that folks could probably understand a bit easier and build out when needed.

Food for thought! :)

@sagiegurari
Copy link
Author

when I said might not work, is one scenario where in the native object it might call another public function which would result in bypassing the wrapper.
now... i'm not sure what will happen when you extend the object like I did as I never extended node modules via C++, but i'm hoping it will go to the js overridden function if called correctly in c++.
but i'm not sure.

@dmcghan
Copy link

dmcghan commented Jan 13, 2016

A native object calling an instance method seems unlikely - and as you say, it may or may not be an issue with either implementation.

Also, one of the enhancements I'd like to add to the driver is to make all async methods return promises over just undefined. This would mean that all methods (they're currently all async), including commit and release, would need to be overridden.

@cjbj
Copy link
Member

cjbj commented Jan 13, 2016

  • I'm keen to see one or the other solution merged as long as it moves the driver forwards.
  • The C layer can be changed as needed to support the JS layer
  • The node-oracledb JS coding standard can easily be changed

@dmcghan
Copy link

dmcghan commented Jan 14, 2016

I rounded out the JS layer today in the demo branch:
https://github.com/dmcghan/node-oracledb/tree/javascript_classes_demo/lib

I was able to figure out a solution for the the attribute values using Object.defineProperties. I think things are looking pretty good.

@sagiegurari When your OCA is processed I'd like to have your streaming changes merged to see how it all hangs together. What do you think?

@sagiegurari
Copy link
Author

I'm ok with doing changes whereever needed.
But why not use the power of js to change any object and simply extend instead of wrap? What is the advantage of wrapping?

@dmcghan
Copy link

dmcghan commented Jan 14, 2016

Normally we'd extend classes in JS using either prototypes (vanilla JS) or util.inherits (Node.js). The idea with these solutions is to extend out one constructor function (sub) via another constructor function (super). However, the driver currently only exposes the base class: oracledb.Oracledb. In all other cases, we only get instances of classes. To do things via prototypes or util.inherits, the C layer would need to expose all base classes. That's definitely worth having a discussion about. Definitely! :) But in the mean time, I think this is the next best thing.

Our current approaches are actually quite similar. We are both sorting a reference to the C layer method before adding a JS layer in front. I'm just doing it wholesale which would happen anyway when adding the Promises.

@sagiegurari
Copy link
Author

not always. many times you augment an instance to provide more capabilities or change existing behavior. I believe that is what expressjs for example does with the nodejs request/response objects.
I'm just not crazy about wrapping every function and every attribute instead of just using the existing object and winning forward compatibility of any change.

@cjbj
Copy link
Member

cjbj commented Jan 14, 2016

@sagiegurari can you quantify 'not crazy'? We'd obviously be keeping C & JS in sync going forward.

@dmcghan
Copy link

dmcghan commented Jan 14, 2016

I was wrong! Seems the base object DOES expose all the other classes (Oracledb, Connection, Pool, ResultSet, and ILob)!

I'll work on a new branch today that implements a better pattern using prototypes.

@dmcghan
Copy link

dmcghan commented Jan 14, 2016

Well, I tried, but I'm giving up on this approach (for now anyway). It was able to get this working with the base class. But the problem with this approach is that the other classes were not meant to be instantiated via "new" in the JavaScript layer. So, for example, when creating a pool you'd still have to call .createPool from the base class which would return a pool instance that wouldn't have any of the changes you made in the JS layer. :(

@sagiegurari
Copy link
Author

@cjbj "not crazy" :) means that javascript gives you the power to modify any object.
that's not something you see in other languages.
you have many libraries using this power to enhance other libs, for example I use it to change the nodejs spawn to first do something before the spawn happens.

Wrapping objects leads to more maintenance it has no advantage (only disadvantages) compared to modifying the existing c++ instances.
For example

  1. all public apis MUST be in js object
  2. all attributes MUST be in sync. If C++ changes it, js must also (I think @dmcghan is using define for that)
  3. you might impact other libs (like mine) that modify the base objects (not sure....)
  4. C++ should never call other C++ functions that are defined public as they will bypass the js layer
  5. The biggest thing however, is that I can't think of any advantage of wrapping. It just doesn't give something more than modifying. The code from @dmcghan is very organized and properly written and gives a lot of order to the code for people to do changes, but why wrap? you can give order and help by modifying like I did and I don't mind you taking it forward and do it better than I did. But wrapping just doesn't give anything extra.

@sagiegurari
Copy link
Author

I think i can give a better solution, in my pull request, i specifically changed the execute function and kept the old one.
if you look what i did in simple-oracledb, there I did it in generic way for ALL functions, so no need to specifically state what function we want to extend.

In connection.js:
https://github.com/sagiegurari/simple-oracledb/blob/31a73553056a2ff206fa4e5f4c5b4c886c2f7e75/lib/connection.js#L838-L846

In pool.js:
https://github.com/sagiegurari/simple-oracledb/blob/31a73553056a2ff206fa4e5f4c5b4c886c2f7e75/lib/pool.js#L136-L144

If I would do the same in the connection.js of this pull request as I did in simple-oracledb, I think it should help give a solution that would satisfy all?

@dmcghan
Copy link

dmcghan commented Jan 15, 2016

I spent some time yesterday exploring a standard prototypal inheritance pattern that didn't work out. Let me spend some cycles today on an extend/mixin pattern.

Frankly, I could go either way. What I really want to see is a release where each base class is extended in a consistent manner so that we have a good foundation for JS developers to build out.

@sagiegurari
Copy link
Author

👍 i'll be happy to give any input once you have something to look at.

basically you can also modify the result set by modifying the connection.execute and wrapping the provided callback with your callback which would extend the c++ resultset with js level capabilities.
however i'm uncertain if that is needed.
I believe most developers would have good contributions mainly to the connection and pool classes and that only extending those 2 would be enough.
it is a classic 80 20 rule :)

@dmcghan
Copy link

dmcghan commented Jan 15, 2016

Okay, here's a first pass at an extends implementation: https://github.com/dmcghan/node-oracledb/tree/javascript_extends_demo/lib

Let me know if you what you think...

@sagiegurari
Copy link
Author

better, few comments that I think can improve:

  • no need to write extender on every file. easier for people to have connection.js, pool.js and not extender. developers would look for connection.js not for connectionextender.
  • no need for oracledb and oracledb extender, i mean, you are in the library doing the changes, so do it directly in oracledb.js, makes more sense and a lot more organized.
  • just food for thought, i know it is intentional but keep in mind that empty functions that just do apply on the args have performance impact on the process due to 2 things.
    one is the fact that you access arguments which takes time to create (only created if acccessed), second is that apply is slow. overall you won't see much impact since the IO to the DB would take much longer, but like I said, think about it.

overall i'm much for this change 👍 and if you need me to do after your pull request, another one to push the streaming I'll do it.

@dmcghan
Copy link

dmcghan commented Jan 16, 2016

Cool!

Regarding file names, I just want to be consistent. I started with a class=file approach. However, the extenders are not true classes (constructor functions invoked with the new keyword) which led to them being modules. Then the clash between the oracledb class and the extends module for it lead to those names...

If we take a standard Node.js module approach, grouping things by related functionality, and just expose the interface (constructor or extend functions) then we'd end up with something like this:
https://github.com/dmcghan/node-oracledb/tree/javascript_extends_demo/lib

I do like that better, what about you?

As for the empty functions, that was really just do demonstrate the approach we take when extending instances out. Keep in min that I do plan on building them all out so that they return Promises rather than just undefined. Those placeholders will help me when I go to do that (maybe this weekend?).

@sagiegurari
Copy link
Author

looks good.
as for promises, I suggest you take some library that implements it (there are plenty) and not es6 promise as you will want oracledb to support node 0.12 and 0.10

@dmcghan
Copy link

dmcghan commented Jan 16, 2016

Thanks! And thanks for your suggestions! :)

We would use es6 promises by default. If undefined, we could fall back on the es6 promise polyfill (a subset of rsvp.js):
https://github.com/jakearchibald/es6-promise

@ecowden
Copy link

ecowden commented Jan 16, 2016

FYI: I've seen Node.js library authors leaning towards pinkie-promise as a promise polyfill. It manages the process of using es2105 promises when available and adds its own implementation only when not running in an es2015 environment. It's also very small, so it doesn't add a much extra weight to the library.

@dmcghan
Copy link

dmcghan commented Jan 16, 2016

lol, clever name! Thanks for the heads up, we'll have to look into both...

@cjbj cjbj mentioned this pull request Jan 20, 2016
@Bigous
Copy link
Contributor

Bigous commented Jan 20, 2016

Well this is a very good job, congrats @sagiegurari.
Instead of writing it directly in JS, we could write the script part in Typescript. It gives us the advantage of write code with the newest features (class, annotation, etc...), generate type definitions for people that use the library to have code completion, hint and help while editing code and you end writing less code.
About the wrapping thing, I personally doesn't like it. I think that the code gets more difficult to maintain and to follow what is happening.
But I agree with you guys, people get afraid of C++ code often... Probably if the code were written in script more people would try to help to improve it.
I think that the best solution to be fast to maintain and have more people on it, should be a pure script driver (like oracle did with thinclient for java and oracle managed data access for .NET)... The extra would be no compilation or binary distribution needed... the drawback, writing a lot of code to build the node driver.

@dmcghan
Copy link

dmcghan commented Mar 12, 2016

@ecowden Thanks for the feedback! 👍

@dmcghan
Copy link

dmcghan commented Mar 12, 2016

I just did a blog post that sums up the current status of this PR (ResultSet Streaming) and the promise support (that doesn't have it's own home yet - sorry Sagie!):
https://jsao.io/2016/03/extending-the-javascript-layer/

@sagiegurari
Copy link
Author

The promise code changes look good to me.
Didn't run the tests but it seems to look simple (which is good 👍)

@dmcghan
Copy link

dmcghan commented Mar 13, 2016

Thanks for taking a look Sagie!

@sagiegurari
Copy link
Author

ok so I

  • added the metadata event
  • changed the getRows option used to streamNumRows
  • Updated docs

Missing

  • performance/stress tests

Hope to setup some tests today/this week.
if there is anything still missing please tell me.

@cjbj
Copy link
Member

cjbj commented Mar 16, 2016

Thanks @sagiegurari. I know @dmcghan wants to pound on it a bit. We will also work on tests too.
When we merge, the comment format will change.

@cjbj
Copy link
Member

cjbj commented Mar 16, 2016

To set expectations, Promises will be post 1.8.s

@sagiegurari
Copy link
Author

Added a stress test of 500 queries using single connection with random streamNumRows value.
I guess I'm done with the PR coding, so if anything is needed just tell me.

@cjbj
Copy link
Member

cjbj commented Mar 16, 2016

@sagiegurari that's a good point to stop so we can work with a stable PR. Thanks!

@cjbj
Copy link
Member

cjbj commented Mar 17, 2016

@sagiegurari do you feel it's OK to emit a close event but not an end?

@sagiegurari
Copy link
Author

I only emit the metadata and error events.
the nodejs stream code is the one that emits the other events based on if I pass data or not.

if you look at the following doc, you will see that 'close' is an optional event.
https://nodejs.org/api/stream.html#stream_event_close

I can add code to emit that as well, but i'm pretty sure that error and end are doing the job just fine.

@cjbj
Copy link
Member

cjbj commented Mar 17, 2016

Aren't you emitting close here: https://github.com/oracle/node-oracledb/pull/321/files#diff-259dec4414a400a6e895f16ff1d0ca3bR53

My tests show that end is also emitted - and you use this in tests too.

Can you check the following, which is crashing for me.

    var stream = connection.queryStream(
      'SELECT first_name FROM employees ORDER BY employee_id',
      [], { streamNumRows: 100 });

    stream.on('error', function (error) {
      // console.log("stream 'error' event");
      console.error(error);
      return;
    });

    stream.on('metadata', function (metadata) {
      // console.log("stream 'metadata' event");
      console.log(metadata);
    });

    stream.on('data', function (data) {
      // console.log("stream 'data' event");
      console.log(data);
    });

    stream.on('close', function () {
      console.log("stream 'close' event");
    });

    stream.on('end', function () {
       console.log("stream 'end' event");
    });

@sagiegurari
Copy link
Author

  • my code doesn't emit the 'close', it doesn't emit 'data' or 'end', it is the nodejs stream code that does that. It is triggered when I do push(data) or push(null) in my stream class.
  • I run tests on multiple node.js versions. I can tell you that node 0.10 behaves very very differently than 0.12 and up.

@cjbj
Copy link
Member

cjbj commented Mar 17, 2016

I recall the issue: because there is no explicit use of the connection it gets closed before all data is fetched. If I add a connection.release() to the end event, all is well.

@sagiegurari
Copy link
Author

can you explain? is it a test error?

@cjbj
Copy link
Member

cjbj commented Mar 18, 2016

@sagiegurari no problem with tests/stream.js. It was a problem with my standalone test closing the connection too early.

In other news to keep you in the loop, we're getting this PR stable in our internal repo for stress testing etc. Dan's fixed an issue with pooling due to the pool connection.extend not passing oracledb. I'm still trying to find a better name than streamNumRows. We will likely move some functionality from connection.js to resultset-read-stream.js. I've done a some work on doc. Dan was wondering whether highWaterMark should be settable - this could be a post 1.8 topic to discuss.

@sagiegurari
Copy link
Author

Why use an internal repo and not a branch per release that is open and visible to everyone? You will get better feedback early before you release it officially.

@dmcghan
Copy link

dmcghan commented Mar 18, 2016

@sagiegurari I think the discussion on a close/destroy method got lost. I think really important as without it we can't stop the stream, we can only pause it which keeps the resultset open (which blocks connection operations, like release).

As there is no close method, I'm confused by this if block here: https://github.com/oracle/node-oracledb/pull/321/files#diff-259dec4414a400a6e895f16ff1d0ca3bR51

How would the closed property ever be set?

@sagiegurari
Copy link
Author

That code is from when I did develop the close but removed it. Should remove that as well.
Wouldn't connection.break solve this?

@dmcghan
Copy link

dmcghan commented Mar 18, 2016

I just ran a test and break does work. However, I think it should be possible to stop the stream without breaking the connection. A close method seems the most appropriate way to do this, no?

@sagiegurari
Copy link
Author

I'm ok with that, I'll add it next week

@cjbj
Copy link
Member

cjbj commented Mar 20, 2016

@sagiegurari we've frozen 1.8 ready for release testing, so hold off making changes for the moment.

@sagiegurari
Copy link
Author

sure

@cjbj
Copy link
Member

cjbj commented Mar 24, 2016

@sagiegurari thanks for your perseverance. We merged this to 1.8. @dmcghan moved more of the code into the new js file. At the last minute we decided to revert to using oracledb.maxRows for the getRows() size. We want to revisit this when #361 is looked at.
Let's use new issues/PRs for the future work such as a way to close streams early. Also I see getRows() is called one extra time; maybe this can be optimized.

@cjbj cjbj closed this Mar 24, 2016
@jgoux
Copy link

jgoux commented Apr 14, 2016

Hello,
I tried to use this new feature, but I can't pipe the stream. Is pipe supported ?

EDIT : nevermind, it's working! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants