-
Notifications
You must be signed in to change notification settings - Fork 208
NODEJS-666: Extend driver vector support to arbitrary subtypes and fix handling of variable length types (OSS C* 5.0) #427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really good work @SiyaoIsHiding! I love the increased test coverage we're getting here; it's good to see the same robust coverage we have for the other drivers that support vectors coming to node.js as well. There are a few things here I'd really like to see changed, although fortunately most (all?) of them are internal things that shouldn't be super-visible to users.
return 0xff >> extraBytesToRead; | ||
} | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self (and perhaps future reviewers): everything above this spot was copied over from lib/types/duration.js, everything below was added with this PR.
uvintUnpack: uvintUnpack | ||
}; | ||
})(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Near as I can tell there aren't any tests for this functionality in isolation. We wind up testing some of the new stuff via the encoder tests but we don't test these functions directly. I think that's okay for now since the older functions (the ones that were copied here from the duration impl) also aren't tested anywhere.... but it probably is something we should remedy at some point.
test/unit/encoder-vector-tests.js
Outdated
// dimension: 3 | ||
// }, | ||
// value: [new types.Duration(1, 2, 3), new types.Duration(4, 5, 6), new types.Duration(7, 8, 9)] | ||
// }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SiyaoIsHiding is there a reason we can't just test duration directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't make it work when I submitted this PR but I just made it work with the following test case
{
subtypeString: 'duration',
typeInfo: {
code: types.dataTypes.custom,
info: [{
code: types.dataTypes.custom,
info: 'org.apache.cassandra.db.marshal.DurationType'
},3],
customTypeName: 'vector',
},
value: [new types.Duration(1, 2, 3), new types.Duration(4, 5, 6), new types.Duration(7, 8, 9)]
},
Note that I have to put 'org.apache.cassandra.db.marshal.DurationType'
instead of duration
. Is it expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our conversation, this does not raise any concern.
}, | ||
value: [{ f1: 'a' }, { f1: 'b' }, { f1: 'c' }] | ||
} | ||
]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a lot of duplication here between the data provider above and the one in encoder-vector-tests. Can we move these to a common spot in the testing framework so that both tests can leverage the same provider(s)?
examples/package-lock.json
Outdated
"link": true | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we're checking this in for the examples module now? This seems.... kinda strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted
lib/encoder.js
Outdated
* @param {ColumnInfo | null} columnInfo | ||
* @returns | ||
*/ | ||
this.decodeCustom = function (bytes, typeName, columnInfo = null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really dislike the idea of adding an extra param here.
The original impl of this function passes in "type.info" from the type info it gets as an arg. Your new impl passes in "type.info" as well but also passes in... "type" as well. One of these args seems redundant; why pass "type.info" when you can just as easily get the "info" property of the type information you're also passing.
Seems like you can do the exact same thing by just passing "columnInfo" here and adding "typeName = columnInfo.info" near the top of this function.
if (offset + elemLength > buffer.length) { | ||
throw new TypeError('Not enough bytes to decode the vector'); | ||
} | ||
rv[i] = this.decode(buffer.subarray(offset, offset + elemLength), subtype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have the type code of the subtype here, don't we? So we can actually look up the decode function for the subtype using that code rather than having to go all the way back through decode() can't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't call decode()
. Then we need to
const decoder = this.decoders[type.code];
return decoder.call(this, buffer, 'info' in type? type.info : null);
Which is basically calling decode()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: this is a pretty insignificant one to me. It's a relatively minor savings not having to go back through decode() but the difference is pretty small.
I'm perfectly fine with continuing to use decode() here.
lib/encoder.js
Outdated
} | ||
return -1; | ||
default: | ||
throw new TypeError('Cannot calculate size'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm... I'm wondering if it's better to make the default case return -1. You wouldn't have to specify all the data types that should have this behaviour (like you do in this impl)... you'd just specify size functions for the types that are of a fixed size and everybody else returns -1.
Best argument I can think of not to do it that way is that there may be some benefit in being explicit for every single type: is it fixed size or not? It does make the code a bit more cumbersome, though.
lib/encoder.js
Outdated
throw new TypeError('Target data type could not be guessed, you should use prepared statements for accurate type mapping. Value: ' + util.inspect(value)); | ||
}else{ | ||
type = guessedType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why... do we have an else block on an if test which throws if it's successful? Seems like the original impl says exactly the same thing but in a more succinct way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted
lib/encoder.js
Outdated
if (customTypeName.startsWith(customTypeNames.vector)) { | ||
return this.encodeVector(value, this.parseVectorTypeArgs(customTypeName, customTypeNames.vector, this.parseFqTypeName)); | ||
|
||
//TODO: here we cannot tell whether it's a vector or not, because we don't have the customTypeName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is here: Assume the type of a vector is
{
code: types.dataTypes.custom,
customTypeName: 'vector',
info: [{ code: types.dataTypes.int }, 3]
}
the customTypeName
argument we get here is
[{ code: types.dataTypes.int }, 3]
We basically lost the customTypeName: 'vector'
, and does not have a good way to tell it's a vector. Unless we say as long as we get an array and the second element is a number, then we treat it as a vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per our discussion, we decided to change the API of encoding and decoding functions to (value, type) instead of (value, type.info).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good after the last round of changes! I love how the API seems simpler and more consistent now!
One lingering question for me about how best to represent custom types generally in a way that makes sense for vectors and other non-vector custom types but that's really it.
@@ -283,7 +316,8 @@ function defineInstanceMembers() { | |||
/* | |||
* Reads a list from bytes | |||
*/ | |||
this.decodeList = function (bytes, subtype) { | |||
this.decodeList = function (bytes, columnInfo) { | |||
const subtype = columnInfo.info; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😂
Pausing briefly; @SiyaoIsHiding pointed out that we ought to update the docs to reflect the behavioural changes contained in this PR. Will be merged once those edits are in place. |
Will add more tests