Add declarations and examples for Python HDK API. #170
How can we expose run parameters such as "use CPU only"? Are we going to have `run()` method parameters?
python/pyhdk/hdk.py (outdated)
        pass

    @not_implemented
    def cst(self, value, type=None, scale_decimal=True):
`cst` looks more like a cast than a constant. Do we really want to save 2 letters compared with a very clear `const`?
It is the creation of an expression, not simply a cast. E.g. `proj("a")` and `proj(hdk.cst("a"))` are completely different things. The `type` parameter allows us to avoid an additional cast: `hdk.cst("2020-10-05", "date")` is equal to `hdk.cst("2020-10-05").cast("date")`.
I'm used to having `const` as a reserved word. But we can use it here I guess, or create an alias to have both.
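The equivalence described above can be pictured with a minimal, self-contained sketch. The `Expr` class and the module-level `cst` function below are hypothetical stand-ins written only for illustration; they are not the PyHDK implementation (this PR contains declarations only), and a real engine may build the typed constant directly instead of going through a cast.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a constant expression node (not PyHDK's class)."""

    value: Any
    type: Optional[str] = None

    def cast(self, new_type: str) -> "Expr":
        # A cast yields a new expression with the same value and the requested type.
        return Expr(self.value, new_type)


def cst(value: Any, type: Optional[str] = None) -> Expr:
    """Hypothetical constant builder mirroring the declared cst(value, type=None)."""
    expr = Expr(value)
    # Passing a type is modeled here as shorthand for an explicit cast.
    return expr if type is None else expr.cast(type)


# Alias discussed in the thread, so that both spellings are available.
const = cst

typed = cst("2020-10-05", "date")
casted = cst("2020-10-05").cast("date")
print(typed == casted)
```

In this toy model the two spellings produce equal expressions, and `const` is simply a second name for the same builder.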
What about `.literal()`?
I'm not sure about introducing another alias for constants. The expression that represents them is `hdk::ir::Constant`, so it makes sense to use similar words in the API.
Yeah, I would probably spell it out then. Worth the extra typing. :)
Overall it looks reasonable to me. To @kurapov-peter's comment, it seems like we need to break out some sort of per-instantiation state (e.g. when you create the `hdk` object) and per-query state (if you wanted to set per-query execution parameters, for example). You could imagine something like:
query_state = hdk.getCurrentQueryState()
query_state.device = hdk.device.cpu
hdk.sql("SELECT * FROM t;", query_state=query_state)
Would that work with the table aliasing scheme?
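One way to picture the split between per-instantiation and per-query state is the sketch below. Every name in it (`QueryState`, `Session`, `get_current_query_state`, the `device` field and its values) is hypothetical and only mirrors the shape of the snippet above; none of it is an existing PyHDK API.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class QueryState:
    """Hypothetical per-query execution parameters."""

    device: str = "auto"  # assumed values for illustration: "auto", "cpu", "gpu"


class Session:
    """Hypothetical stand-in for the object created at instantiation time."""

    def __init__(self) -> None:
        # Per-instantiation state: defaults shared by every query.
        self._defaults = QueryState()

    def get_current_query_state(self) -> QueryState:
        # The state is immutable, so handing it out cannot change the defaults.
        return self._defaults

    def sql(self, query: str, query_state: Optional[QueryState] = None) -> dict:
        # Fall back to the instance defaults when no per-query state is given.
        state = self._defaults if query_state is None else query_state
        # A real engine would execute the query; the sketch only reports the choice.
        return {"query": query, "device": state.device}


hdk = Session()
cpu_only = replace(hdk.get_current_query_state(), device="cpu")
print(hdk.sql("SELECT * FROM t;", query_state=cpu_only))
print(hdk.sql("SELECT * FROM t;"))
```

Making the per-query object immutable is a design choice of this sketch: a query-specific override is derived with `replace` and never leaks into the defaults used by later queries.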
        pass

    @not_implemented
    def import_pydict(self, values, table_name=None, fragment_size=None):
How is this expected to be used?
I don't expect it to be used in actual workloads but it can be convenient in small experiments. E.g. in my API tests I use it like the following:
    hdk = pyhdk.init()
    ht = hdk.import_pydict(
        {"a": [1, 2, 3, 4, 5], "b": [5, 4, 3, 2, 1], "x": [1.1, 2.2, 3.3, 4.4, 5.5]}
    )
    self.check_res(
        ht.proj(a1=ht["a"] // ht["b"], a2=ht["a"] // 2, a3=ht["x"] // 2.0).run(),
        {
            "a1": [0, 0, 1, 2, 5],
            "a2": [0, 1, 1, 2, 2],
            "a3": [0.0, 1.0, 1.0, 2.0, 2.0],
        },
    )
In fact, it's simply a shorter version of `import_arrow(pyarrow.Table.from_pydict(values))`.
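The expected values in that test can be checked without the engine. The plain-Python sketch below recomputes the three projected columns, assuming that `//` on table expressions follows Python's floor-division semantics for these non-negative inputs (an assumption of this sketch, not something the thread states).

```python
values = {"a": [1, 2, 3, 4, 5], "b": [5, 4, 3, 2, 1], "x": [1.1, 2.2, 3.3, 4.4, 5.5]}

# Element-wise floor division, mirroring the three projected columns.
expected = {
    "a1": [a // b for a, b in zip(values["a"], values["b"])],
    "a2": [a // 2 for a in values["a"]],
    "a3": [x // 2.0 for x in values["x"]],
}
print(expected)
```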
It should be OK to reserve some arg for query parameters. Do you think we need a dedicated structure for that? Our
Ah I see, yes, a simple dictionary is probably better. Though it would be nice to be able to generate that with default values, so I could print it and know what was available (or what my config was).
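A dictionary with discoverable defaults could look roughly like the sketch below. The option names (`device_type`, `enable_lazy_fetch`) and both helper functions are hypothetical, chosen only to illustrate the idea; the thread does not specify which options exist.

```python
from typing import Any, Dict

# Hypothetical option names and defaults, chosen only for illustration.
_DEFAULT_QUERY_OPTS: Dict[str, Any] = {
    "device_type": "auto",
    "enable_lazy_fetch": False,
}


def default_query_opts() -> Dict[str, Any]:
    """Return a fresh copy so callers can edit it without touching the defaults."""
    return dict(_DEFAULT_QUERY_OPTS)


def resolve_query_opts(**overrides: Any) -> Dict[str, Any]:
    """Merge keyword overrides into the defaults, rejecting unknown names."""
    unknown = sorted(set(overrides) - set(_DEFAULT_QUERY_OPTS))
    if unknown:
        raise ValueError(f"unknown query options: {unknown}")
    return {**_DEFAULT_QUERY_OPTS, **overrides}


# Printing the defaults shows which options are available.
print(default_query_opts())
print(resolve_query_opts(device_type="cpu"))
```

Rejecting unknown keys is a choice made in this sketch so that a misspelled option fails loudly instead of being silently ignored.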
The |
We can use |
Signed-off-by: ienkovich <[email protected]>
This is the first version of the PyHDK API, based on our latest discussions. There is no implementation yet, only declarations with inline documentation and some usage examples provided as disabled tests.
Please check if it matches your vision. Once we agree on this PR, I'll start implementation. Let me know if more detailed descriptions or more examples/tests are required to make the decision.