Design of atenlib with ONNX functions #601
Thanks for your question! Yes, your understanding of the `trace_only` tag is correct. We see ONNX functions as a nice unit for optimization, because backend optimization is sometimes easier with clearly identified boundaries (functions). I will check with my team for a more comprehensive response. For now you may refer to #165 for some background info (it may not be up to date on all aspects). LMK if you have other questions in the meantime!
Hm, wouldn't it be possible to just produce different ONNX based on the value of the attribute (since, by definition, it's known at compile time)? Is there a good reason to constrain yourself to only passing attribute values as allowed by ONNX (and, as you mention, potentially pre-processing them in Python) - don't you lose out on expressivity?
I see, that makes sense. Describing local graph patterns with functions is definitely something that is interesting to explore. Is it meant for a runtime like ORT to leverage specialised implementations for pytorch/atenlib ops?
Thanks, that's an interesting resource to check out! I do actually have a question: has the graph building for these functions been implemented natively yet?
To produce different ONNX graphs, one would need to run that logic in Python, I think. Once the values are pre-processed, an effectively specialized ONNX graph will be emitted. LMK if I am missing anything!
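For illustration, a minimal sketch of what this pre-processing can look like (an editor's example in the style of a trace-only torchlib function, not actual torchlib code; it assumes onnxscript's public opset modules). The Python-level branch is resolved at export time, so each call emits an already-specialized graph:

```python
# Hypothetical trace-only-style function: `dim` is a compile-time
# Python value, so the branch is taken while tracing and never
# appears as an If node in the exported ONNX graph.
from onnxscript import opset18 as op

def sum_like(x, dim=None, keepdim: bool = False):
    if dim is None:
        # Without an axes input, ReduceSum reduces over all dimensions.
        return op.ReduceSum(x, keepdims=int(keepdim))
    axes = op.Constant(value_ints=[dim])
    return op.ReduceSum(x, axes, keepdims=int(keepdim))
```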
That's a great question. Yes, I think runtimes like ORT will be able to leverage specialized implementations when they have them. I know there is also a TVM execution provider that ONNX Runtime can use. Beyond that I would like to learn more too.
We don't have a native implementation yet. There is a prototype in https://github.com/microsoft/onnx-script/blob/main/onnxscript/function_libs/torch_aten/graph_building.py which leverages TorchScript.
Yes, I think it is required to have 'compile-time' logic run through Python while still being able to produce ONNX. I have two main examples in mind.

The first one is more theoretical and concerns types. Keeping to the current 'type-safe' ONNX semantics, branching based on different attributes (essentially compile-time parameters) may change the type (especially rank/shape) of the result, which invalidates a single static function signature.

The second one is practical, though it depends on whether having `trace_only` functions is acceptable in the long run.
I see. Did you consider using Spox, since this is essentially what we implemented in that project? 😃
Yes, from ONNX's perspective, this is one of the key motivating goals for ONNX functions. ONNX is as much a "standard library interface" as a "standard programming language". This is one of the ways to strike a balance between expressiveness and efficiency, between having a large number of higher-level ops and a smaller number of primitive ops. In terms of backend implementations, there are competing approaches based on using optimizing compilers to achieve performance vs. using hand-written kernels by experts. ONNX functions allow both approaches to be used.
This is a valid point. And not just a theoretical one, I think. ONNX does have operators whose output type is determined by an attribute value.
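As a concrete instance (an editor's example; the operator originally named above was lost in extraction): `Cast`'s output element type is fixed entirely by its `to` attribute, so the result type is a compile-time property of the node:

```python
# The static type of `y` changes with the `to` attribute, not with
# any runtime input, illustrating an attribute-determined output type.
from onnx import TensorProto, helper

node = helper.make_node("Cast", inputs=["x"], outputs=["y"], to=TensorProto.INT64)
```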
Yes, there are. E.g., Nvidia has a TensorRT-based compiler for ONNX, used as an EP inside onnxruntime. There are other similar compilers for other hardware backends, also plugged into onnxruntime as execution providers. As Justin mentioned above, there is also TVM, which itself supports multiple hardware backends.
https://github.com/microsoft/olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation.
Thanks for your responses!
Yes, I definitely think that treating ONNX as a programming language is a perspective that can be useful at times. But the balance isn't just between expressiveness and efficiency, but also simplicity. Introducing mechanisms that are too complex at the IR level raises the barrier to entry for new technologies and contributors. I am all for expressing ONNX itself with ONNX to increase simplicity by decreasing the number of 'primitive' (non-function) operators, but this does not increase efficiency (and only requires expressiveness at some level).
I was quite interested in that extension, but more details on the idea would be needed to judge whether the simplicity/expressiveness trade-off is met. Would it be essentially 'pattern-matching' on the input types & attributes? Surely that would still not be enough to achieve the necessary expressivity. If overloads had the ability to arbitrarily check those 'compile-time' values (to dispatch an overload) and then refer to them in the graph (to e.g. construct proper Casts), that seems like it would be almost fully expressive. But it also seems too complicated for what it's worth.

My point is rather - is this expressivity really necessary to express in ONNX itself? While having elementary functions (i.e. referencing just a static body that the runtime may have specialised for) is definitely something I can get behind now, at some point the complexity increases. In the standard we already have context-dependent functions. They are essentially just a graph-level transformation of a node into a subgraph of (sufficiently) primitive operators, and impossible to express as standard ONNX functions. This lack of expressiveness can be compensated for by the converter framework/builder (or the main `onnx` library itself).

Hence, from the perspective of building a converter library, I wonder how the approach taken here is meant to handle such cases. Please do let me know your thoughts!
I like its API design and love it as a valuable part of the ONNX ecosystem. I personally look forward to potential collaboration in the future.
Thanks for sharing your thoughts in detail! To me, even having a subset of the functions be context independent is valuable for PyTorch, because that makes the downstream optimization (fusion etc.) a lot easier and cleaner to implement. It's ok for a function to be context dependent as long as it remains within a clear boundary. Creating context-dependent functions is not currently possible without the overloading PR Rama mentioned, so we are doing it differently by capturing the independent part inside a function.

We actually like the constraints in ONNX functions because they push us to express operators as generally correctly as possible for different inputs. This way PyTorch users will not need to re-export when their inputs change in size, for example.

ONNX Script does not constrain itself to creating functions only, and it can be used in a completely eager way. If the approach we use for PyTorch doesn't fit other frameworks, we can certainly support different ways.
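To make the re-export point concrete, here is a minimal sketch (an illustrative editor's example, assuming onnxscript's opset modules) of the "generally correct" style: the size is read from the input at runtime via `Shape`/`Gather` rather than baked in at export time:

```python
# Flatten everything except the batch dimension without hard-coding
# sizes, so the same function works for any input shape.
from onnxscript import opset18 as op

def flatten_keep_batch(x):
    batch = op.Gather(op.Shape(x), op.Constant(value_ints=[0]), axis=0)
    rest = op.Constant(value_ints=[-1])  # Reshape infers the remaining size
    return op.Reshape(x, op.Concat(batch, rest, axis=0))
```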
I see, that is definitely fair, and I do believe it's good to work with context-independent implementations when possible, as they are later easier to work with (both formally and pragmatically). Asking this question, I was primarily wondering what the end-game plan for this approach is - approaching more and more context-dependent converters on many varying levels: from attribute-dependent output types, to what are effectively `trace_only` implementations.
Cool! Could you point me in a direction where I can see an example of how it can be used eagerly (though I'm not fully sure what you mean by that)? I would be interested in seeing one, as I don't think I've seen it in the docs. I had the impression ONNX models can only be created from Python functions with explicit ASTs via the `@script` decorator.
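For context, a sketch of what eager usage looks like, based on onnxscript's documented eager mode (an editor's example, not a snippet from this thread): a function defined with the `@script` decorator can also be called directly on numpy arrays, evaluating op by op instead of building a model:

```python
import numpy as np
from onnxscript import script
from onnxscript import opset18 as op

@script()
def square_plus(x, y):
    return op.Add(op.Mul(x, x), y)

# Eager evaluation: each ONNX op runs immediately on the arrays.
print(square_plus(np.float32([1.0, 2.0]), np.float32([3.0, 4.0])))
```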
That's great to hear, thank you! For interoperability, I think inlining functions (so that a valid ONNX graph is exchanged) could be the way to go.
That message covers a lot of ground. Let me address one specific part here: with regards to the overloaded-function extension, the existing proposal is on the simple side. I am not convinced that we should have complex dispatching semantics encoded in ONNX. The dispatching is based on just a name (just as before). In essence, the proposal can be conceptually thought of as attaching two names to a function body: one is used for dispatching (e.g., similar to a mangled name in the output of a C++ compiler), the other serving to identify its specification (at a higher level, this is sort of the unmangled name). So, in this proposal, the IR doesn't care how the different instances (with different mangled names) are generated. That would be determined by the builder framework that generates the ONNX model. So, this may be in line with what you are suggesting (IIUC). But the ONNX repo (or affiliated repos) can provide "builder" utilities that generate such models, along with some specific overload-resolution semantics to help generate models from some extended representation.
Another detail to clarify the previous point: in the existing proposal, the calling node specifies the mangled (or full) name of the called function, so the runtime doesn't choose which of the overloaded functions to call (or inline). But an optimizer that decides to dispatch to an alternative hand-written kernel would use the unmangled (shorter) name to decide which kernel to call.
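A sketch of how the two-name scheme can be encoded (an editor's example; it assumes the `overload` field that newer ONNX releases added to `FunctionProto`, which may differ in detail from the proposal discussed here):

```python
# One specification name ("aten_sum") shared by several instances,
# each distinguished by an overload tag; call sites reference the
# full (mangled) combination, while an optimizer can match on the
# short specification name.
from onnx import helper

fn = helper.make_function(
    domain="pkg.torchlib",  # hypothetical domain name
    fname="aten_sum",       # the "unmangled" specification name
    inputs=["self"],
    outputs=["result"],
    nodes=[helper.make_node("ReduceSum", ["self"], ["result"], keepdims=0)],
    opset_imports=[helper.make_opsetid("", 18)],
)
fn.overload = "dim_none"    # the distinguishing ("mangled") part
```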
Thank you for the exhaustive response!
That's my bad - I didn't read your proposal correctly. For some reason I assumed the overload selection/dispatch part was still WIP. That is interesting and seems better than creating mangled function names in the first place - which is probably what I would have ended up doing if I tried creating a context-dependent function lib now.
Yes, I think it could be interesting to have an explicit association between what constitutes different versions (overloads) of the same operation. And indeed it leaves the 'dispatch' to the builder instead of extending the standard, which seems good to me. It also leaves room for extension. I guess it's also more elegant than name mangling.
I'd definitely be interested to see concrete examples of this, as it seems like an interesting direction :) Coming from more 'dynamic' sklearn-like converters (with often variadic inputs/outputs), this might not be as common a situation there, but it seems applicable in other cases.
Right. I assume then a runtime would essentially ignore the unmangled name, unless (like an optimizer) it decides to dispatch to a specialized kernel for it.
Hi, onnx-script team! I've been following your project for a while and I wanted to ask a bit about the design for atenlib/torchlib functions, as I couldn't find any design docs. On that note, I'm curious to know whether these are going to form a new `torch.onnx` converter library?

I noticed that some functions with a more complex ONNX-build logic have a `trace_only=True` tag, which means (as far as I understand?) they cannot be built directly into ONNX, and you only use them for eager evaluation. For instance here, a check `dim is None` forces the function to be out-of-ONNX (and there are many other such cases):

https://github.com/microsoft/onnx-script/blob/7986ef8ed7a3af51d6f4409ca7d07df6c51cb8e5/onnxscript/function_libs/torch_aten/ops/core.py#L448-L449

Is my assumption on how `trace_only` works correct? What is the motivation for expressing everything only as ONNX functions, and not also as 'inline' applications of the relevant operators (without wrapping them in a function)? I also noticed there were some issues raised about what this means for function bodies that are dependent on attributes and can't be expressed in an `If`, which seems like a limitation for `onnxscript` and the torch converter.

Maybe you could have some input on this, @justinchuby? Apologies for opening an issue for this question, but there are no discussions enabled. Feel free to close it afterwards!