ML Model format #74
There is also this interesting thread at iree-org/iree#2863
This raises the following questions to me:
- does this discussion need consensus beyond the Web browsers
community? beyond the JS community (cf. #62)? Can the
Web lead the way (or at least, one way), or would it be doomed to fail
without broader take-up in other popular ML environments?
IMO, it would be valuable to have agreement from the major ML frameworks
and runtimes (WinML, CoreML, PyTorch, TensorFlow, etc). The success of any
web standard format would depend on enough ML models trained in the major
ML frameworks being convertible into the common format(s), and then
parseable and convertible back to any native runtime.
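To make the round-trip requirement concrete, here is a minimal sketch of what "convertible into the common format, and then convertible back" means at the op level. The op names and mapping table are invented purely for illustration; real converters (such as the ONNX exporters in the major frameworks) also translate tensors, attributes, and graph topology, not just op names.

```python
# Hypothetical example: converting a framework-specific op list into a common
# interchange op set and back. All op names here are made up for illustration.

# Mapping from a (hypothetical) native framework op set to a common op set.
TO_COMMON = {"fw.Conv2D": "common.conv2d", "fw.ReLU": "common.relu"}
FROM_COMMON = {v: k for k, v in TO_COMMON.items()}

def convert(graph, table):
    """Rewrite each node's op name; fail loudly on ops the target can't express."""
    out = []
    for op in graph:
        if op not in table:
            raise ValueError(f"op {op!r} is not expressible in the target format")
        out.append(table[op])
    return out

native = ["fw.Conv2D", "fw.ReLU"]
common = convert(native, TO_COMMON)        # lower into the common format
round_trip = convert(common, FROM_COMMON)  # lift back into the native op set
```

The `ValueError` branch is the crux of the paragraph above: a common format only works if enough real models never hit it.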
Any common format will likely support only a subset of what each ML
framework offers, since they're all evolving at different rates and have
only partial overlap in their supported operations. If the overlap is
valuable enough, and there's agreement, it could work. I know that Google
would be very reluctant to be limited to a standard that isn't expressive
enough for the TensorFlow Lite models that we're able to run in Android apps.
- is there an existing venue or a logical one for building or
verifying consensus on the direction to follow for such a format?
Microsoft formed ONNX as a community standard for model definitions, and
there has definitely been some traction in the WinML world, and perhaps
beyond. In the TensorFlow ecosystem, we haven't heard much interest yet,
and Google has declined to be involved in the ONNX efforts so far.
The main technical reason is that the TensorFlow team has been skeptical of
standardizing at the level of operations, due to operation-set fatigue: the
set is growing at roughly 20% per year and now numbers well over 1,000
operations. Google is looking to move to a different approach with the
Android NN API and TensorFlow, based on some smaller set of composable
operations. More than one such effort is in progress: MLIR and Tensor
Compute Primitives (TCP) are just two of the options being explored. TCP is
being done as a community project. MLIR is open-source, and anyone can
create a dialect. Google and Microsoft have discussed the idea of a web
dialect of MLIR that is aligned with an ONNX operation set. We haven't
started working on it yet though.
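The "smaller set of composable operations" idea can be illustrated with a simple sketch (not taken from any actual spec): a composite operation like softmax is expressible entirely in terms of a handful of primitives, so a standard could define only the primitives and let composites be built on top.

```python
import math

def softmax(xs):
    """A composite op written only in terms of primitive operations:
    reduce_max, sub, exp, reduce_sum, and div."""
    m = max(xs)                           # reduce_max (for numerical stability)
    exps = [math.exp(x - m) for x in xs]  # sub + exp, elementwise
    total = sum(exps)                     # reduce_sum
    return [e / total for e in exps]      # div, elementwise

softmax([1.0, 2.0, 3.0])  # probabilities summing to 1.0
```

The trade-off is familiar: fewer primitives make the standard smaller and more stable, but push more pattern-matching work onto runtimes that want to fuse composites back into fast fused kernels.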
If WASM + SIMD gives enough performance gains, that could buy us time until
the ML world stabilizes a bit more. Given how long the standards process
takes, exploring multiple options for model formats in parallel might be
pragmatic.
With regards to the format issue, the people working on WebNN, including myself, have spent a considerable amount of time researching and comparing ML operation semantics across many popular frameworks and de-facto standards. An interesting observation from this exercise is that they have much more in common than most people think. This is probably due to the domain's historical roots and the openness of the research community in the evolution of the domain knowledge. It is evident in the fact that most models can be converted to a different format reasonably well at the semantic level. Most problems people face today are tactical operational gaps, such as redundancy and toolchain inefficiency.

As a case study, when we started the development of the DirectML project a few years ago, we initially modeled its semantic operations on the earlier versions of ONNX, the format backing the WinML API. To our pleasant surprise, we found that more than 90% of the DirectML functionality already built was readily transferable to our work on TensorFlow. If one would look at key building block functions such as …

In my view, the differences among these formats are more tactical than semantic. They differ in breadth of variety and reusability, perhaps due to uncontrolled growth from rapid development, but they in fact share a very large overlap. In this sense, ML operations are not that different from regular APIs: they are defined with the purpose of reuse. When Cho et al. introduced a novel recurrent network in their 2014 paper, there was no ML operation …
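The point about Cho et al. (2014) can be made concrete: their GRU cell was a novel network, yet it is expressible entirely in pre-existing building-block operations (multiply, add, sigmoid, tanh). The scalar sketch below uses arbitrary illustrative weights; real implementations use matrices, and gate conventions vary slightly across frameworks.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h, Wz, Uz, Wr, Ur, W, U):
    """One scalar GRU step built only from primitive ops (mul, add, sigmoid, tanh)."""
    z = sigmoid(Wz * x + Uz * h)             # update gate
    r = sigmoid(Wr * x + Ur * h)             # reset gate
    h_cand = math.tanh(W * x + U * (r * h))  # candidate hidden state
    return (1.0 - z) * h + z * h_cand        # blend old and candidate states

h_next = gru_cell(1.0, 0.0, Wz=0.5, Uz=0.5, Wr=0.5, Ur=0.5, W=1.0, U=1.0)
```

No new "GRU op" is needed at the format level; a composable-primitive standard could represent such an operation the day the paper was published.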
@cynthia points out in his talk the lack of consensus on a particular format for ML models:
The Model Loader API explainer (presented by @jbingham) offers some of the characteristics of what a good format would be in this context, with MLIR as a potential candidate.