SummerOfCodeIdeas

Marcus Edel edited this page Feb 14, 2025 · 182 revisions

Ideas for Google Summer of Code 2025

Google Summer of Code 2025 - Program Page

For GSoC 2025, we are doing something a little different. Instead of offering many ideas in widely varying directions, we have decided to focus all of our efforts on the one direction that is most important to the library, with a single project. As a result, we will only accept applications that focus on this project.

The ONNX-mlpack converter

Problem statement: a very common use case is that users have trained a neural network in a toolkit other than mlpack, but then want to use mlpack for deployment because of mlpack's minimal overhead and small compiled size. Therefore, we need a working conversion tool that converts models built with other toolkits into models that can be run with mlpack.

Important context: make sure you are familiar with the mlpack vision document, especially points 4, 5, and 7; see also our successful NASA proposal to get an idea of where our development efforts are currently focused. A good candidate should be well-versed in the full picture of what mlpack is and how it is used; while this project is primarily about the neural network toolkit, it is a good idea to also know what else is available in mlpack.

Current status: the ONNX converter was also a project last year, and there is currently a prototype codebase at https://github.com/mlpack/onnx-mlpack. Proposals may either build on the existing code or start over. Proposals that want to start over should justify why the current repository cannot be used.

Goals for the project: for a successful summer, a project application should complete some or all of the following objectives (listed in rough order of preference; items nearer the bottom are less important). It is far more important to complete an objective well than to complete everything.

  1. Demonstrate a working end-to-end example of converting ONNX models to mlpack formats for basic feedforward networks and convolutional networks.
  2. Demonstrate a working end-to-end example of converting ONNX models to mlpack formats for recurrent neural networks (e.g. LSTMs).
  3. Provide a consistent and well-documented user interface for model conversion, both from C++ and as a standalone command line tool.
  4. Properly handle errors when an mlpack layer is not available during conversion, or when other errors are encountered.
  5. Ensure that the code follows the mlpack design and style guidelines.
  6. Ensure that the code and all code paths are well-tested and have unit tests.
  7. Integrate usages of the ONNX-mlpack converter into the examples repository.
  8. Integrate documentation for the ONNX-mlpack converter into the main mlpack documentation.
  9. Ensure that the code is at least reasonably efficient. (This is less important, since a user will only convert a network once.)
  10. Allow conversions from mlpack models back to ONNX.
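As a rough illustration of objectives 1 and 4, here is a minimal sketch, assuming a converter that walks the nodes of an ONNX graph in order and maps each operator type to an mlpack layer. The function names `SupportedOps()` and `MapOpsToLayers()` are hypothetical: they are not part of the onnx-mlpack repository or of mlpack's API. The sketch works on operator-type strings only and ignores attributes and weights, which a real converter would also have to translate. The left-hand keys are real ONNX operator names; the right-hand values name mlpack layer classes, but the table is illustrative and deliberately incomplete.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical lookup table: ONNX operator type -> name of the mlpack layer a
// converter might emit for it. Illustrative only; e.g. mapping "Gemm" to
// "Linear" ignores Gemm's transA/transB/alpha/beta attributes.
const std::map<std::string, std::string>& SupportedOps()
{
  static const std::map<std::string, std::string> ops = {
    {"Gemm", "Linear"},
    {"Conv", "Convolution"},
    {"MaxPool", "MaxPooling"},
    {"Relu", "ReLU"},
    {"Sigmoid", "Sigmoid"},
    {"Softmax", "Softmax"},
    {"LSTM", "LSTM"}
  };
  return ops;
}

// Hypothetical conversion step: map every node's op type to an mlpack layer
// name. Rather than stopping at the first unknown node, collect all of them so
// the user sees every missing layer in one descriptive error message.
std::vector<std::string> MapOpsToLayers(const std::vector<std::string>& opTypes)
{
  const std::map<std::string, std::string>& ops = SupportedOps();
  std::vector<std::string> layers;
  std::vector<std::string> unsupported;

  for (const std::string& op : opTypes)
  {
    const auto it = ops.find(op);
    if (it == ops.end())
      unsupported.push_back(op);
    else
      layers.push_back(it->second);
  }

  if (!unsupported.empty())
  {
    std::string msg = "cannot convert model; no mlpack layer for ONNX op(s):";
    for (const std::string& op : unsupported)
      msg += " " + op;
    throw std::runtime_error(msg);
  }

  return layers;
}
```

For example, `MapOpsToLayers({"Gemm", "Relu", "Gemm"})` would yield `Linear`, `ReLU`, `Linear`, while a graph containing `NonMaxSuppression` would throw a `std::runtime_error` naming that operator. Reporting all unsupported operators at once is one possible design choice; it saves users from fixing and re-running the conversion one missing layer at a time.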

A thing to keep in mind: Google Summer of Code is not just an internship; the goal is to grow the community of maintainers. Our aim is to find and mentor someone who has a long-term interest in developing and maintaining mlpack or open-source software in general. Why is that? A few reasons:

  • The open-source community functions because people devote their time and effort, often their free time. It is always fun to contribute new things, but the more important and impactful work lies in maintaining the software, helping users, and being part of the community.

  • Pragmatically speaking, when GSoC contributors don't become long-term maintainers, the remaining maintainers are burdened with the additional code from their projects. Countless hours, days, and weeks have gone into maintaining code whose original authors are long gone; this hinders our community's ability to add new features and to make progress on development.

  • From a human perspective, it's just not very fun to work on open source software alone. Maybe we all got into open source software because we had fun writing code, but it's the community and the interpersonal relationships that keep us interested in the project and contributing to it.

potential mentor(s): Ryan Curtin, Omar Shrit, Marcus Edel

project size: medium (~175 hours) or large (~350 hours)