This repository contains the final report for Google Summer of Code 2025.
Project: Improve OTX Classification via DoRA and Transformer Backbone.
Project Summary: At the time I applied for the project, OTX offered only two training options: Full Fine-Tuning and LoRA. The goal of this project was to go beyond that by giving users a wider range of training options. Achieving this required exploring more efficient fine-tuning methods, identifying model architectures that support them, and providing experimental results that demonstrate their effectiveness.
Expected Outcomes: The core of this project is not only to implement new features and models, but also to provide experimental results across various datasets so that OTX users can choose the training method best suited to their needs. As mentioned above, PEFT methods are typically far more efficient than Full Fine-Tuning (FFT), but the reduced number of trainable parameters inevitably introduces a trade-off in model accuracy. It is therefore essential to run experiments on diverse types of datasets, measuring not only accuracy but also computational cost (such as GPU memory usage and training time), so that users can clearly see the degree of trade-off involved.
This implementation is based on the DoRA paper, Weight-Decomposed Low-Rank Adaptation (arXiv:2402.09353).
DoRA is designed to decompose weights into magnitude and direction.
The overall process consists of three main stages: 1. Decompose, 2. Adapt, and 3. Merge.
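In the paper's notation, with m the magnitude vector, W_0 the pretrained weight, ||.||_c the column-wise vector norm, and B, A the trainable low-rank matrices, the decompose and merge steps can be written as:

$$
W_0 = m \,\frac{V}{\lVert V \rVert_c} \quad \text{with } m = \lVert W_0 \rVert_c,\; V = W_0,
\qquad
W' = m \,\frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}
$$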
The core part of this implementation directly follows the formulation in the paper: each step in the code is aligned with the corresponding equation, and the comments explicitly indicate which equation is being implemented.
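For illustration only (not the exact OTX code), a minimal PyTorch-style sketch of the three stages for a linear layer could look like the following; the class and attribute names (DoRALinear, magnitude, lora_A, lora_B) are hypothetical, and the LoRA alpha/rank scaling is omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DoRALinear(nn.Module):
    """Hypothetical minimal DoRA layer wrapping a frozen pretrained linear weight."""

    def __init__(self, weight: torch.Tensor, rank: int = 8):
        super().__init__()
        out_features, in_features = weight.shape
        # Pretrained weight W0 stays frozen; only m, B, A are trained.
        self.weight = nn.Parameter(weight.clone(), requires_grad=False)
        # 1. Decompose: magnitude m = ||W0||_c (one norm per output channel,
        #    i.e. the paper's column-wise norm for the (out, in) weight layout).
        self.magnitude = nn.Parameter(weight.norm(p=2, dim=1, keepdim=True))
        # 2. Adapt: low-rank update BA of the directional component, as in LoRA.
        #    B starts at zero so the layer initially behaves exactly like W0.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 3. Merge: W' = m * (W0 + BA) / ||W0 + BA||_c
        directional = self.weight + self.lora_B @ self.lora_A
        merged = self.magnitude * directional / directional.norm(p=2, dim=1, keepdim=True)
        return F.linear(x, merged)


# Example: wrap a pretrained 768x768 projection and run a forward pass.
layer = DoRALinear(torch.randn(768, 768), rank=8)
out = layer(torch.randn(4, 768))
```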
All experiments followed the default OTX training configurations.
We used DINOv2-small as the backbone model.
We compared four different fine-tuning strategies (Full FT, Freeze Backbone, LoRA, DoRA) on 7 datasets:
FGVC-Aircraft, Food-101, Stanford Cars, CUB-200, HAM10000, RESISC45, and Kitti Distance.
The average accuracy results are summarized in the bar chart below:
The dataset-wise accuracy results are summarized in the line chart below:
We measured GPU memory usage, trainable parameters, and average accuracy.
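For reference, a rough sketch of how such metrics can be collected in PyTorch (an illustration only, not necessarily the exact measurement code used in these experiments):

```python
import torch


def count_trainable_parameters(model: torch.nn.Module) -> int:
    """Number of parameters that actually receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def peak_gpu_memory_mib(device: int = 0) -> float:
    """Peak GPU memory allocated by tensors since the last reset, in MiB."""
    return torch.cuda.max_memory_allocated(device) / (1024 ** 2)


# Typical usage around a training run:
#   torch.cuda.reset_peak_memory_stats()
#   ... train ...
#   print(count_trainable_parameters(model), peak_gpu_memory_mib())
```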
The comparison is shown in the following bubble plot:
We also experimented with another ViT-based backbone, TinyViT-21M. While PEFT methods generally led to performance drops compared to full fine-tuning, DoRA showed a relatively smaller decrease.
The average accuracy results are summarized in the bar chart below:
As mentioned earlier, the main goal of this project is to provide a broader range of fine-tuning options for users. These results can serve as a valuable reference when selecting the most suitable fine-tuning strategy for a user's goals, whether that means minimizing resource usage or maximizing performance.
For more information, please refer to the post here.