The goal of this challenge was to build models that classify instructional activities using multimodal classroom data.
Classroom observation videos provide valuable insights into a teacher's instruction, student interactions, and classroom dynamics. Over the past 15 years, their use in teacher preparation and the study of teacher quality has increased significantly. Classroom videos are also a common source of data for educational researchers studying classroom interactions as well as a resource for professional development. Despite this growth, using video at scale remains challenging due to the time and resources required for processing and analysis.
This repository contains code from winning competitors in the Artificial Intelligence for Advancing Instruction (AIAI) DrivenData challenge. Code for all winning solutions is open source under the MIT License.
Winning code for other DrivenData competitions is available in the competition-winners repository.
| Place | Team or User | Phase 1 Score | Phase 2 Score | Summary of Model |
|---|---|---|---|---|
| 1 | SALEN | 0.5827 | 0.5885 | Modeled video and transcript data separately. Video labels were predicted with an extended InternVideo2-1B transformer, and discourse labels with a fine-tuned Qwen3 LLM. The team used temporal-aware sampling, data augmentation, and ensemble averaging over multiple checkpoints and cross-validation runs to address class imbalance and the limited amount of training data. |
| 2 | TUM-UT | 0.5283 | 0.5264 | Developed separate models for vision and transcript labels. For vision labels, the final solution used an ensemble of Qwen2.5-VL-32B and V-JEPA2 models, selecting the best-performing model for each label based on validation F1. For discourse labels, built a preceding context of size two for each transcript segment, encoded it with DeBERTa-V3, and classified the resulting embeddings with a single linear layer. Focal loss was used to counter the class imbalance (see the sketch below the table). |
| 3 | GoTerps | 0.4283 | 0.4396 | Combined text and video modalities using an ensemble approach. For transcripts, used fine-tuned RoBERTa-base and DeBERTa-V3-base models with multi-context preprocessing and an ensemble of 5 transformer models. For video, used a VideoMAE model for visual classification with weighted sampling to handle class imbalance (see the sketch below the table). |
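
As an illustration of the class-imbalance handling mentioned in the TUM-UT summary, the sketch below shows a generic binary focal loss over multi-label logits. It is a minimal example under stated assumptions: the `gamma` and `alpha` values and the multi-label formulation are illustrative, not the team's actual implementation (see their submission directory for that).

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss for multi-label classification.

    Down-weights easy examples so rare positive labels contribute more to the
    gradient. `gamma` and `alpha` are illustrative defaults, not tuned values.
    """
    # Per-label binary cross-entropy, kept unreduced so it can be reweighted.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    probs = torch.sigmoid(logits)
    # p_t is the predicted probability of the true class for each label.
    p_t = probs * targets + (1.0 - probs) * (1.0 - targets)
    # alpha_t balances positives vs. negatives; (1 - p_t)^gamma focuses on hard cases.
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()

# Toy example: 4 segments, 6 discourse labels.
logits = torch.randn(4, 6)
targets = torch.randint(0, 2, (4, 6)).float()
loss = focal_loss(logits, targets)
```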
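
The weighted sampling noted in the GoTerps summary is likewise a standard remedy for class imbalance. Below is a minimal sketch using PyTorch's `WeightedRandomSampler` with inverse-frequency weights; the toy dataset, number of classes, and weighting scheme are assumptions for illustration, not the winning code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy single-label dataset: one feature vector and one class id per clip.
features = torch.randn(100, 16)
labels = torch.randint(0, 3, (100,))
dataset = TensorDataset(features, labels)

# Inverse-frequency weights: clips from rarer classes are drawn more often.
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(dataset),
    replacement=True,
)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

for batch_features, batch_labels in loader:
    ...  # training step goes here
```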
Additional solution details can be found in the README inside the directory for each submission.
Winners Blog Post: Meet the winners of the AIAI Challenge

