⁴Institute of Computing Technology, Chinese Academy of Sciences
⁵University at Buffalo, State University of New York
†Indicates Corresponding Author
- [2026-01-28]: 🚀 CamReasoner-7B released on Huggingface.
- [2026-01-28]: 🚀 Codes and training dataset released.
Abstract: Understanding camera dynamics is a fundamental pillar of video spatial intelligence. However, existing multimodal models predominantly treat this task as black-box classification, often confusing physically distinct motions by relying on superficial visual patterns rather than geometric cues. We present CamReasoner, a framework that reformulates camera movement understanding as a structured inference process to bridge the gap between perception and cinematic logic. Our approach centers on the Observation-Think-Answer (O-T-A) paradigm, which compels the model to decode spatio-temporal cues such as trajectories and view frustums within an explicit reasoning block. To instill this capability, we construct a large-scale inference trajectory suite comprising 18k SFT reasoning chains and 38k RL feedback samples. Notably, we are the first to employ RL for logical alignment in this domain, ensuring motion inferences are grounded in physical geometry rather than contextual guesswork. By applying reinforcement learning to the O-T-A reasoning paradigm, CamReasoner effectively suppresses hallucinations and achieves state-of-the-art performance across multiple benchmarks.
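The O-T-A paradigm structures every model response into explicit observation, thinking, and answer segments. A minimal sketch of splitting such a response into its three parts (the XML-style tag names here are an assumption; check the released result files for the exact markup CamReasoner-7B emits):

```python
import re

def parse_ota(response: str) -> dict:
    """Split a model response into observation / think / answer segments.

    The tag names below are illustrative, not confirmed by the paper;
    compare against CamReasoner_binary.json for the real format.
    """
    fields = {}
    for tag in ("observation", "think", "answer"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", response, re.DOTALL)
        fields[tag] = m.group(1).strip() if m else None
    return fields

demo = (
    "<observation>The background shifts left while object scale stays constant.</observation>"
    "<think>Constant scale rules out a dolly; lateral parallax indicates a truck.</think>"
    "<answer>truck right</answer>"
)
print(parse_ota(demo)["answer"])  # truck right
```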
Supervised Fine-Tuning establishes a foundational reasoning baseline by injecting structured templates and domain-specific knowledge, enabling the model to follow instructions and generate coherent initial responses.
git clone https://github.com/wuhang03/CamReasoner
cd CamReasoner
# build SFT environment
conda create -n sft python=3.11
conda activate sft
cd LLaMA-Factory
bash setup.sh
# download data
bash download.sh
# run sft (modify parameters according to your need)
bash local_scripts/run_sft.sh

Our proposed SFT dataset CamReasoning-SFT-18k is provided in camerabench_sft.json. If you want to train models on your own curated data, convert it to the format shown in our json file.
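As a rough guide to that conversion, the sketch below maps a custom sample into a LLaMA-Factory-style multimodal ShareGPT record. The input field names (`video_path`, `question`, `reasoning_answer`) are hypothetical, and the exact output schema should be checked against the released camerabench_sft.json before training:

```python
import json

def to_llamafactory(sample: dict) -> dict:
    """Convert one custom sample into a ShareGPT-style record with a
    video attachment. Field names are assumptions for illustration;
    align them with camerabench_sft.json before use."""
    return {
        "messages": [
            # <video> placeholder marks where the video is injected
            {"role": "user", "content": "<video>" + sample["question"]},
            {"role": "assistant", "content": sample["reasoning_answer"]},
        ],
        "videos": [sample["video_path"]],
    }

custom = {
    "video_path": "videos/clip_0001.mp4",
    "question": "What camera movement occurs in this clip?",
    "reasoning_answer": "<observation>...</observation><think>...</think><answer>pan left</answer>",
}
print(json.dumps(to_llamafactory(custom), indent=2))
```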
Reinforcement Learning drives the model to self-evolve through trial and error, refining the internal logic chain and optimizing decision-making performance beyond the limitations of static training data.
git clone https://github.com/wuhang03/CamReasoner
cd CamReasoner
# build RL environment
conda create -n rl python=3.11
conda activate rl
cd EasyR1
bash setup.sh
# download data
bash download.sh
# run rl (modify parameters according to your need)
bash local_scripts/run_rl.sh

Our proposed RL dataset CamReasoning-RL-38k is provided in camerabench_rl.json. If you want to train models on your own curated data, convert it to the format shown in our json file.
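RL training of this kind typically scores rollouts with rule-based rewards. The sketch below shows one plausible shape: a format reward that checks the O-T-A template plus an accuracy reward on the final answer. The tag names, weights, and reward decomposition are assumptions for illustration, not the project's actual reward configuration (see the EasyR1 scripts for that):

```python
import re

def format_reward(response: str) -> float:
    """1.0 if the response follows the assumed O-T-A template, else 0.0."""
    pattern = r"<observation>.*?</observation>\s*<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, response.strip(), re.DOTALL) else 0.0

def accuracy_reward(response: str, ground_truth: str) -> float:
    """1.0 if the extracted answer matches the label (case-insensitive)."""
    m = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    pred = m.group(1).strip().lower() if m else ""
    return 1.0 if pred == ground_truth.strip().lower() else 0.0

def total_reward(response: str, ground_truth: str,
                 w_fmt: float = 0.1, w_acc: float = 0.9) -> float:
    # Weighted sum; the 0.1/0.9 split is an arbitrary example choice.
    return w_fmt * format_reward(response) + w_acc * accuracy_reward(response, ground_truth)
```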
For more details on setting up the SFT and RL environments, please refer to LLaMA-Factory and EasyR1.
You can use CamReasoner-7B for inference and inspect its reasoning process by following this part. We provide results in CamReasoner_binary.json and CamReasoner_vqa.json to help visualize the observe-think-answer reasoning paradigm.
git clone https://github.com/wuhang03/CamReasoner
cd CamReasoner
# build inference environment
conda create -n infer python=3.11
conda activate infer
cd Inference
bash setup.sh
# download data
python data_download.py
# run inference (modify parameters according to your need)
bash infer/infer.sh

The evaluation data are from CameraBench, including questions and videos. You can refer to CameraBench for more details.
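Once the result files are downloaded, a quick way to sanity-check them is to aggregate accuracy per movement category. The entry keys used below (`category`, `prediction`, `ground_truth`) are assumptions; adjust them to whatever schema CamReasoner_binary.json actually uses:

```python
from collections import defaultdict

def accuracy_by_category(results):
    """Aggregate binary-classification accuracy per movement category.

    Entry keys ('category', 'prediction', 'ground_truth') are assumed
    for illustration; map them onto the released json schema.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in results:
        totals[r["category"]] += 1
        hits[r["category"]] += int(r["prediction"] == r["ground_truth"])
    return {c: hits[c] / totals[c] for c in totals}

demo = [
    {"category": "pan", "prediction": "yes", "ground_truth": "yes"},
    {"category": "pan", "prediction": "no", "ground_truth": "yes"},
    {"category": "zoom", "prediction": "yes", "ground_truth": "yes"},
]
print(accuracy_by_category(demo))  # {'pan': 0.5, 'zoom': 1.0}
```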
You can also refer to ShotBench and RefineShot for out-of-domain camera movement understanding evaluation and reasoning reliability evaluation. These evaluations can be run by simply modifying MODEL_NAME and CATEGORY in evaluate_qwen.sh in RefineShot.
Qualitative results across four typical camera movements. For each case, we visualize the temporal frame sequence alongside the CamReasoner-7B response. The model demonstrates robust spatial reasoning by generating detailed observations of visual cues and a logical thinking process to accurately identify the movement and provide the final answer.
If you find our project useful, we hope you can star our repo and cite our paper as follows:
@article{wu2026camreasoner,
title={CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning},
author={Wu, Hang and Cai, Yujun and Li, Zehao and Ge, Haonan and Sun, Bowen and Yuan, Junsong and Wang, Yiwei},
journal={arXiv preprint arXiv:2602.00181},
year={2026}
}
We sincerely appreciate the contributions of the open-source community. The related projects are as follows:
- RL: OneThinker, Video-R1, DeepSeek-R1, EasyR1, verl
- SFT: LLaMA-Factory
- Evaluation: CameraBench, VLMEvalKit, ShotBench
This project is licensed under the terms of the Apache License 2.0. You are free to use, modify, and distribute this software under the conditions of the license. See the LICENSE file for details. This project is intended for academic and research purposes only. Any commercial use is strictly prohibited without prior written consent.