This project is a substantially improved and extended version of our MoPKL (Motion Prior Knowledge Learning with Homogeneous Language Descriptions for Moving Infrared Small Target Detection), published in the Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI'25).
- Datasets are available at ITSDT-15K, DAUB-R (code: jya7) and IRDST-H (code: c4ar). DAUB-R is a reconstructed version of DAUB, split into training, validation, and test sets. IRDST-H is a hard version of IRDST.
- You need to reorganize these datasets in a format similar to the `coco_train_ITSDT.txt` and `coco_val_ITSDT.txt` files we provide (the `.txt` files are used in training). We provide the `.txt` files for ITSDT-15K, DAUB-R and IRDST-H. For example:
```python
train_annotation_path = '/home/ITSDT/coco_train_ITSDT.txt'
val_annotation_path = '/home/ITSDT/coco_val_ITSDT.txt'
```
- Or you can generate new `.txt` files based on the paths of your datasets. `.txt` files (e.g., `coco_train_ITSDT.txt`) can be generated from `.json` files (e.g., `instances_train2017.json`) with the command below; we also provide all `.json` files for ITSDT-15K, DAUB-R (code: jya7) and IRDST-H (code: c4ar). A hedged sketch of such a conversion is given after the folder tree.
```
python utils_coco/coco_to_txt.py
```
- The folder structure should look like this:
```
ITSDT
├─instances_train2017.json
├─instances_test2017.json
├─coco_train_ITSDT.txt
├─coco_val_ITSDT.txt
├─images
│ ├─1
│ │ ├─0.bmp
│ │ ├─1.bmp
│ │ ├─2.bmp
│ │ ├─ ...
│ ├─2
│ │ ├─0.bmp
│ │ ├─1.bmp
│ │ ├─2.bmp
│ │ ├─ ...
│ ├─3
│ │ ├─ ...
```
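For reference, here is a minimal, hedged sketch of such a json-to-txt conversion. The line format (`image_path x_min,y_min,x_max,y_max,class_id ...`) and the paths are assumptions on my part; the provided `utils_coco/coco_to_txt.py` is authoritative, so check its output format first.
```python
# Hedged sketch of a COCO-json-to-txt conversion in the assumed line format
# "image_path x_min,y_min,x_max,y_max,class_id ...". The provided
# utils_coco/coco_to_txt.py is authoritative; verify its format before use.
import json
from collections import defaultdict

json_path = '/home/ITSDT/instances_train2017.json'  # hypothetical path
img_root = '/home/ITSDT/images/'                    # hypothetical path

coco = json.load(open(json_path))
boxes = defaultdict(list)
for ann in coco['annotations']:
    x, y, w, h = ann['bbox']  # COCO stores boxes as x, y, width, height
    boxes[ann['image_id']].append(
        f'{int(x)},{int(y)},{int(x + w)},{int(y + h)},{ann["category_id"]}')

with open('coco_train_ITSDT.txt', 'w') as f:
    for img in coco['images']:
        parts = [img_root + img['file_name']] + boxes[img['id']]
        f.write(' '.join(parts) + '\n')
```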
- python==3.11.8
- pytorch==2.1.1
- torchvision==0.16.1
- numpy==1.26.4
- opencv-python==4.9.0.80
- scipy==1.13
- Tested on Ubuntu 20.04, with CUDA 11.8, and 1x NVIDIA 3090.
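For convenience, a matching pip install might look like the following (a sketch only; the pinned `pytorch` entry suggests conda, and the official install command for your CUDA setup may differ):
```
pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cu118
pip install numpy==1.26.4 opencv-python==4.9.0.80 scipy==1.13
```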
- We provide the encoded embedding representations (code: fmag) of the language descriptions for the ITSDT-15K, DAUB-R and IRDST-H datasets. This archive contains three embedded representations: `emb_train_ITSDT.pkl`, `emb_train_DAUB.pkl` and `emb_train_IRDST-H.pkl`.
- We also provide the initial language description text files (code: yuy3), which you can explore further with vision-language models.
- Taking the ITSDT-15K dataset as an example, modify the path in `dataloader_for_ITSDT` for the language description embedding representations:
```python
# Path to your emb_train_ITSDT.pkl
description = pickle.load(open('/home/MoPKL/emb_train_ITSDT.pkl', 'rb'))
embeddings = np.array(list(description.values()))
self.cap_idx = list(description.keys())
self.motion_cap_idx = np.array(list(description.values()))
```
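If you want to verify what the pickle contains before wiring it in, a minimal sketch can help; only the dict-of-arrays layout is implied by the loader above, and the key naming is dataset-specific:
```python
import pickle
import numpy as np

# Hedged sketch: inspect the embedding dictionary used by the dataloader.
description = pickle.load(open('/home/MoPKL/emb_train_ITSDT.pkl', 'rb'))
print(len(description), 'entries')
first_key = next(iter(description))
print('example key:', first_key)
print('embedding shape:', np.asarray(description[first_key]).shape)
```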
- In addition, you need to modify the dimension of `text_input_dim` in the network file `MoPKL.py`:
```python
# ITSDT: 130 * 300
# DAUB-R: 20 * 300
# IRDST-H: 20 * 300
self.motion = MotionModel(text_input_dim=130*300, latent_dim=128, hidden_dim=1024)
```
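As a quick, hedged sanity check that the configured `text_input_dim` matches the stored embeddings (the flattening assumption follows the `130 * 300` comment above):
```python
import pickle
import numpy as np

# Hedged sketch: confirm every embedding flattens to text_input_dim values.
text_input_dim = 130 * 300   # ITSDT-15K; use 20 * 300 for DAUB-R / IRDST-H
description = pickle.load(open('/home/MoPKL/emb_train_ITSDT.pkl', 'rb'))
for key, emb in description.items():
    assert np.asarray(emb).size == text_input_dim, f'{key}: unexpected size'
print('all embeddings match text_input_dim =', text_input_dim)
```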
- We provide the encoded tensors (code: 45c6) of the motion relations for the ITSDT-15K, DAUB-R and IRDST-H datasets. This archive contains three embedded representations: `motion_relation_ITSDT.pkl`, `motion_relation_DAUB.pkl` and `motion_relation_IRDST-H.pkl`.
- Taking the ITSDT-15K dataset as an example, modify the path in `dataloader_for_ITSDT` for the motion relation representations:
```python
# Path to your motion_relation_ITSDT.pkl
relation = pickle.load(open('/home/MoPKL/motion_relation_ITSDT.pkl', 'rb'))
relations = np.array(list(relation.values()))
self.re_idx = list(relation.keys())
self.motion_re_idx = np.array(list(relation.values()))
```
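To see how the two files line up, here is a minimal, hedged sketch pairing each sample's language-description embedding with its motion relation; the shared-key assumption is mine, so verify it against the provided dataloaders:
```python
import pickle
import numpy as np

# Hedged sketch: pair language-description embeddings with motion relations
# by shared key (an assumption; verify against the provided dataloader).
description = pickle.load(open('/home/MoPKL/emb_train_ITSDT.pkl', 'rb'))
relation = pickle.load(open('/home/MoPKL/motion_relation_ITSDT.pkl', 'rb'))

for key in list(description.keys())[:3]:   # peek at the first few samples
    print(key,
          np.asarray(description[key]).shape,
          np.asarray(relation[key]).shape)  # assumes matching keys
```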
- Note: Please use a different `dataloader` for each dataset. For example, to train the model on the ITSDT-15K dataset, enter the following command:
```
CUDA_VISIBLE_DEVICES=0 python train_ITSDT.py
```
- Note that `model_best.pth` is not necessarily the best model; the best checkpoint may be the one with a lower val_loss or a higher AP50 during validation. Set `model_path` accordingly:
```python
"model_path": '/home/MoPKL/logs/model.pth'
```
- You need to change the path of the `json` file of the test sets. For example:
```python
# Use ITSDT-15K dataset for test
cocoGt_path = '/home/public/ITSDT-15K/instances_test2017.json'
dataset_img_path = '/home/public/ITSDT-15K/'
```
```
python test.py
```
- We support `video` and `single-frame image` prediction:
# mode = "video" (predict a sequence)
mode = "predict" # Predict a single-frame image python predict.pypython summary.py- For bounding box detection, we use COCO's evaluation metrics:
| Method | Dataset | mAP50 (%) | Precision (%) | Recall (%) | F1 (%) | Download |
|---|---|---|---|---|---|---|
| iMoPKL | ITSDT-15K | 80.67 | 92.28 | 88.50 | 90.35 | Baidu (code: 2u4k) |
| iMoPKL | DAUB-R | 88.57 | 92.94 | 96.94 | 94.90 | |
| iMoPKL | IRDST-H | 43.95 | 59.82 | 74.48 | 66.35 | |
- PR curves on the ITSDT-15K, DAUB-R and IRDST-H datasets in this paper.
- We also provide the result files (code: 2544) for these PR curves, so you can directly plot the curves yourself.
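As a starting point for plotting, here is a minimal, hedged sketch; the file name and the assumption that each result file unpickles to per-point `precision` and `recall` arrays are mine, so check the actual file contents first:
```python
import pickle
import matplotlib.pyplot as plt

# Hedged sketch: plot a PR curve from a result file, assuming it unpickles
# to a dict with 'precision' and 'recall' arrays (verify the real layout).
results = pickle.load(open('pr_curve_ITSDT.pkl', 'rb'))   # hypothetical name
plt.plot(results['recall'], results['precision'], label='iMoPKL on ITSDT-15K')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.legend()
plt.savefig('pr_curve_ITSDT.png', dpi=300)
```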
If you have any questions, please contact Shengjia Chen via e-mail: [email protected].
- S. Chen, L. Ji, J. Zhu, M. Ye and X. Yao, "SSTNet: Sliced Spatio-Temporal Network With Cross-Slice ConvLSTM for Moving Infrared Dim-Small Target Detection," in IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-12, 2024, Art no. 5000912, doi: 10.1109/TGRS.2024.3350024.
- Bingwei Hui, Zhiyong Song, Hongqi Fan, et al. A dataset for infrared image dim-small aircraft target detection and tracking under ground / air background[DS/OL]. V1. Science Data Bank, 2019[2024-12-10]. https://cstr.cn/31253.11.sciencedb.902. CSTR:31253.11.sciencedb.902.
- Ruigang Fu, Hongqi Fan, Yongfeng Zhu, et al. A dataset for infrared time-sensitive target detection and tracking for air-ground application[DS/OL]. V2. Science Data Bank, 2022[2024-12-10]. https://cstr.cn/31253.11.sciencedb.j00001.00331. CSTR:31253.11.sciencedb.j00001.00331.
If you find this repo useful, please cite our papers:
```
@ARTICLE{CheniMoPKL2025,
  author={Chen, Shengjia and Ji, Luping and Peng, Shuang and Zhu, Sicheng and Ye, Mao and Sang, Yongsheng},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  title={Language-Driven Motion Prior Knowledge Learning for Moving Infrared Small Target Detection},
  year={2025},
  volume={63},
  pages={1-14},
  doi={10.1109/TGRS.2025.3596902}}

@inproceedings{ChenMoPKL2025,
  title={{Motion Prior Knowledge Learning with Homogeneous Language Descriptions for Moving Infrared Small Target Detection}},
  author={Chen, Shengjia and Ji, Luping and Duan, Weiwei and Peng, Shuang and Ye, Mao},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={2},
  pages={2186--2194},
  year={2025}
}
```
