This project is a substantially improved and extended version of our MoPKL (Motion Prior Knowledge Learning with Homogeneous Language Descriptions for Moving Infrared Small Target Detection), published in the Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI'25).
- Datasets are available at ITSDT-15K, DAUB-R (code: jya7) and IRDST-H (code: c4ar). DAUB-R is a reconstructed version of DAUB, split into training, validation, and test sets. IRDST-H is a hard version of IRDST.
- You need to reorganize these datasets in a format similar to the `coco_train_ITSDT.txt` and `coco_val_ITSDT.txt` files we provide (the `.txt` files are used in training). We provide the `.txt` files for ITSDT-15K, DAUB-R and IRDST-H. For example:
```python
train_annotation_path = '/home/ITSDT/coco_train_ITSDT.txt'
val_annotation_path = '/home/ITSDT/coco_val_ITSDT.txt'
```
- Or you can generate new `.txt` files based on the paths of your datasets. `.txt` files (e.g., `coco_train_ITSDT.txt`) can be generated from `.json` files (e.g., `instances_train2017.json`) with the command below; we also provide all `.json` files for ITSDT-15K, DAUB-R (code: jya7) and IRDST-H (code: c4ar). A hedged sketch of such a conversion is given after the folder tree.
```
python utils_coco/coco_to_txt.py
```
- The folder structure should look like this:
```
ITSDT
├─instances_train2017.json
├─instances_test2017.json
├─coco_train_ITSDT.txt
├─coco_val_ITSDT.txt
├─images
│ ├─1
│ │ ├─0.bmp
│ │ ├─1.bmp
│ │ ├─2.bmp
│ │ ├─ ...
│ ├─2
│ │ ├─0.bmp
│ │ ├─1.bmp
│ │ ├─2.bmp
│ │ ├─ ...
│ ├─3
│ │ ├─ ...
```
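For reference, here is a minimal, hedged sketch of such a json-to-txt conversion. The line format (`image_path x_min,y_min,x_max,y_max,class_id ...`) and the paths are assumptions on my part; the provided `utils_coco/coco_to_txt.py` is authoritative, so check its output format first.
```python
# Hedged sketch of a COCO-json-to-txt conversion in the assumed line format
# "image_path x_min,y_min,x_max,y_max,class_id ...". The provided
# utils_coco/coco_to_txt.py is authoritative; verify its format before use.
import json
from collections import defaultdict

json_path = '/home/ITSDT/instances_train2017.json'  # hypothetical path
img_root = '/home/ITSDT/images/'                    # hypothetical path

coco = json.load(open(json_path))
boxes = defaultdict(list)
for ann in coco['annotations']:
    x, y, w, h = ann['bbox']  # COCO stores boxes as x, y, width, height
    boxes[ann['image_id']].append(
        f'{int(x)},{int(y)},{int(x + w)},{int(y + h)},{ann["category_id"]}')

with open('coco_train_ITSDT.txt', 'w') as f:
    for img in coco['images']:
        parts = [img_root + img['file_name']] + boxes[img['id']]
        f.write(' '.join(parts) + '\n')
```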
- python==3.11.8
- pytorch==2.1.1
- torchvision==0.16.1
- numpy==1.26.4
- opencv-python==4.9.0.80
- scipy==1.13
- Tested on Ubuntu 20.04, with CUDA 11.8, and 1x NVIDIA 3090.
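For convenience, a matching pip install might look like the following (a sketch only; the pinned `pytorch` entry suggests conda, and the official install command for your CUDA setup may differ):
```
pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cu118
pip install numpy==1.26.4 opencv-python==4.9.0.80 scipy==1.13
```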
- We provide the encoded embedding representations (code: fmag) of the language descriptions for the ITSDT-15K, DAUB-R and IRDST-H datasets. This archive contains three embedded representations: `emb_train_ITSDT.pkl`, `emb_train_DAUB.pkl` and `emb_train_IRDST-H.pkl`.
- We also provide the initial language description text files (code: yuy3), which you can explore further with vision-language models.
- Taking the ITSDT-15K dataset as an example, modify the path in `dataloader_for_ITSDT` for the language description embedding representations:
```python
# Path to your emb_train_ITSDT.pkl
description = pickle.load(open('/home/MoPKL/emb_train_ITSDT.pkl', 'rb'))
embeddings = np.array(list(description.values()))
self.cap_idx = list(description.keys())
self.motion_cap_idx = np.array(list(description.values()))
```
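If you want to verify what the pickle contains before wiring it in, a minimal sketch can help; only the dict-of-arrays layout is implied by the loader above, and the key naming is dataset-specific:
```python
import pickle
import numpy as np

# Hedged sketch: inspect the embedding dictionary used by the dataloader.
description = pickle.load(open('/home/MoPKL/emb_train_ITSDT.pkl', 'rb'))
print(len(description), 'entries')
first_key = next(iter(description))
print('example key:', first_key)
print('embedding shape:', np.asarray(description[first_key]).shape)
```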
- In addition, you need to modify the dimension of `text_input_dim` in the network file `MoPKL.py`:
```python
# ITSDT: 130 * 300
# DAUB-R: 20 * 300
# IRDST-H: 20 * 300
self.motion = MotionModel(text_input_dim=130*300, latent_dim=128, hidden_dim=1024)
```
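As a quick, hedged sanity check that the configured `text_input_dim` matches the stored embeddings (the flattening assumption follows the `130 * 300` comment above):
```python
import pickle
import numpy as np

# Hedged sketch: confirm every embedding flattens to text_input_dim values.
text_input_dim = 130 * 300   # ITSDT-15K; use 20 * 300 for DAUB-R / IRDST-H
description = pickle.load(open('/home/MoPKL/emb_train_ITSDT.pkl', 'rb'))
for key, emb in description.items():
    assert np.asarray(emb).size == text_input_dim, f'{key}: unexpected size'
print('all embeddings match text_input_dim =', text_input_dim)
```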
- We provide the encoded tensors (code: 45c6) of the motion relations for the ITSDT-15K, DAUB-R and IRDST-H datasets. This archive contains three embedded representations: `motion_relation_ITSDT.pkl`, `motion_relation_DAUB.pkl` and `motion_relation_IRDST-H.pkl`.
- Taking the ITSDT-15K dataset as an example, modify the path in `dataloader_for_ITSDT` for the motion relation representations:
```python
# Path to your motion_relation_ITSDT.pkl
relation = pickle.load(open('/home/MoPKL/motion_relation_ITSDT.pkl', 'rb'))
relations = np.array(list(relation.values()))
self.re_idx = list(relation.keys())
self.motion_re_idx = np.array(list(relation.values()))
```
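To see how the two files line up, here is a minimal, hedged sketch pairing each sample's language-description embedding with its motion relation; the shared-key assumption is mine, so verify it against the provided dataloaders:
```python
import pickle
import numpy as np

# Hedged sketch: pair language-description embeddings with motion relations
# by shared key (an assumption; verify against the provided dataloader).
description = pickle.load(open('/home/MoPKL/emb_train_ITSDT.pkl', 'rb'))
relation = pickle.load(open('/home/MoPKL/motion_relation_ITSDT.pkl', 'rb'))

for key in list(description.keys())[:3]:   # peek at the first few samples
    print(key,
          np.asarray(description[key]).shape,
          np.asarray(relation[key]).shape)  # assumes matching keys
```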
- Note: Please use a different `dataloader` for each dataset. For example, to train the model on the ITSDT-15K dataset, enter the following command:
```
CUDA_VISIBLE_DEVICES=0 python train_ITSDT.py
```
- Note that `model_best.pth` is not necessarily the best model; the best checkpoint may be the one with a lower val_loss or a higher AP50 during validation. Set `model_path` accordingly:
```python
"model_path": '/home/MoPKL/logs/model.pth'
```
- You need to change the path of the `json` file of the test sets. For example:
```python
# Use ITSDT-15K dataset for test
cocoGt_path = '/home/public/ITSDT-15K/instances_test2017.json'
dataset_img_path = '/home/public/ITSDT-15K/'
```
```
python test.py
```
- We support `video` and `single-frame image` prediction:
# mode = "video" (predict a sequence)
mode = "predict" # Predict a single-frame image python predict.pypython summary.py- For bounding box detection, we use COCO's evaluation metrics:
| Method | Dataset | mAP50 (%) | Precision (%) | Recall (%) | F1 (%) | Download |
|---|---|---|---|---|---|---|
| iMoPKL | ITSDT-15K | 80.67 | 92.28 | 88.50 | 90.35 | Baidu (code: 2u4k) |
| iMoPKL | DAUB-R | 88.57 | 92.94 | 96.94 | 94.90 | |
| iMoPKL | IRDST-H | 43.95 | 59.82 | 74.48 | 66.35 | |
- PR curves on the ITSDT-15K, DAUB-R and IRDST-H datasets in this paper.
- We also provide the result files (code: 2544) for these PR curves, so you can directly plot the curves yourself.
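As a starting point for plotting, here is a minimal, hedged sketch; the file name and the assumption that each result file unpickles to per-point `precision` and `recall` arrays are mine, so check the actual file contents first:
```python
import pickle
import matplotlib.pyplot as plt

# Hedged sketch: plot a PR curve from a result file, assuming it unpickles
# to a dict with 'precision' and 'recall' arrays (verify the real layout).
results = pickle.load(open('pr_curve_ITSDT.pkl', 'rb'))   # hypothetical name
plt.plot(results['recall'], results['precision'], label='iMoPKL on ITSDT-15K')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.legend()
plt.savefig('pr_curve_ITSDT.png', dpi=300)
```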
If you have any questions, please contact Shengjia Chen via e-mail: [email protected].
- S. Chen, L. Ji, J. Zhu, M. Ye and X. Yao, "SSTNet: Sliced Spatio-Temporal Network With Cross-Slice ConvLSTM for Moving Infrared Dim-Small Target Detection," in IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-12, 2024, Art no. 5000912, doi: 10.1109/TGRS.2024.3350024.
- Bingwei Hui, Zhiyong Song, Hongqi Fan, et al. A dataset for infrared image dim-small aircraft target detection and tracking under ground / air background[DS/OL]. V1. Science Data Bank, 2019[2024-12-10]. https://cstr.cn/31253.11.sciencedb.902. CSTR:31253.11.sciencedb.902.
- Ruigang Fu, Hongqi Fan, Yongfeng Zhu, et al. A dataset for infrared time-sensitive target detection and tracking for air-ground application[DS/OL]. V2. Science Data Bank, 2022[2024-12-10]. https://cstr.cn/31253.11.sciencedb.j00001.00331. CSTR:31253.11.sciencedb.j00001.00331.
If you find this repo useful, please cite our papers:
```
@ARTICLE{CheniMoPKL2025,
  author={Chen, Shengjia and Ji, Luping and Peng, Shuang and Zhu, Sicheng and Ye, Mao and Sang, Yongsheng},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  title={Language-Driven Motion Prior Knowledge Learning for Moving Infrared Small Target Detection},
  year={2025},
  volume={63},
  pages={1-14},
  doi={10.1109/TGRS.2025.3596902}}

@inproceedings{ChenMoPKL2025,
  title={{Motion Prior Knowledge Learning with Homogeneous Language Descriptions for Moving Infrared Small Target Detection}},
  author={Chen, Shengjia and Ji, Luping and Duan, Weiwei and Peng, Shuang and Ye, Mao},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={2},
  pages={2186--2194},
  year={2025}
}
```
