Add refill strategy implementation and some evaluation results #816

tengxiao1 · 2025-07-22T23:10:10Z

This PR implements the refill strategy in the training pipeline and documents the resulting performance improvements.

When filtering leaves a training batch incomplete, the refill strategy tops the batch up with additional prompts that were not filtered out, so the batch size stays constant and more of the data is used.
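As a rough illustration of the idea (not the code in this PR), the refill loop can be sketched as follows. The names `refill_batch`, `source`, and `keep` are hypothetical, and the sketch assumes the same filter is also applied to the prompts drawn during refill.

```python
from itertools import islice
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def refill_batch(source: Iterator[T], keep: Callable[[T], bool], batch_size: int) -> List[T]:
    """Draw from `source` until `batch_size` items pass `keep`.

    Hypothetical helper: `keep` stands in for whatever filter the
    training pipeline applies. Returns a shorter batch only when
    `source` runs out.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    while len(batch) < batch_size:
        # Draw only as many candidates as are still missing,
        # so the batch can never overshoot batch_size.
        needed = batch_size - len(batch)
        candidates = list(islice(source, needed))
        if not candidates:
            break  # source exhausted; return a partial batch
        batch.extend(c for c in candidates if keep(c))
    return batch


# Toy usage: integers stand in for prompts, and the filter keeps even numbers.
prompts = iter(range(20))
print(refill_batch(prompts, lambda p: p % 2 == 0, 4))  # [0, 2, 4, 6]
```

Without the refill loop, the first draw of four prompts would leave only two after filtering; with it, the batch is topped up to four before it is returned.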

The comparison below is between two GRPO runs trained for the same 3000 steps, one without refill and one with it.

Benchmark	Before Refill	After Refill	Δ (Change)
BBH	65.20	67.20	+2.00
MMLU	73.38	76.05	+2.67
PopQA	21.16	20.64	-0.52
IFEval	83.73	85.95	+2.22
GPQA	33.92	33.49	-0.43
AIME	7.78	11.11	+3.33
ZebraLogic	9.80	10.20	+0.40
MATH500	75.20	76.60	+1.40
GSM8K	88.02	89.84	+1.82
MATH	73.14	74.00	+0.86
AGI (average)	—	68.55	—

root added 2 commits July 22, 2025 15:39

refill the batchsize with after filtering

4cdd26f

refill the batchsize with after filtering

34179e2

tengxiao1 requested review from natolambert and hamishivi July 22, 2025 23:10
