I'm working on reproducing the results from your ICLR 2025 paper and I've run into a snag with the Fakejob dataset.
According to Table 2 in the paper, the AUC-ROC for the Fakejob dataset using the SmolLM-135M model should be 0.800. However, after running the provided scripts, my results differ substantially: I'm consistently getting an AUC-ROC around 0.6286.
Here's the full output from my get_results.py run:
AUC-ROC: 0.6286 +- 0.0036 ( 1), AUC-PR: 0.1642 +- 0.0040 ( 1), F1: 0.2150 +- 0.0078 ( 1) P: 0.2150 +- 0.0078 ( 1) R: 0.2150 +- 0.0078 ( 1)
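In case it's useful for pinning down where our numbers diverge, this is a minimal sketch of how I understand the AUC-ROC figure to be computed from anomaly scores and binary labels (rank-based Mann-Whitney formulation, pure Python with made-up illustrative data — not the repository's actual evaluation code):

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (illustrative only): positives mostly outrank negatives.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
print(round(auc_roc(labels, scores), 4))  # → 0.6667
```

If the repo's evaluation uses a different score orientation (e.g. lower score = more anomalous) for Fakejob specifically, that alone could explain a gap like this.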
I've followed the run_anollm.sh script for the mixed benchmark experiments, and the results for the other datasets are in line with what's reported in the paper. It's only the Fakejob dataset that shows this significant difference.
Could you please provide some clarification on this? I'm wondering if there might be a specific setting, dependency version, or preprocessing step I might have missed that's specific to the Fakejob dataset.
Any help you could offer would be greatly appreciated!
Thanks!