Discrepancy in Fakejob Dataset Results with SmolLM-135M #6

@toandd2

Description

I'm working on reproducing the results from your ICLR 2025 paper and have run into a snag with the Fakejob dataset.
According to Table 2 in the paper, the AUC-ROC for Fakejob with the SmolLM-135M model should be 0.800, but after running the provided scripts I consistently get an AUC-ROC of about 0.6286.
Here's the full output from my get_results.py run:

AUC-ROC: 0.6286 +- 0.0036 ( 1), AUC-PR: 0.1642 +- 0.0040 ( 1), F1: 0.2150 +- 0.0078 ( 1)  P: 0.2150 +- 0.0078 ( 1)  R: 0.2150 +- 0.0078 ( 1)
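
For what it's worth, P, R, and F1 are all identical (0.2150) in that output, which I'd guess means get_results.py thresholds at the top-k scores with k equal to the number of labeled anomalies; precision and recall coincide by construction in that case. To make sure I'm reading the numbers the same way you compute them, here's a minimal sketch of the metrics using scikit-learn on synthetic scores (the data and variable names are illustrative, not taken from your repo):

```python
# Illustrative sanity check, not code from the AnoLLM repo:
# compute AUC-ROC, AUC-PR, and top-k F1 on synthetic anomaly scores.
import numpy as np
from sklearn.metrics import average_precision_score, f1_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)           # 0 = normal, 1 = anomaly
scores = 0.3 * y_true + rng.normal(0, 1, 1000)   # weakly informative scores

auc_roc = roc_auc_score(y_true, scores)
auc_pr = average_precision_score(y_true, scores)

# F1 needs a threshold; flagging the top-k scores with k equal to the
# number of true anomalies makes precision == recall == F1, matching
# the pattern in my output above.
k = int(y_true.sum())
threshold = np.sort(scores)[-k]
y_pred = (scores >= threshold).astype(int)
f1 = f1_score(y_true, y_pred)

print(f"AUC-ROC: {auc_roc:.4f}, AUC-PR: {auc_pr:.4f}, F1: {f1:.4f}")
```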

I've followed the run_anollm.sh script for the mixed benchmark experiments, and the results for the other datasets seem to be in line with what's reported in the paper. It's just the Fakejob dataset that's showing this significant difference.

Could you please clarify this? I'm wondering whether there's a specific setting, dependency version, or preprocessing step for the Fakejob dataset that I might have missed.
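
In case it's environment-related, this is the snippet I use to dump my installed versions; the package list is just my guess at the relevant dependencies, so adjust as needed:

```python
# Hypothetical environment dump; the package list is a guess, not
# taken from the repo's requirements file.
import importlib.metadata as md

for pkg in ["torch", "transformers", "scikit-learn", "numpy", "pandas"]:
    try:
        print(f"{pkg}=={md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
```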

Any help you could offer would be greatly appreciated!

Thanks
