I'm working on reproducing the results from your ICLR 2025 paper and I've run into a snag with the Fakejob dataset.
According to Table 2 in the paper, the AUC-ROC for the Fakejob dataset using the SmolLM-135M model should be 0.800. However, after running the provided scripts, my results differ substantially: I'm consistently getting an AUC-ROC around 0.6286.
Here's the full output from my get_results.py run:
AUC-ROC: 0.6286 +- 0.0036 ( 1), AUC-PR: 0.1642 +- 0.0040 ( 1), F1: 0.2150 +- 0.0078 ( 1) P: 0.2150 +- 0.0078 ( 1) R: 0.2150 +- 0.0078 ( 1)
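In case it's useful for pinning down where our numbers diverge, this is a minimal sketch of how I understand the AUC-ROC figure to be computed from anomaly scores and binary labels (rank-based Mann-Whitney formulation, pure Python with made-up illustrative data — not the repository's actual evaluation code):

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (illustrative only): positives mostly outrank negatives.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
print(round(auc_roc(labels, scores), 4))  # → 0.6667
```

If the repo's evaluation uses a different score orientation (e.g. lower score = more anomalous) for Fakejob specifically, that alone could explain a gap like this.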
I've followed the run_anollm.sh script for the mixed benchmark experiments, and the results for the other datasets are in line with what's reported in the paper. It's only the Fakejob dataset that shows this significant difference.
Could you please provide some clarification on this? I'm wondering if there might be a specific setting, dependency version, or preprocessing step I might have missed that's specific to the Fakejob dataset.
Any help you could offer would be greatly appreciated!
Thanks!