Content
When using GPT-4.1 to evaluate data generated with the 256k and 512k configurations of RULER V2, the following two anomalies in the difficulty trend were observed:
- mv_niah Medium scores higher than Easy: 67.72 vs 64.67 (256k) and 62.71 vs 50.57 (512k).
- qa Hard scores higher than Medium: 84 vs 81 (256k) and 83 vs 79 (512k).
Trend

| Config | Task | Basic | Easy | Medium | Hard |
|--------|---------|---------|--------|--------|--------|
| 256k | mk_niah | 100% | 94.83% | 90% | 84% |
| 256k | mv_niah | 84.25% | 64.67% | 67.72% | 45.12% |
| 256k | qa | 100% | 88% | 81% | 84% |
| 512k | mk_niah | 98% | 90.44% | 85% | 82% |
| 512k | mv_niah | 74.25% | 50.57% | 62.71% | 43.87% |
| 512k | qa | 97% | 81% | 79% | 83% |
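For reference, here is a minimal sanity check over the numbers above (scores are hardcoded from the table; the assumed invariant is that scores should be non-increasing from Basic to Hard). It flags exactly the four violations reported:

```python
# Sanity check: within each config/task, scores should be
# non-increasing as difficulty goes Basic -> Easy -> Medium -> Hard.
LEVELS = ["Basic", "Easy", "Medium", "Hard"]

SCORES = {
    ("256k", "mk_niah"): [100.0, 94.83, 90.0, 84.0],
    ("256k", "mv_niah"): [84.25, 64.67, 67.72, 45.12],
    ("256k", "qa"):      [100.0, 88.0, 81.0, 84.0],
    ("512k", "mk_niah"): [98.0, 90.44, 85.0, 82.0],
    ("512k", "mv_niah"): [74.25, 50.57, 62.71, 43.87],
    ("512k", "qa"):      [97.0, 81.0, 79.0, 83.0],
}

for (config, task), scores in SCORES.items():
    for i in range(1, len(scores)):
        if scores[i] > scores[i - 1]:
            print(f"{config}/{task}: {LEVELS[i]} ({scores[i]}) > "
                  f"{LEVELS[i - 1]} ({scores[i - 1]})")
```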
Hello @hsiehjackson, are there anomalies in the data generation strategy or in the evaluation metrics in nemo_skills/dataset/ruler2/prepare.py that could explain this?
How to close
- How can this issue be fixed so that scores follow the expected easy-to-hard trend? A possible starting point for debugging is sketched below.
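One way to narrow this down (a sketch only; the output directory, file names, and the `expected_answers` field are assumptions, not the actual schema written by prepare.py) is to inspect the generated subsets and confirm that the intended difficulty knobs, e.g. how many values each mv_niah sample asks for, actually increase from Easy to Hard:

```python
# Hypothetical sketch: check that generated subsets differ in the
# expected difficulty knobs. Paths and field names are guesses;
# adapt them to whatever prepare.py actually writes.
import json
from pathlib import Path

DATA_DIR = Path("data/ruler2/256k")  # assumed output location

for subset in ["mv_niah_easy", "mv_niah_medium", "mv_niah_hard"]:
    path = DATA_DIR / f"{subset}.jsonl"
    if not path.exists():
        print(f"{subset}: file not found, skipping")
        continue
    with path.open() as f:
        samples = [json.loads(line) for line in f]
    # For mv_niah, harder subsets should require retrieving more
    # values per question; a flat or inverted trend here would point
    # at the generation strategy rather than the model.
    counts = [len(s.get("expected_answers", [])) for s in samples]
    avg = sum(counts) / len(counts) if counts else 0.0
    print(f"{subset}: {len(samples)} samples, "
          f"avg expected answers = {avg:.2f}")
```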