New metric: ClassificationReport #3116

Open. Wants to merge 40 commits into base: master.

Conversation

aymuos15 commented Jun 3, 2025

What does this PR do?

Fixes #2580

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements) - Yes, details are in the issue referenced above.
  • Did you read the contributor guideline, Pull Request section? - Yes.
  • Did you make sure to update the docs? - Not yet; I can do this once the code is finalised.
  • Did you write any new necessary tests? - Yes.
PR review

A classification report is a nice-to-have metric. The motivation can be drawn directly from its scikit-learn counterpart. Having it ready for tensors is definitely a good step forward.
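
For reference, this is the scikit-learn behaviour being mirrored here (standard scikit-learn API, shown only to illustrate the target output format; the sample values match the binary example further down this thread):

    from sklearn.metrics import classification_report

    y_true = [0, 1, 0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 0, 1, 0, 1, 1]
    # Prints a per-class table with precision, recall, f1-score and support,
    # plus accuracy / macro avg / weighted avg summary rows.
    print(classification_report(y_true, y_pred))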

Notes:

  1. The original issue talks about having the option to include more metrics. I personally feel that is not required, as this should be a quick, standalone way to get the most relevant (and usually required) metric info in a presentable format.
  2. Maybe we need a more in-depth discussion to expand the testing. I do not have access to a distributed setup at the moment, but if it is paramount, I can look into it.

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Was a really nice way of getting familiar with the codebase :)


📚 Documentation preview 📚: https://torchmetrics--3116.org.readthedocs.build/en/3116/

Borda (Member) commented Jun 4, 2025

Great addition!
Please check the failures due to a circular import.

aymuos15 and others added 5 commits June 6, 2025 11:46
- Remove direct imports of classification metrics at module level
- Implement lazy imports using @property for Binary/Multiclass/MultilabelClassificationReport
- Move metric initialization to property getters to break circular dependencies
- Maintain all existing functionality while improving import structure
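
For context, a minimal sketch of the lazy-import pattern these commit notes describe (illustrative only; the exact class layout in the PR may differ):

    class BinaryClassificationReport:
        """Illustrative stand-in for the PR's metric class, showing only the lazy import."""

        def __init__(self, threshold: float = 0.5) -> None:
            self.threshold = threshold
            self._precision = None  # built lazily to break the circular import

        @property
        def precision_metric(self):
            # Importing inside the property defers loading torchmetrics.classification
            # until the sub-metric is first accessed, so importing this module no longer
            # triggers the circular dependency at module load time.
            if self._precision is None:
                from torchmetrics.classification import BinaryPrecision

                self._precision = BinaryPrecision(threshold=self.threshold)
            return self._precision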
aymuos15 (Author) commented Jun 6, 2025

Thank you very much! @Borda

Could you please help me understand why the TM.unittests are failing? They seem to be linked to Azure, and I am not familiar with the error logs there.

Regarding the ruff failures, my local checks pass for some reason, so I'll look into that. But is it okay if I address those once the actual code is vetted?

Sorry for the delay in getting back to you; I had some urgent personal work to get done.

CC: @rittik9 I think it is ready now?

rittik9 (Contributor) left a comment

@aymuos15 I had a quick look and opened a PR to fix the pre-commit errors.
The doctests for the classification report seem to be failing; mind having a look?
I am going to take a more detailed look later.

aymuos15 (Author) commented Jun 6, 2025

Thanks a lot for the quick fix, @rittik9.

Will look into the doctests now.

Sorry about the close and reopen; I think the close was automatically triggered after the local merge for some reason. Not sure.

rittik9 (Contributor) left a comment

Hi @aymuos15, please update the __init__.py files in the functional interface and the docs.
You can refer to #3090.

Review comment (Contributor) on multiclass_classification_report:

can we also provide top_k for multiclass?
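
For context, a hedged illustration of how top_k already works for existing multiclass metrics (standard torchmetrics functional API; not code from this PR):

    import torch
    from torchmetrics.functional.classification import multiclass_accuracy

    preds = torch.randn(8, 3).softmax(dim=-1)  # class probabilities for 3 classes
    target = torch.randint(0, 3, (8,))
    # A prediction counts as correct if the target is among the k highest-scoring classes.
    multiclass_accuracy(preds, target, num_classes=3, top_k=2)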

rittik9 (Contributor) left a comment

Please remove the unnecessary comments.

aymuos15 (Author):

Thank you very much for the review, @rittik9.

I think most of the suggestions need significant changes to implement. I will get this done over the weekend and will make sure to remove the unnecessary comments as well.

Review comment (Contributor):

For the modular interface, why not call the functions from the functional interface when computing? That would reduce boilerplate.
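
One way to read this suggestion, as a hedged sketch (binary_classification_report is assumed to be the functional counterpart added in this PR; names may not match the final code):

    import torch
    from torchmetrics import Metric
    from torchmetrics.utilities.data import dim_zero_cat


    class BinaryClassificationReport(Metric):
        """Sketch: accumulate predictions, delegate the report to the functional API."""

        def __init__(self, threshold: float = 0.5, **kwargs) -> None:
            super().__init__(**kwargs)
            self.threshold = threshold
            self.add_state("preds", default=[], dist_reduce_fx="cat")
            self.add_state("target", default=[], dist_reduce_fx="cat")

        def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
            self.preds.append(preds)
            self.target.append(target)

        def compute(self):
            # Reuse the functional implementation instead of re-deriving the report
            # here; that is the boilerplate reduction being suggested.
            from torchmetrics.functional.classification import binary_classification_report

            return binary_classification_report(
                dim_zero_cat(self.preds), dim_zero_cat(self.target), threshold=self.threshold
            )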

rittik9 (Contributor) commented Jun 12, 2025

Hi @aymuos15, overall it looks pretty good to me; nice work. I just added a few comments. You can also update the torchmetrics/src/torchmetrics/functional/__init__.py file.

aymuos15 and others added 3 commits June 17, 2025 14:23
…r accuracy | Revamp testing to make it more parametrised | Make the non-functional part of the metric more modular | Add ignore_index support | Add top_k support for multiclass
aymuos15 (Author):

@rittik9 Thanks again for the detailed review.

I have tried to address everything bar the classwise accuracy score, which is inherently different from the scikit-learn version of the metric. Are you okay with raising a separate issue for that, discussing how exactly we want to add it, and then doing it in a separate PR?

Borda (Member) left a comment

Overall it looks good. I just think it would be better to have it more as a wrapper, so you can select which metrics you want in the report (probably by passing them as string metric/class names), with precision, recall and F-measure as the default configuration.

see suggestion by @rittik9 in #2580 (comment)
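
As a hedged sketch of that wrapper idea (binary case only; the registry and builder function are hypothetical and not the PR's API):

    from torchmetrics import MetricCollection
    from torchmetrics.classification import (
        BinaryF1Score,
        BinaryPrecision,
        BinaryRecall,
        BinarySpecificity,
    )

    # Hypothetical registry: report columns selectable by string name.
    _AVAILABLE = {
        "precision": BinaryPrecision,
        "recall": BinaryRecall,
        "f1-score": BinaryF1Score,
        "specificity": BinarySpecificity,
    }


    def build_report_metrics(metrics=("precision", "recall", "f1-score")):
        # The default selection mirrors scikit-learn: precision, recall, F-measure.
        return MetricCollection({name: _AVAILABLE[name]() for name in metrics})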

aymuos15 (Author):

Thank you very much for the review.

I definitely like the idea and agree with it.

However, I think I would have to pretty much revamp everything. Would it be okay to keep this as the base and proceed with that separately in another PR or should I just continue here?

Borda (Member) commented Jun 18, 2025

> However, I think I would have to pretty much revamp everything. Would it be okay to keep this as the base and proceed with that separately in another PR or should I just continue here?

you can continue here 🐰

aymuos15 (Author) commented Jun 18, 2025

Haha, alright. I'll take some time to understand the best way to approach this and to write the tests so that edge cases do not come up. Thanks again!

rittik9 (Contributor) commented Jun 18, 2025

Hi @aymuos15 While you're at it, could you please ensure that the 'micro' averaging option is included by default for all three cases—binary, multiclass, and multilabel? In case the results don't align with sklearn, we can always verify them manually.
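
For the manual verification, a minimal cross-check against scikit-learn could look like this sketch (multiclass shown, using the example values from later in this thread; binary and multilabel are analogous):

    import torch
    from sklearn.metrics import precision_score
    from torchmetrics.functional.classification import multiclass_precision

    preds = torch.tensor([0, 2, 1, 2, 0, 1, 2, 0])
    target = torch.tensor([0, 1, 1, 2, 0, 2, 2, 1])

    # Both should yield the same micro-averaged precision (here 5/8 = 0.625).
    tm_micro = multiclass_precision(preds, target, num_classes=3, average="micro")
    sk_micro = precision_score(target.numpy(), preds.numpy(), average="micro")
    assert torch.isclose(tm_micro, torch.tensor(float(sk_micro)))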

aymuos15 (Author):

Yup, sure. Thank you for reminding me.

Since we are now going to include everything anyway, I'll make all the options exhaustive.

Borda (Member) commented Jun 23, 2025

> Since we are now going to include everything anyway, I'll make all the options exhaustive.

We can figure out how to split it into smaller pieces so it lands more smoothly and faster.

aymuos15 (Author):

That sounds good to me. Could you please let me know what that would entail?

This is what I had in mind:

TorchMetrics Classification Report Examples

============================================================
 BINARY CLASSIFICATION
============================================================
Predictions: [0, 1, 1, 0, 1, 0, 1, 1]
True Labels: [0, 1, 0, 0, 1, 1, 1, 0]

--- CASE 1: DEFAULT METRICS (precision, recall, f1-score) ---
             precision    recall  f1-score   support 

           0      0.67      0.50      0.57         4 
           1      0.60      0.75      0.67         4 

    accuracy                          0.62         8 
   micro avg      0.62      0.62      0.62         8 
   macro avg      0.63      0.62      0.62         8 
weighted avg      0.63      0.62      0.62         8 

--- CASE 2: ADD ACCURACY TO DEFAULT METRICS ---
             precision    recall  f1-score  accuracy   support 

           0      0.67      0.50      0.57      0.50         4 
           1      0.60      0.75      0.67      0.75         4 

    accuracy                                    0.62         8 
   micro avg      0.62      0.62      0.62      0.62         8 
   macro avg      0.63      0.62      0.62      0.62         8 
weighted avg      0.63      0.62      0.62      0.62         8 

--- CASE 3: SPECIFICITY ONLY ---
             specificity     support 

           0        0.75           4 
           1        0.50           4 

    accuracy        0.62           8 
   micro avg        0.50           8 
   macro avg        0.62           8 
weighted avg        0.62           8 

Verification - Direct accuracy calculation: 0.6250

============================================================
 MULTICLASS CLASSIFICATION
============================================================
Predictions: [0, 2, 1, 2, 0, 1, 2, 0]
True Labels: [0, 1, 1, 2, 0, 2, 2, 1]

--- CASE 1: DEFAULT METRICS ---
             precision    recall  f1-score   support 

           0      0.67      1.00      0.80         2 
           1      0.50      0.33      0.40         3 
           2      0.67      0.67      0.67         3 

    accuracy                          0.62         8 
   micro avg      0.62      0.62      0.62         8 
   macro avg      0.61      0.67      0.62         8 
weighted avg      0.60      0.63      0.60         8 

--- CASE 2: ADD ACCURACY TO DEFAULT METRICS ---
             precision    recall  f1-score  accuracy   support 

           0      0.67      1.00      0.80      1.00         2 
           1      0.50      0.33      0.40      0.33         3 
           2      0.67      0.67      0.67      0.67         3 

    accuracy                                    0.62         8 
   micro avg      0.62      0.62      0.62      0.62         8 
   macro avg      0.61      0.67      0.62      0.67         8 
weighted avg      0.60      0.63      0.60      0.63         8 

--- CASE 3: SPECIFICITY ONLY ---
             specificity     support 

           0        0.83           2 
           1        0.80           3 
           2        0.80           3 

    accuracy        0.62           8 
   micro avg        0.81           8 
   macro avg        0.81           8 
weighted avg        0.81           8 

Verification - Direct accuracy calculation: 0.6667

============================================================
 MULTILABEL CLASSIFICATION
============================================================
Predictions:
  Sample 1: [1, 0, 1]
  Sample 2: [0, 1, 0]
  Sample 3: [1, 1, 0]
  Sample 4: [0, 0, 1]
True Labels:
  Sample 1: [1, 0, 0]
  Sample 2: [0, 1, 1]
  Sample 3: [1, 1, 0]
  Sample 4: [0, 0, 1]

--- CASE 1: DEFAULT METRICS ---
             precision    recall  f1-score   support 

           0      1.00      1.00      1.00         2 
           1      1.00      1.00      1.00         2 
           2      0.50      0.50      0.50         2 

   micro avg      0.83      0.83      0.83         6 
   macro avg      0.83      0.83      0.83         6 
weighted avg      0.83      0.83      0.83         6 
 samples avg      0.88      0.88      0.83         6 

--- CASE 2: ADD ACCURACY TO DEFAULT METRICS ---
             precision    recall  f1-score  accuracy   support 

           0      1.00      1.00      1.00      1.00         2 
           1      1.00      1.00      1.00      1.00         2 
           2      0.50      0.50      0.50      0.50         2 

   micro avg      0.83      0.83      0.83      0.83         6 
   macro avg      0.83      0.83      0.83      0.83         6 
weighted avg      0.83      0.83      0.83      0.83         6 
 samples avg      0.88      0.88      0.83      0.83         6 

--- CASE 3: SPECIFICITY ONLY ---
             specificity     support 

           0        1.00           2 
           1        1.00           2 
           2        0.50           2 

   micro avg        0.83           6 
   macro avg        0.83           6 
weighted avg        0.83           6 
 samples avg                       6 

Verification - Direct accuracy calculation: 0.8333

Essentially:

  1. If anyone uses it off the shelf, it behaves exactly like scikit-learn.
  2. But we also have the option to add whatever has been discussed so far.

Is the above okay?

I have not committed anything yet because the code is very messy. Once we agree on a path forward, I will push the next commit, if that's okay.
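
A hedged sketch of how the proposed interface could be called to produce the binary report above (the function name follows this PR and the metrics keyword follows this proposal; neither is part of the released torchmetrics API):

    import torch
    from torchmetrics.functional.classification import binary_classification_report

    preds = torch.tensor([0, 1, 1, 0, 1, 0, 1, 1])
    target = torch.tensor([0, 1, 0, 0, 1, 1, 1, 0])

    # Case 1: off the shelf, scikit-learn-like defaults (precision, recall, f1-score).
    print(binary_classification_report(preds, target))

    # Cases 2 and 3: opt into additional or alternative metrics by name (proposed option).
    print(binary_classification_report(preds, target, metrics=["precision", "recall", "f1-score", "accuracy"]))
    print(binary_classification_report(preds, target, metrics=["specificity"]))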

Borda (Member) commented Jun 24, 2025

I rather had in mind that we trim this PR down to the printing functions and table formatting, and in a following PR add the extension of the collection metrics, either as a new subclass or a new method...

aymuos15 (Author):

Ah okay! I will push a commit for the formatting and printing tonight. Thank you.

aymuos15 (Author):

@Borda

Keeping in mind the rigid rule below for when the micro avg is shown:

    # Determine if micro average should be shown in the report based on classification task
    # Following scikit-learn's logic:
    # - Show for multilabel classification (always)
    # - Show for multiclass when using a subset of classes
    # - Don't show for binary classification (micro avg is same as accuracy)
    # - Don't show for full multiclass classification with all classes (micro avg is same as accuracy)
    show_micro_avg = False
    is_multilabel = task == ClassificationTask.MULTILABEL
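
Spelled out, that rule might look like the following sketch (a hypothetical helper; ClassificationTask is assumed to come from torchmetrics.utilities.enums):

    from torchmetrics.utilities.enums import ClassificationTask


    def _show_micro_avg(task, num_classes, labels=None) -> bool:
        if task == ClassificationTask.MULTILABEL:
            return True  # always shown for multilabel
        if task == ClassificationTask.MULTICLASS:
            # Shown only when the report is restricted to a subset of classes;
            # otherwise the micro avg would just duplicate accuracy.
            return labels is not None and len(labels) < num_classes
        return False  # binary: micro avg equals accuracy, so it is omitted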

After going through everything again, I think the PR as it stands does exactly the bare minimum. The main additions after the initial few commits were ignore_index and the micro avg. So just a few quick questions:

  1. Should I remove those as well and keep them for the follow-up PR?
  2. Do I keep the base of the collection metrics (it is not implemented in the current commit) or leave that out as well?
  3. The test file has multiple tests covering all the bases; do those need to change?

Thank you very much!

Borda (Member) commented Jun 25, 2025

Still, the module-like metric is not derived from the collection to save some computations.
I would just keep the functional part, as the module-like part needs to be redone.

aymuos15 (Author):

Okay, thank you. So just to confirm: I would only push src/torchmetrics/classification/classification_report.py and its corresponding files and tests?

mergify bot removed the "has conflicts" label Jun 30, 2025
Labels: documentation (Improvements or additions to documentation), topic: Classif

Successfully merging this pull request may close these issues.

Classification Report, maybe with the option to select even more metrics