
ENH: df.duplicated() default behavior should default to tagging all duplicates as True #65320

@griffinjmbur

Description


Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Hi, folks. I don't complain much about excellent free, open-source software, and I'm reasonably proficient with pandas: I have about three years of solid use under my belt, and before that about six years with Stata, of which I am (was) a maven. pandas is generally much better... BUT...

df.duplicated() has what seems to me to be very confusing default behavior. In Stata, for example, one might use the equivalent duplicate-tagging command to locate rows that are duplicates on a particular column and then hand-inspect some of them, to see whether they are data-entry errors or whether, by contrast, one has missed a nuance and should actually expect a small number of duplicates on that column. This might sound sloppy, but when the codebook/docs for the data are poor, it is a good way to figure out precisely what it means for a row to be a duplicate on some subset of columns.

For that reason, I propose that the behavior of df.duplicated(keep=False) should be the default. I just spent around 30 minutes trying to figure out whether the problem was with my code or with the DataFrame, because I was trying to locate rows that are duplicates on a column that should in theory have a few duplicates but not many, and I could not actually see any duplicates after running the following.

df["dupe"] = df.duplicated(subset=["year", "id"])
df.groupby("year").apply(lambda g: g["dupe"].mean())  # share of tagged rows per year
df.loc[df["dupe"]].apply(foo)  # inspect the tagged rows

Of course, I eventually realized that this is because the default behavior is not to tag the first occurrence of a duplicate as a duplicate; in my case, since the multiplicity of a duplicate is almost always two, that meant that subsetting the data to the tagged rows actually omitted each duplicate's partner from view.
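To make the tagging difference concrete, here is a minimal sketch with a hypothetical toy frame (the column names `year`, `id`, `x` are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame: the first two rows share (year, id).
df = pd.DataFrame({"year": [2020, 2020, 2021],
                   "id":   [1, 1, 2],
                   "x":    [5, 6, 7]})

# Default (keep="first"): the first occurrence is NOT tagged.
print(df.duplicated(subset=["year", "id"]).tolist())
# -> [False, True, False]

# keep=False: every member of a duplicate group is tagged,
# so subsetting shows both partners of each pair.
print(df.duplicated(subset=["year", "id"], keep=False).tolist())
# -> [True, True, False]
```

With the default, selecting the tagged rows returns only the second row of each pair, which is exactly why the duplicates seemed to vanish.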

I did skim the docs, obviously inefficiently at first, but the wording itself seems extremely confusing. I did not realize that keep=False would solve my problem, because the issue is not really about keep/drop behavior but about tagging behavior. Renaming the option to tag or something similar would probably lead to less confusion and make it more obvious that the default behavior is odd.

On that note, another reason to change this is that there is no particular reason to expect the first duplicate to be the one worth keeping (at least without more context on the DataFrame; certainly this is not true in general). Keeping the first might make sense when rows are duplicated on all columns, but if the subset is anything less than the full set of columns, the current default can lead to very unexpected results. And if someone does simply have outright duplicate records, it costs no significant time or memory (AFAICT) to go back and set keep="first" once it has been confirmed that any record might as well be kept.
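The go-back step described above can be as simple as the following sketch (again with made-up column names, assuming the pairs were confirmed to be genuine full duplicates):

```python
import pandas as pd

df = pd.DataFrame({"year": [2020, 2020, 2021],
                   "id":   [1, 1, 2],
                   "x":    [5, 5, 7]})

# After hand-inspecting with keep=False and confirming the pairs are
# true duplicates, keep an arbitrary (here: the first) record per pair.
deduped = df.drop_duplicates(subset=["year", "id"], keep="first")
print(len(deduped))
# -> 2
```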

Feature Description

I'm not a backend dev, merely a fairly tech-savvy sociologist, but I think the technical side of this would be very easy.

Alternative Solutions

People could read the docs very carefully before seeking other solutions.

Additional Context

No response
