
API/BUG: .loc tuple ambiguity with MultiIndex when nlevels == ndim #65326

@jbrockmendel

Description


I [the human] asked Claude to post the following:

Summary

When a DataFrame has a MultiIndex with nlevels == ndim (the most common case: a 2-level MI on a 2D DataFrame), tuple keys passed to .loc are fundamentally ambiguous between two interpretations:

  1. MI row key: df.loc[(a, b)] means "the row with MultiIndex key (a, b)"
  2. Multi-axis indexing: df.loc[a, b] means "row a, column b"

Python produces the same tuple (a, b) for both spellings. The current code resolves this ambiguity differently depending on context — getitem vs setitem, existing vs missing key, scalar vs slice — leading to a cluster of related bugs where the same syntax silently produces different results.
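A minimal demonstration of why the receiver cannot tell the spellings apart — `KeyEcho` is a throwaway class for illustration, not anything in pandas:

```python
# Both spellings deliver the identical tuple to __getitem__, so pandas
# cannot distinguish them at the syntax level.
class KeyEcho:
    """Echoes back whatever key __getitem__ receives."""
    def __getitem__(self, key):
        return key

echo = KeyEcho()
assert echo[(1, 2)] == echo[1, 2] == (1, 2)
```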

Related issues

Primary — same root ambiguity

| Issue | Status | Problem |
| --- | --- | --- |
| #14969 | Open | `df.loc[0, 0]` returns a DataFrame or a Series depending on the dtype of the inner MI level — identical syntax, different interpretation |
| #16396 | Open | `df.loc[1, 2]` uses the MI interpretation, but `df.loc[:1, 2]` and `df.loc[:, 2]` switch to multi-axis (column 2) — incoherent |
| #27248 | Open | `df.loc[(1, 2019)] = [3, 4]` fails on setitem-with-expansion because the MI key `(1, 2019)` is misinterpreted as `(row=1, col=2019)` |
| #42603 | Open | `df.loc["foo", 0]` silently returns different things depending on whether `0` exists in MI level 1 — proposes an ambiguity error |
| #19110 | Open | `df.loc[existing_row, new_col] = val` adds a column instead of a (partial) row — missing-label priority is inconsistent with present-label priority |
| #17024 | Open | `df.loc['all'] = [5, 6]` on a MI DataFrame flattens the MI to tupled strings |

See also

| Issue | Status | Notes |
| --- | --- | --- |
| #39775 | Open | KeyError semantics for partially-missing MI keys in slices — adjacent problem about what should happen when MI keys don't exist |
| #16018 | Closed | "cannot reindex from duplicate axis" on MI expansion (fixed in 1.3) |
| #22247 | Closed | MI expansion with NaN level copies wrong values (fixed) |

Current behavior

The resolution rule depends on context

The current code applies different heuristics depending on the operation and key type:

getitem with scalar tuple (_getitem_lowerdim_handle_lowerdim_multi_index_axis0):

  • Try MI key via obj.xs(tup). If found → return row.
  • If KeyError and ndim < len(tup) <= nlevels → re-raise (MI interpretation).
  • If KeyError and len(tup) == nlevels == ndim → raise IndexingError, which is suppressed by the caller so the per-axis loop handles it as multi-axis.

This means getitem silently switches interpretation when a MI key is missing:

```python
mi = pd.MultiIndex.from_tuples([(1, 2), (3, 4)])
df = pd.DataFrame([[10, 20], [30, 40]], index=mi, columns=[100, 200])

df.loc[(1, 2)]    # → Series (MI row key found)
df.loc[(1, 100)]  # → Series (MI key not found → silently becomes row=1, col=100)
```

getitem with slices (_getitem_tuple_same_dim):

  • A tuple containing a slice is handled per-axis rather than as an MI key. This is why df.loc[1, 2] uses the MI interpretation while df.loc[:1, 2] and df.loc[:, 2] switch to multi-axis and select column 2 (#16396).

setitem with scalar tuple (_get_setitem_indexer):

  • The MI interpretation is tried first here too, but when the key is missing the indexer falls back to multi-axis, so setitem-with-expansion misreads an MI key like (1, 2019) as (row=1, col=2019) (#27248, #19110).

The result depends on dtype

Because the fallback to multi-axis only triggers when the MI lookup fails, and whether it finds a match depends on what values are in the MI levels, the same syntax gives different results depending on dtype (#14969):

```python
# String inner level → (0, 0) is NOT in the MI (level 1 is strings),
# so MI lookup fails and multi-axis takes over: partial level-0 key
# on both axes yields a DataFrame.
ind = pd.MultiIndex.from_product([[0, 1], ['A', 'B']])
df = pd.DataFrame(0, index=ind, columns=ind)  # arbitrary fill value
df.loc[0, 0]  # → DataFrame

# Int inner level → (0, 0) IS in the MI, so MI row lookup succeeds
# and returns that single row as a Series (indexed by the column MI).
ind = pd.MultiIndex.from_product([[0, 1], [0, 1]])
df = pd.DataFrame(0, index=ind, columns=ind)  # arbitrary fill value
df.loc[0, 0]  # → Series
```

Cases that already work

The ambiguity only exists when nlevels == ndim. These cases work correctly:

  • Series with MI (ndim=1): any tuple with len > 1 exceeds ndim, so _convert_tuple raises IndexingError("Too many indexers") and the fallthrough correctly handles MI keys.
  • 3+-level MI on DataFrame (nlevels > ndim=2): same mechanism — _convert_tuple rejects the tuple and the fallthrough handles it.
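For contrast, a quick sketch of the unambiguous Series case described above:

```python
import pandas as pd

# On a Series (ndim=1), a tuple key of length 2 exceeds ndim, so it cannot
# be multi-axis indexing — it is unambiguously an MI row key.
mi = pd.MultiIndex.from_tuples([(1, 2), (3, 4)])
s = pd.Series([10, 30], index=mi)
assert s.loc[(1, 2)] == 10
```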

Why heuristics don't work

I explored several heuristic approaches to disambiguate setitem when nlevels == ndim:

  1. Check if last element is an existing column: Fails for df.loc[partial_key, new_col] = val (simultaneous row + column expansion).
  2. Check type compatibility of each element with its MI level: Fails when MI levels and columns share the same dtype (e.g., int/int MI + int columns).
  3. Combined type + column-dtype check: Handles many cases but is fundamentally a guess — there exist DataFrames where both interpretations are type-valid.
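A constructed example (not from any of the linked issues) of why the type-based heuristics fail: with integer MI levels and integer columns, both readings of the same tuple are type-valid, so no per-element check can choose between them.

```python
import pandas as pd

# Integer MI levels AND integer column labels.
mi = pd.MultiIndex.from_product([[1, 2], [100, 200]])
df = pd.DataFrame(0, index=mi, columns=[100, 200])

# (1, 100) is simultaneously a valid MI row key and a valid
# (row=1, col=100) multi-axis pair — the heuristic is a guess.
assert (1, 100) in df.index
assert 1 in df.index.get_level_values(0) and 100 in df.columns
```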

The ambiguity is inherent to the API, not a deficiency in the implementation.

The right API

I think the right long-term answer is: .loc should not guess. When nlevels == ndim, there should be one canonical interpretation, and the user should use explicit syntax for the other.

Option A: MI key always wins

.loc[(a, b)] on a 2-level MI always means MI row key. Multi-axis indexing requires the explicit df.loc[(a, b), :] form (or partial keys like df.loc[a, col] when a is clearly a single level-0 value).

Pros: Consistent with getitem priority (MI key is tried first). Consistent with Series. The multi-axis form df.loc[..., :] is explicit and readable.

Cons: Behavior change for df.loc[scalar, col] when the scalar matches level 0 and col matches a column.

Option B: Multi-axis always wins

.loc[(a, b)] on a 2-level MI always means (row=a, col=b). MI row key indexing requires df.loc[pd.IndexSlice[a, b]] or df.loc[(a, b), :].

Pros: No behavior change for existing multi-axis code. Consistent with how _convert_tuple already works.

Cons: Breaks df.loc[(a, b)] for MI row access, which is arguably the more natural reading. Inconsistent with Series and with the nlevels > ndim case.

Option C: Raise on ambiguity

When nlevels == ndim and the key is a plain tuple, raise an error (as suggested in #42603) requiring the user to be explicit. Problem: Python offers no way to distinguish df.loc[(a, b)] from df.loc[a, b], so we'd have to reject all plain tuple keys when nlevels == ndim, which is extremely disruptive.

My recommendation: Option A

The key observation is that the MI-key interpretation is already the first thing tried in both getitem and setitem — the multi-axis path is always a fallback. Making the MI-key interpretation authoritative (rather than a try/except heuristic) would make the behavior predictable and consistent across getitem, setitem, scalars, and slices.

Users who need multi-axis indexing on a 2-level MI DataFrame can write df.loc[(a, b), :] or use pd.IndexSlice.
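Both explicit spellings already work unambiguously in current pandas:

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([(1, 2), (3, 4)])
df = pd.DataFrame([[10, 20], [30, 40]], index=mi, columns=[100, 200])

# Explicit multi-axis form: MI row key on axis 0, all columns.
row = df.loc[(1, 2), :]                 # Series indexed by [100, 200]
same = df.loc[pd.IndexSlice[1, 2], :]   # equivalent via IndexSlice
assert list(row) == [10, 20]
assert row.equals(same)
```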

How to get there

  1. Short term: Document the ambiguity and the df.loc[(a, b), :] workaround in the MultiIndex docs.
  2. Medium term: Add a FutureWarning when the ambiguous case is hit — when nlevels == ndim, MI key lookup fails, and the code is about to fall back to multi-axis. The warning should suggest explicit syntax.
  3. Long term: Change the default to MI key interpretation when nlevels == ndim, removing the multi-axis fallback.
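A rough sketch of the medium-term warning condition — the helper name, message text, and fallback logic are illustrative assumptions, not actual pandas internals:

```python
import warnings
import pandas as pd

def resolve_loc_tuple(df, key):
    """Hypothetical helper mimicking the proposed resolution order."""
    if isinstance(key, tuple) and df.index.nlevels == df.ndim == len(key):
        try:
            return df.xs(key)  # MI-key interpretation is tried first
        except KeyError:
            warnings.warn(
                "Ambiguous tuple key fell back to multi-axis indexing; "
                "spell the MI key as df.loc[key, :] to be explicit.",
                FutureWarning,
            )
            return df.loc[key[0]].loc[:, key[1]]  # multi-axis fallback
    return df.loc[key]

mi = pd.MultiIndex.from_tuples([(1, 2), (3, 4)])
df = pd.DataFrame([[10, 20], [30, 40]], index=mi, columns=[100, 200])
resolve_loc_tuple(df, (1, 2))    # MI key found: no warning
resolve_loc_tuple(df, (1, 100))  # warns, then falls back to multi-axis
```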

This would resolve #27248, #19110, #16396, #42603, and #14969. #17024 (MI flattening on scalar expansion) is a separate sub-bug in the expansion machinery.

Metadata



    Labels

    API Design, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex
