[record_use] Implementation: A pool of `Reference`s #3157

## Current State and Limitations

Currently, the `record_use` package models recorded usages by mapping a `Definition` to a list of `Reference` objects (e.g., `CallWithArguments`, `InstanceCreationReference`). Each `Reference` object contains both the semantic details of the usage (the receiver, the arguments) and the physical locations where that usage was observed (a list of `LoadingUnit`s).

```dart
// Current Model (Simplified)
class Recordings {
  final Map<Definition, List<CallReference>> calls;
  final Map<Definition, List<InstanceReference>> instances;
}

sealed class CallReference {
  final List<LoadingUnit> loadingUnits; // The "Where"
  final MaybeConstant? receiver;        // The "What"
  // arguments, etc.
}
```

This architecture has several limitations:

1.  **Inefficient Canonicalization and Merging:** Because `loadingUnits` is part of the `Reference` object's identity (it participates in `operator ==` and `hashCode`), two semantically identical calls made from *different* loading units are treated as distinct, unequal objects. This makes it difficult to deduplicate identical calls and merge their loading unit lists.
2.  **Redundant JSON Serialization:** If a common function like `print('hello')` is called from 50 different loading units, the JSON output will repeat the call signature 50 times under the definition's `uses` list, each with a different loading unit.
3.  **Inconsistent Nesting:** `InstanceConstantReference` currently holds a pointer to its `Definition` (because it needs it to be self-describing when nested inside arguments), whereas top-level `CallWithArguments` objects do not hold their `Definition` (they rely on being values in the `Recordings.calls` map). This makes the `Reference` hierarchy irregular.
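
The first limitation can be reproduced with a minimal sketch. `Call` below is a hypothetical stand-in for the current `CallReference`, with plain strings in place of `Definition`, `MaybeConstant` and `LoadingUnit`; it is not the package's actual code. It only shows that folding the location into `==` keeps otherwise identical calls apart.

```dart
// Hypothetical stand-in for the current model: the location is part of
// the object's identity.
class Call {
  final String definition; // stands in for Definition
  final String argument; // stands in for the argument list
  final String loadingUnit; // stands in for LoadingUnit

  Call(this.definition, this.argument, this.loadingUnit);

  @override
  bool operator ==(Object other) =>
      other is Call &&
      other.definition == definition &&
      other.argument == argument &&
      other.loadingUnit == loadingUnit;

  @override
  int get hashCode => Object.hash(definition, argument, loadingUnit);
}

void main() {
  final fromLib1 = Call('print', 'hello', 'lib1');
  final fromLib2 = Call('print', 'hello', 'lib2');

  // Same call, different loading unit: the objects are unequal, so a Set
  // cannot deduplicate them.
  print(fromLib1 == fromLib2); // false
  print({fromLib1, fromLib2}.length); // 2
}
```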

## Proposed Architecture

To solve these issues, we propose fully normalizing the data model by decoupling the *semantic content* of a reference from its *occurrence locations*, and making references fully self-describing.

The new model introduces three distinct layers:

1.  **Definition:** The target of the reference (e.g., a specific method or class).
2.  **Reference (The "What"):** A purely semantic, value-typed object. It contains the `Definition` being referenced, the receiver, and the arguments. It knows *nothing* about where it was called. Because it is purely content-based, identical calls will hash to the same value and can be canonicalized into a single instance.
3.  **ReferenceOccurrence (The "Where"):** A new container object that pairs a canonicalized `Reference` with the set of `LoadingUnit`s where it was observed.

```dart
// Proposed Model (Simplified)

// 1. The "What" (Fully self-contained, no locations)
sealed class Reference {
  final Definition definition;
}

final class CallWithArguments extends Reference {
  final MaybeConstant? receiver;
  final List<MaybeConstant> positionalArguments;
  // ...
}

// 2. The "Where"
class ReferenceOccurrence<T extends Reference> {
  final T reference;
  final Set<LoadingUnit> loadingUnits;
}

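// 3. The Top-Level Container
// (CallReference and InstanceReference are assumed here to be subtypes of
// Reference; their declarations are omitted from this sketch.)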
class Recordings {
  final List<ReferenceOccurrence<CallReference>> calls;
  final List<ReferenceOccurrence<InstanceReference>> instances;
}
```
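
To make the "value-typed" claim concrete, here is a minimal sketch of content-based equality for a `Reference`. The types are simplified assumptions (strings stand in for `Definition` and `MaybeConstant`, and `_listEquals` is a hypothetical helper), so treat it as an illustration rather than the intended implementation.

```dart
// Hypothetical, simplified types: strings stand in for Definition and
// MaybeConstant.
sealed class Reference {
  final String definition;
  const Reference(this.definition);
}

final class CallWithArguments extends Reference {
  final String? receiver;
  final List<String> positionalArguments;

  const CallWithArguments(
    super.definition, {
    this.receiver,
    this.positionalArguments = const [],
  });

  // Equality covers only the semantic content; there is no location here.
  @override
  bool operator ==(Object other) =>
      other is CallWithArguments &&
      other.definition == definition &&
      other.receiver == receiver &&
      _listEquals(other.positionalArguments, positionalArguments);

  @override
  int get hashCode =>
      Object.hash(definition, receiver, Object.hashAll(positionalArguments));
}

// Dart lists compare by identity, so element-wise comparison is explicit.
bool _listEquals(List<String> a, List<String> b) {
  if (a.length != b.length) return false;
  for (var i = 0; i < a.length; i++) {
    if (a[i] != b[i]) return false;
  }
  return true;
}

void main() {
  final a = CallWithArguments('print', positionalArguments: ['hello']);
  final b = CallWithArguments('print', positionalArguments: ['hello']);
  print(identical(a, b)); // false: two separate objects
  print(a == b); // true: same content
  print({a, b}.length); // 1: the Set keeps a single instance
}
```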

## JSON Representation (Normalized Pools)

This architectural shift allows us to introduce a `references` pool in the JSON format, analogous to the existing `constants` and `definitions` pools.

```jsonc
{
  "loading_units": [ {"name": "lib1"}, {"name": "lib2"} ],
  "definitions": [ {"uri": "...", "path": ["print"]} ],
  "constants": [ {"type": "string", "value": "hello"} ],
  
  // NEW: A deduplicated pool of all distinct reference signatures
  "references": [
    {
      "type": "call_with_arguments",
      "definition_index": 0,
      "positional": [0] // points to "hello" constant
    }
  ],
  
  // UPDATED: 'uses' simply maps a canonical reference to its locations
  "uses": {
    "static_calls": [
      {
        "reference_index": 0,
        "loading_unit_indices": [0, 1] // Called from lib1 and lib2
      }
    ]
  }
}
```
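
Reading this layout back means resolving each index against its pool. The sketch below does that with `dart:convert` for the example document; the key names are taken from the example above, the explanatory comments are dropped because `jsonDecode` only accepts strict JSON, and `describeStaticCalls` is a hypothetical helper, not part of `record_use`.

```dart
import 'dart:convert';

// The example document, without the explanatory comments.
const source = '''
{
  "loading_units": [ {"name": "lib1"}, {"name": "lib2"} ],
  "definitions": [ {"uri": "...", "path": ["print"]} ],
  "constants": [ {"type": "string", "value": "hello"} ],
  "references": [
    { "type": "call_with_arguments", "definition_index": 0, "positional": [0] }
  ],
  "uses": {
    "static_calls": [
      { "reference_index": 0, "loading_unit_indices": [0, 1] }
    ]
  }
}
''';

/// Resolves every index in `uses.static_calls` against the pools and
/// returns one human-readable line per use.
List<String> describeStaticCalls(String jsonText) {
  final json = jsonDecode(jsonText) as Map<String, dynamic>;
  final loadingUnits = json['loading_units'] as List;
  final definitions = json['definitions'] as List;
  final constants = json['constants'] as List;
  final references = json['references'] as List;
  final uses = json['uses'] as Map<String, dynamic>;

  final lines = <String>[];
  for (final use in uses['static_calls'] as List) {
    // Occurrence -> reference -> definition and constants.
    final reference = references[use['reference_index'] as int];
    final definition = definitions[reference['definition_index'] as int];
    final name = (definition['path'] as List).join('.');
    final arguments = [
      for (final index in reference['positional'] as List)
        constants[index as int]['value'],
    ];
    final units = [
      for (final index in use['loading_unit_indices'] as List)
        loadingUnits[index as int]['name'],
    ];
    lines.add('$name(${arguments.join(', ')}) in ${units.join(', ')}');
  }
  return lines;
}

void main() {
  describeStaticCalls(source).forEach(print); // print(hello) in lib1, lib2
}
```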

## Impact Analysis

### Pros

1.  **Trivial Deduplication:** Canonicalizing the `Recordings` object becomes much simpler. You index `Reference` objects by their content. If a new occurrence has an identical `Reference`, you simply merge its `loadingUnits` into the existing `ReferenceOccurrence`.
2.  **Consistent Domain Model:** Every `Reference` is now self-contained (it knows its `Definition`) and acts as a true value type, resolving the inconsistency between top-level calls and nested instance constants.

I think the consistent domain model is the strongest argument here. It also matches how the various backends work: we first process all definitions, then all references, and only at the very end do we figure out which loading units those references end up in.
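
The merge step described under "Trivial Deduplication" could look roughly like this. It is a sketch under simplifying assumptions: a Dart record stands in for the value-typed `Reference` (records compare structurally), strings stand in for `LoadingUnit`, and `canonicalize` is a hypothetical name.

```dart
// Hypothetical, simplified types: a record stands in for a value-typed
// Reference, strings for LoadingUnit.
typedef Reference = ({String definition, String argument});

class ReferenceOccurrence {
  final Reference reference;
  final Set<String> loadingUnits;
  ReferenceOccurrence(this.reference, this.loadingUnits);
}

/// Collapses raw (reference, loading unit) observations into one
/// ReferenceOccurrence per distinct reference.
List<ReferenceOccurrence> canonicalize(
  Iterable<(Reference, String)> observations,
) {
  // Map literals are LinkedHashMaps, so first-seen order is preserved.
  final byReference = <Reference, ReferenceOccurrence>{};
  for (final (reference, loadingUnit) in observations) {
    byReference
        .putIfAbsent(
          reference,
          () => ReferenceOccurrence(reference, <String>{}),
        )
        .loadingUnits
        .add(loadingUnit);
  }
  return byReference.values.toList();
}

void main() {
  final occurrences = canonicalize([
    ((definition: 'print', argument: 'hello'), 'lib1'),
    ((definition: 'print', argument: 'hello'), 'lib2'),
    ((definition: 'print', argument: 'bye'), 'lib1'),
  ]);
  for (final o in occurrences) {
    final r = o.reference;
    print('${r.definition}(${r.argument}): ${o.loadingUnits}');
  }
  // print(hello): {lib1, lib2}
  // print(bye): {lib1}
}
```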

### Cons

1.  **Increased Indirection:** Navigating the object graph requires moving through an additional layer (`ReferenceOccurrence` -> `Reference` -> `Arguments`).

We will definitely want a different user-facing API: at least some `[]` operators on `Recordings`, or a completely separate API. Its implementation will need some lookup maps.
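
As one possible shape for such an API, the sketch below puts an `operator []` on `Recordings`, backed by a lookup map that is built once from the flat list. All types are simplified assumptions (strings stand in for `Definition` and `LoadingUnit`), and the eventual API may well look different.

```dart
// Hypothetical, simplified types: strings stand in for Definition and
// LoadingUnit, and Reference carries only what the lookup needs.
class Reference {
  final String definition;
  final List<String> positionalArguments;
  Reference(this.definition, this.positionalArguments);
}

class ReferenceOccurrence {
  final Reference reference;
  final Set<String> loadingUnits;
  ReferenceOccurrence(this.reference, this.loadingUnits);
}

class Recordings {
  final List<ReferenceOccurrence> calls;

  // Lookup map built once, so `[]` does not scan the flat list.
  final Map<String, List<ReferenceOccurrence>> _callsByDefinition = {};

  Recordings(this.calls) {
    for (final occurrence in calls) {
      (_callsByDefinition[occurrence.reference.definition] ??= [])
          .add(occurrence);
    }
  }

  /// All recorded calls to [definition]; empty if there are none.
  List<ReferenceOccurrence> operator [](String definition) =>
      _callsByDefinition[definition] ?? const [];
}

void main() {
  final recordings = Recordings([
    ReferenceOccurrence(Reference('print', ['hello']), {'lib1', 'lib2'}),
    ReferenceOccurrence(Reference('print', ['bye']), {'lib1'}),
    ReferenceOccurrence(Reference('log', ['x']), {'lib2'}),
  ]);
  print(recordings['print'].length); // 2
  print(recordings['missing'].isEmpty); // true
}
```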

I'm not immediately planning to do this refactoring. I'm noting this as a domain model cleanup that would make other things easier. cc @goderbauer @biggs0125 
