Current State and Limitations
Currently, the record_use package models recorded usages by mapping a Definition to a list of Reference objects (e.g., CallWithArguments, InstanceCreationReference). Each Reference object contains both the semantic details of the usage (the receiver, the arguments) and the physical locations where that usage was observed (a list of LoadingUnits).
    // Current Model (Simplified)
    class Recordings {
      final Map<Definition, List<CallReference>> calls;
      final Map<Definition, List<InstanceReference>> instances;
    }

    sealed class CallReference {
      final List<LoadingUnit> loadingUnits; // The "Where"
      final MaybeConstant? receiver; // The "What"
      // arguments, etc.
    }

This architecture has several limitations:
- Inefficient Canonicalization and Merging: Because loadingUnits are part of the Reference object's identity (its operator == and hashCode), two semantically identical calls made from different loading units are treated as distinct, unequal objects. This makes it difficult to deduplicate identical calls and merge their loading unit lists.
- Redundant JSON Serialization: If a common function like print('hello') is called from 50 different loading units, the JSON output will repeat the call signature 50 times under the definition's uses list, each with a different loading unit.
- Inconsistent Nesting: InstanceConstantReference currently holds a pointer to its Definition (because it needs it to be self-describing when nested inside arguments), whereas top-level CallWithArguments objects do not hold their Definition (they rely on being values in the Recordings.calls map). This makes the Reference hierarchy irregular.
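The first limitation can be reproduced with a minimal sketch. The class `OldCall` below is a hypothetical stand-in, not the package's actual type, and it assumes a single loading unit per object and plain strings for the receiver to keep the example short:

```dart
// Hypothetical stand-in for the current model: the location ("where")
// takes part in equality alongside the semantic content ("what").
class OldCall {
  final String receiver; // the "what" (simplified to a string)
  final String loadingUnit; // the "where" (simplified to one unit)
  const OldCall(this.receiver, this.loadingUnit);

  @override
  bool operator ==(Object other) =>
      other is OldCall &&
      other.receiver == receiver &&
      other.loadingUnit == loadingUnit;

  @override
  int get hashCode => Object.hash(receiver, loadingUnit);
}

void main() {
  final fromLib1 = OldCall("print('hello')", 'lib1');
  final fromLib2 = OldCall("print('hello')", 'lib2');
  // Same call, different loading unit: a set cannot deduplicate them.
  final distinct = {fromLib1, fromLib2};
  print(distinct.length); // 2
}
```

Because the two objects differ only in where they were observed, any content-based deduplication first has to strip the location back out.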
Proposed Architecture
To solve these issues, we propose fully normalizing the data model by decoupling the semantic content of a reference from its occurrence locations, and making references fully self-describing.
The new model introduces three distinct layers:
- Definition: The target of the reference (e.g., a specific method or class).
- Reference (The "What"): A purely semantic, value-typed object. It contains the Definition being referenced, the receiver, and the arguments. It knows nothing about where it was called. Because it is purely content-based, identical calls will hash to the same value and can be canonicalized into a single instance.
- ReferenceOccurrence (The "Where"): A new container object that pairs a canonicalized Reference with the set of LoadingUnits where it was observed.
    // Proposed Model (Simplified)

    // 1. The "What" (Fully self-contained, no locations)
    sealed class Reference {
      final Definition definition;
    }

    final class CallWithArguments extends Reference {
      final MaybeConstant? receiver;
      final List<MaybeConstant> positionalArguments;
      // ...
    }

    // 2. The "Where"
    class ReferenceOccurrence<T extends Reference> {
      final T reference;
      final Set<LoadingUnit> loadingUnits;
    }

    // 3. The Top-Level Container
    class Recordings {
      final List<ReferenceOccurrence<CallReference>> calls;
      final List<ReferenceOccurrence<InstanceReference>> instances;
    }

JSON Representation (Normalized Pools)
This architectural shift allows us to introduce a references pool in the JSON format, analogous to the existing constants and definitions pools.
    {
      "loading_units": [ {"name": "lib1"}, {"name": "lib2"} ],
      "definitions": [ {"uri": "...", "path": ["print"]} ],
      "constants": [ {"type": "string", "value": "hello"} ],
      // NEW: A deduplicated pool of all distinct reference signatures
      "references": [
        {
          "type": "call_with_arguments",
          "definition_index": 0,
          "positional": [0] // points to "hello" constant
        }
      ],
      // UPDATED: 'uses' simply maps a canonical reference to its locations
      "uses": {
        "static_calls": [
          {
            "reference_index": 0,
            "loading_unit_indices": [0, 1] // Called from lib1 and lib2
          }
        ]
      }
    }

Impact Analysis
Pros
- Trivial Deduplication: Canonicalizing the Recordings object becomes much simpler. You index Reference objects by their content. If a new occurrence has an identical Reference, you simply merge its loadingUnits into the existing ReferenceOccurrence.
- Consistent Domain Model: Every Reference is now self-contained (it knows its Definition) and acts as a true value type, resolving the inconsistency between top-level calls and nested instance constants.
I think the consistent domain model is the strongest argument here. It also matches how the various backends work: we first process all definitions, then all references, and only at the very end do we figure out which loading units those references end up in.
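The merge described under "Trivial Deduplication" can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: `Ref` and `canonicalize` are hypothetical names, and strings stand in for Definition, MaybeConstant and LoadingUnit. It uses Dart 3 records and patterns.

```dart
// Hypothetical, simplified value type standing in for Reference.
// Equality is purely content-based: no loading units involved.
class Ref {
  final String definition;
  final List<String> positionalArguments;
  Ref(this.definition, this.positionalArguments);

  @override
  bool operator ==(Object other) =>
      other is Ref &&
      other.definition == definition &&
      _listEquals(other.positionalArguments, positionalArguments);

  @override
  int get hashCode =>
      Object.hash(definition, Object.hashAll(positionalArguments));
}

// Element-wise list comparison (Dart lists compare by identity by default).
bool _listEquals(List<String> a, List<String> b) {
  if (a.length != b.length) return false;
  for (var i = 0; i < a.length; i++) {
    if (a[i] != b[i]) return false;
  }
  return true;
}

/// Merges raw (reference, loading unit) observations into one entry per
/// distinct reference, collecting the loading units into a set.
Map<Ref, Set<String>> canonicalize(Iterable<(Ref, String)> observations) {
  final occurrences = <Ref, Set<String>>{};
  for (final (ref, unit) in observations) {
    occurrences.putIfAbsent(ref, () => <String>{}).add(unit);
  }
  return occurrences;
}

void main() {
  final merged = canonicalize([
    (Ref('print', ['hello']), 'lib1'),
    (Ref('print', ['hello']), 'lib2'),
    (Ref('print', ['bye']), 'lib1'),
  ]);
  print(merged.length); // 2
  print(merged[Ref('print', ['hello'])]); // {lib1, lib2}
}
```

Each map entry here plays the role of one ReferenceOccurrence: the key is the canonical "what", the value is the "where".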
Cons
- Increased Indirection: Navigating the object graph requires moving through an additional layer (Occurrence -> Reference -> Arguments).
We will definitely want a different user-facing API: at least some [] operators on Recordings, or a completely separate API layered on top. Its implementation will need some lookup maps.
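One possible shape for such a lookup, sketched under assumptions: the `Occurrence` class, the `_callsByDefinition` field and the string-keyed `operator []` are all hypothetical, and strings again stand in for Definition and LoadingUnit. The actual API may well look different.

```dart
// Hypothetical, flattened stand-in for ReferenceOccurrence<CallReference>.
class Occurrence {
  final String definition;
  final List<String> arguments;
  final Set<String> loadingUnits;
  Occurrence(this.definition, this.arguments, this.loadingUnits);
}

class Recordings {
  final List<Occurrence> calls;

  // Lookup map from definition to its occurrences, built lazily on first use.
  late final Map<String, List<Occurrence>> _callsByDefinition = () {
    final index = <String, List<Occurrence>>{};
    for (final occurrence in calls) {
      index.putIfAbsent(occurrence.definition, () => []).add(occurrence);
    }
    return index;
  }();

  Recordings(this.calls);

  /// All recorded calls to [definition], or an empty list if there are none.
  List<Occurrence> operator [](String definition) =>
      _callsByDefinition[definition] ?? const [];
}

void main() {
  final recordings = Recordings([
    Occurrence('print', ['hello'], {'lib1', 'lib2'}),
    Occurrence('print', ['bye'], {'lib1'}),
    Occurrence('log', ['x'], {'lib2'}),
  ]);
  print(recordings['print'].length); // 2
  print(recordings['missing'].isEmpty); // true
}
```

Keeping the index private lets the stored model stay a flat list while callers still get lookup by definition, which hides the extra indirection listed under Cons.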
I'm not immediately planning to do this refactoring. I'm noting this as a domain model cleanup that would make other things easier. cc @goderbauer @biggs0125