Description
Upgrade Debug Info Model with Type Info
This issue proposes an upgrade to the debug info feature to implement some of the capabilities proposed in the original debug info feature request (#1917) but not yet implemented. A prototype implementation exists for gdb (PR to follow) and work is under way to implement equivalent functionality for Windows.
Proposed additions
Issue #1917 specified several desirable aspects of debug info support that were (deliberately) omitted from the original implementation. This issue addresses the following missing aspects:
- size and numeric/logical type info for Java primitive types/data
- file and line, size, layout and super relationship info for Java object types/data
- size and layout info for Java array types
- extends and implements relationship info for Java interface types
- location and type info for Java primitive and object values stored in the heap region (i.e. Java static data)
As a corollary to inclusion of information regarding Java object types it should also be possible to embed debug info detailing Java methods and fields in the debug info for the classes to which they belong:
- link method and field debug info to the debug info for associated class
Expected Benefits
This will provide a much richer debugging experience for anyone wishing to debug a GraalVM native image. With type info included in the debug info output it should be possible to perform the following functions:
- printing of primitive values
- structured (field by field) display of Java objects
- casting/printing objects at different levels of generality
- access through object networks via path expressions
- reference by name to static field data
Examples of proposed usage in gdb
Using gdb on Linux, the prototype implementation supports these debugger capabilities as follows:
- Cast values to specific types
(gdb) file hello
. . .
(gdb) break Hello::main
. . .
(gdb) run Andrew
. . .
Breakpoint 1, Hello::main(java.lang.String[]) () at Hello.java:43
43 Greeter greeter = Greeter.greeter(args);
(gdb) p/x (('java.lang.String[]' *) $rdi)
$1 = 0x7ffff7c01028
- Describe types using the ptype command
(gdb) ptype $1
type = class java.lang.String[] : public _arrhdrA {
java.lang.String *data[0];
} *
(gdb) ptype _arrhdrA
type = struct _arrhdrA {
java.lang.Class *hub;
int len;
int idHash;
}
(gdb) ptype 'java.lang.Object'
type = class _java.lang.Object : public _objhdr {
public:
void Object(void);
boolean equals(java.lang.Object *);
private:
int hashCode(void);
java.lang.String *toString(void);
} *
(gdb) ptype _objhdr
type = struct _objhdr {
java.lang.Class *hub;
}
(gdb) ptype 'java.lang.CharSequence'
type = union java.lang.CharSequence {
java.lang.AbstractStringBuilder _java.lang.AbstractStringBuilder;
java.lang.StringBuffer _java.lang.StringBuffer;
java.lang.StringBuilder _java.lang.StringBuilder;
java.lang.String _java.lang.String;
java.nio.CharBuffer _java.nio.CharBuffer;
}
- Print instances field by field using the print command
(gdb) p/x *$1
$2 = {
<_arrhdrA> = {
hub = 0x906b30,
len = 0x1,
idHash = 0x0
},
members of _java.lang.String[]:
data = 0x7ffff7c01038
}
- Traverse the object network using path expressions
(gdb) p/x *$1->hub
$3 = {
<_java.lang.Object> = {
<_objhdr> = {
hub = 0x912bd8
}, <No data fields>},
members of _java.lang.Class:
name = 0x9fdc88,
isAnonymousClass = 0xa71a90,
. . .
}
(gdb) p/x *$1->data[0]
$4 = {
<_java.lang.Object> = {
<_objhdr> = {
hub = 0x90c670
}, <No data fields>},
members of _java.lang.String:
value = 0x7ffff7c01198,
hash = 0x0,
coder = 0x0
}
(gdb) p *$1->data[0]->value
$5 = {
<_arrhdrB> = {
hub = 0x90e0a8,
len = 6,
idHash = 0
},
members of _byte []:
data = 0x7ffff7c011a8 "Andrew"
}
(gdb) x/s $1->hub->name->value->data
0x9047a8: "[Ljava.lang.String;"
- Refer to static field data by name
(gdb) ptype '_java.math.BigDecimal'::BIG_TEN_POWERS_TABLE
type = struct java.math.BigInteger[] : public _arrhdrA {
java.math.BigInteger *data[0];
} *
(gdb) p/x *'_java.math.BigDecimal'::BIG_TEN_POWERS_TABLE
$6 = {
<_arrhdrA> = {
hub = 0x919898,
len = 0x13,
idHash = 0xb3939bc
},
members of _java.math.BigInteger[]:
data = 0xa6fd00
}
(gdb) p/x *'java.math.BigDecimal'::BIG_TEN_POWERS_TABLE->data[0]@2
$7 = {{
<_java.lang.Number> = {
<_java.lang.Object> = {
<_objhdr> = {
hub = 0x919c70
}, <No data fields>}, <No data fields>},
members of _java.math.BigInteger:
mag = 0xa5b3a8,
signum = 0x1,
bitLengthPlusOne = 0x0,
lowestSetBitPlusTwo = 0x0,
firstNonzeroIntNumPlusTwo = 0x0
}, {
<_java.lang.Number> = {
<_java.lang.Object> = {
<_objhdr> = {
hub = 0x919c70
}, <No data fields>}, <No data fields>},
members of _java.math.BigInteger:
mag = 0xa5b3a8,
signum = 0xffffffff,
bitLengthPlusOne = 0x0,
lowestSetBitPlusTwo = 0x0,
firstNonzeroIntNumPlusTwo = 0x0
}}
Note that this implementation relies on a mapping of Java types to an underlying C++ type model that is similar enough to allow the program to be debugged (see next subsection). It should be possible to provide many of the same capabilities on Windows by basing the PECOFF debug info on a comparable mapping or by relying on support for Java debugging in the target Windows debugger.
Note also that automatic resolution of program names to associated program values is only proposed for static field names (and method names). Adding location info to allow resolution of parameter and local variable names to their current live values is deferred until the next phase of debug info enhancement.
Constraints on the generated DWARF info model
The above examples show the need for the Linux implementation to work around the fact that gdb does not currently provide support for debugging Java per se. In the long run a suitable DWARF info model for Java needs to be defined with support added to gdb. Pending such a full solution this problem has been resolved by mapping the Java class base to an equivalent C++ model i.e. it fakes what it can.
So,
- Java reference types like java.lang.String, java.lang.String[] or java.lang.CharSequence are modelled as C++ pointer types (DWARF address types) which point to underlying structure types. Java class and array types map to pointers to a C++ class type (DWARF info class record). Interfaces map to a pointer to a C++ union type (DWARF info union record). The Java type names which, in Java, identify the oop reference (pointer) type are instead used to identify the underlying C++ structure type.
- Subclassing is modelled by declaring a C++ public inheritance relationship between these underlying layout types.
- Methods and fields are modelled as members of the layout types.
- Object layout types all include an object header struct (_objhdr) as their innermost inherited super which ensures they all include a hub reference as the first component.
- Object layout types embed their own fields and methods and recursively inherit the fields and methods of their supers, all the way up to _objhdr.
- Array layout types include an array header type appropriate to their base element type (_arrhdrA, _arrhdrZ, etc) which includes the hub, length and idHash fields and any padding needed to round up to the base of the array data.
- Array types only include one non-inherited member, a field of (C++/DWARF) array type of length 0 whose base type is either a Java primitive type (C++/DWARF base type) or a Java reference type (C++ pointer/DWARF address type).
- Interface layout types are modelled as a union of all the layout types for classes which implement the interface. This allows a reference typed to an interface to be cast to the relevant implementation type by accessing the relevant union element. Union fields are named using the corresponding Java type name prefixed with _.
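The mapping rules above can be sketched in plain C++. This is only an illustration of the shape of the model, not the generated debug info itself: the type names (ObjHdr, JavaString, StringArray and so on) are hypothetical respellings, since names like _java.lang.String are not legal C++ identifiers, and a single array header stands in for the per-element-type headers. The data members use length 1 rather than 0 because zero-length arrays are not standard C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the generated layout types. Real names such as
// _java.lang.String are not legal C++ identifiers, so they are respelled here.
struct JavaClass;   // layout type for java.lang.Class, defined below
struct ByteArray;   // layout type for byte[], defined below

// _objhdr: innermost super of every object layout type.
struct ObjHdr {
    JavaClass *hub;
};
static_assert(offsetof(ObjHdr, hub) == 0, "hub must be the first component");

// Subclassing is modelled as public inheritance between layout types.
struct JavaObject : public ObjHdr {};

struct JavaString : public JavaObject {
    ByteArray *value;   // Java reference fields become pointers
    int hash;
    signed char coder;
};

struct JavaClass : public JavaObject {
    JavaString *name;
};

// Simplified array header (assumption: one header for all element types).
struct ArrHdr {
    JavaClass *hub;
    int len;
    int idHash;
};

// The proposal uses a zero-length data member; length 1 is used here only
// because zero-length arrays are not standard C++.
struct ByteArray : public ArrHdr {
    signed char data[1];
};

struct StringArray : public ArrHdr {
    JavaString *data[1];
};

// An interface is a union of the layout types of its implementors.
union CharSequence {
    JavaString _java_lang_String;
    // ... one member per implementing class
};

// Equivalent of the gdb path expression array->data[index].
inline JavaString *elementAt(StringArray *array, int index) {
    assert(index >= 0 && index < array->len);
    return array->data[index];
}

// "Cast" an interface-typed reference by selecting the matching union member.
inline JavaString *asString(CharSequence *ref) {
    return &ref->_java_lang_String;
}
```

With these definitions, elementAt(array, 0)->hub plays the role of the gdb expression $1->data[0]->hub, and asString mirrors selecting the _java.lang.String element of the java.lang.CharSequence union.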
Problem when using Isolates
The current prototype relies on object references embedded in object fields actually being stored as pointers. This allows gdb to read the field value directly as the address to the linked object. This is the correct model if -H:-SpawnIsolates is specified on the command line. With the default setting -H:+SpawnIsolates field references are actually offsets from the heap register ($r14 on Linux/x86_64). The prototype implementation has not yet identified a means for such base relative pointers to be described to gdb, allowing them to be transformed from offsets to addresses. This problem is still under investigation.
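The transformation a debugger would need to apply can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the prototype: resolveFieldReference is a hypothetical helper, heapBase stands for the value held in the heap register, and the sketch assumes a field holds a plain byte offset with no further encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: turn the raw value read from an object field into the
// absolute address of the referenced object.
//   fieldValue    - raw contents of the reference field
//   heapBase      - value of the heap register ($r14 on Linux/x86_64)
//   spawnIsolates - true for -H:+SpawnIsolates, false for -H:-SpawnIsolates
inline std::uint64_t resolveFieldReference(std::uint64_t fieldValue,
                                           std::uint64_t heapBase,
                                           bool spawnIsolates) {
    if (!spawnIsolates) {
        // Fields already hold pointers, so gdb can use the value directly.
        return fieldValue;
    }
    // Fields hold offsets relative to the heap base (assumed unencoded).
    assert(fieldValue <= UINT64_MAX - heapBase);
    return heapBase + fieldValue;
}
```

For example, with an invented heap base of 0x7ffff7c00000, a field value of 0x1028 would resolve to 0x7ffff7c01028 when isolates are enabled. The open question in the text is how to describe this extra addition to gdb in the debug info, which the sketch does not address.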