
Roadmap to Jacobin source code

Andrew Binstock edited this page Mar 28, 2022 · 24 revisions

This is a roadmap to the source code for Jacobin as of March 2022. (The symbol ❒ indicates work to be done)

The JVM consists of a handful of fundamental parts:

  • a command-line parser
  • a classloader
  • an execution engine
  • garbage collection (GC), which is handled by go's built-in GC
  • many libraries for doing additional tasks

In a program in another language, the source tree would reflect this division of labor. However, go is persnickety about package layouts and will refuse to compile if packages have mutual dependencies. So the source tree is divided in part according to the preceding and in part based on what's possible in a go project structure.

Command-line parsing

Jacobin handles a subset of the JVM's command-line options. Currently, it accepts many basic options (those that are displayed with java -help), identifies the class name, and captures/passes in any program arguments. It also accepts command-line options from environmental variables, as described here.

The initial handling of the command-line interface (CLI) is in jvm/cli.go. The parsing of JVM options and setting of corresponding switches is done in jvm/option_table_loader.go. This file loads a table with all JVM options that Jacobin responds to and for each entry includes a first-class function that is executed when the option is specified. Instructions in option_table_loader.go explain how to add switches when they are supported in Jacobin.
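The table-of-functions idea can be sketched as follows. This is a minimal illustration, not the code in option_table_loader.go: the names Globals, optionAction, loadOptionTable, and handleArgs, and the two options shown, are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Globals holds the switches that options set. (Hypothetical name and fields.)
type Globals struct {
	ShowVersion bool
	Verbose     bool
}

// optionAction is the first-class function stored with each table entry.
type optionAction func(g *Globals) error

// loadOptionTable builds the table of supported options.
// In this sketch, supporting a new option means adding one entry here.
func loadOptionTable() map[string]optionAction {
	return map[string]optionAction{
		"-version": func(g *Globals) error { g.ShowVersion = true; return nil },
		"-verbose": func(g *Globals) error { g.Verbose = true; return nil },
	}
}

// handleArgs runs the action for each leading option, then treats the first
// non-option argument as the class name and the rest as program arguments.
func handleArgs(args []string, table map[string]optionAction, g *Globals) (string, []string, error) {
	for i, arg := range args {
		if !strings.HasPrefix(arg, "-") {
			return arg, args[i+1:], nil
		}
		action, ok := table[arg]
		if !ok {
			return "", nil, fmt.Errorf("unrecognized option: %s", arg)
		}
		if err := action(g); err != nil {
			return "", nil, err
		}
	}
	return "", nil, errors.New("no class name specified")
}

func main() {
	g := &Globals{}
	class, appArgs, err := handleArgs([]string{"-verbose", "Hello.class", "x", "y"}, loadOptionTable(), g)
	fmt.Println(class, appArgs, g.Verbose, err)
}
```

Keeping the action next to the option name means the parsing loop never changes when an option is added; only the table grows.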

❒ Jacobin expects a class name to end in .class, whereas the OpenJDK JVM expects no extension. Follow the JVM convention. (required)
❒ Support for JAR (and EAR, WAR) files. (required)
❒ Support for the -cp and -classpath options (required)
❒ Separate parsing routines for switches that begin with the -X: and -XX: prefixes (not urgent)

Classloader

A classloader consists of a utility that parses a class file into useful fields, does some format checking, and places the parsed and checked data into the method area of the JVM. Jacobin's classloading is done in the classloader package. The heavy lifting is done by classloader.go, which begins with many of the data structures used in classloading and then has a variety of different functions for loading a class. The most important of these is loadClassFromFile().

loadClassFromFile() calls parseAndLoadClass(). This function calls the class-file parser (parse() in parser.go), then format-checks the parsed file via a call to formatCheckClass() in formatCheck.go, and if all is successful, posts the parsed and format-checked data to the method area.
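The parse, format-check, and post sequence described above can be sketched like this. The function names parse, formatCheckClass, and parseAndLoadClass come from the text, but their signatures, the parsedClass and methodArea types, and the choice to check only the magic number are assumptions made for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// parsedClass is a stand-in for the parser's output (hypothetical type).
type parsedClass struct {
	name  string
	magic uint32
}

// methodArea is a stand-in for the JVM method area (hypothetical type).
type methodArea map[string]parsedClass

// parse is a placeholder for the class-file parser; it reads only the magic number.
func parse(name string, raw []byte) (parsedClass, error) {
	if len(raw) < 4 {
		return parsedClass{}, errors.New("class file too short")
	}
	return parsedClass{name: name, magic: binary.BigEndian.Uint32(raw)}, nil
}

// formatCheckClass is a placeholder for the format checker.
func formatCheckClass(c parsedClass) error {
	if c.magic != 0xCAFEBABE {
		return fmt.Errorf("%s: invalid magic number %#x", c.name, c.magic)
	}
	return nil
}

// parseAndLoadClass posts the class to the method area only if both
// parsing and format checking succeed.
func parseAndLoadClass(name string, raw []byte, ma methodArea) error {
	c, err := parse(name, raw)
	if err != nil {
		return err
	}
	if err := formatCheckClass(c); err != nil {
		return err
	}
	ma[name] = c
	return nil
}

func main() {
	ma := methodArea{}
	err := parseAndLoadClass("Hello", []byte{0xCA, 0xFE, 0xBA, 0xBE}, ma)
	fmt.Println(len(ma), err)
}
```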

In addition to loading the classes of the app, Jacobin (like the OpenJDK JVM) pre-loads a series of base classes from the JDK, which cover basic Java functionality. The location of those classes is specified by the JACOBIN_HOME environmental variable. This is handled by LoadBaseClasses() in classloader.go.

Another set of classes is preloaded, namely the classes referenced by the main application class. These are preloaded by LoadReferencedClasses() in classloader.go. It is called at program start-up by JVMrun() in jvmStart.go, which is the main line that drives Jacobin. This preloading is done in parallel with the execution of the main class, via the use of a channel and a wait group. (Note: originally this parallelization was a proof of concept and the idea was to also parallelize the preloading of the base classes. However, on standard hardware, the roughly 1400 base classes load in Jacobin in less than 500ms, so it's not clear that parallelizing would add much in terms of performance, although it's certainly something to explore whenever optimization becomes a primary goal.)
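The channel-plus-wait-group pattern mentioned above might look roughly like the following. This is a sketch of the general Go idiom, not the code in LoadReferencedClasses(); the function runWithPreload and its parameters are hypothetical, and appending a name to a slice stands in for actually loading a class.

```go
package main

import (
	"fmt"
	"sync"
)

// runWithPreload starts a loader goroutine that receives class names over a
// channel, runs the main class in the meantime, and waits for the loader to
// finish before returning the list of loaded classes.
func runWithPreload(referenced []string, runMain func()) []string {
	names := make(chan string, len(referenced)) // buffered so sends never block
	var loaded []string
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		for name := range names {
			loaded = append(loaded, name) // stand-in for loading the class
		}
	}()

	for _, name := range referenced {
		names <- name
	}
	close(names) // lets the loader's range loop end

	runMain() // executes while the loader goroutine works

	wg.Wait() // after this, reading loaded is safe
	return loaded
}

func main() {
	got := runWithPreload([]string{"java/lang/String", "java/io/PrintStream"}, func() {
		fmt.Println("main class running")
	})
	fmt.Println("preloaded:", len(got))
}
```

Only the loader goroutine writes to the slice, and the caller reads it only after wg.Wait() returns, so no mutex is needed in this sketch.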

Parsing the class file, as mentioned above, is driven by parse() in parser.go. The class file consists of three main parts: some basic fields, the constant pool, and the attribute area. Java methods are attributes and found in the attribute area. parse() parses the initial fields. The constant pool (uniformly referred to in the code as CP) is parsed in cpParser.go. At present, the parsing of the CP should be complete and all Java 11 record types are parsed as completely as needed for Jacobin. The code, however, is cumbersome due to go's lack of support for generics (even the basic generics in go 1.18 don't satisfy the needs of the CP parser). If need arises, this page can be expanded to explain how the CP data is parsed and, especially, how its data is stored for fast access.
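For orientation, the fields that open every class file can be read as shown below. The layout (a 4-byte magic number, 2-byte minor and major versions, and a 2-byte constant pool count, all big-endian) follows the JVM specification; the classHeader type and parseHeader function are hypothetical and are not taken from parser.go.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// classHeader holds the fixed-size fields that precede the constant pool.
// Fields are exported so that binary.Read can fill them in.
type classHeader struct {
	Magic        uint32 // always 0xCAFEBABE
	MinorVersion uint16
	MajorVersion uint16 // 55 for Java 11
	CPCount      uint16 // number of CP entries plus one
}

// parseHeader decodes the first ten bytes of a class file.
func parseHeader(raw []byte) (classHeader, error) {
	var h classHeader
	if err := binary.Read(bytes.NewReader(raw), binary.BigEndian, &h); err != nil {
		return h, fmt.Errorf("reading class header: %w", err)
	}
	return h, nil
}

func main() {
	raw := []byte{0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x37, 0x00, 0x1D}
	h, err := parseHeader(raw)
	fmt.Printf("%#x %d.%d cp=%d err=%v\n", h.Magic, h.MajorVersion, h.MinorVersion, h.CPCount, err)
}
```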

Methods are complex attributes in Java. They are parsed by ParseMethods() in classloader/methodParser.go. Despite being attributes themselves, methods have attributes. These are incompletely parsed at present--sufficiently for Jacobin to function properly, but not enough to provide advanced features, such as debugging, etc.

Several class-file attributes of at present secondary importance are not parsed. These include attributes relating to annotations, modules, packages, etc.

The format-checking performed by the Jacobin classloader both exceeds the JVM spec in some ways and is incomplete in other ways. The goal of format checking is to make sure that the class file has not been accidentally altered. It's an integrity check done by validating data fields against each other, validating sequences of items, data ranges, etc. It's required of all classes. Validation, in JVM terms, is a separate process that entails more advanced checking to make sure that there is no malicious insertion of code. Per the JVM spec, class files are validated at various points in the initialization process (except for classes taken from the JDK/JRE, which are presumed to not be malicious). Jacobin presently does very few of these checks and assumes any class it executes is legitimate.
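As one example of validating data fields against each other, the class-file format requires that every class entry in the CP point to a UTF8 entry holding the class name. A check of that rule could be sketched as follows. The tag values 1 and 7 come from the JVM specification; the cpEntry type and checkClassRefs function are hypothetical and do not reflect how formatCheck.go is written.

```go
package main

import "fmt"

const (
	tagUTF8  = 1 // CONSTANT_Utf8
	tagClass = 7 // CONSTANT_Class
)

// cpEntry is a simplified constant pool entry (hypothetical type).
type cpEntry struct {
	tag       int
	nameIndex int    // class entries: CP index of the class name
	text      string // UTF8 entries: the string itself
}

// checkClassRefs verifies that every class entry points to a UTF8 entry that
// lies inside the CP. cp[0] is unused, mirroring the 1-based CP indexing.
func checkClassRefs(cp []cpEntry) error {
	for i := 1; i < len(cp); i++ {
		e := cp[i]
		if e.tag != tagClass {
			continue
		}
		if e.nameIndex < 1 || e.nameIndex >= len(cp) {
			return fmt.Errorf("CP entry %d: name index %d is out of range", i, e.nameIndex)
		}
		if cp[e.nameIndex].tag != tagUTF8 {
			return fmt.Errorf("CP entry %d: name index %d is not a UTF8 entry", i, e.nameIndex)
		}
	}
	return nil
}

func main() {
	cp := []cpEntry{
		{}, // index 0 is never used
		{tag: tagClass, nameIndex: 2},
		{tag: tagUTF8, text: "Hello"},
	}
	fmt.Println(checkClassRefs(cp))
}
```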

❒ pre-load classes from Java 11 SDK, rather than from unzipped classes in JACOBIN_HOME (required). This will also close GitHub issue #7
❒ parse all remaining class and method attributes (eventually, not a priority)
❒ plan out the extent to which initialization, linking, and preparation steps need to be done and how to best sequence them (priority)

Java classes implemented in go

The classloader contains several Java classes that have been partially implemented in go. For example, javaPrintStream.go. These classes are explained here. When developing a JVM from scratch, it's beneficial to implement some Java classes in the development language. For example, println() in Java is a complex set of operations. Because an incipient JVM engine might not have all the capabilities to perform these operations, yet still want to print messages to the console, it's customary to write println() in the JVM implementation language. This was done in javaPrintStream.go, which implements various forms of println(). The comments in that file explain the way this is implemented.

The Java classes implemented in go are referred to as go-style functions. (They would normally be called native functions, but that is a term of art in Java, with a different meaning. Calling them go-type functions doesn't work well either, as type is an already overloaded word. Style might not be the permanent way of referring to this. In fact, the open issue JACOBIN-131 refers to this naming problem for methods.)

Go-style functions are loaded into Jacobin's method table (generally referred to as the MTable, or MT). The call to load the MTable with the go-style functions is made from the StartExec() function, which is where the actual bytecode execution begins. This is located in the jvm package in run.go. The MTable is part of the execution engine but is in the classloader package due to dependency requirements that go imposes. The MTable holds the name of a function in fully qualified form (e.g., java/lang/System.currentTimeMillis()), a pointer to the function body, and a flag stating whether the function is Java-style or go-style. The first time a function is searched for by the execution engine, a pointer to it is placed in the MTable. On future look-ups, the engine checks the MTable first before doing the slower search through the class data. Because the engine always checks the MTable for a method, inserting the go-style methods into it guarantees that they, rather than their Java counterparts, will always be executed.
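The lookup behavior described above can be sketched as a map keyed by the fully qualified method name. This is an illustration only: the types mtEntry and mTable, the methods addGoStyle and fetch, and the use of a second map to stand in for the slower search through the class data are all assumptions, not Jacobin's actual definitions.

```go
package main

import "fmt"

// goFunc is the signature assumed here for a go-style function body.
type goFunc func(params []interface{}) interface{}

// mtEntry records whether a method is go-style and where its body is.
type mtEntry struct {
	goStyle  bool
	goBody   goFunc // set for go-style methods
	javaCode []byte // set for Java-style methods (bytecode)
}

// mTable is a JVM-wide method cache keyed by fully qualified method name.
type mTable struct {
	entries   map[string]mtEntry
	classData map[string][]byte // stand-in for the slower class-data search
}

func newMTable(classData map[string][]byte) *mTable {
	return &mTable{entries: map[string]mtEntry{}, classData: classData}
}

// addGoStyle preloads a go-style method so that it is always found first.
func (mt *mTable) addGoStyle(name string, body goFunc) {
	mt.entries[name] = mtEntry{goStyle: true, goBody: body}
}

// fetch checks the table first and falls back to the class data, caching
// whatever it finds there for future lookups.
func (mt *mTable) fetch(name string) (mtEntry, bool) {
	if e, ok := mt.entries[name]; ok {
		return e, true
	}
	code, ok := mt.classData[name]
	if !ok {
		return mtEntry{}, false
	}
	e := mtEntry{javaCode: code}
	mt.entries[name] = e
	return e, true
}

func main() {
	name := "java/lang/System.currentTimeMillis()"
	mt := newMTable(map[string][]byte{name: {0xB1}})
	mt.addGoStyle(name, func(_ []interface{}) interface{} { return int64(0) })
	e, _ := mt.fetch(name)
	fmt.Println("go-style:", e.goStyle)
}
```

Because the go-style entry is already in the table, the fallback search is never reached for that name, which is why the Java counterpart is never executed.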

The use of go-style substitutes for Java functions should also have performance benefits--in essence, it acts as a method cache--and so it's likely to continue in use even after the initial stages of Jacobin are complete.

Note that this MTable is JVM-wide: all executed functions are cached in it. This is not how the MTable is usually implemented. More commonly, it's a per-class structure holding the function addresses for that class and for all the methods in that class's superclasses. This form of the MTable enables a lookup of all possible methods a class can execute without forcing the engine to climb the hierarchy of superclasses.
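To make the per-class form concrete, here is one way such a table could be built. It is a sketch of the general idea and not a design for Jacobin: the class type and buildMethodTable function are hypothetical, and the table maps each method to the name of the class that implements it rather than to a real function address.

```go
package main

import "fmt"

// class is a simplified class record (hypothetical type).
type class struct {
	name    string
	super   *class
	methods []string // methods declared directly in this class
}

// buildMethodTable flattens the hierarchy into one table for the class:
// inherited methods are copied in first, then the class's own methods
// overwrite any they override.
func buildMethodTable(c *class) map[string]string {
	table := map[string]string{}
	if c.super != nil {
		for method, owner := range buildMethodTable(c.super) {
			table[method] = owner
		}
	}
	for _, method := range c.methods {
		table[method] = c.name
	}
	return table
}

func main() {
	object := &class{name: "java/lang/Object", methods: []string{"toString()", "hashCode()"}}
	point := &class{name: "Point", super: object, methods: []string{"toString()", "getX()"}}
	table := buildMethodTable(point)
	fmt.Println(table["toString()"], table["hashCode()"], table["getX()"])
}
```

Once the table is built, finding an inherited method is a single map lookup; the cost of walking the superclasses is paid once, when the table is created.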

Eventually, when the class-specific MTable equivalent is added to Jacobin, it might make sense to get rid of the present MTable setup. It's not clear what exactly should be done and what the performance impacts are/would be. This will need to be addressed when handling task JACOBIN-120 (calling a superclass method), but it might well deserve its own entry in the task list.

❒ Design per-class MTables when implementing calls to superclass methods
❒ Decide what to do about the two kinds of MTables
