by Archie L. Cobbs archie@dellroad.org.
Copyright © 2004, Archie L. Cobbs. All rights reserved.
This manual documents version 1.4.7 of the JC virtual machine.
Permission is granted to reproduce this work in any form for non-commercial purposes only.Permission is granted to modify this work, granted that any modifications are clearly attributed to their respective author(s).
JC is an open source software library that implements the Java 2 virtual machine (JVM) specification. A distinguishing characteristic of JC when compared to other JVM implementations is its method of converting Java class files into C program source and header files using the Soot Java bytecode optimization and analysis framework, compiling these source files into ELF objects using GCC, and then loading the resulting ELF objects (containing both executable code and class meta-data and cached in the filesystem) into memory at runtime using a custom ELF loader. JC also includes a bytecode interpreter and supports execution in either or mixed modes. JC stands for "Java? C!"
JC can be thought of as a virtual machine implementation with a "way ahead of time" bytecode compiler. The process of converting Java class files into C source code (which is itself implemented in Java) and then into ELF object files is slow compared to a typical JIT engine, but the compiled code that results is highly optimized. Since JC uses the powerful Soot bytecode analysis framework, sophisticated "way ahead of time" optimizations such as method inlining and escape analysis for stack allocation of non-escaping objects are possible.
JC can also be thought of as a virtual machine with a JIT compiler that is slow but produces highly optimized machine code, with the additional feature that once a class has been JIT compiled the resulting machine code is cached in the filesystem in an ELF object file for subsequent invocations of the JVM.
The primary goal in developing JC has been to have fun tackling a programming project that encompasses many interesting challenges in software design, such as virtual machines, multi-threading, code analysis, compilation and linking, code clarity vs. complexity, even documentation :-) After playing around with a few other open source JVM's, I wanted to have one of my own so I could quickly fix any pressing bugs (which always seem to come up, open source or not) and in general have a warm fuzzy feeling and pride of ownership. Any success or practical use for JC is therefore somewhat coincidental.
However, the design of JC was inspired by a particular set problems and this makes JC more applicable to some domains than others. The idea originally grew out of a desire to be able to write software for embedded systems in Java rather than C, which is usually the only option. Embedded systems typically have these characteristics:
While not claiming to be a complete solution, JC does have many
attributes that address these issues. The ability to pre-compile
ELF objects is one. By implementing a few platform-specific hooks,
JC can in theory be made to run on any architecture targetable by
GCC (which is all of them, isn't it?). Almost all code that runs
in a JC virtual machine, including Java methods, is compiled C code
and therefore debuggable with GDB. The generated C source files
are human readable, editable, and recompilable (e.g., you can insert
printf()
's if you want). JC's heap attempts to be compact.
Once compiled into ELF objects, Java class files are no longer
needed and can be dispensed with. Using the Soot framework,
sophisticated optimizations are possible that allow execution speed
to approach the speed of C++. Finally, JC is relieved of having to
understand and generate machine instructions for each architecture,
instead leaving that work up to the experts (i.e., GCC hackers).
This makes the overall design easier to understand and Java execution
problems much easier to debug.
In general, JC tries to favor correctness over speed. There are many possible optimizations that would speed up execution but also add code complexity and make bugs harder to track down. The idea here is that for many applications (especially in embedded systems) robustness, stability, and avoiding random crashes is more important that achieving that last 10% speed increase.
An acknowledgment is due here. JC borrows heavily from Etienne Gagnon's innovative SableVM Java virtual machine implementation. Several ideas have been blatantly stolen from SableVM, including bidirectional object layout, the object lockword design and spin-free thin lock algorithm, per-class loader memory allocation, and thread management. In fact, without SableVM as an example to work from this project would probably have never been attempted. Of course, any errors in translation are mine.
Some design features of JC include:
pthread(3)
library, including Thread.interrupt()
, Thread.suspend()
, Thread.resume()
, and Thread.stop()
.
JC has a few bugs and limitations.
The most important limitation is that when compiled execution is enabled JC will not load two different class files defining the same class at the same time (necessarily this requires two different class loaders). More precisely, JC will not load two class files defining the same class but having different contents. This situation does not occur often but certainly can, depending on the application. The reason for this limitation is simply because of the priorities of the project compared to the work required to fix it. For example, JC is relieved of many of the security checks (i.e., loading contraints) that would otherwise required. More importantly, this limitation allows the code generator to make assumptions allowing optimizations that would not otherwise be possible.
JC uses the GNU Classpath class library, and so does not implement any classes that are not (yet) part of it.
JC does not support strictfp
floating point math.
JC assumes that C pointers are written to and read from memory atomically (this may not be true on all architectures).
Java volatile long
and volatile double
types are supposed
to be accessed atomically, but we don't enforce that.
There are surely lots more bugs to be discovered. However, JC was written carefully with an eye toward making debugging easier. Hopefully bugs that do exist should be relatively straighforward to track down and fix.
There are probably lots of better JVM's out there. I've intentionally ignored most of them while writing JC because it's more fun that way, I'm lazy, and I'm not being graded on this. GCJ is clearly relevant. Many other VM implementations are listed on the Classpath web site.
Of particular relevance is the paper Java-through-C Compilation: An Enabling Technology for Java in Embedded Systems, which describes a project similar to JC that has a more embedded systems focus. Their JVM does not support some Java features such as class loading, reflection, and threads. However, they put more work into optimizing the Soot code generation with impressive code speed and code size results.
This section gives an overview of what JC contains and how it works.
JC gives you a fairly complete Java VM implementation, including these components:
jc(1)
command line binary for creating a VM and invoking
the main()
method of the named class.
libjc
library containing the VM innards with a fully
implemented JNI native and invocation API.
jcjavah
utility for creating JNI and JCNI header files
and C file stubs.
cfdump
class file dumper utility.
jcgen
utility for pre-generating ELF object files.
The basic idea in JC code generation is to take class files, convert
them into ELF objects containing executable code for the methods
plus some meta-data about the class, and load and link these objects
at runtime. This conversion is done by Java software in the
org.dellroad.jc
and
org.dellroad.jc.cgen
packages,
with the help of GCC. The Java code generates C source files from
Java class files and compiles them into ELF objects using GCC. This
conversion can either be done ahead of time, or on demand during
execution. If done ahead of time, GCC is not required to be present
on the system.
A more direct conversion to ELF than via C code, including a pure Java
implementation, is certainly possible. To do this, one only has to
implement the
org.dellroad.jc.ObjectGenerator
Java interface.
JC will automatically invoke the class named by the
jc.object.generator
system property (which must implement that
interface) at runtime when an ELF file needs to be generated.
Although the ELF objects don't necessarily have to derive from C
source files, it is easiest to describe them and the way that they
are structured in terms of the C language. As long as the ELF objects
appear as if they came from C source files, they will still
work. Normally, for each class foo.Bar
a "foo/Bar.c" source
file and a "foo/Bar.h" header file are generated.
ELF files may be pre-generated to avoid long pauses during execution
(see jcgen).
However, if JC attempts to load a class and a valid ELF file for that class
cannot be found, it will automatically attempt to generate one
(unless the jc.object.generation.enabled
system property is set
to false
, in which case a LinkageError
is thrown).
Since this process involves executing Java methods, those methods
may also not be found in valid ELF objects. In that case, JC reverts
back to interpreted execution.
If you like reading C source, you may want to look at the header file
jc_defs.h
.
This file defines all of the structures and
macros used by the generated C files. Also, you may look at the
generated header file for a class to see how the various class-specific
structures are defined. These include the class vtable, object
structure (which includes all superclass structures), class fields
structure, etc. Generated C source files show how the macros in
jc_defs.h
are used to implement Java in C.
Each Java method has a corresponding C function in the ELF file; the function parameters are a thread pointer, "this" object pointer (for non-static methods), and method parameters (object pointers and primitive values).
Objects in memory are just C structures. All objects contain an object "head" which consists of a lock word and a pointer to the object's combined type and vtable structure, for a total of two machine words. Primitive fields are laid out upwards in memory from the head, while reference fields are laid out downwards. This idea comes from SableVM and simplifies garbage collection, because the references in an object are contiguous. A class' static fields are contained in a single "global" C structure.
Meta-data in the ELF file consists of structures describing each
field, method, and the class vtable (for non-interface classes) and
type information. Most fields in these structures are initialized
when the ELF file is created, while others are filled in when the
ELF file is loaded at run time. The class type structure points to
hash tables used for instanceof
and interface method dispatch.
Interface method lookup is done as follows: the method name, parameter types, and return type are hashed at code generation time. At run-time, this hash value is used to look up the corresponding method in the object's interface method dispatch hash table. As an optimization, a separate copy called the "quick" hash table is kept containing those hash table buckets with only one entry (usually the case). So in the common case, one extra pointer dereference is required over normal virtual method dispatch. See SableVM for a more clever method requiring zero extra pointer dereferences that was not implemented in JC.
ELF symbols follow a specific format. All support functions provided by
the VM itself begin with the prefix _jc_cs
(for "JC C support").
Other symbols include for example _jc_java_lang_Float$class_object
,
which resolves to the java.lang.Class
instance for class
java.lang.Float
.
See Appendix A for a complete list of these symbols.
Other symbols such as printf
that can be found by the dynamic linker
will resolve correctly as well, so it's possible to edit the generated C
files and add debugging statements.
That is basically it. Almost all code that executes in JC is C code, either as part of the core JC VM, a generated ELF object, or a JNI native library.
Java is a very dynamic language, supporting late binding of symbolic references, introspection, etc. JC includes some basic checks to prevent "stale" ELF files from being loaded, which could cause incorrect behavior. Each ELF file is required to contain a list of the names of all classes on whose class files it depends, along with a hash value of the actual bytes in those class files (our hash algorithm is to take the last 64 bits of the MD5 hash). When a candidate ELF file is loaded, all dependent class files (including of course the class file corresponding to the class itself) are acquired by JC and their hash values compared. If any fails to match, the ELF object is considered stale and must be regenerated.
The way JC generates dependencies currently is very conservative. There are lots of changes that trigger regeneration but shouldn't. For example, if class A references a field in class B, and that field's name (but not its type) is changed, then class A's ELF object remains perfectly valid but JC will regenerate it anyway.
This means that if you recompile for example java.lang.Object
class,
you're in a world of hurt because every ELF object becomes instantly
invalid.
However JC simply verifies the hashes, it doesn't do any further checking.
So it's entirely possible to take a generated C file, edit it manually,
recompile it with GCC, and JC will still load it. Such "hacks" can
also be incorporated directly into the Java source using a special
construct recognized by the code generator (analogous to GCC's asm()
statement). Support for doing this is not currently implemented but has
been done in the past and is fairly trivial at the Soot level.
The Java code that generates C source (which lives in the
org.dellroad.jc.cgen
package) has an analogous mechanism whereby
the generated source file contains hash values in special comment lines
that are checked before the source is compiled. However, this is purely
a matter for the org.dellroad.jc.cgen
code and invisible to
the VM itself.
The astute reader may wonder how exactly a VM proactively "acquires"
class files from user-defined class loaders in order to check their
hash values for classes that are not otherwise supposed to be loaded
and still maintain correct semantics. This is tricky because there
is no such method byte[] ClassLoader.getClassFile()
, there
is only Class ClassLoader.loadClass()
, and to return a
Class
object you have to have loaded the class (not just
acquired the class file).
JC solves this problem by trying to load the class normally. If we
get as far as defineClass()
, then we've got the class file
and we don't care what happens next. If the load subsequently fails
(e.g., because of a ClassCircularityError
we artificially
created by loading classes out of order) we simply discard the
exception. On the other hand, if the class is loaded successfully,
there's no problem there either. Because JC does not allow class
files to change, we can never have a situation where a user class
loader would have changed its mind and returned a different class
file had it been loaded later on.
The above may seem like somewhat of a hack. However, in my opinion
the "real" design flaw is the ClassLoader
API, which allows
user class loaders to get "too close" to the VM and requires the
VM to do all sorts of checks to verify the user class loader is not
misbehaving. It would have been simpler if all that was required
of a user class loader was to override a method like byte[]
ClassLoader.getClassFile()
, and let the VM and
java.lang.ClassLoader
handle the rest.
JC is installed in two steps. First, GNU classpath must be built and
installed. Then JC is configured with --with-classpath=DIR
,
where DIR
is the Classpath installation root directory,
built and installed. Typically Classpath and JC will have the same
installation root, but this is not required. JC works with the
"stock" installation of Classpath.
The JC installation hierarchy looks something like this:
PREFIX/bin/
jc
and JC specific utilities such as cfdump
,
a class file dumper; jcjavah
, a JNI or JCNI native code header
file generator; and jcgen
, a shell script for pre-generating
ELF objects.
PREFIX/etc/jc.conf
PREFIX/include/jc/
PREFIX/info/
PREFIX/lib/
PREFIX/lib/jc/obj/
PREFIX/share/jc/
PREFIX/share/jc/doc/
PREFIX/share/jc/doc/api/
PREFIX/share/jc/src/
~/.jc_src/
~/.jc_obj/
The PREFIX/share/jc/src
directory hierarchy contains
pre-generated source files generated from the JC, Soot, and Classpath
class files. When JC is installed, it regenerates any source files
that differ from the pre-generated ones that ship with the source
distribution. This process requires an existing Java runtime, such
as the JDK. The number of differences can be minimized by using
the specified version of Classpath, installing it in /usr/local
,
and using jikes version 1.22 to compile it.
The libjc
shared library contains the heart of the JC VM and
is installed in PREFIX/lib
.
The jc
binary accepts command line flags in either short or
long form; run jc --help
for more info. Any flag can also
be specified (in long form) in either the system wide
PREFIX/etc/jc.conf
file, or in the per-user
~/.jc
file. In these files, omit the --
prefix
and put each flag on a separate line. Blank lines and lines starting
with a "#" character are ignored. Here's an example:
# sample ~/.jc file classpath=.:/home/archie/classes:/home/archie/foobar.zip property=jc.heap.size=134217728 property=jc.resolve.native.directly=true #verbose=jni,jni-invoke,gc,compile,exceptions,code-gen verbose=gen,gc #property=jc.gen.inline.verbose=true
When JC needs to generate new C source files and ELF objects, it
will by default put them in ~/.jc_src
and ~/.jc_obj
,
respectively.
Change the jc.source.path
and jc.object.path
system
properties to change these defaults.
Note that these properties are overloaded: they are actually search paths
for source files and objects, but also the first component in the search
path is where new source files and objects are stored.
So for example if you recompiled a Java class file, causing the previously
generated source files to become obsolete, but you didn't want them to
be overwritten, you could prepend a directory to your jc.source.path
.
Any older files that were still valid would still get re-used, and
newly generated files would end up in the prepended directory,
thus "hiding" the older files.
One warning: if you modify the bootstrap class path (you shouldn't
need to), it is important that the JC class files (in
PREFIX/share/jc/jc.zip
) appear ahead of the Classpath
class files that they override. Otherwise the VM will fail to start.
This is the result of a deliberate choice to not modify the stock Classpath
build and install process (as a result, tracking changes and updates
in Classpath is much easier).
The JC command line is fairly straightforward and self-explanatory.
Running jc --help
gives this output:
Usage: jc [flag ...] <classname> [param ...] -c, --classpath=path Set application class loader search path -b, --bootclasspath=path Set bootstrap class loader search path -l, --librarypath=path Set native library search path -p, --property=name=value Set a system property -v, --verbose=opt1,opt2,... Enable verbose output: class=Class loading, jni=Native libraries, gc=Garbage collection, exceptions=Exceptions, resolution=Class resolution, gen=ELF object generation, jni-invoke=Native method calls, init=Class initialization, obj=ELF object loading. -L, --loadlist=filename Record loaded ELF objects in file -j, --jar Execute main class of JAR file -X, --show-options Show additional options -V, --version Display version and exit Help options: -?, --help Show this help message --usage Display brief usage message
In addition, JC supports these "JDK standard" command line flags to remain compatible with shell scripts, etc:
-classpath
--classpath
.
-cp
--classpath
.
-Dfoo.bar=jam
--property foo.bar=jam
.
-version
--version
.
-help
--help
.
-jar
--jar
.
-mx
N, -Xmx
N
--property jc.heap.size=
N
.
-ms
N, -Xms
N
--property jc.heap.initial=
N
.
-ss
N, -Xss
N
--property jc.stack.default=
N
.
-Xint
--property jc.object.loader.enabled=false
.
-Xnogen
--property jc.object.generation.enabled=false
.
-Xnoln
--property jc.include.line.numbers=false
.
-Xobj
--property jc.without.classfiles=true
.
-Xrnd
--property jc.resolve.native.directly=true
.
-client
-server
-hotspot
There are several system properties that affect JC's execution. See Appendix B for a detailed list.
By default, JC will load existing ELF objects and generate new ones as necessary. That is, by default the interpreter is disabled.
If you want to use existing ELF objects but interpret a class when
an ELF object for it doesn't exist, invoke JC with
--property jc.object.generation.enabled=false
or -Xnogen
.
This results in "mixed mode" execution.
Note that while executable methods (i.e., ELF loaded methods) execute must faster than interpreted ones, there is a small but non-zero penalty whenever control transfers between interpreted and executed methods. So "mixed mode" execution doesn't come completely free.
If you only want interpreted methods, disable the ELF loader completely
by setting --property jc.object.loader.enabled=false
or -Xint
.
This eliminates any possibility of code generation delays.
You can even control code generation at runtime, using methods in the
java.lang.Compiler
class.
If your goal is the fastest possible performance, here are some suggestions:
--disable-assertions
flag (as of version 1.4.0 this is now the default).
jc.resolve.native.directly
system property to true
(you can use the -Xrnd
command line flag).
jc.without.classfiles
set to true
(you can use the -Xobj
command line flag).
This has the side effect of disabling on-demand object generation.
jc.include.line.numbers
set to false
(you can use the -Xnoln
command line flag).
Alternately, run strip -g foo.o
on all objects from which you
want to remove line number information.
jc.heap.granularity
and
jc.stack.default
.
jc.gen.inline.*
properties
when generating code.
Overview of Objects in JC
JC objects are defined by C structures. For example, here are some
portions of the generated header file for the class java.util.ArrayList
.
ArrayList
extends AbstractList
, which extends
AbstractCollection
, which extends Object
:
#include "java/util/AbstractList.h" // Typedefs typedef struct _jc_java_util_ArrayList$sub_refs _jc_java_util_ArrayList$sub_refs; typedef struct _jc_java_util_ArrayList$sub_nonrefs _jc_java_util_ArrayList$sub_nonrefs; typedef struct _jc_java_util_ArrayList$refs _jc_java_util_ArrayList$refs; typedef struct _jc_java_util_ArrayList$nonrefs _jc_java_util_ArrayList$nonrefs; typedef struct _jc_java_util_ArrayList$object _jc_java_util_ArrayList$object; typedef struct _jc_java_util_ArrayList$vtable _jc_java_util_ArrayList$vtable; typedef struct _jc_java_util_AbstractList$vtype _jc_java_util_AbstractList$vtype; // Vtable struct _jc_java_util_AbstractList$vtable { _jc_java_lang_Object$vmethods java_lang_Object; _jc_java_util_AbstractCollection$vmethods java_util_AbstractCollection; _jc_java_util_AbstractList$vmethods java_util_AbstractList; }; // Vtype struct _jc_java_util_AbstractList$vtype { _jc_type type; _jc_java_util_AbstractList$vtable vtable; }; // Reference instance fields (subclass) struct _jc_java_util_ArrayList$sub_refs { _jc_object_array *data; }; // Reference instance fields (object) struct _jc_java_util_ArrayList$refs { _jc_java_util_ArrayList$sub_refs java_util_ArrayList; _jc_java_util_AbstractList$sub_refs java_util_AbstractList; _jc_java_util_AbstractCollection$sub_refs java_util_AbstractCollection; _jc_java_lang_Object$sub_refs java_lang_Object; }; // Non-reference instance fields (subclass) struct _jc_java_util_ArrayList$sub_nonrefs { jint size; }; // Non-reference instance fields (object) struct _jc_java_util_ArrayList$nonrefs { _jc_java_lang_Object$sub_nonrefs java_lang_Object; _jc_java_util_AbstractCollection$sub_nonrefs java_util_AbstractCollection; _jc_java_util_AbstractList$sub_nonrefs java_util_AbstractList; _jc_java_util_ArrayList$sub_nonrefs java_util_ArrayList; }; // Object instance structure struct _jc_java_util_ArrayList$object { _jc_java_util_ArrayList$refs refs[0]; _jc_word lockword; _jc_java_util_ArrayList$vtype *vtype; _jc_java_util_ArrayList$nonrefs nonrefs; };
You can see in the
_jc_java_util_ArrayList$object
structure how the subclass fields
grow outward from the object head, with primitive fields growing upward
and reference fields growing downward.
Note the zero length refs
field. Pointers to an obect will
always point to the lockword
field. Members of refs
are accessed using the syntax obj->refs[-1]
. So to access a
primitive field in C, e.g., ArrayList.size
, you would write
obj->nonrefs.java_util_ArrayList.size
. To access a reference
field in C, for example ArrayList.data
, you would write
obj->refs[-1].java_util_ArrayList.data
. The
jc_defs.h
include file provides convenience macros for these and other operations.
Arrays have a length field, and grow upward for primitive array types and downward for reference array types.
Object lockwords encode the following information:
java.lang.ClassLoader
objects, weak reference objects, etc.)
The lockword structure and thin lock algorithm is borrowed from SableVM.
In summary, acquiring an uncontested object lock can be done in a single
compare-and-swap operation and no mutex needs to be involved. If an object
lock is contested, or Object.wait()
is invoked, then the thin lock is
"inflated" to become a fat lock with an associated mutex. The mutex,
if any, is reclaimed when the object is recycled during garbage collection.
JC execution is fairly straightforward. When generated Java code is running normally, things look very C-like. Virtual method dispatch is done through vtable function pointers. Non-virtual and static methods are called directly. Interface methods are dispatched through function pointers in a hashtable. Objects are pointers to C structures, etc.
JC uses a small bit of processor specific assembly code to
construct C function calls dynamically at runtime. This is used
for calling out to compiled Java methods and JNI native methods
from within libjc
itself.
The JVM specification gives VM implementors freedom when choosing when symbolic references in Java class files are resolved, but JC's symbolic resolution process is simplistic. All symbolic references are resolved before the class is initialized. The advantages to this are zero resolution runtime overhead (once a class is initialized), a simple implementation, and more deterministic execution. The main disadvantage is that all other classes referenced by a class must be loaded before the class can be used, even if those other classes are never actually needed at runtime. This is especially annoying when ELF objects for these classes must be generated as a result.
Implementing more "on demand" class resolution is possible by adding runtime resolution checks at the appropriate points. However, these add overhead. A zero-overhead approach might use unmapped memory pages to trigger segfaults for unresolved symbols (e.g., just leave unresolved symbols pointing to zero). The difficulty is that the code would have to be patched up (to replace the symbolic reference with the resolved memory address) while one or more threads is half-way through executing the instruction sequence.
As an optimization, if the jc.resolve.native.directly
system
property is true
, then when resolving a class, any references
to JCNI native methods are resolved directly. That is, instead of
resolving the symbolic reference to point to the native method's
generated C stub function that in turn calls into libjc
to
find and execute the native method at runtime, the native method
is resolved immediately and the symbolic reference is resolved to
point directly at the native method. This works because the calling
conventions for generated C code and JCNI native method code are
exactly the same. The result is zero execution overhead for these
native method calls. The minor downside is that these native method
invocations disappear from Java stack traces (i.e., one frame is
missing in the stack trace), and they are not displayed under the
jni-invoke
verbose flag. In effect, the VM is completely
unaware that native methods are being called.
Because JC uses preempting pthreads, periodic checks must be added to each generated method at backward branches. To optimize this frequent operation, we dedicate a single page of memory to be the "check page". When one thread wants all other threads to "check in" it simply unmaps the page. The periodic check that each thread performs is to read and discard a word from the page. When the page is unmapped, these reads cause a SEGV signal which JC catches. By looking at the fault address, JC knows that the thread needs invoke the periodic check function.
The periodic check checks these things: first, whether another thread
has requested a "stop the world" operation; secondly, whether another
thread has suspended this thread using Thread.suspend()
; and finally,
whether another thread has cross-posted an exception into this thread
(e.g., via Thread.stop()
).
Java classes must be initialized (i.e., static variables
given their initial values and any static initializer <clinit>
method invoked) at the first "active use", where active use is defined
as creating an instance of the class, invoking any method of the class,
or referencing any static variable of the class. This is really only
an issue for static methods and static fields. In all other cases, you
must have an object so we can do the check at instantiation time.
JC emits manual checks in the code for active use at the appropriate points in the code, which involves reading a word from memory and checking the relevant bit in that word (an optimization would be to dedicate a separate word for this bit and remove the bit mask operation). Unfortunately this check is performed every time, even after the class has been initialized. An optimization that trades space for time would have a separate check address page for each class.
JC currently includes a simple optimization whereby active use checks are omitted if the class is the same class as the current method. This eliminates a large number of checks. A better optimization would be to eliminate necessarily redundant active use checks via more sophisticated Soot code analysis.
JC includes SableVM's "stop the world" algorithm with some modifications, the main one being to unmap the check page to force other threads to do a periodic check. Stop the world is used before garbage collection for example.
Although not strictly required, for now JC also stops the world when one thread needs to suspend another thread or cross-post an exception in another thread. The cost of the stop the world operation is one signal delivery and a couple of function calls per stopped thread. Instead, we could have a separate check address per thread, but that would require more memory and slow down the check operation.
In any case, because threads perform periodic checks at each backward branch, the latency caused by stopping the world is low, i.e., the other threads will stop very quickly.
JC detects stack overflow by unmapping a guard page at the end of each thread's stack. If touched, the thread receives a SEGV signal and the fault address tells us that we need to throw a StackOverflowError.
One subtlety is this: is it possible for a thread to somehow miss the guard page entirely and walk onto the next page? The only way this could happen is if a method has enough local variables to cover an entire page of memory: not likely, but in theory it could happen. Code could easily be added to foil such a method but has not been.
JC provides a rudimentary debugging interface in the form of SIGUSR1. When this signal is received, JC spits out some interesting summary information about live threads and active class loaders to the console. This is useful for debugging application deadlocks, etc.
GDB can be used normally. JC provides some useful macros in
etc/gdb.userdef
.
For methods that catch exceptions, sigsetjmp()
is used to catch them;
siglongjmp()
is used inside libjc
to throw exceptions.
The value returned in the jump indexes into an array
of GCC local function labels containing the possible trap handlers
for that method (created with the _JC_DEFINE_TRAPS()
macro).
If a method catches any exceptions, any locals that can be used
after catching the exception are marked volatile
in the generated C code
to avoid being "clobbered." JC performs flow analysis to determine which
variables can possibly be used after the exception and only marks
those variables as volatile
.
When executing within libjc
itself, exceptions may be posted but
are never thrown. Instead, an error return value indicates the
presence of a posted exception. At the appropriate "gateway" functions
between Java execution and JC internal execution, any exceptions
are (re)thrown or caught as needed. When configured with assertions enabled,
JC will verify that all signals occur within Java code and not libjc
.
To generate nice stack traces like Java users expect, we need to be able to walk the frames on the "Java stack", which in JC is just a subset of the frames on the C stack, as well as being able to determine the line number and source file associated with each Java stack frame. This presents several challenges.
First, we can't rely on our ability to follow saved frame pointers when
walking through native code and signal stack frames.
To solve this problem, we only assume frame pointers
work for generated Java code and libjc
itself. Each time we
are about to launch into native code we "clip" the top of the
Java stack by saving the "last known good" frame pointer. Each
time we are entered from native code we establish a new contiguous
frame pointer sequence and link it to the previous one. The result
is that by using these linked frame sequences we "leap over" the
uncrawlable portions of the C stack. This is done using the
_jc_java_stack
structure; each such struture defines a
contiguous sequence of crawlable stack frames.
To generate Java stack traces, we need to be able to (a) determine the program counter (PC) in each C stack frame, and (b) map PC values to Java methods and line numbers. For (a), a little machine dependent code is used.
In the case of (b), things are trickier. We can easily correlate the Java line numbers with the generated C line numbers, because Soot gives us this information. The hard part is mapping a PC address to a method and then a specific C source file line.
To determine the method from the PC, we maintain a splay tree containing all loaded methods, sorted by function address. We determine the ending address of each function by examining the ELF symbol, which has an associated size attribute. Given a PC, a binary search yeilds the corresponding method.
To determine the C source file line number, we parse the ELF debug sections
generated by GCC when the -g
flag is given. Currently JC knows
how to parse stabs and DWARF2 formats. If an ELF object doesn't contain
any line number information (or we don't know how to parse it), the
result is simply that stack traces don't contain line numbers for that
class' methods. Including debugging information increases the size
of generated ELF files, and parsing this information does take time,
so JC makes both operations optional.
The jc.include.line.numbers
system property controls JC's
behavior with respect to line numbers. If set to true
(the
default), JC will parse ELF debug sections to determine line numbers,
and it will include line number support in any newly generated
objects. Note that in order to show line numbers for a given class,
two conditions must be satisfied: the class' ELF object must have
been compiled with line number support, and the current invocation
of JC must be with jc.include.line.numbers
set to true
.
To build the Java to C source line mapping, at appropriate points in
the C function we create a new entry in a list of
_jc_linenum_info
structures for that method using a GCC local
label. This technique is something of a hack, because we are relying
on the fact that GCC lays out static data sequentially in the
resulting ELF object. It is best demonstrated with examples from
jc_defs.h
and java.util.AbstractList
:
/* Line number table entry */ struct _jc_linenum { _jc_uint32 cline; /* c line number */ _jc_uint16 jline; /* java line number */ }; /* Append a line number table entry */ #define _JC_LINE_NUMBER(_jline) \ do { \ static volatile _jc_linenum _linenum \ __attribute__ ((section(".data"))) = { \ .cline= __LINE__, \ .jline= _jline, \ }; \ } while (0) static _jc_linenum_info _jc_java_util_AbstractList$linenum_table$addAll$28b94b9c09be8eeb[] = { }; jboolean _jc_java_util_AbstractList$method$addAll$28b94b9c09be8eeb(_jc_env *const env, _jc_java_util_AbstractList$object *this, jint param0, _jc_object *param1) { jboolean z0; jint i1; struct _jc_object *r2; struct _jc_java_lang_Object$object *r3; ... _JC_LINE_NUMBER(181); // 0 i3 = param0; param0 = param0 + 1; r3 = _JC_INVOKE_INTERFACE(env, r2, _JC_JLONG(0x5874f79a1cf66ea), _jc_object *); _JC_INVOKE_VIRTUAL(env, java_util_AbstractList, add$9953f092f36e3303, this, i3, r3); _JC_LINE_NUMBER(182); // 1 i1 = i1 + -1; ... }
The result is that the _jc_java_util_AbstractList$linenum_table$addAll$28b94b9c09be8eeb[]
array contains a list of PC, line number pairs for all points in the function
where the Java line number changes. Given a PC value, we map it to the
corresponding C line number using the debug information, then map from
C line number to Java line number using the linenum_table
table.
When a signal is caught, we rely on a bit of machine dependent code to give us the stack frame and PC where the exception occurred so we can compute the top Java stack frame.
In order to determine what method on the Java stack catches a thrown exception, we have to determine if the exception occured within the code corresponding to the original bytecode range specified. To this end, JC computes the different regions of exception trap coverage within each generated function. A local variable in the exception catch frame is updated whenever necessary, so that when an exception is thrown the catch frame contains the region the potentially catching method was currently in. JC uses this region, along with the type of exception, to match against the method's exception trap list.
It is tempting to try to re-use the line number table for this purpose. After all, it also tells us information about where we are in the function. However, this approach fails in the face of GCC optimizations that scramble and fold the generated machine code. For example, when GCC collapses two C code fragments that exit the function into one, it becomes impossible to tell from the saved PC address alone which of the original two C code fragments is the "real" one.
The method Thread.stop()
requires the ability for
one thread to post an exception in another thread. In JC this is done
using an atomic compare-and-swap followed by thread notification.
Each thread has a "cross posted" exception field which is set by
other threads and cleared by the target thread when it handles the
exception. Atomic operations are used to prevent exceptions from
being lost.
After posting a cross-posted exception in a target thread, the target thread is notified that it needs to "check in" using the same mechanism used by the stop-the-world operation. That is, all threads end up checking in even though only one may need to. An optimization that trades space for time would have a separate check address page for each thread. See Thread periodic checks for a description of how the check address is used.
JC detects null pointer dereferences and divide-by-zero using UNIX
signal delivery, so no run-time penalty is assesed.
The signal handler simply throws the exception normally using
siglongjmp()
.
In any Java VM throwing exceptions from within the VM itself
without getting yourself into trouble can be tricky, especially
for OutOfMemoryError
s. Here is JC's approach:
The net result is that OutOfMemoryError
s are thrown very
reliably, with stack traces, because they get to draw from the
reserved area. Also, because of the multiple GC attempts, typically
only one thread will throw an OutOfMemoryError. When it does, it
frees up memory that other threads can then use, so they don't throw
one also.
Also, the "out of memory" flag is not cleared in a thread until the
next GC cycle. If you clear it too soon (e.g., immediately after
you create the OutOfMemoryError
instance), then often the
thread will throw another one. The extra time gives it a chance to
"clean up" after itself before restricting it back to the non-reserved
portion of the heap.
The heap layout is what you might call a homebrew design. The main design goals were (a) no handles (all object references are direct pointers), (b) non-copying (for speed), and (c) compactness (i.e., utilization efficiency), ideally somewhat tunable. E.g., a split design where objects are copied between semi-spaces and half of the heap memory is unusable did not fit these criteria. On the other hand, my knowledge of the state of the art in GC design is pretty limited. The resulting design is about as fancy a GC design as I could come up with without doing a lot of boring research :-)
Another design goal was to have a good impedance match with the typical workstation virtual memory paging subsystem. The atomic unit of memory in a typical UNIX virtual memory system is the page. On i386 a page of memory is 4096 bytes. By trying to match the boundaries within the heap to page boundaries, and trying to put "high activity" objects in the same page, we hope to limit the amount of page swapping. The latter is done simply by putting objects of the same size together, under the assumption that if some subset of the objects in the heap are "high activity" then that subset probably involves a small number of object sizes. Of course, for an embedded system with no virtual memory subsystem, none of this would apply.
The heap layout consists of some number of contiguous pages of memory. Heap pages are divided into three types: "free", "small" and "large". The state of each page is stored in the first word of that page. There is also a transient state "allocated" used to prevent race conditions when two threads try to grab a free page at the same time.
"Small" pages are subdivided into two or more blocks of equal size. The first word in each block indicates the status of that block as either "free", "allocated" (a transient state) or occupied. This status word does not cost us anything, because the values for "free" and "allocated" (which are odd) can't possibly be the first word of a real object. So when an object is in the block, to indicate that the block is occupied the status word is simply omitted.
There are various block sizes ranging from the smallest block (two
words, the size of a java.lang.Object
instance) up to half
the size of a memory page. Each block size is a fixed percentage
bigger than the next smaller block size (rounded up to the minimum
alignment); the percentage determined by the system property
jc.heap.granularity
.
For each small block size, we maintain a "hint" that points to the next free block. To allocate from a small page, threads allocate the block pointed to by the hint using an atomic compare-and-swap operation, then update the hint. If the compare-and-swap fails, the thread grabs the (updated) hint and tries again.
After a GC cycle, all small pages containing blocks of the same size which have any free blocks left are linked together in a linked list. This list is traversed first when allocating new blocks of that size. This ensures small pages are used as efficiently as possible and don't become too fragmented.
For non-array objects, the block size is pre-computed when the class is loaded, using a binary search on the available block sizes. For array objects, the block size depends on the length, and is computed when the array is created.
Objects are never moved, so "holes" can develop within small pages. As long as more objects of that block size are subsequently allocated the holes will get quickly filled. More block sizes means less wasted space per object (on average) but also more holes and/or partially filled pages. Fewer block sizes means more wasted space per object (on average) but fewer holes. When a small page ends up containing nothing but holes after a GC cycle, it is recycled.
An interesting experiment would be to see what heap granularities result in the most efficient use of memory for different applications.
Also interesting would be to test the intuition that applications that put heavy pressure on the heap (by creating and freeing lots of objects) typically are doing so for a small number of object sizes. If that were true, then the heap activity would be confined to that portion of the heap containing the relevant block sizes, instead of "spamming" the entire heap memory and causing lots of virtual memory activity. That is, test the value of maintaining a loose affinity between object size and memory page.
Objects that don't fit into a small page block (i.e., objects larger than half a page of memory) are allocated in one or more contiguous "large" pages of memory. A large page is just a page of memory wholly dedicated to (some portion of) a single object.
Only the first large page in a large page range contains the initial descriptor word indicating the status of the page as "large". This word also contains the number of pages in the range.
If an object is smaller than the block it lives in, and the object contains a lot of references, then we insert a "skip word" as the first word of the object block. This is simply a word that tells us how many references we need to skip over to reach the head of the object (i.e., the lockword). This is useful for going from heap block to object pointer. A skip word is distinguishable from the "free" and "allocated" heap block words, as well as from object references.
JC includes a straightforward "stop the world" mark and sweep non-moving garbage collector. Thanks to bi-directional object layout, object references in an object are always contiguous. This has two beneficial side effects for garbage collection. First, tracing an object does not require knowing its type; we just need to know how many references it has. This information is contained in the object lockword (an "overflow" value means we have to look at the object's type).
Secondly, the stack space required to do a GC trace can be made smaller, because instead of keeping a stack of references to trace, we keep a stack of pointers to lists of references (all such lists are terminated by the always-odd object lockword). So the last "twigs" of the GC live object trace tree never need to be put on the stack.
The JC garbage collector supports weak, soft, and phantom references. All weak references are cleared when their referents are no longer strongly referenced. Depending on an "urgency" parameter, soft references are treated either like weak references or like strong ones; therefore, we never have to explicitly check for weak reference referents being softly referenced. Reference enqueing and object finalization is handled by a dedicated finalizer thread. This thread is only awoken after a GC cycle if there is any work for it to do.
JC also supports stack-allocation of objects. See Stack Allocation for details. Stack-allocated objects are traced normally during GC but of course never collected.
JC supports class loader unloading and includes special handling for Class objects and class loaders during garbage collection.
Each class loader has its own memory area, another idea borrowed from SableVM. Class objects for classes loaded by that loader are allocated in this area instead of the heap. During GC, this "blob" of memory is considered as if it were a single giant object in the heap. That is, it is only marked once during the GC trace. Each class loader has a dedicated bit meaning "visited during the GC trace."
Class loaders and their corresponding Class objects implicitly refer to each other. Loaded classes implicitly refer to all other types that they symbolicly reference, and any interned String objects. So for each class loader, we maintain a combined master list of all implicit references from classes loaded by that loader to other objects in the heap. When a class loader "blob" is traced during garbage collection, this list, combined with all the explicit references from Class objects and static fields, is used as the list of references from the "blob" to other heap objects.
Note that because class loaders and their defined classes are only unloaded when all are unreachable, implicit references between two classes loaded by the same class loader, or between any class and its associated class loader don't need to be explicitly maintained in this list. In other words, because the "blob" is large many of the references in it point back within itself and we can ignore them. This is another important benefit of per-class loader memory allocation.
If the gc
verbose flag is turned on, JC will print out
lots of interesting statistics after each GC cycle.
JC generates the root set by conservatively scanning each thread's stack. We must check not only for references into the heap, but also into per-class loader memory. For the former, we must find the head of the corresponding object, so we know where its references are. That is, we conservatively accept references to anywhere inside an object, not just to the head of the object (in practice, such "offset" references are probably not very common because GCC doesn't usually generate them). This is possible because we know the heap layout.
For pointers into class loader memory, just knowing that the class loader's
"blob" is referenced is sufficient. We then add the loader's
java.lang.ClassLoader
instance to the root set.
Conservative stack scanning requires being able to snapshot all of
the CPU registers in any thread. Fortunately, this is easy to do
via the getcontext(3)
function. On systems that don't
have getcontext(3)
, a self-delivered signal acomplishes the
same thing.
JC includes a bootstrap class loader capable of reading class files from directories and ZIP/JAR files. JC includes full support for user-defined class loaders as well.
Each class loader has its own dedicated area of memory that is
separate from the Java heap. This memory contains loaded ELF objects,
java.lang.Class
instances, and various other bits of information
associated with that class loader, such as native library information, etc.
The total amount of memory that can be allocated for class loaders
is determined by the jc.loader.size
system property.
The associated java.lang.ClassLoader
instance for user-defined
class loaders, however, lives in the normal Java heap.
Per-class loader memory is treated as a stack: only the most recently allocated chunk of memory may be deallocated. In practice, this is not a problem because in general class loaders only add memory during their lifetimes, and the only times they free memory is when recovering from an error, and such free operations typically happen in the reverse order of allocation. When/if the class loader is unloaded, the whole stack is freed all at once.
Class loaders are unloaded when the associated ClassLoader
object becomes unreachable. Unloading a class loader is quite
straightforward. It involves unloading any associated native
libraries, unloading the ELF objects associated with the class loader,
removing methods from the method lookup tree (which is used to populate
Java stack traces), and freeing the class loader memory blob in
one fell swoop.
JC includes a custom ELF loader. The loader requires some architecture specific code (for processor-specific relocations), which will have to be addressed in any port of JC. The loader does not support the "common" section; it is not needed.
JC expects ELF objects to contain not only functions but also certain
structures contain meta-data describing the class, its fields and methods.
The layout of the structures is defined in
jc_defs.h
.
The ELF loader recognizes special symbols as references to JC support routines, Java methods and fields in other classes, etc. See Appendix A for the list of all special symbols.
When JC loads an ELF object, it performs the following steps. First, it loads the pertinent sections into memory. Then it resolves any intra-object symbolic references (i.e., references from within the object to itself). Next, the object's symbol table is scanned for Java type and method definitions. For each method we use the size attribute of the associated symbol to compute the end of the method. Then, if present, any debugging sections with line number information is processed. This gives us the mapping from PC address to Java line number which is used in exception traces.
There are two command line flags relevant to object file loading.
The --verbose obj
flag is very useful in tracing which ELF
objects JC is loading and which Java types are defined from them.
Also, the --loadlist filename
flag will create a file that contains
the names of all ELF objects that get loaded during JC execution.
This is useful for creating "combo objects" (see below).
Normally JC looks for an individual ELF object corresponding to each Java class file. However, JC can also load multiple types from a single ELF object, called a "combo object". More precisely, whenever JC loads an ELF object, it loads all types defined in that ELF object, even if only one of those types was the one it was actually looking for. If you are actually going to use most of those types, this is a good thing. Otherwise, it's not.
On the positive side, not only does this replace many small ELF loads with one big one, it also simplifies class resolution, as only one resolution operation is required for the entire suite of clases.
On the other hand, any unused types waste memory. But more importantly, when JC resolves a class it resolves all unresolved symbols in the ELF object associated with the class, effectively resolving all of the classes defined in that object at once. Many of the extra classes will reference still other classes not referenced by the target class, causing their ELF objects to be loaded unnecessarily. So if the ELF original object contains other classes that would not normally be resolved, when linked together these classes will be resolved, potentially loading in a bunch more unneeded classes.
Also, JC cannot load a type twice, so if any type in a combo object
is already defined in the VM, the whole object must be rejected.
This can sometimes cause curious failures when mixing combo objects
with regular objects; use the --verbose obj
flag if unsure.
In any case, there are two ways to use this "combo object" feature.
First, if an entry in the jc.object.path
system property points
to an ELF object instead of a directory tree, then JC will look for types
in the corresponding ELF object. Note that this should only be done when
the object contains types that you know will be loaded early in the
application, because until it actually loads it JC will search the ELF object
for every class that it tries to load (loading and searching an ELF object
is a fairly expensive operation).
Secondly, before trying the file foo/bar/Jan.o
corresponding
to Java class foo.bar.Jan
, JC will first look for
foo/bar/_package.o
. If this file exists and contains a
definition for the type foo.bar.Jan
, then foo/bar/_package.o
will loaded instead. As a side effect, all other types defined in
that object will be loaded as well.
You can create _package.o
files for a Java package like this:
$ cd foo/bar $ gcc -o _package.o -r -nostdlib *.o $ rm [^_]*.o
Taking this idea to its logical extreme, you can conglomerate all of
the objects used by an application into a single object. However,
just linking all the objects output by --loadlist
together
is not sufficient by itself. This is because of the aforementioned
unneded resolutions that occur when you conglomerate objects. However,
there is a special extra-dangerous hack to work around this problem,
namely, setting the jc.ignore.resolution.failures
property
to true
. This causes JC to ignore failures to load any
referred-to classes during class resolution, instead simply resolving
the symbol to NULL
. Use this only when you know a priori that
all needed objects are available.
So, putting this all together, here is an example of how to convert
your application into a single, standalone object file and run it
without the need for any class files or other generated objects.
First, we generate the list of needed objects. Then we link them
all together. Finally, we run JC with the jc.object.path
property pointing to (only) the linked object, and tell it also to
not bother looking for Java class files:
$ jc --loadlist list.out foo.Bar $ gcc -o app.o -r -nostdlib `cat list.out` $ jc -Djc.object.path=app.o -Djc.without.classfiles=true \ -Djc.ignore.resolution.failures=true foo.Bar
For a little more speed, if you don't care about Java source line numbers in your exception traces, you can strip out the debugging sections (if present):
$ strip -g app.o
JC is also capable of creating executables from a "main class"
(See jcgen). These simply contain the code to invoke the JC
runtime (similar to the jc
binary, but with a pre-defined
main class to start with), and do not contain any actual compiled
classes in them.
The JC native interface (JCNI) is an alternative to the JNI. However, it was not created as such, rather it simply falls out of the fact that JC executes generated C code: the JCNI is simply the interface that the generated C code uses.
The JC native interface (JCNI) is really two things. First, it is
a mapping from Java method call semantics to C function call semantics,
i.e., what the C prototype of a JCNI native method implementation
must look like. This part is fairly simple: for parameters and return
values, the same JNI primitive types are used, and objects are
represented by _jc_object *
pointers. Static JCNI functions
do not expect the implicit Class
object as a parameter.
For examples of native methods implementing the JCNI interface,
look in the libjc/native
subdirectory of the JC source tree.
The second part is how the body of a JCNI method is supposed to
implement Java functionality like invoking virtual methods and
catching exceptions. This is defined by the all-important include file
jc_defs.h
,
installed in the include/
directory.
The basic idea here is to provide a lot of macros that hide the details
of how things are done. Generated C code ends up looking like some kind
of flattened-out, long winded version of the original Java code
written mostly in C macros with names like _JC_INVOKE_VIRTUAL()
,
_JC_PRIM_FIELD()
, _JC_CAUGHT_EXCEPTION()
, etc.
Code speaks better than words here; see
jc_defs.h
for details.
JC's default code generator is written in Java. If you are familiar with Soot
then it will appear fairly straightforward. It uses the Soot framework
to convert bytecode into Jimple, optimizes the Jimple code somewhat,
and then converts the Jimple statements more or less directly into C
statements. The code for this lives in the
org.dellroad.jc.cgen
package. The JC install process installs some rudimentary
Javadoc
describing these classes.
JC performs several optimizations during the code generation process. These are described below.
The most important optimization performed by JC's code generator
is method inlining. Java code makes heavy use of short method
invocations, so inlining these invocations is a big win, especially
considering that JC imposes additional overhead to normal C function
calls. Inlining is done without any call graph, i.e., only when the
called method is known a priori, i.e., for INVOKESTATIC and non-virtual
INVOKESPECIAL. A typical example is String.length()
(note
that because JC doesn't have a verifier, inlining non-public methods
of other classes is perfectly valid). Other wins include long chains
of constructor invocations where each constructor does little more
than invoke its superclass constructor. In particular, every object
instantiation ultimately invokes Object.<init>
; but this
method does nothing, so inlining it saves us a function call for
every object instantiation. JC's inlining code is contained in
org.dellroad.jc.cgen.DefaultMethodOptimizer
.
Method inlining has an additional benefit: it creates more opportunities for allocation of objects on the stack instead of the heap (see Stack Allocation below).
Nonvirtualization is the process of converting INVOKEVIRTUAL or INVOKEINTERFACE method calls into nonvirtual INVOKESPECIAL method calls. This is a win because both INVOKEVIRTUAL and INVOKEINTERFACE require looking up the method in the object's vtable or interface dispatch hash table at runtime, whereas a nonvirtual INVOKESPECIAL method call is just a "normal" function call where the function to call is known at code generation time. So nonvirtual calls are both faster and require less code.
Of course, since a nonvirtual invocation is required for method inlining, this process creates more opportunities for method inlining.
To perform nonvirtualization, JC must determine the exact type of
the base object. First, the easy case: if we know an object is an
instance of type X, and X is a final class, then we know the exact
type of the object is X. Second, JC performs a simple type analysis
to determine the exact runtime type of reference variables (when
possible). For example, any object created via a new
expression
has a known exact type.
JC uses Soot analysis to eliminate redundant array bounds checks.
That is, when we know an array index is not going to cause an
ArrayIndexOutOfBoundsException
then we can omit the code
that checks for that case.
JC uses Soot analysis to eliminate redundant casts. These are common, often resulting from code like this:
if (foo instanceof Comparable) r = ((Comparable)foo).compareTo(bar);
The cast to Comparable
is completely redundant: there's no way
that foo can not be an instance of Comparable
when that cast
is performed. JC omits such casts.
JC uses Soot analysis to eliminate redundant null pointer checks. Explicit null pointer checks are only required when invoking a non-static method nonvirtually, because a virtual invocation goes through the vtable which requires dereferencing the object pointer. JC eliminates these explicit null pointer checks when it is known that the object reference is not null.
JC uses Soot to perform local escape analysis on each method. Any objects that are allocated in a method but don't "escape" that method (either by being returned, thrown, assigned to a field, assigned to an array element, or passed as a parameter to another method) may be allocated on the stack instead of in the heap. This means not only much a faster allocation time but also less work for the garbage collector.
Typically only a small percentage of object allocations qualify. However, more agressive method inlining means more opportunities for stack allocation.
The maximum amount of space in a method's runtime stack frame that
will be used for stack-allocated Java objects is determined by the
jc.gen.max.stack.alloc
system property.
When you combine nonvirtualization, method inlining, and stack allocation, the results can be fairly significant. For example, consider this Java code:
for (Iterator iter = foo.iterator(); iter.hasNext(); ) { ... iter.next() ... }
If the exact type of foo
is known, then foo.iterator()
can be inlined, then the exact type of iter
will also be
known, so iter.hasNext()
and iter.next()
can be
inlined, and to top it off iter
itself can be allocated on
the stack. This means significantly faster code for this common
Java idiom.
Java classes are initialized upon their first "active use", defined as referencing a static variable, invoking a static method, or instantiating an instance of the class. Performing this check before each of these operations is time consuming.
JC performs flow analysis to determine which checks can be eliminated, because it can be proven that the class must have already been initialized. These checks are then omitted from the generated code.
Many neat tricks are possible but not yet attempted. Before discussing them, it is worth reiterating an important point: generated ELF objects must contain reliable dependency information. Class files can change frequently so it is imperative that any change in any class file that invalidates a generated ELF object must be covered by that ELF object's dependency list.
For the default JC code generator, generating this list is easy: we
conservatively include all superclasses, superinterfaces, all
field and method types, parameter types, exception types, etc. When
more agressive optimizations are made, the list grows longer. For example,
if a java.lang.String
method is inlined, and that method refers
to a field in java.lang.StringBuffer
, then it too must be added
to the dependency list.
The most interesting future possibility with code generation is to perform much more agressive optimizations using Soot. Especially for "closed" systems where all class files are known ahead of time, Soot can be used for whole program analysis. Inlining and stack allocation of objects could be done on a massive scale (inlining creates more opportunities for stack allocation). Type analysis could allow virtual methods to be resolved at build time, and therefore made candidates for inlining. When combined with a conglomerate link, class loading and linking times would be reduced. Etc.
Other tricks include a Java equivalent of the GCC asm()
statement.
For example, something like the following could be used to return an
object's actual address:
public static long getActualMemoryAddress(Object obj) { org.dellroad.jc.cgen.C.inline("return (jlong)param0;"); }
Extending this idea, one could imagine native C code being written inline with Java code, eliminating the need for separate native libraries.
Of course, there's no requirement that C be used as an intermediate step to generating ELF objects. In fact, there is something of an impedence mismatch between the C language and our needs. For example, we use setjmp/longjmp to throw and catch exceptions, but that is not necessarily the most efficient method.
Perhaps a different language would work better, such as C++, or an even lower level language than C like C- or GCC's RTL (register transfer language).
JC supplies a few tools that get automatically installed
alongside the main jc
binary:
The jcjavah
is the standard C native code header file generator program.
It also generates C source file stubs.
It generates JCNI header files by default; specify -jni
for JNI files.
Here is the usage message:
Usage: jcjavah [-classpath path] [-d output-dir] [-c] [-jni] class ... Options: -classpath Specify search path for class files -d dir Output directory for generated files -c Also generate C source files stubs -jni Generate JNI sources instead of JCNI
The jcgen
program is handy for pre-generating ELF objects or
"main class" executables. Here is the usage message:
Usage: jcgen [options] pattern ... jcgen -o filename [options] classname Options: -srcpath path Prepend path to the search path for generated C source files. The first component of the resulting path is also the destination for newly generated sources. This option is cumulative and may be repeated multiple times. -newsrcpath path Same as -srcpath but replaces the current search path instead of prepending to it. -objdir dir Specify destination for generated ELF objects. -incdir dir Specify include directory for JC headers. -classpath path Specify search path for user class files. -vmopt opt Pass opt through to the Java VM command line. -o filename Generate an executable invoking classname -f Force regeneration even if not needed -q Quiet mode (omit verbose output). -v Increase verbosity. -s Don't include support for Java line numbers. -N Only generate source files, no ELF objects. Example patterns: jan.foo.Bar Class jan.foo.Bar jan.foo.* All classes in the jan.foo package jan.foo.% Same jan.foo.** All classes in jan.foo and all its subpackages jan.foo.%% Same Default -srcpath: /home/archie/.jc_src:/usr/local/share/jc/src Default -objdir: /home/archie/.jc_obj Default -incdir: /usr/local/include/jc Default -classpath: .
The default classpath, source path, and object directory are set up
to put newly generated files into ~/.jc_src
and ~/.jc_obj
,
which is usually what you want. So to compile a Java source file,
pre-generate its ELF object, and execute it you'd do something like this:
$ jikes foo/Bar.java $ jcgen foo.Bar $ jc foo.Bar
Note that you can use percent signs instead of asterisks on the command
line. jcgen
is just a shell script wrapper around the JC class
org.dellroad.jc.BootstrapObjectGenerator.
Pre-generating all your ELF objects has the advantage that you can
then run JC with system property jc.without.classfiles=true
,
which yields faster execution and a smaller memory footprint because
JC is relieved of having to perform several tasks relating to
handling class files and code generation.
However, note that running with jc.without.classfiles=true
can change which class loaders load which classes, which may affect
application behavior. In general, all classes will end up being
defined by the boot loader. This is because other class loaders
will first try to load classes via the boot loader, and in cases
where this would normally fail (because the class file is not on
the boot loader's classpath), it will suceed instead, because the
boot loader skips the class file search and just finds and loads
the ELF object directly. Another way to explain this is: while each
class loader may have its own search path for class files, JC has only
one search path for ELF objects that is shared by all class loaders.
To create an executable, use the -o
flag:
$ jcgen -o ~/bin/Bar foo.Bar
In this form, no ELF objects are immediately created. Instead,
jcgen
creates a small executable that simply invokes the JC
runtime via the JNI invocation interface, passing along the main
class name (foo.Bar
in this example) along with any
command line parameters.
All of the other flags except -classpath
(and -v
) are
ignored; their respective settings are determined at runtime.
However, by using -classpath
, you can hard-code the classpath
to the Java class in the executable.
cfdump
dumps the contents of a class file in human-readable form.
Not all contents, most notably method bytecode, are included.
All operating-system and architecture-specific code is contained within the following files and directories:
include/jc_machdep.h
include/jni_machdep.h
libjc/arch/
These files are somewhat self-explanatory. The machine independent functionality required includes:
This chapter contains random notes about things that confused me at one point and may be useful to other developers.
A class is a Java concept represented by a class file. Classes can be either normal classes or interfaces. Classes exist even if there is no JVM to interpret them.
A type is a runtime concept and is more general. All objects are
instances of some type. Once a class is loaded, a type is created to
represent its runtime manifestation. Note that the same class file
may be loaded more than once (by different class loaders); each time
the class is loaded a new type is created. In JC ypes are represented
by _jc_type
structures.
All loaded classes are types but not the converse. The types which
are not derived from loaded classes are the primitive types and the
array types. Regardless, all types have instances of java.lang.Class
associated with them. Each type contains a reference to the
java.lang.Class
object associated with that type.
Confusion sometimes results because the word "class" is used when
what is really meant is "type derived from a loaded class". Sometimes
the phrase "class type" is used for clarity when referring to a runtime
type which is neither a primitive nor array type. And java.lang.Class
objects are associated with types, not classes. It really should
have been called java.lang.Type
instead!
If a class is loaded, then unloaded, and then loaded again, the
second time it is loaded it becomes a completely new and different
type from the first time it is loaded, a new Class
object is instantiated, etc.
All objects are instances of some (non-primitive) type, and contain
in their 2 word headers pointers to the "vtype" structure which
contains the object's associated _jc_type
structure and
vtable (virtual method dispath table). The other word is the lockword
which contains various bits for locking the object, etc.
Bootstrapping the VM involves loading the initial types such as
java.lang.Object
and getting things to the point where we
are ready to execute arbitrary Java code. Things must be done carefully
to avoid problems. This is all handled in the file bootstrap.c
.
During bootstrapping, creation of the Class objects that are associated
with each loaded type is deferred until the java.lang.Class
class has
been loaded. Then we search the tree of already-loaded types and belatedly
create their corresponding Class objects. Otherwise in JC Class objects
are always created whenever a new type is created.
No constructor is ever invoked for instances of java.lang.Class
.
During early bootstrap class initialization is also disabled, preventing execution of arbitrary Java code.
Here is the basic sequence of events:
java.lang.Class
instances and class initialization.
java.lang.Class
.
java.lang.Class
instance
for them (note: no constructor for java.lang.Class
is ever invoked,
and class initialization is still disabled, so no Java code executes).
java.lang.Object
and java.lang.Class
. This step must
occur after the previous step so that ELF symbolic references to Class objects
have something to resolve to.
java.lang.Class
instances and class initialization.
java.lang.Class
(and therefore
java.lang.Object
).
Exceptions during bootstrap are particularly annoying to handle. There are three cases:
env
does not exist (no thread or VM context yet).
The exception and message are printed out as an error when "posted."
vm->initialization != NULL && !vm->initialization->may_execute
.
The exception class name and message are stored in vm->initialization
,
and printed when "caught".
vm->initialization != NULL && vm->initialization->may_execute
.
Is this a recursive exception X before pre-instantiated exception X created?
If yes, the class name and message are stored in vm->initialization
,
printed when "caught". If no, the exception is posted normally.
Code that is in the "bootstrap path" must not call _jc_retrieve_exception() because this function doesn't work before execution is enabled (because posted exceptions do not actually create any objects).
There are two types of class loaders:
The bootstrap class loader (or "boot loader") is an internal JVM
concept and does not have any corresponding java.lang.ClassLoader
object. This loader only knows how to read class files from the
filesystem and from ZIP files (in this particular JVM implementation),
is used to load the first classes that are loaded during bootstrapping,
and is the class loader of last resort so to speak.
User-defined class loaders are simply all loaders other than the boot
loader, all of which are associated with instances of
java.lang.ClassLoader
.
The "system class loader" (or "application class loader") is a
special user-defined loader. It is used e.g., to load the class
specified on the command line. It is (by definition) the loader
returned by the method ClassLoader.getSystemClassLoader()
.
Aside from this, the VM does not treat it specially in any way.
People sometimes confuse the boot loader with the system loader, but they
are completely different.
With respect to a specific type, a class loader can be an "initiating
loader" of that type. This basically means that someone successfully
loaded the type using that loader, perhaps by invoking
ClassLoader.loadClass()
.
Another way to load a type using a specific class loader is to load
another class with that class loader whose class file contains a reference
to the type.
If a loader is an initiating loader for a type, it can also be the "defining
loader" for that type. There is only one defining loader for each type,
though there may be multiple initiating loaders. For "class types" the
defining loader is either the boot loader or else it's the loader whose
ClassLoader.defineClass()
method was called to create the type's
instance of java.lang.Class
.
A class loader is an initiating loader but not a defining loader for a type when it delegates the loading of that type to another class loader (typically it's "parent" class loader).
All object references must be tracable at all times.
For GC to not free objects that are actually still in use, the GC scan needs to be able to find all live objects. For objects referenced via normal Java code, we guarantee this using the usual methods (i.e., scanning the Java stacks of all live threads and Class variables of all loaded classes).
The trickiness happens when objects are referenced only by internal C code and/or structures. We need a strategy for rigorously tracking these references. Here is the strategy.
Each thread has a local native reference list which is explicitly scanned during each GC run. In addition, there is a global VM native reference list that is scanned during GC and used to hold non-thread specific references. All native references must be in one of these lists at any time a GC can possibly occur.
References held only by a single thread (e.g., in a local variable) must be stored in that thread's local native reference list before any opportunity can be allowed to occur in which a GC run may be performed. Since GC can happen during any object allocation, or any time the current thread checks to see if another thread is trying to halt it, this means GC can happen during almost any function call. To be conservative, one should assume that the only safe time to call a function without storing local reference variables in the native reference list is when no functions are called while the reference is being used.
If a reference is passed as a function parameter, we generally assume that the calling function has already taken care of this.
In addition, certain other references are explicitly scanned during GC:
java.lang.Thread
object associated with each live thread
JC keeps an internal tree of loaded class files and their hash values. The hash value is how it detects the loading of the same class using different files.
The actual class file bytes are only stored if "on demand" code generation is enabled. Otherwise they are discarded after being parsed. The reason we can't throw them away when code generation is enabled is because class loaders don't keep class files around, so once a class is loaded the class file is normally discarded.
When the code generator needs to look at a class file, we pull it out
of this tree. A special class
org.dellroad.jc.JCFinder
containing a native method is used
to allow Java code access to this tree.
When an ELF object file corresponding to a class is loaded by JC, it is expected to refer to other Java classes and types using the following set of symbol names. In the list below, "XXX" and "YYY" are replaced by suitably encoded Java names. Method names have their signature hashes appended to prevent namespace collisions.
Java names are encoded as follows: first the name is converted to UTF-8 bytes. Alphanumeric bytes are passed as is. Dots and slashes turn into underscores. All other bytes becomes __NN where NN is the hex value for the byte. This encoding is also used for file names.
_jc_cs_XXX
jc_defs.h
.
_jc_XXX$fields
_jc_XXX$method$YYY
_jc_XXX$method_info$YYY
_jc_XXX$type
_jc_XXX$prim$type
_jc_XXX$class_object
java.lang.Class
instance for class XXX.
_jc_XXX$arrayNNN$type
_jc_XXX$array$type
.
_jc_XXX$arrayNNN$prim$type
_jc_XXX$arrayNNN$class_object
java.lang.Class
instance for the reference
array type having NNN dimensions and associated with class or
interface XXX.
_jc_check_address
_JC_PERIODIC_CHECK()
macro.
_jc_empty_quick_table
NULL
check in every interface method dispatch.
These are some of the more important Java system properties that affect JC's behavior. The JC build and install process will try to figure out the right defaults for everything.
java.boot.class.path
java.boot.class.path.append
java.boot.class.path
.
java.boot.class.path.prepend
java.boot.class.path
.
java.class.path
CLASSPATH
environment
variable to avoid confusion when multiple Java VMs are installed on the
same host. Instead, use a ~/.jc
file.
java.home
/usr/local
.
java.library.path
jc.gnu.compiler
jc.heap.granularity
jc.heap.size
java.lang.Class
objects
don't live in the Java heap; see also the jc.loader.size
property.
jc.ignore.resolution.failures
NullPointerException
s if used inappropriately.
jc.include.dir
jc_defs.h
and friends) are installed.
Used by the default JC code generator when compiling C files into ELF files.
jc.include.line.numbers
true
or false
depending on whether JC should parse the debug
sections (if present) of ELF objects to extract line number information,
and should include line number support in newly generated ELF objects.
Without this property set both during object file creation and during
execution, Java stack traces will not include source file line numbers.
Default is true
.
jc.gen.inline.max.expansion
2.5
.
jc.gen.inline.max.caller
5000
.
jc.gen.inline.max.callee
12
.
jc.gen.inline.min.caller
jc.gen.inline.max.expansion
limit is ignored.
Default is 20
.
jc.gen.inline.min.callee
jc.gen.inline.max.expansion
limit is ignored.
Default is 3
.
jc.gen.inline.verbose
true
, during code generation inlined methods will be printed
to standard output.
jc.gen.max.stack.alloc
1024
.
jc.gen.force
true
, causes C source files to be regenerated even if they
appear valid.
jc.loader.size
java.lang.Class
objects).
jc.method.optimizer
org.dellroad.jc.cgen.MethodOptimizer
interface. An instance is created and used by the JCObjectGenerator
class to optimize method Jimple bodies during code generation.
Default is
org.dellroad.jc.cgen.DefaultMethodOptimizer
.
jc.object.generation.enabled
true
means that JC will store copies of all
loaded class files in memory, for possible subsequent use by the Soot
code generator. So set this to false
when you want optimal performance
and you know all classes have already been converted to ELF objects.
This property is automatically set to false
if either
jc.without.classfiles
is true
or
jc.object.loader.enabled
is false
.
Default is true
.
jc.object.generator
org.dellroad.jc.ObjectGenerator
interface to be invoked for on-demand ELF object generation.
Default is
org.dellroad.jc.cgen.JCObjectGenerator
.
jc.object.loader.enabled
false
then JC will function strictly as an interpreting JVM.
Setting this to false
forces jc.object.generation.enabled
to be set to false
as well (as there's no point to generating ELF
objects on demand if you're not allowed to load them).
Default is true
.
jc.object.path
jc.resolve.native.directly
true
to cause JCNI native methods to be resolved directly when
posssible. Enabling this optimization eliminates the overhead for JCNI
native methods when the invoking class is loaded after the native method
is resolved, but also causes the native method to disappear from stack traces.
Default is false
.
jc.source.path
jc.stack.default
jc.stack.maximum
jc.stack.minimum
jc.verbose.XXX
true
or false
depending on whether the corresponding
-v
or --verbose
flag was given as a command line argument to JC.
These flags are set by the VM itself; any values they are assigned during VM
creation (e.g., using the --property
command line flag) are ignored.
The "XXX" is one of the verbosity flags, e.g., class
, gc
,
gen
, etc.
jc.without.classfiles
true
, then when loading a class JC's bootstrap class loader will
look first for an already generated ELF object file and, if found, load it
directly, skipping the step of acquiring the class file. This property being
set to true
forces the jc.object.generation.enabled
property
to be set to false
(it requires class files to work). Note that this
property being true
may cause classes that would normally be
defined by a user-defined class loader to be defined by the boot loader
instead, possibly affecting application behavior.
Default is false
.
--loadlist
: Loader Overview
_package.o
: Combo Objects
cfdump
: cfdump
jcgen
: jcgen
jcjavah
: jcjavah