Java Virtual Machine Illustrated: JVM Architecture ( The JV

2021-08-21  光剑书架上的书

What Is the JVM?

A Virtual Machine is a software implementation of a physical machine. Java was developed with the concept of WORA (Write Once Run Anywhere), which runs on a VM. The compiler compiles the Java file into a Java .class file, then that .class file is input into the JVM, which loads and executes the class file. Below is a diagram of the Architecture of the JVM.

JVM Architecture Overview

By default, the classes of a Java application are loaded by the application class loader, which on JDK 8 prints as: sun.misc.Launcher$AppClassLoader@xxxx

JVM bytecode is executed by the JRE (Java Runtime Environment).

The JRE contains an implementation of the Java Virtual Machine (JVM), which analyzes the bytecode, interprets the code, and executes it.

JVM Architecture

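As the architecture diagram above shows, the JVM is divided into three main subsystems:

  1. Class Loader Subsystem
  2. Runtime Data Areas
  3. Execution Engine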

1. Class Loader Subsystem

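Java's dynamic class loading is handled by the ClassLoader subsystem. It loads, links, and initializes a class file when the class is first referenced at run time, not at compile time.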

1.1 Loading

Classes are loaded by this component. The Bootstrap ClassLoader, Extension ClassLoader, and Application ClassLoader are the three class loaders that make this work.

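  1. Bootstrap ClassLoader: loads classes from the bootstrap classpath, which is nothing but rt.jar. This loader is given the highest priority.
  2. Extension ClassLoader: loads classes located in the ext folder (jre\lib\ext).
  3. Application ClassLoader: loads classes from the application-level classpath, that is, the path given by the CLASSPATH environment variable and similar settings.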

The class loaders above follow the delegation hierarchy algorithm when loading class files.
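The hierarchy can be observed from ordinary code by following getParent() from a class's loader. The sketch below is only an illustration; the class name LoaderChain and the "<bootstrap>" label are made up for this example. On JDK 8 the chain typically shows sun.misc.Launcher$AppClassLoader and sun.misc.Launcher$ExtClassLoader, while later JDKs use different loader classes.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {
    /** Walks from the given loader up through getParent(); a null parent stands for the bootstrap loader. */
    static List<String> chain(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        names.add("<bootstrap>"); // hypothetical label: the bootstrap loader has no Java object
        return names;
    }

    public static void main(String[] args) {
        // The loader of an application class, followed by its ancestors.
        System.out.println(chain(LoaderChain.class.getClassLoader()));
        // Core classes such as String are loaded by the bootstrap loader, reported as null.
        System.out.println(String.class.getClassLoader());
    }
}
```

The exact names printed depend on the JDK version, so no output is shown here.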

The algorithm is implemented in: java.lang.ClassLoader#loadClass(java.lang.String, boolean)

The source is as follows:

public abstract class ClassLoader {

    private static native void registerNatives();
    static {
        registerNatives();
    }

    // The parent class loader for delegation
    // Note: VM hardcoded the offset of this field, thus all new fields
    // must be added *after* it.
    private final ClassLoader parent;

    ....



    /**
     * Loads the class with the specified <a href="#name">binary name</a>.  The
     * default implementation of this method searches for classes in the
     * following order:
     *
     * <ol>
     *
     *   <li><p> Invoke {@link #findLoadedClass(String)} to check if the class
     *   has already been loaded.  </p></li>
     *
     *   <li><p> Invoke the {@link #loadClass(String) <tt>loadClass</tt>} method
     *   on the parent class loader.  If the parent is <tt>null</tt> the class
     *   loader built-in to the virtual machine is used, instead.  </p></li>
     *
     *   <li><p> Invoke the {@link #findClass(String)} method to find the
     *   class.  </p></li>
     *
     * </ol>
     *
     * <p> If the class was found using the above steps, and the
     * <tt>resolve</tt> flag is true, this method will then invoke the {@link
     * #resolveClass(Class)} method on the resulting <tt>Class</tt> object.
     *
     * <p> Subclasses of <tt>ClassLoader</tt> are encouraged to override {@link
     * #findClass(String)}, rather than this method.  </p>
     *
     * <p> Unless overridden, this method synchronizes on the result of
     * {@link #getClassLoadingLock <tt>getClassLoadingLock</tt>} method
     * during the entire class loading process.
     *
     * @param  name
     *         The <a href="#name">binary name</a> of the class
     *
     * @param  resolve
     *         If <tt>true</tt> then resolve the class
     *
     * @return  The resulting <tt>Class</tt> object
     *
     * @throws  ClassNotFoundException
     *          If the class could not be found
     */
    protected Class<?> loadClass(String name, boolean resolve)
        throws ClassNotFoundException
    {
        synchronized (getClassLoadingLock(name)) {
            // First, check if the class has already been loaded
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                long t0 = System.nanoTime();

                try {
                    if (parent != null) { // parent loader is not null: delegate to the parent's loadClass()
                        c = parent.loadClass(name, false);
                    } else { // parent loader is null: look for the class in the bootstrap class loader
                        // Returns a class loaded by the bootstrap class loader; 
                        //or return null if not found.
                        c = findBootstrapClassOrNull(name);
                    }

                } catch (ClassNotFoundException e) {
                    // ClassNotFoundException thrown if class not found
                    // from the non-null parent class loader
                }

                if (c == null) {
                    // If still not found, then invoke findClass in order
                    // to find the class.
                    long t1 = System.nanoTime();

                    // when the parent loader cannot load the class, call this loader's own findClass()
                    c = findClass(name);

                    // this is the defining class loader; record the stats
                    sun.misc.PerfCounter.getParentDelegationTime().addTime(t1 - t0);
                    sun.misc.PerfCounter.getFindClassTime().addElapsedTimeFrom(t1);
                    sun.misc.PerfCounter.getFindClasses().increment();
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    ...

}

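A Java class is loaded by an instance of java.lang.ClassLoader. Because java.lang.ClassLoader is itself an abstract class, a class loader can only be an instance of a concrete subclass of java.lang.ClassLoader. If so, which class loader loads the java.lang.ClassLoader class itself?

This is the classic "who loads the loader" bootstrapping problem.

It turns out that the JVM has a built-in bootstrap class loader. The bootstrap loader loads java.lang.ClassLoader and many other Java platform classes.

To load a specific Java class, for example com.acme.Foo, the JVM calls the loadClass() method of java.lang.ClassLoader. (In fact, the JVM looks for a loadClassInternal() method: if it finds one it calls that, otherwise it calls loadClass(). loadClassInternal() in turn calls loadClass().)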

java.lang.ClassLoader#loadClassInternal

    // This method is invoked by the virtual machine to load a class.
    private Class<?> loadClassInternal(String name)
        throws ClassNotFoundException
    {
        // For backward compatibility, explicitly lock on 'this' when
        // the current class loader is not parallel capable.
        if (parallelLockMap == null) {
            synchronized (this) {
                 return loadClass(name);
            }
        } else {
            return loadClass(name);
        }
    }

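The loadClass() method takes a class name and returns the java.lang.Class instance that represents the loaded class. In practice, loadClass locates the actual bytes of the .class file (or URL) and calls defineClass to construct a java.lang.Class from that byte array. The loader on which loadClass is called is known as the initiating loader (that is, the loader the JVM uses to start the load).

However, the initiating loader does not necessarily load the class itself; it may delegate to another class loader (for example, its parent), which may in turn delegate to yet another loader, and so on. Eventually some class loader in the delegation chain calls defineClass to load the class in question.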
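The difference between the loader that is asked and the loader that finally calls defineClass can be sketched with a small custom loader. Everything named here (DefiningLoaderDemo, HelloLoader, and the generated class demo.GeneratedHello) is hypothetical. To stay self-contained, the class bytes are written by hand as a minimal class file with no fields or methods; this is an illustration under those assumptions, not how real loaders obtain bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class DefiningLoaderDemo {
    /** Builds a minimal class file: a public class extending Object, with no fields or methods. */
    static byte[] minimalClass(String internalName) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(0xCAFEBABE);                            // magic
        out.writeShort(0);                                   // minor_version
        out.writeShort(52);                                  // major_version (Java 8)
        out.writeShort(5);                                   // constant_pool_count = 4 entries + 1
        out.writeByte(7); out.writeShort(2);                 // #1 Class -> #2
        out.writeByte(1); out.writeUTF(internalName);        // #2 Utf8 (writeUTF emits u2 length + bytes)
        out.writeByte(7); out.writeShort(4);                 // #3 Class -> #4
        out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8
        out.writeShort(0x0021);                              // access_flags: ACC_PUBLIC | ACC_SUPER
        out.writeShort(1);                                   // this_class
        out.writeShort(3);                                   // super_class
        out.writeShort(0);                                   // interfaces_count
        out.writeShort(0);                                   // fields_count
        out.writeShort(0);                                   // methods_count
        out.writeShort(0);                                   // attributes_count
        return buf.toByteArray();
    }

    /** Hypothetical loader that can define exactly one class, demo.GeneratedHello. */
    static class HelloLoader extends ClassLoader {
        HelloLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!"demo.GeneratedHello".equals(name)) throw new ClassNotFoundException(name);
            try {
                byte[] bytes = minimalClass("demo/GeneratedHello");
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = new HelloLoader(DefiningLoaderDemo.class.getClassLoader());
        // No parent can find this class, so findClass and defineClass run in HelloLoader.
        System.out.println(loader.loadClass("demo.GeneratedHello").getClassLoader() == loader);
        // String is found through delegation; the loader that defined it is the bootstrap loader (null).
        System.out.println(loader.loadClass("java.lang.String").getClassLoader());
    }
}
```

In both calls HelloLoader is the initiating loader, but only for demo.GeneratedHello is it also the loader that defines the class. Overriding findClass rather than loadClass keeps the parent-first delegation from the JDK source above intact.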

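1.2 Linking

  1. Verify: the bytecode verifier checks whether the generated bytecode is valid; if verification fails, we get a verification error.
  2. Prepare: memory is allocated for all static variables, and they are assigned default values.
  3. Resolve: all symbolic references are replaced with direct references into the method area.

1.3 Initialization

This is the final phase of class loading; here, all static variables are assigned their original values (the ones written in the source), and static blocks are executed.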
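That initialization is deferred until the first active use of a class can be seen with a static block that records when it runs. This is a minimal sketch; InitOrderDemo, Lazy, and EVENTS are names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrderDemo {
    static final List<String> EVENTS = new ArrayList<>();

    /** Hypothetical class whose static initializer records the moment it runs. */
    static class Lazy {
        static int value = 42;              // assigned during initialization
        static { EVENTS.add("Lazy initialized"); }
    }

    /** Records events around the first read of Lazy.value; later calls return the same record. */
    static synchronized List<String> run() {
        if (EVENTS.isEmpty()) {
            EVENTS.add("before first use");
            int v = Lazy.value;             // reading a non-constant static field is an active use
            EVENTS.add("read " + v);
        }
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [before first use, Lazy initialized, read 42]
    }
}
```

Declaring Lazy inside InitOrderDemo does not initialize it; its static block only runs when Lazy.value is first read.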

2. JVM Runtime Data Areas

Runtime Data Areas: Heap | Method Area | JVM Stacks | PC Register | Native Stacks

The runtime data areas are divided into five main components:

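  1. Method area: all class-level data is stored here, including static variables. There is one method area per JVM, and it is a shared resource.

  2. Heap area: all objects, together with their instance variables, and all arrays are stored here. There is also one heap per JVM. Because the method area and the heap are memory shared by multiple threads, the data stored in them is not thread-safe.

  3. Stack area: a separate runtime stack is created for each thread. Every method call creates an entry on the stack, called a stack frame. All local variables are created in stack memory. The stack area is thread-safe because it is not a shared resource. A stack frame is divided into three sub-entities:

    1. Local variable array: holds the method's local variables and their values.
    2. Operand stack: acts as the runtime workspace for any intermediate operations that need to be performed.
    3. Frame data: holds all the symbols corresponding to the method. If an exception occurs, the catch block information is kept in the frame data.

  4. PC register: each thread has its own PC register, which holds the address of the currently executing instruction; once the instruction has executed, the PC register is updated with the next instruction.

  5. Native method stack: holds native method information. A separate native method stack is created for each thread.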
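How the local variable array, the operand stack, and the PC cooperate inside one frame can be sketched with a toy interpreter. The five instructions are the ones javap prints for Light.inc() in Appendix 1 of this article; everything else (FrameSketch, Obj, the string encoding of instructions, and a pc that counts instructions rather than byte offsets) is a simplification invented for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FrameSketch {
    /** Hypothetical stand-in for an object with one int field, like Light.m. */
    static class Obj {
        int m;
        Obj(int m) { this.m = m; }
    }

    /** Interprets the instructions of Light.inc() against one simulated stack frame. */
    static int runInc(Obj self) {
        Object[] locals = { self };                // local variable array: slot 0 holds `this`
        Deque<Object> stack = new ArrayDeque<>();  // operand stack
        String[] code = { "aload_0", "getfield m", "iconst_1", "iadd", "ireturn" };
        int pc = 0;                                // simplified PC: index of the next instruction
        while (true) {
            String insn = code[pc++];
            switch (insn) {
                case "aload_0":
                    stack.push(locals[0]);         // push `this`
                    break;
                case "getfield m":
                    stack.push(((Obj) stack.pop()).m); // pop object, push its field value
                    break;
                case "iconst_1":
                    stack.push(1);
                    break;
                case "iadd": {
                    int b = (Integer) stack.pop();
                    int a = (Integer) stack.pop();
                    stack.push(a + b);
                    break;
                }
                case "ireturn":
                    return (Integer) stack.pop();  // the frame is discarded on return
                default:
                    throw new IllegalStateException(insn);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(runInc(new Obj(41))); // prints 42
    }
}
```

The operand stack never holds more than two values here, which matches the stack=2 that javap reports for inc().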

JVM Heap Memory Area

Heap data structure

METHOD AREA / PERMANENT

Created at JVM startup and shared among all threads like Heap.

Per Thread Runtime Data Areas : PC Register & Stack Frame

https://www.jianshu.com/p/6dc6c921c9a2

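3. Execution Engine

The bytecode assigned to the runtime data areas is executed by the execution engine. The execution engine reads the bytecode and executes it piece by piece.

  1. Interpreter: the interpreter is quick to start interpreting bytecode but slow at executing it. Its drawback is that when a method is called multiple times, it has to be interpreted afresh every time.
  2. JIT compiler: the JIT compiler removes this drawback of the interpreter. The execution engine uses the interpreter to convert bytecode, but when it finds repeated code it uses the JIT compiler, which compiles the whole bytecode into native code. That native code is then used directly for repeated method calls, which improves the performance of the system.
    1. Intermediate code generator: produces intermediate code.
    2. Code optimizer: optimizes the intermediate code generated above.
    3. Target code generator: generates machine code, that is, native code.
    4. Profiler: a special component responsible for finding hotspots, that is, whether a method is called multiple times.
  3. Garbage collector: collects and removes unreferenced objects. Garbage collection can be requested by calling System.gc(), but it is not guaranteed to run. The JVM's garbage collector collects the objects that have been created.

Java Native Interface (JNI): JNI interacts with the native method libraries and provides the native libraries required by the execution engine.

Native method libraries: the collection of native libraries required by the execution engine.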

JVM input: the class bytecode binary stream file format

https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html

Structure definition of a class file:

struct ClassFile {
       u4             magic;
       u2             minor_version;
       u2             major_version;
       u2             constant_pool_count;
       cp_info        constant_pool[constant_pool_count-1];
       u2             access_flags;
       u2             this_class;
       u2             super_class;
       u2             interfaces_count;
       u2             interfaces[interfaces_count];
       u2             fields_count;
       field_info     fields[fields_count];
       u2             methods_count;
       method_info    methods[methods_count];
       u2             attributes_count;
       attribute_info attributes[attributes_count];
}

JvmtiClassFileReconstituter.cpp source code:

void JvmtiClassFileReconstituter::write_class_file_format() {
  ReallocMark();

  // JVMSpec|   ClassFile {
  // JVMSpec|           u4 magic;
  write_u4(0xCAFEBABE);

  // JVMSpec|           u2 minor_version;
  // JVMSpec|           u2 major_version;
  write_u2(ikh()->minor_version());
  u2 major = ikh()->major_version();
  write_u2(major);

  // JVMSpec|           u2 constant_pool_count;
  // JVMSpec|           cp_info constant_pool[constant_pool_count-1];
  write_u2(cpool()->length());
  copy_cpool_bytes(writeable_address(cpool_size()));

  // JVMSpec|           u2 access_flags;
  write_u2(ikh()->access_flags().get_flags() & JVM_RECOGNIZED_CLASS_MODIFIERS);

  // JVMSpec|           u2 this_class;
  // JVMSpec|           u2 super_class;
  write_u2(class_symbol_to_cpool_index(ikh()->name()));
  Klass* super_class = ikh()->super();
  write_u2(super_class == NULL? 0 :  // zero for java.lang.Object
                class_symbol_to_cpool_index(super_class->name()));

  // JVMSpec|           u2 interfaces_count;
  // JVMSpec|           u2 interfaces[interfaces_count];
  Array<Klass*>* interfaces =  ikh()->local_interfaces();
  int num_interfaces = interfaces->length();
  write_u2(num_interfaces);
  for (int index = 0; index < num_interfaces; index++) {
    HandleMark hm(thread());
    instanceKlassHandle iikh(thread(), interfaces->at(index));
    write_u2(class_symbol_to_cpool_index(iikh->name()));
  }

  // JVMSpec|           u2 fields_count;
  // JVMSpec|           field_info fields[fields_count];
  write_field_infos();

  // JVMSpec|           u2 methods_count;
  // JVMSpec|           method_info methods[methods_count];
  write_method_infos();

  // JVMSpec|           u2 attributes_count;
  // JVMSpec|           attribute_info attributes[attributes_count];
  // JVMSpec|   } /* end ClassFile 8?
  write_class_attributes();
}

Sections

There are 10 basic sections to the Java Class File structure:

Magic Number

Class files are identified by the following 4 byte header (in hexadecimal): CA FE BA BE (the first 4 entries in the table below). The history of this magic number was explained by James Gosling referring to a restaurant in Palo Alto:[2]

"We used to go to lunch at a place called St Michael's Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after "CAFE" (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn't seem terribly important or destined to go anywhere but the trash-can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD - it was eventually replaced by RMI."

General layout

Because the class file contains variable-sized items and does not also contain embedded file offsets (or pointers), it is typically parsed sequentially, from the first byte toward the end. At the lowest level the file format is described in terms of a few fundamental data types:

Some of these fundamental types are then re-interpreted as higher-level values (such as strings or floating-point numbers), depending on context. There is no enforcement of word alignment, and so no padding bytes are ever used. The overall layout of the class file is as shown in the following table.
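Sequential parsing of the first few fields might look like the sketch below. It relies on DataInputStream reading big-endian values, which matches the class file byte order. The class name ClassHeader is made up for this example, and the ten sample bytes are reconstructed from the javap output for Light.class in Appendix 1 (major version 52, 18 constants, hence a count of 19).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ClassHeader {
    final int minor;
    final int major;
    final int constantPoolCount;

    ClassHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    /** Reads magic, minor_version, major_version and constant_pool_count, in file order. */
    static ClassHeader read(byte[] classBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
        int magic = in.readInt();                       // u4
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a class file: 0x" + Integer.toHexString(magic));
        }
        int minor = in.readUnsignedShort();             // u2
        int major = in.readUnsignedShort();             // u2
        int cpCount = in.readUnsignedShort();           // u2
        return new ClassHeader(minor, major, cpCount);
    }

    public static void main(String[] args) throws IOException {
        byte[] start = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34, 0, 0x13 };
        ClassHeader h = read(start);
        System.out.println(h.major + "." + h.minor + ", constant_pool_count=" + h.constantPoolCount);
    }
}
```

Because nothing in the file is addressed by offset, everything after these ten bytes has to be read in the same way, one item after another.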

Representation in a C-like programming language

Since C doesn't support multiple variable length arrays within a struct, the code below won't compile and only serves as a demonstration.

struct Class_File_Format {
   u4 magic_number;

   u2 minor_version;   
   u2 major_version;

   u2 constant_pool_count;   
  
   cp_info constant_pool[constant_pool_count - 1];

   u2 access_flags;

   u2 this_class;
   u2 super_class;

   u2 interfaces_count;   
   
   u2 interfaces[interfaces_count];

   u2 fields_count;   
   field_info fields[fields_count];

   u2 methods_count;
   method_info methods[methods_count];

   u2 attributes_count;   
   attribute_info attributes[attributes_count];
}

The constant pool

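The constant pool in a class file is the data item most closely tied to the other items in the class file structure, and one of the items that takes up the most space in the file; it is also the first table-type data item to appear in the class file.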

The constant pool table is where most of the literal constant values are stored. This includes values such as numbers of all sorts, strings, identifier names, references to classes and methods, and type descriptors. All indexes, or references, to specific constants in the constant pool table are given by 16-bit (type u2) numbers, where index value 1 refers to the first constant in the table (index value 0 is invalid).

Due to historic choices made during the file format development, the number of constants in the constant pool table is not actually the same as the constant pool count which precedes the table. First, the table is indexed starting at 1 (rather than 0), but the count should actually be interpreted as the maximum index plus one.[5] Additionally, two types of constants (longs and doubles) take up two consecutive slots in the table, although the second such slot is a phantom index that is never directly used.

The type of each item (constant) in the constant pool is identified by an initial byte tag. The number of bytes following this tag and their interpretation are then dependent upon the tag value. The valid constant types and their tag values are:

| Tag byte | Additional bytes | Description of constant | Version introduced |
|---|---|---|---|
| 1 | 2+x bytes (variable) | UTF-8 (Unicode) string: a character string prefixed by a 16-bit number (type u2) indicating the number of bytes in the encoded string which immediately follows (which may be different than the number of characters). Note that the encoding used is not actually UTF-8, but involves a slight modification of the Unicode standard encoding form. | 1.0.2 |
| 3 | 4 bytes | Integer: a signed 32-bit two's complement number in big-endian format | 1.0.2 |
| 4 | 4 bytes | Float: a 32-bit single-precision IEEE 754 floating-point number | 1.0.2 |
| 5 | 8 bytes | Long: a signed 64-bit two's complement number in big-endian format (takes two slots in the constant pool table) | 1.0.2 |
| 6 | 8 bytes | Double: a 64-bit double-precision IEEE 754 floating-point number (takes two slots in the constant pool table) | 1.0.2 |
| 7 | 2 bytes | Class reference: an index within the constant pool to a UTF-8 string containing the fully qualified class name (in internal format) (big-endian) | 1.0.2 |
| 8 | 2 bytes | String reference: an index within the constant pool to a UTF-8 string (big-endian too) | 1.0.2 |
| 9 | 4 bytes | Field reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor. (big-endian) | 1.0.2 |
| 10 | 4 bytes | Method reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor. (big-endian) | 1.0.2 |
| 11 | 4 bytes | Interface method reference: two indexes within the constant pool, the first pointing to a Class reference, the second to a Name and Type descriptor. (big-endian) | 1.0.2 |
| 12 | 4 bytes | Name and type descriptor: two indexes to UTF-8 strings within the constant pool, the first representing a name (identifier) and the second a specially encoded type descriptor. | 1.0.2 |
| 15 | 3 bytes | Method handle: this structure is used to represent a method handle and consists of one byte of type descriptor, followed by an index within the constant pool. | 7 |
| 16 | 2 bytes | Method type: this structure is used to represent a method type, and consists of an index within the constant pool. | 7 |
| 17 | 4 bytes | Dynamic: this is used to specify a dynamically computed constant produced by invocation of a bootstrap method. | 11 |
| 18 | 4 bytes | InvokeDynamic: this is used by an invokedynamic instruction to specify a bootstrap method, the dynamic invocation name, the argument and return types of the call, and optionally, a sequence of additional constants called static arguments to the bootstrap method. | 7 |
| 19 | 2 bytes | Module: this is used to identify a module. | 9 |
| 20 | 2 bytes | Package: this is used to identify a package exported or opened by a module. | 9 |
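A reader for this table can be sketched as a loop that switches on the tag byte and consumes the number of bytes listed above. The sketch below keeps each entry as a descriptive string and skips the phantom slot after a long or double. ConstantPoolWalker and sample() are hypothetical names, and the string format of each entry is an arbitrary choice for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ConstantPoolWalker {
    /** Reads constant_pool[1 .. count-1]; slot 0 and the phantom slots after long/double stay null. */
    static String[] read(DataInputStream in, int count) throws IOException {
        String[] pool = new String[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:  pool[i] = "Utf8 " + in.readUTF(); break;   // u2 length + modified UTF-8
                case 3:  pool[i] = "Integer " + in.readInt(); break;
                case 4:  pool[i] = "Float " + in.readFloat(); break;
                case 5:  pool[i] = "Long " + in.readLong(); i++; break;     // occupies two slots
                case 6:  pool[i] = "Double " + in.readDouble(); i++; break; // occupies two slots
                case 7: case 8: case 16: case 19: case 20:                  // one u2 index
                    pool[i] = "tag" + tag + " #" + in.readUnsignedShort();
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18:        // two u2 indexes
                    pool[i] = "tag" + tag + " #" + in.readUnsignedShort() + " #" + in.readUnsignedShort();
                    break;
                case 15:                                                    // u1 kind + u2 index
                    pool[i] = "MethodHandle kind=" + in.readUnsignedByte() + " #" + in.readUnsignedShort();
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return pool;
    }

    /** Hand-built sample pool with constant_pool_count = 6 (maximum index 5, plus one). */
    static byte[] sample() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(7); out.writeShort(2);                 // #1 Class -> #2
        out.writeByte(1); out.writeUTF("java/lang/Object");  // #2 Utf8
        out.writeByte(5); out.writeLong(7L);                 // #3 Long (also occupies #4)
        out.writeByte(3); out.writeInt(42);                  // #5 Integer
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String[] pool = read(new DataInputStream(new ByteArrayInputStream(sample())), 6);
        for (int i = 1; i < pool.length; i++) {
            System.out.println("#" + i + " = " + pool[i]);
        }
    }
}
```

The extra i++ for tags 5 and 6 is what makes the count differ from the number of entries actually stored.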

There are only two integral constant types, integer and long. Other integral types appearing in the high-level language, such as boolean, byte, and short must be represented as an integer constant.

Class names in Java, when fully qualified, are traditionally dot-separated, such as "java.lang.Object". However within the low-level Class reference constants, an internal form appears which uses slashes instead, such as "java/lang/Object".
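The two name forms differ only in the separator, and the standard library can already turn a descriptor written in the internal form back into classes. A small sketch, with InternalNames and its helper methods invented for the example:

```java
import java.lang.invoke.MethodType;

public class InternalNames {
    /** Converts a dot-separated binary name to the slash-separated internal form. */
    static String toInternal(String binaryName) {
        return binaryName.replace('.', '/');
    }

    /** Converts an internal name back to the dot-separated form. */
    static String toBinary(String internalName) {
        return internalName.replace('/', '.');
    }

    public static void main(String[] args) {
        System.out.println(toInternal(Object.class.getName())); // prints java/lang/Object

        // A method descriptor embeds internal names; a null loader means the system class loader.
        MethodType mt = MethodType.fromMethodDescriptorString("(Ljava/lang/String;)V", null);
        System.out.println(mt.parameterType(0) + " -> " + mt.returnType());
    }
}
```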

The Unicode strings, despite the moniker "UTF-8 string", are not actually encoded according to the Unicode standard, although it is similar. There are two differences (see UTF-8 for a complete discussion). The first is that the code point U+0000 is encoded as the two-byte sequence C0 80 (in hex) instead of the standard single-byte encoding 00. The second difference is that supplementary characters (those outside the BMP at U+10000 and above) are encoded using a surrogate-pair construction similar to UTF-16 rather than being directly encoded using UTF-8. In this case each of the two surrogates is encoded separately in UTF-8. For example, U+1D11E is encoded as the 6-byte sequence ED A0 B4 ED B4 9E, rather than the correct 4-byte UTF-8 encoding of F0 9D 84 9E.
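Both differences can be reproduced with DataOutputStream.writeUTF, which writes this modified encoding behind a two-byte length prefix. The sketch below compares it with standard UTF-8 for the two cases mentioned above; ModifiedUtf8Demo and its helpers are names made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    /** Formats bytes[from..] as space-separated upper-case hex pairs. */
    static String hex(byte[] bytes, int from) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < bytes.length; i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes s the way writeUTF does, dropping the 2-byte length prefix. */
    static String modified(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeUTF(s);
        return hex(buf.toByteArray(), 2);
    }

    /** Encodes s as standard UTF-8. */
    static String standard(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8), 0);
    }

    public static void main(String[] args) throws IOException {
        String nul = String.valueOf((char) 0);
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(modified(nul) + " vs " + standard(nul));   // prints C0 80 vs 00
        System.out.println(modified(clef) + " vs " + standard(clef)); // prints ED A0 B4 ED B4 9E vs F0 9D 84 9E
    }
}
```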

A constant pool example

Consider the Salutation application shown below:

// On CD-ROM in file linking/ex5/Salutation.java
class Salutation {

    private static final String hello = "Hello, world!";
    private static final String greeting = "Greetings, planet!";
    private static final String salutation = "Salutations, orb!";

    private static int choice = (int) (Math.random() * 2.99);

    public static void main(String[] args) {

        String s = hello;
        if (choice == 1) {
            s = greeting;
        }
        else if (choice == 2) {
            s = salutation;
        }

        System.out.println(s);
    }
}

Assume that you have asked a Java virtual machine to run Salutation. When the virtual machine starts, it attempts to invoke the main() method of Salutation. It quickly realizes, however, that it can't invoke main(). The invocation of a method declared in a class is an active use of that class, which is not allowed until the class is initialized. Thus, before the virtual machine can invoke main(), it must initialize Salutation.

And before it can initialize Salutation, it must load and link Salutation. So, the virtual machine hands the fully qualified name of Salutation to the bootstrap class loader, which retrieves the binary form of the class, parses the binary data into internal data structures, and creates an instance of java.lang.Class.

The constant pool for Salutation is shown in the table below (the symbolic reference table):

Index Type Value
1   CONSTANT_String_info    30
2   CONSTANT_String_info    31
3   CONSTANT_String_info    39
4   CONSTANT_Class_info 37
5   CONSTANT_Class_info 44
6   CONSTANT_Class_info 45
7   CONSTANT_Class_info 46
8   CONSTANT_Class_info 47
9   CONSTANT_Methodref_info 7, 16
10  CONSTANT_Fieldref_info  4, 17
11  CONSTANT_Fieldref_info  8, 18
12  CONSTANT_Methodref_info 5, 19
13  CONSTANT_Methodref_info 6, 20
14  CONSTANT_Double_info    2.99
16  CONSTANT_NameAndType_info   26, 22
17  CONSTANT_NameAndType_info   41, 32
18  CONSTANT_NameAndType_info   49, 34
19  CONSTANT_NameAndType_info   50, 23
20  CONSTANT_NameAndType_info   51, 21
21  CONSTANT_Utf8_info  "()D"
22  CONSTANT_Utf8_info  "()V"
23  CONSTANT_Utf8_info  "(Ljava/lang/String;)V"
24  CONSTANT_Utf8_info  "([Ljava/lang/String;)V"
25  CONSTANT_Utf8_info  "<clinit>"
26  CONSTANT_Utf8_info  "<init>"
27  CONSTANT_Utf8_info  "Code"
28  CONSTANT_Utf8_info  "ConstantValue"
29  CONSTANT_Utf8_info  "Exceptions"
30  CONSTANT_Utf8_info  "Greetings, planet!"
31  CONSTANT_Utf8_info  "Hello, world!"
32  CONSTANT_Utf8_info  "I"
33  CONSTANT_Utf8_info  "LineNumberTable"
34  CONSTANT_Utf8_info  "Ljava/io/PrintStream;"
35  CONSTANT_Utf8_info  "Ljava/lang/String;"
36  CONSTANT_Utf8_info  "LocalVariables"
37  CONSTANT_Utf8_info  "Salutation"
38  CONSTANT_Utf8_info  "Salutation.java"
39  CONSTANT_Utf8_info  "Salutations, orb!"
40  CONSTANT_Utf8_info  "SourceFile"
41  CONSTANT_Utf8_info  "choice"
42  CONSTANT_Utf8_info  "greeting"
43  CONSTANT_Utf8_info  "hello"
44  CONSTANT_Utf8_info  "java/io/PrintStream"
45  CONSTANT_Utf8_info  "java/lang/Math"
46  CONSTANT_Utf8_info  "java/lang/Object"
47  CONSTANT_Utf8_info  "java/lang/System"
48  CONSTANT_Utf8_info  "main"
49  CONSTANT_Utf8_info  "out"
50  CONSTANT_Utf8_info  "println"
51  CONSTANT_Utf8_info  "random"
52  CONSTANT_Utf8_info  "salutation"

For example, the entry at index 7:

...
7   CONSTANT_Class_info 46
...
46  CONSTANT_Utf8_info  "java/lang/Object"
...

Figure 8-5 The symbolic reference from Salutation to Object.
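Following such a reference is just a chain of index lookups. The sketch below copies only the entries it needs from the Salutation table (7, 9, 16, 22, 26, and 46) into plain maps; SymbolicRefDemo and the map layout are assumptions for the illustration, not the JVM's internal representation.

```java
import java.util.HashMap;
import java.util.Map;

public class SymbolicRefDemo {
    // A few entries copied from the Salutation constant pool, keyed by index.
    static final Map<Integer, String> UTF8 = new HashMap<>();
    static final Map<Integer, Integer> CLASS = new HashMap<>();
    static final Map<Integer, int[]> NAME_AND_TYPE = new HashMap<>();
    static final Map<Integer, int[]> METHODREF = new HashMap<>();

    static {
        UTF8.put(22, "()V");
        UTF8.put(26, "<init>");
        UTF8.put(46, "java/lang/Object");
        CLASS.put(7, 46);                            // 7  CONSTANT_Class_info        46
        NAME_AND_TYPE.put(16, new int[] { 26, 22 }); // 16 CONSTANT_NameAndType_info  26, 22
        METHODREF.put(9, new int[] { 7, 16 });       // 9  CONSTANT_Methodref_info    7, 16
    }

    /** Class entry -> Utf8 entry -> internal class name. */
    static String className(int classIndex) {
        return UTF8.get(CLASS.get(classIndex));
    }

    /** Methodref entry -> (Class, NameAndType) -> "class.name:descriptor". */
    static String method(int methodrefIndex) {
        int[] ref = METHODREF.get(methodrefIndex);
        int[] nameAndType = NAME_AND_TYPE.get(ref[1]);
        return className(ref[0]) + "." + UTF8.get(nameAndType[0]) + ":" + UTF8.get(nameAndType[1]);
    }

    public static void main(String[] args) {
        System.out.println(className(7)); // prints java/lang/Object
        System.out.println(method(9));    // prints java/lang/Object.<init>:()V
    }
}
```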

The Constant Pool

https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html

Java Virtual Machine instructions do not rely on the run-time layout of classes, interfaces, class instances, or arrays. Instead, instructions refer to symbolic information in the constant_pool table.

All constant_pool table entries have the following general format:

cp_info {
    u1 tag;
    u1 info[];
}

Each item in the constant_pool table must begin with a 1-byte tag indicating the kind of cp_info entry. The contents of the info array vary with the value of tag. The valid tags and their values are listed in Table 4.3. Each tag byte must be followed by two or more bytes giving information about the specific constant. The format of the additional information varies with the tag value.

Table 4.3. Constant pool tags

Constant Type Value
CONSTANT_Class 7
CONSTANT_Fieldref 9
CONSTANT_Methodref 10
CONSTANT_InterfaceMethodref 11
CONSTANT_String 8
CONSTANT_Integer 3
CONSTANT_Float 4
CONSTANT_Long 5
CONSTANT_Double 6
CONSTANT_NameAndType 12
CONSTANT_Utf8 1
CONSTANT_MethodHandle 15
CONSTANT_MethodType 16
CONSTANT_InvokeDynamic 18

Structure of each constant pool entry

The cp_info items in the class file structure, that is, the structure of each kind of constant pool entry:


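CONSTANT_Class_info {
    u1 tag;  // value is 7
    u2 name_index;  // index of the constant holding the fully qualified name
}

CONSTANT_Fieldref_info {
    u1 tag;  // value is 9
    u2 class_index;  // index of the CONSTANT_Class_info for the class or interface that declares the field
    u2 name_and_type_index;  // index of the CONSTANT_NameAndType_info for the field
}

CONSTANT_Methodref_info {
    u1 tag;  // value is 10
    u2 class_index;  // index of the CONSTANT_Class_info for the class that declares the method
    u2 name_and_type_index;  // index of the CONSTANT_NameAndType_info giving the name and descriptor
}

CONSTANT_InterfaceMethodref_info {
    u1 tag;  // value is 11
    u2 class_index;  // index of the CONSTANT_Class_info for the interface that declares the method
    u2 name_and_type_index;  // index of the CONSTANT_NameAndType_info giving the name and descriptor
}

CONSTANT_String_info {
    u1 tag;  // value is 8
    u2 string_index;  // index of the string literal
}

CONSTANT_Integer_info {
    u1 tag;  // value is 3
    u4 bytes;  // the int value, stored high byte first
}

CONSTANT_Float_info {
    u1 tag;  // value is 4
    u4 bytes;  // the float value, stored high byte first
}

CONSTANT_Long_info {
    u1 tag;  // value is 5
    u4 high_bytes;  // the long value, stored high byte first: upper 32 bits
    u4 low_bytes;  // lower 32 bits
}

CONSTANT_Double_info {
    u1 tag;  // value is 6
    u4 high_bytes;  // the double value, stored high byte first: upper 32 bits
    u4 low_bytes;  // lower 32 bits
}

CONSTANT_NameAndType_info {
    u1 tag;  // value is 12
    u2 name_index;  // index of the constant holding the field or method name
    u2 descriptor_index;  // index of the constant holding the field or method descriptor
}

CONSTANT_Utf8_info {
    u1 tag;  // value is 1
    u2 length;  // number of bytes occupied by the UTF-8 encoded string
    u1 bytes[length];  // the UTF-8 encoded string, length bytes long
}

CONSTANT_MethodHandle_info {
    u1 tag;  // value is 15
    u1 reference_kind;  // must be in the range 1 to 9; it determines the kind of method handle,
                        // which in turn characterizes the handle's bytecode behavior
    u2 reference_index;  // must be a valid index into the constant pool
}

CONSTANT_MethodType_info {
    u1 tag;  // value is 16
    u2 descriptor_index;  // must be a valid index into the constant pool; the entry at that index
                          // must be a CONSTANT_Utf8_info structure representing a method descriptor
}

CONSTANT_InvokeDynamic_info {
    u1 tag;  // value is 18
    u2 bootstrap_method_attr_index;  // must be a valid index into the bootstrap_methods[] array
                                     // of this class file's bootstrap method table
    u2 name_and_type_index;  // must be a valid index into the constant pool; the entry at that index must be
                             // a CONSTANT_NameAndType_info structure giving the method name and descriptor
}

CONSTANT_Module_info {
    u1 tag;  // value is 19
    u2 name_index;  // index of the CONSTANT_Utf8_info holding the module name
}

CONSTANT_Package_info {
    u1 tag;  // value is 20
    u2 name_index;  // index of the CONSTANT_Utf8_info holding the package name
}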

u1, u2, u4, and u8 denote unsigned quantities of 1, 2, 4, and 8 bytes respectively.
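Java has no unsigned integer types, so a reader has to widen these values as it reads them. A minimal sketch of one way to do that with DataInputStream follows; UnsignedReads is a hypothetical name, and returning everything as long is just a convenience for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class UnsignedReads {
    /** Reads one u1, one u2 and one u4 from data, in that order, as non-negative longs. */
    static long[] readU1U2U4(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        long u1 = in.readUnsignedByte();        // 0 .. 255
        long u2 = in.readUnsignedShort();       // 0 .. 65535
        long u4 = in.readInt() & 0xFFFFFFFFL;   // mask off the sign extension: 0 .. 4294967295
        return new long[] { u1, u2, u4 };
    }

    public static void main(String[] args) throws IOException {
        byte[] allOnes = new byte[7];
        java.util.Arrays.fill(allOnes, (byte) 0xFF);
        long[] v = readU1U2U4(allOnes);
        System.out.println(v[0] + " " + v[1] + " " + v[2]); // prints 255 65535 4294967295
        // An 8-byte value does not fit in a signed long; print it with Long.toUnsignedString.
        System.out.println(Long.toUnsignedString(-1L));
    }
}
```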

Appendix

Appendix 1: Viewing the bytecode of a class file with the javap command

javap -verbose Light

Java source code 1:

package light.sword;

public class Light {
    private int m;

    public int inc() {
        return m + 1;
    }
}

JVM bytecode 1:

Classfile /Users/bytedance/code/jvm_notes/src/light/sword/Light.class
  Last modified 2021-8-14; size 279 bytes
  MD5 checksum bb11ff4b71e1ebd07f01cae7bbfd0c95
  Compiled from "Light.java"
public class light.sword.Light
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #4.#15         // java/lang/Object."<init>":()V
   #2 = Fieldref           #3.#16         // light/sword/Light.m:I
   #3 = Class              #17            // light/sword/Light
   #4 = Class              #18            // java/lang/Object
   #5 = Utf8               m
   #6 = Utf8               I
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               inc
  #12 = Utf8               ()I
  #13 = Utf8               SourceFile
  #14 = Utf8               Light.java
  #15 = NameAndType        #7:#8          // "<init>":()V
  #16 = NameAndType        #5:#6          // m:I
  #17 = Utf8               light/sword/Light
  #18 = Utf8               java/lang/Object
{
  public light.sword.Light();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 3: 0

  public int inc();
    descriptor: ()I
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
         1: getfield      #2                  // Field m:I
         4: iconst_1
         5: iadd
         6: ireturn
      LineNumberTable:
        line 7: 0
}
SourceFile: "Light.java"

Java source code 2:

package light.sword;

public class LightSynchronized {

    private int m;

    public int inc() {
        synchronized (this) {
            return m + 1;
        }
    }
}

JVM bytecode 2:

Classfile /Users/bytedance/code/jvm_notes/src/light/sword/LightSynchronized.class
  Last modified 2021-8-21; size 409 bytes
  MD5 checksum 7289db3daff240d90744d2e2e9836d8b
  Compiled from "LightSynchronized.java"
public class light.sword.LightSynchronized
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Methodref          #4.#19         // java/lang/Object."<init>":()V
   #2 = Fieldref           #3.#20         // light/sword/LightSynchronized.m:I
   #3 = Class              #21            // light/sword/LightSynchronized
   #4 = Class              #22            // java/lang/Object
   #5 = Utf8               m
   #6 = Utf8               I
   #7 = Utf8               <init>
   #8 = Utf8               ()V
   #9 = Utf8               Code
  #10 = Utf8               LineNumberTable
  #11 = Utf8               inc
  #12 = Utf8               ()I
  #13 = Utf8               StackMapTable
  #14 = Class              #21            // light/sword/LightSynchronized
  #15 = Class              #22            // java/lang/Object
  #16 = Class              #23            // java/lang/Throwable
  #17 = Utf8               SourceFile
  #18 = Utf8               LightSynchronized.java
  #19 = NameAndType        #7:#8          // "<init>":()V
  #20 = NameAndType        #5:#6          // m:I
  #21 = Utf8               light/sword/LightSynchronized
  #22 = Utf8               java/lang/Object
  #23 = Utf8               java/lang/Throwable
{
  public light.sword.LightSynchronized();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 3: 0

  public int inc();
    descriptor: ()I
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=3, args_size=1
         0: aload_0
         1: dup
         2: astore_1
         3: monitorenter
         4: aload_0
         5: getfield      #2                  // Field m:I
         8: iconst_1
         9: iadd
        10: aload_1
        11: monitorexit
        12: ireturn
        13: astore_2
        14: aload_1
        15: monitorexit
        16: aload_2
        17: athrow
      Exception table:
         from    to  target type
             4    12    13   any
            13    16    13   any
      LineNumberTable:
        line 8: 0
        line 9: 4
        line 10: 13
      StackMapTable: number_of_entries = 1
        frame_type = 255 /* full_frame */
          offset_delta = 13
          locals = [ class light/sword/LightSynchronized, class java/lang/Object ]
          stack = [ class java/lang/Throwable ]
}
SourceFile: "LightSynchronized.java"

Here, the Code attribute has the following format:

struct Code_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 max_stack;
    u2 max_locals;
    u4 code_length;
    u1 code[code_length];
    u2 exception_table_length;
    {   u2 start_pc;
        u2 end_pc;
        u2 handler_pc;
        u2 catch_type;
    } exception_table[exception_table_length];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}

https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.4.2

Appendix 2: Location of the java and javac source code

bytedance$java -version
java version "1.8.0_291"
Java(TM) SE Runtime Environment (build 1.8.0_291-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.291-b10, mixed mode)
bytedance$java -h
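Usage: java [-options] class [args...]
           (to execute a class)
   or  java [-options] -jar jarfile [args...]
           (to execute a jar file)
where options include:
    -d32      use a 32-bit data model if available
    -d64      use a 64-bit data model if available
    -server   to select the "server" VM
                  The default VM is server,
                  because you are running on a server-class machine.


    -cp <class search path of directories and zip/jar files>
    -classpath <class search path of directories and zip/jar files>
                  A : separated list of directories, JAR archives,
                  and ZIP archives to search for class files.
    -D<name>=<value>
                  set a system property
    -verbose:[class|gc|jni]
                  enable verbose output
    -version      print product version and exit
    -version:<value>
                  Warning: this feature is deprecated and will be removed
                  in a future release.
                  require the specified version to run
    -showversion  print product version and continue
    -jre-restrict-search | -no-jre-restrict-search
                  Warning: this feature is deprecated and will be removed
                  in a future release.
                  include/exclude user private JREs in the version search
    -? -help      print this help message
    -X            print help on non-standard options
    -ea[:<packagename>...|:<classname>]
    -enableassertions[:<packagename>...|:<classname>]
                  enable assertions with specified granularity
    -da[:<packagename>...|:<classname>]
    -disableassertions[:<packagename>...|:<classname>]
                  disable assertions with specified granularity
    -esa | -enablesystemassertions
                  enable system assertions
    -dsa | -disablesystemassertions
                  disable system assertions
    -agentlib:<libname>[=<options>]
                  load native agent library <libname>, e.g. -agentlib:hprof
                  see also, -agentlib:jdwp=help and -agentlib:hprof=help
    -agentpath:<pathname>[=<options>]
                  load native agent library by full pathname
    -javaagent:<jarpath>[=<options>]
                  load Java programming language agent, see java.lang.instrument
    -splash:<imagepath>
                  show splash screen with specified image
See http://www.oracle.com/technetwork/java/javase/documentation/index.html for more details.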
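
Two of the options listed above, -D and -ea, can be observed from inside a running program. The sketch below shows one way to do that. The class name `LauncherOptionsSketch` and the property name `light.mode` are hypothetical and chosen only for this example.

```java
/** Hypothetical example class: observes the effect of -D and -ea from inside the JVM. */
public class LauncherOptionsSketch {

    /** Returns the value passed as -Dlight.mode=..., or "default" when the option is absent. */
    public static String mode() {
        return System.getProperty("light.mode", "default");
    }

    /** Returns true when assertions are enabled for this class, for example via -ea. */
    public static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert only executes when assertions are on.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("light.mode = " + mode());
        System.out.println("assertions enabled = " + assertionsEnabled());
    }
}
```

Running it as `java -ea -Dlight.mode=fast LauncherOptionsSketch` and then again without either option should show both values change, since the launcher passes these options to the VM before the main class is loaded.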

The javac command we normally use to compile Java code and the java command we use to run Java applications are located under the following paths:

src\share\classes\sun\tools\javac
src\share\classes\sun\tools\java\

Entry point of the JDK main function

\src\share\bin\main.c

#include "defines.h"

#ifdef _MSC_VER
#if _MSC_VER > 1400 && _MSC_VER < 1600

/*
 * When building for Microsoft Windows, main has a dependency on msvcr??.dll.
 *
 * When using Visual Studio 2005 or 2008, that must be recorded in
 * the [java,javaw].exe.manifest file.
 *
 * As of VS2010 (ver=1600), the runtimes again no longer need manifests.
 *
 * Reference:
 *     C:/Program Files/Microsoft SDKs/Windows/v6.1/include/crtdefs.h
 */
#include <crtassem.h>
#ifdef _M_IX86

#pragma comment(linker,"/manifestdependency:\"type='win32' "            \
        "name='" __LIBRARIES_ASSEMBLY_NAME_PREFIX ".CRT' "              \
        "version='" _CRT_ASSEMBLY_VERSION "' "                          \
        "processorArchitecture='x86' "                                  \
        "publicKeyToken='" _VC_ASSEMBLY_PUBLICKEYTOKEN "'\"")

#endif /* _M_IX86 */

//This may not be necessary yet for the Windows 64-bit build, but it
//will be when that build environment is updated.  Need to test to see
//if it is harmless:
#ifdef _M_AMD64

#pragma comment(linker,"/manifestdependency:\"type='win32' "            \
        "name='" __LIBRARIES_ASSEMBLY_NAME_PREFIX ".CRT' "              \
        "version='" _CRT_ASSEMBLY_VERSION "' "                          \
        "processorArchitecture='amd64' "                                \
        "publicKeyToken='" _VC_ASSEMBLY_PUBLICKEYTOKEN "'\"")

#endif  /* _M_AMD64 */
#endif  /* _MSC_VER > 1400 && _MSC_VER < 1600 */
#endif  /* _MSC_VER */

/*
 * Entry point.
 */
#ifdef JAVAW

char **__initenv;

int WINAPI
WinMain(HINSTANCE inst, HINSTANCE previnst, LPSTR cmdline, int cmdshow)
{
    int margc;
    char** margv;
    const jboolean const_javaw = JNI_TRUE;

    __initenv = _environ;

#else /* JAVAW */
int
main(int argc, char **argv)
{
    int margc;
    char** margv;
    const jboolean const_javaw = JNI_FALSE;
#endif /* JAVAW */
#ifdef _WIN32
    {
        int i = 0;
        if (getenv(JLDEBUG_ENV_ENTRY) != NULL) {
            printf("Windows original main args:\n");
            for (i = 0 ; i < __argc ; i++) {
                printf("wwwd_args[%d] = %s\n", i, __argv[i]);
            }
        }
    }
    JLI_CmdToArgs(GetCommandLine());
    margc = JLI_GetStdArgc();
    // add one more to mark the end
    margv = (char **)JLI_MemAlloc((margc + 1) * (sizeof(char *)));
    {
        int i = 0;
        StdArg *stdargs = JLI_GetStdArgs();
        for (i = 0 ; i < margc ; i++) {
            margv[i] = stdargs[i].arg;
        }
        margv[i] = NULL;
    }
#else /* *NIXES */
    margc = argc;
    margv = argv;
#endif /* WIN32 */
    return JLI_Launch(margc, margv,
                   sizeof(const_jargs) / sizeof(char *), const_jargs,
                   sizeof(const_appclasspath) / sizeof(char *), const_appclasspath,
                   FULL_VERSION,
                   DOT_VERSION,
                   (const_progname != NULL) ? const_progname : *margv,
                   (const_launcher != NULL) ? const_launcher : *margv,
                   (const_jargs != NULL) ? JNI_TRUE : JNI_FALSE,
                   const_cpwildcard, const_javaw, const_ergo_class);
}

The JLI_Launch() function called here is defined in java.c; its prototype is declared in the header java.h, shown below:

#ifndef _JAVA_H_
#define _JAVA_H_

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

#include <jni.h>
#include <jvm.h>

/*
 * Get system specific defines.
 */
#include "emessages.h"
#include "java_md.h"
#include "jli_util.h"

#include "manifest_info.h"
#include "version_comp.h"
#include "wildcard.h"
#include "splashscreen.h"

# define KB (1024UL)
# define MB (1024UL * KB)
# define GB (1024UL * MB)

#define CURRENT_DATA_MODEL (CHAR_BIT * sizeof(void*))

/*
 * The following environment variable is used to influence the behavior
 * of the jre exec'd through the SelectVersion routine.  The command line
 * options which specify the version are not passed to the exec'd version,
 * because that jre may be an older version which wouldn't recognize them.
 * This environment variable is known to this (and later) version and serves
 * to suppress the version selection code.  This is not only for efficiency,
 * but also for correctness, since any command line options have been
 * removed which would cause any value found in the manifest to be used.
 * This would be incorrect because the command line options are defined
 * to take precedence.
 *
 * The value associated with this environment variable is the MainClass
 * name from within the executable jar file (if any). This is strictly a
 * performance enhancement to avoid re-reading the jar file manifest.
 *
 */
#define ENV_ENTRY "_JAVA_VERSION_SET"

#define SPLASH_FILE_ENV_ENTRY "_JAVA_SPLASH_FILE"
#define SPLASH_JAR_ENV_ENTRY "_JAVA_SPLASH_JAR"

/*
 * Pointers to the needed JNI invocation API, initialized by LoadJavaVM.
 */
typedef jint (JNICALL *CreateJavaVM_t)(JavaVM **pvm, void **env, void *args);
typedef jint (JNICALL *GetDefaultJavaVMInitArgs_t)(void *args);
typedef jint (JNICALL *GetCreatedJavaVMs_t)(JavaVM **vmBuf, jsize bufLen, jsize *nVMs);

typedef struct {
    CreateJavaVM_t CreateJavaVM;
    GetDefaultJavaVMInitArgs_t GetDefaultJavaVMInitArgs;
    GetCreatedJavaVMs_t GetCreatedJavaVMs;
} InvocationFunctions;

int
JLI_Launch(int argc, char ** argv,              /* main argc, argc */
        int jargc, const char** jargv,          /* java args */
        int appclassc, const char** appclassv,  /* app classpath */
        const char* fullversion,                /* full version defined */
        const char* dotversion,                 /* dot version defined */
        const char* pname,                      /* program name */
        const char* lname,                      /* launcher name */
        jboolean javaargs,                      /* JAVA_ARGS */
        jboolean cpwildcard,                    /* classpath wildcard */
        jboolean javaw,                         /* windows-only javaw */
        jint     ergo_class                     /* ergnomics policy */
);

/*
 * Prototypes for launcher functions in the system specific java_md.c.
 */

jboolean
LoadJavaVM(const char *jvmpath, InvocationFunctions *ifn);

void
GetXUsagePath(char *buf, jint bufsize);

jboolean
GetApplicationHome(char *buf, jint bufsize);

#define GetArch() GetArchPath(CURRENT_DATA_MODEL)

/*
 * Different platforms will implement this, here
 * pargc is a pointer to the original argc,
 * pargv is a pointer to the original argv,
 * jrepath is an accessible path to the jre as determined by the call
 * so_jrepath is the length of the buffer jrepath
 * jvmpath is an accessible path to the jvm as determined by the call
 * so_jvmpath is the length of the buffer jvmpath
 */
void CreateExecutionEnvironment(int *argc, char ***argv,
                                char *jrepath, jint so_jrepath,
                                char *jvmpath, jint so_jvmpath,
                                char *jvmcfg,  jint so_jvmcfg);

/* Reports an error message to stderr or a window as appropriate. */
void JLI_ReportErrorMessage(const char * message, ...);

/* Reports a system error message to stderr or a window */
void JLI_ReportErrorMessageSys(const char * message, ...);

/* Reports an error message only to stderr. */
void JLI_ReportMessage(const char * message, ...);

/*
 * Reports an exception which terminates the vm to stderr or a window
 * as appropriate.
 */
void JLI_ReportExceptionDescription(JNIEnv * env);
void PrintMachineDependentOptions();

const char *jlong_format_specifier();

/*
 * Block current thread and continue execution in new thread
 */
int ContinueInNewThread0(int (JNICALL *continuation)(void *),
                        jlong stack_size, void * args);

/* sun.java.launcher.* platform properties. */
void SetJavaLauncherPlatformProps(void);
void SetJavaCommandLineProp(char* what, int argc, char** argv);
void SetJavaLauncherProp(void);

/*
 * Functions defined in java.c and used in java_md.c.
 */
jint ReadKnownVMs(const char *jvmcfg, jboolean speculative);
char *CheckJvmType(int *argc, char ***argv, jboolean speculative);
void AddOption(char *str, void *info);

enum ergo_policy {
   DEFAULT_POLICY = 0,
   NEVER_SERVER_CLASS,
   ALWAYS_SERVER_CLASS
};

const char* GetProgramName();
const char* GetDotVersion();
const char* GetFullVersion();
jboolean IsJavaArgs();
jboolean IsJavaw();
jint GetErgoPolicy();

jboolean ServerClassMachine();

int ContinueInNewThread(InvocationFunctions* ifn, jlong threadStackSize,
                   int argc, char** argv,
                   int mode, char *what, int ret);

int JVMInit(InvocationFunctions* ifn, jlong threadStackSize,
                   int argc, char** argv,
                   int mode, char *what, int ret);

/*
 * Initialize platform specific settings
 */
void InitLauncher(jboolean javaw);

/*
 * For MacOSX and Windows/Unix compatibility we require these
 * entry points, some of them may be stubbed out on Windows/Unixes.
 */
void     PostJVMInit(JNIEnv *env, jstring mainClass, JavaVM *vm);
void     ShowSplashScreen();
void     RegisterThread();
/*
 * this method performs additional platform specific processing and
 * should return JNI_TRUE to indicate the argument has been consumed,
 * otherwise returns JNI_FALSE to allow the calling logic to further
 * process the option.
 */
jboolean ProcessPlatformOption(const char *arg);

/*
 * This allows for finding classes from the VM's bootstrap class loader directly,
 * FindClass uses the application class loader internally, this will cause
 * unnecessary searching of the classpath for the required classes.
 *
 */
typedef jclass (JNICALL FindClassFromBootLoader_t(JNIEnv *env,
                                                  const char *name));
jclass FindBootStrapClass(JNIEnv *env, const char *classname);

jobjectArray CreateApplicationArgs(JNIEnv *env, char **strv, int argc);
jobjectArray NewPlatformStringArray(JNIEnv *env, char **strv, int strc);
jclass GetLauncherHelperClass(JNIEnv *env);

int JNICALL JavaMain(void * args); /* entry point                  */

enum LaunchMode {               // cf. sun.launcher.LauncherHelper
    LM_UNKNOWN = 0,
    LM_CLASS,
    LM_JAR
};

static const char *launchModeNames[]
    = { "Unknown", "Main class", "JAR file" };

typedef struct {
    int    argc;
    char **argv;
    int    mode;
    char  *what;
    InvocationFunctions ifn;
} JavaMainArgs;

#define NULL_CHECK_RETURN_VALUE(NCRV_check_pointer, NCRV_return_value) \
    do { \
        if ((NCRV_check_pointer) == NULL) { \
            JLI_ReportErrorMessage(JNI_ERROR); \
            return NCRV_return_value; \
        } \
    } while (JNI_FALSE)

#define NULL_CHECK0(NC0_check_pointer) \
    NULL_CHECK_RETURN_VALUE(NC0_check_pointer, 0)

#define NULL_CHECK(NC_check_pointer) \
    NULL_CHECK_RETURN_VALUE(NC_check_pointer, )

#endif /* _JAVA_H_ */
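
The LaunchMode enum in the header above distinguishes starting from a main class (LM_CLASS) from starting from a JAR file (LM_JAR), which matches the two usage forms printed by java -h. The following Java sketch shows, in simplified form, how a command line could be mapped onto those modes. It is an illustration only, not the launcher's actual C logic: the class `LaunchModeSketch`, the `Launch` holder, and the `classify` method are hypothetical, and the sketch assumes that -cp and -classpath are the only options that take a separate operand.

```java
import java.util.Arrays;

/** Hypothetical illustration; the real argument parsing lives in the launcher's C code. */
public class LaunchModeSketch {

    /** Mirrors the LaunchMode enum from java.h. */
    public enum LaunchMode { LM_UNKNOWN, LM_CLASS, LM_JAR }

    /** Result of classifying a command line: the mode, the class or JAR name, and the rest. */
    public static final class Launch {
        public final LaunchMode mode;
        public final String what;
        public final String[] appArgs;

        Launch(LaunchMode mode, String what, String[] appArgs) {
            this.mode = mode;
            this.what = what;
            this.appArgs = appArgs;
        }
    }

    /** Classifies the arguments that follow the word "java" on the command line. */
    public static Launch classify(String... argv) {
        boolean jar = false;
        int i = 0;
        // Options come first; the first argument without a leading '-' ends them.
        while (i < argv.length && argv[i].startsWith("-")) {
            String option = argv[i++];
            if (option.equals("-jar")) {
                jar = true;
            } else if (option.equals("-cp") || option.equals("-classpath")) {
                i++; // assumption: only these two options take a separate operand
            }
        }
        if (i >= argv.length) {
            // For example "java -version": there is nothing to run.
            return new Launch(LaunchMode.LM_UNKNOWN, null, new String[0]);
        }
        String what = argv[i];
        String[] appArgs = Arrays.copyOfRange(argv, i + 1, argv.length);
        return new Launch(jar ? LaunchMode.LM_JAR : LaunchMode.LM_CLASS, what, appArgs);
    }

    public static void main(String[] args) {
        Launch launch = classify(args);
        System.out.println(launch.mode + " " + launch.what + " " + Arrays.toString(launch.appArgs));
    }
}
```

The `mode` and `what` fields play the same role as the fields of the same names in the JavaMainArgs struct: one records how the application is started, the other records the class name or JAR path to start from.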

Appendix 3: JVM Instruction Set

References

https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html

https://dzone.com/articles/jvm-architecture-explained
https://www.jianshu.com/p/6dc6c921c9a2
https://en.wikipedia.org/wiki/Java_class_file
https://www.artima.com/insidejvm/ed2/linkmodP.html

https://slideplayer.com/slide/4939367/
https://en.wikipedia.org/wiki/List_of_Java_bytecode_instructions
