I've written a virtual machine in C which has decent performance for a non-JIT VM, but I want to learn something new, and improve performance. My current implementation simply uses a switch to translate from VM bytecode to instructions, which is compiled to a jump table. Like I said, decent performance for what it is, but I've hit a barrier that can only be overcome with a JIT compiler.

I've already asked a similar question not long ago about self-modifying code, but I came to realize that I wasn't asking the right question.


So my goal is to write a JIT compiler for this C virtual machine, and I want to do it in x86 assembly (I'm using NASM as my assembler). I'm not quite sure how to go about doing this. I'm comfortable with assembly, and I've looked over some self-modifying code examples, but I haven't figured out how to do code generation yet.


My main block so far is copying instructions to an executable piece of memory, with my arguments. I'm aware that I can label a certain line in NASM, and copy the entire line from that address with the static arguments, but that's not very dynamic, and doesn't work for a JIT compiler. I need to be able to interpret the instruction from bytecode, copy it to executable memory, interpret the first argument, copy it to memory, then interpret the second argument, and copy it to memory.


I've been informed about several libraries that would make this task easier, such as GNU lightning, and even LLVM. However, I'd like to write this by hand first, to understand how it works, before using external resources.

Are there any resources or examples this community could provide to help me get started on this task? A simple example showing two or three instructions like "add" and "mov" being used to generate executable code, with arguments, dynamically, in memory, would do wonders.


I wouldn't recommend writing a JIT in assembly at all. There are good arguments for writing the most frequently executed bits of the interpreter in assembly. For an example of what this looks like, see this comment from Mike Pall, the author of LuaJIT.

As for the JIT, there are many different levels with varying complexity:


  1. Compile a basic block (a sequence of non-branching instructions) by simply copying the interpreter's code. For example, the implementations of a few (register-based) bytecode instructions might look like this:


        ; ebp points to virtual register 0 on the stack
        ; ecx, edx, ebx hold the operand register numbers decoded from the bytecode
        mov eax, [ebp + ecx * 4]  ; load first operand from stack
        add eax, [ebp + edx * 4]  ; add second operand from stack
        mov [ebp + ebx * 4], eax  ; write back result
        ... ; similar

    So, given the instruction sequence ADD R3, R1, R2 followed by SUB R3, R3, R4, a simple JIT could copy the relevant parts of the interpreter's implementation into a new machine code chunk:


        mov ecx, 1
        mov edx, 2
        mov ebx, 3
        mov eax, [ebp + ecx * 4]  ; load first operand from stack
        add eax, [ebp + edx * 4]  ; add second operand from stack
        mov [ebp + ebx * 4], eax  ; write back result
        mov ecx, 3
        mov edx, 4
        mov ebx, 3
        mov eax, [ebp + ecx * 4]  ; load first operand from stack
        sub eax, [ebp + edx * 4]  ; subtract second operand from stack
        mov [ebp + ebx * 4], eax  ; write back result

    This simply copies the relevant code, so we need to initialise the registers it uses accordingly. A better solution would be to translate each bytecode directly into machine instructions with the operands folded in, such as mov eax, [ebp + 4], but then you have to encode the instructions yourself.

    This technique removes the overhead of interpretation (instruction decoding and dispatch), but otherwise does not improve efficiency much. If the code is executed only once or twice, it may not be worth translating it to machine code first (which requires flushing at least parts of the I-cache).


  2. While some JITs use the above technique instead of an interpreter, they then employ a more complicated optimisation mechanism for frequently executed code. This involves translating the executed bytecode into an intermediate representation (IR) on which additional optimisations are performed.


    Depending on the source language and the type of JIT, this can be very complex (which is why many JITs delegate this task to LLVM). A method-based JIT needs to deal with joining control-flow graphs, so such JITs use SSA form and run various analyses on it (e.g., HotSpot).


    A tracing JIT (like LuaJIT 2) only compiles straight-line code, which makes many things easier to implement, but you have to be very careful about how you pick traces and how you link multiple traces together efficiently. Gal and Franz describe one method in this paper (PDF). For another method, see the LuaJIT source code. Both JITs are written in C (or perhaps C++).
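
To make the direct-translation idea from step 1 concrete, here is a minimal sketch in C of what "manually encoding" the instructions can look like. The opcode names OP_ADD and OP_SUB, the Insn struct, and the functions emit_eax_slot and jit_emit are made up for this illustration and are not taken from the asker's VM. The byte values are the standard x86 encodings for the eax, [ebp + disp8] forms used in the listings above. In 64-bit mode the same bytes address [rbp + disp8] instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bytecode opcodes for this sketch (not from the original VM). */
enum { OP_ADD, OP_SUB };

/* One three-operand instruction: dst = a OP b, all virtual register numbers. */
typedef struct { uint8_t op, dst, a, b; } Insn;

/* x86 opcode bytes: 8B /r = mov r32, r/m32; 89 /r = mov r/m32, r32;
   03 /r = add r32, r/m32; 2B /r = sub r32, r/m32. */
enum { X86_MOV_LOAD = 0x8B, X86_MOV_STORE = 0x89, X86_ADD = 0x03, X86_SUB = 0x2B };

/* ModRM 0x45 = mod 01 (disp8 follows), reg 000 (eax), rm 101 (ebp). */
#define MODRM_EAX_EBP_DISP8 0x45

/* Emit "<opcode> eax, [ebp + vreg*4]" (operands swapped for the store form).
   Returns the number of bytes written. */
static size_t emit_eax_slot(uint8_t *out, uint8_t opcode, uint8_t vreg)
{
    assert(vreg < 32);              /* disp8 is signed: offset must stay below 128 */
    out[0] = opcode;
    out[1] = MODRM_EAX_EBP_DISP8;
    out[2] = (uint8_t)(vreg * 4);   /* each virtual register is 4 bytes wide */
    return 3;
}

/* Translate one bytecode instruction into 9 bytes of machine code.
   Returns 0 if the opcode is unknown or a register number does not fit. */
size_t jit_emit(uint8_t *out, Insn in)
{
    uint8_t alu;
    size_t n = 0;

    if (in.dst >= 32 || in.a >= 32 || in.b >= 32)
        return 0;
    switch (in.op) {
    case OP_ADD: alu = X86_ADD; break;
    case OP_SUB: alu = X86_SUB; break;
    default:     return 0;
    }
    n += emit_eax_slot(out + n, X86_MOV_LOAD,  in.a);    /* mov eax, [ebp + a*4]   */
    n += emit_eax_slot(out + n, alu,           in.b);    /* add/sub eax, [ebp+b*4] */
    n += emit_eax_slot(out + n, X86_MOV_STORE, in.dst);  /* mov [ebp + dst*4], eax */
    return n;
}
```

For ADD R3, R1, R2 this emits the bytes 8B 45 04, 03 45 08, 89 45 0C, which disassemble to mov eax, [ebp + 4]; add eax, [ebp + 8]; mov [ebp + 12], eax. The three register-initialising mov instructions from the copied version are gone because the operands are baked into the displacement bytes.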



I suggest you look at the project http://code.google.com/p/asmjit/. Using the framework it provides can save you a lot of effort. If you want to write everything by hand, read its source and reimplement the parts you need yourself; I don't think it's very hard.
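
If you do go the by-hand route, the piece the question is stuck on (getting generated bytes into memory the CPU is allowed to execute) is fairly small. Below is a minimal sketch, not a definitive implementation. It assumes a POSIX system (mmap, mprotect, munmap), GCC or Clang (for __builtin___clear_cache), and an x86 or x86-64 host. The names jit_load, jit_free, emit_return_const and JitFn are invented for this example.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Type of the generated function: no arguments, result returned in eax. */
typedef int (*JitFn)(void);

/* Copy `len` bytes of machine code into a fresh mapping and make it executable.
   The mapping is writable first and executable afterwards, never both at once.
   Returns NULL on failure. */
void *jit_load(const uint8_t *code, size_t len)
{
    assert(len > 0);
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, code, len);

    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    /* Compiler builtin: effectively a no-op on x86, needed on e.g. ARM. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);
    return mem;
}

/* Release a mapping obtained from jit_load. */
void jit_free(void *mem, size_t len)
{
    munmap(mem, len);
}

/* Emit "mov eax, imm32 ; ret" with the argument patched in at generation time.
   Returns the number of bytes written (always 6). */
size_t emit_return_const(uint8_t *out, int32_t value)
{
    uint32_t v = (uint32_t)value;
    out[0] = 0xB8;                               /* mov eax, imm32 */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(v >> (8 * i));    /* immediate, little-endian */
    out[5] = 0xC3;                               /* ret */
    return 6;
}
```

To use it, fill a small buffer with emit_return_const(buf, 42), pass the buffer and length to jit_load, cast the returned pointer to JitFn and call it; on an x86 host the call should return 42. Casting a data pointer to a function pointer is not strictly ISO C, but POSIX systems support it. Mapping the memory writable first and switching it to executable afterwards avoids pages that are writable and executable at the same time, which some systems refuse.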


