我正在使用Common Lisp(64位Debian GNU/Linux中的SBCL 1.1.5)编写一个多体系结构汇编程序/反汇编程序,目前汇编程序为x86-64的子集生成正确的代码.为了组装x86-64汇编代码,我使用一个哈希表,其中汇编指令助记符(字符串)如"jc-rel8"
和,"stosb"
是返回一个或多个编码函数列表的键,如下所示:
(defparameter *emit-function-hash-table-x64* (make-hash-table :test 'equalp)) (setf (gethash "jc-rel8" *emit-function-hash-table-x64*) (list #'jc-rel8-x86)) (setf (gethash "stosb" *emit-function-hash-table-x64*) (list #'stosb-x86))
编码函数就像这些(有些更复杂):
(defun jc-rel8-x86 (arg1 &rest args) (jcc-x64 #x72 arg1)) (defun stosb-x86 (&rest args) (list #xaa))
现在我试图通过使用转换为Common Lisp CLOS语法的NASM(NASM 2.11.06)指令编码数据(文件insns.dat
)来合并完整的x86-64指令集.这意味着代替用于发射的二进制码(如上面的函数)与自定义的情况下正则函数x86-asm-instruction
类(到目前为止一个非常基本的类,某些20个时隙:initarg
,:reader
,:initform
等等),其中emit
将用于带有参数的方法用于发出给定指令(助记符)和参数的二进制代码.转换后的指令数据看起来像这样(但它超过40'000行,正好是7193 make-instance
和7193 setf
).
;; first mnemonic + operand combination instances (:is-variant t). ;; there are 4928 such instances for x86-64 generated from NASM's insns.dat. (eval-when (:compile-toplevel :load-toplevel :execute) (setf Jcc-imm-near (make-instance 'x86-asm-instruction :name "Jcc" :operands "imm|near" :code-string "[i: odf 0f 80+c rel]" :arch-flags (list "386" "BND") :is-variant t)) (setf STOSB-void (make-instance 'x86-asm-instruction :name "STOSB" :operands "void" :code-string "[ aa]" :arch-flags (list "8086") :is-variant t)) ;; then, container instances which contain (or could be refer to instead) ;; the possible variants of each instruction. ;; there are 2265 such instances for x86-64 generated from NASM's insns.dat. (setf Jcc (make-instance 'x86-asm-instruction :name "Jcc" :is-container t :variants (list Jcc-imm-near Jcc-imm64-near Jcc-imm-short Jcc-imm Jcc-imm Jcc-imm Jcc-imm))) (setf STOSB (make-instance 'x86-asm-instruction :name "STOSB" :is-container t :variants (list STOSB-void))) ;; thousands of objects more here... ) ; this bracket closes (eval-when (:compile-toplevel :load-toplevel :execute)
我已经insns.dat
使用一个简单的Perl脚本将NASM转换为Common Lisp语法(如上所述)(下面进一步说明,但脚本本身并没有任何兴趣),原则上它可以工作.所以它可以工作,但编译这些7193对象真的很慢,通常会导致堆耗尽.在具有16G内存的Linux Core i7-2760QM笔记本电脑上,(eval-when (:compile-toplevel :load-toplevel :execute)
使用上述7193个对象编译代码块需要7分钟以上,有时会导致堆耗尽,如下所示:
;; Swank started at port: 4005. * Heap exhausted during garbage collection: 0 bytes available, 32 requested. Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age 0: 0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000 1: 0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000 2: 0 0 0 0 0 0 0 0 0 0 0 41943040 0 0 0.0000 3: 38805 38652 0 0 49474 15433 389 416 0 2144219760 9031056 1442579856 0 1 1.5255 4: 127998 127996 0 0 45870 14828 106 143 199 1971682720 25428576 2000000 0 0 0.0000 5: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000 6: 0 0 0 0 1178 163 0 0 0 43941888 0 2000000 985 0 0.0000 Total bytes allocated = 4159844368 Dynamic-space-size bytes = 4194304000 GC control variables: *GC-INHIBIT* = true *GC-PENDING* = in progress *STOP-FOR-GC-PENDING* = false fatal error encountered in SBCL pid 9994(tid 46912556431104): Heap exhausted, game over. Welcome to LDB, a low-level debugger for the Lisp runtime environment. ldb>
我不得不--dynamic-space-size 4000
为SBCL 添加参数以使其完全编译,但仍然在分配4千兆字节的动态空间后,堆有时会耗尽.即使堆耗尽了,只需在类中添加一个槽('x86-asm-instruction
用于这些实例的类)后编译7193个实例超过7分钟对于REPL中的交互式开发来说太多了(我使用slimv,如果这很重要) .
这是(time (compile-file
输出:
; caught 18636 WARNING conditions ; insns.fasl written ; compilation finished in 0:07:11.329 Evaluation took: 431.329 seconds of real time 238.317000 seconds of total run time (234.972000 user, 3.345000 system) [ Run times consist of 6.073 seconds GC time, and 232.244 seconds non-GC time. ] 55.25% CPU 50,367 forms interpreted 784,044 lambdas converted 1,031,842,900,608 processor cycles 19,402,921,376 bytes consed
使用OOP(CLOS)将允许合并指令助记符(例如jc
或stosb
以上),指令的:name
允许操作数(:operands
),指令的二进制编码(例如#xaa
for stosb
,:code-string
)和:arch-flags
一个对象中指令的可能的体系结构限制().但似乎至少我3岁的计算机效率不足以快速编译大约7000个CLOS对象实例.
我的问题是:有没有办法使SBCL make-instance
更快,或者我应该在常规函数中保持汇编代码生成,如上面的例子?我也很高兴知道任何其他可能的解决方案.
这是Perl脚本,以防万一:
#!/usr/bin/env perl use strict; use warnings; # this program converts NASM's `insns.dat` to Common Lisp Object System (CLOS) syntax. my $firstchar; my $line_length; my $are_there_square_brackets; my $mnemonic_and_operands; my $mnemonic; my $operands; my $code_string; my $flags; my $mnemonic_of_current_mnemonic_array; my $clos_object_name; my $clos_mnemonic; my $clos_operands; my $clos_code_string; my $clos_flags; my @object_name_array = (); my @mnemonic_array = (); my @operands_array = (); my @code_string_array = (); my @flags_array = (); my @each_mnemonic_only_once_array = (); my @instruction_variants_array = (); my @instruction_variants_for_current_instruction_array = (); open(FILE, 'insns.dat'); $mnemonic_of_current_mnemonic_array = ""; # read one line at once. while () { $firstchar = substr($_, 0, 1); $line_length = length($_); $are_there_square_brackets = ($_ =~ /\[.*\]/); chomp; if (($line_length > 1) && ($firstchar =~ /[^\t ;]/)) { if ($are_there_square_brackets) { ($mnemonic_and_operands, $code_string, $flags) = split /[\[\]]+/, $_; $code_string = "[" . $code_string . "]"; ($mnemonic, $operands) = split /[\t ]+/, $mnemonic_and_operands; } else { ($mnemonic, $operands, $code_string, $flags) = split /[\t ]+/, $_; } $mnemonic =~ s/[\t ]+/ /g; $operands =~ s/[\t ]+/ /g; $code_string =~ s/[\t ]+/ /g; $flags =~ s/[\t ]+//g; # we don't want non-x86-64 instructions here. unless ($flags =~ "NOLONG") { # ok, the content of each field is now filtered, # let's convert them to a suitable Common Lisp format. $clos_object_name = $mnemonic . "-" . $operands; # in Common Lisp object names `|`, `,`, and `:` must be escaped with a backslash `\`, # but that would get too complicated. # so we'll simply replace them: # `|` -> `-`. # `,` -> `.`. # `:` -> `.`. $clos_object_name =~ s/\|/-/g; $clos_object_name =~ s/,/./g; $clos_object_name =~ s/:/./g; $clos_mnemonic = "\"" . $mnemonic . "\""; $clos_operands = "\"" . $operands . "\""; $clos_code_string = "\"" . $code_string . "\""; $clos_flags = "\"" . $flags . "\""; # add first and last double quotes. $clos_flags =~ s/,/" "/g; # make each flag its own Common Lisp string. $clos_flags = "(list " . $clos_flags. ")"; # convert to `list` syntax. push @object_name_array, $clos_object_name; push @mnemonic_array, $clos_mnemonic; push @operands_array, $clos_operands; push @code_string_array, $clos_code_string; push @flags_array, $clos_flags; if ($mnemonic eq $mnemonic_of_current_mnemonic_array) { # ok, same mnemonic as the previous one, # so the current object name goes to the list. push @instruction_variants_for_current_instruction_array, $clos_object_name; } else { # ok, this is a new mnemonic. # so we'll mark this as current mnemonic. $mnemonic_of_current_mnemonic_array = $mnemonic; push @each_mnemonic_only_once_array, $mnemonic; # we first push the old array (unless it's empty), then clear it, # and then push the current object name to the cleared array. if (@instruction_variants_for_current_instruction_array) { # push the variants array, unless it's empty. push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ]; } @instruction_variants_for_current_instruction_array = (); push @instruction_variants_for_current_instruction_array, $clos_object_name; } } } } # the last instruction's instruction variants must be pushed too. if (@instruction_variants_for_current_instruction_array) { # push the variants array, unless it's empty. push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ]; } close(FILE); # these objects need be created already during compilation. printf("(eval-when (:compile-toplevel :load-toplevel :execute)\n"); # print the code to create each instruction + operands combination object. for (my $i=0; $i <= $#mnemonic_array; $i++) { $clos_object_name = $object_name_array[$i]; $mnemonic = $mnemonic_array[$i]; $operands = $operands_array[$i]; $code_string = $code_string_array[$i]; $flags = $flags_array[$i]; # print the code to create a variant object. # each object here is a variant of a single instruction (or a single mnemonic). # actually printed as 6 lines to make it easier to read (for us humans, I mean), with an empty line in the end. printf("(setf %s (make-instance 'x86-asm-instruction\n:name %s\n:operands %s\n:code-string %s\n:arch-flags %s\n:is-variant t))", $clos_object_name, $mnemonic, $operands, $code_string, $flags); printf("\n\n"); } # print the code to create each instruction + operands combination object. # for (my $i=0; $i <= $#each_mnemonic_only_once_array; $i++) for my $i (0 .. $#instruction_variants_array) { $mnemonic = $each_mnemonic_only_once_array[$i]; # print the code to create a container object. printf("(setf %s (make-instance 'x86-asm-instruction :name \"%s\" :is-container t :variants (list \n", $mnemonic, $mnemonic); @instruction_variants_for_current_instruction_array = $instruction_variants_array[$i]; # for (my $j=0; $j <= $#instruction_variants_for_current_instruction_array; $j++) for my $j (0 .. $#{$instruction_variants_array[$i]} ) { printf("%s", $instruction_variants_array[$i][$j]); # print 3 closing brackets if this is the last variant. if ($j == $#{$instruction_variants_array[$i]}) { printf(")))"); } else { printf(" "); } } # if this is not the last instruction, print two newlines. if ($i < $#instruction_variants_array) { printf("\n\n"); } } # print the closing bracket to close `eval-when`. print(")"); exit;
Rainer Joswi.. 10
18636警告看起来非常糟糕,首先要删除所有警告.
我会从摆脱EVAL-WHEN
所有这一切开始.对我来说没有多大意义.直接加载文件,或编译并加载文件.
另请注意,(setf STOSB-void ...)
当变量未定义时,SBCL不喜欢.使用DEFVAR
或引入了新的顶级变量DEFPARAMETER
.SETF
只是设置它们,但不定义它们.这应该有助于摆脱警告.
此外:is-container t
,:is-variant t
这些属性的气味应该转换为继承的类(例如作为mixin).容器有变种.变体没有变体.
18636警告看起来非常糟糕,首先要删除所有警告.
我会从摆脱EVAL-WHEN
所有这一切开始.对我来说没有多大意义.直接加载文件,或编译并加载文件.
另请注意,(setf STOSB-void ...)
当变量未定义时,SBCL不喜欢.使用DEFVAR
或引入了新的顶级变量DEFPARAMETER
.SETF
只是设置它们,但不定义它们.这应该有助于摆脱警告.
此外:is-container t
,:is-variant t
这些属性的气味应该转换为继承的类(例如作为mixin).容器有变种.变体没有变体.