1パスS式アセンブラCOMFY-65を読む

S式をベースにした言語はそれこそLisp/Schemeのような高級言語からWeb Assemblyのテキスト表現( https://webassembly.github.io/spec/core/text/index.html )のようなある種の中間表現まで巾が有るが、一番ギリギリのラインは、COMFY-65やSassyのようなS式アセンブラだろう。

http://home.pipeline.com/~hbaker1/sigplannotices/sigcol03.pdf
- COMFY言語の解説(BAKER3) - (読み飛ばして良い)
http://home.pipeline.com/~hbaker1/sigplannotices/sigcol04.pdf
- COMFY-65の解説とソースリスト(BAKER4) - COMFYコンパイラの実装(Emacs Lisp)、プリミティブの解説
- http://home.pipeline.com/~hbaker1/lisp/ にソースが置いてある
http://sassy.sourceforge.net/sassy-Z-H-1.html
- Sassy(COMFYを採用したx86 アセンブラ、Scheme)のドキュメント - プリミティブの解説

これらはアセンブラなのでレジスタの割当は自前で行う必要があるが、ジャンプ命令/分岐命令の替わりにCOMFYと呼んでいるプリミティブを使用できる。COMFYの記事では6502とZ80に言及しているが、これらはそもそもレジスタが少いアーキテクチャなのでレジスタの割り当てに悩む必要性はあまりない。

1pass実装の難しさ = 可変長命令のfixup

いわゆる1pass言語の難しさは色々な側面がある。ココで度々引いているTURBO Pascalでは、高級言語としての言語仕様から実際のアセンブルに至るまで可能な限り1passとなるように設計されている。当然TURBO Pascalでは最適化は犠牲になっているが、同時に、生成されるコードにもある種の"特徴"が表われる結果になっている。
例えば、WHILEループが生成するコードを考えたとき、

https://www.pcengines.ch/tp3.htm

WHILE

l1: 	condition           ;evaluate condition
	J..   l2            ;:condition met ★ 条件判断
	JMP   l3 ★ 条件不成立時にループ本体を避ける
l2: 	statement
	JMP   l1            ;try again
l3:	                    ;end of loop

条件判断を逆方向にすれば"JMP l3"を避けることはできるんじゃないかという気はする。しかし、1passコンパイラではこの戦略は取れない。8086の条件分岐命令は全てshortジャンプ(飛び先のアドレスが命令相対の符号付き8bitアドレス)であるため、ループの本体が127バイトに収まらない限り"JMP l3"の省略は不可能であり、コードを左から右に解釈してコードを生成していく1passコンパイラではループ本体のサイズを事前に知ることができない。(実際のTURBO PascalはCASE文や他のケースである種のバックトラックを行うことがある。)
...もちろん最初に最も長いジャンプを想定してコード生成しても、後から生成したコードをパッチする方向の実装によってジャンプを最適化することは不可能ではない。この点はCOMFYの作者も指摘している。

http://home.pipeline.com/~hbaker1/Prag-Parse.html

Since one already knows the location to which one must jump at the time a branch is emitted, as well as the location of the current instruction, one can easily choose the correct short/long jump sequence.

COMFY-65はもっと直接的なトリックで1passを実現している。つまりプログラムを完全に逆順で生成し、コードも高位アドレスから低位アドレスに向かって書き込みを行っていく。これは要するに深さ優先のトラバースに相当するため、単にコード生成手続き(compile)を再帰的に呼べば良い。これは再帰呼び出しのためのコストがコンパイル時に掛かり、(通常のPIC - Position Independent Code - のような)プログラム再配置のための考察が現実的なワークロードのためには必要になる。

http://home.pipeline.com/~hbaker1/lisp/CFYCMP1.LSP

(defvar mem (make-vector 10 0) ;; 生成した機械語を格納するためのベクタ
  "Vector where the compiled code is placed.")
(setq mem (make-vector 100 0))

(defvar f (length mem) ;; 生成先を表わすポインタ
  "Compiled code array pointer; it works its way down from the top.")
(setq f (length mem))

(defun init () ;; 初期化手続き
  (fillarray mem 0)
  (setq f (length mem)))

- snip -

(defun gen (obj) ;; 1バイト出力手続き
  ;;; place one character "obj" into the stream.
  (setq f (1- f)) ;; 1減算
  (aset mem f obj)
  f) ;; 書き込み先のアドレスを返す

COMFYコンパイラ

(以下はBAKER4の方を参照している。)
COMPFYコンパイラのインターフェースは単純で、

(compile 式 <win-address> <lose-address>) ;; => address

のようになっている。式が分岐命令(test)であった場合は条件によってwinまたはloseのいずれかにジャンプするコードを出力する。(Schemeで言うところのcaseのように分岐先が多数に渡る場合は小細工するしかない。)
COMFYでは、式を以下の3種に分類している:

test: 分岐命令。条件によってwinまたはloseのアドレスを次に実行するようなコードを出力する。
action: 通常の命令。IP(Instruction Pointer: 命令ポインタ)を書き換えない命令全て。
jump: ジャンプ命令。IPを書き換える命令で6502の場合RTS(callからのリターン、BAKER4ではreturn)およびRTI(割り込みからのリターン、BAKER4ではresume)。

分岐の生成はかなり単純化される: コードを逆順で生成することにより、COMFYのプリミティブで生成されるジャンプ先アドレスは常に既知となるため、単にshort jumpが生成できるかどうかを検証して必要なジャンプ命令を出力するだけとなる。

(defun inv (c) ;; 6502条件分岐OPコードの条件を反転させる
  ;;; invert the condition for a branch.
  ;;; invert bit 5 (counting from the right).
  (logxor c 32))

(defun genbr (win)
  ;;; generate an unconditional jump to "win".
  (gen 0) (gen 0) (gen jmp) (ra f win))

(defun 8bitp (n)
  (let* ((m (logand n -128)))
    (or (= 0 m) (= -128 m))))

(defun genbrc (c win lose)
  ;;; generate an optimized conditional branch
  ;;; on condition c to "win" with failure to "lose".
  (let* ((w (- win f)) (l (- lose f)))   ;;; Normalize to current point.
    ;; w = winアドレスまでの距離、l = loseアドレスまでの距離
    (cond ((= w l) win) ;; win == lose ならそもそも分岐でない。何もせず帰る。
          ;; lose側が直下(距離ゼロ)でwin側にshort分岐命令で到達できるなら、short分岐命令を出力
          ;; (genは8bit値を直接出力、cには分岐命令のOPコードが代入されている)
          ((and (= l 0) (8bitp w)) (gen w) (gen c))
          ;; 逆にwin側が直下でlose側にshort分岐命令で到達できるなら、条件をinvで反転して出力
          ((and (= w 0) (8bitp l)) (gen l) (gen (inv c)))
          ;; win/loseどちらも直下ではないがshortで到達できる場合、loseの分岐ジャンプ、winの分岐ジャンプ の順に出力
          ((and (8bitp l) (8bitp (- w 2)))
           (gen l) (gen (inv c)) (gen (- w 2)) (gen c))
          ;; (上記の逆パタン - win/loseのどちらかは2バイトぶん要件がゆるいため両者のチェックが必要)
          ((and (8bitp w) (8bitp (- l 2)))
           (gen w) (gen c) (gen (- l 2)) (gen (inv c)))
          ;; win側がshortで到達できないがlose側が到達できるケース
          ((8bitp (- l 3)) (genbrc c (genbr win) lose))
          ;; fallback。長いジャンプ命令でloseに飛ばし、再帰してwin側の分岐命令を生成する
          (t (genbrc c win (genbr lose))))))

各プリミティブの実装

COMFYプリミティブのうち、not、seq、loop、if、whileはCompile内部で特別扱いされている。他のプリミティブはマクロで実装できる。
notは、単にwinとloseを逆にしてcompileを再度呼べば実装できる。seqも再帰的に生成されるが、上記のようにジャンプは最適化された形で生成されるので無駄は少い。

(defun ra (b a) ;; loop/whileで使用される: 一度生成したジャンプのアドレスをパッチする
  ;;; replace the absolute address at the instruction "b"
  ;;; by the address "a".
  (let* ((ha (lsh a -8))
         (la (logand a 255)))
    (aset mem (1+ b) la)
    (aset mem (+ b 2) ha))
  b)

(defun compile (e win lose)
  ;;; compile expression e with success continuation "win" and
  ;;; failure continuation "lose".
  ;;; "win" an "lose" are both addresses of stuff higher in memory.
  (cond ((numberp e) (gen e))           ; allow constants.
        ((macrop e)
         (compile (apply (get e 'cmacro) (list e)) win lose))
        ((jumpp e) (gen (get e 'jump))) ; must be return or resume.
        ((actionp e) (emit e win))      ; single byte instruction.
        ((testp e) (genbrc (get e 'test) win lose)) ; test instruction ;; 分岐命令
        ((eq (car e) 'not) (compile (cadr e) lose win)) ;; not: win/loseを逆にして再帰
        ((eq (car e) 'seq)
         (cond ((null (cdr e)) win) ;; (seq) は単にwinへ進む。
               (t (compile (cadr e)
                           (compile (cons 'seq (cddr e)) win lose)
                           lose))))
        ((eq (car e) 'loop)
         (let* ((l (genbr 0)) ;; ループ本体の末尾に配置するlongジャンプ命令
                              ;; (プログラムは逆順に生成しているので、これが先に生成される)
                (r (compile (cadr e) l lose))) ;; ループ本体をアセンブルする
           (ra l r) ;; (genbr 0)で生成したジャンプ命令をパッチする
           r)) ;; ループ先頭のアドレスを返す
        ((numberp (car e))              ; duplicate n times.
         (cond ((zerop (car e)) win)
               (t (compile (cons (1- (car e)) (cdr e))
                           (compile (cadr e) win lose)
                           lose))))
        ((eq (car e) 'if)               ; if-then-else.
         (compile (cadr e)
                  (compile (caddr e) win lose)
                  (compile (cadddr e) win lose)))
        ((eq (car e) 'while)            ; do-while.
         (let* ((l (genbr 0))
                (r (compile (cadr e)
                            (compile (caddr e) l lose)
                            win)))
           (ra l r)
           r))
        ;;; allow for COMFY macros !
        ((macrop (car e))
         (compile (apply (get (car e) 'cmacro) (list e)) win lose))
        (t (emit e win))))