Code generation for DOES> in Gforth
Anton Ertl
2024-09-21 17:25:51 UTC
I recently noticed that Gforth still used the following COMPILE,
implementation for words defined with CREATE...SET-DOES> (and
consequently also for words defined with CREATE...DOES>):

: does, ( xt -- ) does-check ['] does-xt peephole-compile, , ;

Ignore DOES-CHECK (it has to do with stack-depth checking, still
incomplete). The rest means that it compiles the primitive DOES-XT
with the xt of the COMPILE,d word as immediate argument. DOES-XT
pushes the body of the word and then EXECUTEs the xt that SET-DOES>
has registered for this word. In most cases this is a colon
definition (always if DOES> is used), so the next thing that happens
is DOCOL, and then the code for the colon definition is run.
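A minimal Gforth sketch of the run-time behaviour just described. It uses SET-DOES> ( xt -- ) in the same way as the FVALUE code later in the thread. SEVEN is a hypothetical name used only for illustration. Executing the word pushes its body address and then EXECUTEs the registered xt, so it should behave exactly like doing those two steps by hand:

```forth
\ Gforth-specific: SET-DOES> registers the xt that the most recently
\ CREATEd word EXECUTEs after pushing its body address.
create seven 7 ,
' @ set-does>          \ registered xt here is the primitive @

seven .                \ prints 7: body pushed, then @ executed
' seven >body @ .      \ same two steps done by hand: prints 7
```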

I have now replaced this with

: does, ( xt -- ) does-check dup >body lit, >extra @ compile, ;

What this does is to compile the body as a literal, and then it
COMPILE,s the xt that DOES-XT would EXECUTE. In the common case of a
colon definition this compiles a call to the colon definition. This
saves the overhead of accessing the doesfield and of dispatching on
its contents at run-time; all that is now done during compilation.

Let us first look at the generated code. Consider the example:

: myconst create , does> @ ;
5 myconst five
: foo five ;

SIMPLE-SEE FOO shows:

old                             new
$7F6F5CAE6BC8 does-xt 1->1      $7F46A7EA92B8 lit 1->1
$7F6F5CAE6BD0 five              $7F46A7EA92C0 five
$7F6F5CAE6BD8 ;s 1->1           $7F46A7EA92C8 call 1->1
                                $7F46A7EA92D0 $7F46A7C0A168
                                $7F46A7EA92D8 ;s 1->1
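The new column can be mimicked in standard Forth by writing out the expansion by hand. The following is a sketch under one assumption: the DOES> part is split out into a separate colon definition with the hypothetical name MYCONST-DOES, which stands in for the xt that DOES-XT would EXECUTE. FOO2 then compiles the body address as a literal, followed by a call, and it should produce the same result as FOO:

```forth
: myconst ( n "name" -- ) create , does> ( -- n ) @ ;
5 myconst five

\ Hypothetical stand-in for the anonymous colon definition behind DOES>.
: myconst-does ( addr -- n ) @ ;

: foo ( -- n ) five ;                                  \ compiled via DOES,
: foo2 ( -- n ) [ ' five >body ] literal myconst-does ; \ lit <body>, call

foo . foo2 .           \ prints 5 5
```

The difference is only in where the dispatch happens. FOO2 fixes both the body address and the target at compile time, which is what the new DOES, does automatically.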

For the following microbenchmark:

: d1 ( "name" -- )
create 0 ,
does> ( -- addr )
; \ yes, an empty DOES> exists in an application program
d1 z1

: bench-z1-comp ( -- )
iterations 0 ?do
1 z1 +!
loop ;
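The post does not show how ITERATIONS is defined. The following runnable version assumes it is a constant. The value 1000 is arbitrary and much smaller than a real benchmark run would use. It also checks what the loop computes:

```forth
1000 constant iterations   \ assumed definition; not shown in the post

: d1 ( "name" -- )
  create 0 ,
  does> ( -- addr )
  ;
d1 z1

: bench-z1-comp ( -- )
  iterations 0 ?do
    1 z1 +!                \ Z1 pushes its body; +! increments that cell
  loop ;

bench-z1-comp
z1 @ .                     \ prints 1000
```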

I see the following results per iteration (startup overhead included)
on a Rocket Lake:

old new
8.2 7.5 cycles:u
34.0 29.0 instructions:u
5.2 4.2 branches:u

So there are five fewer instructions (including one fewer branch) per
iteration, resulting in a small speedup for this microbenchmark.

The Gforth image contained 129 occurrences of does-xt, and after the
change it contains 12 (a part of the image is created with the
cross-compiler, which still compiles to DOES-XT). As a result, the
image size and gforth-fast (AMD64) native-code size in bytes are as
follows:

old new
2189364 2193264 image
448291 448659 native-code

The larger image is no surprise. For the 117 replaced does-xts, the
threaded code grows by 2 cells each, and the meta-data grows
correspondingly.
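The image numbers can be checked with the Forth interpreter used as a calculator. This assumes 8-byte cells, as on AMD64. Of the 3900 bytes of growth, 117 replacements at 2 cells each account for 1872 bytes. The remaining 2028 bytes are consistent with the remark that the metadata grows correspondingly:

```forth
2193264 2189364 - .                          \ total growth: prints 3900
129 12 - .                                   \ replaced DOES-XTs: prints 117
129 12 - 2 cells * .                         \ threaded code: prints 1872
2193264 2189364 -  129 12 - 2 cells *  - .   \ remainder: prints 2028
```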

For the native code, the growth is less expected. Let's see how
the code looks:

does-xt             lit call
add rbx,$10         mov $00[r13],r8
mov $00[r13],r8     sub r13,$08
mov r8,-$08[rbx]    mov r8,$08[rbx]
sub r13,$08         mov rax,$18[rbx]
sub rbx,$08         sub r14,$08
mov rax,-$08[r8]    add rbx,$20
mov rdx,$18[rax]    mov [r14],rbx
mov rax,-$10[rdx]   mov rbx,rax
jmp eax             mov rax,[rbx]
                    jmp eax

34 bytes            35 bytes

Ok, it's larger, but one extra byte per replaced DOES-XT explains only
117 of the 368 extra bytes. Maybe the interaction with other
optimizations explains the rest.
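The native-code gap can be checked the same way, using only the figures above. The one-byte difference per sequence (35 versus 34 bytes), multiplied by 117 replacements, leaves 251 of the 368 bytes unaccounted for:

```forth
448659 448291 - .                     \ total growth: prints 368
35 34 - 117 * .                       \ 1 byte per replacement: prints 117
448659 448291 -  35 34 - 117 *  - .   \ unexplained: prints 251
```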

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2024: https://euro.theforth.net
Anton Ertl
2024-10-03 10:59:26 UTC
Post by Anton Ertl
I recently noticed that Gforth still used the following COMPILE,
implementation for words defined with CREATE...SET-DOES> (and
: does, ( xt -- ) does-check ['] does-xt peephole-compile, , ;
Ignore DOES-CHECK (it has to do with stack-depth checking, still
incomplete). The rest means that it compiles the primitive DOES-XT
with the xt of the COMPILE,d word as immediate argument. DOES-XT
pushes the body of the word and then EXECUTEs the xt that SET-DOES>
has registered for this word. In most cases this is a colon
definition (always if DOES> is used), so the next thing that happens
is DOCOL, and then the code for the colon definition is run.
I have now replaced this with
What this does is to compile the body as a literal, and then it
COMPILE,s the xt that DOES-XT would EXECUTE. In the common case of a
colon definition this compiles a call to the colon definition. This
saves the overhead of accessing the doesfield and of dispatching on
its contents at run-time; all that is now done during compilation.
Another benefit: Gforth used to have special COMPILE,
implementations for 2VALUE and FVALUE. Here's the old implementation
of FVALUE:

: opt-fval ( xt -- ) >body postpone Literal postpone f@ ;

create dummy-fvalue
' f@ set-does>
' fvalue-to set-to
' opt-fval set-optimizer

: fvalue ( r "name" -- ) \ floating-ext f-value
\g Define @i{name} @code{( -- r1 )} where @i{r1} initially is
\g @i{r}; this value can be changed with @code{to @i{name}} or
\g @code{->@i{name}}.
['] dummy-fvalue create-from reveal f, ;

The new DOES, generates exactly the same code for FVALUEs as OPT-FVAL
does, so we no longer need OPT-FVAL and the use of SET-OPTIMIZER here.
Likewise for 2VALUE. This simplification reduces the image size by
927 bytes and the native-code size by 176 bytes.
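The reason the code is identical is that the xt given to SET-DOES> is the primitive F@ itself, so COMPILE,ing it compiles F@ rather than a call. The following Gforth sketch is modelled on DUMMY-FVALUE but without the TO and optimizer parts. It assumes, as on 64-bit Gforth, that the body of a CREATEd word is suitably aligned for F,. BLA2 is the hand-written equivalent of what OPT-FVAL compiled:

```forth
create x 5e f,
' f@ set-does>          \ as in DUMMY-FVALUE: the registered xt is F@

: bla ( -- r ) x ;                           \ compiled via DOES,
: bla2 ( -- r ) [ ' x >body ] literal f@ ;   \ OPT-FVAL's expansion

bla bla2 f- f0= .       \ prints -1: both push the stored 5e
```

In a Gforth that includes the change, comparing SEE-CODE BLA with SEE-CODE BLA2 should show the same lit/f@ sequence as the listing below.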

The code compiled for an FVALUE looks as follows (it is the same
before and after the change):

5e fvalue x ok
: bla x ; ok
see-code bla
$7F1341F2D5A8 lit 1->2
$7F1341F2D5B0 x
7F1341A4EA63: mov r15,$08[rbx]
$7F1341F2D5B8 f@ 2->1
7F1341A4EA67: movsd [r12],xmm15
7F1341A4EA6D: movsd xmm15,[r15]
7F1341A4EA72: sub r12,$08
$7F1341F2D5C0 ;s 1->1
7F1341A4EA76: mov rbx,[r14]
7F1341A4EA79: add r14,$08
7F1341A4EA7D: mov rax,[rbx]
7F1341A4EA80: jmp eax

- anton