Post by dxfPost by Anton ErtlWhere can I find claims about better performance? All I have read is
claims about worse performance.
'Eliminate stack juggling' sounds like an argument for better performance.
Not to me. To me it sounds like a statement about the ease of writing
and reading the code.
The performance of locals vs. stack juggling depends on the
implementation. I know no implementation that performs register
allocation of locals or stack items (except the TOS) to registers
across basic block boundaries. This seems to hurt code with locals
more than code that keeps everything on the stacks. Here's the data
from an earlier posting <***@mips.complang.tuwien.ac.at>,
now including data from iForth:
locals stack
401 336 gforth-fast (AMD64)
179 132 lxf 1.6-982-823 (IA-32)
182 119 VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
241 159 VFX Forth 64 5.43 (AMD64)
163 175 iforth-5.1 mini (AMD64)
The data from iForth is the outlier here, let's look at the code:
Source code:
defer dummy
: z" [char] " parse 2drop postpone dummy ; immediate
defer zformat
defer z+
defer >name
defer error
: VICHECK1 {: pindex paddr -- pindex' paddr :} \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
pindex 0 paddr @ WITHIN IF \ Index is valid
pindex paddr
ELSE \ Index is invalid
Z" Invalid index " pindex ZFORMAT Z+
Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
Z" length " Z+ paddr @ ZFORMAT Z+
ERROR
0 paddr \ Use zeroth index
THEN ;
: VICHECK2 ( pindex paddr -- pindex' paddr ) \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
over 0 2 pick @ WITHIN 0= IF \ Index is invalid
Z" Invalid index " 2 PICK ZFORMAT Z+
Z" for " Z+ OVER CELL- @ Z+ \ Add NFA from extra cell
Z" length " Z+ OVER @ ZFORMAT Z+
ERROR
NIP 0 SWAP \ Use zeroth index
THEN ;
One difference is that VICHECK2 does not just replace the locals with
stack stuff and eliminate the first branch of the IF, but also
replaces ">NAME 1+" with "CELL- @".
Disassembled code:
VICHECK1 VICHECK2
pop rbx pop rbx
lea rsi, [rsi #-16 +] qword mov rdi, [rsp] qword
mov [esi] dword, rbx push rbx
pop rbx push rdi
lea rsi, [rsi #-16 +] qword push 0 b#
mov [esi] dword, rbx mov rbx, [rsp #16 +] qword
mov rbx, [rsi #16 +] qword pop rdi
mov rbx, [rbx] qword mov rax, rdi
mov rdi, [rsi] qword sub rax, [rbx] qword
cmp rbx, rdi neg rax
jbe $10227337 offset NEAR pop rbx
push [rsi] qword sub rbx, rdi
push [rsi #16 +] qword cmp rax, rbx
jmp $10227395 offset NEAR seta bl
call $10226600 qword-offset movzx rbx, bl
push [rsi] qword neg rbx
call $10226E90 qword-offset cmp rbx, 0 b#
call $10226EB0 qword-offset jne $10227465 offset NEAR
call $10226600 qword-offset call $10226600 qword-offset
call $10226EB0 qword-offset mov rbx, [rsp #16 +] qword
push [rsi #16 +] qword push rbx
call $10226ED0 qword-offset call $10226E90 qword-offset
pop rbx call $10226EB0 qword-offset
lea rbx, [rbx 1 +] qword call $10226600 qword-offset
push rbx call $10226EB0 qword-offset
call $10226EB0 qword-offset pop rbx
call $10226600 qword-offset mov rdi, [rsp] qword
call $10226EB0 qword-offset push rbx
mov rbx, [rsi #16 +] qword push [rdi -8 +] qword
push [rbx] qword call $10226EB0 qword-offset
call $10226E90 qword-offset call $10226600 qword-offset
call $10226EB0 qword-offset call $10226EB0 qword-offset
call $10226EF0 qword-offset pop rbx
push 0 b# mov rdi, [rsp] qword
push [rsi #16 +] qword push rbx
add rsi, #32 b# push [rdi] qword
; call $10226E90 qword-offset
call $10226EB0 qword-offset
call $10226EF0 qword-offset
pop rbx
pop rdi
mov rdi, 0 d#
mov rcx, rdi
push rcx
push rbx
;
iForth 5.1-mini does not even keep the TOS in a register on basic
block boundaries, which results in pops and pushes at all the
boundaries, especially for the stack-only code. However, in the
actual application (where Z", ZFORMAT etc. don't compile as deferred
words) it would probably inline many of these words which might result
in better code for the stack variant. It does not keep locals in
stack items, either, but accesses them in memory through a separate
stack pointer.
The code at the start of VICHECK2 does not suffer from basic block
boundaries, yet makes less use of registers than I expected. By
contrast, in VICHECK1 iforth discovers that "0 paddr @ within" is
equivalent to "paddr @ u<", while for "0 2 pick @ within" it fails to
make the equivalent discovery.
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2024: https://euro.theforth.net