Forth systems where DO/?DO pushes the loop start address

Anton Ertl

2024-03-04 17:24:09 UTC

Many years ago I read here about Forth systems where DO and ?DO
push three items on the return stack: the two values from the data
stack (initial index and limit) like many other Forth systems, but in
addition they also push the address that LOOP/+LOOP later jumps to.

I used to consider this to be inefficient, but it turns out that in an
efficient interpreter-based Forth system like, say, gforth-fast from
2022, it would actually be more efficient than compiling that address
after the (LOOP)/(+LOOP) and loading it from there.

My question is: Which Forth systems have a DO/?DO that pushes the
address that LOOP/+LOOP then jumps to?
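
To make the two variants concrete, here is a minimal sketch of a toy
threaded-code interpreter in Python. It is only an illustration: the
opcode names "(do)-rstack" and "(loop)-rstack", the run function and the
cell layout are assumptions made for this sketch and are not taken from
Gforth or any other system. In the first program the loop-back address
is a cell compiled after (loop); in the second, (do)-rstack pushes it on
the return stack as a third item.

```python
def run(code):
    """Run a toy threaded-code program; return the final data stack."""
    data, rstack, ip = [], [], 0
    while ip < len(code):
        op = code[ip]
        ip += 1
        if op == "lit":                  # push the inline literal
            data.append(code[ip])
            ip += 1
        elif op == "1+":
            data.append(data.pop() + 1)
        elif op == "(do)":               # two items: limit, index
            index, limit = data.pop(), data.pop()
            rstack += [limit, index]
        elif op == "(loop)":             # loop-back address is an inline cell
            rstack[-1] += 1
            if rstack[-1] != rstack[-2]:
                ip = code[ip]            # target is loaded through ip itself
            else:
                del rstack[-2:]
                ip += 1                  # skip the inline cell
        elif op == "(do)-rstack":        # three items: loop start, limit, index
            index, limit = data.pop(), data.pop()
            rstack += [ip, limit, index]
        elif op == "(loop)-rstack":      # loop-back address comes from the return stack
            rstack[-1] += 1
            if rstack[-1] != rstack[-2]:
                ip = rstack[-3]
            else:
                del rstack[-3:]
        else:
            raise ValueError(op)
    return data

# Both programs stand for:  0  10 0 DO 1+ LOOP
INLINE = ["lit", 0, "lit", 10, "lit", 0, "(do)", "1+", "(loop)", 7]
ON_RSTACK = ["lit", 0, "lit", 10, "lit", 0, "(do)-rstack", "1+", "(loop)-rstack"]
```

Both programs leave 10 on the data stack; the only difference is where
the loop-closing word finds its branch target.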

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023: https://euro.theforth.net/2023

Krishna Myneni

2024-03-05 04:54:50 UTC

On 3/4/24 11:24, Anton Ertl wrote:
> Many years ago I have read here about Forth systems where DO and ?DO
> push three items on the return stack: the two values from the data
> stack (initial index and limit) like many other Forth systems, but in
> addition they also push the address that LOOP/+LOOP later jumps to.
>
> I used to consider this to be inefficient, but it turns out that in an
> efficient interpreter-based Forth system like, say gforth-fast from
> 2022 it would actually be more efficient than compiling that address
> with the (LOOP)/(+LOOP) and loading it from there.
>
> My question is: Which Forth systems have a DO/?DO that pushes the
> address that LOOP/+LOOP then jumps to?
>
> - anton

Yes, kForth uses this method. DO pushes three items onto the return
stack, the two loop parameters, and the virtual instruction pointer.

\ From ForthVM.cpp

int CPP_do ()
{
  // stack: ( -- | generate opcodes for beginning of loop structure )

  pCurrentOps->push_back(OP_PUSH);
  pCurrentOps->push_back(OP_PUSH);
  pCurrentOps->push_back(OP_PUSHIP);

  dostack.push(pCurrentOps->size());
  return 0;
}

--
Krishna

Krishna Myneni

2024-03-05 04:57:09 UTC

On 3/4/24 22:54, Krishna Myneni wrote:
> On 3/4/24 11:24, Anton Ertl wrote:
>> Many years ago I have read here about Forth systems where DO and ?DO
>> push three items on the return stack: the two values from the data
>> stack (initial index and limit) like many other Forth systems, but in
>> addition they also push the address that LOOP/+LOOP later jumps to.
>>
>> I used to consider this to be inefficient, but it turns out that in an
>> efficient interpreter-based Forth system like, say gforth-fast from
>> 2022 it would actually be more efficient than compiling that address
>> with the (LOOP)/(+LOOP) and loading it from there.
>>
>> My question is: Which Forth systems have a DO/?DO that pushes the
>> address that LOOP/+LOOP then jumps to?
>>
>> - anton
>
> Yes, kForth uses this method. DO pushes three items onto the return
> stack, the two loop parameters, and the virtual instruction pointer.
>
> \ From ForthVM.cpp
>
> int CPP_do ()
> {
> // stack: ( -- | generate opcodes for beginning of loop structure )
>
> pCurrentOps->push_back(OP_PUSH);
> pCurrentOps->push_back(OP_PUSH);
> pCurrentOps->push_back(OP_PUSHIP);
>
> dostack.push(pCurrentOps->size());
> return 0;
> }
>

To be clear, DO compiles three VM instructions to push the items onto
the return stack.

--
Krishna

Anton Ertl

2024-03-05 11:38:11 UTC

Krishna Myneni <***@ccreweb.org> writes:
>Yes, kForth uses this method. DO pushes three items onto the return
>stack, the two loop parameters, and the virtual instruction pointer.

Thanks. You can find the performance benefit from that in gforth-fast
in the right bar of each benchmark
<http://www.complang.tuwien.ac.at/anton/tmp/select-uarch.eps>. It
provides pretty good speedups for siev, bubble, and matrix, and small
speedups in sha512.

- anton

Anton Ertl

2024-03-09 18:26:23 UTC

Krishna Myneni <***@ccreweb.org> writes:
>Yes, kForth uses this method. DO pushes three items onto the return
>stack, the two loop parameters, and the virtual instruction pointer.
>
>\ From ForthVM.cpp
>
>int CPP_do ()
>{
> // stack: ( -- | generate opcodes for beginning of loop structure )
>
> pCurrentOps->push_back(OP_PUSH);
> pCurrentOps->push_back(OP_PUSH);
> pCurrentOps->push_back(OP_PUSHIP);
>
> dostack.push(pCurrentOps->size());
> return 0;
>}

Thanks. Why do you do it this way? Do you want to break dependence
chains on the virtual instruction pointer (the reason for the speedup
in my results)?

- anton

a***@spenarnc.xs4all.nl

2024-03-05 08:36:39 UTC

In article <***@mips.complang.tuwien.ac.at>
Anton Ertl <***@mips.complang.tuwien.ac.at> wrote:
>Many years ago I have read here about Forth systems where DO and ?DO
>push three items on the return stack: the two values from the data
>stack (initial index and limit) like many other Forth systems, but in
>addition they also push the address that LOOP/+LOOP later jumps to.
>
>I used to consider this to be inefficient, but it turns out that in an
>efficient interpreter-based Forth system like, say gforth-fast from
>2022 it would actually be more efficient than compiling that address
>with the (LOOP)/(+LOOP) and loading it from there.
>
>My question is: Which Forth systems have a DO/?DO that pushes the
>address that LOOP/+LOOP then jumps to?

All the versions of ciforth MS/Linux/OSX 32/64 ARM/86 do this.

>
>- anton
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat purring. - the Wise from Antrim -

Anton Ertl

2024-03-05 11:58:37 UTC

***@spenarnc.xs4all.nl writes:
>In article <***@mips.complang.tuwien.ac.at>
>>My question is: Which Forth systems have a DO/?DO that pushes the
>>address [at run-time to the return stack] that LOOP/+LOOP then jumps to?
>
>All the versions of ciforth MS/Linux/OSX 32/64 ARM/86 do this.

Thanks. AFAIK you started with fig-Forth that puts the loop-back
address in the interpreted code. Why did you change this approach?

- anton

a***@spenarnc.xs4all.nl

2024-03-05 13:19:05 UTC

In article <***@mips.complang.tuwien.ac.at>,
Anton Ertl <***@mips.complang.tuwien.ac.at> wrote:
>***@spenarnc.xs4all.nl writes:
>>In article <***@mips.complang.tuwien.ac.at>
>>>My question is: Which Forth systems have a DO/?DO that pushes the
>>>address [at run-time to the return stack] that LOOP/+LOOP then jumps to?
>>
>>All the versions of ciforth MS/Linux/OSX 32/64 ARM/86 do this.
>
>Thanks. AFAIK you started with fig-Forth that puts the loop-back
>address in the interpreted code. Why did you change this approach?

The address that I push is the address after the loop.
So LEAVE as well as LOOP only discards the loop parameters and goes to NEXT.
(DO) is followed by a (FORWARD half jump, it doesn't jump over the
body but is resolved by a FORWARD) , so it knows what
address to push.
If I remember correctly the original FIG LEAVE was not ISO, so this
had to be fixed anyway. LEAVE and UNLOOP are almost synonyms.
Simple manipulations of the return stack are preferred in view of my
optimiser, which can push return stack items into oblivion (registers).

DO LOOP in FIG / ISO, say FORTH, is a mess anyway. The idea that
signed/unsigned numbers can be handled uniformly was cute at the
time, when you could not spare 10 bytes. In 50 years no novice
has even dared to try negative indices or negative increments.

>- anton

Groetjes Albert

a***@spenarnc.xs4all.nl

2024-03-05 19:18:39 UTC

In article <nnd$091faf4b$***@59a4330bdcfeaef0>,
<***@spenarnc.xs4all.nl> wrote:
>In article <***@mips.complang.tuwien.ac.at>,
>Anton Ertl <***@mips.complang.tuwien.ac.at> wrote:
>>***@spenarnc.xs4all.nl writes:
>>>In article <***@mips.complang.tuwien.ac.at>
>>>>My question is: Which Forth systems have a DO/?DO that pushes the
>>>>address [at run-time to the return stack] that LOOP/+LOOP then jumps to?
>>>
>>>All the versions of ciforth MS/Linux/OSX 32/64 ARM/86 do this.
>>
>>Thanks. AFAIK you started with fig-Forth that puts the loop-back
>>address in the interpreted code. Why did you change this approach?
>
>The address that I push is the address after the loop.
>So LEAVE as well as LOOP discards only loop parameters and go NEXT.
>(DO) is followed by a (FORWARD half jump, it doesn't jump over the
>body but is resolved by a FORWARD) , so it knows what
>address to push.
>If I remember correctly the original FIG LEAVE was not ISO, so this
>had to be fixed anyway. LEAVE and UNLOOP are almost synonyms.
>Simple manipulation of the return stack are preferred in view of my
>optimiser that can push return stack items into oblivion (registers).
>
>DO LOOP in FIG / ISO say FORTH is a mess anyway. The idea that
>signed/unsigned numbers can be handled uniformly was cute at the
>time, when you could not spare 10 bytes. In the 50 years no novice
>even dared to try negative indices or negative increments.
>
>>- anton
>Groetjes Albert

I looked at your original post again. Actually this is different.
+LOOP does a branch back. The address pushed on the return stack
is the address past the loop.
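
As a rough illustration of this exit-address scheme, here is a toy
threaded-code interpreter in Python. It is a sketch under assumptions,
not ciforth's code: the opcode names, the "?leave" word (IF LEAVE THEN
folded into one op to keep the example short) and the cell layout are
invented here. (do) takes the address past the loop from an inline cell
and pushes it below limit and index; LEAVE then only has to drop the
three items and continue at that address, while (loop) still branches
back through its own inline cell.

```python
def run(code):
    """Run a toy threaded-code program; return the final data stack."""
    data, rstack, ip = [], [], 0
    while ip < len(code):
        op = code[ip]
        ip += 1
        if op == "lit":                  # push the inline literal
            data.append(code[ip])
            ip += 1
        elif op == "1+":
            data.append(data.pop() + 1)
        elif op == "i":                  # copy the current loop index
            data.append(rstack[-1])
        elif op == "=":
            b, a = data.pop(), data.pop()
            data.append(a == b)
        elif op == "(do)":               # inline cell holds the address past the loop
            index, limit = data.pop(), data.pop()
            rstack += [code[ip], limit, index]
            ip += 1
        elif op == "?leave":             # hypothetical word: leave if the flag is true
            if data.pop():
                ip = rstack[-3]          # continue at the exit address ...
                del rstack[-3:]          # ... after discarding the loop parameters
        elif op == "(loop)":             # branches back via its inline cell
            rstack[-1] += 1
            if rstack[-1] != rstack[-2]:
                ip = code[ip]
            else:
                del rstack[-3:]
                ip += 1                  # fall through past the inline cell
        else:
            raise ValueError(op)
    return data

# Stands for:  0  10 0 DO  1+  I 3 = IF LEAVE THEN  LOOP
PROGRAM = ["lit", 0, "lit", 10, "lit", 0, "(do)", 16,
           "1+", "i", "lit", 3, "=", "?leave", "(loop)", 8]
```

The loop body runs for I = 0, 1, 2, 3 and then leaves, so the counter
on the data stack ends up at 4.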

Groetjes Albert

Anton Ertl

2024-03-09 18:28:45 UTC

***@spenarnc.xs4all.nl writes:
>DO LOOP in FIG / ISO say FORTH is a mess anyway. The idea that
>signed/unsigned numbers can be handled uniformly was cute at the
>time, when you could not spare 10 bytes. In the 50 years no novice
>even dared to try negative indices or negative increments.

LOOP is fine. +LOOP with negative increment is more problematic
(that's why Gforth has -LOOP), but it turns out that for running
backwards through an array, +LOOP with negative increment actually
works out ok. But Gforth now has MEM-DO..LOOP so you don't need to
worry about that.

- anton

mhx

2024-03-05 09:32:34 UTC

Anton Ertl wrote:
[..]
> My question is: Which Forth systems have a DO/?DO that pushes the
> address that LOOP/+LOOP then jumps to?

It is not 100% clear what you mean.
In iForth I do something special with both DO and LOOP , where the
LOOP action is probably closest to your question.

FORTH> : test 22 10 2 do 1+ loop . ; ok
FORTH> see test
Flags: ANSI
$01340A00 : test
$01340A0A push #22 b#
$01340A0C mov rcx, #10 d#
$01340A13 mov rbx, 2 d#
$01340A1A call (DO) offset NEAR
$01340A24 lea rax, [rax 0 +] qword
$01340A28 lea rbx, [rbx 1 +] qword
$01340A2C add [rbp 0 +] qword, 1 b#
$01340A31 add [rbp 8 +] qword, 1 b#
$01340A36 jno $01340A28 offset NEAR
$01340A3C add rbp, #24 b#
$01340A40 push rbx
$01340A41 jmp .+10 ( $0124A102 ) offset NEAR

Or, without SYMBOLIC disguising the (DO) machine code:

FORTH> false TO symbolic ok
FORTH> $01340A1A idis
$01340A1A call $012413F8 offset NEAR
$01340A1F jmp $01340A3C offset NEAR
$01340A24 lea rax, [rax 0 +] qword
$01340A28 lea rbx, [rbx 1 +] qword
$01340A2C add [rbp 0 +] qword, 1 b#
$01340A31 add [rbp 8 +] qword, 1 b#
$01340A36 jno $01340A28 offset NEAR
$01340A3C add rbp, #24 b#
$01340A40 push rbx
$01340A41 jmp $0124A102 offset NEAR

-marcel

Anton Ertl

2024-03-05 11:18:36 UTC

***@iae.nl (mhx) writes:
>Anton Ertl wrote:
>[..]
>> My question is: Which Forth systems have a DO/?DO that pushes the
>> address that LOOP/+LOOP then jumps to?
>
>It is not 100% clear what you mean.
>In iForth I do something special with both DO and LOOP , where the
>LOOP action is probably closest to your question.
>
>FORTH> : test 22 10 2 do 1+ loop . ; ok
>FORTH> see test
>Flags: ANSI
>$01340A00 : test
>$01340A0A push #22 b#
>$01340A0C mov rcx, #10 d#
>$01340A13 mov rbx, 2 d#
>$01340A1A call (DO) offset NEAR
>$01340A24 lea rax, [rax 0 +] qword
>$01340A28 lea rbx, [rbx 1 +] qword
>$01340A2C add [rbp 0 +] qword, 1 b#
>$01340A31 add [rbp 8 +] qword, 1 b#
>$01340A36 jno $01340A28 offset NEAR
>$01340A3C add rbp, #24 b#
>$01340A40 push rbx
>$01340A41 jmp .+10 ( $0124A102 ) offset NEAR

Native-code systems generally use direct (conditional) jumps to the
loop start, like iforth does here with the jno.

What I meant is that some (interpreter-based) systems keep the loop
start address ($01340A28 in this example) on the return stack, and
LOOP/+LOOP takes it from there and then performs a (VM-level) jump
there (unless the loop is exited).

- anton

Stephen Pelc

2024-03-05 11:18:31 UTC

On 4 Mar 2024 at 18:24:09 CET, "Anton Ertl" <Anton Ertl> wrote:
>
> My question is: Which Forth systems have a DO/?DO that pushes the
> address that LOOP/+LOOP then jumps to?
>
> - anton

VFX since the beginning.

Stephen

--
Stephen Pelc, ***@vfxforth.com
MicroProcessor Engineering, Ltd. - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)78 0390 3612, +34 649 662 974
http://www.mpeforth.com - free VFX Forth downloads

dxf

2024-03-05 23:42:41 UTC

On 5/03/2024 10:18 pm, Stephen Pelc wrote:
> On 4 Mar 2024 at 18:24:09 CET, "Anton Ertl" <Anton Ertl> wrote:
>>
>> My question is: Which Forth systems have a DO/?DO that pushes the
>> address that LOOP/+LOOP then jumps to?
>>
>> - anton
>
> VFX since the beginning.

AFAICS the loop jump addr is hard-coded (JNO) as that was generally
seen as most efficient:

: test 10 0 do loop ; ok
see test
TEST
( 005945D0 488D6DF0 ) LEA RBP, [RBP+-10]
( 005945D4 48C745000A000000 ) MOV QWord [RBP], # 0000000A
( 005945DC 48895D08 ) MOV [RBP+08], RBX
( 005945E0 BB00000000 ) MOV EBX, # 00000000
( 005945E5 E86615E9FFFF45590000000 CALL 00425B50 (DO) 00000000005945FF
( 005945F2 49FFC6 ) INC R14
( 005945F5 49FFC7 ) INC R15
( 005945F8 71F8 ) JNO 005945F2
( 005945FA 415E ) POP R14
( 005945FC 415F ) POP R15
( 005945FE 58 ) POP RAX
( 005945FF C3 ) RET/NEXT
( 48 bytes, 12 instructions )

Stephen Pelc

2024-03-06 19:52:30 UTC

On 6 Mar 2024 at 00:42:41 CET, "dxf" <***@gmail.com> wrote:

> On 5/03/2024 10:18 pm, Stephen Pelc wrote:
>> On 4 Mar 2024 at 18:24:09 CET, "Anton Ertl" <Anton Ertl> wrote:
>>>
>>> My question is: Which Forth systems have a DO/?DO that pushes the
>>> address that LOOP/+LOOP then jumps to?
>>>
>>> - anton
>>
>> VFX since the beginning.
>
> AFAICS the loop jump addr is hard-coded (JNO) as that was generally
> seen as most efficient:
>
> : test 10 0 do loop ; ok
> see test
> TEST
> ( 005945D0 488D6DF0 ) LEA RBP, [RBP+-10]
> ( 005945D4 48C745000A000000 ) MOV QWord [RBP], # 0000000A
> ( 005945DC 48895D08 ) MOV [RBP+08], RBX
> ( 005945E0 BB00000000 ) MOV EBX, # 00000000
> ( 005945E5 E86615E9FFFF45590000000 CALL 00425B50 (DO) 00000000005945FF
> ( 005945F2 49FFC6 ) INC R14
> ( 005945F5 49FFC7 ) INC R15
> ( 005945F8 71F8 ) JNO 005945F2
> ( 005945FA 415E ) POP R14
> ( 005945FC 415F ) POP R15
> ( 005945FE 58 ) POP RAX
> ( 005945FF C3 ) RET/NEXT
> ( 48 bytes, 12 instructions )

The three items pushed are:
  loop exit address
  limit data of previous loop
  index data of previous loop

The slightly odd list of items allows us to keep the index/limit data in
registers.
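
One way to read this list: because DO saves the enclosing loop's index
and limit, the running loop's own values never need to live in memory.
The bookkeeping can be sketched in a few lines of Python. The class and
method names are invented for illustration, the two attributes merely
stand in for registers, and nothing here claims to reproduce VFX's
actual code.

```python
class LoopState:
    """Current loop's index/limit live in 'registers'; outer values sit on the return stack."""

    def __init__(self):
        self.index = None      # stands in for one register
        self.limit = None      # stands in for another register
        self.rstack = []

    def do(self, limit, start, exit_addr):
        # Push: loop exit address, limit of previous loop, index of previous loop.
        self.rstack += [exit_addr, self.limit, self.index]
        self.limit, self.index = limit, start

    def unloop(self):
        # Restore the previous loop's data and hand back the exit address.
        self.index = self.rstack.pop()
        self.limit = self.rstack.pop()
        return self.rstack.pop()

s = LoopState()
s.do(10, 0, exit_addr=100)     # outer loop
s.do(3, 1, exit_addr=200)      # inner loop: outer limit/index are now on the return stack
```

Leaving the inner loop takes three pops, which would be consistent
with the POP R14 / POP R15 / POP RAX sequence at the end of the listing
quoted above.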

Stephen

Ruvim

2024-03-06 13:28:45 UTC

On 2024-03-04 21:24, Anton Ertl wrote:
> Many years ago I have read here about Forth systems where DO and ?DO
> push three items on the return stack: the two values from the data
> stack (initial index and limit) like many other Forth systems, but in
> addition they also push the address that LOOP/+LOOP later jumps to.
>
> I used to consider this to be inefficient, but it turns out that in an
> efficient interpreter-based Forth system like, say gforth-fast from
> 2022 it would actually be more efficient than compiling that address
> with the (LOOP)/(+LOOP) and loading it from there.
>
> My question is: Which Forth systems have a DO/?DO that pushes the
> address that LOOP/+LOOP then jumps to?

In SP-Forth v3 and v4 (they generate native code), "DO" pushes three
items on the return stack, and among them the address that "LEAVE" then
jumps to. The address (actually, an offset) for "LOOP" is inlined.

Code in SP-Forth/4

: foo 7 1 do i . loop ; see foo

579793 8945FC MOV FC [EBP] , EAX
579796 C745F807000000 MOV F8 [EBP] , # 7
57979D B801000000 MOV EAX , # 1
5797A2 8BD8 MOV EBX , EAX
5797A4 81C000000080 ADD EAX , # 80000000
5797AA 2B45F8 SUB EAX , F8 [EBP]
5797AD 8BD0 MOV EDX , EAX
5797AF 8B45FC MOV EAX , FC [EBP]
5797B2 68D7975700 PUSH , # 5797D7
5797B7 52 PUSH EDX
5797B8 53 PUSH EBX
5797B9 90 XCHG EAX, EAX
5797BA 90 XCHG EAX, EAX
5797BB 90 XCHG EAX, EAX
5797BC 8945FC MOV FC [EBP] , EAX
5797BF 8B0424 MOV EAX , [ESP]
5797C2 8D6DFC LEA EBP , FC [EBP]
5797C5 E81ECDFDFF CALL 5564E8 ( . )
5797CA FF0424 INC [ESP]
5797CD FF442404 INC 4 [ESP]
5797D1 71E9 JNO 5797BC
5797D3 8D64240C LEA ESP , C [ESP]
5797D7 C3 RET NEAR
END-CODE
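
The ADD EAX , # 80000000 / SUB / JNO pattern in this listing appears to
be the usual overflow trick: besides the index, DO keeps a second
counter biased by 0x80000000 minus the limit, so that incrementing it
sets the overflow flag exactly when the index reaches the limit. Below
is a minimal Python sketch of that 32-bit arithmetic; the function names
are made up for this illustration.

```python
MASK = 0xFFFFFFFF

def inc32(x):
    """Return (x + 1 mod 2**32, overflow flag) as a 32-bit INC would.

    The signed-overflow flag is set only for 0x7FFFFFFF -> 0x80000000.
    """
    return (x + 1) & MASK, x == 0x7FFFFFFF

def do_loop_indices(limit, start):
    """Indices visited by  limit start DO ... LOOP  using the biased counter."""
    index = start & MASK
    biased = (start + 0x80000000 - limit) & MASK    # ADD EAX,#80000000 / SUB EAX,limit
    seen = []
    while True:
        seen.append(index)                  # loop body sees I
        index = (index + 1) & MASK          # INC [ESP]
        biased, overflow = inc32(biased)    # INC 4 [ESP]
        if overflow:                        # JNO not taken: leave the loop
            return seen
```

For the example above, do_loop_indices(7, 1) gives [1, 2, 3, 4, 5, 6]:
the biased counter starts at 0x7FFFFFFA and overflows on the sixth
increment.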

--
Ruvim

minforth

2024-03-06 14:15:25 UTC

Ruvim wrote:

> In SP-Forth v3 and v4 (they generate native code), "DO" pushes three
> items on the return stack, and among them the address that "LEAVE" then
> jumps to.

That would make implementing BREAK and CONTINUE rather easy...

Ruvim

2024-03-06 15:25:31 UTC

On 2024-03-06 18:15, minforth wrote:
> Ruvim wrote:
>
>> In SP-Forth v3 and v4 (they generate native code), "DO" pushes three
>> items on the return stack, and among them the address that "LEAVE"
>> then jumps to.
>
> That would make implementing BREAK and CONTINUE rather easy...

Yes, and it does make implementing "LEAVE" easy. But there are no other
advantages.

--
Ruvim

dxf

2024-03-07 02:13:32 UTC

On 7/03/2024 12:28 am, Ruvim wrote:
> ...
> In SP-Forth v3 and v4 (they generate native code), "DO" pushes three items on the return stack, and among them the address that "LEAVE" then jumps to. ...

That is the classic implementation as suggested by Bob Berkey - inventor of
the Forth-83 DO LOOP. IIUC Anton is asking about systems that push the loop
address - not the exit address.

Ruvim

2024-03-07 09:24:46 UTC

On 2024-03-07 06:13, dxf wrote:
> On 7/03/2024 12:28 am, Ruvim wrote:
>> ...
>> In SP-Forth v3 and v4 (they generate native code), "DO" pushes three items on the return stack, and among them the address that "LEAVE" then jumps to. ...
> That is the classic implementation as suggested by Bob Berkey - inventor of
> the Forth-83 DO LOOP.

Thank you, I did not know that!

I see this fact was mentioned in Forth Dimensions Volume 04 Number 3,
1982 (FD-V4N3, p24) [1]

> IIUC Anton is asking about systems that push the loop
> address - not the exit address.

Yes, I noticed, just wanted to give another option of three items on the
return stack.

[1]
https://archive.org/details/Forth_Dimension_Volume_04_Number_3/page/n24/mode/1up?view=theater

--
Ruvim

dxf

2024-03-08 07:30:58 UTC

On 7/03/2024 8:24 pm, Ruvim wrote:
> On 2024-03-07 06:13, dxf wrote:
>> On 7/03/2024 12:28 am, Ruvim wrote:
>>> ...
>>> In SP-Forth v3 and v4 (they generate native code), "DO" pushes three items on the return stack, and among them the address that "LEAVE" then jumps to. ...
>> That is the classic implementation as suggested by Bob Berkey - inventor of
>> the Forth-83 DO LOOP.
>
> Thank you, I did not know that!
>
> I see this fact was mentioned in Forth Dimension Volume 04 Number 3, 1982 (FD-V4N3, p24) [1]
>
>
>> IIUC Anton is asking about systems that push the loop
>> address - not the exit address.
>
> Yes, I noticed, just wanted to give another option of three items on the return stack.

An alternative two item solution was presented in FD V5N4 p22. It would suit smaller
systems.

Anton Ertl

2024-03-09 18:09:57 UTC

dxf <***@gmail.com> writes:
>On 7/03/2024 12:28 am, Ruvim wrote:
>> ...
>> In SP-Forth v3 and v4 (they generate native code), "DO" pushes three items on the return stack, and among them the address that "LEAVE" then jumps to. ...
>
>That is the classic implementation as suggested by Bob Berkey - inventor of
>the Forth-83 DO LOOP. IIUC Anton is asking about systems that push the loop
>address - not the exit address.

Correct.

So, to summarize the answers:

kForth keeps the loop-back address on the return stack in addition to
index and limit.

A number of systems (at least ciForth, VFX, SP-Forth) have followed
Bob Berkey's suggestion of keeping the loop-exit address on the return
stack for LEAVE. That is not what I was asking about, but it's also
interesting.

- anton

dxf

2024-03-09 23:43:48 UTC

On 10/03/2024 5:09 am, Anton Ertl wrote:
> dxf <***@gmail.com> writes:
>> On 7/03/2024 12:28 am, Ruvim wrote:
>>> ...
>>> In SP-Forth v3 and v4 (they generate native code), "DO" pushes three items on the return stack, and among them the address that "LEAVE" then jumps to. ...
>>
>> That is the classic implementation as suggested by Bob Berkey - inventor of
>> the Forth-83 DO LOOP. IIUC Anton is asking about systems that push the loop
>> address - not the exit address.
>
> Correct.
>
> So, to summarize the answers:
>
> kForth keeps the loop-back address on the return stack in addition to
> index and limit.
>
> A number of systems (at least ciForth, VFX, SP-Forth) have followed
> Bob Berkey's suggestion of keeping the loop-exit address on the return
> stack for LEAVE. That is not what I was asking about, but it's also
> interesting.

The lack of affirmative responses is unsurprising since a native hard-coded
loop is considered 'as good as it gets'. It's difficult to imagine under
what circumstances a loop address on the stack is faster, but it suggests
one is starting from an inefficient or compromised base.

Anton Ertl

2024-03-12 11:41:15 UTC

dxf <***@gmail.com> writes:
>On 10/03/2024 5:09 am, Anton Ertl wrote:
>It's difficult to imagine under
>what circumstances a loop address on the stack is faster, but it suggests
>one is starting from an inefficient or compromised base.

The starting point is gforth-fast from June 2023. Here's an example.
The inner loop of the siev benchmark is:

0 i c! dup +loop

The following shows the threaded code intermixed with the native code:

                 loop-back address in ...
... threaded code           ... return stack
lit 1->2                    lit 1->2
#0                          #0
  mov r15,[r14]               mov r15,[r14]
  add r14,$10                 add r14,$10
i 2->3                      i 2->3
  mov r9,[rbx]                mov r9,[rbx]
  add r14,$08                 add r14,$08
c! 3->1                     c! 3->1
  mov [r9],r15lb              mov [r9],r15lb
  add r14,$08                 add r14,$08
dup 1->2                    dup 1->2
  mov r15,r8                  mov r15,r8
  add r14,$08                 add r14,$08
(+loop) 2->1                (+loop)-rstack 2->1
<PRIMES+$108>
  mov rax,[rbx]               mov rdx,[rbx]
  mov rsi,[r14]               mov rsi,$10[rbx]
  lea r10,$08[r14]            mov rax,rdx
  mov rdx,rax                 sub rax,$08[rbx]
  sub rdx,$08[rbx]            add rdx,r15
  add rax,r15                 lea rcx,[r15][rax]
  lea rcx,[r15][rdx]          xor rcx,rax
  xor rcx,rdx                 xor rax,r15
  xor rdx,r15                 test rcx,rax
  test rcx,rdx                js $7F22DC4C075F
  js $7F860CE101F1            mov r14,rsi
  mov [rbx],rax               mov [rbx],rdx
  mov rcx,[rsi]               add r14,$08
  lea r14,$08[rsi]            mov rcx,-$08[r14]
  jmp ecx                     jmp ecx

On Zen3 (Ryzen 5800X) and Tiger Lake (Core i5-1135G7) the return stack
variant is faster by a factor >2; we also see speedups on other
processors, but they are smaller. Where do these speedups come from?

If you look at the updates to r14, which contains the virtual-machine
instruction pointer, they are as follows:

                 loop-back address in ...
... threaded code           ... return stack
add r14,$10                 add r14,$10
add r14,$08                 add r14,$08
add r14,$08                 add r14,$08
add r14,$08                 add r14,$08
mov rsi,[r14]               mov rsi,$10[rbx]
lea r14,$08[rsi]            mov r14,rsi
                            add r14,$08

The crucial difference is that in the left column there is an unbroken
dependence chain from the r14 at the end of the previous iteration to
the r14 at the end of the present iteration; this dependence chain has
a latency of 9 cycles per iteration on Zen3, meaning that, with enough
iterations, the loop takes at least 9 cycles per iteration.

In the right column r14 at the end of one iteration does not depend on
r14 at the end of the previous iteration, because the dependence chain
starts from the instruction "mov rsi,$10[rbx]". This means that the
loop can be executed faster; on Zen3 and on Tiger Lake, that
speedup happens to be more than a factor of 2.
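
The 9-cycle figure can be reproduced with a small back-of-the-envelope
model. The Python sketch below tracks, for each register, how many
cycles after the incoming r14 its value becomes available. The
latencies are assumptions (1 cycle for add, lea and mov, 4 cycles for a
load that hits the L1 cache), not measurements, and only register
dependences are modelled; memory dependences are ignored.

```python
# Assumed latencies in cycles; these are illustrative values, not measurements.
LAT = {"add": 1, "lea": 1, "mov": 1, "load": 4}

def r14_carried_latency(instrs):
    """Latency from the incoming r14 to the outgoing r14, or None if independent.

    Each instruction is (destination, source registers, kind).
    """
    dist = {"r14": 0}                    # registers that depend on the incoming r14
    for dest, sources, kind in instrs:
        deps = [dist[s] for s in sources if s in dist]
        if deps:
            dist[dest] = max(deps) + LAT[kind]
        else:
            dist.pop(dest, None)         # dest no longer depends on the incoming r14
    return dist.get("r14")

BODY = [("r14", ["r14"], "add")] * 4     # the add r14,... of lit, i, c!, dup

THREADED = BODY + [("rsi", ["r14"], "load"),    # mov rsi,[r14]
                   ("r14", ["rsi"], "lea")]     # lea r14,$08[rsi]

RSTACK = BODY + [("rsi", ["rbx"], "load"),      # mov rsi,$10[rbx]
                 ("r14", ["rsi"], "mov"),       # mov r14,rsi
                 ("r14", ["r14"], "add")]       # add r14,$08
```

Under these assumptions r14_carried_latency(THREADED) is 9 (four adds,
one load, one lea), while r14_carried_latency(RSTACK) is None: the new
r14 is derived from a load through rbx, so nothing is carried through
r14 from one iteration to the next.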

- anton