"Back & Forth" is back!

After explaining why Forth is so hard, I'm explaining what reasons there could be to use it. With a little personal journey put in as a side note.
http://youtu.be/MXKZPGzlx14
Hans Bezemer

Who is Forth for? Based on that video it's for folks who enjoy building the
functions they'll need :)

The 'test string for a float' function got my attention. Wondering how I'd
do that I decided on a stripped down >FLOAT .

https://pastebin.com/UgpE14pc
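For comparison, here is a minimal 'test string for a float' word built on the standard >FLOAT rather than on a hand-rolled parser. It is a sketch that assumes a Forth-2012 system with a separate floating-point stack (e.g. Gforth). The name FLOAT? is hypothetical. Note that standard >FLOAT also treats an all-blank string as zero, so a stricter test would have to reject that case separately.

```forth
\ Hypothetical helper: true if the string parses as a float, false otherwise.
\ Standard >FLOAT leaves r on the FP stack only when it returns true.
: float? ( c-addr u -- flag )
  >float                \ ( -- flag ) ( F: -- r | )
  dup if fdrop then ;   \ discard the converted value, keep only the flag
```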

Hans Bezemer

2024-08-11 12:46:08 UTC

After explaining why Forth is so hard, I'm explaining what reasons there could be to use it. With a little personal journey put in as a side note.
http://youtu.be/MXKZPGzlx14
Hans Bezemer

Who is Forth for? Based on that video it's for folks who enjoy building the
functions they'll need :)

If you perceive it like that, I can hardly deny it. I do clearly
remember, though, adding a few of my professional experiences as well.
I could have included another one, where some "name=value" format was
encountered. So, yes, that one got converted as well. A colleague of
mine wanted the source. I printed it for him and, to my surprise, he
opened the printer tray. "What are you doing?" I asked him, and he
answered "I thought there'd be more..."

But I thought it was too similar to the XML example and, although it
actually happened (I was there), a bit over the top.

Post by dxf
The 'test string for a float' function got my attention. Wondering how I'd
do that I decided on a stripped down >FLOAT .
https://pastebin.com/UgpE14pc
Gimme points for originality ;-)

I did a quick hack once of a full (ZenFloat) >FLOAT. It works on a
subset of floats, but it's not as nice as yours.

max-n 10 / constant (limit) \ cell boundary

: (convert) ( a1 n1 n2 -- a2 n3 n4 n5)
0 >r >r \ setup exponent, save accu
begin
over over \ is there any string left?
while \ if so, get digit and compare
c@ [char] 0 - dup 0< over 9 > or 0= r@ (limit) < and
while \ don't cross the cell boundary
r> 10 * + r> 1+ >r >r chop \ shift left, increment exponent
repeat drop r> r> \ drop value, get accu and exponent
;
\ returns an exponent
: (+exp) ( a1 n1 -- a2 n2 n3)
0 >r begin dup >r 0 (convert) nip over r> < while r> + >r repeat drop r>
; \ loop until string no longer changes

: (sign!) if negate then ; ( n bool -- -n|n)
: (sign?) over c@ [char] - = dup >r if chop then r> ;
: (frac) chop rot >r rot (convert) negate r> + 2swap (+exp) drop ;
: (exp) chop (sign?) >r 0 (convert) drop r> (sign!) >r rot r> + -rot ;

: >float ( a n -- f bool|-bool)
-trailing dup if (sign?) -rot else swap true exit then
dup if 0 (convert) >zero 2swap else 2drop drop false ;then
dup if (+exp) swap >r swap >r + r> r> then
dup if over c@ [char] . = if (frac) then then
dup if over c@ bl or [char] e = if (exp) then then
nip if 2drop drop false ;then \ if string left, no floating point
>r swap (sign!) r> true \ apply sign, signal ok
;

Hans Bezemer

Buzz McCool

2024-08-30 16:04:58 UTC

Post by Hans Bezemer
After explaining why Forth is so hard, I'm explaining what reasons there
could be to use it. With a little personal journey put in as a side note.
http://youtu.be/MXKZPGzlx14

I looked through a few of these videos and found them interesting, thank
you Hans for going to the trouble of making them.

I found enlightening Hans' recommendation on one of the videos (if I'm
paraphrasing it correctly): avoid using the stack for more than two or
three values, as treating the stack as an array makes for
incomprehensible code.

I am trying to follow this recommendation, but am running into trouble
when trying to pass parameters into a loop. I'm trying to avoid using
the stack as a large array, but what I came up with, injecting a
parameter through a variable, doesn't seem right.

Does anyone have suggestions on a better approach when you have several
parameters and loop counts to deal with?

(Trivial Example)

: AreaOfCir 2.0e f** pi f* ; \ w/ radius on stack,
\ compute area (radius^2 * pi)

: VolOfCyl AreaOfCir f* ; \ w/ height & radius on stack,
\ compute vol (height * area)
1.0e AreaOfCir fe.
3.1416E0

2.0e 1.0e VolOfCyl fe.
6.2832E0

fvariable radius \ create a floating point variable
1.0e radius f! \ store 1.0 into radius
radius f@ fe. \ fetch and print radius
1.0000E0

: CylVolLoop
cr ." Radius " radius f@ fe. \ print a new line and then fetch and print radius
1 \ start counter from a cyl height of 1
begin dup 20 <= \ duplicate counter to see if counter <= end value
while \ while true (i.e. counter is <= to 20)
dup \ duplicate counter
s>f \ convert counter (height) to floating point
fdup \ duplicate height to print and use to compute vol
cr ." Height " fe. \ print a new line and then print height
radius f@ \ fetch radius
VolOfCyl \ compute volume
." Volume " fe. \ print volume
1 + \ add one to counter
repeat \ repeat the test at the "begin" word
drop ; \ remove the leftover loop counter value

CylVolLoop \ Execute CylVolLoop word
Radius 1.0000E0
Height 1.0000E0 Volume 3.1416E0
Height 2.0000E0 Volume 6.2832E0
...
Height 19.000E0 Volume 59.690E0
Height 20.000E0 Volume 62.832E0

minforth

2024-08-30 20:32:51 UTC

Two classic answers:
use DO..LOOPs to hide away loop indices
use locals if you have too many parameters
(some technical/physical formulas are difficult
or impossible to factorise into smaller words
which would otherwise be the classic Forth mantra)

BuzzMcCool

2024-08-31 06:00:50 UTC

Post by minforth
use DO..LOOPs to hide away loop indices
use locals if you have too many parameters

I hadn't thought about using locals. Thanks for the suggestion.

Buzz McCool

2024-09-02 16:03:47 UTC

Post by minforth
use locals if you have too many parameters

I like this quite a bit. Tell me if I like it too much.

: CylVolLoop {: W: StartHeight W: FinalHeight F: Radius -- Tabular Output :}
cr ." Radius " Radius fe.
StartHeight
begin dup FinalHeight <=
while
dup
s>f
fdup
cr ." Height " fe.
Radius
VolOfCyl
." Volume " fe.
1 +
repeat
drop
cr ;

17 20 1.0e CylVolLoop

Radius 1.0000E0
Height 17.000E0 Volume 53.407E0
Height 18.000E0 Volume 56.549E0
Height 19.000E0 Volume 59.690E0
Height 20.000E0 Volume 62.832E0

Buzz McCool

2024-09-03 05:53:54 UTC

...

\ Without locals...
: CylVolLoop ( StartHeight FinalHeight Radius -- )
cr ." Radius " fdup fe.
swap ( FinalHeight Height)
begin 2dup >= while
dup s>f fdup cr ." Height " fe.
fover ( Height Radius) VolOfCyl ." Volume " fe.
1+
repeat 2drop fdrop
cr ;
see CylVolLoop
...
( 148 bytes, 27 instructions )

Nice. I will study your technique.

dxf

2024-09-03 07:27:47 UTC

Post by Buzz McCool
...

\ Without locals...
: CylVolLoop ( StartHeight FinalHeight Radius -- )
   cr ." Radius " fdup fe.
   swap ( FinalHeight Height)
   begin 2dup >= while
     dup s>f fdup cr ." Height " fe.
     fover ( Height Radius) VolOfCyl ." Volume " fe.
     1+
   repeat 2drop fdrop
   cr ;
see CylVolLoop
...
( 148 bytes, 27 instructions )

Nice. I will study your technique.

Efficient use of the stack is Moore's technique :)

dxf

2024-09-03 01:23:20 UTC

Post by Buzz McCool

Post by minforth
use locals if you have too many parameters

Under VFX Forth:

see CylVolLoop
...
( 193 bytes, 39 instructions )

\ Without locals...

: CylVolLoop ( StartHeight FinalHeight Radius -- )
cr ." Radius " fdup fe.
swap ( FinalHeight Height)
begin 2dup >= while
dup s>f fdup cr ." Height " fe.
fover ( Height Radius) VolOfCyl ." Volume " fe.
1+
repeat 2drop fdrop
cr ;

see CylVolLoop
...
( 148 bytes, 27 instructions )

Hans Bezemer

2024-09-11 09:20:14 UTC

Post by minforth
use DO..LOOPs to hide away loop indices
use locals if you have too many parameters
(some technical/physical formulas are difficult
or impossible to factorise into smaller words
which would otherwise be the classic Forth mantra)

Tips:
- Use multiple Return Stack registers (R@, R'@, R"@);
- If parameters come in duplets or triplets, use corresponding stack
operators (3DUP, 3OVER, 3DROP);
- Reorganize parameters at the *very start* of the program in a more
palatable order. It saves stack juggling later on;
- Maybe a strange one, but codify stack patterns!
E.g. SPIN ( a b c -- c b a)
STOW ( a b -- a a b)
RISE ( a b c -- b a c)

It helps you to THINK in these patterns and more easily recognize them.
It depends highly on your coding habits, so it helps to analyze your
legacy code to see if they often occur.
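Codifying the three patterns takes one line each. A minimal sketch in standard Forth, using the same phrasings dxf lists later in the thread (SWAP ROT, OVER SWAP, ROT SWAP); whether they are worth naming depends, as noted above, on how often they actually show up in your own code.

```forth
\ Named stack patterns (illustrative definitions, standard words only).
: spin ( a b c -- c b a )  swap rot ;   \ reverse the top three items
: stow ( a b -- a a b )    over swap ;  \ duplicate the second item in place
: rise ( a b c -- b a c )  rot swap ;   \ swap the two items under the top
```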

Hans Bezemer

minforth

2024-09-11 09:49:37 UTC

Post by Hans Bezemer
- If parameters come in duplets or triplets, use corresponding stack
operators (3DUP, 3OVER, 3DROP);
- Reorganize parameters at the *very start* of the program in a more
palatable order. It saves stack juggling later on;
- Maybe a strange one, but codify stack patterns!
E.g. SPIN ( a b c -- c b a)
STOW ( a b -- a a b)
RISE ( a b c -- b a c)
It helps you to THINK in these patterns and more easily recognize them.
It depends highly on your coding habits, so it helps to analyze your
legacy code to see if they often occur.

Good advice if you can access the return stack directly.

Otherwise, for non-trivial words, it is preferable to let the compiler
recognise patterns and save your precious human time. If the compiled
code is too bad, profile and optimise it afterwards.

Hans Bezemer

2024-09-11 12:41:35 UTC

Post by minforth

Post by Hans Bezemer
- If parameters come in duplets or triplets, use corresponding stack
operators (3DUP, 3OVER, 3DROP);
- Reorganize parameters at the *very start* of the program in a more
palatable order. It saves stack juggling later on;
- Maybe a strange one, but codify stack patterns!
   E.g. SPIN ( a b c -- c b a)
        STOW ( a b -- a a b)
        RISE ( a b c -- b a c)
It helps you to THINK in these patterns and more easily recognize them.
It depends highly on your coding habits, so it helps to analyze your
legacy code to see if they often occur.

Good advice if you can access the return stack directly.
Otherwise, for non-trivial words, it is preferable to let the compiler
recognise patterns and save your precious human time. If the compiled
code is too bad, profile and optimise it afterwards.

You know - in my experience these kinds of problems mostly manifest
themselves when making my library routines - the stuff you rarely touch
afterwards (and even more rarely in a fundamental way).

Putting the application components to work doesn't affect the stack in
the same way. I think that is where the "10x savings" actually are.

Again - just a hunch of mine..

Hans Bezemer

dxf

2024-09-12 04:01:10 UTC

- If parameters come in duplets or triplets, use corresponding stack operators (3DUP, 3OVER, 3DROP);
- Reorganize parameters at the *very start* of the program in a more palatable order. It saves stack juggling later on;
- Maybe a strange one, but codify stack patterns!
E.g. SPIN ( a b c -- c b a)
STOW ( a b -- a a b)
RISE ( a b c -- b a c)
It helps you to THINK in these patterns and more easily recognize them. It depends highly on your coding habits, so it helps to analyze your legacy code to see if they often occur.

swap rot 0
over swap 0
rot swap 1

dxf

2024-08-31 01:05:15 UTC

...
Does anyone have suggestions on a better approach when you have several parameters and loop counts to deal with?

I see little wrong with your example other than cosmetics - excess comments
that don't add value and missing stack parameter comment in colon definitions.

BuzzMcCool

2024-08-31 05:59:03 UTC

...
Does anyone have suggestions on a better approach when you have several parameters and loop counts to deal with?

I see little wrong with your example other than cosmetics - excess comments
that don't add value and missing stack parameter comment in colon definitions.

Thanks for the feedback. Yes I do need to work on my stack parameter
comments.

Hans Bezemer

2024-09-05 15:18:07 UTC

Post by BuzzMcCool

Post by Buzz McCool
...
Does anyone have suggestions on a better approach when you have
several parameters and loop counts to deal with?

I see little wrong with your example other than cosmetics - excess comments
that don't add value and missing stack parameter comment in colon definitions.

Thanks for the feedback. Yes I do need to work on my stack parameter
comments.

Given that the area of the circle doesn't change - why recalculate that
every time? Ok, I changed VolOfCyl a bit, but it saves me both time and
complexity. Note this only works if there is a separate FP stack, which
is the standard nowadays.

Alternatives:
1. Change the order of parameters (float last);
2. Change the order of parameters (carnal knowledge of the size of a float);
3. Specify the radius as an integer.

: AreaOfCir fdup pi f* f* ;
: VolOfCyl f* ;

: CylVolLoop ( start end -- ) ( F: radius -- )
cr ." Radius " fdup fe.
AreaOfCir 1+ swap ?do
i s>f fdup cr ." Height " fe.
fover VolOfCyl ." Volume " fe.
loop fdrop
;

Hans Bezemer

Hans Bezemer

2024-09-05 15:37:03 UTC

Post by Hans Bezemer

Post by BuzzMcCool

Post by Buzz McCool
...
Does anyone have suggestions on a better approach when you have
several parameters and loop counts to deal with?

I see little wrong with your example other than cosmetics - excess comments
that don't add value and missing stack parameter comment in colon definitions.

Thanks for the feedback. Yes I do need to work on my stack parameter
comments.

This is the same routine with a shared stack. Note I used option 3. here
- it retains the same possibilities as the original. Note this is in
4tH. F% is followed by an FP number:

include lib/fp2.4th
include lib/zenconst.4th
include 4pp/lib/float.4pp

: AreaOfCir fdup pi f* f* ;
aka f* VolOfCyl ( 4tH alias)

: CylVolLoop ( radius start end --)
>r >r cr ." Radius " fdup fe.
AreaOfCir r> r> 1+ swap ?do
i s>f fdup cr ." Height " fe.
fover VolOfCyl ." Volume " fe.
loop fdrop cr
;

f% 1.2 1 20 CylVolLoop

Radius 1.E0
Height 1.E0 Volume 3.141592653589793238E0
Height 2.E0 Volume 6.283185307179586476E0
Height 3.E0 Volume 9.42477796076937971E0
...
Height 19.E0 Volume 59.69026041820607152E0
Height 20.E0 Volume 62.83185307179586476E0

Hans Bezemer

2024-09-05 15:42:07 UTC

Post by Hans Bezemer
f% 1.2 1 20 CylVolLoop
Radius 1.E0

Yeah, I copied the last test with the output of the first test. My bad..
Sorry ;-)

Should have been: f% 1 1 20 CylVolLoop

Hans Bezemer

Buzz McCool

2024-09-06 21:03:38 UTC

Post by Hans Bezemer
Given that the area of the circle doesn't change - why recalculate that
every time?

Excellent observation.

Would you have any videos talking about Forth locals? You and dxf are
far more adept at stack manipulations than I. I'm thinking I can get a
word up and working with locals and then convert to manual stack
manipulations afterwards if necessary.

When is it necessary? dxf showed a word w/o locals to have ~30% fewer
instructions than a word with locals. Is that a common occurrence?

Hans Bezemer

2024-09-07 12:40:41 UTC

Post by Buzz McCool

Post by Hans Bezemer
Given that the area of the circle doesn't change - why recalculate
that every time?

Oh, I talk a lot about locals: don't use them. The point is: you have
random access to locals. So I doubt very much it will help you to
uncover a smart way to do it without them. Basically any non-Forth
Algol-like language will do the job.

And that's in essence why I am opposed to them. It takes out what makes
Forth unique - and what makes the Forth way of thinking unique.

Post by Buzz McCool
When is it necessary? dxf showed a word w/o locals to have ~30% fewer
instructions than a word with locals. Is that a common occurrence?

I can't really tell. In 4tH (my own implementation) the use of locals
requires an external library - so it always consumes more instructions.
It also heavily depends on the style and the skill of the programmer. If
you're a newbie doing a lot of stack acrobatics, I doubt it.

What bothers me most technologically is that parameters flow through the
stack undisturbed. You break that paradigm when using locals. With
locals you *HAVE TO* create some kind of stack frame that you have to
destroy when you exit.

Needless to say this copying, releasing and stuff takes time. Even when
you don't use locals. In all honesty I must state that this overhead is
not always translated to a diminished performance - at least not in the
tests I did.

****
TL;DR my objections are mostly based on pure architectural arguments,
rather than practicality. I also don't like Python, PHP and Perl for
those very same reasons - one because I think its paradigms are
fundamentally flawed, the second and third because of their "have we
thrown in the kitchen sink yet" mentality.

I don't think there will ever be a "Back&Forth" episode on locals -
frankly, because - apart from some demonstrations - there is only one
single, ported program that uses locals in my repository. How can you
teach if you never used them yourself?
****

Note that 4tH features R@, R'@ and R"@ which can serve very
conveniently as "local variables" - provided you leave the Return Stack
alone. I learned that trick from the programmer of the FIG editor.

See:
https://sourceforge.net/p/forth-4th/code/HEAD/tree/trunk/4th.src/lib/gcircle.4th
for a nice example of that one.

Hans Bezemer

Paul Rubin

2024-09-10 11:26:51 UTC

Post by Hans Bezemer
What bothers me most technologically is that parameters flow through
the stack undisturbed. You break that paradigm when using locals. With
locals you *HAVE TO* create some kind of stack frame that you have to
destroy when you exit.

Forth programs very frequently end up juggling parameters and other data
to and from the return stack, instead of using locals. Simple
implementations of locals put them in the return stack too.
"Destroying" the stack frame just means adjusting RP when the function
exits. Usually a single instruction.

Post by Hans Bezemer
Needless to say this copying, releasing and stuff takes time.

Similar to DUP (copy) or DROP (release).

Post by Hans Bezemer
In all honesty I must state that this overhead is not always
translated to a diminished performance

Right, I don't think one can assert a performance hit without
measurements supporting the idea.

Post by Hans Bezemer
TL;DR my objections are mostly based on pure architectural arguments,
rather than practicality.

Sure, that's reasonable, it's a matter of what you prefer. That's
harder to take issue with than claims about performance.

Post by Hans Bezemer
I also don't like Python, PHP and Perl for those very same reasons -

Those are at a totally different level than Forth, in terms of layers of
implementation and runtime libraries, overhead, etc. It's better to
compare to something like C, or a hypothetical cleaned up version of C,
or even to Forth with locals ;).

dxf

2024-09-10 13:19:29 UTC

In forth the programmer uses the return stack as a temporary holder. Not
so locals which spill all input to the return stack and then shuffle these
to/from the parameter stack. The latter is akin to a novice programmer who
uses too many variables.

dxf

2024-09-11 02:03:05 UTC

Forth programs very frequently end up juggling parameters and other data
to and from the return stack, instead of using locals.

Looking at an application with 154 colon definitions, only 2 were found
to use the return stack for temporary storage. Even I was surprised :)

dxf

2024-09-11 04:32:36 UTC

Forth programs very frequently end up juggling parameters and other data
to and from the return stack, instead of using locals.

Looking at an application with 154 colon definitions, only 2 were found
to use the return stack for temporary storage. Even I was surprised :)

From the same app:

dup 54
drop 29
swap 22
over 16
2drop 9
rot 8
2dup 3
>r 2
r> 2
2swap 1
2nip 1
locals 0

The easiest stack operations (DUP DROP) account for most. SWAP averaged
1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a problem in forth?
It doesn't appear to be.

Paul Rubin

2024-09-12 06:51:00 UTC

Looking at an application with 154 colon definitions...

The easiest stack operations (DUP DROP) account for most.

Is the code for this app available?

Post by dxf
SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a
problem in forth? It doesn't appear to be.

The 100+ occurrences of DUP, DROP, and SWAP are either an abstraction
inversion (with a smart compiler, the data ends up in registers that
could be named by locals) or they are stack traffic whose cost has to be
compared with the cost of indexed references to locals in the return
stack. I'd agree that they aren't necessarily "juggling", which evokes
permuting stuff in the stack outside the usual LIFO order. That does
happen a little bit though, with OVER, ROT, etc.

dxf

2024-09-12 08:21:43 UTC

Looking at an application with 154 colon definitions...

The easiest stack operations (DUP DROP) account for most.

Is the code for this app available?

Previously posted. You may have seen it.

https://pastebin.com/2xcRSbQW

Post by dxf
SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a
problem in forth? It doesn't appear to be.

If a cost, it's one the programmer can keep to minimum. With locals there's
an upfront cost that can't be avoided. Using registers is appealing until
one realizes a call to an external function necessitates placing it back on
the stack. Costs multiply in the face of many small functions. Moore touches
on this in one of his speeches:

"I keep asking that question. What is Forth? Forth is highly factored code.
I don't know anything else to say except that Forth is definitions. If you
have a lot of small definitions you are writing Forth. In order to write a
lot of small definitions you have to have a stack. Stacks are not popular.
It's strange to me that they are not. There is just a lot of pressure from
vested interests that don't like stacks, they like registers. Stacks are not
a solve all problems concept but they are very very useful, especially for
information hiding and you have to have two of them." - Chuck Moore 1999

minforth

2024-09-12 09:08:20 UTC

Post by dxf
If a cost, it's one the programmer can keep to minimum. With locals there's
an upfront cost that can't be avoided. Using registers is appealing until
one realizes a call to an external function necessitates placing it back on
the stack. Costs multiply in the face of many small functions.

This is history (or your archaic compiler). Modern compilers try to pass
most parameters through registers.

https://langdev.stackexchange.com/questions/2584/are-modern-compilers-passing-parameters-in-registers-instead-of-on-the-stack

mhx

2024-09-12 10:11:36 UTC

Post by minforth
This is history (or your archaic compiler). Modern compilers try to pass
most parameters through registers.

The rules are very complicated, though. One has to account for there
being too many parameters, for different architectures with different
register assignments, for integer and floating-point type parameters,
and under some circumstances both the registers *and* the stack must be
used, where some extra 'working space' may, or may not, be needed.

I was very happy when it finally worked on all of our target OSes.

-marcel

minforth

2024-09-12 10:31:44 UTC

I can well imagine that. Some wheels are particularly difficult
to reinvent. For desktop systems, it can therefore make sense
to use an IR (e.g. LLVM or WASM, or simply C) and use the
optimisation functions of proven compilers for this IR.

Sometimes a much simpler solution: use code inlining.

Anton Ertl

2024-09-12 10:19:03 UTC

Post by dxf
Using registers is appealing until
one realizes a call to an external function necessitates placing it back on
the stack.

Not if the stack item does not live across the call. And even if it
lives across the call and cannot be placed in a callee-saved register,
the save before and restore after the call is amortized typically
across more than one register access on each side of the call.

Register allocation is one of the most effective optimizations in
compilers. That's also true of Forth.

Post by dxf
Costs multiply in the face of many small functions.

--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2024: https://euro.theforth.net

minforth

2024-09-13 07:56:37 UTC

Post by Anton Ertl
Register allocation is one of the most effective optimizations in
compilers. That's also true of Forth.

Post by dxf
Costs multiply in the face of many small functions.

Moore talked about registers. It's worth repeating for those who may be
new to forth.
"But such registers raises the question of local variables. There is a
lot of discussion about local variables. That is another aspect of your
application where you can save 100% of the code. I remain adamant that
local variables are not only useless, they are harmful. If you are
writing code that needs them you are writing, non-optimal code" - Chuck Moore 1999

The only thing that can be deduced from this is that back in 1999
this was Moore's opinion in the specific context of his work.

Besides, the world has changed a wee bit since then...

dxf

2024-09-13 09:47:46 UTC

Post by minforth

Post by Anton Ertl
Register allocation is one of the most effective optimizations in
compilers. That's also true of Forth.

Post by dxf
Costs multiply in the face of many small functions.

The only thing that can be deduced from this is that back in 1999
this was Moore's opinion in the specific context of his work.
Besides, the world has changed a wee bit since then...

Claims made in respect of locals in forth - ease of use, better performance
through less 'stack juggling', better readability/maintainability - were all
made in the 1980's. What has changed? Forthers today are more willing to
believe, to accept the word of authority, lack the interest to discover the
truth for themselves? If so, that would be a pity.

Paul Rubin

2024-09-13 10:38:51 UTC

"I remain adamant that local variables are not only useless, they
are harmful. If you are writing code that needs them you are
writing, non-optimal code" - Chuck Moore 1999 ...

Claims made in respect of locals in forth - ease of use, better
performance through less 'stack juggling', better
readability/maintainability - were all made in the 1980's. What has
changed? Forthers today are more willing to believe, to accept the
word of authority, lack the interest to discover the truth for
themselves?

Is avoiding locals because of the Chuck Moore quote not an example of
accepting the word of authority? And how often do even you care whether
your code is optimal? It's likely difficult to get any interpreted
Forth code to run at better than 1/5th the speed of assembly code. So
if optimization is your main concern, why use Forth to begin with?

I would say that the claim of better performance from locals depends on
the implementation and in any case has to be scrutinized if it matters,
but even if there's a performance loss, that might be an acceptable
trade if the programmer finds offsetting gains in the other areas.

My main programming language for random hacking is Python, which is
possibly 10x slower than interpreted Forth or 50x slower than compiled
Forth or C. Yet it usually doesn't matter unless I'm trying to do
something unusually compute intensive. Once the program is fast enough
to not be annoying to use, I don't need to optimize it more.

Jan Coombs

2024-09-13 12:07:32 UTC

On Fri, 13 Sep 2024 03:38:51 -0700

Post by Paul Rubin
I would say that the claim of better performance from locals depends
on the implementation[...]

Absolutely. As Chuck's prime target of interest (hardware) uses LIFO
registers for stacks, only the top one or so R stack items could be
used for restricted local storage (which is also common practice).

I accept that locals are useful, and would like to see hardware stack
engine implementations that support this better while retaining the
performance advantage of a stack cache implemented as LIFO registers
rather than in RAM.

Jan Coombs
--

Anton Ertl

2024-09-13 17:59:27 UTC

Post by Jan Coombs
Absolutely. As Chuck's prime target of interest (hardware) uses LIFO
registers for stacks, only the top one or so R stack items could be
used for restricted local storage (which is also common practice).
I accept that locals are useful, and would like to see hardware stack
engine implementations that support this better while retaining the
performance advantage of a stack cache implemented as LIFO registers
rather than in RAM.

AFAIK Chuck Moore implements the stack as SRAM indexed with his stack
pointer; maybe the stack pointer is a rotating shift register with
only one bit set, I don't remember.

He also uses an A register in addition to R and the data TOS last I
looked. So much for Chuck Moore denouncing registers. When he
introduced A, some people played with the idea to add A and possibly
more registers to Forth.

- anton

dxf

2024-09-13 15:12:13 UTC

"I remain adamant that local variables are not only useless, they
are harmful. If you are writing code that needs them you are
writing, non-optimal code" - Chuck Moore 1999 ...

Claims made in respect of locals in forth - ease of use, better
performance through less 'stack juggling', better
readability/maintainability - were all made in the 1980's. What has
changed? Forthers today are more willing to believe, to accept the
word of authority, lack the interest to discover the truth for
themselves?

Is avoiding locals because of the Chuck Moore quote not an example of
accepting the word of authority?

Or I've yet to hear a convincing argument from the locals authorities :)

You have the source to my app. Perhaps you can nominate where locals
could have been used to better effect.

Paul Rubin

2024-09-14 08:56:20 UTC

Post by dxf
You have the source to my app. Perhaps you can nominate where locals
could have been used to better effect.

: EMITS ( n char -- ) swap 0 ?do dup emit loop drop ;

could be written:

: EMITS {: n char -- :} n 0 ?do char emit loop ;

dxf

2024-09-14 11:56:41 UTC

Post by dxf
You have the source to my app. Perhaps you can nominate where locals
could have been used to better effect.

: EMITS ( n char -- ) swap 0 ?do dup emit loop drop ;
: EMITS {: n char -- :} n 0 ?do char emit loop ;

Compiling under DX-Forth resulted in a code size of 23 and 26 bytes
respectively. Under VFX ...

( 71 bytes, 18 instructions )

( 102 bytes, 28 instructions )

Not only were you able to read forth code, the result was more efficient.
Perhaps locals in forth were meant to be clever? That would explain the
interest; however, it's a high price to pay.

Paul Rubin

2024-09-14 16:10:58 UTC

Post by dxf
Compiling under DX-Forth resulted in a code size of 23 and 26 bytes
respectively. Under VFX ...

I can't help it if those compilers generate worse code for the locals
version. Can you conveniently try lxf?

Post by dxf
Not only were you able to read forth code, the result was more
efficient.

Sometimes it isn't too hard to read, sometimes it takes head scratching,
and sometimes I can't make any sense of it. The function Anton posted
was an example that didn't make sense. I remember thinking I might sit
down and try to figure it out to rewrite it, but it doesn't seem worth
the effort.

Anyway, if efficiency was important for that example, I'd use CODE.

dxf

2024-09-15 05:17:20 UTC

Post by dxf
Compiling under DX-Forth resulted in a code size of 23 and 26 bytes
respectively. Under VFX ...

I can't help it if those compilers generate worse code for the locals
version. Can you conveniently try lxf?

Windows NT/Forth (32 bit):

( 67 bytes, 19 instructions )
( 87 bytes, 24 instructions )

Post by dxf
Not only were you able to read forth code, the result was more efficient.

It would be no different were locals used. It would still require one to
sit down and figure out what the code did. The more experienced one is in
the language the easier it is.

Going back to the EMITS example:

- despite lack of comments you quickly deduced what it did
- stack operations were few and simple and still you didn't like it
- your ideal is that every stack operation should go, which is what
you did

If one takes from forth that which makes it efficient, then one takes away
its reason for existence. Unfortunately for forth, this is what locals
users are doing, whether they're aware of it or not.

Post by Paul Rubin
Anyway, if efficiency was important for that example, I'd use CODE.

In other words forth is not important to you. I understand. You've stated
Python is your language of preference. Forth is mine and I'll program it
the best way I know how.

Paul Rubin

2024-09-15 16:52:24 UTC

Post by dxf
- despite lack of comments you quickly deduced what it did
- stack operations were few and simple and still you didn't like it
- your ideal is that every stack operation should go, which is what
you did

It was the first word in the program that used any stack operations at
all. I saw that it was more concise and imho more readable without
them. Other words there were much harder to read.

Post by dxf
If one takes from forth that which makes it efficient, then one takes away
its reason for existence. Unfortunately for forth, this is what locals
users are doing, whether they're aware of it or not.

I'm not persuaded that the stack ops make Forth efficient. Certainly
not as much as advanced compilers do, and yet one of the big attractions
of Forth has been very simple interpreters.

On my x86-64 laptop, gcc -c -S -Os on

void emit(char);
void emits(char c, int n) {
while (n-- > 0) emit(c);
}

gives me 27 bytes, 15 instructions, beating all of the Forth examples.
Several of those instructions seem related to passing parameters in
registers. Passing on the stack like in old-fashioned systems would
save a few more, at the expense of some speed. So if I want efficiency,
I should use C.
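Paul's emits can be compiled and exercised on its own once emit has a body. In the sketch below, emit is a hypothetical stub that records characters in a buffer instead of writing to a terminal; only emits is his code. The 27-byte figure came from compiling emits against an external emit, so compiling both in one file (where the compiler may inline the stub) may not reproduce that size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the output primitive: it records characters
   in a buffer instead of writing to a terminal. */
static char out_buf[64];
static size_t out_len = 0;

void emit(char c)
{
    assert(out_len < sizeof out_buf);   /* keep the demo buffer in bounds */
    out_buf[out_len++] = c;
}

/* Paul's loop unchanged: emit c, n times; n <= 0 emits nothing. */
void emits(char c, int n)
{
    while (n-- > 0)
        emit(c);
}
```

Calling emits('*', 3) leaves three '*' characters in out_buf, while a zero or negative count emits nothing, matching the Forth versions where 0 ?do skips the loop.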

Post by Paul Rubin
Anyway, if efficiency was important for that example, I'd use CODE.

In other words forth is not important to you.

I would say efficiency is usually not very important to me, whether in
forth or any other language. It's the usual story of programs having
hot spots. Aim for efficiency in the hot spots and readability and ease
of implementation everywhere else.

Also, you define "forth" as using stack ops instead of locals. I don't
define it that way. Forth with locals is still Forth. They are in the
standard after all.

dxf

2024-09-16 02:46:43 UTC

It was the first word in the program that used any stack operations at
all. I saw that it was more concise and imho more readable without
them. Other words there were much harder to read.

I'm not persuaded that the stack ops make Forth efficient.

That's been the evidence thus far.

Post by Paul Rubin
Certainly not as much as advanced compilers do, and yet one of the big
attractions of Forth has been very simple interpreters.
On my x86-64 laptop, gcc -c -S -Os on
void emit(char);
void emits(char c, int n) {
while (n-- > 0) emit(c);
}
gives me 27 bytes, 15 instructions, beating all of the Forth examples.
Several of those instructions seem related to passing parameters in
registers. Passing on the stack like in old-fashioned systems would
save a few more, at the expense of some speed. So if I want efficiency,
I should use C.

Yes: if you want efficiency with locals, use C, since C is built upon a
locals paradigm. Also, modern CPUs are optimized for the likes of C.

But just because C can beat forth on a benchmark is no reason to dismiss
either Forth or efficient programming. The weak links are the programmer
and the tools he's given. All I ever seem to hear about other languages
is how they make life easy for the programmer. And this is what some are
trying to bring to forth. To hell with what they offer, I say. The universe
gave me a brain. I intend to use it.

Post by Paul Rubin
Anyway, if efficiency was important for that example, I'd use CODE.

In other words forth is not important to you.

I would say efficiency is usually not very important to me, whether in
forth or any other language. It's the usual story of programs having
hot spots. Aim for efficiency in the hot spots and readability and ease
of implementation everywhere else.
Also, you define "forth" as using stack ops instead of locals. I don't
define it that way. Forth with locals is still Forth. They are in the
standard after all.

I don't believe in religion - the priests, the holy books, the promises.
I'll take what is and make the best of it.

a***@spenarnc.xs4all.nl

2024-09-15 09:14:53 UTC

Post by dxf
You have the source to my app. Perhaps you can nominate where locals
could have been used to better effect.

: EMITS ( n char -- ) swap 0 ?do dup emit loop drop ;
: EMITS {: n char -- :} n 0 ?do char emit loop ;

I think TYPE should be the primitive and EMIT should
handle a 1-char string.

: EMIT DSP@ 1 TYPE DROP ;

Imagine that you have concurrent tasks and one will write
in red, the other in blue. If their escape sequences go out one
character at a time, the two can interleave, and you could lock up
the terminal with an undefined escape sequence.
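A C sketch of Albert's suggestion, assuming POSIX write(2) as the output path: type_str is the primitive, emit_char is a one-character type_str, and a colour change goes out as a single call. The names type_str, emit_char and set_red are made up for illustration. POSIX only guarantees atomicity for pipe and FIFO writes of at most PIPE_BUF bytes; a multitasking Forth writing to a terminal would still want a lock around TYPE.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical TYPE primitive: the whole string is handed to the OS in one
   write(2) call. For pipes, POSIX makes writes of at most PIPE_BUF bytes
   atomic, so a short escape sequence is not split by another writer. */
ssize_t type_str(int fd, const char *s, size_t n)
{
    assert(s != NULL || n == 0);
    return write(fd, s, n);
}

/* EMIT as a one-character TYPE, like ": EMIT DSP@ 1 TYPE DROP ;". */
ssize_t emit_char(int fd, char c)
{
    return type_str(fd, &c, 1);
}

/* Switch to red (ANSI SGR 31) with a single TYPE, not five EMITs. */
ssize_t set_red(int fd)
{
    static const char seq[] = "\x1b[31m";
    return type_str(fd, seq, sizeof seq - 1);
}
```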

Groetjes Albert


a***@spenarnc.xs4all.nl

2024-09-13 11:07:32 UTC

Post by minforth

Post by Anton Ertl
Register allocation is one of the most effective optimizations in
compilers. That's also true of Forth.

Post by dxf
Costs multiply in the face of many small functions.

Moore talked about registers. It's worth repeating for those who may be
new to forth.
"But such registers raises the question of local variables. There is a
lot of discussion about local variables. That is another aspect of your
application where you can save 100% of the code. I remain adamant that
local variables are not only useless, they are harmful. If you are
writing code that needs them you are writing, non-optimal code" - Chuck
Moore 1999

The only thing that can be deduced from this is that back in 1999
this was Moore's opinion in the specific context of his work.
Besides, the world has changed a wee bit since then...

I object to locals because they introduce a superfluous extra concept.
It is foreign to a stack oriented language.
Also there are numerous conflicting notations, and giving a name to a
single cell isn't sufficient. You also need local doubles, floats and
structures.
Some people are fond of their information-hiding aspect; that can
easily be done with normal data plus an addition such as marking
some words private.
The remaining argument is re-entrancy, an overrated argument.

I am also fond of Algol68/go. A different end of the spectrum,
but it has a common feature that Forth has: consistency.
Local variables break that.

I don't take Moore's word for gospel, but I pay attention, because
he is an accomplished individual.

Groetjes Albert

Anton Ertl

2024-09-13 18:07:34 UTC

Post by dxf
Claims made in respect of locals in forth - ease of use, better performance
through less 'stack juggling', better readability/maintainability - were all
made in the 1980's.

Where can I find claims about better performance? All I have read is
claims about worse performance.

Post by dxf
What has changed? Forthers today are more willing to
believe, to accept the word of authority

Is that why you cite Chuck Moore on locals rather than arguing from
facts?

- anton

dxf

2024-09-14 02:48:45 UTC

Post by dxf
Claims made in respect of locals in forth - ease of use, better performance
through less 'stack juggling', better readability/maintainability - were all
made in the 1980's.

Where can I find claims about better performance? All I have read is
claims about worse performance.

'Eliminate stack juggling' sounds like an argument for better performance.
It's a catch cry that's become synonymous with locals. Identify something
wrong with forth and introduce a solution is the gameplay.

Post by dxf
What has changed? Forthers today are more willing to
believe, to accept the word of authority

Is that why you cite Chuck Moore on locals rather than arguing from
facts?

The facts AFAICT is locals are an appeal to prejudice. If locals were a bona-
fide extension it ought to be crystal clear when to apply them and when not.
Vague statements about readability and maintainability don't cut it. The fact
is, locals challenge and contradict forth, which is why I'm vitally interested
in getting at the truth of it. The best way I knew of doing that was to see
whether I needed locals in practice. When the result is that good forth coding
can stand on its own, why shouldn't I quote Moore?

minforth

2024-09-14 05:47:11 UTC

Post by dxf
The facts AFAICT is locals are an appeal to prejudice.

This is one of the best sentences ever uttered on this forum! :-)

Anton Ertl

2024-09-14 06:19:52 UTC

Post by Anton Ertl
Where can I find claims about better performance? All I have read is
claims about worse performance.

'Eliminate stack juggling' sounds like an argument for better performance.

Not to me. To me it sounds like a statement about the ease of writing
and reading the code.

The performance of locals vs. stack juggling depends on the
implementation. I know no implementation that performs register
allocation of locals or stack items (except the TOS) to registers
across basic block boundaries. This seems to hurt code with locals
more than code that keeps everything on the stacks. Here's the data
from an earlier posting <***@mips.complang.tuwien.ac.at>,
now including data from iForth:

locals  stack   system (code size in bytes)
   401    336   gforth-fast (AMD64)
   179    132   lxf 1.6-982-823 (IA-32)
   182    119   VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
   241    159   VFX Forth 64 5.43 (AMD64)
   163    175   iforth-5.1 mini (AMD64)

The data from iForth is the outlier here; let's look at the code:

Source code:
defer dummy
: z" [char] " parse 2drop postpone dummy ; immediate
defer zformat
defer z+
defer >name
defer error

: VICHECK1 {: pindex paddr -- pindex' paddr :} \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
pindex 0 paddr @ WITHIN IF \ Index is valid
pindex paddr
ELSE \ Index is invalid
Z" Invalid index " pindex ZFORMAT Z+
Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
Z" length " Z+ paddr @ ZFORMAT Z+
ERROR
0 paddr \ Use zeroth index
THEN ;

: VICHECK2 ( pindex paddr -- pindex' paddr ) \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
over 0 2 pick @ WITHIN 0= IF \ Index is invalid
Z" Invalid index " 2 PICK ZFORMAT Z+
Z" for " Z+ OVER CELL- @ Z+ \ Add NFA from extra cell
Z" length " Z+ OVER @ ZFORMAT Z+
ERROR
NIP 0 SWAP \ Use zeroth index
THEN ;

One difference is that VICHECK2 does not just replace the locals with
stack stuff and eliminate the first branch of the IF, but also
replaces ">NAME 1+" with "CELL- @".

Disassembled code:
VICHECK1                        VICHECK2
pop rbx                         pop rbx
lea rsi, [rsi #-16 +] qword     mov rdi, [rsp] qword
mov [esi] dword, rbx            push rbx
pop rbx                         push rdi
lea rsi, [rsi #-16 +] qword     push 0 b#
mov [esi] dword, rbx            mov rbx, [rsp #16 +] qword
mov rbx, [rsi #16 +] qword      pop rdi
mov rbx, [rbx] qword            mov rax, rdi
mov rdi, [rsi] qword            sub rax, [rbx] qword
cmp rbx, rdi                    neg rax
jbe $10227337 offset NEAR       pop rbx
push [rsi] qword                sub rbx, rdi
push [rsi #16 +] qword          cmp rax, rbx
jmp $10227395 offset NEAR       seta bl
call $10226600 qword-offset     movzx rbx, bl
push [rsi] qword                neg rbx
call $10226E90 qword-offset     cmp rbx, 0 b#
call $10226EB0 qword-offset     jne $10227465 offset NEAR
call $10226600 qword-offset     call $10226600 qword-offset
call $10226EB0 qword-offset     mov rbx, [rsp #16 +] qword
push [rsi #16 +] qword          push rbx
call $10226ED0 qword-offset     call $10226E90 qword-offset
pop rbx                         call $10226EB0 qword-offset
lea rbx, [rbx 1 +] qword        call $10226600 qword-offset
push rbx                        call $10226EB0 qword-offset
call $10226EB0 qword-offset     pop rbx
call $10226600 qword-offset     mov rdi, [rsp] qword
call $10226EB0 qword-offset     push rbx
mov rbx, [rsi #16 +] qword      push [rdi -8 +] qword
push [rbx] qword                call $10226EB0 qword-offset
call $10226E90 qword-offset     call $10226600 qword-offset
call $10226EB0 qword-offset     call $10226EB0 qword-offset
call $10226EF0 qword-offset     pop rbx
push 0 b#                       mov rdi, [rsp] qword
push [rsi #16 +] qword          push rbx
add rsi, #32 b#                 push [rdi] qword
;                               call $10226E90 qword-offset
                                call $10226EB0 qword-offset
                                call $10226EF0 qword-offset
                                pop rbx
                                pop rdi
                                mov rdi, 0 d#
                                mov rcx, rdi
                                push rcx
                                push rbx
                                ;

iForth 5.1-mini does not even keep the TOS in a register on basic
block boundaries, which results in pops and pushes at all the
boundaries, especially for the stack-only code. However, in the
actual application (where Z", ZFORMAT etc. don't compile as deferred
words) it would probably inline many of these words which might result
in better code for the stack variant. It does not keep locals in
stack items, either, but accesses them in memory through a separate
stack pointer.

The code at the start of VICHECK2 does not suffer from basic block
boundaries, yet makes less use of registers than I expected. By
contrast, in VICHECK1 iforth discovers that "0 paddr @ within" is
equivalent to "paddr @ u<", while for "0 2 pick @ within" it fails to
make the equivalent discovery.
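Why "0 paddr @ within" can become "paddr @ u<": WITHIN is conventionally defined with a single unsigned comparison (the Forth-94 reference implementation is : WITHIN OVER - >R - R> U< ;), and with a lower bound of 0 the subtraction disappears. A small C model, treating a cell as intptr_t (an assumption for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* WITHIN ( n lo hi -- flag ): for lo <= hi, true when lo <= n < hi,
   computed as in the Forth-94 reference definition
   ": WITHIN OVER - >R - R> U< ;", that is (n - lo) U< (hi - lo). */
int within(intptr_t n, intptr_t lo, intptr_t hi)
{
    return (uintptr_t)n - (uintptr_t)lo < (uintptr_t)hi - (uintptr_t)lo;
}

/* With lo = 0 the subtractions vanish and one unsigned compare remains,
   the "paddr @ U<" form: a negative index wraps to a huge unsigned value
   and is rejected along with indices >= the array size. */
int u_less(intptr_t n, intptr_t hi)
{
    return (uintptr_t)n < (uintptr_t)hi;
}
```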

- anton

dxf

2024-09-14 08:40:53 UTC

Post by Anton Ertl
Where can I find claims about better performance? All I have read is
claims about worse performance.

'Eliminate stack juggling' sounds like an argument for better performance.

Not to me. To me it sounds like a statement about the ease of writing
and reading the code.
The performance of locals vs. stack juggling depends on the
implementation.
...

Surely you mean locals vs. forth. The easiest way to achieve performance
in forth is making your stack operations efficient. 'Stack juggling' is
a visual cue that it's not. I'm sorry that you feel forth isn't readable.

Stephen Pelc

2024-09-15 15:04:11 UTC

Post by Anton Ertl
locals  stack
   401    336   gforth-fast (AMD64)
   179    132   lxf 1.6-982-823 (IA-32)
   182    119   VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
   241    159   VFX Forth 64 5.43 (AMD64)
   163    175   iforth-5.1 mini (AMD64)

There are design decisions within locals that can impact optimisation.
The design of locals in VFX was influenced by Don Colburn's Forths
and by a desire to use locals to simplify source code when interfacing
to a host operating system. Many operating systems return data
to the caller by passing the address of a variable/buffer as an input
parameter. Locals that can have an accessible address make such
code much easier to read and write. The example below comes from
early system access code in VFX (see kernel/386Lin/syspatch.fth).
The locals design dates from long before ANS.

$541B equ FIONREAD

: (OS_key?) { | nread[ cell ] -- flag }
?PrepTerm nread[ off
nread[ FIONREAD stdin @ dll_ioctl @ 3 nxcall -1 = if
0 \ Error return from ioctl
else
nread[ @ 0<>
then
;

: (OS_Key) \ -- key ; SFP003
{ | iobuff[ cell ] -- char }
?PrepTerm
1 iobuff[ stdin @ dll_ReadFile @ 3 nxcall drop
iobuff[ c@
;
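For comparison, roughly the same job as (OS_key?) in C on Linux: the kernel returns its answer through a pointer, so the variable has to live in addressable memory, which is the role the nread[ local buffer plays in VFX. key_pending is a hypothetical name; FIONREAD is a Linux/BSD ioctl rather than part of POSIX.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>   /* pipe(), write(): used in the usage example below */

/* Rough C counterpart of (OS_key?): returns 1 if input is waiting on fd,
   0 if not or if ioctl fails (the Forth version also answers 0 on error).
   The kernel writes the byte count through &nread, so the local must have
   an address, which is what VFX's nread[ buffer provides. */
int key_pending(int fd)
{
    int nread = 0;
    if (ioctl(fd, FIONREAD, &nread) == -1)
        return 0;                  /* error return from ioctl */
    return nread != 0;
}
```

For example, after one byte is written into a pipe, key_pending on the pipe's read end should return 1; on an empty pipe it returns 0.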

Code such as this has been around for a very long time and the use
of addresses of locals, and of local buffers, has proven itself over
time. Yes, we could put in a great effort to improve the performance
of locals, but this is Forth and there are other optimisations that may
produce bigger changes to application performance. In the last
decade or so there has been very little customer demand for
faster code. However, higher level source code has been much
in demand. An example is Nick Nelson's value flavoured structures,
which are of particular merit when converting code from 32 bit to
64 bit host Forths.

Just because many of the Forth applications visible to the Forth
community now run on CPUs with 16 or 32 address registers
does not mean that all systems can implement the compiler
techniques required for high-performance locals.

I can buy a lot of CPU cycles for the cost of one day of programmer
time. I am reminded when looking at locals that a client's Forth
engine is currently at 4GHz on a 12nm process. The performance
was detuned to 4GHz because the machine was more than fast
enough.

Stephen

--
Stephen Pelc, ***@vfxforth.com
MicroProcessor Engineering, Ltd. - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)78 0390 3612, +34 649 662 974
http://www.mpeforth.com
http://www.vfxforth.com/downloads/VfxCommunity/

Anton Ertl

2024-09-15 16:16:34 UTC

Post by Stephen Pelc

Gforth has had variable-flavoured locals from the start, and
implemented VFX's local-buffer syntax some time ago without problems,
so Gforth's design decisions are obviously compatible with these
requirements.

Now Gforth's numbers above are the worst of all Forth systems, so why
would Gforth be relevant? The native code for locals by iForth seems
to be very much in the same spirit: A separate locals stack, and
locals are accessed relative to the locals-stack pointer; and iForth
has the best locals code size of all (but looking at the VFX code, my
guess is that this happens to be in the present case mainly because
iForth uses RSP for the data stack and some other stack for the return
stack). Actually, even with your approach of keeping the locals on
the return stack, and having a separate locals-frame pointer, I don't
see why the locals code should be worse. But looking at the start of
the VFX64 code for VICHECK1, there is a bit of superfluous work:

: VICHECK1 {: pindex paddr -- pindex' paddr :} \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
pindex 0 paddr @ WITHIN IF \ Index is valid

VICHECK1
( 0050A460 488BD4 ) MOV RDX, RSP
( 0050A463 48FF7500 ) PUSH QWORD [RBP]
( 0050A467 53 ) PUSH RBX
( 0050A468 52 ) PUSH RDX
( 0050A469 57 ) PUSH RDI
( 0050A46A 488BFC ) MOV RDI, RSP
( 0050A46D 4881EC00000000 ) SUB RSP, # 00000000
( 0050A474 488B5D08 ) MOV RBX, [RBP+08]
( 0050A478 488D6D10 ) LEA RBP, [RBP+10]
( 0050A47C 488B5710 ) MOV RDX, [RDI+10]
( 0050A480 488B12 ) MOV RDX, 0 [RDX]
( 0050A483 B900000000 ) MOV ECX, # 00000000
( 0050A488 482BD1 ) SUB RDX, RCX
( 0050A48B 488B4718 ) MOV RAX, [RDI+18]
( 0050A48F 482BC1 ) SUB RAX, RCX
( 0050A492 483BC2 ) CMP RAX, RDX
( 0050A495 0F8319000000 ) JNB/AE 0050A4B4

It's not clear to me why you push so much on the return stack at the
start, instead of just the two values pindex and paddr (which you do
in 0050A463 and 0050A467). Ok, you also push old locals-frame pointer
RDI in 0050A469, which is a result of having the locals on the return
stack instead of in a separate stack, but why push the old return
stack pointer? You know the size of your locals, just adjust RSP by
that much in the end.

The instruction at 0050A46D seems superfluous. My guess is that it's
there for the possible | part in the locals definition.

The next two instructions refill the TOS register RBX and adjust the
data stack pointer RBP. That completes the code for the locals
definition. From then on locals are loaded from memory, as
in iforth. Let's also inspect the end:

0 paddr \ Use zeroth index
THEN ;

( 0050A535 488D6DF0 ) LEA RBP, [RBP+-10]
( 0050A539 48C7450000000000 ) MOV QWord [RBP], # 00000000
( 0050A541 48895D08 ) MOV [RBP+08], RBX
( 0050A545 488B5F10 ) MOV RBX, [RDI+10]
( 0050A549 488B6708 ) MOV RSP, [RDI+08]
( 0050A54D 488B3F ) MOV RDI, 0 [RDI]
( 0050A550 C3 ) RET/NEXT

The THEN is right before 0050A549. The code before THEN pushes 0 and paddr
on the data stack, and stores the former TOS in memory before loading
the new TOS. The three instructions after the THEN restore the return
stack and locals-frame pointer and return.

So there is a little bit that can be done without much effort, but not
much.

I always thought that a separate locals stack is a thing I did in
Gforth out of laziness, and pay for it by having to maintain a
separate stack pointer, but it turns out that with locals on the
return stack, you still need an extra register for locals in memory,
and you spend additional overhead.

Post by Stephen Pelc
In the last
decade or so there has been very little customer demand for
faster code.

See below.

Post by Stephen Pelc
However, higher level source code has been much
in demand. An example is Nick Nelson's value flavoured structures,
which are of particular merit when converting code from 32 bit to
64 bit host Forths.

Gforth has worked on 64-bit hosts since early 1996, and I found that
Forth code tends to have fewer portability problems between 32-bit and
64-bit platforms than C code, and that's not just my code, the
applications in appbench and many others are also quite portable.

A major merit for value-flavoured structures is that you can change
the field size (e.g., from 1 byte to 2 bytes or vice versa) without
changing all the code accessing those fields. That's independent of
cell size.

Post by Stephen Pelc
Just because many of the Forth applications visible to the Forth
community now run on CPUs with 16 or 32 address registers
does not mean that all systems can implement the compiler
techniques required for high-performance locals.

It's obvious that hardly any Forth system implements register
allocation of locals, with the exception being lxf, which uses an
architecture with 8 general-purpose registers (address registers
recall bad memories from the 68000 days); and for lxf, register
allocation is limited to basic blocks or less.

Post by Stephen Pelc
I can buy a lot of CPU cycles for the cost of one day of programmer
time.

Some guy called Stephen Pelc (must be a different one) recently posted
<vbkdu0$1v8lq$***@dont-email.me>:

|We (MPE) converted much of our TCP/IP stack not to use locals. This
|was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
|the period (say 15 years ago) were similar. Code density improved by
|about 25% and performance by about 50%.

How much time did that conversion cost? And this Stephen Pelc
suggested that Buzz McCool (and probably everyone else) should also
spend their time on avoiding and eliminating locals from their code.

I am with you here, not with the other Stephen Pelc: Programmers
should use locals liberally if it saves them time, even in the face of
slow locals implementations, because you can buy a lot of CPU cycles
for the additional programming cost of avoiding locals.

- anton

Stephen Pelc

2024-09-15 21:35:00 UTC

Post by Stephen Pelc
I can buy a lot of CPU cycles for the cost of one day of programmer
time.

Some guy called Stephen Pelc (must be a different one) recently posted
|We (MPE) converted much of our TCP/IP stack not to use locals. This
|was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
|the period (say 15 years ago) were similar. Code density improved by
|about 25% and performance by about 50%.
How much time did that conversion cost? And this Stephen Pelc
suggested that Buzz McCool (and probably everyone else) should also
spend their time on avoiding and eliminating locals from their code.
I am with you here, not with the other Stephen Pelc: Programmers
should use locals liberally if it saves them time, even in the face of
slow locals implementations, because you can buy a lot of CPU cycles
for the additional programming cost of avoiding locals.

What you ignore is that the constraints of embedded systems with small
slow CPUs (by comparison with desktop CPUs) are very different from
those of desktop CPUs. Converting the TCP/IP stack was driven by the
client requirement to fit a TCP/IP app into 128k/256k Flash and 16k RAM.

I would not make that trade off today.

So there's only one Stephen Pelc but two application domains.

Stephen

Paul Rubin

2024-09-15 21:45:22 UTC

Post by Stephen Pelc
I would not make that trade off today.
So there's only one Stephen Pelc but two application domains.

I wonder how much effort de-localizing the TCP/IP stack took, compared
to hypothetically updating the compiler to optimize locals more. If the
TCP/IP stack code can compile with iForth or lxf, is there a way to
compare the code size with VFX's? I can understand wanting to use VFX
for actual delivery, of course.

Stephen Pelc

2024-09-16 12:19:25 UTC

Post by Stephen Pelc
I would not make that trade off today.
So there's only one Stephen Pelc but two application domains.

On modern desktop CPUs, I would probably spend the effort on
optimising locals more. However, the ability to provide the address
of a local is essential in our world. I have not inspected our code
base to see how many uses of a local declaration of a buffer
: bah {: ... | FOO[ cell ] ... -- :}
there are compared to the use of the ADDR (address) operator
applied to a normally defined local
: bah {: ... | FOO ... -- :}
...
addr FOO

Local buffers are remarkably useful.

minforth

2024-09-16 14:37:50 UTC

Post by Stephen Pelc
Local buffers are remarkably useful.

True. In addition, to pass the address of normal locals
to other words or to external library functions
(pass-by-reference instead of pass-by-value)
I borrowed the address operator & from C, like in:

: FUNC { f: a b -- badr f: aval }
... a \ push value of a to fp-stack
... &b \ push address of b to stack
... ;

Anton Ertl

2024-09-14 12:32:07 UTC

Post by dxf
https://pastebin.com/2xcRSbQW

Post by dxf
SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a
problem in forth? It doesn't appear to be.

: ARG ( n -- adr len -1 | 0 )
>r 0 0 cmdtail r> 0 ?do
2nip
bl skip 2dup bl scan
rot over - -rot
loop 2drop
dup if -1 end and ;
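A rough C rendering of what ARG appears to do, as a reading aid rather than dxf's code. It assumes blanks are plain spaces (as with BL SKIP / BL SCAN) and that n counts from 1; the function name arg and its signature are hypothetical, and the comments map each step to the Forth phrase it imitates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C counterpart of ARG ( n -- adr len -1 | 0 ):
   find the n-th (1-based) blank-separated token of a command tail.
   On success store its start and length and return 1; otherwise return 0. */
int arg(const char *tail, size_t tail_len, int n,
        const char **start, size_t *len)
{
    assert(start != NULL && len != NULL);
    const char *tok = tail;            /* current token, initially empty */
    size_t tok_len = 0;
    const char *rest = tail;           /* unscanned remainder of the tail */
    size_t rest_len = tail_len;

    for (int i = 0; i < n; i++) {      /* 0 ?do ... loop, dropping the old token */
        /* bl skip: step over leading blanks */
        while (rest_len > 0 && *rest == ' ') {
            rest++;
            rest_len--;
        }
        /* 2dup bl scan: the token runs up to the next blank */
        tok = rest;
        while (rest_len > 0 && *rest != ' ') {
            rest++;
            rest_len--;
        }
        tok_len = (size_t)(rest - tok);   /* rot over - -rot */
    }
    if (tok_len == 0)                  /* empty token: no such argument */
        return 0;
    *start = tok;
    *len = tok_len;
    return 1;
}
```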

The heavy use of global variables in this program also does not
support the idea that proper usage of the stacks makes locals
unnecessary.

- anton

Ahmed

2024-09-14 14:52:59 UTC

Hi,
In fuzzy logic, a triangular membership function mf(x;a,b,c) is defined
as:

mf(x;a,b,c) = (x-a)/(b-a) for a <= x < b,
(c-x)/(c-b) for b <= x < c,
0e elsewhere.

defining it with locals:

: tri_mf() { f: x f: a f: b f: c } ( f: x a b c -- mv)
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e
;
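The same piecewise definition written in C, as a cross-check for either Forth version. It is only a transliteration of the locals version above, assuming a < b < c and following the half-open intervals exactly as given, so x == b yields 1 and x == c yields 0.

```c
#include <assert.h>

/* Triangular membership function mf(x; a, b, c), assuming a < b < c. */
double tri_mf(double x, double a, double b, double c)
{
    assert(a < b && b < c);
    if (a <= x && x < b)
        return (x - a) / (b - a);   /* rising edge */
    if (b <= x && x < c)
        return (c - x) / (c - b);   /* falling edge */
    return 0.0;                     /* elsewhere */
}
```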

But defining it without locals ????!!!!!

: tri_mf() ( f: x a b c -- mv) ....

How?

Ahmed

Anton Ertl

2024-09-14 15:08:36 UTC

Post by Ahmed
Hi,
In fuzzy logic, a triangular membership function mf(x;a,b,c) is defined
mf(x;a,b,c) = (x-a)/(b-a) for a <= x < b,
(c-x)/(c-b) for b <= x < c,
0e elsewhere.
: tri_mf() { f: x f: a f: b f: c } ( f: x a b c -- mv)
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e
;
But defining it without locals ????!!!!!
: tri_mf() ( f: x a b c -- mv) ....
How?

I wonder if the notation "mf(x;a,b,c)" indicates that a,b,c is a tuple
that tends to get passed around without changing it. In that case
defining it as a structure in memory and accessing its members there
might be a solution.

But OTOH, unless you see programming in Forth as a religious exercise,
why worry, as long as your solution works.

- anton

Ahmed

2024-09-14 17:13:51 UTC

Post by Anton Ertl
I wonder if the notation "mf(x;a,b,c)" indicates that a,b,c is a tuple
that tends to get passed around without changing it. In that case
defining it as a structure in memory and accessing its members there
might be a solution.

a, b and c are the parameters of the membership function.
Yes, we can use structures, arrays ...

Post by Anton Ertl
But OTOH, unless you see programming in Forth as a religious exercise,
why worry, as long as your solution works.

I did it without locals as an exercise. Here it is:

Without locals:

: tri_mf: ( f: a b c )
create frot f, fswap f, f,
does> ( ad_a) ( f: x)
dup fdup ( ad_a ad_a) ( f: x x)
f@ ( ad_a) ( f: x x a)
f>= ( ad_a -1|0) ( f: x)
over float+ ( ad_a -1|0 ad_b) ( f: x)
fdup f@ ( ad_a -1|0) ( f: x x b)
f< and if ( ad_a) ( f: x)
dup f@ f- ( ad_a) ( f: x-a)
dup f@ ( ad_a) ( f: x-a a)
float+ ( ad_b) ( f: x-a a)
f@ fswap f- ( f: x-a b-a)
f/ ( f: [x-a]/[b-a])
exit
then
float+ ( ad_b) ( f: x)
dup fdup ( ad_b ad_b) ( f: x x)
f@ ( ad_b) ( f: x x b)
f>= ( ad_b -1|0) ( f: x)
over float+ ( ad_b -1|0 ad_c) ( f: x)
fdup f@ ( ad_b -1|0) ( f: x x c)
f< and if ( ad_b) ( f: x)
dup float+ f@ ( ad_b) ( f: x c)
f- ( ad_b) ( f: x-c)
dup float+ ( ad_b ad_c) ( f: x-c)
swap f@ f@ f- ( f: x-c b-c)
f/ ( f: [x-c]/[b-c])
exit
then
drop fdrop
0e
;

-1e309 -1e 0e tri_mf: neg_big
-1e 0e 1e tri_mf: zero
0e 1e 1e309 tri_mf: pos_big

: fuzzify ( f: x)
fdup neg_big cr f.
fdup zero cr f.
pos_big cr f.
;

Examples: for x in {-10e, -1e, -0.8e, -0.5e, -0.3e, 0e, 0.2e, 0.5e,
0.7e, 1e, 20e}
-10e fuzzify and so on.

\ ---------------

With locals:
: tri_mf() { f: x f: a f: b f: c } ( f: x a b c -- mv)
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e
;

: neg_big -1e309 -1e 0e tri_mf() ;
: zero -1e 0e 1e tri_mf() ;
: pos_big 0e 1e 1e309 tri_mf() ;

: fuzzify { f: x }
x neg_big cr f.
x zero cr f.
x pos_big cr f.
;

Examples: for x in {-10e, -1e, -0.8e, -0.5e, -0.3e, 0e, 0.2e, 0.5e,
0.7e, 1e, 20e}
-10e fuzzify and so on.

I notice a great difference in readability and simplicity when using
locals.

Using gforth under WSL (Windows Subsystem for Linux):

utime 0.1e neg_big utime d- dnegate d.
with locals: about 19 ms
without locals: about 18 ms

Ahmed

Ahmed

2024-09-14 17:43:52 UTC

Oops.
Please read microseconds (us) instead of milliseconds (ms).

Without locals: about 18 us
with locals: about 19 us

Ahmed

Ahmed

2024-09-14 17:41:23 UTC

Post by Ahmed
utime 0.1e neg_big utime d- dnegate d.
with locals: about 19 ms
without locals: about 18 ms
Ahmed

Oops.

Please read microseconds (us) instead of milliseconds (ms).

with locals: about 19 us
without locals: about 18 us

Ahmed

mhx

2024-09-14 18:54:46 UTC

Post by Ahmed
utime 0.1e neg_big utime d- dnegate d.
with locals: about 19 ms
without locals: about 18 ms
Ahmed

Oops.
Please read microseconds (us) instead of milliseconds (ms).
with locals: about 19 us
without locals: about 18 us

That can't be correct.

In iForth I used dfloats instead of floats
( 4.9ns instead of 7.3ns).
Using structs is not a great idea in this case.

anew -testlocals

: tri_mf: ( f: a b c )
create frot df, fswap df, df,
does> ( F: x -- y )
( ad_a) ( f: x)
dup fdup ( ad_a ad_a) ( f: x x)
df@ ( ad_a) ( f: x x a)
f>= ( ad_a -1|0) ( f: x)
over dfloat+ ( ad_a -1|0 ad_b) ( f: x)
fdup df@ ( ad_a -1|0) ( f: x x b)
f< and if ( ad_a) ( f: x)
dup df@ f- ( ad_a) ( f: x-a)
dup df@ ( ad_a) ( f: x-a a)
dfloat+ ( ad_b) ( f: x-a a)
df@ fswap f- ( f: x-a b-a)
f/ ( f: [x-a]/[b-a])
exit
then
dfloat+ ( ad_b) ( f: x)
dup fdup ( ad_b ad_b) ( f: x x)
df@ ( ad_b) ( f: x x b)
f>= ( ad_b -1|0) ( f: x)
over dfloat+ ( ad_b -1|0 ad_c) ( f: x)
fdup df@ ( ad_b -1|0) ( f: x x c)
f< and if ( ad_b) ( f: x)
dup dfloat+ df@ ( ad_b) ( f: x c)
f- ( ad_b) ( f: x-c)
dup dfloat+ ( ad_b ad_c) ( f: x-c)
swap df@ df@ f- ( f: x-c b-c)
f/ ( f: [x-c]/[b-c])
exit
then
drop fdrop
0e
;

-1e309 -1e 0e tri_mf: nol_neg_big

: (tri_mf) ( f: x a b c -- mv)
FLOCALS| c b a x |
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e ;

: loc_neg_big -1e309 -1e 0e (tri_mf) ;
: .timing MS? S>F 1e-3 F* 1e7 F/ F.N2 ." s/call." ;

: tnb CR ." \ no locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e nol_neg_big FDROP LOOP .timing
CR ." \ locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e loc_neg_big FDROP LOOP .timing ;

FORTH> tnb
\ no locals: 4.9ns/call.
\ locals: 21.3ns/call. ok

-marcel

Ahmed

2024-09-14 19:19:25 UTC

Post by mhx
That can't be correct.

You are right.
I find with gforth:

: go 0 do -0.1e neg_big fdrop loop ;

without locals:
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.06762074 us ok
for 1e8 times: (67.62 ns)

and with locals:
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.09961387 us ok
for 1e8 times: (99.61 ns)

I misused the timing in the previous post.
Thanks for the correction.

Post by mhx
FORTH> tnb
\ no locals: 4.9ns/call.
\ locals: 21.3ns/call. ok
-marcel

Ahmed

minforth

2024-09-15 06:17:18 UTC

Post by Ahmed
You are right.
: go 0 do -0.1e neg_big fdrop loop ;
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.06762074 us ok for 1e8
times: (67.62 ns)
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.09961387 us ok for
1e8 times: (99.61 ns)
I misused the timing in the previous post.
Thanks for the correction.

So with gforth it's about 30 nanosecs runtime disadvantage.
IOW if you run the code 3*10^7 times it adds up to 1 sec disadvantage.

While the locals version was easy to code, pretty straightforward and
probably bug-free out of the box, how long did it take to code and debug
the stack juggling version?

Say 10 minutes longer. Break-even point would be around 2*10^10 runs,
under the dubious assumption that CPU time is as valuable as human time.
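Here is the arithmetic behind both estimates. It uses the gforth figures of 99.61 ns per call (locals) and 67.62 ns per call (no locals), plus ten minutes of extra coding time:

```latex
\begin{aligned}
\Delta t &= 99.61\,\text{ns} - 67.62\,\text{ns} \approx 32\,\text{ns} \approx 30\,\text{ns} \\
3\times10^{7} \times 30\,\text{ns} &= 0.9\,\text{s} \approx 1\,\text{s} \\
N_{\text{break-even}} &= \frac{10\,\text{min}}{\Delta t}
  = \frac{600\,\text{s}}{3\times10^{-8}\,\text{s}} = 2\times10^{10}
\end{aligned}
```

So 3*10^7 runs, not 3*10^8, is the count that adds up to roughly one second.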

Ahmed

2024-09-15 07:30:24 UTC

Post by minforth
So with gforth it's about 30 nanosecs runtime disadvantage.
IOW if you run the code 3*10^7 times it adds up to 1 sec disadvantage.

I think you mean: if you run the code 3*10^8 times it adds up to 1 sec
disadvantage.

Post by minforth
While the locals version was easy to code, pretty straightforward and
probably bug-free out of the box, how long did it take to code and debug
the stack juggling version?

It took me several tries and corrections (and time).

Perhaps, one can factor the code in the does> part.

Ahmed

Ahmed

2024-09-15 07:35:14 UTC

Post by minforth
So with gforth it's about 30 nanosecs runtime disadvantage.
IOW if you run the code 3*10^7 times it adds up to 1 sec disadvantage.

I think you mean: if you run the code 3*10^8 times it adds up to 1 sec
disadvantage.

Oops!
You are right. 3*10^7 times running the code gives about 1 sec
disadvantage.

Ahmed

a***@spenarnc.xs4all.nl

2024-09-15 09:42:26 UTC

Post by mhx

Post by Ahmed
utime 0.1e neg_big utime d- dnegate d.
with locals: about 19 ms
without locals: about 18 ms
Ahmed

Oops.
Please read micro seconds (us) instead of milli seconds (ms).
with locals: about 19 us
without locals: about 18 us

That can't be correct.
In iForth I used dfloats instead of floats
( 4.9ns instead of 7.3ns).
Using structs is not a great idea in this case.
anew -testlocals
: tri_mf: ( f: a b c )
create frot df, fswap df, df,
does> ( F: x -- y )
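( ad_a) ( f: x)
dup fdup ( ad_a ad_a) ( f: x x)
df@ ( ad_a) ( f: x x a)
f>= ( ad_a -1|0) ( f: x)
over dfloat+ ( ad_a -1|0 ad_b) ( f: x)
fdup df@ ( ad_a -1|0) ( f: x x b)
f< and if ( ad_a) ( f: x)
dup df@ f- ( ad_a) ( f: x-a)
dup df@ ( ad_a) ( f: x-a a)
dfloat+ ( ad_b) ( f: x-a a)
df@ fswap f- ( f: x-a b-a)
f/ ( f: [x-a]/[b-a])
exit
then
dfloat+ ( ad_b) ( f: x)
dup fdup ( ad_b ad_b) ( f: x x)
df@ ( ad_b) ( f: x x b)
f>= ( ad_b -1|0) ( f: x)
over dfloat+ ( ad_b -1|0 ad_c) ( f: x)
fdup df@ ( ad_b -1|0) ( f: x x c)
f< and if ( ad_b) ( f: x)
dup dfloat+ df@ ( ad_b) ( f: x c)
f- ( ad_b) ( f: x-c)
dup dfloat+ ( ad_b ad_c) ( f: x-c)
swap df@ df@ f- ( f: x-c b-c)
f/ ( f: [x-c]/[b-c])
exit
then
drop fdrop
0e
;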
-1e309 -1e 0e tri_mf: nol_neg_big
: (tri_mf) ( f: x a b c -- mv)
FLOCALS| c b a x |
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e ;
: loc_neg_big -1e309 -1e 0e (tri_mf) ;
: .timing MS? S>F 1e-3 F* 1e7 F/ F.N2 ." s/call." ;
: tnb CR ." \ no locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e nol_neg_big FDROP LOOP .timing
CR ." \ locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e loc_neg_big FDROP LOOP .timing ;

This does not capture the meaning of the problem well.
Anton Ertl is right that you have to bind a, b and c
into something that is more than its parts.

0E0 FDUP FDUP class triangle-function
M: a F@ M; F,
M: b F@ M; F,
M: c F@ M; F,

M: fx ( f1 -- f1 )
FDUP a f>= FDUP b f< and if a f- b a f- f/ exit then
FDUP b f>= FDUP c f< and if c FSWAP f- c b f- f/ exit then
0e M;
endclass

5E0 3E0 1E0 triangle-function orang-utan

orang-utan
2E0 fx F.
4E0 fx F.

Note that I have not introduced anything special, only classes
that you need anyway. These classes are a straightforward
generalisation of the CREATE DOES> construct, minus the
awkward syntax.
Note that x is passed as it should be: volatile, in Forth fashion.
Passing 4 parameters is C-style.

NOTE:
This is a presentation of ideas; nothing is tested.
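The class syntax above is specific to Albert's own system. A rough equivalent of the same idea can be written with plain CREATE DOES>: a, b and c are bound into one named object, and x arrives on the float stack. This minimal sketch assumes Gforth (for F, and F>=). The accessor names MF-A, MF-B, MF-C and the defining word TRIANGLE: are hypothetical.

```forth
\ Sketch of Albert's idea with plain CREATE DOES>. Assumes Gforth (F, and F>=).
\ MF-A, MF-B, MF-C and TRIANGLE: are hypothetical names.
: mf-a ( f-addr -- ) ( F: -- a ) f@ ;
: mf-b ( f-addr -- ) ( F: -- b ) float+ f@ ;
: mf-c ( f-addr -- ) ( F: -- c ) 2 floats + f@ ;

: triangle: ( F: a b c -- )
  create falign frot f, fswap f, f,          \ store a, b, c in that order
  does> ( F: x -- mv ) faligned >r           \ object address kept on R
  fdup r@ mf-a f>=  fdup r@ mf-b f< and if   \ a <= x < b: rising edge
    r@ mf-a f-  r@ mf-b r@ mf-a f-  f/       \ (x-a)/(b-a)
    r> drop exit
  then
  fdup r@ mf-b f>=  fdup r@ mf-c f< and if   \ b <= x < c: falling edge
    r@ mf-c fswap f-  r@ mf-c r@ mf-b f-  f/ \ (c-x)/(c-b)
    r> drop exit
  then
  fdrop r> drop 0e ;                         \ outside [a, c): membership 0

1e 3e 5e triangle: orang-utan
2e orang-utan f. cr   \ (2-1)/(3-1), i.e. 0.5
4e orang-utan f. cr   \ (5-4)/(5-3), i.e. 0.5
```

The object address is parked on the return stack, so the named accessors replace the address juggling of the DOES> versions posted earlier.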

Post by mhx
FORTH> tnb
\ no locals: 4.9ns/call.
\ locals: 21.3ns/call. ok
-marcel

dxf

2024-09-15 08:14:17 UTC

a, b and c are the parameters of the membership function.
Yes, we can use structures, arrays ...

Post by Anton Ertl
But OTOH, unless you see programming in Forth as a religious exercise,
why worry, as long as your solution works.

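: tri_mf: ( f: a b c )
    create frot f, fswap f, f,
    does>             ( ad_a)           ( f: x)
      dup fdup        ( ad_a ad_a)      ( f: x x)
      f@              ( ad_a)           ( f: x x a)
      f>=             ( ad_a -1|0)      ( f: x)
      over float+     ( ad_a -1|0 ad_b) ( f: x)
      fdup f@         ( ad_a -1|0)      ( f: x x b)
      f< and if       ( ad_a)           ( f: x)
        dup f@ f-     ( ad_a)           ( f: x-a)
        dup f@        ( ad_a)           ( f: x-a a)
        float+        ( ad_b)           ( f: x-a a)
        f@ fswap f-                     ( f: x-a b-a)
        f/                              ( f: [x-a]/[b-a])
        exit
      then
      float+          ( ad_b)           ( f: x)
      dup fdup        ( ad_b ad_b)      ( f: x x)
      f@              ( ad_b)           ( f: x x b)
      f>=             ( ad_b -1|0)      ( f: x)
      over float+     ( ad_b -1|0 ad_c) ( f: x)
      fdup f@         ( ad_b -1|0)      ( f: x x c)
      f< and if       ( ad_b)           ( f: x)
        dup float+ f@ ( ad_b)           ( f: x c)
        f-            ( ad_b)           ( f: x-c)
        dup float+    ( ad_b ad_c)      ( f: x-c)
        swap f@ f@ f-                   ( f: x-c b-c)
        f/                              ( f: [x-c]/[b-c])
        exit
      then
      drop fdrop
      0e
;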

That appears no better than FVALUEs ...

0e fvalue a
0e fvalue b
0e fvalue c
0e fvalue x

: tri_mf() ( f: x a b c -- mv)
to c to b to a to x
x a f>=
x b f< and if
x a f- b a f- f/ exit
then
x b f>=
x c f< and if
c x f- c b f- f/ exit
then
0e
;

Ahmed

2024-09-15 08:58:20 UTC

Post by dxf
That appears no better than FVALUEs ...
0e fvalue a
0e fvalue b
0e fvalue c
0e fvalue x
: tri_mf() ( f: x a b c -- mv)
to c to b to a to x
x a f>=
x b f< and if
x a f- b a f- f/ exit
then
x b f>=
x c f< and if
c x f- c b f- f/ exit
then
0e
;

I knew about this solution, and also about using fvariables.
I wanted tri_mf() to be usable in defining, for example,
neg_big, zero and pos_big like this:

: neg_big -1e309 -1e 0e tri_mf() ;
: zero -1e 0e 1e tri_mf() ;
: pos_big 0e 1e 1e309 tri_mf() ;

It is ok.

Here the fvalues a, b and c are shared between these words without
problem.

Using the same test to estimate the speed (gforth under wsl) gives about
88 ns/call.
: go 0 do -0.1e neg_big fdrop loop ; ok

utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08933806 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08499321 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08958042 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.09034804 ok

And with fvariables, the timing gives about 86 ns/call

utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08831171 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08438598 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08442013 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.08619858 ok

( with locals: 99 ns/call,
without locals and no fvalues nor fvariables: 67 ns/call) (see
previous posts)

So naming things (cells, ...) with locals, values, variables, ... simplifies
writing the solution (code) and avoids heavy stack juggling, at a
(modest) cost in speed.

Ahmed

mhx

2024-09-15 09:58:23 UTC

This unearthed a "bug": -1e309 does not fit in a dfloat,
it prints as -Inf.
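IEEE 754 double precision, which a DFLOAT normally is, cannot represent magnitudes beyond roughly 1.8e308. As a result, -1e309 becomes -Inf. The following is a minimal way to check this. It assumes Gforth with the usual IEEE default of non-trapping overflow, and FINITE? is a hypothetical helper.

```forth
\ Hypothetical helper: a float is finite if r - r = 0.
\ For +Inf or -Inf, r - r is NaN, which does not compare equal to zero.
: finite? ( F: r -- ) ( -- flag ) fdup f- f0= ;

1e308 finite? .            \ 1e308 is below the double maximum (about 1.8e308)
1e308 10e f* finite? .     \ the product overflows to +Inf
```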

anew -testlocals

0e dfvalue a PRIVATE
0e dfvalue b PRIVATE
0e dfvalue c PRIVATE

( based on dxf's outline )
: gv_tri_mf ( f: x a b c -- mv )
to c to b to a
fdup a f>= fdup b f< and if a f- b a f- f/ exit endif
fdup b f>= fdup c f< and if c fswap f- c b f- f/ exit endif
0e ;

: gv_neg_big -1e308 ( ! ) -1e 0e gv_tri_mf ;

: tri_mf: ( f: a b c )
create frot df, fswap df, df,
does> ( F: x -- y )
( ad_a) ( f: x)
dup fdup ( ad_a ad_a) ( f: x x)
df@ ( ad_a) ( f: x x a)
f>= ( ad_a -1|0) ( f: x)
over dfloat+ ( ad_a -1|0 ad_b) ( f: x)
fdup df@ ( ad_a -1|0) ( f: x x b)
f< and if ( ad_a) ( f: x)
dup df@ f- ( ad_a) ( f: x-a)
dup df@ ( ad_a) ( f: x-a a)
dfloat+ ( ad_b) ( f: x-a a)
df@ fswap f- ( f: x-a b-a)
f/ ( f: [x-a]/[b-a])
exit
then
dfloat+ ( ad_b) ( f: x)
dup fdup ( ad_b ad_b) ( f: x x)
df@ ( ad_b) ( f: x x b)
f>= ( ad_b -1|0) ( f: x)
over dfloat+ ( ad_b -1|0 ad_c) ( f: x)
fdup df@ ( ad_b -1|0) ( f: x x c)
f< and if ( ad_b) ( f: x)
dup dfloat+ df@ ( ad_b) ( f: x c)
f- ( ad_b) ( f: x-c)
dup dfloat+ ( ad_b ad_c) ( f: x-c)
swap df@ df@ f- ( f: x-c b-c)
f/ ( f: [x-c]/[b-c])
exit
then
drop fdrop
0e
;

-1e309 -1e 0e tri_mf: nol_neg_big

: (tri_mf) ( f: x a b c -- mv)
FLOCALS| c b a x |
x a f>= x b f< and if x a f- b a f- f/ exit then
x b f>= x c f< and if c x f- c b f- f/ exit then
0e ;

: loc_neg_big -1e309 -1e 0e (tri_mf) ;

: .timing MS? S>F 1e-3 F* 1e7 F/ F.N2 ." s/call." ;

: tnb CR ." \ no locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e nol_neg_big FDROP LOOP .timing
CR ." \ locals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e loc_neg_big FDROP LOOP .timing
CR ." \ globals: " TIMER-RESET #10000000 ( 1e7 times )
0 DO -10e gv_neg_big FDROP LOOP .timing ;

FORTH> tnb
\ no locals: 4.9ns/call.
\ locals: 21.4ns/call.
\ globals: 6.2ns/call. ok

Surprisingly, there is hardly a difference between no locals and
global variables. The stack juggling in tri_mf: is merely an
intellectual exercise (in this case).

-marcel

ahmed

2024-09-15 12:06:52 UTC

Post by mhx
This unearthed a "bug": -1e309 does not fit in a dfloat,
it prints as -Inf.

In practice, the universe of discourse of x is bounded: [xmin, xmax].
I use a normalized universe of discourse, [-1, 1].
So to get neg_big I just use a value of large magnitude for the
parameter a (for example: -1e6)

-1e6 -1e 0e tri_mf: neg_big
-1e 0e 1e6 tri_mf: pos_big

With x between -2e and 2e, for example, this gives:
neg_big(x) equals approximately 1 for all x less than -1.
pos_big(x) equals approximately 1 for all x greater than 1.

So I don't use 1e309 or -1e309.
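Here is a minimal sketch of this bounded setup, assuming Gforth (for FPICK and F>=). For self-containment it repeats dxf's stack-only tri_mf, which is posted later in the thread, and it uses the +/-1e6 bounds described above. The expected values follow from (x-a)/(b-a) on the rising edge and (c-x)/(c-b) on the falling edge.

```forth
\ dxf's stack-only tri_mf, repeated here so the example is self-contained.
: tri_mf ( F: x a b c -- mv )
  3 fpick 3 fpick f>=  3 fpick 2 fpick f< and if         \ a <= x < b
    fdrop frot 2 fpick f-  fswap frot f-  f/ exit        \ (x-a)/(b-a)
  then
  3 fpick 2 fpick f>=  3 fpick 1 fpick f< and if         \ b <= x < c
    frot fdrop frot fover fswap f-  fswap frot f-  f/ exit \ (c-x)/(c-b)
  then
  fdrop fdrop fdrop fdrop 0e ;

\ Bounded universe of discourse: the "big" sets use +/-1e6, not +/-1e309.
: neg_big ( F: x -- mv ) -1e6 -1e 0e tri_mf ;
: zero    ( F: x -- mv ) -1e 0e 1e tri_mf ;
: pos_big ( F: x -- mv ) 0e 1e 1e6 tri_mf ;

-2e neg_big f. cr     \ (-2+1e6)/(-1+1e6), very close to 1
-0.5e neg_big f. cr   \ (0+0.5)/(0+1) = 0.5
-0.5e zero f. cr      \ (-0.5+1)/(0+1) = 0.5
-0.5e pos_big f. cr   \ outside [0, 1e6): 0
```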

Ahmed

Ahmed

2024-09-16 09:13:24 UTC

Hi,
Here is another version (no locals (flocals), no fvalues, no
fvariables).
I tried to factor the code a little.
It gives about 81 ns/call (gforth under wsl).

: x_a_b ( f: x a b c -- x a b c x a b)
3 fpick 3 fpick 3 fpick
;

: x_b_c ( f: x a b c -- x a b c x b c)
3 fpick 2 fpick 2 fpick
;

: fwithin ( f: x r s --) ( -- -1|0)
frot ftuck
f>= f< and
;

: mv ( f: x r s -- mv)
fover f- ( f: x r s-r)
frot frot f- ( f: s-r x-r)
fswap f/
;

: 4fdrop fdrop fdrop fdrop fdrop ;

: tri_mf ( f: x a b c -- mv)
x_a_b fwithin if fdrop mv exit then
x_b_c fwithin if frot fdrop fswap mv exit then
4fdrop 0e
;

: neg_big -1e308 -1e 0e tri_mf ;
: zero -1e 0e 1e tri_mf ;
: pos_big 0e 1e 1e308 tri_mf ;

: fuzzify ( f: x)
fdup neg_big cr f.
fdup zero cr f.
pos_big cr f.
;

: go 0 do -0.1e neg_big fdrop loop ;

utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08081444 ok
utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.0806888 ok
utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08064737 ok
utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08140588 ok
utime 100000000 go utime d>f d>f f- 1e8 f/ f. 0.08233884 ok

Ahmed

mhx

2024-09-16 10:13:19 UTC

[..]
FORTH> tnb
\ no locals: 5ns/call.
\ locals: 18.2ns/call.
\ globals: 6ns/call.
\ no locals2: 21.9ns/call. ok

This appears not to be a good idea.
The root cause is piling up too many
items on the F-stack (exceeding the
hardware FPU stack limits).

-marcel

Ahmed

2024-09-16 10:36:38 UTC

Thanks for the information.
So the best is clear.

Ahmed

dxf

2024-09-16 12:47:10 UTC

Post by mhx
[..]
FORTH> tnb
\ no locals: 5ns/call.
\ locals: 18.2ns/call.
\ globals: 6ns/call.
\ no locals2: 21.9ns/call. ok
This appears not to be a good idea.
The root cause is piling up too many
items on the F-stack (exceeding the
hardware FPU stack limits).

FVALUEs may be the way to go for hardware stack.
Is this any better?

: tri_mf ( f: x a b c -- mv)
3 fpick ( x) 3 fpick ( x a) f>=
3 fpick ( x) 2 fpick ( x b) f< and if
fdrop \ x a b
frot 2 fpick f- \ a b x-a
fswap frot f- \ x-a b-a
f/ exit
then
3 fpick ( x) 2 fpick ( x b) f>=
3 fpick ( x) 1 fpick ( x c) f< and if
frot fdrop \ x b c
frot fover fswap f- \ b c c-x
fswap frot f- \ c-x c-b
f/ exit
then
fdrop fdrop fdrop fdrop 0e
;

Ahmed

2024-09-16 13:21:19 UTC

FVALUEs may be the way to go for hardware stack.
Is this any better?
: tri_mf ( f: x a b c -- mv)
3 fpick ( x) 3 fpick ( x a) f>=
3 fpick ( x) 2 fpick ( x b) f< and if
fdrop \ x a b
frot 2 fpick f- \ a b x-a
fswap frot f- \ x-a b-a
f/ exit
then
3 fpick ( x) 2 fpick ( x b) f>=
3 fpick ( x) 1 fpick ( x c) f< and if
frot fdrop \ x b c
frot fover fswap f- \ b c c-x
fswap frot f- \ c-x c-b
f/ exit
then
fdrop fdrop fdrop fdrop 0e
;

Your solution gives the best speed compared to others. With gforth under
wsl, I find 59ns/call

Here is the code:
\ here is your definition

: tri_mf ( f: x a b c -- mv)
3 fpick ( x) 3 fpick ( x a) f>=
3 fpick ( x) 2 fpick ( x b) f< and if
fdrop \ x a b
frot 2 fpick f- \ a b x-a
fswap frot f- \ x-a b-a
f/ exit
then
3 fpick ( x) 2 fpick ( x b) f>=
3 fpick ( x) 1 fpick ( x c) f< and if
frot fdrop \ x b c
frot fover fswap f- \ b c c-x
fswap frot f- \ c-x c-b
f/ exit
then
fdrop fdrop fdrop fdrop 0e
;

\ and then the code
: neg_big -1e308 -1e 0e tri_mf ;
: zero -1e 0e 1e tri_mf ;
: pos_big 0e 1e 1e308 tri_mf ;

: fuzzify ( f: x)
fdup neg_big cr f.
fdup zero cr f.
pos_big cr f.
;

: go 0 do -0.1e neg_big fdrop loop ;

utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05871598 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05926772 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05896149 ok
utime 100000000 go utime d>f d>f f- 1e-8 f* f. 0.05899284 ok

Ahmed

mhx

2024-09-16 13:33:53 UTC

On Mon, 16 Sep 2024 12:47:10 +0000, dxf wrote:

[..]

Post by dxf
FVALUEs may be the way to go for hardware stack.
Is this any better?
: tri_mf ( f: x a b c -- mv)
3 fpick ( x) 3 fpick ( x a) f>=
3 fpick ( x) 2 fpick ( x b) f< and if
fdrop \ x a b
frot 2 fpick f- \ a b x-a
fswap frot f- \ x-a b-a
f/ exit
then
3 fpick ( x) 2 fpick ( x b) f>=
3 fpick ( x) 1 fpick ( x c) f< and if
frot fdrop \ x b c
frot fover fswap f- \ b c c-x
fswap frot f- \ c-x c-b
f/ exit
then
fdrop fdrop fdrop fdrop 0e
;

No, it (no locals3) is worse. FPICK is a
problem for iForth because in principle
there can be many values on the FPU stack.
The easy way out was to flush to memory
(assuming real Forthers would balk at
PICK and ROLL anyway).

The title of this thread is quite
appropriate: don't pile on the stack,
don't try to grow it, sparingly re-arrange
and then consume items with operators
that do real work.

FORTH> tnb
\ no locals: 4.9ns/call.
\ locals: 18.3ns/call.
\ globals: 6ns/call.
\ no locals2: 21.9ns/call.
\ no locals3: 23.5ns/call. ok

-marcel

Paul Rubin

2024-09-15 16:56:42 UTC

Post by dxf
That appears no better than FVALUEs ...

Those are essentially global variables, with all of their issues.

dxf

2024-09-16 04:11:35 UTC

Post by dxf
That appears no better than FVALUEs ...

Those are essentially global variables, with all of their issues.

With apparently little issue for the case presented. The push is
to write idiot-proof code that can be used anywhere. Moore calls
that 'solving the general problem' - which he eschews.

Paul Rubin

2024-09-16 06:32:28 UTC

Post by dxf
With apparently little issue for the case presented. The push is
to write idiot-proof code that can be used anywhere. Moore calls
that 'solving the general problem' - which he eschews.

Didn't one of the Chuck Moore quotes you posted say using the stacks was
better for information hiding than using globals? That includes the
return or locals stack, of course. Your computer hardware has the
capability of accessing inside the stack randomly, and Forth has words
like 2ROT which reach up to 6 levels deep in the parameter stack.
What's wrong with being able to give names to the cells? I don't
understand the obsession with refusing to use those capabilities of your
hardware.

The central idea of Forth to me is its traditional implementation as a
threaded interpreter with its extremely simple one-pass compiler. That
made it possible to make a complete interactive development environment
on a 1970s minicomputer with a floppy disc. All the language features
like the stack oriented VM are just incidental affordances on the route
to that simple interpreter. To the extent that there is a cult of the
stack machine, I don't belong to it.

Post by dxf
Moore calls that 'solving the general problem' - which he eschews.

The idea as I saw it was don't do extra work to solve the general
problem, if a simpler approach solves the immediate problem at hand.

If the general solution takes LESS work than the limited one, then doing
the extra work for the limited solution is just masochism.

minforth

2024-09-16 08:48:10 UTC

Twisting even simple problem solutions to fit the stack machine model,
just to make code execution easier on the stack machine, falls under
Knuth's famous "Premature Optimization is the Root of all Evil".

There are many parallels with some Forth coding styles:
https://www.geeksforgeeks.org/premature-optimization/

dxf

2024-09-16 10:01:32 UTC

Post by dxf
With apparently little issue for the case presented. The push is
to write idiot-proof code that can be used anywhere. Moore calls
that 'solving the general problem' - which he eschews.

Didn't one of the Chuck Moore quotes you posted say using the stacks was
better for information hiding than using globals?

He didn't elaborate what he meant by 'information hiding'. OTOH he did
say "It is necessary to have variables".

Post by Paul Rubin
That includes the
return or locals stack, of course. Your computer hardware has the
capability of accessing inside the stack randomly, and Forth has words
like 2ROT which reach up to 6 levels deep in the parameter stack.
What's wrong with being able to give names to the cells? I don't
understand the obsession with refusing to use those capabilities of your
hardware.

2ROT assumes '3 pairs' of cells on the stack. But even then, how often is
it used? I can't imagine juggling 6 items - though I can imagine a locals
user doing it.

Post by Paul Rubin
The central idea of Forth to me is its traditional implementation as a
threaded interpreter with its extremely simple one-pass compiler. That
made it possible to make a complete interactive development environment
on a 1970s minicomputer with a floppy disc. All the language features
like the stack oriented VM are just incidental affordances on the route
to that simple interpreter. To the extent that there is a cult of the
stack machine, I don't belong to it.

So you are free of all external influences?

Post by dxf
Moore calls that 'solving the general problem' - which he eschews.

The idea as I saw it was don't do extra work to solve the general
problem, if a simpler approach solves the immediate problem at hand.
If the general solution takes LESS work than the limited one, then doing
the extra work for the limited solution is just masochism.

When is a general solution less work? There may be a supposition it
will result in less work in the future but that's far from guaranteed.

a***@spenarnc.xs4all.nl

2024-09-15 09:20:17 UTC

Locals don't help here. Flocals maybe, but that
is the whole point: you are halfway down the rabbit hole
if you demand flocals, dlocals ..

Post by Ahmed
Ahmed

dxf

2024-09-15 05:28:24 UTC

Post by dxf
https://pastebin.com/2xcRSbQW

Post by dxf
SWAP averaged 1 in 7 definitions. OVER 1 in 9. Is 'stack juggling' a
problem in forth? It doesn't appear to be.

: ARG ( n -- adr len -1 | 0 )
>r 0 0 cmdtail r> 0 ?do
2nip
bl skip 2dup bl scan
rot over - -rot
loop 2drop
dup if -1 end and ;

I believe it's well written and efficient.

: 2nip 2swap 2drop ;
: end postpone exit postpone then ; immediate
defer cmdtail ( -- adr len)

: ARG ( n -- adr len -1 | 0 )
>r 0 0 cmdtail r> 0 ?do
2nip
bl skip 2dup bl scan
rot over - -rot
loop 2drop
dup if -1 end and ;
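ARG can be exercised without a real command line by pointing CMDTAIL at a fixed string. The following is a minimal harness for a Forth-2012 system such as Gforth. As far as I know, Gforth provides SKIP and SCAN, but simple fallbacks are included in case a system does not. FAKE-TAIL is hypothetical.

```forth
\ Fallbacks, used only if the system lacks SKIP / SCAN.
[undefined] skip [if]
: skip ( c-addr u char -- c-addr' u' )   \ skip leading chars equal to char
  >r begin dup while over c@ r@ = while 1 /string repeat then r> drop ;
[then]
[undefined] scan [if]
: scan ( c-addr u char -- c-addr' u' )   \ advance to first char equal to char
  >r begin dup while over c@ r@ <> while 1 /string repeat then r> drop ;
[then]

\ Supporting words from the post.
: 2nip ( x1 x2 x3 x4 -- x3 x4 ) 2swap 2drop ;
: end postpone exit postpone then ; immediate
defer cmdtail ( -- adr len )

: ARG ( n -- adr len -1 | 0 )   \ n-th blank-delimited word of the tail
  >r 0 0 cmdtail r> 0 ?do
    2nip
    bl skip 2dup bl scan
    rot over - -rot
  loop 2drop
  dup if -1 end and ;

\ Hypothetical stand-in for the real command tail, for testing only.
: fake-tail ( -- adr len ) s"   foo  bar baz " ;
' fake-tail is cmdtail

2 arg . type cr   \ the true flag, then the second word, "bar"
```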

VFX:

( 180 bytes, 44 instructions )

Post by Anton Ertl
The heavy use of global variables in this program also does not
support the idea that proper usage of the stacks makes locals
unnecessary.

I see many small colon definitions and very few variables - global or
local:

integer #TERMS \ number of terminals in DTA file
integer TERM \ working terminal#
variable #DIGIT
variable LEN
integer MAXCHR

The first two are necessarily global and would exist regardless.
The remaining three are used by a group of functions with the view of
keeping them simple. The alternative would be to carry them around as
parameters shuffling them from one function to another. That seems
worse to me.

a***@spenarnc.xs4all.nl

2024-09-15 09:53:09 UTC

Post by Anton Ertl
The heavy use of global variables in this program also does not
support the idea that proper usage of the stacks makes locals
unnecessary.

I see many small colon definitions and very few variables - global or
integer #TERMS \ number of terminals in DTA file
integer TERM \ working terminal#
variable #DIGIT
variable LEN
integer MAXCHR
The first two are necessarily global and would exist regardless.
The remaining three are used by a group of functions with the view of
keeping them simple. The alternative would be to carry them around as
parameters shuffling them from one function to another. That seems
worse to me.

One anecdote. I had a project that consisted of squashing bugs.
I am proud to say that I accurately predicted the time for each bug
separately, and I was not even 5% off on the total.
There was one bug I refused to give a time estimate for.
This program was written in C by Lispers, and they didn't understand
that some variables are group-local, i.e. in fact global.
There was a variable ERROR; once it had been set, the next error would
inspect it, and the program was supposed to give up.

The Lispers went about it recursively and kept defining new ERROR
variables that were initialised to false. So in case of an error,
this program never stopped.

Groetjes Albert

Anton Ertl

2024-09-12 08:55:26 UTC

Post by Paul Rubin
The 100+ occurrences of DUP, DROP, and SWAP are either an abstraction
inversion (with a smart compiler, the data ends up in registers that
could be named by locals)

I don't see an inversion here. The programmer-visible stack abstracts
(ideally) the registers in one way, the programmer-visible locals
abstracts them in a different way.

And if we look at the VICHECK example from Nick Nelson's Better Values
<http://www.euroforth.org/ef22/papers/nelson-values-slides.pdf> the
version with locals, followed by the version that eliminates the
locals:

: VICHECK {: pindex paddr -- pindex' paddr :} \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
pindex 0 paddr @ WITHIN IF \ Index is valid
pindex paddr
ELSE \ Index is invalid
Z" Invalid index " pindex ZFORMAT Z+
Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
Z" length " Z+ paddr @ ZFORMAT Z+
ERROR
0 paddr \ Use zeroth index
THEN ;

: VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
over 0 2 pick @ WITHIN 0= IF \ Index is invalid
Z" Invalid index " 2 PICK ZFORMAT Z+
Z" for " Z+ OVER CELL- @ Z+ \ Add NFA from extra cell
Z" length " Z+ OVER @ ZFORMAT Z+
ERROR
NIP 0 SWAP \ Use zeroth index
THEN ;

So by keeping the values on the stack you not just eliminate their
repeated mention, but also eliminate one branch of the IF. With a
more capable Forth system a synthesis of the two approaches is
possible:

: VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
over 0 2 pick @ WITHIN 0= IF \ Index is invalid
{: pindex paddr :}
Z" Invalid index " pindex ZFORMAT Z+
Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
Z" length " Z+ paddr @ ZFORMAT Z+
ERROR
0 paddr \ Use zeroth index
THEN ;

Or one could factor out the code between IF and THEN and stay within
the confines of VFX:

: VIERROR {: pindex paddr -- 0 paddr :}
Z" Invalid index " pindex ZFORMAT Z+
Z" for " Z+ paddr >NAME 1+ Z+ \ >NAME does not work for separated data
Z" length " Z+ paddr @ ZFORMAT Z+
ERROR
0 paddr \ Use zeroth index
;

: VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
over 0 2 pick @ WITHIN 0= IF \ Index is invalid
VIERROR
THEN ;

The check can be simplified, which also simplifies the stack handling:

: VICHECK ( pindex paddr -- pindex' paddr ) \ Checks for valid index
\ paddr is the address of the data, the first cell of which contains
\ the array size
2dup @ u>= IF \ Index is invalid
VIERROR
THEN ;
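The simplification relies on a standard trick. For a non-negative size, "0 size WITHIN" is the same test as "size U<", because a negative index viewed as unsigned is larger than any valid size. Anton's version uses the negated form U>= to select the error branch. U>= is provided by Gforth but is not in the core standard. Below is a small check of the equivalence in standard Forth. The word names are hypothetical.

```forth
\ For a non-negative SIZE, "0 SIZE WITHIN" and "SIZE U<" give the same flag.
\ The word names below are hypothetical.
: index-ok-within? ( index size -- flag ) 0 swap within ;
: index-ok-u<?     ( index size -- flag ) u< ;

\ Compare both tests for indices -3..7 and sizes 0..5.
: same-results? ( -- flag )
  true
  6 0 do                        \ j = size
    8 -3 do                     \ i = index
      i j index-ok-within?  i j index-ok-u<?  <> if drop false then
    loop
  loop ;

same-results? .
```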

Post by Paul Rubin
or they are stack traffic whose cost has to be
compared with the cost of indexed references to locals in the return
stack.

That check often results in the code without locals winning, but that
is, for a large part, due to suboptimal implementations of locals.
Ideally a perfect compiler will produce the same code for code using
locals and for equivalent code using stack manipulation words, because
the data flow is the same. This actually works out in the case of lxf
processing various implementations of 3DUP, including a locals-based
one; see <***@mips.complang.tuwien.ac.at>. However, in
general Forth systems do not produce perfect results.

I have now looked at what happens for the first two variants of
VICHECK; I have defined the non-standard words as follows to make it
possible to compile the code:

defer dummy
: z" [char] " parse 2drop postpone dummy ; immediate
defer zformat
defer z+
defer >name
defer error

I looked at 3 systems: Gforth (because I work on it); lxf (because it
produces the best results in the 3DUP case); VFX (because it's the
system Nick Nelson uses). The numbers below are the number of bytes
of native code:

locals  stack  system
   401    336  gforth-fast (AMD64)
   179    132  lxf 1.6-982-823 (IA-32)
   182    119  VFX FX Forth for Linux IA32 Version: 4.72 (IA-32)
   241    159  VFX Forth 64 5.43 (AMD64)

Post by Paul Rubin
I'd agree that they aren't necessarily "juggling", which evokes
permuting stuff in the stack outside the usual FIFO order. That does
happen a little bit though, with OVER, ROT, etc.

In particular, in Starting Forth ROT is illustrated with a juggler
(you see the juggling balls right beside her), and the swap dragon
comments: "I hate jugglers".


- anton

Paul Rubin

2024-09-15 19:39:28 UTC

Post by Anton Ertl
So by keeping the values on the stack you not just eliminate their
repeated mention, but also eliminate one branch of the IF.

Is the repeated mention just a matter of DRY, assuming the compiler puts
the locals in registers so that the extra mention doesn't transfer them
between stacks a second time? I do prefer your version where you factor
out VIERROR.

I wonder whether Moore's 1999 aversion to locals had something to do
with his hardware designs of that era, where having more registers
(besides T and N) connected to the ALU would have cost silicon and
created timing bottlenecks. Today's mainstream processors have GPR's
anyway, but I wonder what the real problem was with stack caches like
the CRISP: https://thechipletter.substack.com/p/at-and-ts-crisp-hobbits

Commenters there say CRISP failed basically because its early
implementation was buggy, it lost an important design win because of the
bugs, and AT&T management then gave up on it.

I remember the SPARC had "register windows" but I don't know if that's
similar or what went wrong with them.

Anton Ertl

2024-09-16 16:26:51 UTC

Post by Anton Ertl
So by keeping the values on the stack you not just eliminate their
repeated mention, but also eliminate one branch of the IF.

Is the repeated mention just a matter of DRY, assuming the compiler puts
the locals in registers so that the extra mention doesn't transfer them
between stacks a second time?

That, too, but the elimination of the ELSE has more weight with me.

In the VICHECK ( pindex paddr -- pindex' paddr ) case this favours the
locals-less code. For a word that is similar in having an IF where
only one side has to do something other than to make sure that the
stack effect is satisfied, but with the stack effect ( x1 x2 -- ), the
advantage is with the locals code:

: WORD1 {: x1 x2 -- :}
... ( f ) if ( )
... x1 ... x2 ...
then ;

: WORD2 ( x1 x2 -- )
... ( f ) if ( x1 x2 )
...
else
2drop
then ;

Forth has a special word ?DUP for one specific variant of this
situation, but it helps only in specific cases.
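Here is a concrete, hypothetical instance of the WORD1/WORD2 pattern: add n to the variable at addr, but only when n is non-zero. This sketch assumes a system with Forth-2012 {: :} locals, such as recent Gforth. The word names and the variable TOTAL are illustrative only.

```forth
\ Hypothetical example of the WORD1/WORD2 pattern; both words are ( n addr -- ).
variable total

: add-if-nonzero/locals {: n addr -- :}
  n if n addr +! then ;            \ nothing to clean up on the false path

: add-if-nonzero/stack ( n addr -- )
  over if +! else 2drop then ;     \ the false path must still drop both

0 total !
5 total add-if-nonzero/locals
0 total add-if-nonzero/locals
3 total add-if-nonzero/stack
0 total add-if-nonzero/stack
total @ .
```

The locals version needs no ELSE at all. The stack version pays for its shorter true path with an explicit 2DROP branch.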

Post by Paul Rubin
I wonder whether Moore's 1999 aversion to locals had something to do
with his hardware designs of that era, where having more registers
(besides T and N) connected to the ALU would have cost silicon and
created timing bottlenecks.

I think he had the aversion long before he did such hardware designs.
He has been quoted as thinking that humans should do all they can to
make the computer's work easier (or something like that). While his
sayings, like any religious text, are sufficiently fuzzy to be
interpretable in many ways, his denouncing of locals over the years
makes it clear that he thinks that humans should invest time to write
code with stack manipulation words and globals, so that the compiler
does not need to be bloated by the code for dealing with locals.

Post by Paul Rubin
Today's mainstream processors have GPR's
anyway, but I wonder what the real problem was with stack caches like
the CRISP: https://thechipletter.substack.com/p/at-and-ts-crisp-hobbits

I don't think that the CRISP lived long enough for the real problems
to become big: In contrast to GPRs or the stacks of Chuck Moore's
chips, the stack accesses in CRISP alias with potentially all memory
accesses, so every load of a C variable on a stack may potentially
have to produce the result of a preceding store (and it often actually
is the result of the previous instruction). In the last four decades,
CPU designers have invented a number of techniques for predicting when
loads don't alias earlier stores, and for fast store-to-load
forwarding when they do, but these techniques are not cheap. Even
today, a CPU can do maybe 3 loads and two stores per cycle, while it can deal
with a dozen or so input operands in registers, and maybe 6 output
operands in registers. The CRISP's successors would have been
uncompetitive soon after introduction, and I doubt that they would
ever have reached competitive performance.

Post by Paul Rubin
I remember the SPARC had "register windows" but I don't know if that's
similar or what went wrong with them.

Not at all similar. Register windows were windows into a larger
register file, with no aliasing with memory at all; that register file
was treated as a stack of register windows.

In a similar vein (all heritage of Berkeley RISC) were the AMD 29K's
and the IA-64's register stack. It's interesting that Forthers were
never excited about that; the register stack allows pushing or popping
individual registers instead of register windows. I think the pushing
and popping is not a cheap operation, so you would want to use it only
at the call, but you could have used it for one of the Forth stacks,
and avoided some memory accesses that way.

- anton

Hans Bezemer

2024-09-11 09:02:00 UTC

Post by Hans Bezemer
Needless to say this copying, releasing and stuff takes time.

Similar to DUP (copy) or DROP (release).

Post by Hans Bezemer
In all honesty I must state that this overhead is not always
translated to a diminished performance

Right, I don't think one can assert a performance hit without
measurements supporting the idea.

Post by Hans Bezemer
TL;DR my objections are mostly based on pure architectural arguments,
rather than practicality.

Sure, that's reasonable, it's a matter of what you prefer. That's
harder to take issue with than claims about performance.

Post by Hans Bezemer
I also don't like Python, PHP and Perl for those very same reasons -

A lot depends on how solid you want to make your implementation. I got
locals in uBasic/4tH.

: exec_local ( --)
[: get_exp 0 max 27 frame dup @ - + min negate cells frame + dup local <
if E.MANYLOC throw else frame @ over ! to frame then ;]
exec_function \ execution semantics for LOCALS()
;

This one reserves room for locals. You may use up to 26 locals per
function since there are 26 letters in the alphabet (duh!).

: exec_param ( --)
frame exec_local frame \ allocate locals, save pointers
begin over over > while cell+ (pop) over ! repeat drop drop
;

If the reserved room has to be initialized from the stack, it calls
EXEC_LOCAL and then copies the values there.

: exec_return ( --)
get_token paren? putback if ['] get_push exec_function then
gpop prog ! frame dup local #local 1- cells + >
if E.NOSCOPE throw ;then @ to frame
;

This one checks whether RETURN returns a value - and if it does, it
pushes this value on the stack. Then it sets the return address. It
checks the sanity of the stack frame and, if that is okay, it finally
updates the stack pointer.

You conveniently left out the initialization of the stack frame. Agreed,
if ALL values are transferred to the return stack the overhead is
minimal. But how often does that happen?

Post by Paul Rubin
Those are at a totally different level than Forth, in terms of layers of
implementation and runtime libraries, overhead, etc. It's better to
compare to something like C, or a hypothetical cleaned up version of C,
or even to Forth with locals ;).

True - but that's not the level of abstraction I'm considering. I think
a language should have a well designed core, surrounded by a
constellation of extensions. Like C with its standard library and Forth
with its word sets. For comparison - C has a few dozen keywords. PHP has
at least two different ways to write binary extensions alone. A full
Python installation is scattered all over the filesystem, so you have a
hell of a job extracting a single, transferable application. Not to
mention the awkward syntax (although they fixed some of it in v3). In
Perl you always have to wonder which prefix is fashionable today.

Now, I won't say Forth doesn't have its issues. I think IN ESSENCE
recognizers are a beautiful idea. Extend them to strings and you could
eradicate "parsing words" and have something like:

"lib/mylib.4th" include

"Square" : "the square is:" print dup * cr ;

But okay, we'll make do with what we have ;-) And BTW, TURNKEY should be
standard. Clean up the dictionary, pump out an executable.

Hans Bezemer

a***@spenarnc.xs4all.nl

2024-09-11 11:29:03 UTC

In article <nnd$545e2daa$***@548f76d6156a46d8>,
Hans Bezemer <***@gmail.com> wrote:
<SNIP>

Post by Hans Bezemer
Now, I won't say Forth doesn't have its issues. I think IN ESSENCE
recognizers are a beautiful idea. Extend it to strings and you could
"lib/mylib.4th" include
"Square" : "the square is:" print dup * cr ;

You have that backwards; it must be:

{ "the square is:" print dup * cr } : Square

If there is one thing to preserve in Forth, it is the
convention that defining words parse new names for
the dictionary by scanning forward, without those names being treated
as strings. Here { introduces a denotation without being a PREFIX
(a "recognizer"), as 0x in 0xDEADBEEF is. It behaves the same inside
a definition, like numbers and, nowadays, strings.

{ "the square is:" print dup * cr } CONSTANT orang_utan
orang_utan DUP : Square : quadrate

Post by Hans Bezemer
But okay, we'll make do with what we have ;-) And BTW, TURNKEY should be
standard. Clean up the dictionary, pump out an executable.

I have created a language on that principle, e.g. meta
accepts 2 xt's, a build one and a run one. meta is the mother of
all defining words:
{ , } { @ } meta CONSTANT
{ CELL ALLOT } { } meta VARIABLE
{ 2 CELLS ALLOT } { } meta 2VARIABLE
{ } { EXECUTE } meta :
{ } { } meta DATA \ My favorite.

CREATE DOES> is the right idea, an object with an allocation
part and a behavior, but the syntax is awkward beyond despair.
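For readers who want to map the meta examples back onto standard Forth, here is a minimal sketch using only CREATE and DOES>: the words between CREATE and DOES> play the role of the build xt, the words after DOES> the role of the run xt. This is not the poster's meta (which is unpublished); the names my-constant, my-variable, my-2variable, answer and counter are hypothetical, and CELL ALLOT is written as the standard 1 CELLS ALLOT.

```forth
\ Build part: between CREATE and DOES>  (compare meta's first { } block)
\ Run part:   after DOES>, which receives the data field address
\             (compare meta's second { } block)
: my-constant  ( x "name" -- )  create ,              does> ( addr -- x )    @ ;
: my-variable  ( "name" -- )    create 1 cells allot  does> ( addr -- addr )   ;
: my-2variable ( "name" -- )    create 2 cells allot  does> ( addr -- addr )   ;

\ Usage: define one constant and one variable
42 my-constant answer
my-variable counter
```

The empty run parts of my-variable and my-2variable simply leave the data field address, mirroring the empty { } in the meta versions.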

I have a backlog, as I'm busy preserving projects dating from the
'80s, so don't expect a publication soon.


Paul Rubin

2024-09-12 07:10:03 UTC

Post by Hans Bezemer
You conveniently left out the initialization of the stack
frame. Agreed, if ALL values are transferred to the return stack the
overhead is minimal. But how often does that happen?

I don't understand this. {: a b c :} transfers 3 elements from the
parameter stack to the return stack. That has some cost, but it is
offset by avoiding some DUP and similar operations. Is it relevant at
all anyway? Old fashioned Forth interpreters are pretty fast, and if
you're worrying about avoiding a stack transfer here or there, you need
an optimizing compiler.

Adding safety checks has a cost, but once the program appears debugged,
I think Forth philosophy is to turn off the checks.

Post by Hans Bezemer
True - but that's not the level of abstraction I'm considering. I
think a language should have a well designed core, surrounded by a
constellation of extensions. Like C with its standard library and
Forth with its word sets.

You might like Lua or Scheme for simple higher level languages with that
style of design. C has some warts but its complexity in terms of
keywords doesn't seem much worse than Forth's core words.

Stephen Pelc

2024-09-08 14:56:01 UTC

Post by Buzz McCool
Would you have any videos talking about Forth locals? You and dxf are
far more adept at stack manipulations than I. I'm thinking I can get a
word up and working with locals and then convert to manual stack
manipulations afterwards if necessary.

Don't. You will only become dependent on locals. Use of locals should
be a considered decision.

Post by Buzz McCool
When is it necessary? dxf showed a word w/o locals to have ~30% fewer
instructions than a word with locals. Is that a common occurrence?

We (MPE) converted much of our TCP/IP stack not to use locals. This
was mostly on ARM7 devices, but the figures for other 32 bit CPUs of
the period (say 15 years ago) were similar. Code density improved by
about 25% and performance by about 50%.

Stephen

minforth

2024-09-08 16:09:32 UTC

Post by Stephen Pelc

Don't. You will only become dependent on locals. Use of locals should
be a considered decision.

Post by Buzz McCool
When is it necessary? dxf showed a word w/o locals to have ~30% fewer
instructions than a word with locals. Is that a common occurrence?

These are good examples of "it depends". They also show that one should
never start optimising without profiling. I have had similar experiences
in the other direction (i.e. with locals) with vector maths.

Another observation is that many Forthers do not seem to put much
emphasis on programming time and code maintainability or readability,
which is easier to achieve by using locals. The code conversion for your
TCP/IP stack must have taken a lot of programming time, but it must have
been worth it because it paid off on another level.

But when to use or avoid locals is an old argument that has long since
been put to rest. It all depends...

Hans Bezemer

2024-09-09 15:15:32 UTC

Post by minforth
Another observation is that many Forthers do not seem to put much
emphasis on programming time and code maintainability or readability,
which is easier to achieve by using locals.

I won't dispute that using the "locals" shortcut *may* save some
programming time - but to me, the moment you decide to put the whole
shebang in locals, you enter another mindset. Because at that moment you
cease to consider the algorithm itself, but start banging out code.

You no longer consider "do I need that, do I need that now, do I need
that here", you just start creating more local variables. Somehow that
kills my train of mind..

I do dispute that "no locals" Forth kills maintainability - or
readability. I'm always happy to see a whole bunch of one-liners.
Doesn't happen to me every day, but often enough. And then you can
functionally comment your code. I usually comment it from column 40 on
and at the top of a word.

I've maintained non-trivial programs for *DECADES* without any trouble.
I've plugged in a garbage collection module in my uBasic/4tH interpreter
- and radically changed it later. My rule is: if you can't figure it
out, rewrite it until you do. It happens, but not frequently.

Hans Bezemer

minforth

2024-09-09 21:16:49 UTC

Post by Hans Bezemer
I won't dispute that using the "locals" shortcut *may* save some
programming time - but to me, the moment you decide to put the whole
shebang in locals, you enter another mindset. Because at that moment you
cease to consider the algorithm itself, but start banging out code.
You no longer consider "do I need that, do I need that now, do I need
that here", you just start creating more local variables. Somehow that
kills my train of mind..

The thing is that your train of mind is focused on optimising the
parameter flow via the stack. You are doing stupid work that an
intelligent compiler does automatically today. It makes much more sense
to focus your brainware on the algorithms or automation tasks to be
solved.

Since such algorithms/tasks are mostly formulated mathematically or
logically, an almost 1:1 translation of such formulations by using
locals is straightforward and less error prone. Use descriptive names
and the code becomes quasi commented simultaneously.
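To make the two positions concrete, here is a hypothetical example (not from the thread): the discriminant b^2-4ac written once with Forth-2012 {: ... :} locals, as a near 1:1 transcription of the formula, and once with stack juggling only. It assumes a system that supports the {: ... :} locals syntax (e.g. Gforth); the word names are illustrative.

```forth
\ Near 1:1 transcription of b^2 - 4ac using Forth-2012 locals
: disc-locals ( a b c -- d )  {: a b c :}  b b *  4 a * c *  - ;

\ The same computation without locals; comments show the stack
: disc-stack ( a b c -- d )
  rot *          \ b c*a
  4 *            \ b 4ac
  swap dup *     \ 4ac b^2
  swap - ;       \ b^2-4ac
```

The locals version reads like the formula; the stack version needs per-line stack comments to be equally clear, which is roughly the trade-off being argued about here.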

dxf

2024-09-10 02:21:30 UTC

Post by minforth
...
Since such algorithms/tasks are mostly formulated mathematically or
logically, an almost 1:1 translation of such formulations by using
locals is straightforward and less error prone. Use descriptive names
and the code becomes quasi commented simultaneously.

Mathematical formulations are typically expressed algebraically. Forth
is stack-based and uses RPN. It's a different world. To use the latter
effectively requires a different mindset. Do you really formulate or
sketch out tasks algebraically? For me it ended when I stopped using
BASIC.

a***@spenarnc.xs4all.nl

2024-09-10 10:10:06 UTC

Post by Hans Bezemer

Post by minforth
Another observation is that many Forthers do not seem to put much
emphasis on programming time and code maintainability or readability,
which is easier to achieve by using locals.

I won't dispute that using the "locals" shortcut *may* save some
programming time - but to me, the moment you decide to put the whole
shebang in locals, you enter another mindset. Because at that moment you
cease to consider the algorithm itself, but start banging out code.
You no longer consider "do I need that, do I need that now, do I need
that here", you just start creating more local variables. Somehow that
kills my train of mind..
I do dispute that "no locals" Forth kills maintainability - or
readability. I'm always happy to see a whole bunch of one-liners.
Doesn't happen to me every day, but often enough. And then you can
functionally comment your code. I usually comment it from column 40 on
and at the top of a word.
I've maintained non-trivial programs for *DECADES* without any trouble.
I've plugged in a garbage collection module in my uBasic/4tH interpreter
- and radically changed it later. My rule is: if you can't figure it
out, rewrite it until you do. It happens, but not frequently.

I'm cleaning up the editor that I use all the time. It sports dozens of
global variables and it is hard to see how it could do without them.

LOCALs are an expensive feature, because they are re-entrant.
Forthers ought to know where, and why, an expensive feature is used.


Anton Ertl

2024-09-08 16:27:47 UTC

Post by Stephen Pelc
Don't. You will only become dependent on locals. Use of locals should
be a considered decision.

Post by Buzz McCool
When is it necessary? dxf showed a word w/o locals to have ~30% fewer
instructions than a word with locals. Is that a common occurrence?

So MPE (and Forth, Inc.) discourage the use of locals because they
implement locals inefficiently, and they implement locals
inefficiently because there are so few uses of locals around. A
chicken-and-egg problem.

Concerning the conversion of the TCP/IP stack: Have you considered the
alternative of spending MPE's time on making the locals implementation
more efficient?

See also:

@InProceedings{ertl22-locals,
author = {M. Anton Ertl},
title = {Are Locals Inevitably Slow?},
crossref = {euroforth22},
pages = {48--49},
url = {http://www.euroforth.org/ef22/papers/ertl-locals.pdf},
url-slides = {http://www.euroforth.org/ef22/papers/ertl-locals-slides.pdf},
OPTnote = {presentation slides},
abstract = {Code quality of locals on two code examples on
various systems}
}

An update on the table for the example:

: 3dup.3 {: a b c :} a b c a b c ;

instr.  bytes  system
    31    117  Gforth AMD64
    16     44  iforth 5.0.27 (plus 20 bytes entry and return code)
     7     19  lxf 1.6-982-823 32-bit
    32    127  SwiftForth 4.0.0-RC89 (calls LSPACE)
    26     92  VFX Forth 64 5.11 RC2

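For comparison with the table, here is the locals version above next to one possible stack-only version. The name 3dup.1 and its 2 PICK formulation are an illustrative choice, not taken from the paper, and no instruction or byte counts are claimed for it; it assumes a system with Forth-2012 {: ... :} locals for 3dup.3.

```forth
\ Locals version, as measured in the table above
: 3dup.3 ( a b c -- a b c a b c )  {: a b c :} a b c a b c ;

\ Stack-only version: each 2 PICK copies the item two below the top
: 3dup.1 ( a b c -- a b c a b c )  2 pick 2 pick 2 pick ;
```

Both leave the same six values on the stack; how close the compiled code gets is exactly what the table measures per system.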
- anton

Anton Ertl

2024-09-09 17:34:03 UTC