Discussion:
Understanding Forth's memory allocation
(too old to reply)
AKE
2013-01-24 20:29:50 UTC
Permalink
I'm a bit puzzled by Forth's memory allocation system:

Example:
here . -- 289916
create x x -- 289920
here . -- 289920
x . -- 289920
10 cells -- 40
allot
here . -- 289960
create y y -- 289964 \ advances 4 bytes
289960 dump4 -- 49 0 0 0 \ 49 is stored in 298960
here . -- 289964
y . -- 289964
10 cells allot
here . -- 290004 \ instead of 290000
AKE
2013-01-24 20:35:11 UTC
Permalink
Not sure why that got posted suddenly.

What has me puzzled is why the 'create' word advances the heap pointer to the next cell.

My understanding from the documentation is that the 'here' word points to the NEXT unallocated cell of memory.

In which case, in principle, 'create x' could simply assign x the value of here. But instead it seems to advance the here pointer one cell ahead, set x to this cell, and sets the value of the cell it just left behind to 49 (decimal).

Is this somehow a buffer overrun sentinel?

BTW dump4 is simply a diagnostic snippet:

: dump4 ( addr -- ) \ displays 4 byte values in little endian order, beginning at addr, i.e. LSB val val MSB
dup c? 1+ dup c? 1+ dup c? 1+ c? ;

Any insight appreciated.

Thanks
Jason Damisch
2013-01-24 21:21:13 UTC
Permalink
I am using SwiftForth

HERE . 5985570
CREATE Z
HERE . 5985604
5985604 5985570 - . 34

What my system is doing is to allocate or compile a complete subroutine, in Forth a Word, at the dictionary pointer, and then afterwards to advance the dictionary pointer to just past the new word. Included in this 34 bytes is the text name for 'Z', a bit of code to give it a behavior, and a link to the previously defined word so that 'Z' can be found in a linked list of other Forth words.

What Forth system are you using? I am a little confused as well as to why your system is only advancing the pointer 4 bytes ahead for a created word.

What CREATE does is to only and just create a word who's sole purpose is to push an address onto the top of the stack which points to where HERE is pointing after the word is defined.

If you wanted to, you could give 'Z' 256 bytes by doing this

CREATE Z 256 ALLOT

In which case your word 'Z' will have 256 bytes of memory alloted to it.

Jason

which would give 'Z' a total memory allocated of 260 bytes, 4 for when Z was made, and
Mark Wills
2013-01-24 21:53:18 UTC
Permalink
Post by Jason Damisch
I am using SwiftForth
HERE . 5985570
CREATE Z
HERE . 5985604
5985604 5985570 - . 34
What my system is doing is to allocate or compile a complete subroutine, in Forth a Word, at the dictionary pointer, and then afterwards to advance the dictionary pointer to just past the new word.  Included in this 34 bytes is the text name for 'Z', a bit of code to give it a behavior, and a link to the previously defined word so that 'Z' can be found in a linked list of other Forth words.
What Forth system are you using?  I am a little confused as well as to why your system is only advancing the pointer 4 bytes ahead for a created word.
What CREATE does is to only and just create a word who's sole purpose is to push an address onto the top of the stack which points to where HERE is pointing after the word is defined.
If you wanted to, you could give 'Z' 256 bytes by doing this
CREATE Z  256 ALLOT
In which case your word 'Z' will have 256 bytes of memory alloted to it.
Jason
which would give 'Z' a total memory allocated of 260 bytes, 4 for when Z was made, and
It makes sense if it's a 32 bit ITC Forth.

As you say, after when CREATE runs it compiles an XT (this only needs
to be a single cell) into the definition to give it some behaviour. By
default it will behave like a variable, so a reference to (DOVAR) is
normally compiled in (assuming ITC system).

Some systems reserve two cells, as the CREATEd word may be assigned a
behaviour via DOES>. In some implementations, when DOES> is used to
assign a behaviour a second cell has to be reserved:

: VAR CREATE 0 , DOES> @ ;
VAR FRED

If you de-compile FRED, on some ITC systems you'll see this:

(DODOES) xxxx

So that's a call to (DODOES) and an in-line parameter to (DODOES)
which tells (DODOES) the address of the parent code (in this example,
it will point to the address of @ in the definition of VAR).

Systems that implement DOES> in this way often have to waste a cell
for all CREATEd words, since DOES> may be applied, but may not. By
reserving two cells in all cases, it greatly simplifies >BODY.

However, it is possible to implement DOES> in an ITC system where only
one cell is reserved. My system does it. The idea was invented by Dean
Sanderson, I believe, who was a Forth TC member.
AKE
2013-01-24 22:14:41 UTC
Permalink
Hi Mark:

Thanks for the posting.
Post by Mark Wills
It makes sense if it's a 32 bit ITC Forth.
I'm using Win32Forth.

Your reference to ITC got me googling and found this very helpful article on W32F internals.

http://win32forth.sourceforge.net/doc/p-arch2.htm
How can I de-compile a word? I'm assuming you mean something different than 'see'? ('see' shows the colon definition or the assembly language definition)
Post by Mark Wills
However, it is possible to implement DOES> in an ITC system where only
one cell is reserved. My system does it. The idea was invented by Dean
Sanderson, I believe, who was a Forth TC member.
So do you know why the 49 is left in the one cell?
Jason Damisch
2013-01-25 00:30:31 UTC
Permalink
Post by AKE
So do you know why the 49 is left in the one cell?
If either before or after I define a word I do this

HERE 20 DUMP
46EBB8 04 64 75 6D 70 74 00 00 00 00 00 00 00 00 00 00 .dumpt..........

then I do this

: LUMP DUMP ;
HERE 20 LUMP
46EBD4 04 6C 75 6D 70 00 00 00 00 00 00 00 00 00 00 00 .lump...........

notice that there is a period, a dot just before the dump and the lump, in the hex this is number 4 which is a count of the string DUMP or LUMP

So the space just after HERE is being used a a text buffer.

SwiftForth is case insensitive.

Personally, this behavior used to confuse me. But, it makes sense on memory constrained systems, which is where Forth was born.

Jason
Albert van der Horst
2013-01-25 14:59:58 UTC
Permalink
In article <8a80ab21-302e-4efa-980d-***@w3g2000yqj.googlegroups.com>,
Mark Wills <***@yahoo.co.uk> wrote:
<SNPI>
Post by Mark Wills
Systems that implement DOES> in this way often have to waste a cell
for all CREATEd words, since DOES> may be applied, but may not. By
reserving two cells in all cases, it greatly simplifies >BODY.
However, it is possible to implement DOES> in an ITC system where only
one cell is reserved. My system does it. The idea was invented by Dean
Sanderson, I believe, who was a Forth TC member.
I went the other way. I was puzzled by the implementations I studied
until I saw the light and got rid of the problem at the expense of
a mere 8 bytes per Forth word, wasting at most 0.0000001 % of my
ressources.

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
***@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
AKE
2013-01-24 22:09:58 UTC
Permalink
Hi Jason:

Thanks for that. I'm using Win32Forth. And I think Mark has just posted an explanation of why W32F advances one cel...
Hugh Aguilar
2013-01-25 03:11:18 UTC
Permalink
Post by AKE
My understanding from the documentation is that the 'here' word points to the NEXT unallocated cell of memory.
This is another thing that ANS-Forth screwed up. The code and data
"may" be intermingled, or they "may" be separate. So, as a practical
matter, you have to assume that they are intermingled. This is the
problem with ANS-Forth --- overuse of the word "may" --- it is a
standard that doesn't standardize, but pretty much anything "may" be
considered standard.

My MFX was for the MiniForth, which is a Harvard Architecture
processor. Code and data are in separate memories. In my system,
CREATE was a constant of HERE. Constants got compiled as literals.
When CREATE was executed at compile-time, it did not affect DP (and
HERE) at all.

For example, it was possible to write code like this:

2 constant w \ the cell size of the processor
create four-array
create first w allot
create second w allot
create third w allot
create fourth w allot

In this case, four-array is an array of 4 words all adjacent to each
other as would be expected in an array. First, second, third and
fourth are constants of the addresses of individual elements in the
array.

P.S. --- You may be interested in my ALLOCATE, ALLOCATION and CONCRETE-
ALLOCATE in my novice package:
http://www.forth.org/novice.html
Jason Damisch
2013-01-25 03:43:24 UTC
Permalink
If I am not mistaken, keeping the headers and the bodies of the word definitions separate facilitates saving stand alone Forth applications which do not have headers in them, thus saving space by leaving out something which the end user has no use for.

Also, this would be common in embedded systems where there may not be enough RAM space for the headers.

Jason
Albert van der Horst
2013-01-25 15:17:03 UTC
Permalink
Post by Jason Damisch
If I am not mistaken, keeping the headers and the bodies of the word
definitions separate facilitates saving stand alone Forth applications
which do not have headers in them, thus saving space by leaving out
something which the end user has no use for.
Speaking from practice:
I run all my solutions to Euler problems as turnkey systems like
euler411 10,000
If the problem contains a 10 million cell sieve (not uncommon)
the executable would be 80 Mbyte+. 1]
The 70 Kbyte executable is totally dwarfed by this. Worrying about
making the headers smaller is unproductive.
OTOH, having an initialisation that changes a header (good I didn't
loose them) to point to space allocated at run time is a big win,
such that the program is hardly larger than the Forth system.
Post by Jason Damisch
Also, this would be common in embedded systems where there may not be
enough RAM space for the headers.
Increasingly embedded systems have a dearth of RAM and plenty of flash.
Headers would be in the flash. Recent developments like noforth
for the MSP430 maintain full headers.
Post by Jason Damisch
Jason
Groetjes Albert

1] Not that I worry too much. Testoutputs of euler problems routinely
run in the Gigabytes.
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
***@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
AKE
2013-01-25 07:14:42 UTC
Permalink
Post by Hugh Aguilar
Post by AKE
My understanding from the documentation is that the 'here' word points to the NEXT unallocated cell of memory.
This is another thing that ANS-Forth screwed up. The code and data
"may" be intermingled, or they "may" be separate. So, as a practical
matter, you have to assume that they are intermingled. This is the
problem with ANS-Forth --- overuse of the word "may" --- it is a
standard that doesn't standardize, but pretty much anything "may" be
considered standard.
My MFX was for the MiniForth, which is a Harvard Architecture
processor. Code and data are in separate memories. In my system,
CREATE was a constant of HERE. Constants got compiled as literals.
When CREATE was executed at compile-time, it did not affect DP (and
HERE) at all.
<snip>
Post by Hugh Aguilar
In this case, four-array is an array of 4 words all adjacent to each
other as would be expected in an array. First, second, third and
fourth are constants of the addresses of individual elements in the
array.
Excellent example -- this is exactly the kind of thing I would have expected to be able to do with create and allot, and was surprised when I found that consecutive allocation addresses (from create) were not consecutive at all.

Using your example, but changing the allotment to a single byte (char) one might expect a four character string with pointers to each letter -- but things (in Win32Forth) now get much worse:

here . -- 291977
create four-char
create first-letter 1 allot
create second-letter 1 allot
create third-letter 1 allot
create fourth-letter 1 allot
here . -- 292013 \ bizarrely, 36 bytes have been used
\ for this four character array!
\ ... and the characters are non-contiguous

Here's what seems to be going on:
here . -- 292977 \ starts on non-cell aligned boundary (cell is 32 bits)
four-char . -- 292984 \ doesn't like non-aligned,
\ so pad 3 to get to aligned boundary at 292980
\ want to write cell 49, do so at 292980
\ now allot four-char at 292984
first-letter . -- 292988 \ surprise! since four-char had no allotment,
\ it was clobbered by the 49 being written at 292984
\ and first-letter now allotted at 292988
second-letter . -- 292096 \ pad 3 to get to 292092, write 49, then allot
third-letter . -- 292004 \ pad 3 to get to 202000, write 49, then allot
fourth-letter . -- 292012 \ pad 3 to get to 202008, write 49, then allot
here . --- 292013 \ since only a byte was allotted, here point
\ to the next byte, even though it will
\ move to the next cell aligned address 202016 for
\ its next behaviour...
Post by Hugh Aguilar
P.S. --- You may be interested in my ALLOCATE, ALLOCATION and CONCRETE-
http://www.forth.org/novice.html
Thanks for this -- I am somewhat new to Forth, so these will be quite useful.
Mark Wills
2013-01-25 08:08:50 UTC
Permalink
here .          -- 291977
create four-char
create first-letter 1 allot
create second-letter 1 allot
create third-letter 1 allot
create fourth-letter 1 allot
here .         -- 292013  \ bizarrely, 36 bytes have been used
                          \ for this four character array!
                          \ ... and the characters are non-contiguous
I'm not familiar with Win32Forth, but I think I'm correct in saying
that your assumption is incorrect :-) I'll try and explain why.

Assume an ITC system. All bets are off with native systems :-)

Firstly, when CREATE is used, HERE is pointing at the PFA (parameter
field address) of the created word:

CREATE FRED

Results in

CFA PFA
-----+-------+-----+
FRED | DOVAR | ??? |
-----+-------+-----+
^
HERE

CFA=Code Field Address (points to executable code (in this case
DOVAR).
PFA=Parameter Field Address (holds data that the word can work with).

Here is pointing to the cell directly after the cell that calls DOVAR.
The contents of the cell are effectively garbage/unknown, hence ???.

Now, a valid, legal use of CREATEd words is to "comma" or poke data (a
"payload", if you will) into the word. So, that then the CREATEd word
is executed, it can return some data:

CREATE TEN
10 ,

The word , (comma) takes whatever is on the stack and appends it into
the 'code stream' (for want of a better description). It uses HERE to
determine where it should go. It appends it, and updates HERE.

CREATE TEN
10 ,

Results in the following:

CFA PFA
-----+-------+-----+----
TEN | DOVAR | 10 |
-----+-------+-----+----
^
HERE

If you enter the above code and do:

TEN @ .

You'll get 10 displayed. Why? Because the default behaviour of a
CREATEd word is to behave like a variable. Variables leave an address
on the stack. Which address? Generally speaking (most Forths, I guess)
the address of the cell immediately following.

If you do the following:

CREATE TEN
CREATE TWO

You end up with the beginning of the definition (the dictionary entry)
of TWO sitting in the *data-space* (the PFA) of TEN:

CFA PFA CFA PFA
-----+-------+-----+-------+-----+
TEN | DOVAR | TWO | DOVAR | ??? |
-----+-------+-----+-------+-----+
^
HERE

Can you see that? The start of TWOs dictionary entry is sitting in the
data-space of TEN. Woops.

So, when using CREATE, you need to put something in the data-space,
either by comma-ing something in:

CREATE NINE 9 ,

Which gives:

CFA PFA
------+-------+-----+----
NINE | DOVAR | 9 | ???
------+-------+-----+----
^
HERE

Or by using allot:

CREATE TEMP 1 CELL ALLOT

Which gives:

CFA PFA
------+-------+-----+----
TEMP | DOVAR | ??? | ???
------+-------+-----+----
^
HERE

Using ALLOT results in un-initialised data-space. But that's okay,
since the default behaviour of CREATEd words is to return the address
of their data-space, we can get to it with ! :

99 TEMP !

Now there is 99 in TEMPs dataspace.

Obviously, we can have a bigger data space, not just one cell:

If we do this:

CREATE BUFF 4 CELLS ALLOT
: DOUBLE DUP + ;

We get:

CFA PFA
------+-------+-----+-----+-----+-----+--------+-----+---+
BUFF | DOVAR | ??? | ??? | ??? | ??? | DOUBLE | DUP | + |
------+-------+-----+-----+-----+-----+--------+-----+---+
^
HERE

I can fill BUFF with data like this:

: FILL-BUFF
100
BUFF 4 CELLS + BUFF DO
DUP I !
100 +
LOOP
DROP ;

Which gives us:

CFA PFA
------+-------+-----+-----+-----+-----+--------+-----+---+-----------
+-----
BUFF | DOVAR | 100 | 200 | 300 | 300 | DOUBLE | DUP | + | FILL-BUFF
| ...
------+-------+-----+-----+-----+-----+--------+-----+---+-----------
+-----

You get the idea. Note, the data-space (in fact the whole dictionary)
is not protected in any way. You're free to screw up in any way you
like! Taking the last example, if I did:

BUFF 150 99 FILL

I would fill the 150 bytes starting at BUFF with 99, which would write
over the definition of FILL-BUFF, and break the dictionary linkage
(probably) thus rendering the system brain-dead - it wouldn't be able
to find any words.

Hope this helps to give some insight.

Regards

Mark
Bernd Paysan
2013-01-25 14:44:44 UTC
Permalink
Post by Hugh Aguilar
Post by AKE
My understanding from the documentation is that the 'here' word
points to the NEXT unallocated cell of memory.
This is another thing that ANS-Forth screwed up. The code and data
"may" be intermingled, or they "may" be separate. So, as a practical
matter, you have to assume that they are intermingled.
Yes.
Post by Hugh Aguilar
This is the
problem with ANS-Forth --- overuse of the word "may" --- it is a
standard that doesn't standardize, but pretty much anything "may" be
considered standard.
A standard codifies common practise, and leave undefined where no common
practise has arrived yet. I think you are simply mistaken what the
purpose of a standard is. You propose "StrongForth" as "alternative
standard", but it isn't a standard, it will be just another incompatible
system:

http://xkcd.com/927/

The way to reduce the "maybe" is not to smoke Marlboro, but to convince
people that a particular decision is worth to implement, and thereby
common practise emerges. E.g. in Forth200x we have eliminated the
"maybe" unified floating point stack, as the common (and only sane) way
to do it is with a separate stack.

A standard is a compromise between different people. If you don't want
to compromise, fine, but you have to go Chuck's way (or the way of my
footer): You have to do it yourself. This is a fine tradition in the
Forth world. But it's not a standard.
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/
Elizabeth D. Rather
2013-01-25 07:54:15 UTC
Permalink
Post by AKE
Not sure why that got posted suddenly.
What has me puzzled is why the 'create' word advances the heap pointer to the next cell.
My understanding from the documentation is that the 'here' word points to the NEXT unallocated cell of memory.
In which case, in principle, 'create x' could simply assign x the value of here. But instead it seems to advance the here pointer one cell ahead, set x to this cell, and sets the value of the cell it just left behind to 49 (decimal).
Is this somehow a buffer overrun sentinel?
: dump4 ( addr -- ) \ displays 4 byte values in little endian order, beginning at addr, i.e. LSB val val MSB
dup c? 1+ dup c? 1+ dup c? 1+ c? ;
Any insight appreciated.
Thanks
A lot of people have commented already, but I just want to add a few notes.

In the first place, every Forth implementation manages memory
differently. Some intermingle code (including heads of definitions plus
executable code) and data space, and some do not. Some align definitions
on cell boundaries and some do not (this may depend on the target
processor). Some even allocate extra space so some things fall on, say,
mod-16 addresses if that improves performance on a particular processor.

In general, a person writing an application should only be concerned
with managing data space, because of all the implementation variations.
If you are willing for your programs to be dependent on a particular
implementation, you can find out how it works and make use of that
knowledge, but it's really better not to.

CREATE <name> will construct a dictionary entry for <name> that returns
the address of the next cell, but it doesn't allocate any data space.
You must allocate as much as you like using ALLOT. Your system
(Win32Forth) clearly intermingles code and data space, which is why you
found HERE advanced when you made a definition.

Hope this helps.

Cheers,
Elizabeth
--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================
Andrew Haley
2013-01-25 08:59:45 UTC
Permalink
Post by AKE
Not sure why that got posted suddenly.
What has me puzzled is why the 'create' word advances the heap
pointer to the next cell.
My understanding from the documentation is that the 'here' word
points to the NEXT unallocated cell of memory.
In which case, in principle, 'create x' could simply assign x the
value of here. But instead it seems to advance the here pointer one
cell ahead, set x to this cell, and sets the value of the cell it
just left behind to 49 (decimal).
CREATE may or may not use memory at HERE . On some Forth systems,
everything is allocated in one space: dictionary entries, variables,
and code. On other Forth systems there may be several memory spaces.

Andrew.
Coos Haak
2013-01-25 18:48:00 UTC
Permalink
Post by Andrew Haley
Post by AKE
Not sure why that got posted suddenly.
What has me puzzled is why the 'create' word advances the heap pointer to the next cell.
My understanding from the documentation is that the 'here' word
points to the NEXT unallocated cell of memory.
In which case, in principle, 'create x' could simply assign x the
value of here. But instead it seems to advance the here pointer one
cell ahead, set x to this cell, and sets the value of the cell it
just left behind to 49 (decimal).
CREATE may or may not use memory at HERE . On some Forth systems,
everything is allocated in one space: dictionary entries, variables,
and code. On other Forth systems there may be several memory spaces.
Andrew.
e.g.:

here . 22616 ok
: greet ." hello, world!" cr ; ok
here . 22622 ok
create data ok
here . 22626 ok

I'm pretty sure my CHForth (16 bits) is standard.
--
Coos

CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html
AKE
2013-01-24 21:56:29 UTC
Permalink
I'm using Win32Forth.
Brad Eckert
2013-01-25 16:41:46 UTC
Permalink
Post by AKE
here . -- 289916
create x x -- 289920
here . -- 289920
x . -- 289920
In most Forths, the dictionary is one big block of memory. We call it the dictionary, not the heap. Heap may be appropriate for other languages: a place where you throw all your sh*t.

"create x" compiled a header for x into the dictionary, and that header took 4 bytes of space. The header is a data structure that allows Forth to find "x". Code for x may also be compiled into the dictionary. Some Forths separate the dictionary into code, data and header spaces that cause "create x" not to alter HERE, but you can't count on that behavior. Well you can, but that's a dependency.
AKE
2013-01-25 16:53:36 UTC
Permalink
Post by Brad Eckert
"create x" compiled a header for x into the dictionary, and that header took 4 bytes of space. The header is a data structure that allows Forth to find "x".
So why is that the header for create x 4 allot is the same as the header for create y 4 allot? Both headers appear to be 49 (decimal number). Surely Forth needs more than just 49 to distinguish between finding x and y?
Coos Haak
2013-01-25 18:54:03 UTC
Permalink
Post by AKE
Post by Brad Eckert
"create x" compiled a header for x into the dictionary, and that header took 4 bytes of space. The header is a data structure that allows Forth to find "x".
So why is that the header for create x 4 allot is the same as the header for create y 4 allot? Both headers appear to be 49 (decimal number). Surely Forth needs more than just 49 to distinguish between finding x and y?
create x here cell- dup . ? 4441964 4198431 ok
create y here cell- dup . ? 4441968 4198431 ok

So here only a pointer (called DOVAR, try see DOVAR) is compiled in the
space that is controlled by HERE. The headers are in a different region.
--
Coos

CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html
Continue reading on narkive:
Loading...