Krishna Myneni
2024-05-19 21:28:12 UTC
The analog of */ for single length integers, with intermediate double
length result, can be implemented for floating point arithmetic as well.
The following implementation of F*/ uses the non-standard word, UD/MOD,
although it might be possible to code it with UM/MOD as well. It also
requires FP@ (requiring a separate fp stack in memory).
How is F*/ different from using F* and F/ sequentially?
Consider the following cases:
1) 1.0e300 -3.0e259 F* 1.0e280 F/
2) 1.0e-300 3.0e-259 F* 1.0e-280 F/
3) 1.0e302 -3.0e302 F* DP_+INF F/
4) -1.0e-302 -1.0e-302 F* 0.0e0 F/
(DP_+INF is the double-precision IEEE-754 representation of plus infinity).
The first two cases, on a double-precision FP system, result in -inf and
zero, respectively.
Cases 3 and 4 result in NaN (not a number).
A proper implementation of F*/ can evaluate the result to a number for
cases 1 and 2, recognizing that the final result is representable as a
double-precision number. For cases 3 and 4 F*/ recognizes that the final
results are signed zero and +INF. F*/ represents the intermediate result
of the product with a wider number representation.
The code below, which requires a separate FP stack in memory, has been
tested on kForth-64 and Gforth. Gforth appears to have a problem
representing signed zero, although it does represent plus and minus
infinity.
1.0e302 -3.0e302 DP_+INF f*/ fs. 0.00000000000000E0 ok
( should be minus zero in the above case).
The factor of _F*/ was done on purpose, to be used as a basis for higher
precision calculations.
The tests do not include subnormal number testing or for NANs at the
present time.
--
Krishna Myneni
=== start of fstarslash.4th ===
\ fstarslash.4th
\
\ Implementation of F*/ on 64-bit Forth
\
\ Krishna Myneni, 2024-05-19
\
\ F*/ ( F: r1 r2 r3 -- r )
\ Multiply two IEEE-754 double precision floating point
\ numbers r1 and r2 and divide by r3 to obtain the result,
\ r. The intermediate product significand, r1*r2, is
\ represented with at least IEEE quad precision for both
\ its significand and exponent.
\
\ F*/ uses an intermediate significand represented by
\ 128 bits
\
include ans-words
1 cells 8 < [IF]
cr .( Requires 64-bit Forth system ) cr
ABORT
[THEN]
fvariable temp
: >fpstack ( u -- ) ( F: -- r )
temp ! temp f@ ;
base @ hex
8000000000000000 constant DP_SIGNBIT_MASK
000FFFFFFFFFFFFF constant DP_FRACTION_MASK
7FF0000000000000 constant DP_EXPONENT_MASK
\ DP special value constants
7FF0000000000001 >fpstack fconstant DP_NAN
FFF0000000000000 >fpstack fconstant DP_-INF
7FF0000000000000 >fpstack fconstant DP_+INF
base !
52 constant DP_FRACTION_BITS
1023 constant DP_EXPONENT_BIAS
2047 constant DP_EXPONENT_MAX
1 DP_FRACTION_BITS LSHIFT constant DP_IMPLIED_BIT
63 constant DP_SIGN_BIT
: dp-fraction ( F: r -- r ) ( -- u )
FP@ @ DP_FRACTION_MASK and ;
: dp-exponent ( F: r -- r ) ( -- e )
FP@ @ DP_EXPONENT_MASK and DP_FRACTION_BITS rshift ;
: dp-sign ( F: r -- r ) ( -- sign )
FP@ @ DP_SIGN_BIT rshift ;
: _isZero? ( ufrac uexp -- b ) 0= swap 0= and ;
: _isInf? ( ufrac uexp -- b ) DP_EXPONENT_MAX = swap 0= and ;
: _isNaN? ( ufrac uexp -- b ) DP_EXPONENT_MAX = swap 0<> and ;
: _isSubNormal? ( ufrac uexp -- b ) 0= swap 0<> and ;
: frac>significand ( u -- u' ) DP_IMPLIED_BIT or ;
\ Show the significand, unbiased base-2 exponent, and sign bit
\ for a double-precision floating point number. To recover the
\ dp floating point number, r, from these fields,
\
\ r = (-1)^sign * significand * 2^(exponent - 52)
\
: dp-components. ( F: r -- )
dp-sign
dp-exponent
dp-fraction
2dup D0= invert IF frac>significand THEN
swap DP_EXPONENT_BIAS - swap
fdrop
." significand = " 18 u.r 2 spaces
." exponent = " 4 .r 2 spaces
." sign = " .
;
: dp-normal-fraction ( u -- expinc u')
dup DP_FRACTION_BITS 1+ rshift
dup IF
tuck rshift
ELSE
drop -1 swap \ -1 u
BEGIN
1 lshift
dup DP_IMPLIED_BIT and 0=
WHILE
swap 1- swap
REPEAT
THEN
;
0 value r1_sign
0 value r2_sign
0 value r3_sign
0 value num_sign
variable r1_exp
variable r2_exp
variable r3_exp
variable r_exp
variable r1_frac
variable r2_frac
variable r3_frac
: get-arg-components ( F: r1 r2 r3 -- )
dp-fraction r3_frac !
dp-exponent r3_exp !
dp-sign to r3_sign
fdrop
dp-fraction r2_frac !
dp-exponent r2_exp !
dp-sign to r2_sign
fdrop
dp-fraction r1_frac !
dp-exponent r1_exp !
dp-sign to r1_sign
fdrop ;
0 value b_num_zero
0 value b_den_zero
0 value b_num_Inf
0 value b_den_Inf
variable remainder
: _F*/ ( F: r1 r2 r3 -- ) ( -- usign uexponent udfraction true )
\ or ( F: -- DP_NAN | +/- DP_INF ) ( -- false )
get-arg-components
\ Check for NaN in args
r1_frac @ r1_exp @ _IsNaN?
r2_frac @ r2_exp @ _IsNaN? or
r3_frac @ r3_exp @ _IsNaN? or IF DP_NAN FALSE EXIT THEN
r1_frac @ r1_exp @ _IsZero?
r2_frac @ r2_exp @ _IsZero? or to b_num_zero
r3_frac @ r3_exp @ _IsZero? to b_den_zero
b_num_zero b_den_zero and IF DP_NAN FALSE EXIT THEN
r1_sign r2_sign xor to num_sign
r1_frac @ r1_exp @ _IsInf?
r2_frac @ r2_exp @ _IsInf? or to b_num_Inf
r3_frac @ r3_exp @ _IsInf? to b_den_Inf
b_num_Inf b_den_Inf and IF DP_NAN FALSE EXIT THEN
b_num_zero b_den_Inf or IF \ return signed zero
num_sign r3_sign xor
0
0 s>d TRUE EXIT
THEN
b_den_zero b_num_Inf or IF \ return signed Inf
DP_+INF
num_sign r3_sign xor IF FNEGATE THEN
FALSE EXIT
THEN
num_sign r3_sign xor
r1_exp @ r2_exp @ + r3_exp @ -
r1_frac @ frac>significand
r2_frac @ frac>significand um*
r3_frac @ frac>significand ud/mod
rot remainder !
TRUE
;
variable uhigh
: F*/ ( F: r1 r2 r3 -- r )
_F*/ \ ( -- usign uexp udfrac true ) or
\ ( -- false ) ( F: NAN | +/-INF )
IF
2 pick 0 DP_EXPONENT_MAX WITHIN 0= IF
-43 throw
THEN
dup IF \ high 64 bits of the quotient
uhigh !
-43 throw
ELSE
drop
2dup D0= IF \ ufrac = 0 and uexp 0
drop \ usign uexp
ELSE
\ usign
\ quotient: normal double precision fraction
dp-normal-fraction \ usign uexp expinc uquot
swap DP_SIGN_BIT lshift or
THEN
;
\ Tests
1 [IF]
[UNDEFINED] T{ [IF] include ttester.4th [THEN]
1e-17 rel-near f!
1e-17 abs-near f!
TESTING F*/
\ These results are different using F* and F/
t{ 1.0e300 -3.0e259 1.0e280 f*/ -> -3.0e279 }t
t{ 1.0e-300 3.0e-259 1.0e-280 f*/ -> 3.0e-279 }t
t{ 1.0e300 -3.0e259 -1.0e280 f*/ -> 3.0e279 }t
t{ -1.0e300 -3.0e259 -1.0e280 f*/ -> -3.0e279 }t
t{ 1.0e302 -3.0e302 DP_+INF f*/ -> -0.0e0 }t
t{ -1.0e-302 -1.0e-302 0.0e0 f*/ -> DP_+INF }t
\ The following should give same results as using F* and F/
t{ 0.0e0 3.0e302 1.0e-100 f*/ -> 0.0e0 }t
t{ 3.0e302 0.0e0 1.0e-100 f*/ -> 0.0e0 }t
t{ 1.0e302 -3.0e259 0.0e0 f*/ -> DP_-INF }t
t{ DP_+INF 3.0e300 1.0e0 f*/ -> DP_+INF }t
t{ DP_+INF 3.0e-300 -1.0e0 f*/ -> DP_-INF }t
[THEN]
length result, can be implemented for floating point arithmetic as well.
The following implementation of F*/ uses the non-standard word, UD/MOD,
although it might be possible to code it with UM/MOD as well. It also
requires FP@ (requiring a separate fp stack in memory).
How is F*/ different from using F* and F/ sequentially?
Consider the following cases:
1) 1.0e300 -3.0e259 F* 1.0e280 F/
2) 1.0e-300 3.0e-259 F* 1.0e-280 F/
3) 1.0e302 -3.0e302 F* DP_+INF F/
4) -1.0e-302 -1.0e-302 F* 0.0e0 F/
(DP_+INF is the double-precision IEEE-754 representation of plus infinity).
The first two cases, on a double-precision FP system, result in -inf and
zero, respectively.
Cases 3 and 4 result in NaN (not a number).
A proper implementation of F*/ can evaluate the result to a number for
cases 1 and 2, recognizing that the final result is representable as a
double-precision number. For cases 3 and 4 F*/ recognizes that the final
results are signed zero and +INF. F*/ represents the intermediate result
of the product with a wider number representation.
The code below, which requires a separate FP stack in memory, has been
tested on kForth-64 and Gforth. Gforth appears to have a problem
representing signed zero, although it does represent plus and minus
infinity.
1.0e302 -3.0e302 DP_+INF f*/ fs. 0.00000000000000E0 ok
( should be minus zero in the above case).
The factor of _F*/ was done on purpose, to be used as a basis for higher
precision calculations.
The tests do not include subnormal number testing or for NANs at the
present time.
--
Krishna Myneni
=== start of fstarslash.4th ===
\ fstarslash.4th
\
\ Implementation of F*/ on 64-bit Forth
\
\ Krishna Myneni, 2024-05-19
\
\ F*/ ( F: r1 r2 r3 -- r )
\ Multiply two IEEE-754 double precision floating point
\ numbers r1 and r2 and divide by r3 to obtain the result,
\ r. The intermediate product significand, r1*r2, is
\ represented with at least IEEE quad precision for both
\ its significand and exponent.
\
\ F*/ uses an intermediate significand represented by
\ 128 bits
\
include ans-words
1 cells 8 < [IF]
cr .( Requires 64-bit Forth system ) cr
ABORT
[THEN]
fvariable temp
: >fpstack ( u -- ) ( F: -- r )
temp ! temp f@ ;
base @ hex
8000000000000000 constant DP_SIGNBIT_MASK
000FFFFFFFFFFFFF constant DP_FRACTION_MASK
7FF0000000000000 constant DP_EXPONENT_MASK
\ DP special value constants
7FF0000000000001 >fpstack fconstant DP_NAN
FFF0000000000000 >fpstack fconstant DP_-INF
7FF0000000000000 >fpstack fconstant DP_+INF
base !
52 constant DP_FRACTION_BITS
1023 constant DP_EXPONENT_BIAS
2047 constant DP_EXPONENT_MAX
1 DP_FRACTION_BITS LSHIFT constant DP_IMPLIED_BIT
63 constant DP_SIGN_BIT
: dp-fraction ( F: r -- r ) ( -- u )
FP@ @ DP_FRACTION_MASK and ;
: dp-exponent ( F: r -- r ) ( -- e )
FP@ @ DP_EXPONENT_MASK and DP_FRACTION_BITS rshift ;
: dp-sign ( F: r -- r ) ( -- sign )
FP@ @ DP_SIGN_BIT rshift ;
: _isZero? ( ufrac uexp -- b ) 0= swap 0= and ;
: _isInf? ( ufrac uexp -- b ) DP_EXPONENT_MAX = swap 0= and ;
: _isNaN? ( ufrac uexp -- b ) DP_EXPONENT_MAX = swap 0<> and ;
: _isSubNormal? ( ufrac uexp -- b ) 0= swap 0<> and ;
: frac>significand ( u -- u' ) DP_IMPLIED_BIT or ;
\ Show the significand, unbiased base-2 exponent, and sign bit
\ for a double-precision floating point number. To recover the
\ dp floating point number, r, from these fields,
\
\ r = (-1)^sign * significand * 2^(exponent - 52)
\
: dp-components. ( F: r -- )
dp-sign
dp-exponent
dp-fraction
2dup D0= invert IF frac>significand THEN
swap DP_EXPONENT_BIAS - swap
fdrop
." significand = " 18 u.r 2 spaces
." exponent = " 4 .r 2 spaces
." sign = " .
;
: dp-normal-fraction ( u -- expinc u')
dup DP_FRACTION_BITS 1+ rshift
dup IF
tuck rshift
ELSE
drop -1 swap \ -1 u
BEGIN
1 lshift
dup DP_IMPLIED_BIT and 0=
WHILE
swap 1- swap
REPEAT
THEN
;
0 value r1_sign
0 value r2_sign
0 value r3_sign
0 value num_sign
variable r1_exp
variable r2_exp
variable r3_exp
variable r_exp
variable r1_frac
variable r2_frac
variable r3_frac
: get-arg-components ( F: r1 r2 r3 -- )
dp-fraction r3_frac !
dp-exponent r3_exp !
dp-sign to r3_sign
fdrop
dp-fraction r2_frac !
dp-exponent r2_exp !
dp-sign to r2_sign
fdrop
dp-fraction r1_frac !
dp-exponent r1_exp !
dp-sign to r1_sign
fdrop ;
0 value b_num_zero
0 value b_den_zero
0 value b_num_Inf
0 value b_den_Inf
variable remainder
: _F*/ ( F: r1 r2 r3 -- ) ( -- usign uexponent udfraction true )
\ or ( F: -- DP_NAN | +/- DP_INF ) ( -- false )
get-arg-components
\ Check for NaN in args
r1_frac @ r1_exp @ _IsNaN?
r2_frac @ r2_exp @ _IsNaN? or
r3_frac @ r3_exp @ _IsNaN? or IF DP_NAN FALSE EXIT THEN
r1_frac @ r1_exp @ _IsZero?
r2_frac @ r2_exp @ _IsZero? or to b_num_zero
r3_frac @ r3_exp @ _IsZero? to b_den_zero
b_num_zero b_den_zero and IF DP_NAN FALSE EXIT THEN
r1_sign r2_sign xor to num_sign
r1_frac @ r1_exp @ _IsInf?
r2_frac @ r2_exp @ _IsInf? or to b_num_Inf
r3_frac @ r3_exp @ _IsInf? to b_den_Inf
b_num_Inf b_den_Inf and IF DP_NAN FALSE EXIT THEN
b_num_zero b_den_Inf or IF \ return signed zero
num_sign r3_sign xor
0
0 s>d TRUE EXIT
THEN
b_den_zero b_num_Inf or IF \ return signed Inf
DP_+INF
num_sign r3_sign xor IF FNEGATE THEN
FALSE EXIT
THEN
num_sign r3_sign xor
r1_exp @ r2_exp @ + r3_exp @ -
r1_frac @ frac>significand
r2_frac @ frac>significand um*
r3_frac @ frac>significand ud/mod
rot remainder !
TRUE
;
variable uhigh
: F*/ ( F: r1 r2 r3 -- r )
_F*/ \ ( -- usign uexp udfrac true ) or
\ ( -- false ) ( F: NAN | +/-INF )
IF
2 pick 0 DP_EXPONENT_MAX WITHIN 0= IF
-43 throw
THEN
dup IF \ high 64 bits of the quotient
uhigh !
-43 throw
ELSE
drop
2dup D0= IF \ ufrac = 0 and uexp 0
drop \ usign uexp
ELSE
\ usign
\ quotient: normal double precision fraction
dp-normal-fraction \ usign uexp expinc uquot
r + DP_FRACTION_BITS lshift r> or
THENswap DP_SIGN_BIT lshift or
fpstack
THENTHEN
;
\ Tests
1 [IF]
[UNDEFINED] T{ [IF] include ttester.4th [THEN]
1e-17 rel-near f!
1e-17 abs-near f!
TESTING F*/
\ These results are different using F* and F/
t{ 1.0e300 -3.0e259 1.0e280 f*/ -> -3.0e279 }t
t{ 1.0e-300 3.0e-259 1.0e-280 f*/ -> 3.0e-279 }t
t{ 1.0e300 -3.0e259 -1.0e280 f*/ -> 3.0e279 }t
t{ -1.0e300 -3.0e259 -1.0e280 f*/ -> -3.0e279 }t
t{ 1.0e302 -3.0e302 DP_+INF f*/ -> -0.0e0 }t
t{ -1.0e-302 -1.0e-302 0.0e0 f*/ -> DP_+INF }t
\ The following should give same results as using F* and F/
t{ 0.0e0 3.0e302 1.0e-100 f*/ -> 0.0e0 }t
t{ 3.0e302 0.0e0 1.0e-100 f*/ -> 0.0e0 }t
t{ 1.0e302 -3.0e259 0.0e0 f*/ -> DP_-INF }t
t{ DP_+INF 3.0e300 1.0e0 f*/ -> DP_+INF }t
t{ DP_+INF 3.0e-300 -1.0e0 f*/ -> DP_-INF }t
[THEN]