F*/ (f-star-slash)

Discussion:

F*/ (f-star-slash)

(too old to reply)

Krishna Myneni

2024-05-19 21:28:12 UTC

The analog of */ for single length integers, with intermediate double
length result, can be implemented for floating point arithmetic as well.
The following implementation of F*/ uses the non-standard word, UD/MOD,
although it might be possible to code it with UM/MOD as well. It also
requires FP@ (requiring a separate fp stack in memory).

How is F*/ different from using F* and F/ sequentially?

Consider the following cases:

1) 1.0e300 -3.0e259 F* 1.0e280 F/

2) 1.0e-300 3.0e-259 F* 1.0e-280 F/

3) 1.0e302 -3.0e302 F* DP_+INF F/

4) -1.0e-302 -1.0e-302 F* 0.0e0 F/

(DP_+INF is the double-precision IEEE-754 representation of plus infinity).

The first two cases, on a double-precision FP system, result in -inf and
zero, respectively.

Cases 3 and 4 result in NaN (not a number).

A proper implementation of F*/ can evaluate the result to a number for
cases 1 and 2, recognizing that the final result is representable as a
double-precision number. For cases 3 and 4 F*/ recognizes that the final
results are signed zero and +INF. F*/ represents the intermediate result
of the product with a wider number representation.

The code below, which requires a separate FP stack in memory, has been
tested on kForth-64 and Gforth. Gforth appears to have a problem
representing signed zero, although it does represent plus and minus
infinity.

1.0e302 -3.0e302 DP_+INF f*/ fs. 0.00000000000000E0 ok

( should be minus zero in the above case).

The factor of _F*/ was done on purpose, to be used as a basis for higher
precision calculations.

The tests do not include subnormal number testing or for NANs at the
present time.

--
Krishna Myneni

=== start of fstarslash.4th ===
\ fstarslash.4th
\
\ Implementation of F*/ on 64-bit Forth
\
\ Krishna Myneni, 2024-05-19
\
\ F*/ ( F: r1 r2 r3 -- r )
\ Multiply two IEEE-754 double precision floating point
\ numbers r1 and r2 and divide by r3 to obtain the result,
\ r. The intermediate product significand, r1*r2, is
\ represented with at least IEEE quad precision for both
\ its significand and exponent.
\
\ F*/ uses an intermediate significand represented by
\ 128 bits
\

include ans-words

1 cells 8 < [IF]
cr .( Requires 64-bit Forth system ) cr
ABORT
[THEN]

fvariable temp
: >fpstack ( u -- ) ( F: -- r )
temp ! temp f@ ;

base @ hex
8000000000000000 constant DP_SIGNBIT_MASK
000FFFFFFFFFFFFF constant DP_FRACTION_MASK
7FF0000000000000 constant DP_EXPONENT_MASK
\ DP special value constants
7FF0000000000001 >fpstack fconstant DP_NAN
FFF0000000000000 >fpstack fconstant DP_-INF
7FF0000000000000 >fpstack fconstant DP_+INF
base !

52 constant DP_FRACTION_BITS
1023 constant DP_EXPONENT_BIAS
2047 constant DP_EXPONENT_MAX

1 DP_FRACTION_BITS LSHIFT constant DP_IMPLIED_BIT
63 constant DP_SIGN_BIT

: dp-fraction ( F: r -- r ) ( -- u )
FP@ @ DP_FRACTION_MASK and ;
: dp-exponent ( F: r -- r ) ( -- e )
FP@ @ DP_EXPONENT_MASK and DP_FRACTION_BITS rshift ;
: dp-sign ( F: r -- r ) ( -- sign )
FP@ @ DP_SIGN_BIT rshift ;

: _isZero? ( ufrac uexp -- b ) 0= swap 0= and ;
: _isInf? ( ufrac uexp -- b ) DP_EXPONENT_MAX = swap 0= and ;
: _isNaN? ( ufrac uexp -- b ) DP_EXPONENT_MAX = swap 0<> and ;
: _isSubNormal? ( ufrac uexp -- b ) 0= swap 0<> and ;

: frac>significand ( u -- u' ) DP_IMPLIED_BIT or ;

\ Show the significand, unbiased base-2 exponent, and sign bit
\ for a double-precision floating point number. To recover the
\ dp floating point number, r, from these fields,
\
\ r = (-1)^sign * significand * 2^(exponent - 52)
\
: dp-components. ( F: r -- )
dp-sign
dp-exponent
dp-fraction
2dup D0= invert IF frac>significand THEN
swap DP_EXPONENT_BIAS - swap
fdrop
." significand = " 18 u.r 2 spaces
." exponent = " 4 .r 2 spaces
." sign = " .
;

: dp-normal-fraction ( u -- expinc u')
dup DP_FRACTION_BITS 1+ rshift
dup IF
tuck rshift
ELSE
drop -1 swap \ -1 u
BEGIN
1 lshift
dup DP_IMPLIED_BIT and 0=
WHILE
swap 1- swap
REPEAT
THEN
;

0 value r1_sign
0 value r2_sign
0 value r3_sign
0 value num_sign
variable r1_exp
variable r2_exp
variable r3_exp
variable r_exp
variable r1_frac
variable r2_frac
variable r3_frac

: get-arg-components ( F: r1 r2 r3 -- )
dp-fraction r3_frac !
dp-exponent r3_exp !
dp-sign to r3_sign
fdrop
dp-fraction r2_frac !
dp-exponent r2_exp !
dp-sign to r2_sign
fdrop
dp-fraction r1_frac !
dp-exponent r1_exp !
dp-sign to r1_sign
fdrop ;

0 value b_num_zero
0 value b_den_zero
0 value b_num_Inf
0 value b_den_Inf

variable remainder

: _F*/ ( F: r1 r2 r3 -- ) ( -- usign uexponent udfraction true )
\ or ( F: -- DP_NAN | +/- DP_INF ) ( -- false )
get-arg-components

\ Check for NaN in args
r1_frac @ r1_exp @ _IsNaN?
r2_frac @ r2_exp @ _IsNaN? or
r3_frac @ r3_exp @ _IsNaN? or IF DP_NAN FALSE EXIT THEN

r1_frac @ r1_exp @ _IsZero?
r2_frac @ r2_exp @ _IsZero? or to b_num_zero
r3_frac @ r3_exp @ _IsZero? to b_den_zero

b_num_zero b_den_zero and IF DP_NAN FALSE EXIT THEN

r1_sign r2_sign xor to num_sign

r1_frac @ r1_exp @ _IsInf?
r2_frac @ r2_exp @ _IsInf? or to b_num_Inf
r3_frac @ r3_exp @ _IsInf? to b_den_Inf

b_num_Inf b_den_Inf and IF DP_NAN FALSE EXIT THEN

b_num_zero b_den_Inf or IF \ return signed zero
num_sign r3_sign xor
0
0 s>d TRUE EXIT
THEN

b_den_zero b_num_Inf or IF \ return signed Inf
DP_+INF
num_sign r3_sign xor IF FNEGATE THEN
FALSE EXIT
THEN

num_sign r3_sign xor
r1_exp @ r2_exp @ + r3_exp @ -
r1_frac @ frac>significand
r2_frac @ frac>significand um*
r3_frac @ frac>significand ud/mod
rot remainder !
TRUE
;

variable uhigh

: F*/ ( F: r1 r2 r3 -- r )
_F*/ \ ( -- usign uexp udfrac true ) or
\ ( -- false ) ( F: NAN | +/-INF )
IF
2 pick 0 DP_EXPONENT_MAX WITHIN 0= IF
-43 throw
THEN
dup IF \ high 64 bits of the quotient
uhigh !
-43 throw
ELSE
drop
2dup D0= IF \ ufrac = 0 and uexp 0
drop \ usign uexp
ELSE
\ usign
\ quotient: normal double precision fraction
dp-normal-fraction \ usign uexp expinc uquot

r + DP_FRACTION_BITS lshift r> or

THEN
swap DP_SIGN_BIT lshift or

fpstack

Krishna Myneni

2024-05-19 21:44:30 UTC