Discussion:
copying and pasting from pdf to Forth
Krishna Myneni
2024-07-01 10:38:23 UTC
Yesterday, I learned a good lesson to not copy and paste text from a pdf
into the Forth environment. There can be hidden characters when doing
so, and then a word fails because the input isn't correct.

For an hour or so I was chasing down an imaginary bug in the
(non-standard) word NUMBER? used to convert a counted string into a
signed double length number.

--
Krishna Myneni
mhx
2024-07-01 11:47:24 UTC
Krishna Myneni wrote:
[..]
Post by Krishna Myneni
For an hour or so I was chasing down an imaginary bug in the
(non-standard) word NUMBER? used to convert a counted string into a
signed double length number.

I know about '-', and single/double quote characters. They are a
nuisance not only in Forth.

Was it something else?

-marcel
Krishna Myneni
2024-07-01 23:33:58 UTC
Post by mhx
[..]
Post by Krishna Myneni
For an hour or so I was chasing down an imaginary bug in the
(non-standard) word NUMBER? used to convert a counted string into a
signed double length number.
I know about '-', and single/double quote characters. They are a
nuisance not only in Forth.
Was it something else?
-marcel

It was a large number containing commas. I deleted the commas.

The text is pasted here:

−170,141,183,460,469,231,731,687,303,715,884,105,728

It appears to have pasted properly, but there is a difference between
Line 1 and Line 2. The latter is entered by hand.
\ Line 1
c" −170141183460469231731687303715884105728" NUMBER? .s

0
0
0
ok
drop 2drop \ Now enter by hand
ok
\ Line 2
c" -170141183460469231731687303715884105728" NUMBER? .s

-1
-9223372036854775808
0

ok

From my newsreader, I can copy Line 1 into my Forth environment and
reproduce the error. When I copy Line 2 which is entered by hand,
NUMBER? works as expected.

--
KM
Krishna Myneni
2024-07-01 23:43:09 UTC
Post by Krishna Myneni
Post by mhx
[..]
Post by Krishna Myneni
For an hour or so I was chasing down an imaginary bug in the
(non-standard) word NUMBER? used to convert a counted string into a
signed double length number.
I know about '-', and single/double quote characters. They are a
nuisance not only in Forth.
Was it something else?
-marcel
It was a large number containing commas. I deleted the commas.
−170,141,183,460,469,231,731,687,303,715,884,105,728
It appears to have pasted properly, but there is a difference between
Line 1 and Line 2. The latter is entered by hand.
\ Line 1
c" −170141183460469231731687303715884105728" NUMBER? .s
        0
        0
        0
 ok
drop 2drop  \ Now enter by hand
 ok
\ Line 2
c" -170141183460469231731687303715884105728" NUMBER? .s

OK, when I apply COUNT I see a length difference of 2 characters (bytes)
between Lines 1 and 2. If you look closely, the minus sign differs between
the two! Line 1 must have picked up a UTF-8 encoded Unicode minus sign
(U+2212, three bytes in UTF-8) instead of the one-byte ASCII '-'.
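
The length difference can be checked directly. The following is only a minimal sketch, assuming a byte-oriented Forth that accepts UTF-8 input unchanged and provides an interpretive S" (some systems filter non-ASCII input, in which case the results will differ). ASCII? is a hypothetical helper name, not a kForth word.

```forth
\ U+2212 MINUS SIGN is the three bytes E2 88 92 in UTF-8;
\ the ASCII hyphen-minus is the single byte 2D.
s" −17" nip .   \ byte length of the pasted form
s" -17" nip .   \ byte length of the typed form

\ ASCII? is a hypothetical helper: true only if every byte is below 128.
: ascii? ( c-addr u -- flag )
   over + swap                 \ ( end start ) loop bounds
   ?do
      i c@ 127 > if  false unloop exit  then
   loop
   true ;

s" −17" ascii? .   \ false (0) if the Unicode minus survived the paste
s" -17" ascii? .   \ true (-1)
```

On such a system the two lengths should differ by 2 (5 versus 3), matching the difference seen with COUNT.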

--
KM
sjack
2024-07-01 12:13:36 UTC
Post by Krishna Myneni
Yesterday, I learned a good lesson to not copy and paste text from a pdf
into the Forth environment. There can be hidden characters when doing
so, and then a word fails because the input isn't correct.

I've encountered that problem often enough over the years to get a feel
for it: suspect it when the action doesn't match the logic.
Vim has a ':list' option that will display non-printables,
and for block files I just rewrite the suspect line. I also
have ADUMP, which prints text and displays any non-printable
as '^nn', where nn is the value of the non-printable.
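
ADUMP itself is not shown in the thread. What follows is only a rough sketch of a word with similar behaviour, under two assumptions: that nn is a two-digit hex value, and that "printable" means ASCII 32 to 126. SHOW-BYTES and .BYTE are made-up names.

```forth
\ .BYTE (hypothetical): emit a printable ASCII character as is,
\ anything else as ^nn with nn in hex.
: .byte ( c -- )
   dup 32 127 within if  emit  exit  then
   base @ >r  hex                 \ save BASE, switch to hex
   [char] ^ emit
   0 <# # # #> type               \ exactly two hex digits
   r> base ! ;                    \ restore BASE

\ SHOW-BYTES (hypothetical): apply .BYTE to each byte of a string.
: show-bytes ( c-addr u -- )
   over + swap ?do  i c@ .byte  loop ;

s" −17" show-bytes   \ pasted Unicode minus
s" -17" show-bytes   \ typed ASCII minus
```

If the pasted bytes reach the string intact, the first line should print something like ^E2^88^9217, while the second prints -17 unchanged.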
--
me
Krishna Myneni
2024-07-01 23:45:10 UTC
Post by sjack
Post by Krishna Myneni
Yesterday, I learned a good lesson to not copy and paste text from a pdf
into the Forth environment. There can be hidden characters when doing
so, and then a word fails because the input isn't correct.
I've encountered that problem often enough over the years to get a feel
for it: suspect it when the action doesn't match the logic.
Vim has a ':list' option that will display non-printables,
and for block files I just rewrite the suspect line. I also
have ADUMP, which prints text and displays any non-printable
as '^nn', where nn is the value of the non-printable.

I saw no display difference when I used :SET LIST in Vim. But I found the
issue was the difference between a Unicode minus sign and an ASCII minus
sign -- see above.

--
KM
mhx
2024-07-02 08:03:36 UTC
iForth silently drops non-ASCII characters:

( mouse copy and paste followed by keyboard ENTER )
FORTH> 17 ok
[1]FORTH> . 17 ok
FORTH>

Octave is more explicit:

octave:1> −17
error: parse error:

invalid character '�' (ASCII 226)
−17
^
octave:1>

-marcel
sjack
2024-07-02 14:29:14 UTC
Post by Krishna Myneni
No display difference when I used :SET LIST in Vim. But I found the
issue was the difference between a unicode minus sign and an ASCII minus
sign -- see above.

Hard to believe Vim :SET LIST wouldn't catch that. But I use codepage
KOI8-R, so that wouldn't show up for me. I'll have to try Vim on a Unicode
terminal.
But my ADUMP would catch it for sure; it shows the values Forth sees.

OK, I tried on a Unicode terminal, and the PDF I used had the correct value,
45, for the minus after I copied it to a text file that I loaded with Forth.
(Vim ga over the minus also showed 45.) Also, when I typed a minus in
Vim into a text file, it had the correct value. So your particular
PDF file was the culprit; so yes, all be warned.
--
me
Krishna Myneni
2024-07-03 00:49:43 UTC
Post by sjack
Post by Krishna Myneni
No display difference when I used :SET LIST in Vim. But I found the
issue was the difference between a unicode minus sign and an ASCII minus
sign -- see above.
Hard to believe Vim :SET LIST wouldn't catch that. But I use codepage
KOI8-R, so that wouldn't show up for me. I'll have to try Vim on a Unicode
terminal.
But my ADUMP would catch it for sure; it shows the values Forth sees.
OK, I tried on a Unicode terminal, and the PDF I used had the correct value,
45, for the minus after I copied it to a text file that I loaded with Forth.
(Vim ga over the minus also showed 45.) Also, when I typed a minus in
Vim into a text file, it had the correct value. So your particular
PDF file was the culprit; so yes, all be warned.

Forth is a good environment for troubleshooting this -- just type S" and
a space, paste the text, close the quote, and then run DUMP to see the hex
codes of all the characters. Many implementations of DUMP also show the
printable characters.
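
For instance, with the two strings from earlier in the thread. This is a sketch only, assuming an interpretive S" and a system that keeps the pasted bytes intact; the layout of DUMP output differs between systems, so none is shown here.

```forth
\ DUMP ( addr u -- ) is from the optional Programming-Tools word set.
s" −170141183460469231731687303715884105728" dump   \ pasted from the PDF
s" -170141183460469231731687303715884105728" dump   \ typed by hand
```

If the Unicode minus survived the paste, the first dump should begin with the bytes E2 88 92 where the second begins with 2D.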

About the PDF file: it was one I created (the User's manual for
kForth-64); now I specifically remember changing the ASCII minus sign to
Unicode minus so that it would be more readable! I had forgotten about that.

--
Krishna
