The 'chars' register: automatic character counter of the ℙ𝕖𝕡 machine.
The pep virtual machine contains a register that automatically stores the number of characters (hopefully Unicode/UTF8/etc) that have been read from the text input-stream with the nom commands read, while, whilenot and until. Each of these commands automatically updates the chars register in the pep machine.
You can access and use the “chars” register with the nom commands chars and nochars, which function in a similar way to the lines and nolines commands for the pep lines register.
The chars command appends the current character count onto the end of the workspace buffer, and the nochars command resets the character counter to zero.
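The behaviour of chars and nochars described above can be modelled in a few lines. The following Go sketch is only an illustration: the Machine type and its method names are hypothetical and are not taken from the pep.c source, and it assumes that the register goes up by one for every character read.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Machine is a hypothetical, stripped-down model of the pep machine:
// an input stream, a workspace buffer and the chars register only.
type Machine struct {
	input     []rune          // unread input characters
	workspace strings.Builder // the workspace buffer
	chars     int             // the chars register
}

// Read models the nom 'read' command: it moves one character from the
// input into the workspace and increments the chars register.
func (m *Machine) Read() bool {
	if len(m.input) == 0 {
		return false
	}
	m.workspace.WriteRune(m.input[0])
	m.input = m.input[1:]
	m.chars++
	return true
}

// Chars models the nom 'chars' command: it appends the current
// character count to the end of the workspace.
func (m *Machine) Chars() {
	m.workspace.WriteString(strconv.Itoa(m.chars))
}

// NoChars models the nom 'nochars' command: it resets the counter.
func (m *Machine) NoChars() {
	m.chars = 0
}

func main() {
	m := &Machine{input: []rune("abc")}
	m.Read()
	m.Read()
	m.Chars() // workspace is now "ab2"
	m.NoChars()
	m.Read()
	m.Chars() // workspace is now "ab2c1"
	fmt.Println(m.workspace.String())
}
```

Note that chars writes the count as text into the workspace, which is why the script further down can splice it straight into an error message with add and print.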
Both the chars and lines registers and commands are important for providing reasonably good error messages when a nom script finds an error in the syntax of whatever language it is parsing. For example, when the nom compiler bumble.sf.net/books/pars/compile.pss (which is a nom script, cool hey?) encounters an unrecognised command such as boggle-boggle, it halts the compilation and provides an extremely helpful error message informing the esteemed script writer about exactly where in their (otherwise amazingly good) script the error occurred.
It is a common desire (or would be if nom were widely used) to make the character count relative to the current line number (since the message “syntax error at character 14234” may not be very helpful). This can be implemented as follows:
read; [\n] { nochars; }
read; [\n] { clear; nochars; }
[:space:] { clear; .restart }
whilenot [:space:]; "tree","leaf" {
put; clear;
add "* word '"; get;
add "' at line "; lines; add " chars "; chars; add "\n";
print; clear;
}
clear;
The pep tool is remarkably fast considering that it is an interpreter. On my not-particularly-special Dell laptop I got the following timing result with the Project Gutenberg copy of Charles Dickens' book 'The Pickwick Papers':
# the script above is saved as 'wordsearch.pss'
time pep -f wordsearch.pss pickwick.papers.txt
# pickwick.papers.txt is ~ 37000 lines and 1.8M in size
# output
real 0m0.234s
user 0m0.223s
sys 0m0.012s
Of course “wc -w” is much, much faster, but in my hobby-programmer defence, the pep/nom tool is doing a lot more than wc -w (including compiling and loading the script):
time wc -w pickwick.papers.txt
# output:
303110 pickwick.papers.txt
real 0m0.033s
user 0m0.031s
sys 0m0.002s
In fact, in some experiments, I have found that pep scripts that are translated to the Go language and compiled only run 4X faster than the ℙ𝕖𝕡 interpreter. Can this possibly be true? Let's find out....
(code below requires that you are in the 'pepnom' base folder of the extracted download file or else change the directory paths)
# save the script above as 'wordsearch.pss'
pep -f tr/translate.go.pss wordsearch.pss > wordsearch.go
go build wordsearch.go
time cat pickwick.papers.txt | ./wordsearch
# output
real 0m0.296s
user 0m0.296s
sys 0m0.034s
So, the Go translation is actually slower than the ℙ𝕖𝕡 interpreter! But there is a pretty simple and logical explanation for this: Unicode. Go (I think) has good Unicode support and is searching a UTF-8 text file. UTF-8 is a variable-length character encoding, as you would all know, which means that Go can't simply do “(char)++” or whatever to get to the next character in the input stream.
The ℙ𝕖𝕡 interpreter, on the other hand, is written in C with single-byte “char” characters (I know, I know, let's not talk about it) so it can zoom through the input-stream like quicksilver.
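The difference between stepping by bytes and stepping by characters can be seen in a small Go sketch. This is an illustration of the general point only, not a measurement and not the code that translate.go.pss generates; the function names CountBytes and CountRunes are made up for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// CountBytes treats every byte as one character, which is what C code
// using plain 'char' effectively does. No decoding is needed.
func CountBytes(s string) int {
	return len(s)
}

// CountRunes walks the string one UTF-8 sequence at a time. Each step
// has to inspect the lead byte to learn how many bytes to skip, so it
// cannot simply advance a pointer by one.
func CountRunes(s string) int {
	n := 0
	for i := 0; i < len(s); {
		_, size := utf8.DecodeRuneInString(s[i:])
		i += size
		n++
	}
	return n
}

func main() {
	// "\u00ef" is the letter i with diaeresis, two bytes in UTF-8.
	word := "na\u00efve"
	fmt.Println("bytes:", CountBytes(word)) // 6
	fmt.Println("runes:", CountRunes(word)) // 5
}
```

For pure ASCII input both counts agree, so the byte-based chars register in the C interpreter only drifts from the true character count when the input contains multi-byte characters.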
But before you give up on ℕ𝕠𝕞 and go onto the next obscure, cryptic language, remember that you can overcome the Unicode problem by translating scripts into go, java, python, ruby etc (and hopefully in the future - of 2025 - dart and rust). Or if you felt like helping you could just grep through the pep.c source code along with the 'objects' in bumble.sf.net/books/pars/object/ and change char to wchar and hope against hope that that works (...)
Actually, it just occurred to me that this trick to make the character number relative to the line number will return an incorrect number if the while, whilenot or until commands are used to parse multiple lines of text. However, this problem is not terribly serious because it only means that the error message is not as useful as it should be.