A journal of work carried out on pep and nom.
"Pauca sed matura" (Carl Friedrich Gauss)
Here I will just make notes about work I am carrying out on the ℙ𝕖𝕡 and ℕ𝕠𝕞 system. September 2019 seems to be the date that I finally had a decent implementation in 'c'. This file shows a long litany of programming work carried out, a lot of it in Colombia, probably because time goes slower in that country.
Comments, suggestions and contributions to mjb at nomlang dot org
Doing work on the rust translator /tr/nom.torust.pss
Looking again at the remarkable work of Fabrice Bellard and his refreshingly simple html. Life is too short for html and css. But I do it anyway.
Had the idea that nom could also be used for binary files. Matching binary patterns is also useful. For example, UTF8 text.
Also, I need to rethink the whole approach to Unicode and grapheme clusters, which are sets of Unicode code points that amalgamate to form one visible character (eg certain letters with accents). This is a tricky issue. Nom should normally read one cluster at a time, not one Unicode code point at a time. But there may be rare cases where we want to get one Unicode char at a time. So the default behaviour of read should be to read one cluster, like the dart characters api. But there could be a variant read command that reads one Unicode character (Rune?) and also maybe one byte char. The byte-read command could be used for parsing binary files....
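To make the byte-read versus rune-read distinction concrete, here is a sketch in C of decoding a single UTF-8 code point. The helper name is my own (nothing like this exists in the pep source yet), and grapheme clusters would need a further segmentation layer on top of this; a byte-read command would simply take s[0].

```c
#include <stdint.h>
#include <stddef.h>

/* Decode one UTF-8 code point from s into *cp and return the
   number of bytes consumed (0 on malformed input). Safe on
   NUL-terminated input: continuation checks short-circuit. */
size_t utf8_decode(const unsigned char *s, uint32_t *cp) {
    if (s[0] < 0x80) { *cp = s[0]; return 1; }                 /* ASCII  */
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {      /* 2-byte */
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
                              && (s[2] & 0xC0) == 0x80) {      /* 3-byte */
        *cp = ((uint32_t)(s[0] & 0x0F) << 12)
            | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80
        && (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {   /* 4-byte */
        *cp = ((uint32_t)(s[0] & 0x07) << 18)
            | ((uint32_t)(s[1] & 0x3F) << 12)
            | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    return 0; /* malformed lead byte or truncated sequence */
}
```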
Have been reading about PEG (parsing expression grammars), which seems to be what nom is good at parsing, with some caveats. Playing with more formats in /eg/text.tohtml.pss eg No 123 and maybe horizontal bar-charts, which could just go in lists. But you probably need a table to line up the starts of the bars.
- [bar:nom:42/100:%]
- [bar:lua:12/100:%]
- [bar:wren:54/100:%]
The % is a unit name and the 42/100 is a bar width. This number should be printed inside the bar at the end. "nom", "lua" etc are the labels printed before the bar. Need to calculate the width in em or ex as below. But percent is relative to the container.
/* calc(expression) */
calc(100% - 80px)
/* Expression with a CSS function */
calc(100px * sin(pi / 2))
/* Expression containing a variable */
calc(var(--hue) + 180)
/* Expression with color channels in relative colors */
lch(from aquamarine l c calc(h + 180))
<table class="chart">
<tr>
<th>wren</th><td>
<div class="chart-bar wren" style="width: 14%;">0.12s </div></td>
</tr>
<tr>
<th>luajit (-joff)</th><td>
<div class="chart-bar" style="width: 18%;">0.16s </div></td>
</tr>
<tr>
<th>ruby</th><td>
<div class="chart-bar" style="width: 23%;">0.20s </div></td>
</tr>
<tr>
<th>python3</th><td>
<div class="chart-bar" style="width: 91%;">0.78s </div></td>
</tr>
</table>
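The widths in a table like the one above can be computed by scaling each timing against the slowest run. A sketch in C (the helper names and the 91% cap are my own assumptions, the cap chosen so the longest bar leaves room for its label):

```c
#include <stdio.h>

/* Width (percent of the container) for one bar: the slowest
   timing is scaled to max_pct, everything else in proportion. */
int bar_width(double secs, double slowest, int max_pct) {
    return (int)(secs / slowest * max_pct + 0.5);  /* round to nearest */
}

/* Emit one table row in the style of the chart above. */
void chart_row(const char *label, double secs, double slowest, int max_pct) {
    printf("<tr>\n<th>%s</th><td>\n"
           "<div class=\"chart-bar\" style=\"width: %d%%;\">%.2fs </div></td>\n"
           "</tr>\n",
           label, bar_width(secs, slowest, max_pct), secs);
}
```

For example chart_row("python3", 0.78, 0.78, 91) reproduces the 91% row.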
Still working on the /eg/nom.todart.pss which is now working with about 3 commands and <stdin>. Added the No abbreviation to text.tohtml.
The script /eg/nom.tolatex.pss is now working more or less ok. The colours are not great, but that is easy to change. I invented a new and interesting parsing technique for escaping special LaTeX characters and styling each component of a ℕ𝕠𝕞 script.
Added some abbreviations to vim about translation to go, and also a shortcut in /eg/text.tohtml.pss for links to translation scripts like this go | java | javascript | ruby | python | tcl | c and also a list like this [nom:translation.list]
Working on the /eg/nom.tolatex.pss script, which has proved tricky because I couldn't use the listings package nor the minted package for code listings, so I had to write my own formatter, which seems to be working.
The script /eg/nom.tolatex.notunicode.pss does print nice ℕ𝕠𝕞 code listings but can't handle any unicode characters in the source, which is silly.
Finished the script /eg/nom.tohtml.pss which prints colourised ℕ𝕠𝕞 in HTML using <span> tags. Also wrote /eg/nom.snippet.tohtml.pss which only pretty prints within <code class='nom-lang'> tags. This allows it to work with /eg/text.tohtml.pss
I reformatted most of the documentation with the new pretty printed ℕ𝕠𝕞 example code. Also just plain file names that start with / get linked in html now. Also made a new codeblock syntax starting with ---+ and ending ,,, which is specifically for nom code.
Added unordered lists to eg/text.tohtml.pss. The parsing was sort of bizarre because there is no start token for an unordered list in my quirky plain-text document format. Lists are just started with a list-item indicator, which is a dash word '-' at the start of a line. Lists are terminated with a blank line or the end of the document, so I reduce the lists in reverse: that is, when I find the end of the list I add an endlist* token and then reduce all the item*text*endlist* sequences and keep reparsing until there are none left. I was sort of surprised that it works, but it seems to. I could also let the list* token live for a bit longer and parse starline*list sequences, just like I do for block quotations.
<:0:4:>>:/image/name.gif> or <name.gif>
Added an equationset* token to maths.tolatex.pss. This script is producing really nice output from simple ascii arithmetic expressions (Unicode symbolic expressions should work when the ℕ𝕠𝕞 script is translated to go or Java etc). Need to add more LaTeX symbols like greek letters. Still don't have derivative and partial derivative symbols.
pep -f eg/maths.tolatex.pss \
-i 'x == (-b :plusminus sqrt(b^2-4*a*c))/(2*a);' > test.tex
pdflatex test.tex;
# see below for rendering
pep -f eg/maths.tolatex.pss \
-i ' cuberoot(:Theta + x/(x+1))/(x^-0.5 + 1/2) '
# see below for rendering with pdflatex
Made the abbreviation code in eg/nom.syntax.reference.pss much better. Also added a help system (but still need to write it). Made a template script eg/nom.template.pss which has an error and help token, a parse-stack watch, and some common parsing code.
Looking at making some html colourized output. The css is in /site.blog.css. I inspected elements at the wren.io site for ideas.
Added comments in documents (lines that start with #:) in eg/text.tohtml.pss. Yesterday, I created eg/maths.to.latex.pss which transforms formulas like x:=sqrt(x^2+y^2)/(2*x^-1.23) into really nice LaTeX formatted mathematics. I am impressed with my own work. It works.
I also invented the following lookahead rule, which does a lot of work. This is a positive grammar lookahead rule, because it reduces a set of token sequences if the sequence IS followed by ')' or ',' or ';'. Because it is a positive rule I can combine several sequences into one rule.
e := e op.compare e | e op.and e | e op.or e
(LOOKAHEAD 1: ')' | ',' | ';' )
The bracketed expression means: only perform the reductions if the following parse token (in this case a 'literal' value) is one of the 3 alternatives.
# when surrounded by brackets or terminated with ; or ,
# eg: (x^2 != y) or x && y,
B"expr*op.compare*expr*",B"expr*op.and*expr*",B"expr*op.or*expr*" {
!"expr*op.compare*expr*".!"expr*op.and*expr*".!"expr*op.or*expr*" {
E")*",E",*",E";*" {
replace "expr*op.and*expr*" "expr*";
replace "expr*op.or*expr*" "expr*";
replace "expr*op.compare*expr*" "expr*";
push;push;
# assemble attribs for new exp token,
--; --; add " "; get; ++; get; ++; get; add " "; --; --; put;
# transfer unknown token attrib
clear; ++; ++; ++; get; --; --; put; clear;
# realign tape pointer
++; .reparse
}
}
}
I have been reforming and developing the arithmetic expression parser, which is now called eg/maths.parse.pss. It now includes good error handling and a help-text system (for explaining the syntax of the expression parser). I also added an assignment operator (:=), logic operators (&& || AND OR), comparison operators (== != < <= > >= etc) and functions like sqrt(...). It is now basically a template for how to write a language with ℕ𝕠𝕞 and it also forms a reasonably big chunk of any kind of computer language parser/compiler. It can be more or less cut-and-pasted into other scripts.
Things to do on the ℙ𝕖𝕡 and ℕ𝕠𝕞 system:
- translate.perl.pss using the new syntax at eg/nom.syntax.reference.pss
- nom.syntax.recognise.pss, based on the script above, which just says "yes: nom syntax ok" or "no: etc". This script would become the basis of a non-error-checking nom parser.
- eg/drawbasic.pss
- text.tohtml.pss
- eg/ to html with text.tohtml.pss
- text.tolatex.pss for printing a book
Starting to write the parser for a simple LOGO-ish drawing language. The language is /eg/drawbasic.pss but I hope to change that name if the language gets good. ♛
Also, I was working today on /eg/nom.to.listing.pss which is supposed to be a simple precursor to a nom.to.html.pss html pretty-printer.
Have uploaded some solutions to problems at the www.rosettacode.org site. The ℕ𝕠𝕞 syntax checker nom.syntax.reference.pss is close to complete. I will use this script as a "reference" for what the syntax of nom is and should be.
Need to think about the mark and go syntax. It might be better to use the workspace value as the mark, I am not sure. I will no longer permit ridiculous ranges in classes, eg silly ranges like [\n-\f]. I think in the ℙ𝕖𝕡 interpreter these are currently accepted, but what does that even mean? I will just allow simple ranges like [a-g]. Also, I need to remove '<' and '>' as abbreviations for ++ and -- because they clash with <eof> etc. Also, remove 'll' and 'cc' as aliases for chars and lines.
Have been working on the /eg/nom.syntax.reference.pss which is a syntax checker for the ℕ𝕠𝕞 script language. I have made good progress and have added some new parse tokens, such as statement* and statementset* (for a list of statements). And command* is now just the command word, like add or push, and not the whole statement. "eof" and "reparse" are parsed as the word* grammar token and reduced later to test* and statement*.
Trying to write a nom error-checking script without adapting an existing translation script. So I am writing it from scratch, and in the process I am discovering strange things about the existing grammar that I have been using. For example, I use '>' and '<' as abbreviations for ++ and -- but this clashes with the <eof> syntax. So I will remove these abbreviations, and also the '+' '-' abbreviations for a+ and a-, because it is really silly to have 1-character abbreviations for 2-character commands.
Also, I should have a command* token for add, clear, upper for example, and then a statement* token for
add "hi"; replace "x" ""; clear;
Also, just use a word* token for "eof", "reparse" etc: things that are not valid commands or tests but are part of the grammar.
I made symlinks for the bumble.sf.net/books/pars/tr and bumble.sf.net/books/pars/eg folders on this site so that the example scripts and translators will also be available here. And I will create a document index for them.
Yesterday I wrote quite a large chunk of an XML parser (it is still in a documentation page). I was surprised how easy it was. I thought xml parsing was going to be difficult because it seemed to have multiple levels of nesting, firstly on an internal tag-level (a list of attributes within the tag) etc.
I also discovered some new techniques for error checking and reporting. For example: look for a parse-token which is at the end of a sub-pattern; in the case of XML an example is >* and />* (literal tokens). These should always resolve to a tag* parse token, so if you check for this token followed by anything else you will trap a lot of errors.
# a fragment
pop;pop;
B">*",B"/>*" {
# error, the tag* token didn't resolve or reduce as
# it should have.
}
push;push;
Also, I realised that you can just create the error message and then print it with line number etc at the end of the error block. I now favour putting all errors in a big block just after the parse label (although EOF errors will probably still need to go at the end of the script). Also, two-token error checking seems to be the most useful in general. Also, it's a good idea to have a list of tokens at the start of your script, and then just look at them to see which ones can follow others.
I have been doing a lot of work on the nomlang.org site (where this file is) including writing a quite useful BASH script which manages this website. This site nomlang.org is now the sort-of "home" of the ℙ𝕖𝕡 🙵 ℕ𝕠𝕞 system, or at least of all the documentation.
While writing the primitive-but-good static site generator in BASH I also wrote a new ℕ𝕠𝕞 script to format the plain-text into HTML. This script is called eg/text.tohtml.pss and it works remarkably well as far as I can see. I actually started off with very humble aims for the script; in fact, I just wanted to do something like this:
begin { add "<html><body>\n"; print; clear; }
until "\n";
replace "&" "&amp;"; replace "<" "&lt;"; replace ">" "&gt;";
[:space:] { clear; add "<p>\n"; }
B"###" {
clop;clop;clop; put; clear;
add "<h3>"; get; add "</h3>\n";
}
B"##" {
clop;clop; put; clear;
add "<h2>"; get; add "</h2>\n";
}
B"#" {
clop; put; clear;
add "<h1>"; get; add "</h1>\n";
}
print; clear;
(eof) { add "</body></html>\n"; print; quit; }
In other words, it would just mark paragraphs and sort-of MARKDOWN headings. But it just grew and grew and it has been really successful because I can just add syntax to it willy-nilly and if it breaks I can easily fix it. So I will almost certainly never use eg/mark.html.pss again because it is hard to debug.
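A note on escaping text for HTML: if the three replace passes each run over the whole workspace one after another, the "&" pass must come first, otherwise it would mangle the &lt; and &gt; entities produced by the other passes. A character-by-character escaper sidesteps the ordering problem entirely; here is an illustrative sketch in C (not code from the project):

```c
#include <stdio.h>

/* Escape &, < and > for HTML in a single left-to-right pass.
   Doing it character by character avoids the ordering trap of
   sequential whole-string replaces, where escaping "&" after "<"
   would turn an already-generated "&lt;" into "&amp;lt;". */
void html_escape(const char *in, char *out, size_t outsz) {
    size_t n = 0;
    for (; *in != '\0' && n + 6 < outsz; in++) {
        switch (*in) {
        case '&': n += (size_t)sprintf(out + n, "&amp;"); break;
        case '<': n += (size_t)sprintf(out + n, "&lt;");  break;
        case '>': n += (size_t)sprintf(out + n, "&gt;");  break;
        default:  out[n++] = *in; break;
        }
    }
    out[n] = '\0';
}
```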
Wrote a nom script eg/bash.show.functions.pss which prints bash functions in a file and the comments above them.
I am revisiting this system after almost 3 years of not doing anything on it. It still seems like a remarkably new way of parsing and compiling and worth pursuing. I updated the website at nomlang.org and improved some example scripts (like bumble.sf.net/books/pars/eg/exp.tolisp.pss) and created a new text-to-html formatter bumble.sf.net/books/pars/eg/text.tohtml.pss which is much simpler to maintain than bumble.sf.net/books/pars/eg/mark.html.pss because it uses a less complex grammar.
Working on the mark.latex.pss script which now supports most syntax, including images. The script is quite complex. It should be straightforward to translate it to other targets such as markdown, html, man (groff) etc.
Made a magical interpret() method in the perl translator which will allow running of scripts.
Working on a simplified grammar for tr/translate.perl.pss which I hope to use in all the translator scripts. So far so good. Also introducing a new expression grammar for tests, eg:
(B"a",B"b").E"z" { ... }
This allows mixing AND and OR logic in tests. Also, a nom script that extracts all unique tokens from a script would be useful.
Looking at ANTLR example grammars for ideas of simple languages such as "logo", "abnf", BNF, "lambda", "tiny basic". Reforming grammars of the translators, writing good "unescape" and "escape" functions that actually walk and transform the workspace string. Converting the perl translator to a parse method. Need an "esc" command to change the escape char in all translators. The perl translator is almost ready to be an interpreter.
Debugged the TCL translator; it appears to be working well except for second-generation scripts.
Current tasks: finish translators perl/c++/rust/tcl; start translators lisp/haskell/R (maybe). Write a new command "until" with no arguments (done in some translators). Make the translators use a "run" or "parse" method, which can read and write to a variety of sources. Make the tape in object/pep.c dynamically allocated. See if begin { ++; } creates space for a variable, and use this strategy for variable scope.
Starting to create date-lists in eg/mark.latex.pss to render lists such as this one. Also, had the idea of a new test
F:file.txt:"int" { ... }
This would test if the file "file.txt" contains a line starting with "int" and ending with ":" + workspace. This test would allow checking variable types and declarations. It would also allow better natural-language parsing, because a list of nouns/adj/verbs etc could be stored in a simple text file and looked up.
int.global:x
int.fn:x
string.global:name
string.local:name
etc
F:name.txt: { ... }
Would check the file name.txt for a line which begins with the tape and ends with the workspace.
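The proposed F: test could be sketched in C like this. file_has_line is a hypothetical name, and the real test would draw its prefix and suffix from the machine's tape and workspace rather than taking them as arguments:

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the proposed F:file:prefix test: return 1 if some line
   of the file starts with prefix and ends with ":" followed by
   suffix, so prefix "int" and suffix "x" match "int.global:x". */
int file_has_line(const char *fname, const char *prefix, const char *suffix) {
    char line[512], want[256];
    FILE *f = fopen(fname, "r");
    if (f == NULL) return 0;
    snprintf(want, sizeof want, ":%s", suffix);
    while (fgets(line, sizeof line, f)) {
        size_t len = strcspn(line, "\n");   /* trim the newline */
        size_t wlen = strlen(want);
        line[len] = '\0';
        if (strncmp(line, prefix, strlen(prefix)) == 0
            && len >= wlen && strcmp(line + len - wlen, want) == 0) {
            fclose(f);
            return 1;
        }
    }
    fclose(f);
    return 0;
}
```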
A lot of work on the Javascript translator tr/translate.js.pss; 1st gen tests are working. Working on the rust translator and the eg/sed.tojava.pss translator.
New ideas: create a lisp parser; create a brainf*** compiler (done); create a "commonmark" markdown translator. This should be not too hard, using the ideas in bumble.sf.net/books/pars/eg/mark.latex.pss. Will create a 'date list' format for mark.latex.pss and mark.html.pss.
Started a lisp parser eg/lisp.pss. Worked on eg/mark.latex.pss which is now producing reasonable pdf output (from .tex via pdflatex). Also realised that the accumulator could be used to simplify the grammar by counting words.
Developed a SED-to-java script, "eg/sed.tojava.pss", which has progressed well. Still lacking branching commands and some other gnu sed extensions.
Wrote a simple SED parser and formatter/explainer at eg/sed.parse.pss (commands a,i,c not parsed yet). Some work on the javascript and perl translators.
Introducing an 'increment' method into the various machine classes in the target languages. This allows the 'tape' and 'marks' arrays to grow if required.
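The growable-array idea might look like this in C. The struct and field names here are illustrative, not the actual pep machine types:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative growable tape: double the cell array whenever the
   tape pointer moves past the end (not the real pep structs). */
typedef struct {
    char **cell;   /* tape cells */
    size_t size;   /* number of allocated cells */
} Tape;

/* Ensure cell[index] exists; return 0 on allocation failure. */
int tape_ensure(Tape *t, size_t index) {
    if (index < t->size) return 1;
    size_t newsize = t->size ? t->size : 8;
    while (newsize <= index) newsize *= 2;
    char **p = realloc(t->cell, newsize * sizeof *p);
    if (p == NULL) return 0;
    /* zero only the newly added cells */
    memset(p + t->size, 0, (newsize - t->size) * sizeof *p);
    t->cell = p;
    t->size = newsize;
    return 1;
}
```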
Looking at translation scripts. Changing tape and mark arrays to be dynamically growable in various target languages.
reviewing documentation, tidying.
Working on the pl/0 scripts: eg/plzero.pss and eg/plzero.ruby.pss. eg/plzero.pss now checks and formats a valid pl/0 program.
Working on the palindrome scripts eg/pal.words.pss and eg/palindrome.pss. Both are working well and can be translated to various languages (go, ruby, python, c, java). I would like to add hyphen lists to mark.latex.pss and date lists (such as this one).
The Go translator is now working well. I would like to write translators for Kotlin, R (the statistical language), Swift and Rust. The script function pep.tt (in helpers.pars.txt) greatly helps debugging translation scripts.
More progress. A number of the translation scripts are now quite bug-free and can be tested with the helper function pep.tt <langname>. This script also tests 2nd generation script translation, which is very useful where the original pep engine is not available (for example, on a server).
Continuing work. Starting many translation scripts, such as tr/translate.cpp.pss, and trying to debug and complete others.
Working on tr/translate.c.pss: good progress. Simple scripts are translating and compiling and running. Did not eliminate dependencies, so scripts need to be compiled with libmachine.a in the object/ folder.
tr/translate.ruby.pss: should try to make a 'brew' package with ruby for pep.
Renamed gh.c to pep.c. Made pep look for asm.pp in the current folder, or else in the folder pointed to by the "ASMPP" environment variable.
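The lookup order described above, sketched in C. open_asmpp is a hypothetical helper name, not the actual code in pep.c:

```c
#include <stdio.h>
#include <stdlib.h>

/* Open asm.pp from the current folder, falling back to the
   folder named by the ASMPP environment variable. */
FILE *open_asmpp(void) {
    FILE *f = fopen("asm.pp", "r");
    if (f != NULL) return f;
    const char *dir = getenv("ASMPP");
    if (dir == NULL) return NULL;
    char path[1024];
    snprintf(path, sizeof path, "%s/asm.pp", dir);
    return fopen(path, "r");
}
```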
Need to add "upper", "lower" and "cap" to the translation scripts in pars/tr/
Things done: gh.c files etc; machine.interp.c (need to count preceding escape chars). Need to fix the same in the translation scripts.
Have made some more good progress over the last few days. Modified the script bumble.sf.net/books/pars/eg/json.check.pss so that it recognises all JSON numbers (scientific etc).
Fixed /books/pars/tr/translate.py.pss so that it can translate scripts as well as itself. Started to fix /books/pars/tr/translate.tcl.pss. Still have an infinite loop when .restart is translated, and this is a general problem with the “run-once” loop technique (for languages that don’t have labelled loops or goto statements, for implementing .reparse and .restart). The solution is a flag variable that gets set by .restart before the parse> label (see translate.ruby.pss)
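The run-once-loop technique is easiest to see in miniature. This C toy is purely illustrative (C has goto, so it doesn't need the trick): each section runs inside a do { ... } while(0) block so that break can leave it early, and a flag set inside the section makes the outer loop re-run the whole parse, standing in for .restart:

```c
/* Toy demonstration of the "run-once loop" technique used in
   translated scripts for languages without goto/labelled loops.
   run() consumes the digits of n, "restarting" once per digit,
   and returns the number of passes over the outer loop. */
int run(int n) {
    int restart = 1, passes = 0;
    while (restart) {          /* re-entered while the flag is set */
        restart = 0;
        passes++;
        do {                   /* one run-once section             */
            if (n > 0) {
                n /= 10;       /* do some work...                  */
                restart = 1;   /* ...then request a .restart       */
                break;         /* leave the section (like .reparse)*/
            }
        } while (0);
    }
    return passes;
}
```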
The script eg/mark.latex.pss is progressing well. It transforms a markdown-ish format (like the current doc) into LaTeX. Need to do lists/images/tables/dates.
Having another look at this system. I still see enormous potential for the system, but don't know how to attract anyone's attention! I updated the eg/json.check.pss script to provide helpful error messages with line+character numbers. Also, that script incorporates the scientific number format (crockford) in eg/json.number.pss. However, Crockford's grammar for scientific numbers seems much stricter than what is often allowed by json parsers such as the "jq" utility.
I became distracted by a bootable x86 forth stack-machine system I was coding at /books/osdev/os.asm That was also interesting, and I had the idea of somehow combining it with this. Hopefully these ideas will come to fruition.
I think the best idea would be to edit the /books/pars/pars-book.txt document, generate a pdf, print it out, and send it to someone who might be interested. This parsing/compiling system is revolutionary (I think), but nobody knows about it!!
I have not done any work on this project since about August 2020 but the idea remains interesting. Finishing the “translate.c.pss” script would be good (done: sept 2021), make “translate.go.pss” for a more modern audience (done: sept 2021).
Working on the script "translate.c.pss" to create c code from a pep script. I may try to eliminate dependency files and include all the required structures and functions in the script. That should facilitate converting the output to wide chars ("wchar").
translate.tcl.pss
translate.java.pss
....)
[done: the pep.tt function]
In the java translator, make the parse/compile script a method of the class, with the input stream as a parameter, so that the same method can be used to parse/compile a string, a file, or [stdin], among other things. (note: not yet done: march 2025)
This technique can be used for any language but is easier with languages that support data-structures/classes/objects.
Continuing to work on the scripts translate.py.pss and translate.tcl.pss. Had the idea to split the pars-book.txt into separate MAN pages, just like the TCL system ("man 3tcl string" etc). (Could generate man pages from the command documentation at nomlang.org/doc/commands/)
Made great progress on the script translate.java.pss which could become a template for a whole set of scripts for translating to other languages.
Continuing to work on translate.java.pss. Still need to convert the push and pop code and test and debug. Many methods have been in-lined and the Machine class code is now in the script.
Rethinking the translation scripts bumble.sf.net/books/pars/tr/translate.java.pss and bumble.sf.net/books/pars/tr/translate.js.pss These scripts can be greatly simplified. I will remove all trivial methods from the Machine object and use the script to emit code instead. Hopefully translate.java.pss will become a template for other similar scripts. Also, I will include the Machine object within the script output so that there will be no dependency on external code.
Wrote the script /books/pars/eg/json.number.pss which parses and checks numbers in json scientific format (eg -0.00012e+012). This script can be included in the script eg/json.parse.pss to provide a reasonably complete json parser/checker.
Working on the script /books/pars/eg/mark.html.pss. The script is working reasonably well for transforming the pars-book.txt file into html. It can be run with:
pep -f eg/mark.html.pss pars-book.txt > pars-book.html
Cleaning up the files in the /books/pars/ folder tree. Renaming the executable to "pep" from "pp". I think "pep" will be the tool's definitive name.
I will rename the tool and executable to "pep", which would stand for "parsing engine for patterns". I think it is a better name than "pp" and only seems to conflict with "Python Enhancement Proposal" in the unix/linux world.
Wrote a substantial part of the script /books/pars/eg/json.parse.pss which can parse and check the json file format. However, the parser is incomplete because at the moment it only accepts integer numbers. Recursive object and array parsing is working.
I will try to improve the mark.html.pss "markdown" transform script. I would still like to promote this parsing VM since I think it is a good and original idea.
Did some work on mark.html.pss. Cleaned up memory leaks (with valgrind). Also fixed some off-by-one errors and invalid reads/writes. The double-free segmentation fault seems to be fixed. Still need to fix a couple of memory bugs in interpret() (one is in the UNTIL command).
Trying to clean up the pars-book.txt file, which is the primary documentation file for the project.
Posted on comp.compilers and comp.lang.c to see if anyone might find this useful or interesting...
The implementation at bumble.sourceforge.net/books/pars/object has arrived at a usable beta stage (barring a segmentation fault when running big scripts).
Started the current implementation in the c language. I created a simple loop to test each new command as it was added to the machine, and this proved a successful strategy as it motivated me to keep going and debug as I went.
Wrote an incomplete c version of this machine called "chomski".
Wrote incomplete versions in c++ and java. The java Machine object at /books/pars/object.java/ got to a useful stage and will be a useful target for a script, very similar to /books/pars/tr/translate.c.pss (and will be called “translate.java.pss” ). This script creates compilable java code using the java Machine object. In fact, we will be able to run this script on itself (!). In other words we can run:
pep -f tr/translate.java.pss tr/translate.java.pss
The output will be compilable java code that can compile any parse machine script into compilable java code. Having this java system, we are able to use unicode characters in scripts.
It will be interesting to see how much slower the java version is.
Started to think about a tape/stack parsing machine.
The coding of this version was begun around mid-2014. A number of other versions have been written in the past but none was successful or complete.
Trying to get this to look for the ASMPP env variable to find the "asm.pp" file, which it needs to actually compile and run scripts. Here is a quick test of getenv():
printf("test\n");
const char* s = getenv("PATH");
printf("PATH :%s\n",(s!=NULL)? s : "getenv returned NULL");
printf("end test\n");
First new code for a while. I will add a switch that prints the stack when .reparse is called. (note: no, we can just print the stack after the parse> label with the stack and unstack commands.) This should help in debugging complex grammars (such as mark.html.pss or newmark.pss). But it may be easier to add this to the compile.pss script, since .reparse is just a jump to the parse label.
Trying to use an environment variable to locate 'asm.pp'
Added stack and unstack commands. But they don't update the tape pointer (yet).
Small adjustments to "compile.pss". Starting to rewrite compilable.c.pss to convert back to a single class test, and also convert to changes made to compile.pss (eg negation and "ortestset*" compilation). This is a maintenance problem: trying to keep compile.pss and compilable.c.pss in sync so that they recognise the same syntax. (note: the translation scripts don't really need to use the same grammar as the bumble.sf.net/books/pars/compile.pss compiler, with ortestset etc, because they don't have to compile assembly-style "jumps".)
A simpler way to reduce test* tokens:
# fragment.
# we use a leading or trailing comma to make a test*
# parse-token. This is sort-of "context parsing"
pop;pop;
"quoted*,*","class*,*",",*quoted*",",*class*" {
replace "quoted*" "test*"; replace "class*" "test*";
push; push; .reparse
}
pop;
"test*,*test*" {
push;push;push;
All the “,*” comma tokens above get confusing to look at when in a test with commas, so it could be better to actually make a “comma*” token.
Added the delim command, which changes the stack token delimiter for the push and pop commands.
Rewrote quoteset parsing in bumble.sf.net/books/pars/compile.pss. Much better now, doesn't use "hops". Also, replaced nomsf://asm.pp with an asm.pp generated by the nom compiler.
pep -f compile.pss compile.pss > asm.test.pp
This means that ℕ𝕠𝕞 is now self-hosting yaaaay. Thought it would be nice to have a javascript machine object ...
pep -f translate.js.pss translate.js.pss > pep.js
(not implemented, need to write the machine object and command methods, and the convert script. The convert script is a straightforward conversion of the "compilable.c.pss" script, but the machine object will take a little longer to write - but presumably much less time than writing the struct machine object in c).
Once we have these things we will be able to run scripts in a browser which will be nice for testing. And we will also be able to use UNICODE characters!!
Writing a man page for pep. But I will use the asciidoc system and convert to html and troff. Also wrote ghman in the bash helper functions file which installs the page (in LINUX at least).
Cleaning up memory leaks with valgrind. Still one problem in UNTIL in the execute() function in machine.interp.c. Also an uninitialised-value bug in TESTIS (need a newParameter func?). But TESTIS should not be called unless the parameter .text value is set...
Fixed an off-by-one bug. Also, found a bug in "until" in the execute() function in machine.interp.c (via valgrind). Can fix with endsWith() in buffer.c. Memory leaks when growing cells and buffers need to be fixed. Valgrind on osx doesn't work properly, so I need to use LINUX for this job.
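An endsWith() along these lines would do the job; the real signature in buffer.c may differ:

```c
#include <string.h>

/* Return 1 if the NUL-terminated string s ends with suffix. */
int endsWith(const char *s, const char *suffix) {
    size_t ls = strlen(s), lf = strlen(suffix);
    if (lf > ls) return 0;
    return strcmp(s + ls - lf, suffix) == 0;
}
```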
Discovered many memory leaks, off-by-one errors and other more obvious bugs using valgrind.
Bogota, Colombia - raining
Added begin-blocks to compile.pss, asm.pp and compilable.c.pss. These work in a similar way to awk's begin {} rules.
Added negated text tests to compile.pss and compilable.c.pss, but not to asm.pp. So now we can do
pep -f compile.pss -i ' !""{a"not empty!!";}t;d;'
to check if the workspace is empty.
Made the script eg/exp.recogniser.pss work, and also eg/exp.tolist.pss.
Need to deal with a segmentation fault. I think it has to do with "scriptFile" not being closed properly, but am not sure. Also, when we do the "quit" command we should free the machine and input streams, no?
Changed the boolean enum because true and false were back to front.
Can compile test/test.natural.language.pss with the script compilable.c.pss (see the "pepcl" helper bash function) and it runs as a standalone.
Compiled the files in the object/ folder to a static library libmachine.a and then compiled the output of translate.c.pss successfully with "gcc -o test test.c -Lobject/ -lmachine". So, we can generate standalone executable parsing/transforming programs from a script with:
pep -f translate.c.pss script.pss > script.c
gcc -o scriptx script.c -Lobject/ -lmachine
Continued to separate the code in pep.c into separate 'object' files in the pars/object/ folder. Currently up to machine.c, now will do machine.interp.c. The code is compiling with the bash function ppco which is in the file pars/helpers.pars.sh. I am not using 'make' to compile, currently.
Reorganising the source code files. The main c file is now pars/object/pep.c and this includes the other 'object' files which are in this directory. Moved the old pep.c source code files to the folder Monolith.gh (because everything was in the one file).
Made the files in the pars/object folder the canonical source code for the machine. This means I need to make ppc etc compile with these files.
Discovered a bug in class tests: an empty workspace returns true for a range test.
Began eg/expression.pss to parse arithmetic expressions such as "(7 + 100) - 100". Need to arrange the grammar so that it has a "lookahead" of 1 token so that operator precedence can be handled. Also thought that "/" would be a better token delimiter. Need a command to set the token delimiter character on the machine. Also need a way to give statements to a script that are only executed once, when the script starts. Perhaps the (eof) section/test should work in the same way (be a script section, rather than a state-test).
Also, thought that the machine needs a "testhas" test, which would return true if the workspace currently contains the text in the current tapecell. This would allow parsing strings such as "eeee", "fffff". Also a "testtapeend" which returns true if the workspace currently ends with the text in the current tapecell.
Also, maybe need an “untiltape” command which reads until the workspace ends with the text in the current tape cell. This would allow parsing SED syntax “s#...##” or “s/...//” where the delimiter character can be anything which occurs after the “s” character.
trying to organise the bumble.sf.net/books/pars/pep.c source code into separate objects in the pars/object/ folder.
translate.c.pss: split the class* token into charclass*, range* and list* with corresponding negated classes.
Worked on translate.c.pss. It would be handy to have multiline quotes.... (implemented)
working on compile.ccode.pss (note: this was the ancestor of bumble.sf.net/books/pars/tr/translate.c.pss and the other translation scripts).
I think I finally tracked down the “until” bug, which was actually a bug in readc(). A character pointer lastc was assigned before a growBuffer() call (which calls realloc()). When realloc() assigned a new memory block the character pointer was no longer valid.
Still looking at the “until” bug. Basically the problem occurs when the text read with until is greater than about 950 bytes. Below that size realloc() effectively did nothing (the block was never moved), hence no problem!
A useful command for calculating jumps: “+int” which will add the given integer to all integers in the workspace. This command may be necessary when certain forward jumps are not known during compilation.
Maybe, it could be useful to have a very basic pattern matching syntax for tests, similar to a filename match: eg /word*?\*\*/ where ? matches any one character, * matches multiple, and \* is a literal asterisk. This could be useful in error handling blocks, so as not to have to write out every single combination of tokens. However, it would not be very readable.
bumble.sf.net/books/pars/compile.pss appears to be working. It is more readable and maintainable than bumble.sf.net/books/pars/asm.pp but in the case of quoteset* it compiles rather inefficient code (multiple jumps where asm.pp compiles only one). See the asm.pp file for a much better error handling idea.
compile.pss 664 lines
asm.pp 1485 lines
Had the idea for an “expand” command in which the machine will convert an abbreviated command into its full form in the workspace. Probably not.
Converting asm.pp into compile.pss which is much more compact and readable. Finished converting, but not debugged.
Creating notclass* syntax in asm.pp, eg ![a-z] { nop; }
Realised that I can just directly translate asm.pp into a compiling script. It will be convenient to have ![class] {} syntax. We can implement this in asm.pp quite easily, eg:
notclass* <- !*class*
command* <- notclass*{*commandset*}*
Started translating asm.pp into parse-script language. It seems quite straightforward. Also, we could write a script that compiles “recognisers”, just like the 2 bnf grammar rules above, eg:
notclass <- ! class
command <- notclass { commandset }
Continued converting execute() into functions in machine.methods.c. Realised that I have to modify how jumps and tests work when creating executable scripts. In fact it may be necessary to use the c “goto” instruction in order to implement “.reparse” and “.restart”.
File sizes: pep.c 187746 bytes, pep 99432 bytes, machine.methods.c 16761 bytes
created some machine methods in machine.methods.c by copying code from execute(). The process seems straightforward.
Added an “-i” switch to make it easier to provide input when running interactively. (We will be able to do “echo abcd | pep -f palindrome.pss” eventually.)
Looking again at the test.palindrome.pss script, which doesn't quite work because of “.restart” on eof.
Wrote a palindrome detector which seems very complicated for the simple task that it does, and also it does not actually work in all cases.
I implemented “quotesets” with a few nifty tricks. Quotesets allow multiple equals tests for a given block. The difficulty is that they are parsed before the braces are encountered in the stream, so it is not possible to resolve the forward jump. But there was a solution to this, best understood by looking at the source code in “asm.pp”. So multiple tests for one block are possible with “quotesets”, which are implemented in asm.pp and resolve into tests for blocks. They are very useful because they allow syntax like this:
“noun*verb*object*”, “article*verb*object*”, “verb*object” {
Discovered that the “until” instruction was not growing the workspace buffer properly, leading to bugs. The same bug will apply to “while”. See the bugs: section for more information. For some reason readc() is not growing the workspace properly at the right time. The bug became apparent when parsing test.commands.pss and trying to read past a large multi-line comment block, eg:
pep -If test.commands.pss input.txt
Worked on test.commands.pss which acts like a kind of syntax check and demonstration for all commands and structures implemented in asm.pp.
working on the asm.pp compiler. Wrote the .reparse keyword and the “parse>” parse label. Finished the ends- and begins-tests and blocks.
Implemented the “replace” machine instruction but not really debugged.
Added replace to the asm.pp compiler so that it can be used in scripts as well.
Writing the parameterFromText() function. This will allow parsing multiple parameters to an instruction. The tricky bit is that parameterFromText() has to return the last character scanned, so that the next call to it will start at the current scan position. Once I have multiple parameters, then I can write the “replace” command, eg: replace “one” “two”;
Realised that I need a replace command, and this requires the use of 2 parameters. Maybe a bit of infrastructure will have to be written. An example of the use of “replace” is converting c multi-line comments into bash style comments. It would be possible to parse line by line and achieve this without “replace” but it is a lot more work.
various bits of tidying up. Still can't accept input from standard-in for some reason (program hangs and waits for console input)
Implemented the swap instruction (x) to swap current tape cell and the workspace buffer.
Fixed a bug in the get command which did not allocate enough memory for the stack/workspace buffer.
pep -f script.pss input.txt
and the system compiles the script to assembler, loads it, and runs it against the input stream in input.txt. No doubt there are a number of bugs, but the general idea works.
Made progress with “asm.pp”. Class blocks seem to be working. Some nested blocks now work. Asm.pp is at a useful stage. It can now compile many scripts. Still need to work out how to implement the -f switch (segmentation fault at the moment). In theory the process is simple... load asm.pp, run it on the script file (-f), then load sav.pp (output of asm.pp) and run it on the input stream.
Bug! When the program grows during loading a segmentation fault occurs.
Created test.commands.pss which contains simple commands which can be parsed and compiled by the asm.pp script.
Also, realised that the compilation from assembler should stop with errors when an undefined instruction is found. Dealt with a great many warnings that arise when one uses “gcc -Wall”.
Implemented: command 'cc' adds the input stream character count to the workspace buffer. Also made an automatic newline counter, which is incremented every time a \n character is encountered. And the 'll' command, which appends the newline counter as a string onto the end of the workspace buffer.
Since the main function of this parse-machine is to compile “languages” from a text source, the commands above are very useful because they allow the compilation script to give error messages when the source document is not in the correct format (with line number and possibly character count).
Did some work on “asm.pp” which is the assembler file which compiles scripts. Sounds very circular but it works. Realised that after applying bnf rules, need to jump back to the “parse:” label in case other previous rules apply.
Discovered a bug when running bumble.sf.net/books/pars/asm.pp in UNIX filter mode: “Abort trap: 6”, which means writing to some memory location that I should not be. Strangely, when I run the same script interactively (with “rr”) it works and doesn't cause the abort.
Created a “write” command, on the machine, which writes the current workspace to a file called “sav.pp”. This has a parallel in sed (which also has a 'w' write command). This command should be useful when compiling scripts and then running them (since they are compiled to an intermediate “assembler” phase, and then loaded into the machine).
Made some progress to use the pattern-machine as a unix-style filter program. Added some command line options with getopt(). The parser should be usable (in the future) like sed: eg
cat somefile | pep -sf script.pp > result.txt
or
cat somefile | pep -sa script.ppa > result.txt
where script.ppa is an “assembler” listing which can be loaded into the machine.
Working on parsing with asm.pp. Seem to have basic commands parsing and compiling, eg: add “this”; pop; push; etc. Simple blocks are parsing and compiling. There are still some complications concerning the order of shift-reductions.
0: success no problems
1: end of stream reached
2: undefined instruction
3: quit/crash executed (exit script)
4: write command could not open file sav.pp for writing
More work. Some aesthetic fixes to make it easier to see what the machine is doing. Wrote showMachineTapeProgram() to give a nice view of pretty much everything that is going on in the machine at once. Working on how to collate “attributes” in the tape array register. Made an optional parameter to printSomeTape() that escapes \n \r etc in the tape cells which makes the output less messy.
A lot of progress. Starting to work on asm.pp again. Have basic shift-reduction of stack tokens working. Now to get the tape “compiling” attributes as well.
The bug seems to be that JUMP is not treated as a relative jump by execute() but is being compiled as a relative jump by instructionFromText(). So, either make JUMPs relative, or ...
Made the “labelTable” (or jumpTable) a property of the program. This is a good idea. Also made the command 'jj' print out the label table. Still using the “jumptable” phrase, but this is not a good name for it.
I should organise this file: first structure definitions, then prototype declarations, and then functions. I haven't done this because it was convenient to write the function immediately after the structure def (so I could look at the properties). But if I rearrange, then it will be easier to put everything in a header file, if that is a good idea.
Lots of minor modifications. Made searchHelp also search the help command name, for example. Added a compileTime (milliseconds) property to the Program structure, and a compileDate (time_t). 81 instructions (which is how many instructions asm.pp has at the moment) are taking 4 milliseconds to compile, which seems pretty slow really.
pep.c 138430 bytes
pep 80880 bytes (compiled c code)
Trying to eliminate warnings from the gcc compiler, which are actually very handy. Also seem to have uncovered a bug where the “getJump” function was actually defined after where it was used (and this pep.c does not use any header files, which is very primitive). So the label jumptable code should not have been working at all...
changing lots of %d to %ld for long integers. Also, on BSD unix
the ansi colour escape code for “grey” appears to be black.
Looking at this on an OSX macbook. The code compiles (with a number of warnings) and seems to run. The colours in this bash environment are different.
After stepping through the asm program I discovered that unconditional jump targets are not being correctly encoded. This probably explains why the script was not running properly. Also I may put labels into the disassembled listings so that the listings are more readable.
Revisiting. Need to create command line switches: eg -a <name> for loading an assembler script, and -f <name> to load a script file. Need to step through the asm.pp script and work out why stack reduction is not working... (see above for the answer). An infinite loop is occurring. Also, need to write the treemap app for iphone/android, not related to this. Also, need to write a script that converts this file and book files to an asciidoctor format for publishing in html and pdf. Then send all this to someone more knowledgeable.
pep.c 133423 bytes
pep 78448 bytes
Would be handy to have a “run until 10 more chars read” function. This would help to debug problematic scripts. (Note: these things have all been implemented with the “pep -I” switch, which means interactive, or really, debug mode.)
Segmentation fault probably caused by trying to “put” to a non-existent tape cell (past the end). Need to check the tape size before putting, and grow the tape if necessary.
Could try to make a palindrome parser. Getting a segmentation fault when running the asm.pp program right through. Wrote an INTERPRET mode for testing, where commands are executed on the machine but not compiled into the current program. Wrote runUntilWorkspaceIs() and added a testing command to invoke this. This should make it easier to test particular parts of a script. Found and fixed a problem with how labels are resolved; this was caused by buildJumpTable() not ignoring multiline comments.
Made multiline comments (#* ... #) work in assembler scripts. Made the machine.delimiter character visible and used by push and pop in execute(). There is no way to set the delimiter char or the escape char in scripts.
Added multiline comments to asm.pp (eg #* ... #) as well as single line comments with #.
Idea: make pep.c produce internal docs in asciidoctor format so we can publish to html5/docbook/pdf etc.
working on the asm.pp script. Made the “asm” command reset the machine and program and input stream. Added quoted text and comments to the asm.pp script parsing, but no stack parsing yet.
Need to add multiline comments to the loadAssembledProgram() function. while and whilenot cannot use a single char, eg: whilenot “\n” doesn't work, so write 'whilenot [\n]' instead. Also should write checkInstruction(), called by instructionFromText(), to make sure that the instruction has the correct parameter types. Eg: add should have a parameter of type text delimited by quotes, not a list [...] or a range [a-z].
If the jumptable is a global variable then we can test jump calculations interactively. Although it’s not really necessary. Would be good to time how long the machine takes to load assembler files, and also how long it takes to parse and transform files.
wrote getJump() and made instructionFromText() lookup the label jump table and calculate the relative jump. It appears to be working. Which removes perhaps the last obstacle to actually writing the script parser. Need to make program listings “page” so I can see long listings.
writing printJumpTable() and trying to progress. Looking at asciidoctor. Need to add a “struct label table[]” jumptable parameter to instructionFromText() and compile().
Continued to work on buildJumpTable. Will write printJumpTable.
Renamed the script assembler to “asm.pp”. Made a bash function to insert a timestamp. Created an “asm” command in the test loop to load the asm.pp file into the program. Started a buildTable function for a label jump table. These label offsets could be applied by the “compile” function.
“pep.c” source file is 117352 bytes. Compiled code is 72800 bytes. I could reduce this dramatically by separating the test loop from the machine code.
Revisiting this after taking a long detour via a forth bytecode machine which currently boots on x86 in real mode (see bumble.sourceforge.net/books/osdev/os.asm ) and then trying unsuccessfully to port it to the atmega328p architecture (ie arduino) at bumble.sf.net/books/arduino/os.avr.asm
The immediate task seems to be to write code to create a label table for assembly listings, and then use that code to replace labels with relative jump offsets. After that, we can start to write the actual code (in asm.pp) which will parse and compile scripts.
So the process is: the machine loads the script parser code (in “asm” format) from a text file. The machine uses that program to parse a given script and convert to text “asm” format. The machine then loads the new text asm script and uses it to parse and transform ("compile") an input text stream.
For some reason, the code was left in a non compilable state in 2016. I think the compile() and instructionFromText() functions could be rewritten but seem to be working at the moment.
The code is not compiling because the parameter to the “compile()” function is wrong. When we display instructions, it would be good to always indicate the data type of the parameter (eg: text, int, range etc). Modify “test” to use different parameter types, eg list, range, class.
used instructionFromText() within the compile() function and changed compile to accept raw instruction text (not command + arguments). Wrote scanParameter(), which is a useful little function to grab an argument up to a delimiter. It works out the delimiter by looking at the first char of the argument and unescapes all special chars. Now need to change loadAssembled to use compile().
Added a help-search / and a command help search //. Added escapeText() and escapeSpecial(), and printEscapedInstruction(). add writeInstruction() which escapes and writes an instruction to file. Added instructionFromText() and a test command which tests that function.
Worked on loadAssembledProgram() to properly load formats such as “while [a-z]” and “while [abc\] \\ \r \t]” etc. All this work is moving towards having the same parse routine loading assembled scripts from text files as well as interactively in the test loop.
Discovered that swap is not implemented.
Added loadlast, savelast, runzero etc, a few convenience functions in the interpreter. One hurdle: I need to be able to write testis “\n” etc, where \n indicates a newline, so that we can test for non-printing characters. So this needs to go into the machine as its ascii code. Also, when showing program listings, these special characters \n \t \r should be shown in a different colour to make it obvious that they are special chars... Also: loadprogram is broken at the moment.... need to deal with datatypes.
When in interpreter mode, reading the last character should not exit, it should return to the command prompt for testing purposes.
Wrote an “int read” function which reads one character from stdin and simplifies the code greatly. Still need to fix “escaping”. Need to make ss give better output, configurable. Escaping in the 'until' command seems to be working.
Added a couple more interpreter commands to allow the manipulation of the program and change the ip pointer. Now it is possible to jump the ip pointer to a particular instruction. Also, looked at the loadAssembledProgram and saveAssembledProgram functions to try to rewrite them correctly. The loadAssembledProgram needs to be completely cleaned up and the logic revised. My current idea is to write a program which transforms a pep script into a text assembly format, and then use 'loadAssembledProgram' to load that script into the machine. Wrote a 'runUntilTrue' function which executes program instructions until the machine flag is set to true (by one of the test instructions, such as testis, testbegins, testends...). This should be useful for debugging complex machine programs.
wrote a cursory freeMachine function with supporting functions
Tidying up the help system. Had the idea of a program browser, ie browse 'prog' subfolder and load selected program into the machine. Need to write the actual script compilation code.
Writing a compile function which compiles one instruction given command and args. Changed the cells array in Tape to dynamic. Since we can use array subscripts with pointers the code hardly changes. Added the testclass test. Made program.listing and tape.cells pointers with dynamic memory allocation.
Working on compiling function pointers for the character class tests with the while and testis instructions. Creating reflection arrays for class and testing.
Continued work. Trying to resolve all malloc and realloc problems. Using a program with instruction listing within the machine. Each command executed interactively gets added to this.
Saving and loading programs as assembler listings. validate program function. “until” & “pop” more or less working. “testends” working ...
Lots of small changes. The code has been polished up to an almost useable stage. The machine can be used interactively. Several instructions still need to be implemented. Push and pop need to be written properly. Need to realloc() strings when required. The info structure will include “number of parameter fields” so that the code knows how many parameters a given instruction needs. This is useful for error checking when compiling.
Revisiting this after a break. Got rid of function pointers, and individual instruction functions. Just have one executing function, “execute()”, with a big switch statement. Same with the test (interpreter) loop: a big switch statement to process user commands. Start with the 'read' command. Small test file. The disadvantage of not having individual instruction functions (eg void pop(struct Machine * mm), void push(struct Machine * mm) etc) is that we cannot implement the script compiler as a series of function calls. However the “read” instruction does have a dedicated function.
The development strategy has been to incrementally add small bits to the machine and concurrently add test commands to the interpreter.
Had the idea to create a separate test loop file (a command interpreter) with a separate help info array. Will create showTapeHtml to print the tape in html. These functions will allow the code to document itself, more or less.
Changes to make: The conditional jumps should be relative, not absolute. This will make it easier to hand write the compiler in “assembly language”. Line numbers are not necessary in the assembly listings. The unconditional jump, eg jump 0, can still be an absolute line number.
Put a field width in the help output. Change help output colours. Make “pep” help command “ls ”
make p.x -> px or just "."
Was working on a c version of this called “chomski”.
Attempted to write various versions of this machine, in java, PERL, c++ etc, but none was completed successfully; see bumble.sf.net/pp/cpp for an incomplete implementation in c++. But better to look at the current version, which is much much better.
I started to think about this parsing machine while living in Almetlla de Mar. My initial ideas were prompted by trying to write parsing scripts in sed, and then by reading snippets of Compilerbau by N. Wirth and thinking about compilers and grammars.