The basic idea behind Nom (and the Pep engine) is a synchronized stack and array: the stack holds the grammar parse tokens for a context-free language, and the array holds the corresponding “attribute” for each parse token. When a token is pushed onto the token stack, the array pointer (the array is what I call the “tape”) is incremented, and when a token is popped off the stack, the tape pointer is decremented.
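A minimal sketch of that synchronization in Python (this is purely illustrative; it is not the real Pep implementation, which is written in C, and the class and method names here are invented):

```python
# Illustrative sketch of the synchronized stack-and-tape idea.
# Not Pep's actual C implementation; names are made up.

class StackTape:
    def __init__(self):
        self.stack = []    # grammar parse tokens
        self.tape = [""]   # attribute cells; grows as needed
        self.ptr = 0       # tape pointer, kept in step with the stack

    def push(self, token, attribute):
        """Push a token; the tape pointer advances with it."""
        self.stack.append(token)
        self.ptr += 1
        if self.ptr >= len(self.tape):
            self.tape.append("")
        self.tape[self.ptr] = attribute

    def pop(self):
        """Pop a token; the tape pointer retreats with it."""
        token = self.stack.pop()
        attribute = self.tape[self.ptr]
        self.ptr -= 1
        return token, attribute

st = StackTape()
st.push("word", "hello")
st.push("space", " ")
print(st.pop())   # ('space', ' ')
print(st.pop())   # ('word', 'hello')
```

The point of the sketch is only that push and pop move the tape pointer in lock-step with the stack, so each parse token always has its attribute text sitting under the pointer.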
The idea for the Pep/Nom system occurred to me while I was walking in the Catalonian countryside. It was also sparked by realising that the Unix sed text stream editor contains a kind of virtual machine consisting of two text registers, the “pattern space” and the “hold space”, and a set of commands (“instructions”) to manipulate those registers. This was the first time I had realised that a text system could also be thought of as a virtual machine.
I had often wondered why the techniques of regular expression matching and substitution could not be applied to context-free languages. Regular expressions had become common in Unix, then ubiquitous on the web with the Perl language, and then spread to every modern computer language. But journeymen and journeywomen programmers (such as myself, briefly) would try to use them for parsing and transpiling tasks which were beyond their capabilities. The problem is a relatively straightforward theoretical one: regular expressions cannot match context-free language patterns, and the developers of Perl may have made the problem worse by adding “extensions” to their regular expression engine.
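A concrete instance of that limitation, sketched in Python (this example is mine, not from the Pep sources): a regular expression can match one fixed level of bracket nesting, but no single pattern can match arbitrarily deep nesting, whereas a trivial counter, which is the germ of a pushdown automaton, handles it easily.

```python
import re

# A regex can describe one fixed nesting level...
one_level = re.compile(r"\([^()]*\)")
print(bool(one_level.fullmatch("(abc)")))    # True
print(bool(one_level.fullmatch("((abc))")))  # False: can't see the inner pair

# ...but arbitrary nesting needs a counter (in effect a one-symbol stack),
# which is exactly what context-free parsing machinery provides.
def balanced(s):
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False      # closed a bracket that was never opened
    return depth == 0

print(balanced("((abc))"))   # True
print(balanced("((abc)"))    # False
```

(Perl's recursive-pattern extensions blur this boundary, which is part of why they can mislead programmers about what "regular" expressions can do.)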
Another prompt for the Pep/Nom idea was trying to read Niklaus Wirth’s book Compilerbau, which I had printed off from some web PDF. I didn’t get very far into the book before I saw that he was only interested in LL parsers, which (if my shaky understanding of formal language grammar theory is correct) mainly relate to recursive descent parsers and compilers. From a practical point of view, I can’t see anything wrong with recursive descent compilers, but personally I am not interested in them, because I am interested in language parsers where the grammar of the language can be seen clearly in the syntax of the parser.
But despite this, one paragraph in Compilerbau did catch my attention, where Wirth describes a parser reducing the grammar tokens on what looked like a stack.
The first time the idea for Pep/Nom occurred to me, I immediately saw that it could be implemented in itself: the Nom language is ideally suited to parsing and compiling the Nom language. I also thought the idea was so simple that it must surely have been implemented already by someone, but despite searching, I couldn’t find any system that was even similar.
Eventually I achieved a decent implementation in C, and then saw that it was quite simple to write Nom translation scripts that translate a Nom script into another language (for example Go, Java, JavaScript, Python, C (compiled to an executable, not interpreted), Tcl and Perl).
So these Nom scripts translate other Nom scripts into different computer languages, both compiled and interpreted. For example, when I compile the Go translation of a Nom script, I have a standalone executable which does the same thing as the original Nom script:
pep -f tr/translate.go.pss -i "r;t;t;t;d;" > test.go
go build test.go
echo "abcde" | ./test
# should print 'aaabbbcccdddeee'
But to return to the topic of this blog post: the idea for Pep/Nom seemed like a nugget of gold that I stumbled upon while walking, the sort of idea that needs to be looked at, analysed and admired without any commercial or other ulterior motive. And that is the purpose of this blog: to present this small, humble nugget of gold to the world.
mjb