The idea of this blog about the Nom language and the Pep engine is or was to discuss parsing, compiling and patterns in general. But there is always a temptation to start writing about technical aspects of Nom. So here, I am going to write about patterns which is general and maybe about type-checking which is specific.
A pattern is any repeating entity in which you can see some kind of order. I think it is one of the fundamental components of human intelligence. But the patterns Nom deals with are quite a specific subset. In mathematics, to recap, a formal language is just a set of words formed from an alphabet. So:
{ab, ac, ad}
is, according to mathematicians, a language (and the alphabet is the set {a,b,c,d}. This is all so simple but the interesting part comes when trying to analyse the structure of a language. Nom deals only with the subset of (formal mathematical) languages which are called “context-free” . Either the linguist Noam Chomsky or else some mathematicians called Backus and Naur were the first to analyse languages this way, surprisingly recently (or maybe it was somebody else). The name “Nom” may be a vague reference to Noam,or an indo-european root for “name ” (as you wish).
I have no proof that Nom can parse all context free languages, but I am pretty confident about it.
All modern computer languages do some kind of type-checking*. Even dynamically typed scripting languages will do it so that they can automatically cast types. Although my initial aim with Pep/Nom was just to parse and translate context-free languages, it would be nice if it could also have a reasonable go at compiling a modern computer language. The problem is that type-checking lies outside of the scope of context-free languages. I am not a mathematical language theorist. I can’t prove the “pumping lemma” or even know why I need to know it. But type-checking involves “remembering” the type of a variable that may be declared anywhere in a program. If you a remembering something, that is “context", context-full not context-free.
”
But despite this Nom maybe able to type-check. Here is a quick sketch of how it maybe done: The Nom language contains the commands “mark” and “go".” This is a way of marking a particular tape-cell with a name and then going (jumping) to it, using the previously assigned name.
mark "here"; # mark the current tape cell with the name 'here'
++; ++; ; # increment the tape pointer by 2
go "here" ; # jump back 2 tape cells to the marked cell
This means that the tape (which I have always claimed is a simple array with a pointer) is actually a kind of associative array, or a hash or a dictionary (Why does each new language decide to use different terminology for basic data structures?).
To type-check, as we encounter variable declarations, we could assign a tape cell to hold a textual description of the type and “mark” the tape cell with the name of the variable. The scope of the variable declaration might be handled by adding a dotted text prefix with the name of the scope in which the variable is declared.
Then, when are parsing statements, and encounter the variable name again, we go (jump) back to the marked tape cell for that variable and extract the type. This may just work, but there are a couple of issues. Firstly, it means that when Nom is type-checking, it can’t do anything else. That is, it can’t compile at the same time. This means that type-checking would be the first “pass” of the compiler, and then the actual compiler would be a separate Nom script which doesn’t type check.
Secondly, a bigger problem is how to know how many tape cells to allocate for the variable declarations, or how to handle declarations of local variables interspersed with the parsing stack. As I said initially, this is only a “sketch” of how a Nom type-checker would work.