<quote>
This post is about a new parsing and compositing language that I
am in the process of developing. The language is called syntagma
because the name “grammatica” sounded too pompous. Syntagma seems
remarkably expressive and useful. It is built entirely upon the
scaffolding of pep and nom, using the same virtual machine and having
its implementation written in nom. The current implementation is
at /tr/syntagma.pss
Below is the copy language implemented in syntagma. The copy language is a formal context-sensitive language in which any word is followed by its “copy”, the same word again.
word:[:alpha:]+; del; L=word word /$1==$2/;
In the example above word:[:alpha:]+; is a lexing rule, which means that it reads and text-matches the input stream. In this case, it reads a sequence of alphabetic characters from the (text) input stream and assigns them to a parse token called “word”. The next syntagma statement is “del;” which essentially deletes or ignores all other types of characters in the input stream (white-space, punctuation, numbers and so on).
The next syntagma statement is “L=word word /$1==$2/;”. This is a parse rule which matches and reduces a sequence of parse tokens on the parse-stack. In this case, the parse rule also has a condition, which is the text between “/” and “/”. The function of a condition is to prevent the parse-rule from triggering or executing if the condition is not met. Within the condition, “$1” refers to the attribute value of the first token, which is whatever alphabetic text happened to be lexed for it by the lexing rule, and “$2” refers to the attribute value of the second token. If the input stream consists of “lithe slither oof woof” then the $1 value (on the first iteration) will be “lithe” and the $2 value will be “slither”.
So the condition will prevent the parse rule from being applied if “lithe” does not equal “slither”. These conditions are one mechanism to allow syntagma to recognise and compose context-sensitive languages (as opposed to only context-free languages). If the input stream is “og og” then $1==$2 and the parse rule “L = word word; ” will be applied which means that the input is recognised as the copy language (L).
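The lexing rule, the “del;” statement and the conditional parse rule can be imitated in ordinary Python to make the mechanics concrete. This is only a rough sketch and not how syntagma or the pep/nom machine actually works: the function name recognise_copy is made up for illustration, [:alpha:] is approximated by ASCII letters, and it is an assumption that the input counts as recognised when the stack ends up holding a single “L” token.

```python
import re

def recognise_copy(text):
    """Return True if the stack reduces to a single L token (assumed acceptance test)."""
    # Lexing: keep runs of letters as "word" tokens and drop everything else,
    # roughly what "word:[:alpha:]+; del;" describes (ASCII letters only here).
    words = re.findall(r"[A-Za-z]+", text)

    stack = []  # each entry is a (token name, attribute value) pair
    for w in words:
        stack.append(("word", w))
        # Parse rule with condition: L = word word /$1==$2/
        if (len(stack) >= 2
                and stack[-2][0] == "word"
                and stack[-1][0] == "word"
                and stack[-2][1] == stack[-1][1]):   # the condition $1==$2
            first, second = stack[-2], stack[-1]
            del stack[-2:]
            stack.append(("L", first[1] + " " + second[1]))

    return [name for name, _ in stack] == ["L"]

print(recognise_copy("og og"))                    # the rule fires
print(recognise_copy("lithe slither oof woof"))   # the condition blocks the rule
```

With “og og” the two attribute values are equal, so the two word tokens are reduced to L. With “lithe slither oof woof” no adjacent pair of words is equal, so the condition blocks every reduction and four word tokens are left on the stack.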
The example above uses only a small fraction of the abilities and features of the syntagma language. The script line above is a language, pattern, or format recogniser in the sense that it simply says “The pattern is, or is not, the copy language”. This is useful for validating input, but it is usually also important to translate the input into an output format. This process is referred to by many different words, depending on the context, such as: translating, transforming, composing, re-formatting, compiling and transpiling. If the output format is a binary format, or a low-level computer language such as an assembly language, then the process is often labelled “compiling”.
Actually all of these words have drawbacks in certain contexts. The word “translating” often evokes the idea of translating one human language to another, but most parsing systems don’t do this particularly well (syntagma included). “Composing” has musical connotations, “transforming” seems more geometrical than linguistic, “compiling” seems to imply that an executable computer format should be the output, and “transpiling” is a fairly uncommon, niche sort of word which may not be clear to many people. Another candidate word for this process is “compositing”, which has origins in the process of preparing text for publication in a newspaper. This word is appropriate in the context of syntagma because it is a system that takes pure text as input and produces pure text as output.
Syntagma is able to translate an input format into an output format using its rule-block and left-hand-side attributes. An example follows:
lit:[()+-]; e: [0-9]+; e:[a-z]; del;
e = e ("+"|"-") e { @1:="($2 $1 $3)"; }
eof { stack(e) { println "$1"; } }
The script above converts an additive symbolic arithmetic expression to a Lisp-like prefix notation, so “a+b+100” becomes “(+ (+ a b) 100)”. It uses the token attribute variables $1, $2 and $3 to rearrange the operator and its operands: $2 (the operator) is moved in front of $1 and $3 (the operands). It is concise and expressive. But syntagma contains many other capabilities, which I will go into later.
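To show how the reduction produces the nested result, here is the same idea written as a small shift-reduce loop in Python. It is a sketch under assumptions, not the syntagma implementation: the names TOKEN and to_prefix are invented for illustration, and it is assumed that “stack(e)” at end of input means the stack holds exactly one “e” token. As in the script, parentheses are lexed as literal tokens but no rule reduces them.

```python
import re

# Lexing rules: lit:[()+-];  e:[0-9]+;  e:[a-z];  and "del;" for the rest.
# finditer() skips characters that match neither group, which plays the role of del.
TOKEN = re.compile(r"(?P<lit>[()+-])|(?P<e>[0-9]+|[a-z])")

def to_prefix(text):
    """Return the prefix form, or None if the input does not reduce to one e."""
    stack = []  # each entry is a (token name, attribute value) pair
    for m in TOKEN.finditer(text):
        stack.append((m.lastgroup, m.group()))
        # Parse rule: e = e ("+"|"-") e { @1:="($2 $1 $3)"; }
        while (len(stack) >= 3
               and stack[-3][0] == "e"
               and stack[-2] in (("lit", "+"), ("lit", "-"))
               and stack[-1][0] == "e"):
            left, op, right = stack[-3:]
            del stack[-3:]
            # New attribute: operator first, then the two operands.
            stack.append(("e", f"({op[1]} {left[1]} {right[1]})"))
    # Assumed meaning of: eof { stack(e) { println "$1"; } }
    if len(stack) == 1 and stack[0][0] == "e":
        return stack[0][1]
    return None

print(to_prefix("a+b+100"))  # (+ (+ a b) 100)
```

Because the rule fires as soon as “e op e” is on top of the stack, “a+b” is reduced before the second “+” is read, which is why the output nests to the left.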