About the character classes that can be used in NOM scripts.
The pep/nom system does not support regular expressions, which may seem a very odd “feature” considering that its main purpose in life is parsing and compiling context-free languages, which are a superset of regular languages (the type of patterns that regular expressions match). Now, not having regexes in pep/nom is, admittedly, at times quite trying, because one is forced to actually parse the input stream rather than just “matching and dispatching”.
But the lack of regular expressions has some big advantages. One is that you won’t be tempted to use them, or rather, you won’t be tempted to try to recognise context-free patterns using regular expressions (which is, almost by definition, impossible), a surprisingly common foible amongst us journeyman programmers. Moreover, since context-free languages are a superset of regular languages, you can definitely match and transform regular-expression patterns with nom, but it is more work.
In addition, not having regular expressions makes everything faster and simpler, and that is a good thing.
The closest things that nom has to regular expressions are character classes like these:
[:space:] [:alnum:] [:alpha:] [a-g] [5^&*(]
These may look very familiar, but they are not regex elements. For example, be careful of the following:
[^abc] # ^ doesn't have any special meaning in []
[xyza-z] # nope: can't combine a range and a list
In the pep interpreter these character classes are just ctype.h classes or lists of (byte) characters, and they know nothing about Unicode whatsoever. But when you translate a NOM script into a nice modern language like Go or Java, then suddenly, for free, you get all the wonderful (or not-so-wonderful) Unicode support that the language supplies. So [:alpha:] should recognise any alphabetic character anywhere in the Unicode character map.
A possible enhancement: allow user-defined character classes in nom scripts, since they would improve readability. The syntax might look like this:
begin {
  class "keywordchar" [abcxyz];
  # use logical-or concatenation (a comma) to create a set. This is
  # quite fancy and potentially difficult to implement in the
  # interpreter, but easier in the translation scripts.
  class "keywordchar" [:space:],[a-x];
}
read;
[:keywordchar:] {
  put; clear; add "Found keyword character ("; get; add ")\n";
  print; clear;
}
print; clear;