Parsing System (v0.7)

Library Source and StaticLang

Q: I've looked at the source for Token, Language, Lexer and Parser, and OMG, what the hell is that mess?! (Especially Token! Geez!)

A: Heh :) Put simply, it's a temporary kludge. The StaticLang tool generates static-style languages directly from Goldie's dynamic-style library source. That mess consists of the directives that tell StaticLang how to do the conversion.

To read the dynamic-style source, just ignore all the weird /+...+/ directives and anything inside version(Goldie_StaticStyle) (it will NOT be defined when you compile).

To read the static-style source, look at StaticLang's output and remember that version(Goldie_StaticStyle) WILL be defined when compiling.

To read the dynamic-style source, and see how StaticLang will transform it, see the explanation of StaticLang's preprocessing below.

I do intend to clean that all up.

Q: Goldie is already a language processing tool. Why don't you just use Goldie and a D grammar to do the conversion? Or use D's fantastic metaprogramming abilities?

A: I do intend to do something like that.

StaticLang's Preprocessing

Tag Structure

StaticLang's preprocessing system is based on some basic tags:

/+LETTER:TAG_NAME+/
/+LETTER:TAG_NAME+/DUMMY

Where:

  • LETTER is one of: P, S, or E
    • P tags stand for "Place" and are used by themselves. These tags are replaced by some other text.
    • S and E tags stand for "Start" and "End" and are used in pairs surrounding code. Both tags and all the code between them are replaced.

      Example:

      /+P:SOME_TAG+/
      /+P:SOME_TAG+/DUMMY
      /+S:SOME_TAG+/some D code here/+E:SOME_TAG+/
  • TAG_NAME is the name of the desired tag (see below).
  • The text DUMMY can optionally be appended whenever needed to make D's parser happy. This is usually only useful for P tags.

Note that, aside from the DUMMY text, the tags are seen as nested comments by D.
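
As a quick check of that claim, the following minimal sketch compiles as ordinary D with the tags left in place. The variable names, and the use of these particular tags at module scope, are illustrative assumptions and are not taken from Goldie's source.

```d
import std.stdio : writeln;

// D sees `int fooBar = 1;` here: the S and E tags are nested comments,
// and the code between them is ordinary D.
int /+S:LANG_IDENT+/fooBar/+E:LANG_IDENT+/ = 1;

// A lone P tag would leave `int = 2;`, which does not parse.
// With DUMMY appended, D sees the identifier: `int DUMMY = 2;`.
int /+P:SOME_TAG+/DUMMY = 2;

void main()
{
    writeln(fooBar + DUMMY); // prints 3
}
```

This is the dynamic-style view only. What StaticLang substitutes for each tag depends on the tag name, as described in the sections that follow.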

Defined Tags

You'll notice that some of the following are not yet documented. Full documentation of StaticLang's preprocessing is delayed because I may make use of D's metaprogramming to take over much of the work currently done by StaticLang.

There are three classes of tags: cvars (Constant Vars), svars (Store Vars), and mvars (Modify Vars).

cvars (Constant Vars):

These tags are replaced with constant predetermined text that is always the same for a given language. (But it may differ from one language to another.)

If S and E tags are used, all the code inside the start and end tags is replaced.
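
One rough way to picture cvar replacement is plain text substitution. The sketch below is not StaticLang's actual implementation: `replacePlace` and `replacePair` are hypothetical helpers, the sample source lines are invented, it assumes the optional DUMMY text is dropped along with its tag, and only the first S/E pair is handled.

```d
import std.array : replace;
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical helper: replace every P tag (with or without DUMMY) by `text`.
string replacePlace(string src, string name, string text)
{
    string tag = "/+P:" ~ name ~ "+/";
    // Handle the DUMMY form first so the DUMMY text is removed too (assumption).
    return src.replace(tag ~ "DUMMY", text).replace(tag, text);
}

// Hypothetical helper: replace the first S...E pair, tags included, by `text`.
string replacePair(string src, string name, string text)
{
    string startTag = "/+S:" ~ name ~ "+/";
    string endTag = "/+E:" ~ name ~ "+/";
    auto s = src.indexOf(startTag);
    if (s < 0)
        return src;
    auto e = src.indexOf(endTag, cast(size_t) s);
    if (e < 0)
        return src;
    return src[0 .. s] ~ text ~ src[e + endTag.length .. $];
}

void main()
{
    // Invented sample lines in dynamic-style form.
    string a = "/+P:STATIC+/ int count;";
    string b = "/+S:LEXER_INHERIT+/class Lexer/+E:LEXER_INHERIT+/ {}";

    writeln(replacePlace(a, "STATIC", "static"));
    // prints: static int count;
    writeln(replacePair(b, "LEXER_INHERIT", "public class Lexer_myLang : Lexer"));
    // prints: public class Lexer_myLang : Lexer {}
}
```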

REM

Comments. Short for "remark", just like in BASIC and BATCH. These are removed completely by StaticLang, and replaced with absolutely nothing.

Generally, it's preferred to use version(Goldie_DynamicStyle) instead of this.

STATIC
Replaced with the text static
OVERRIDE
Replaced with the text override
PACKAGE

Replaced with the package name of the language.

For instance, if StaticLang is run with the command line parameter -pack:myApp.langs.myLang, then this tag is replaced with the text myApp.langs.myLang.

SHORT_PACKAGE

Replaced with the name of the language, i.e., the last part of the package name.

For instance, if StaticLang is run with the command line parameter -pack:myApp.langs.myLang, then this tag is replaced with the text myLang.

INIT_STATIC_LANG
Inserts some basic initialization for each module:
version = Goldie_StaticStyle;
private enum _packageName = "/+P:PACKAGE+/";
private enum _shortPackageName = "/+P:SHORT_PACKAGE+/"
LANG_CLASSNAME
Replaced with Language_myLang (assuming the language name, i.e. SHORT_PACKAGE, is myLang)
LANG_INHERIT
Replaced with public class Language_myLang : Language
LANG_INSTNAME
Replaced with language_myLang
LANG_FILENAME
Inserts the name of the CGT file this static-style language is being created from, without the path. For instance, myLang.cgt
LEXER_CLASSNAME
Replaced with Lexer_myLang
LEXER_INHERIT
Replaced with public class Lexer_myLang : Lexer
PARSER_CLASSNAME
Replaced with Parser_myLang
PARSER_INHERIT
Replaced with public class Parser_myLang : Parser
TOKEN_CLASSNAME
Replaced with Token_myLang
TOKEN_INHERIT
Replaced with private class _Token_myLang(SymbolType staticSymbolType, string _staticName) : Base_Token_myLang
SUBTOKENTYPE_NAME
Replaced with SubTokenType_myLang
IS_CORRECT_TOKEN_FUNCNAME
Replaced with isCorrectToken_myLang
RULE_ID_OF_FUNCNAME
Replaced with ruleIdOf_myLang
TOKEN_CTOR_TERMFIRST
Replaced with static if(staticSymbolType != SymbolType.NonTerminal) {
TOKEN_CTOR_NONTERMFIRST
Replaced with static if(staticSymbolType == SymbolType.NonTerminal) {
BLOCK_ELSE
Replaced with } else {
TOKEN_CTOR_END
Replaced with }
TOKEN_CLASSNAME_TOPNODE
Inserts the name of the nonterminal Token_{languageName}!{symbol} that is the root, or "Start Symbol", of the parse tree. I.e., this is the type of Parser_{languageName}.parseTree.
TOKEN_TEMPLATE_1P
TOKEN_TEMPLATE_2P
TOKEN_TEMPLATE_RULE
Inserts the bodies of various helper templates for referring to the various static-style token types.
SUBTOKENTYPE_TEMPLATE
Inserts the implementation of the SubTokenType_myLang template used by Token_{languageName}!{rule}.sub(int index) to determine the correct static-style token type for each subtoken of a Token_{languageName}!{rule}.
SET_STATICID_TERM
To be documented...
SET_STATICID_NONTERM
To be documented...
RULE_ID_OF
To be documented...
LANG_STATICDATA
Used to insert all the data from the given CGT file into the static-style Language_{languageName}.
INIT_SYMBOL_LOOKUP
Used in the Language_{languageName} constructor to insert code that initializes the symbolLookup table.
LOOKUP_SYMBOL
Used to insert most of the body code for the Language_{languageName}.staticLookupSymbol() function.
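
Most of the naming cvars above follow directly from the -pack value. As a sketch of that naming convention only (the struct and function names here are hypothetical and not part of Goldie, and a non-empty package name is assumed), the replacement texts could be derived like this:

```d
import std.array : split;
import std.stdio : writeln;

// Hypothetical container for a few of the replacement texts described above.
struct CvarNames
{
    string packageName;     // PACKAGE
    string shortPackage;    // SHORT_PACKAGE
    string langClassName;   // LANG_CLASSNAME
    string langInherit;     // LANG_INHERIT
    string langInstName;    // LANG_INSTNAME
    string lexerClassName;  // LEXER_CLASSNAME
    string parserClassName; // PARSER_CLASSNAME
}

// Hypothetical helper: derive the names from the value given to -pack.
CvarNames namesFor(string pack)
{
    // The language name is the last dot-separated part of the package name.
    string lang = pack.split(".")[$ - 1];
    return CvarNames(
        pack,
        lang,
        "Language_" ~ lang,
        "public class Language_" ~ lang ~ " : Language",
        "language_" ~ lang,
        "Lexer_" ~ lang,
        "Parser_" ~ lang);
}

void main()
{
    auto names = namesFor("myApp.langs.myLang");
    writeln(names.shortPackage);  // prints: myLang
    writeln(names.langInherit);   // prints: public class Language_myLang : Language
    writeln(names.lexerClassName); // prints: Lexer_myLang
}
```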

svars (Store Vars):

These are fairly complex compared to the cvars and mvars. They involve the use of multiple distinct forms:

/+S:TAG_NAME:STORE+/ code here /+E:TAG_NAME:STORE+/
/+P:TAG_NAME+/
/+S:TAG_NAME:SOME_NAME+/ code here /+E:TAG_NAME:SOME_NAME+/
/+P:TAG_NAME:SOME_NAME+/

First, the form /+S:TAG_NAME:STORE+/ code here /+E:TAG_NAME:STORE+/ is used to tell StaticLang to "store" the code inside. The code between the tags (shown above as "code here") is stored, and both tags together with the enclosed code are removed and replaced with nothing.

Then, the form /+P:TAG_NAME+/ is used. The stored code is inserted into some internally-defined boilerplate code and repeated as many times as necessary (determined by the TAG_NAME and the language). Surrounding boilerplate is then added, and the final result is inserted in place of the tag.

Within the "stored" code, the final two forms can be used to insert certain data that's different for each repetition.

ACCEPT_TERM
ACCEPT_TERM:STORE

Inside the lexer, this stores/inserts the code to accept and create a new terminal token of the appropriate type.

This is repeated (along with boilerplate) for each possible terminal token type, such as Token_myLang!(SymbolType.Whitespace, "Whitespace") or Token_myLang!(SymbolType.Terminal, "Number").

ACCEPT_TERM:TOKEN_CLASSNAME
Inserts the class name for a particular type of terminal token, such as Token_myLang!(SymbolType.Whitespace, "Whitespace") or Token_myLang!(SymbolType.Terminal, "Number").
REDUCE
REDUCE:STORE

Inside the parser, this stores/inserts the code to reduce a group of subtokens into a new nonterminal token of the appropriate type.

This is repeated (along with boilerplate) for each possible Token_{languageName}!{rule} type, such as Token_myLang!("<Mult Exp>", 5) (where "5" is the ruleId).

REDUCE:TOKEN_CLASSNAME
Inserts the class name for a particular type of nonterminal rule token, such as Token_myLang!("<Mult Exp>", 5) (where "5" is the ruleId).
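
The store-and-place mechanism can be pictured as a template that is expanded once per token type. In the sketch below, the stored line of code and the helper `expandStored` are invented for illustration, and the internal boilerplate that StaticLang wraps around each repetition is left out because it is not documented here.

```d
import std.array : replace;
import std.stdio : write;

// Hypothetical helper: repeat `stored` once per value, filling in `placeTag`.
string expandStored(string stored, string placeTag, string[] values)
{
    string result;
    foreach (value; values)
        result ~= stored.replace(placeTag, value) ~ "\n";
    return result;
}

void main()
{
    // Invented example of code that might sit inside an ACCEPT_TERM:STORE pair.
    string stored = "tok = new /+P:ACCEPT_TERM:TOKEN_CLASSNAME+/(text);";

    // The two terminal token types used as examples in the text above.
    string[] terminalTypes = [
        `Token_myLang!(SymbolType.Whitespace, "Whitespace")`,
        `Token_myLang!(SymbolType.Terminal, "Number")`,
    ];

    // Emits one copy of the stored line per terminal token type.
    write(expandStored(stored, "/+P:ACCEPT_TERM:TOKEN_CLASSNAME+/", terminalTypes));
}
```

The same picture applies to REDUCE, with one repetition per rule token type instead of per terminal token type.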

mvars (Modify Vars):

These take the code between the S and E tags, and modify it in some predetermined way.

STATIC_IDENT
Converts an identifier such as fooBar to staticFooBar.
LANG_IDENT
Converts an identifier such as fooBar to fooBar_nameOfLang. For instance, if the language name (i.e., the SHORT_PACKAGE from above) is myLang, then /+S:LANG_IDENT+/fooBar/+E:LANG_IDENT+/ becomes fooBar_myLang.
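
The two conversions are simple string transformations. A minimal sketch, assuming ASCII identifiers (the function names `staticIdent` and `langIdent` are hypothetical and not part of StaticLang):

```d
import std.ascii : toUpper;
import std.stdio : writeln;

// Sketch of STATIC_IDENT: prefix "static" and capitalize the first letter.
string staticIdent(string ident)
{
    if (ident.length == 0)
        return ident;
    return "static" ~ cast(char) toUpper(ident[0]) ~ ident[1 .. $];
}

// Sketch of LANG_IDENT: append an underscore and the language name.
string langIdent(string ident, string langName)
{
    return ident ~ "_" ~ langName;
}

void main()
{
    writeln(staticIdent("fooBar"));         // prints: staticFooBar
    writeln(langIdent("fooBar", "myLang")); // prints: fooBar_myLang
}
```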