Parsing System (v0.9)

class Language

This is the main interface to GoldieLib.

module goldie.lang

This class corresponds to a loaded CGT file, i.e., a compiled grammar, and is the class used for dynamic-style.

this()

Creates a new blank Language.

This should only be used if you're going to manually create a Language, which is an advanced feature. Normally, you would create a Language via Language.load, StaticLang or possibly one of the Language.compileGrammar* functions.

static Language load(string cgtFilename)
DEPRECATED static Language loadCGT(string filename)

Loads a CGT compiled grammar file.

void save(string cgtFilename)

Saves this Language to a CGT file.

static bool compileGrammarGoldCompatibility

Used by the Language.compileGrammar* functions. This is equivalent to the -gold command-line switch in GRMC: Grammar Compiler. Default value is false.

static Language compileGrammar(string grammarDefinition, string filename="")

Compiles a grammar definition into a dynamic-style Language. This uses the exact same code as GRMC: Grammar Compiler.

The filename from which the grammar definition originated can be provided so any errors during compilation can report the grammar definition's filename.

If the grammar contains an error, no exception is thrown. Instead, an error message is written to stdout and null is returned. This will be fixed in a future version of Goldie.

Because grammar compiling uses static-style internally, the compileGrammar functions cannot be used on DMD 2.057 due to DMD Issue #7375.

static Language compileGrammarFile(string filename)

Just like compileGrammar, but loads the grammar definition from a .grm file.
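Because a grammar error currently yields a null return rather than an exception, callers have to check the result themselves. The following is a minimal sketch of one way to do that. It assumes GoldieLib is brought in with `import goldie.all;` (an assumption; this page only names module goldie.lang). The helper name compileOrThrow and the message text are hypothetical and not part of GoldieLib.

```d
import goldie.all; // assumption: adjust to however your project imports GoldieLib
import std.exception : enforce;

/// Hypothetical helper: converts the documented null return into an exception.
Language compileOrThrow(string grmFilename)
{
    auto lang = Language.compileGrammarFile(grmFilename);

    // On failure, the grammar error has already been printed to stdout.
    enforce(lang !is null, "Grammar failed to compile: " ~ grmFilename);
    return lang;
}
```

A caller could then write `auto lang = compileOrThrow("calc.grm");`, where calc.grm is a placeholder filename.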

static Language compileGrammarDebug(string grammarDefinition, string filename="", bool verbose=false)
static Language compileGrammarFileDebug(string filename, bool verbose=false)

Same as compileGrammar and compileGrammarFile, except it stores the lexer's NFA and DFA (in Graphviz DOT format) into Language.nfaDot and Language.dfaDot. Also, a verbose mode can optionally be enabled.

string filename

The path and name of the CGT file this Language was loaded from (if any).

string name
string ver
string author
string about
bool caseSensitive

Metadata about the language. For more information, see GOLD's documentation for the grammar definition language and CGT files. Note that all of these, including caseSensitive, are informational-only and do not actually affect GoldieLib's behavior.

Symbol[] symbolTable
CharSet[] charSetTable
Rule[] ruleTable
DFAState[] dfaTable
LALRState[] lalrTable

int startSymbolIndex
int initialDFAState
int initialLALRState

int eofSymbolIndex
int errorSymbolIndex

The actual language-defining information in the CGT file. See GOLD's CGT documentation for more information.

These are very low-level to the lexing/parsing process and most people will not need to access these directly. In particular, modifying any of these is an advanced feature that should only be done if you really know what you're doing.

@property Symbol eofSymbol

The Symbol for end-of-file.

@property Symbol errorSymbol

The Symbol for a lexing error.

string nfaDot
string dfaDot

The lexer's NFA and DFA in Graphviz DOT format.

These are always empty unless the Language was created via Language.compileGrammarDebug or Language.compileGrammarFileDebug. Languages loaded from a CGT file or via StaticLang will never have these filled in.

int nfaNumStates

The number of NFA states created when generating the lexer. The number of DFA and LALR states can always be found with dfaTable.length and lalrTable.length.

This is always 0 unless the Language was created via Language.compileGrammarDebug or Language.compileGrammarFileDebug. Languages loaded from a CGT file or via StaticLang will never have this filled in.
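As an illustration of the debug members above, the following sketch compiles a grammar with compileGrammarFileDebug, writes the two DOT graphs to disk, and prints the state counts. The helper name compileAndDumpDot and the output filenames are hypothetical. `import goldie.all;` is an assumption about how GoldieLib is imported. The null check assumes the debug variant reports grammar errors the same way compileGrammarFile does.

```d
import goldie.all; // assumption: adjust to however your project imports GoldieLib
import std.file : write;
import std.stdio : writeln;

/// Hypothetical helper: compiles a .grm file and dumps the lexer graphs.
Language compileAndDumpDot(string grmFilename, string outPrefix)
{
    auto lang = Language.compileGrammarFileDebug(grmFilename);
    if (lang is null)
        return null; // assumed: same null-on-error behavior as compileGrammarFile

    // nfaDot and dfaDot are only filled in by the *Debug functions.
    write(outPrefix ~ "-nfa.dot", lang.nfaDot);
    write(outPrefix ~ "-dfa.dot", lang.dfaDot);

    writeln("NFA states:  ", lang.nfaNumStates);
    writeln("DFA states:  ", lang.dfaTable.length);
    writeln("LALR states: ", lang.lalrTable.length);
    return lang;
}
```

The resulting files can be rendered with Graphviz, for example `dot -Tpng calc-dfa.dot -o calc-dfa.png`.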

Parser parseFileX(string filename)

Loads a file, lexes and parses it with a new Lexer and a new Parser, and returns the Parser which can then be used to obtain the parsing (and lexing) results or can be reused to parse something else.

Throws a ParseException if the source contains an error.

Parser parseCodeX(string source, string filename="")

Creates a new Parser and a new Lexer, uses them to lex and parse "source", and returns the Parser which can then be used to obtain the parsing (and lexing) results or can be reused to parse something else.

Throws a ParseException if the source contains an error.

The filename from which the source originated can be provided so the error messages upon any parsing or lexing errors can report the filename.
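A typical dynamic-style call catches ParseException around parseCodeX. The sketch below shows that pattern. The helper name parsesCleanly is hypothetical, and `import goldie.all;` is an assumption about how GoldieLib is imported. It relies only on ParseException being an ordinary D exception with a `msg` field.

```d
import goldie.all; // assumption: adjust to however your project imports GoldieLib
import std.stdio : writeln;

/// Hypothetical helper: returns true if `source` parses without error.
bool parsesCleanly(Language lang, string source, string filename = "")
{
    try
    {
        // The returned Parser is discarded here; keep it to inspect the results.
        lang.parseCodeX(source, filename);
        return true;
    }
    catch (ParseException e)
    {
        // Passing a filename lets the message identify the file.
        writeln(e.msg);
        return false;
    }
}
```

For example, after `auto lang = Language.load("calc.cgt");` (a placeholder filename), `parsesCleanly(lang, "1 + 2", "input.calc")` reports whether the text is valid in that language.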

Parser parseTokensX(Token[] tokens, string filename="", Lexer lexerUsed=null)

Usually just called by the other parse functions.

Creates a new Parser, uses it to parse an already-lexed array of Tokens, and returns the Parser which can then be used to obtain the parsing results or can be reused to parse something else.

Throws a ParseException if the source contains an error.

The filename from which the source originated can be provided so the error messages upon any parsing errors can report the filename.

The Lexer that was used can be provided so that the Parser returned can provide access to the lexing results.

Lexer lexFileX(string filename)

Usually just called by the parse functions.

Loads a file, lexes it with a new Lexer, and returns the Lexer which can then be used to obtain the lexing results or can be reused to lex something else.

Throws a LexException if the source contains an error.

Lexer lexCodeX(string source, string filename="")

Usually just called by the parse functions.

Creates a new Lexer, uses it to lex "source", and returns the Lexer which can then be used to obtain the lexing results or can be reused to lex something else.

Throws a LexException if the source contains an error.

The filename from which the source originated can be provided so the error messages upon any lexing errors can report the filename.

string[] uniqueSymbolNames()

Returns an array of all valid symbol names. If more than one Symbol exists with the same name, the name is only included in the array once.

Symbol[] symbolsByName(string name)

Returns an array of every Symbol with the name "name".

Note that GOLD, and therefore Goldie, allows multiple symbols with the same name as long as each symbol is of a different SymbolType.

SymbolType[] symbolTypesByName(string name)

Just like symbolsByName except this only returns the SymbolType of each symbol, rather than the Symbol itself.

string symbolTypesStrByName(string name)

Like symbolTypesByName, but returns a human-readable list in string form.

bool isSymbolNameValid(string name)

Returns true if at least one Symbol exists with the given name.

bool isSymbolNameAmbiguous(string name)

Returns true if more than one Symbol exists with the given name.

Note that GOLD allows multiple symbols with the same name as long as each symbol is of a different SymbolType.
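The relationship between these name-lookup functions can be modeled in a few lines. The sketch below is not GoldieLib's implementation. It uses stand-in Symbol and SymbolType types and free functions over a plain array, purely to show what "valid" and "ambiguous" mean for a symbol name. The table contents are made up, and the first-seen ordering of uniqueSymbolNames is an assumption of this model only.

```d
import std.algorithm : canFind, filter, map;
import std.array : array;

// Stand-in types for illustration only; GoldieLib's real Symbol and
// SymbolType are richer than this.
enum SymbolType { Terminal, NonTerminal, Whitespace }
struct Symbol { string name; SymbolType type; }

/// Every symbol in the table with the given name.
Symbol[] symbolsByName(Symbol[] table, string name)
{
    return table.filter!(s => s.name == name).array;
}

/// Only the types of the symbols with the given name.
SymbolType[] symbolTypesByName(Symbol[] table, string name)
{
    return symbolsByName(table, name).map!(s => s.type).array;
}

/// Each name once, even if several symbols share it.
string[] uniqueSymbolNames(Symbol[] table)
{
    string[] names;
    foreach (s; table)
        if (!names.canFind(s.name))
            names ~= s.name;
    return names;
}

/// Valid: at least one symbol has this name.
bool isSymbolNameValid(Symbol[] table, string name)
{
    return symbolsByName(table, name).length >= 1;
}

/// Ambiguous: more than one symbol has this name.
bool isSymbolNameAmbiguous(Symbol[] table, string name)
{
    return symbolsByName(table, name).length > 1;
}

unittest
{
    // Made-up table: two symbols share the name "Foo" but differ in type.
    auto table = [
        Symbol("Foo", SymbolType.Terminal),
        Symbol("Foo", SymbolType.Whitespace),
        Symbol("<Bar>", SymbolType.NonTerminal),
    ];
    assert(uniqueSymbolNames(table) == ["Foo", "<Bar>"]);
    assert(isSymbolNameValid(table, "Foo") && isSymbolNameAmbiguous(table, "Foo"));
    assert(isSymbolNameValid(table, "<Bar>") && !isSymbolNameAmbiguous(table, "<Bar>"));
    assert(!isSymbolNameValid(table, "Baz"));
}
```

In this model every ambiguous name is also valid, but not the reverse, which matches the "at least one" and "more than one" wording above.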

int ruleIdOf(string parentSymbol, string[] subSymbols...)

Returns an index into ruleTable given the name of the reduction symbol and the names of the symbols being reduced.

For example, if your grammar has a rule like this:

<Add Exp> ::= <Add Exp> '+' <Mult Exp>

Then you can retrieve the corresponding Rule like this:

myLang.ruleTable[ myLang.ruleIdOf("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>") ]

Throws if such a rule doesn't exist or if any of the given symbol names are ambiguous (i.e., if more than one symbol exists with the given name).

Note: This is just a quick-and-dirty implementation at the moment. It works, but it may be slow.

string ruleToString(int ruleId)

Returns the given rule as a human-readable string. Note that the result might not be valid code in the grammar definition language.

For example, if your grammar has a rule like this:

<Add Exp> ::= <Add Exp> '+' <Mult Exp>

Then:

auto ruleId = myLang.ruleIdOf("<Add Exp>", "<Add Exp>", "+", "<Mult Exp>");
assert(myLang.ruleToString(ruleId) == "<Add Exp> ::= <Add Exp> + <Mult Exp>");
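Since rule IDs are indexes into ruleTable, ruleToString can also be used to list every rule in a language, which is handy when checking what a grammar compiled to. A minimal sketch, in which the helper name dumpRules is hypothetical and `import goldie.all;` is an assumption about how GoldieLib is imported:

```d
import goldie.all; // assumption: adjust to however your project imports GoldieLib
import std.stdio : writefln;

/// Hypothetical helper: prints every rule alongside its index into ruleTable.
void dumpRules(Language lang)
{
    foreach (i; 0 .. lang.ruleTable.length)
    {
        // ruleToString takes an int, while array lengths are size_t.
        writefln("%s: %s", i, lang.ruleToString(cast(int) i));
    }
}
```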

module {user-specified package}.lang

class Language_{languageName}

{languageName} = Name of static-style language

This is the static-style counterpart to Language, generated by the StaticLang tool.

If the name of a language is foo (for example), then the name of this class will be Language_foo.

All of the Language members are available, and the following alternate versions are added.

static enum string staticName
static enum string staticVer
static enum string staticAuthor
static enum string staticAbout
static enum bool staticCaseSensitive

static immutable Symbol[] staticSymbolTable
static immutable CharSet[] staticCharSetTable
static immutable Rule[] staticRuleTable
static immutable DFAState[] staticDFATable
static immutable LALRState[] staticLALRTable

static enum int staticStartSymbolIndex
static enum int staticInitialDFAState
static enum int staticInitialLALRState
static enum int staticEofSymbolIndex
static enum int staticErrorSymbolIndex

static immutable string[] staticUniqueSymbolNameArray
static enum Symbol staticEofSymbol
static enum Symbol staticErrorSymbol

static bool staticIsSymbolNameValid(string name)
static bool staticIsSymbolNameAmbiguous(string name)

Compile-time counterparts to the corresponding Language members.

static enum string packageName
static enum string shortPackageName

For example, if StaticLang was passed --pack=myApp.langs.myLang, then packageName is "myApp.langs.myLang" and shortPackageName is the name of the language: "myLang".

static enum string langInstanceName
static enum string langClassName
static enum string lexerClassName
static enum string parserClassName
static enum string tokenClassName

The names of the various classes StaticLang created for this language. For example, "language_myLang", "Language_myLang", "Lexer_myLang", "Parser_myLang", and "Token_myLang".

Parser_{languageName} parseFile(string filename)
Parser_{languageName} parseCode(string source, string filename="")
Parser_{languageName} parseTokens(Token[] tokens, string filename="", Lexer lexerUsed=null)
Lexer_{languageName} lexFile(string filename)
Lexer_{languageName} lexCode(string source, string filename="")

Type-safe static-style counterparts to the respective "X"-suffixed lex and parse functions in Language.

module {user-specified package}.lang

Language_{languageName} language_{languageName}

A pre-instantiated instance of a Language_{languageName}, generated by StaticLang and only created for static-style languages.

For example, if the name of a language is foo, then the declaration of this will be:

Language_foo language_foo;
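Putting the static-style pieces together, the sketch below assumes StaticLang was run with --pack=myApp.langs.myLang, so that the generated module is myApp.langs.myLang.lang. That package name follows the example used earlier on this page, and the helper name parseMyLang is hypothetical. The code only compiles alongside the StaticLang-generated sources.

```d
// Hypothetical generated module: produced by StaticLang with --pack=myApp.langs.myLang
import myApp.langs.myLang.lang;

// The static* and *Name members are usable at compile time.
static assert(Language_myLang.packageName == "myApp.langs.myLang");
static assert(Language_myLang.shortPackageName == "myLang");
static assert(Language_myLang.langClassName == "Language_myLang");

/// Hypothetical helper: parses with the pre-instantiated language instance.
/// The result is the language-specific parser type documented for parseCode
/// (Parser_myLang), rather than the Parser returned by parseCodeX.
auto parseMyLang(string source, string filename = "")
{
    return language_myLang.parseCode(source, filename);
}
```

The return type is left as `auto` so the sketch does not need to know which generated module declares Parser_myLang.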