Documentation (v0.3)
API Overview

Import

Importing is simple:

import goldie.all;

Conventions Used By Goldie

Line and Column Numbers

All line numbers and column numbers are internally stored and treated by the API as zero-indexed and displayed to the user as one-indexed.
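A minimal sketch of this convention, not taken from Goldie itself: the Position struct and positionOf function below are hypothetical helpers written for illustration. The position is stored zero-indexed, the column is counted in code points rather than bytes, and 1 is added only when the position is formatted for display.

```d
import std.format : format;
import std.range : walkLength;
import std.stdio : writeln;

/// Hypothetical helper (not part of Goldie): a zero-indexed source position.
struct Position
{
    size_t line;   // zero-indexed line number
    size_t column; // zero-indexed count of code points from the start of the line

    /// Converts to the one-indexed form shown to users.
    string toDisplay() const
    {
        return format("line %s, column %s", line + 1, column + 1);
    }
}

/// Finds the position of the UTF-8 code unit at `offset` in `source`.
/// Assumes "\n" line endings and that `offset` falls on a code point boundary.
Position positionOf(string source, size_t offset)
{
    size_t line = 0;
    size_t lineStart = 0;
    foreach (i; 0 .. offset)
    {
        if (source[i] == '\n')
        {
            line++;
            lineStart = i + 1;
        }
    }
    // walkLength decodes the slice, so it counts code points, not bytes.
    return Position(line, source[lineStart .. offset].walkLength);
}

void main()
{
    string source = "int a;\nüber x = 1;";
    auto pos = positionOf(source, 13); // byte offset of the 'x' on the second line
    writeln(pos.toDisplay());          // prints "line 2, column 6"
}
```

The 'ü' occupies two bytes in UTF-8, so a byte-based count would report column 7 for the same character.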

When Goldie refers to a "column number", it really means "the number of characters (i.e., Unicode code points) from the start of the line". This behavior is more reliable and more useful than a true "column number".

Lexing, Parsing and Semantic Analysis

What many people refer to as "parsing" is really three separate steps: Lexing (or "Lexical Analysis"), Parsing (or "Grammatical/Syntactical Analysis") and Semantic Analysis.

  1. Lexing: This separates the source into a series of tokens. For instance, int numApples = 10 gets converted into "Keyword 'int', Identifier 'numApples', Equals sign, Number 10". Goldie does this in the Lexer class by using a DFA. Lexers are also sometimes called tokenizers and scanners. You can view the result of this step using Parse and JsonViewer.

  2. Parsing: This arranges the lexed tokens into a tree. The structure of the tree is based on the rules in the language's grammar. Goldie does this in the Parser class by using an LALR algorithm. You can view the result of this step using Parse and JsonViewer.

  3. Semantic Analysis: This step is generally NOT performed by automatic parsers (such as Goldie, YACC, Bison, or ANTLR). Users of such tools have to perform this step themselves because it's not as easily formalized as lexing or parsing.

    In this step, the parse tree generated from the parsing step is analyzed and actual meaning is interpreted. This often involves extra error checking. For instance, in statically-typed languages, the type system exists in the semantic analysis phase. This step is also where type-mismatch errors and "undefined function/variable" errors are generated.

    This step can, but doesn't have to, involve constructing an AST (Abstract Syntax Tree). An HTML or XML DOM is an example of an AST. For another example, see the output of GenDocs's /ast flag. See the GenDocs source for an example of lexing and parsing with Goldie, then constructing an AST and performing semantic analysis.
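The three steps above can be sketched in miniature. The code below does not use Goldie's API: a hand-written scanner and a single hard-coded rule stand in for the DFA-based Lexer and the LALR Parser, and all names (Kind, Tok, Decl, lex, parse, analyze) are hypothetical. It only illustrates where the boundaries between the steps fall for the source int numApples = 10.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical token kinds for this sketch; these are not Goldie's types.
enum Kind { keyword, identifier, equals, number }

struct Tok
{
    Kind kind;
    string text;
}

/// Step 1, lexing: split the source into a flat series of tokens.
Tok[] lex(string src)
{
    Tok[] toks;
    size_t i = 0;
    while (i < src.length)
    {
        if (isWhite(src[i])) { i++; continue; }
        auto start = i;
        if (src[i] == '=')
        {
            i++;
            toks ~= Tok(Kind.equals, src[start .. i]);
        }
        else if (isDigit(src[i]))
        {
            while (i < src.length && isDigit(src[i])) i++;
            toks ~= Tok(Kind.number, src[start .. i]);
        }
        else if (isAlpha(src[i]))
        {
            while (i < src.length && isAlphaNum(src[i])) i++;
            string word = src[start .. i];
            toks ~= Tok(word == "int" ? Kind.keyword : Kind.identifier, word);
        }
        else
            throw new Exception("Unexpected character: " ~ src[i]);
    }
    return toks;
}

/// Step 2, parsing: arrange the tokens into a tree. This sketch knows only
/// one rule:  <Decl> ::= Keyword Identifier '=' Number
struct Decl
{
    Tok type, name, value;
}

Decl parse(Tok[] toks)
{
    immutable Kind[] expected =
        [Kind.keyword, Kind.identifier, Kind.equals, Kind.number];
    if (toks.length != expected.length)
        throw new Exception("Syntax error");
    foreach (i, kind; expected)
        if (toks[i].kind != kind)
            throw new Exception("Syntax error at '" ~ toks[i].text ~ "'");
    return Decl(toks[0], toks[1], toks[3]);
}

/// Step 3, semantic analysis: interpret the tree and do the extra checking
/// that the grammar alone cannot express (here: duplicate definitions).
void analyze(Decl decl, ref int[string] variables)
{
    if (decl.name.text in variables)
        throw new Exception("Variable already defined: " ~ decl.name.text);
    variables[decl.name.text] = decl.value.text.to!int;
}

void main()
{
    int[string] variables;
    auto toks = lex("int numApples = 10");
    writeln(toks);                   // result of step 1
    analyze(parse(toks), variables); // steps 2 and 3
    writeln(variables);
}
```

Declaring numApples a second time passes lexing and parsing unchanged and fails only in analyze, which is why that kind of error belongs to the semantic analysis step.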

Tokens, Symbols, and Symbol Types

A Token can be thought of as an instance of a Symbol. A Token is part of the parsed source, and a Symbol is part of the grammar.

For example, consider this grammar:

Word = {Letter}+
<Sentence> ::= <Sentence> Word

And this source:

Hello world

These are the Tokens and Symbols:

Word           Symbol   This Symbol's type is SymbolType.Terminal.
<Sentence>     Symbol   This Symbol's type is SymbolType.NonTerminal.
Hello          Token    This Token's Symbol is Word.
world          Token    This Token's Symbol is Word.
Hello world    Token    This Token's Symbol is <Sentence>.

Note that there are more symbol types than just Terminal and NonTerminal (the SymbolType documentation explains this). So do NOT check if a Symbol or Token is a SymbolType.NonTerminal by comparing the type with SymbolType.Terminal. Just because something isn't a SymbolType.Terminal does NOT imply that it's a SymbolType.NonTerminal.
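The pitfall can be illustrated with stand-in types. The SymbolType, Symbol and Token definitions below are simplified assumptions made for this sketch, not Goldie's real declarations. In particular, the enum members other than Terminal and NonTerminal are guesses, so consult the SymbolType documentation for the actual list.

```d
import std.stdio : writeln;

// Stand-in types for illustration only. Members beyond Terminal and
// NonTerminal are assumptions; see the SymbolType documentation.
enum SymbolType { Terminal, NonTerminal, Whitespace, EOF, Error }

struct Symbol
{
    string name;
    SymbolType type;
}

struct Token
{
    string content; // the matched part of the source
    Symbol symbol;  // the grammar Symbol this Token is an instance of
}

/// Wrong: treats everything that isn't a Terminal as a NonTerminal.
bool isNonTerminalWrong(Token tok)
{
    return tok.symbol.type != SymbolType.Terminal;
}

/// Right: compares against NonTerminal directly.
bool isNonTerminal(Token tok)
{
    return tok.symbol.type == SymbolType.NonTerminal;
}

void main()
{
    auto word = Symbol("Word", SymbolType.Terminal);
    auto sentence = Symbol("<Sentence>", SymbolType.NonTerminal);
    auto space = Symbol("Whitespace", SymbolType.Whitespace);

    auto tokens = [
        Token("Hello", word),
        Token("world", word),
        Token("Hello world", sentence),
        Token(" ", space),
    ];
    foreach (tok; tokens)
        writeln("'", tok.content, "': wrong check = ", isNonTerminalWrong(tok),
                ", right check = ", isNonTerminal(tok));
}
```

The two checks agree for the Word and <Sentence> tokens and disagree only for the whitespace token, which the negated comparison misreports as a NonTerminal.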

Simple Examples

For simple examples of how to use Goldie, see the Included Sample Apps.