## vim: nowrap tabstop=2 shiftwidth=2 expandtab textwidth=10000 #acl All:read <> {{{#!wiki comment We are a week slow this semester, and I don't think it is right for me to toss this ZOBOS project on your shoulders. It will be a heavy lift like WRECK, and I feel it is too late in the semester for that. (The project CZAR will be a heavy lift too, but it is worth a good portion of your grade, and we're closer to on schedule with CZAR.) On the other hand, I have the impression there are students that like the challenge of these projects and genuinely like the idea of building a compiler from the ground up. So I feel obligated to make ZOBOS available, and indeed grade it. So: ZOBOS will go in my grades database as a '''beneficial''' grade. It can only help your overall course grade, if it doesn't help your grade I just ignore it. }}} {{{#!wiki important /* 1. The [[#compgrading|grader tarball]] is finally available.*/ 1. A video walk through [[https://youtu.be/Utym7L_VbuY|is here]]. 1. [[Assignments/ZOBOS/zlang|The zlang page]] has language details, nuances, and caveats you might be interested in. 1. <>You can download all the resources used in this writeup: <> or <>.<
>The tarball contains source files (`.src`, `.syn`), token streams (`.tok`), AST PDFs, and verbose and `OUTPUT` filtered `stdout` listings (`.vrb` and `.out`) respectively. '''Read the `README.txt`!''' 1. Here is [[#|the change log|&action=info]] for this assignment write-up. I will try to be descriptive in my log messages. 1. You can also [[#|subscribe|&action=subscribe]] to this page and receive Emails when changes are made. }}} In this assignment will you will implement a syntax checking SLR(1) parser, a symbol table(s), and perform semantic analysis on a simple programming language grammar. The name of your application must be `ZOBOS` (case sensitive). = Input = `ZOBOS` will accept three command line parameters: || Argument || Value || Description || || 1 || `program.tok` || The filename containing a '''token stream''' to be parsed, the format is identical to that produced by [[../LUTHOR|LUTHOR]] || || 2 || `ast.dat` || The filename for storing your [[#aststep|AST]] || || 3 || `symtable.dat` || The filename for storing symbol tables [[#symtablestep|when instructed]]. || = Coding = `ZOBOS` will do do the following: 1. Your compiler will parse a [[Assignments/LUTHOR#tokenstream|token stream]] (the output of [[Assignments/LUTHOR|LUTHOR]]) with the LR <> algorithm (<>), using <> (the format is described in <>) and the <> grammar. Syntax error messages will be produced in a [[#errorfmt|consistent manner]] to facilitate grading. {{{#!wiki important * If program source has a syntax error, the remaining steps for `ZOBOS` are not to be performed and `ZOBOS` should `exit()` with a non-zero exit status. }}} 1. Either during the parse (using '''syntax directed translation''' procedures during production rule reduction, <>, <>), after the parse using '''visitor patterns''', or a combination of these methods; `ZOBOS` will construct an ''abstract syntax tree'' with the following properties a. all `EXPR` (sub)trees will be simplified so that i. leaves are either literals (`intval`s, `floatval`s) or variables (identifiers holding a value, not function names) i. the root and internal nodes are either one of the operations associated with the non-terminals `BOOLS`, `PLUS`, `MULT`, `UNARY`, or `CAST`, a `FUNCALL` node or a [[#domain|node]]. a. <> control structures `IF`, `IFELSE`, `WHILE`, `DOWHILE`, `STEPS`, `SOLOSTMT`, and `BRACESTMTS` will be simplified '''in the spirit''' of textbook figure 7.15 (<>) --- essentially lose the terminals, keep the functional children (blocks of statements, predicate expressions). {{{#!wiki important Simplification of all other grammar constructs are at the student's discretion and inclination. They will not be graded.<
> There are example `.src` and AST PDFs in the `zobos-resources` [[#zobosresources|archives for download]]. Outside of the rules listed above, '''you are not required''' to duplicate the ASTs provided, they serve as examples! }}} 1. <> `ZOBOS` will write to disk a [[#treeformat|simple tree representation]] of its AST. 1. `ZOBOS` will then use a scope respecting symbol table and visitor pattern(s) to perform several [[#semanticchecks|semantic checks]] on the variable, expression and function use within the parsed program source. a. Warnings or error messages will be emitted in a [[#errorfmt|consistent and simple manner]] to facilitate grading. a. <> When the special `EMIT` form of `symtable` is encountered during this process, your `ZOBOS` will write the symbol tables for all scopes to disk in a [[#symtableformat|simple format]] to facilitate grading. a. When a special `domain` node is encountered, the domain of its expression tree child should be emitted in a [[#domain|in a format similar to ERRORs]] ''to facilitate grading''. The possible domain values are `int`, `float`, `string` or `bool`. 1. Lastly, `ZOBOS` will `exit(0)` if there were '''no errors''' detected in the program, and `exit(1)` otherwise. {{{#!wiki tip `WARNING` and `DOMAIN` messages are not `ERROR` messages }}} == The ZOBOS language == The language compiled for this project is "C-like", but with some caveats and nuances. [[Assignments/ZOBOS/zlang|See this page]] for details. {{{#!wiki important You may either hard code the LR parsing table and grammar rules into your `ZOBOS` application or read `zlang.lr` and the grammar rules from disk on each invocation. '''If the latter''', include `zlang.lr` and (or) `zlang-pure.cfg` in your submitted archive along side `Build.sh` or the `Makefile`; which will probably be where ever your `ZOBOS` is built. See [[Assignments/Requirements|this page]] to recall submission details. }}} == Syntax and Semantic Checks == <> Your `ZOBOS` compiler will inspect the `program.tok` for these types of errors and inconsistencies. {{{#!wiki tip Keep in mind that we often clump all errors into the ''syntax error'' pigeonhole. Technically, '''syntax errors''' occur only during the parse when an invalid pattern of terminals is detected. The other errors that compilers find for us, and ''most of them'' in this project, are '''semantic''' errors and inconsistencies. }}} {{{#!wiki important Our language does not have "first class" functions, and functions won't be treated as identifiers holding a "value". The language has data types `bool`, `int`, `float` and `string`; and then it has functions. Think C/C++, not Python, !JavaScript, Scheme or Haskell. }}} {{{#!wiki comment We output `domain` information in the same style as syntax and semantic errors simply to facilitate grading. `domain` operators are not syntax or semantic errors. }}} || Type || <>ID || Description || Terminal Location to report (line and column) || || SYNTAX|| SYNTAX || Any syntax error during the parse || Token causing error, or last token successfully processed || || ERROR || EXPR || Expression tree operand error (see the [[#vtypeerror|expression errors table]]) || Location of the `PLUS`, `TIMES`, `mod` or `UNARY` terminal operating on the erroneous value or variable || || WARN || REIDENT || <>Attempting to re-declare an identifier || '''Each''' occurrence of identifier in re-declaration || {{{#!wiki comment SEMESTER_FILTER || ERROR || CALL || Invoking a variable identifier as a function, using the wrong number of arguments in a function call || Location of the function identifier being invoked || || WARN || UNUSED || <>The identifier is not used ('''correctly or incorrectly''') in an expression or assignment '''anywhere''' within its scopes. The `returns` [[#returnsvar|variable of a function]] is '''implicitly''' used (since the function will return the value it stores). In other words, the identifier is not found ''to the right'' of `=`, or in an `emit` or `rand` statement (see [[Assignments/ZOBOS/zlang#emit|emit]] and [[Assignments/ZOBOS/zlang#rand|rand]] details). || Location of identifier in its declaration || || WARN || CONST || Attempting to store a value in a variable with the `const` attribute, or re-defining a function || Location of identifier name on LHS of `assign` terminal '''or''' location of identifier name as an argument of a `rand` statement (see [[Assignments/ZOBOS/zlang#rand|rand]] for details). || || WARN || UNINIT || <>Using a '''declared''' identifier in an expression before it has been initialized with a value. In other words the variable is not found ''to the left'' of `=` '''or''' to the right of a `rand` statement before it is used (see [[Assignments/ZOBOS/zlang#rand|rand]] for details). '''Undeclared identifiers don't have UNINIT warnings.''' || Location of identifier in expression || || ERROR || NOIDENT || Using an undeclared identifier || Location of identifier name, '''each''' occurrence on a separate `OUTPUT` line || || ERROR || CONV || Value conversion error (see the [[#vtypeerror|conversion errors table]] and [[#emitrand|EMIT and RAND details]]) || Location of the `assign` terminal, '''or''' the function identifier location in a function call with an erroneous parameter conversion. || }}} {{{#!wiki tip Many types of semantic checks could be performed, so many in fact that only a few are used for each ZOBOS project per semester. You might read elsewhere in the write-up conditional statements such as ''"If required by the semester write-up, such-and-such would be a FOOBAR warning."'' Depending on the semester these statements are either important or can be safely ignored. The table above is the '''authoritative semester list of warnings and errors your ZOBOS should report'''; use it to guide your implementation. }}} == Symbol Table Contents and Functions as (sort of a) Type == While functions are not first class objects in our language, we will adopt semantics around function prototypes and definitions so that `ERROR` and `WARN` messages, as well as the symbol table contents don't have to be specialized for function types. A symbol table should have the following (minimum) attributes per entry for `ZOBOS` success: * location (line and column reported for the identifier token, see [[Assignments/ZOBOS/zlang#funcloc|here for function definitions and prototypes]]), * identifier (name), or a reference into a ''namespace'' if you've chosen that approach * type (see [[Assignments/ZOBOS/zlang#funcsig|here for function "types"]]), * a `const` flag, * a "used" or "unused" flag, * an "initialized" or "uninitialized" flag. {{{#!wiki tip The last three symbol attributes, `const`-ness, used vs unused, initialized vs uninitialized, may not be needed in your implementation. It depends on the [[#sewtable|types of semantic checks]] required for this semester's ZOBOS project. }}} When an identifier for a function is encountered, its value in the symbol table depends on the source context: 1. When a function prototype is encountered, the symbol is considered '''not `const`''', but '''already initialized'''. 1. When a function definition is encountered the symbol has its `const` flag set ('''or''' is created with its `const` and `initialized` flag true). These semantics mean your `ZOBOS` does not have to have different logic for processing variables versus functions (aside from the initial management of function symbols just described). The lion's share of function specific code producing output is for `CALL` error messages. {{{#!wiki comment SEMESTER_FILTER === Conversion Errors (CONV) === <>Attempting to perform these types of data storage causes a `CONV` `ERROR`. || Value Type || <> Stored into ... || || `string` || `int`, `float` or `bool` || || `float` || `int`, `bool` or `string` || || `bool` || `float` or `string` || || `int` || `bool` or `string` || || `string`, `float`, `bool` or `int`|| A function identifier (see [[#functionstorevar|here for the flip-flop]]) || === <>EMIT and RAND Conversion Errors === The `emit` and `rand` statements in [[Assignments/ZOBOS/zlang|zlang]] have particular type semantics that must be met, otherwise a `CONV` error should be reported. || Statement || Type Semantics || || `emit id, offset, length` || The `id` argument of `emit` must be a `string` type, `offset` and `length` must be `int` types (either variables or expressions). || || `emit (expr)` || `expr` must be of type `bool`, `int` or `float` || || `rand id` || `id` must be a variable of type `bool`, `int` or `float || All other argument type patterns for `emit` and `rand` should generate `CONV` errors, the location value should be the line and column of the offending identifier or the line and column of the root operation in the case of an arithmetic expression. }}} {{{#!wiki comment SEMESTER_FILTER }}} === Expression Value Propagation Rules and Errors (EXPR) === See [[Assignments/ZOBOS/zlang#exprtree|this section]] in [[Assignments/ZOBOS/zlang|zlang]] for a description of how the ''type of an expression'' propagates up an expression tree to its root. The remaining discussion is specific to `EXPR` semantic checks. || Expression Operation Rules (generate `EXPR` `ERROR`). || || unary `PLUS` and binary `PLUS`, `TIMES`, and `BOOLS` (`<`, `<=`, `==`, `>=`, `>`) cannot be used on `string` or `bool` values || || `mod` (modulo arithmetic) operation must be used on `int` values (both operands) || || `compl` (bitwise complement) unary operation must be used on an `int` value || || `not` (logical not) unary operation must be used on a `BEXPR` or `bool` value || || <>treating a function identifier as if it were a value (variable) identifier || Is it ever safe to stop looking for `EXPR` errors in poorly formed expression trees? Evaluation of an expression tree works from the bottom up (recursion is used to get ''down to the bottom''). 1. Each node should look for `EXPR` errors under each child branch. 1. When an `EXPR` error occurs in any child, a parent node can no long evaluate the correctness of its operation and so should not emit any `EXPR` errors. 1. If none of a parent node's children report `EXPR` errors, the parent should evaluate the correctness of its operation. 1. Except for the root, all nodes communicate either an error condition or a resultant value type for its operation to its own parent. 1. `domain` nodes in an expression tree simply propagate their child node's value type (domain) or an `EXPR` error condition to their parent. DOMAIN OUTPUT lines should only be generated when there are no `EXPR` errors from the child node. {{{#!wiki comment SEMESTER_FILTER (comment was important) Keep in mind that all identifiers on the RHS of an assignment (`=`) should be considered "used", whether their use is semantically correct or not, see [[#sewunused|UNUSED]]. Likewise all non-function identifiers on the LHS of an assignment (`=`) should be considered "initialized", whether the RHS is semantically correct or not, see [[#sewunused|UNINIT]]. }}} = OUTPUT = `ZOBOS` will report four types of data to three different files. == Syntax, Warning and Error Messages == <>These will be written to system `stdout` and '''must be''' prefixed by the token `OUTPUT` as described by [[Assignments/Requirements#OUTPUT|the course submission requirements]]. These messages will have four critical parts emitted in this order: 1. the type of message `:TYPE:`, enclosed with colons<> where `TYPE` is one of (all capitals) `SYNTAX`, `ERROR`, or `WARN`. 2. the line number of the error in the source (see ''Terminal Location'' column of [[#sewtable|syntax, error, and warning table]]) 3. the character number of the error in the source (see ''Terminal Location'' column of [[#sewtable|syntax, error, and warning table]]) 4. the [[#sewtable|ID]] for the message type The line and character number come from the `program.tok` '''token stream'''. For a `SYNTAX` error it is the location of the offending token ''or'' the the last good token ''iff'' the syntax error occurs because there are no more tokens to be parsed. For `WARN` and `ERROR` messages, the location to report is detailed [[#sewtable|here]]. === Example Syntax Error Output === Since we abort all tasks on a syntax error, there should only be one `SYNTAX` error ever printed on `stdout`. You '''are permitted to print other information''', just don't put it on the required `OUTPUT` line: ||`danglingbrace.vrb` (the source `danglingbrace.syn` is available in [[#zobosresources|the ZOBOS archive files]])|| ||<>|| {{{#!wiki important You aren't required to print out anything beyond the data on the `OUTPUT` line. }}} == Expression Tree DOMAIN Messages == These will be written to system `stdout` and '''must be''' prefixed by the token `OUTPUT` as described by [[Assignments/Requirements#OUTPUT|the course submission requirements]]. {{{#!wiki comment We output `domain` information in the same style as syntax and semantic messages simply to facilitate grading. `domain` operators '''are not''' syntax or semantic errors. }}} The following are examples of DOMAIN messages from `domain.vrb`: ||DOMAIN lines from `domain.vrb` (the source `domain.vrb` is available in [[#zobosresources|the ZOBOS archive files]])|| ||<>|| The line and column number for `DOMAIN` messages are the location of the `domain` terminal in the source and provided to your parser in the token steam file (`.tok`). == Abstract Syntax Tree == <> {{{#!wiki important You have the option of writing a `dot(1)` instruction file '''instead of''' the format described below. The `grader.sh` script will auto-detect the format you use. }}} `ZOBOS` will write its abstract syntax tree to the filename specified by the second command line parameter. The format of this file is in two parts. The '''first part''' is a line by line declaration of a tree node `id` and `name`, separated by white space. If a name is not provided the `id` is used. An '''empty line''' separates the the first part from the second part. The '''second part''' is a line by line description of tree edges. The first id on the line is the parent, subsequent ids are the children in left to right order. If a parent id is mentioned more than once, children are simply arranged in left to right, file order. The following two AST descriptions in this format yield the same tree graph: || First AST Description ||<:> Second AST Description ||<:> `dot()` Generated Graph || ||<^><>||<^><>||<^:>{{attachment:astfmt1.svg}}|| || Source (helloworld.src) ||<:> AST Description (helloworld.ast) || ||<^><>||<^><>|| || `dot()` Generated Graph || ||<^:>{{attachment:helloworld.svg}}|| {{{#!wiki tip The example above suggests an easy way to create unique identifiers for all AST nodes: the digit patterns are simply the path to each node, and each path element is the child index (with the root node being the exception, its identifier is simply `0`). }}} == Symbol Tables == <>During symbol table generation and the search for semantic warnings and errors, if `ZOBOS` encounters the `emit symtable` statement all the symbols for all scopes should be written to the '''third command line argument'''. The format for this output is as follows: 1. The '''global''' or deepest scope will be considered scope zero (`0`), each nested scope within it is assigned the next sequential scope value. 2. Each symbol should be written on one line, and contain the following '''comma delimited fields''': `scope value`, `type`, and `id`. a. The scope value is a non-negative integer, `0` is used for the deepest or "global" scope. a. `type` is the sequence of grammar terminals associated with a variable's `DECLTYPE` or it is the '''function signature''': {{{ returnType//(param1type(/param2type(/param3type(/...)))) }}} eg, the function signature for `speak` in the `helloworld.src` example above is simply `int//`. a. The `id` is the identifier of the symbol. {{{#!wiki important 1. Yes, it is possible to have scopes without any symbols declared within them. 1. Within the same scope, if an identifier is attempted to be re-used, a. If required by the semester write-up a [[#sewtable|REIDENT]] error should be generated a. Regardless of error generation, the attempted re-declaration '''is ignored'''. }}} All of the `.src` files with `emit symtable` instructions in the [[#zobosresources|zobos resource archives]] have `.sym` files provided as well. Here is an example of program source with multiple scoping levels and the `emit symtable` output. || Source (symtable-2.src) ||<:> Symbol Table Dump (symtable-2.sym) || ||<^>< 9.81 ) {\n c = bool(f*f-int(c));\n emit symtable;\n }\n emit i, 0, 15;\n }\n emit m, int(pi), 5;\n }\n}\n\n}}})>>||<^><>|| = Hints and Testing = 1. You know you're going to dump symbol table debug info during development, so you might as well format it according to the requirements in the first place. == grader.sh == <> I am providing to students the same tarball the grader will use for testing your `ZOBOS`. Here is how to use it: First, download <> to your Mines Linux account ("alamode" machines!) and unroll it in a temporary directory. {{{ $ ls zobos-student.* zobos-student.tar.bz2 $ mkdir ~/tmp $ cd ~/tmp $ tar xjf ../zobos-student.tar.bz2 : $ ls zobos/ }}} Second, set the COMPGRADING environmental variable with: {{{ $ source ~khellman/COMPGRADING/setup.sh ~khellman/COMPGRADING }}} Now go to the directory holding your `ZOBOS` and execute the `grader.sh` script from the `zobos-student.tar.bz2` resource. {{{ $ cd ~/compilers/zobosCC $ ls ZOBOS ZOBOS $ ~/tmp/zobos/grader.sh : : : }}} You will need to read any messages from the script carefully, and perhaps hit `ENTER` several times throughout its course. This script checks for: a. missing data files a. truncated data files a. inaccessible data files a. and the difference between your `ZOBOS` results and expected results. The latter test results are displayed on the terminal screen, but they whip by pretty quickly so you may need to scroll up and make sure you see them all. When there is a discrepancy in expected results the script either: * points you to a data file along side your `ZOBOS` where more failure details are available, * or all the details are displayed on screen. Before the `grader.sh` terminates, it will show a summary of some specific rubric line results for the programming project. {{{#!wiki important The grader will use this very same script, possibly with some additional testing `.src` to check your submitted work. If your `ZOBOS` flies through without a hitch, you can likely suspect a good grade for the assignment. }}} = Submit Your Work = <> <> = Rubric = This work is worth 120 points. ||'''Requirements''' || '''Points''' || '''Notes''' || || Meets [[Assignments/Requirements|compilers course project requirements]] || 10 || || || Basic I/O and exit status requirements || 10 || exit status non-zero on unreadable input files '''or''' un-writable output files || || Detect and report `SYNTAX` errors correctly || 20 || exit status should be non-zero when `SYNTAX` errors are detected || || Output of AST graph information || 10 || || || AST simplification of control structures `IF`, `IFELSE`, `WHILE`, `DOWHILE`, `STEPS` || 10 || This '''requires''' valid AST graph output! || || DOMAIN requirements || 20 || || || `emit symtable` and `symtable.dat` requirements || 20 || || || `ERROR` class semantic issues properly detected and reported || 10 || exit status should be non-zero when `ERROR` issues are detected; 75% correct and no extra messages for full credit || || `WARN` class semantic issues properly detected and reported || 10 || exit status should be zero when only `WARN` issues are detected; 75% correct and no extra messages for full credit ||