Differences between revisions 32 and 33
Revision 32 as of 2020-08-25 04:36:35
Size: 29755
Editor: anonymous
Comment:
Revision 33 as of 2021-04-19 06:49:06
Size: 22208
Editor: khellman
Comment: updates for 2021
Deletions are marked like this. Additions are marked like this.
Line 6: Line 6:
{{{#!wiki comment
FIXME: incorporate an FAQ style section with these:
keith: Question on babylon...why is guess not involved in errors? Seems like on line 2, it should be a NOVAR, and from there, there should be more since it doesn't have a type
me: In zlang, the line float x = guess = 3.14...; is declaring two variables x and guess that are initialized to the same value. DECLLIST →DECLTYPE DECLIDS means all the identifiers right of DECLLIST are being declared of the same type. One could also have written float x = 3.14, guess=3.14; with the same result (save that 3.14 is not the same as the longer floatval in the source).

Don't use `babylon.src` in resource tarball just the babylon-{1,2,3}.src. Clarify in write-up only one `emit symtable` will be used per test.

keith: So is it just me, or did the SYNTAX error tests magically go from 15 to 10?
me: 10 in the rubric is the weight of SYNTAX line item compared to others. The numbers in the grader.sh refer to specific tests performed - they are absolute counts.

keith:The string literal in helloworld.tok is tokenized as ("hello world") with quotes, but the example helloworld.ast on the assignment writeup prints the token as (hello world) without quotes.
Is this intentional?
me: Not intentional, you can include quotes or not.

keith: Do our warnings/errors need to match your output ordering? If so, on UNUSED, what's the order wanted? Order of declaration? Alphabetical?
(If so, it seems like onecast.out is order-of-declaration, but unused.out is alphabetical I think? Def. not order of declaration)
me:Order is not important, they are sorted before comparisons

keith: Question: babylon.out has this line: :ERROR: 7 13 :CONV: (so a conversion error on line 7). However, line 7 of babylon.src is bool gogogo = 1; (setting a bool equal to an int) -- but isn't that allowed? In the writeup, it just says float and String aren't allowed to be assigned to a bool, but int is ok.
Nvm, Rena helped me out. I read the table backwards
me: For those who might have the same question, the row for "bool" isn't saying that you cannot store float & String as a bool, it's the other way around: you can't store a bool value as a float or a string.

keith: are "{" and "}" fine in-place of "open" and "close" for BRACESTMTS in the ast?
me:Yes

jackson.garner:If I had program with just "int x = x" what kind of warnings/errors should occur? Would that just print an UNINIT warning?
Or would it be Novar? If it's NOVAR, would "int x; x = x" be UNINIT?
me: Yes, and the SYNTAX error for a missing sc ;)

keith: And, if you look at the source code...
int a = b = 3;
int t, u;
t = u = w = 2;
int z = q + 1 * x + -q;
u = b = a;
b = t*z;
other: isn't it used on line 6 to declare u?
me: Yes, b is on the rhs of an =, so it is used.

keith: Are int/float type mismatches allowed on operators besides PLUS and TIMES? For instance, 3 < 3.14 would seem to make sense from a semantics point of view, and since the result is a bool, it appears to be consistent with the instructions?
Rephrasing; is one child being of type int and the other of type float allowed for all binary operators? Or just PLUS and TIMES operators?
me: Not for modulo, otherwise yes (note I'm not calling assignment a binary operator). Is the write-up misleading with respect to this, or just not clear enough?

keith: Is conversion error (CONV) only used in the case of int i = 3.14 (i.e. initialization of a different type than declared), or should casts be checked as well? So out of: a) int i = 3.14 b) int i = bool(1) c) int i = int("ZOBOS") Which should throw conversion errors? Just (a) and (b), or should (c) as well?
me:Casts do not cause CONV errors.

keith:Just to make sure: Our simplified parse tree may look a bit different from yours. For instance I don't simplify ASSIGN nodes in the same way yours does because I made some different decisions. This is acceptable, provided we do simplify expression trees and certain other particular constructs you state in the description (of which ASSIGN is not one) in the described fashion.
me:The only requirements for AST are for expressions and control structures (like the wiki says).
Appears to be an area Internet issue so I'll be making lots of typos on my phone ...

keith:Also, if there's a string "hello world", is "x22hellox20worldx22" acceptable or do we need to ensure the string is represented in standard human readable fashion on our output?
me: Strings in ast with encoding ok.

keith:do we have to verify the type of the value on an assignment? For instance, if a line of code was int x = 3.14159, should we throw a warning or just ignore it?
me: Yes, examples should be in conv-1.src

keith:Also, does our symbol table ever have to care about the value of a symbol? It seems that, at least for this assignment, no, because we're only verifying types, not trying to evaluate the result (i.e. we only care about int + int --> int, not that the value of this operation returns say, 5.)
me: Correct.

keith:If there is an EXPR error, should the variable(s) remain uninitialized? (If so this would obviously cause UNINIT warnings)
Also if there is a NOVAR error in an expression tree, this results in a lack of type when the parent operation is evaluated, so should this then cause an EXPR error?
me:EXPR errors don't cause variables to be unitialized, if it says x =, then the programmer's intent was to initialize the variable. This also keeps errors from cascading :)
me:Grrr. You caught me. Yes a NOVAR causes an EXPR ERROR STATE. I'm gonna go double check this is what my compiler does ... back in a tick.
Yes, this is indeed what my ZOBOS does (whew). An example is in the ZOBOS archive file under garrett-1.src.
keith:So to clarify we are not supposed to output the EXPR error?
me:Correct, because it wasn't caused by one of the semantics in the Expression Operation Rules (generate EXPR ERROR) table. It was caused by a NOVAR.

keith: When we run into an error state in knitting needles, how do we pick a token to associate the syntax error with? Would it just be the first terminal on the right hand side of the knitting needles?
me:It is "Token causing error, or last token successfully processed". So yes it is the grammar symbol on the "right-side" needle, iow the symbol for the column in the SLR table where you discover an empty cell instead of a sh-X or r-r action. If it is a syntax error because you've run out of tokens (for instance a missing ; for the one liner "int x") then it is the last valid token processed, so x in the one liner example. HtH

keith:We're checking for type conversion errors on assign nodes. What about on casts?
me:the "output" of casts yes. So bool b = float(3) should fail, but bool c= bool(float) is OK, because casts are by definition magical and deadly :)

keith:When determining what types should be passed up in expressions, should we be using basic programming judgement (ie int plus int = int, float + int = float)?
me:
Yes, this is in the write-up (bullet 3 under Expression Propagation Rules)

keith: should string a, b, c = "wat", d; parse / be valid? Also string a = b = c = "wat", d;
me: Yes, both are valid syntax.
}}}

Line 90: Line 8:
 1. -- [[khellman]] <<DateTime(2020-05-08T10:17:01-0700)>> Yay, my Internet is back (newly rekindled disdain for centurylink); add [[#compgrading|grader.sh]] instructions and link.
 1. -- [[khellman]] <<DateTime(2020-05-06T13:27:15-0700)>> Correct rubric 'WARN' description
 1. -- [[khellman]] <<DateTime(2020-05-05T20:21:54-0700)>> `garrett-1.src` `NOVAR` and `EXPR` error clarification.
 1. -- [[khellman]] <<DateTime(2020-05-05T15:29:15-0700)>> Correct some typos (''conflicts'' instead of ''conflict'', `x0A` instead of `xA0`.
 1. -- [[khellman]] <<DateTime(2020-05-05T12:50:03-0700)>> Added a `README.txt` to the [[#zobosresources|ZOBOS resource archive(s)]] --- read it!
 1. -- [[khellman]] <<DateTime(2020-05-05T12:32:21-0700)>> Add `symtable` example.
 1. -- [[khellman]] <<DateTime(2020-05-05T09:15:31-0700)>> Correct '''word.ast''' and table header descriptors to actually reflect content of table/
 1. -- [[khellman]] <<DateTime(2020-05-05T02:06:56-0700)>> small clarifications to the [[#sewtable|syntax, error and warning]] table, corrected `REDECL` to `REVAR`.
 1. -- [[khellman]] <<DateTime(2020-05-04T15:22:50-0700)>> Add rubric, correct minor typo ("in" for "ins")
 1. -- [[khellman]] <<DateTime(2020-05-03T16:48:11-0700)>> Clarified (hopefully) the simplification of control structures.
 1. -- [[khellman]] <<DateTime(2020-05-03T13:18:07-0700)>> Clarified '''unary''' `PLUS` and added `BOOLS` in [[#vtypeerror|expression operation table]].
Line 107: Line 14:
In this assignment will you will implement a syntax checking LR(1) parser, a symbol table(s), and perform semantic analysis on a simple programming language grammar. The name of your application must be `ZOBOS` (case sensitive). In this assignment will you will implement a syntax checking SLR(0) parser, a symbol table(s), and perform semantic analysis on a simple programming language grammar. The name of your application must be `ZOBOS` (case sensitive).
Line 122: Line 29:
 * It turns out that for `zlang.cfg`, an empty file is a valid program, but `ZOBOS` '''won't be tested with empty source files'''. So you don't need to contemplate "is ∅ a `SYNTAX` error?"
Line 126: Line 32:
     i. leaves are either literals (`intval`s, `floatval`s) or variables
     i. the root and internal nodes are one of the operations associated with the non-terminals `BOOLS`, `PLUS`, `MULT`, `UNARY`, or `CAST`.
     i. leaves are either literals (`intval`s, `floatval`s) or variables (identifiers holding a value, not function names)
     i. the root and internal nodes are either one of the operations associated with the non-terminals `BOOLS`, `PLUS`, `MULT`, `UNARY`, or `CAST`, or a `FUNCALL` node.
Line 136: Line 42:
 1. `ZOBOS` will then use a scope respecting symbol table and visitor pattern(s) to perform several [[#semanticchecks|semantic checks]] on the variable and expression use of the parsed program source.  1. `ZOBOS` will then use a scope respecting symbol table and visitor pattern(s) to perform several [[#semanticchecks|semantic checks]] on the variable, expression and function use within the parsed program source.
Line 149: Line 55:
}}}
{{{#!wiki important
Our language does not have "first class" functions, and functions won't be treated as identifiers holding a "value". The language has data types `bool`, `int`, `float` and `string`; and then it has functions. Think C/C++, not Python, JavaScript, Scheme or Haskell.
Line 150: Line 59:
Line 152: Line 62:
|| ERROR || NOVAR || Using an undeclared variable || Location of variable name, '''each''' occurrence on a separate `OUTPUT` line ||
|| ERROR || CONV || Value conversion error (see the [[#vtypeerror|conversion errors table]]) || Location of the `assign` terminal ||
|| ERROR || NOIDENT || Using an undeclared identifier || Location of identifier name, '''each''' occurrence on a separate `OUTPUT` line ||
|| ERROR || CONV || Value conversion error (see the [[#vtypeerror|conversion errors table]]) || Location of the `assign` terminal, or the function identifier in an erroneous invocation. ||
Line 155: Line 65:
|| WARN || REVAR || <<Anchor(sewrevar)>>Attempting to re-declare a variable || '''Each''' occurance of variable name in redeclaration ||
|| WARN || UNUSED || The variable is not used in an expression or assignment '''anywhere''' within its scopes || Location of variable name in its declaration ||
|| WARN || UNINIT || Using a variable in an expression before it has been initialized with a value || Location of variable name in expression ||
|| WARN || CONST || Attempting to store a value in a variable with the `const` attribute  || Location of variable name on LHS of `assign` terminal ||
|| ERROR || CALL || Invoking variable identifier as a function, using the wrong number of arguments in a function call || Location of the function identifier being invoked ||
|| WARN || REIDENT || <<Anchor(sewrevar)>>Attempting to re-declare an identifier || '''Each''' occurrence of identifier in re-declaration ||
|| WARN || UNUSED || The identifier is not used in an expression or assignment '''anywhere''' within its scopes || Location of identifier in its declaration ||
|| WARN || UNINIT || Using an identifier in an expression before it has been initialized with a value || Location of identifier in expression ||
|| WARN || CONST || Attempting to store a value in a variable with the `const` attribute, or re-defining a function || Location of identifier name on LHS of `assign` terminal ||

FIXME: add function nuances here...
Line 162: Line 75:
||<tableclass="header"> Value Type || <<Anchor(vtypeerror)>> Stored into Variable Type || ||<tableclass="header"> Value Type || <<Anchor(vtypeerror)>> Stored into ... ||
Line 167: Line 80:
|| `string`, `float`, `bool` or `int`|| A function identifier (see [[#functionstorevar|here for the flip-flop]]) ||
Line 171: Line 85:
begins in the leaves and is implicit based on the type of literal value (`intval`, `stringval`, `floatval`) or variable is in the leaf. begins in the leaves or a `FUNCALL` and is implicit based on the type of literal value (`intval`, `stringval`, `floatval`), type of variable or the return type of a function.
Line 183: Line 97:
|| <<Anchor(functionstorevar)>>treating a function identifier as if it were a value (variable) identifier ||
Line 188: Line 103:
  1. When an `EXPR` error has occurs in any child, a parent node can no long evaluate the correctness of its operation and so   1. When an `EXPR` error occurs in any child, a parent node can no long evaluate the correctness of its operation and so
Line 192: Line 107:

[[https://matrix.to/#/!dYtEsTLbatQVNIkdnf:csci498ab.modular.im/$7hOuBhCc3-aA2puSeggfAu0Ms-QZPMNtrSW3V-1CO4k?via=csci498ab.modular.im|NOVAR]] and propagation of `EXPR` error conditions? See `garrett-1.src` in [[#zobosresources|in resource archives]].
Line 203: Line 116:
 2. the line number of error in source (see ''Terminal Location'' column of [[#sewtable|syntax, error, and warning table]])
 3. the character number of error in source (see ''Terminal Location'' column of [[#sewtable|syntax, error, and warning table]])
 2. the line number of the error in the source (see ''Terminal Location'' column of [[#sewtable|syntax, error, and warning table]])
 3. the character number of the error in the source (see ''Terminal Location'' column of [[#sewtable|syntax, error, and warning table]])
Line 213: Line 126:
||<<MiniPage({{{#!plain\nOUTPUT :SYNTAX: 7 1 :SYNTAX:\nstate /1/ input ('rbrace', '}') expected bool,const,emit,float,id,if,int,lbrace,string,while,$\n}}})>>|| ||<<MiniPage({{{#!plain\nOUTPUT :SYNTAX: 2 14 :SYNTAX:\nParsing error in state 64; input: ['sc', ';'],['lbrace', '{'],['if', 'if']; expected=lbrace\n\n}}})>>||
Line 221: Line 134:
After grading the `LG ptvis` learning group check this weekend, I realized many groups are using the [[https://graphviz.org/|graphviz.org]] tools for generating their parse trees. Therefore, you have the option of writing a `dot(1)` instruction file '''instead of''' the format described below. I'll make sure the `grader.sh` script detects these cases correctly. You have the option of writing a `dot(1)` instruction file '''instead of''' the format described below. The `grader.sh` script will auto-detect the format you use.
Line 229: Line 142:
||<^><<MiniPage({{{#!plain\nA A\nB B\nC C\nD D\n\nA B C D}}})>>||<^><<MiniPage({{{#!plain\nA A\nB B\nC C\nD D\n\nA B\nA C D}}})>>||<^:>{{attachment:astfmt1.svg}}|| ||<^><<MiniPage({{{#!plain\nA A\nB B\nC C\nD D\n\nA B C D\n\n}}})>>||<^><<MiniPage({{{#!plain\nA A\nB B\nC C\nD D\n\nA B \nA C D\n\n}}})>>||<^:>{{attachment:astfmt1.svg}}||
Line 232: Line 145:
||<^><<MiniPage({{{#!plain\nstring word = "hello world";\nemit word 0 11;\n}}})>>||<^><<MiniPage({{{#!plain\n0 PROGRAM\n0-0 STATEMENT\n0-0-0 DECLLIST\n0-0-0-0 string\n0-0-0-1 DECLID\n0-0-0-1-0 =\n0-0-0-1-0-0 word\n0-0-0-1-0-1 hello world\n0-1 STATEMENT\n0-1-0 EMIT\n0-1-0-0 word\n0-1-0-1 0\n0-1-0-2 11\n0-2 $\n\n0-0-0-1-0 0-0-0-1-0-0 0-0-0-1-0-1\n0-0-0-1 0-0-0-1-0\n0-0-0 0-0-0-0 0-0-0-1\n0-0 0-0-0\n0-1-0 0-1-0-0 0-1-0-1 0-1-0-2\n0-1 0-1-0\n0 0-0 0-1 0-2\n}}})>>||<^:>{{attachment:wordup.svg}}|| ||<^><<MiniPage({{{#!plain\nstring word = "hello world";\nint speak()\nreturns r = 0\n{\n emit word, 0, 11;\n}\n\n}}})>>||<^><<MiniPage({{{#!plain\n0 MODULE\n0-0 DECLLIST\n0-0-0 type\n0-0-0-0 string\n0-0-1 DECLID\n0-0-1-0 =\n0-0-1-0-0 id\n0-0-1-0-0-0 word\n0-0-1-0-1 stringval\n0-0-1-0-1-0 hello world\n0-1 FUNCTION\n0-1-0 FUNSIG\n0-1-0-0 type\n0-1-0-0-0 int\n0-1-0-1 id\n0-1-0-1-0 speak\n0-1-0-2 PARAMLIST\n0-1-1 =\n0-1-1-0 id\n0-1-1-0-0 r\n0-1-1-1 intval\n0-1-1-1-0 0\n0-1-2 BRACESTMTS\n0-1-2-0 scope\n0-1-2-0-0 open\n0-1-2-1 STMTS\n0-1-2-1-0 STATEMENT\n0-1-2-1-0-0 EMIT\n0-1-2-1-0-0-0 id\n0-1-2-1-0-0-0-0 word\n0-1-2-1-0-0-1 intval\n0-1-2-1-0-0-1-0 0\n0-1-2-1-0-0-2 intval\n0-1-2-1-0-0-2-0 11\n0-1-2-2 scope\n0-1-2-2-0 close\n\n0-0-0 0-0-0-0\n0-0-1-0-0 0-0-1-0-0-0\n0-0-1-0-1 0-0-1-0-1-0\n0-0-1-0 0-0-1-0-0 0-0-1-0-1\n0-0-1 0-0-1-0\n0-0 0-0-0 0-0-1\n0-1-0-0 0-1-0-0-0\n0-1-0-1 0-1-0-1-0\n0-1-0 0-1-0-0 0-1-0-1 0-1-0-2\n0-1-1-0 0-1-1-0-0\n0-1-1-1 0-1-1-1-0\n0-1-1 0-1-1-0 0-1-1-1\n0-1-2-0 0-1-2-0-0\n0-1-2-1-0-0-0 0-1-2-1-0-0-0-0\n0-1-2-1-0-0-1 0-1-2-1-0-0-1-0\n0-1-2-1-0-0-2 0-1-2-1-0-0-2-0\n0-1-2-1-0-0 0-1-2-1-0-0-0 0-1-2-1-0-0-1 0-1-2-1-0-0-2\n0-1-2-1-0 0-1-2-1-0-0\n0-1-2-1 0-1-2-1-0\n0-1-2-2 0-1-2-2-0\n0-1-2 0-1-2-0 0-1-2-1 0-1-2-2\n0-1 0-1-0 0-1-1 0-1-2\n0 0-0 0-1\n\n\n}}})>>||<^:>{{attachment:helloworld.svg}}||
Line 244: Line 157:
  a. `type` is the sequence of grammar terminals associated with the variable's `DECLTYPE`.   a. `type` is the sequence of grammar terminals associated with a variable's `DECLTYPE` or it is the '''function signature''':
  {{{
returnType//(param1type(/param2type(/param3type(/...))))
  }}}
  eg, the function signature for `speak` in the `helloworld.src` example above is simply `int//`.
Line 248: Line 165:
 1. Within the same scope, when a variable identifier is attempted to be re-used, an [[#sewrevar|REVAR]] error should be generated and the attempted variable redeclaration is ignored.  1. Within the same scope, when an identifier is attempted to be re-used, an [[#sewrevar|REIDENT]] error should be generated and the attempted re-declaration '''is ignored'''.
Line 254: Line 171:
||<^><<MiniPage({{{#!plain\nstring m = "helloworld";\nconst float pi = 3.1415;\nint i = 0;\nconst bool b = 1<3;\n{\n string p = m;\n int f = i;\n string i = "eyespymeashadow";\n {\n {\n bool c = b;\n emit p 0 int(b);\n if ( f > 9.81 ) {\n c = bool(f*f-int(c));\n emit symtable;\n }\n emit i 0 15;\n }\n emit m int(pi) 5;\n }\n}\n\n}}})>>||<^><<MiniPage({{{#!plain\n0,const bool,b\n0,int,i\n0,string,m\n0,const float,pi\n1,int,f\n1,string,i\n1,string,p\n3,bool,c\n\n}}})>>|| ||<^><<MiniPage({{{#!plain\nstring m = "helloworld";\nconst float pi = 3.1415;\nint i = 0;\nconst bool b = 1<3;\nint main()\nreturns r=0\n{\n string p = m;\n int f = i;\n string i = "eyespymeashadow";\n {\n {\n bool c = b;\n emit p, 0, int(b);\n if ( f > 9.81 ) {\n c = bool(f*f-int(c));\n emit symtable;\n }\n emit i, 0, 15;\n }\n emit m, int(pi), 5;\n }\n}\n\n}}})>>||<^><<MiniPage({{{#!plain\n0,const bool,b\n0,int,i\n0,const string,m\n0,const int//,main\n0,const float,pi\n1,int,r\n2,int,f\n2,const string,i\n2,const string,p\n4,bool,c\n\n}}})>>||
Line 257: Line 174:
<<DjHR(zlang.cfg)>> '''is not''' an SLR(1) language, nor is it LALR(1). This is due to the `(55) UNARY->PLUS VALUE` production rule, which causes
two '''shift-reduce''' conflicts during LR table generation. Both conflicts can be resolved
by preferring the ''shift'' action over the ''reduce''.
Because of this (and the size of the grammar), we'll use a static fixed
-up LR table that has these conflicts resolved.

<<Anchor(zlanglr)>> The <<DjHR(zlang.lr)>> LR parsing table has 107 lines with 58 comma separated fields per line (that makes '''59''' fields each line). There is no white space except for new line separators (`x0A`).
  1. The first field of the first line is a period (`.`), which serves as a placeholder. The remaining fields on the first line are the column heading grammar symbols. There are 57 grammar symbols in the language (|$T+\Sigma_{\$}|$).
  1. The remaining lines are for the 106 item set states. The first field is the '''item set index.''' Subsequent fields are the actions to be performed on the associated grammar symbol.
<<DjHR(zlang.cfg)>> is an SLR(0) language and your shared learning group codebase for generating the shift-reduce ''item sets'' and ''CFSM graph'' can be used to process `zlang.cfg`. '''However, this is not recommended''' --- this is a big language and generating the CFSM over and over is just going to slow you down. You can certainly make the process more efficient by generating the the shift-reduce table '''one time''' --- or you can use <<Anchor(zlanglr)>> the <<DjHR(zlang.lr,table I'm providing for a jump start)>>:

<<DjH
R(zlang.lr)>> has 152 lines with 70 comma separated fields per line. There is no white space except for new line separators (`x0A`).
  1. The first field of the first line is a period (`.`), which serves as a placeholder. The remaining fields on the first line are the column heading grammar symbols. There are 69 grammar symbols in the language (|$T+\Sigma_{\$}|$).
  1. The remaining lines are for the 151 item set states. The first field is the '''item set index.''' Subsequent fields are the actions to be performed on the associated grammar symbol.
Line 278: Line 193:
 1. You know you're going to dump symbol table debug info during development, so you might as format it according to the requirements in the first place.  1. You know you're going to dump symbol table debug info during development, so you might as well format it according to the requirements in the first place.
Line 328: Line 243:
The grader will use this very same script, along with some additional testing data to check your submitted work. If your `ZOBOS` flies The grader will use this very same script, possibly with some additional testing `.src` to check your submitted work. If your `ZOBOS` flies
Line 331: Line 246:
{{{#!wiki comment
In the `zobos-student.tar.bz2` tarball are also PDFs of initial `.nfa` files as well as their optimized DFAs, I find these much easier
to "visualize" than the textual `.nfa` input files or the transition table output files. You might like them as well.
}}}

  • You can download all the resources used in this writeup: zobos-resources.zip or zobos-resources.tar.bz2.
    The tarball contains source files (.src, .syn), token streams (.tok), AST PDFs, and verbose and OUTPUT filtered stdout listings (.vrb and .out) respectively. Read the README.txt!

  • Here is the change log for this assignment write-up. I will try to be descriptive in my log messages.

  • You can also subscribe to this page and receive Emails when changes are made.

In this assignment will you will implement a syntax checking SLR(0) parser, a symbol table(s), and perform semantic analysis on a simple programming language grammar. The name of your application must be ZOBOS (case sensitive).

Input

ZOBOS will accept three command line parameters:

Argument

Value

Description

1

program.tok

The filename containing a token stream to be parsed, the format is identical to that produced by LUTHER

2

ast.dat

The filename for storing your AST

3

symtable.dat

The filename for storing symbol tables when instructed.

Coding

ZOBOS will do do the following:

  1. Your compiler will parse zlang.cfg using the LR knitting needles algorithm (pseudo code here, using this SLR table (the format is described here). Syntax error messages will be produced in a consistent manner to facilitate grading.

    • If program source has a syntax error, the remaining steps for ZOBOS are not to be performed and ZOBOS should exit() with a non-zero exit status.

  2. Either during the parse (using syntax directed translation procedures during production rule reduction), after the parse using visitor patterns, or a combination of these methods; ZOBOS will construct an abstract syntax tree with the following properties

    1. all EXPR (sub)trees will be simplified so that

      1. leaves are either literals (intvals, floatvals) or variables (identifiers holding a value, not function names)

      2. the root and internal nodes are either one of the operations associated with the non-terminals BOOLS, PLUS, MULT, UNARY, or CAST, or a FUNCALL node.

    2. control structures IF, IFELSE, WHILE, STMTS, and BRACESTMTS will be simplified in the spirit of textbook figure 7.15 (lga-sdt.pdf) --- essentially lose the terminals, keep the functional children (blocks of statements, predicate expressions).

    Simplification of all other grammar constructs are at the student's discretion and inclination. They will not be graded.
    There are example .src and AST PDFs in the zobos-resources archives for download.

    Outside of the rules listed above, you are not required to duplicate the ASTs provided, they are provided as examples!

  3. ZOBOS will write to disk a simple tree representation of its AST.

  4. ZOBOS will then use a scope respecting symbol table and visitor pattern(s) to perform several semantic checks on the variable, expression and function use within the parsed program source.

    1. When the special EMIT form of symtable is encountered during this process, your ZOBOS will write the symbol tables for all scopes to disk in a simple format to facilitate grading.

    2. Warnings or error messages will be emitted in a consistent and simple manner to facilitate grading.

  5. Lastly, ZOBOS will exit(0) if there were no errors detected in the program, and exit(1) otherwise.

    WARNING messages are not ERROR messages.

Syntax and Semantic Checks

Your ZOBOS compiler will inspect the program.tok for these types of errors and inconsistencies.

Keep in mind that we often clump all errors into the syntax error, pigeonhole. Technically, syntax errors occur only during the parse when an invalid pattern of terminals is detected. The other errors that compilers find for us, and most of them in this project, are semantic errors and inconsistencies.

Our language does not have "first class" functions, and functions won't be treated as identifiers holding a "value". The language has data types bool, int, float and string; and then it has functions. Think C/C++, not Python, JavaScript, Scheme or Haskell.

Type

ID

Description

Terminal Location to report (line and column)

SYNTAX

SYNTAX

Any syntax error during the parse

Token causing error, or last token successfully processed

ERROR

NOIDENT

Using an undeclared identifier

Location of identifier name, each occurrence on a separate OUTPUT line

ERROR

CONV

Value conversion error (see the conversion errors table)

Location of the assign terminal, or the function identifier in an erroneous invocation.

ERROR

EXPR

Expression tree operand error (see the expression errors table)

Location of the PLUS, TIMES, mod or UNARY terminal operating on the erroneous value or variable

ERROR

CALL

Invoking variable identifier as a function, using the wrong number of arguments in a function call

Location of the function identifier being invoked

WARN

REIDENT

Attempting to re-declare an identifier

Each occurrence of identifier in re-declaration

WARN

UNUSED

The identifier is not used in an expression or assignment anywhere within its scopes

Location of identifier in its declaration

WARN

UNINIT

Using an identifier in an expression before it has been initialized with a value

Location of identifier in expression

WARN

CONST

Attempting to store a value in a variable with the const attribute, or re-defining a function

Location of identifier name on LHS of assign terminal

FIXME: add function nuances here...

Conversion Errors (CONV)

Attempting to perform these types of data storage causes a CONV ERROR.

Value Type

Stored into ...

string

int, float or bool

float

int, bool or string

bool

float or string

int

bool or string

string, float, bool or int

A function identifier (see here for the flip-flop)

Expression Value Propagation Rules and Errors (EXPR)

The type of value result in expression trees is always one of bool, float, int or string. An expression tree value type begins in the leaves or a FUNCALL and is implicit based on the type of literal value (intval, stringval, floatval), type of variable or the return type of a function.

  • The value type may be altered in expression trees (as the calculation progresses up) by CAST operations or through int and float arithmetic combinations.

  • UNARY operations do not change the value type.

  • PLUS and TIMES operations between int and float values yield float results

Expression Operation Rules (generate EXPR ERROR).

unary PLUS and binary PLUS, TIMES, and BOOLS cannot be used on string or bool values

mod (modulo arithmetic) operation must be used on int values (both operands)

compl (bitwise complement) unary operation must be used on an int value

not (logical not) unary operation must be used on a BEXPR or bool value

treating a function identifier as if it were a value (variable) identifier

Is it ever safe to stop looking for EXPR errors in poorly formed expression trees? Evaluation of an expression tree works from the bottom up (recursion is used to get down to the bottom).

  1. Each node should look for EXPR errors under each child branch.

  2. When an EXPR error occurs in any child, a parent node can no long evaluate the correctness of its operation and so should not emit any EXPR errors.

  3. If none of a parent node's children report EXPR errors, the parent should evaluate the correctness of its operation.

  4. Except for the root, all nodes communicate either an error condition or a resultant value type for its operation to its own parent.

OUTPUT

ZOBOS will report three types of data to three different files.

Syntax, Warning and Error Messages

These will be written to system stdout and must be prefixed by the token OUTPUT as described by the course submission requirements. These messages will have four critical parts emitted in this order:

  1. the type of message :TYPE:, enclosed with colons1 where TYPE is one of (all capitals) SYNTAX, ERROR, or WARN.

  2. the line number of the error in the source (see Terminal Location column of syntax, error, and warning table)

  3. the character number of the error in the source (see Terminal Location column of syntax, error, and warning table)

  4. the ID for the message type

The line and character number come from the program.tok token stream. For a SYNTAX error it is the location of the offending token or the the last good token iff the syntax error occurs because there are no more tokens to be parsed. For WARN and ERROR messages, the location to report is detailed here.

Example Syntax Error Output

Since we abort all tasks on a syntax error, there should only be one SYNTAX error ever printed on stdout. You are permitted to print other information, just don't put it on the required OUTPUT line:

danglingbrace.vrb (the source danglingbrace.syn is available in the ZOBOS archive files)

OUTPUT :SYNTAX: 2 14 :SYNTAX:
Parsing error in state 64; input: ['sc', ';'],['lbrace', '{'],['if', 'if']; expected=lbrace

You aren't required to print out anything beyond the data on the OUTPUT line.

Abstract Syntax Tree

You have the option of writing a dot(1) instruction file instead of the format described below. The grader.sh script will auto-detect the format you use.

ZOBOS will write its abstract syntax tree to the filename specified by the second command line parameter. The format of this file is in two parts. The first part is a line by line declaration of a tree node id and name, separated by white space. If a name is not provided the id is used. An empty line separates the the first part from the second part.

The second part is a line by line description of tree edges. The first id on the line is the parent, subsequent ids are the children in left to right order. If a parent id is mentioned more than once, children are simply arranged in left to right, file order. The following two AST descriptions in this format yield the same tree graph:

First AST Description

Second AST Description

dot() Generated Graph

A A
B B
C C
D D

A B C D

A A
B B
C C
D D

A B 
A C D

astfmt1.svg

Source (helloworld.src)

AST Description (helloworld.ast)

dot() Generated Graph

string word = "hello world";
int speak()
returns r = 0
{
    emit word, 0, 11;
}

0 MODULE
0-0 DECLLIST
0-0-0 type
0-0-0-0 string
0-0-1 DECLID
0-0-1-0 =
0-0-1-0-0 id
0-0-1-0-0-0 word
0-0-1-0-1 stringval
0-0-1-0-1-0 hello world
0-1 FUNCTION
0-1-0 FUNSIG
0-1-0-0 type
0-1-0-0-0 int
0-1-0-1 id
0-1-0-1-0 speak
0-1-0-2 PARAMLIST
0-1-1 =
0-1-1-0 id
0-1-1-0-0 r
0-1-1-1 intval
0-1-1-1-0 0
0-1-2 BRACESTMTS
0-1-2-0 scope
0-1-2-0-0 open
0-1-2-1 STMTS
0-1-2-1-0 STATEMENT
0-1-2-1-0-0 EMIT
0-1-2-1-0-0-0 id
0-1-2-1-0-0-0-0 word
0-1-2-1-0-0-1 intval
0-1-2-1-0-0-1-0 0
0-1-2-1-0-0-2 intval
0-1-2-1-0-0-2-0 11
0-1-2-2 scope
0-1-2-2-0 close

0-0-0 0-0-0-0
0-0-1-0-0 0-0-1-0-0-0
0-0-1-0-1 0-0-1-0-1-0
0-0-1-0 0-0-1-0-0 0-0-1-0-1
0-0-1 0-0-1-0
0-0 0-0-0 0-0-1
0-1-0-0 0-1-0-0-0
0-1-0-1 0-1-0-1-0
0-1-0 0-1-0-0 0-1-0-1 0-1-0-2
0-1-1-0 0-1-1-0-0
0-1-1-1 0-1-1-1-0
0-1-1 0-1-1-0 0-1-1-1
0-1-2-0 0-1-2-0-0
0-1-2-1-0-0-0 0-1-2-1-0-0-0-0
0-1-2-1-0-0-1 0-1-2-1-0-0-1-0
0-1-2-1-0-0-2 0-1-2-1-0-0-2-0
0-1-2-1-0-0 0-1-2-1-0-0-0 0-1-2-1-0-0-1 0-1-2-1-0-0-2
0-1-2-1-0 0-1-2-1-0-0
0-1-2-1 0-1-2-1-0
0-1-2-2 0-1-2-2-0
0-1-2 0-1-2-0 0-1-2-1 0-1-2-2
0-1 0-1-0 0-1-1 0-1-2
0 0-0 0-1

helloworld.svg

The example above suggests an easy way to create unique identifiers for all AST nodes: the digit patterns are simply the path to each node, and each path element is the child index.

Symbol Tables

During symbol table generation and the search for semantic warnings and errors, if ZOBOS encounters the emit symtable statement all the symbols for all scopes should be written to the third command line argument. The format for this output is as follows:

  1. The global or deepest scope will be considered scope zero (0), each nested scope within it is assigned the next sequential scope value.

  2. Each symbol should be written on one line, and contain the following comma delimited fields: scope value, type, and id.

    1. The scope value is a non-negative integer, 0 is used for the deepest or "global" scope.

    2. type is the sequence of grammar terminals associated with a variable's DECLTYPE or it is the function signature:

      returnType//(param1type(/param2type(/param3type(/...))))

      eg, the function signature for speak in the helloworld.src example above is simply int//.

    3. The id is the identifier of the symbol.

    1. Yes, it is possible to have scopes without any symbols declared within them.
    2. Within the same scope, when an identifier is attempted to be re-used, an REIDENT error should be generated and the attempted re-declaration is ignored.

All of the .src files with emit symtable instructions in the zobos resource archives have .sym files provided as well. Here is an example of program source with multiple scoping levels and the emit symtable output.

Source (symtable-2.src)

Symbol Table Dump (symtable-2.sym)

string  m = "helloworld";
const float  pi = 3.1415;
int     i = 0;
const bool    b = 1<3;
int main()
returns r=0
{
    string p = m;
    int    f = i;
    string i = "eyespymeashadow";
    {
        {
            bool c = b;
            emit p, 0, int(b);
            if ( f > 9.81 ) {
                c = bool(f*f-int(c));
                emit symtable;
            }
            emit i, 0, 15;
        }
        emit m, int(pi), 5;
    }
}

0,const bool,b
0,int,i
0,const string,m
0,const int//,main
0,const float,pi
1,int,r
2,int,f
2,const string,i
2,const string,p
4,bool,c

zlang.lr

zlang.cfg is an SLR(0) language and your shared learning group codebase for generating the shift-reduce item sets and CFSM graph can be used to process zlang.cfg. However, this is not recommended --- this is a big language and generating the CFSM over and over is just going to slow you down. You can certainly make the process more efficient by generating the the shift-reduce table one time --- or you can use the table I'm providing for a jump start:

zlang.lr has 152 lines with 70 comma separated fields per line. There is no white space except for new line separators (x0A).

  1. The first field of the first line is a period (.), which serves as a placeholder. The remaining fields on the first line are the column heading grammar symbols. There are 69 grammar symbols in the language (|$T+\Sigma_{\$}|$).

  2. The remaining lines are for the 151 item set states. The first field is the item set index. Subsequent fields are the actions to be performed on the associated grammar symbol. The actions are written in the same notation used in the text and lectures slides with one exception:

    1. sh-X means "shift the symbol to the parsing stack and goto to state X.

    2. r-N means reduce by rule number N (not index!) and push the resulting subtree back to the input queue.

    3. R-N means by rule number N, which has the grammar's goal symbol on the LHS --- accept the token stream as valid!

      In the text and slides R-N is written out across the entire row of the LR table, as shown here in item set 10.

zlang-pure.cfg does not contain #comments and should be immediately readable by your learning group code for grammars. zlang-rules.lis is a text file with numbered grammar rules, zlang-lr.pdf is a typeset LR table formatted in the same manner as lectures slides. And for the very curious, here is the CFSM item set graph of the language.

You may either hard code the LR parsing table and grammar rules into your ZOBOS application or read zlang.lr and the grammar rules from disk on each invocation. If the latter, include zlang.lr and any needed files in your submitted archive along side Build.sh or the Makefile. See to recall these submission details.

Hints and Testing

  1. You know you're going to dump symbol table debug info during development, so you might as well format it according to the requirements in the first place.

grader.sh

I am providing to students the same tarball the grader will use for testing your ZOBOS. Here is how to use it:

First, download this tarball to your Mines Linux account ("alamode" machines!) and unroll it in a temporary directory.

$ ls zobos-student.*
zobos-student.tar.bz2
$ mkdir ~/tmp
$ cd ~/tmp
$ tar xjf ../zobos-student.tar.bz2
:
$ ls
zobos/

Second, set the COMPGRADING environmental variable with:

$ source ~khellman/COMPGRADING/setup.sh ~khellman/COMPGRADING

Now go to the directory holding your ZOBOS and execute the grader.sh script from the zobos-student.tar.bz2 resource.

$ cd ~/compilers/zobosCC
$ ls ZOBOS
ZOBOS
$ ~/tmp/zobos/grader.sh
:
:
:

You will need to read any messages from the script carefully, and perhaps hit ENTER several times throughout its course. This script checks for:

  1. missing data files
  2. truncated data files
  3. inaccessible data files
  4. and the difference between your ZOBOS results and expected results.

The latter test results are displayed on the terminal screen, but they whip by pretty quickly so you may need to scroll up and make sure you see them all.

When there is a discrepancy in expected results the script either:

  • points you to a data file along side your ZOBOS where more failure details are available,

  • or all the details are displayed on screen.

Before the grader.sh terminates, it will show a summary of some specific rubric line results for the programming project.

The grader will use this very same script, possibly with some additional testing .src to check your submitted work. If your ZOBOS flies through without a hitch, you can likely suspect a good grade for the assignment.

Submit Your Work

Partners

You may complete this assignment in a group of two or three other students from the course (you may also work solo if you choose). Exams may have questions specific to course programming assignments — so be sure your group is truly working as a team and have a solid understanding of your submission's design and algorithms.

If you decide to complete this assignment as a group, you must tell your instructor one week before the assignment due date. Only one of you should submit the assignment for grading. You will all be able to see the grading result from your own logins.

Rubric

This work is worth 95 points.

Requirements

Points

Notes

Meets compilers course project requirements

10

Basic I/O and exit status requirements

10

exit status non-zero on unreadable input files or un-writable output files

Detect and report SYNTAX errors correctly

20

exit status should be non-zero when SYNTAX errors are detected

AST simplification of "raw" parse tree, output of AST graph information

20

emit symtable and symtable.dat requirements

15

ERROR class semantic issues properly detected and reported

10

exit status should be non-zero when ERROR issues are detected

WARN class semantic issues properly detected and reported

10

exit status should be zero when only `WARN' issues are detected

Assignments/ZOBOS (last edited 2024-04-21 14:01:43 by khellman)