## page was renamed from Assignments/FromNFAtoScanning
## vim: nowrap tabstop=2 shiftwidth=2 expandtab textwidth=10000
#acl _Students:read
#acl All:read


<<LaTeXMathSetup()>>

{{{#!wiki comment
FIXME: Correct formatting of the NFA transition row table (last row)
FIXME: Use alphabet encoding, so NFAMATCH is immediately compatible to WRECK.
}}}

{{{#!wiki important
 1. The [[#compgrading|grader tarball]] is not yet available, but it will be soon.  If there are changes to this write-up they will just be clarifications; nothing substantial will change.
 1. Here is [[#|the change log|&action=info]] for this assignment write-up.  I will try to be descriptive in my
 log messages.
 1. You can also [[#|subscribe|&action=subscribe]] to this page and receive Emails when changes are made.
}}}


In this assignment will you will construct some core portions of a scanner and demonstrate your implementation's correctness.  The name of
your application must be `NFAMATCH` (case sensitive).

= Input =

`NFAMATCH` will accept two or more arguments:
||<tableclass="header"> Argument || Value     ||
|| 1                             || Path to the [[#NFAdef|NFA definition file]] ||
|| 2                             || Path to the optimized DFA transition table output file ||
|| 3, 4, $\ldots$             || Tokens to test for matching against the NFA ||

`NFAMATCH` should:
 1. read the NFA definition file,
 1. convert the NFA to a DFA,
 1. optimize the DFA (including dead state and unreachable state pruning),
 1. output the optimized DFA transition table ($T$) to the path name of the command line second argument.
 1. and finally test the remaining tokens on the command line to determine if they are an element of the DFA's '''regular set'''

 If a token '''is part of the DFA's regular set''', your `NFAMATCH` should emit an (`:M:`), otherwise `NFAMATCH` should
 emit the first '''character position''' ($1,2,3,\ldots$, ''not an index!'') that fails the DFA.  If the token is not long enough
 to match the DFA, emit '''one more than the length of the token'''.  If the token is an empty string which does not match the DFA, then 
 a `0` should be emitted.  These last few "special cases" are consistent with reporting the character position of match failure.

== NFA Definition File ==
An NFA definition file is line oriented, all fields are white space delimited. The first line contains three or more fields:

||<tableclass="header"> Field || Description || Format ||
|| 1     || Number of states in the NFA || Simple decimal positive integer ||
|| 2     || $\lambda$ character  || The special character representing a $\lambda$ transition in file ||
|| 3,4,$\ldots$ || NFA alphabet $\Sigma$ || Whitespace separated simple printable ASCII characters in '''ascending ASCII order''' ||

{{{#!wiki important
The '''order''' of characters in the alphabet's definition is '''important'''.  Your optimized DFA's transition table ''must use the same ordering
for its columns!''
}}}

Subsequent lines represent transitions between two states; these lines contain 3 or more fields.  State $0$ is implicitely the '''starting state'''.

||<tableclass="header"> Field || Description || Format ||
|| 1  || Normal (`-`) or accepting state `+` flag for the '''From''' field || Simple decimal non-negative integer ||
|| 2  || '''From''' state id || Simple decimal non-negative integer ||
|| 3  || '''To''' state id || Simple decimal non-negative integer ||
|| 4,5,$\ldots$ || White space delimited transition characters from $\Sigma$ or the $\lambda$ character for this NFA ||

There is only one transition line permitted between any two states.  Empty lines in an NFA definition file should be ignored,
there is no commentary symbols or syntax.  All transition lines must have at least 3 fields.

{{{#!wiki tip
Your code is '''not expected''' to detect flaws or inconsistencies in an NFA definition file.  Your code should abort with an exit status of `1`
if an NFA file cannot be opened or is devoid of data (all lines are empty lines).  Otherwise you may assume the NFA file is good to go.
}}}

=== NFA Definition Example ===
If we let `P` be $\Pi=\Sigma - \{ / \ast \}$, then this is an NFA definition file for the <<DjHR(show_chl-comment.pdf,C/C++ block comment RE)>> RE we've discussed in lecture:

{{{
13 # * / P
- 0 1 /
- 1 2 *
- 2 3 #
- 2 5 #
- 2 7 #
- 2 10 #
- 3 4 /
- 4 10 #
- 5 6 P
- 6 10 #
- 7 8 *
- 8 8 *
- 8 9 P
- 9 10 #
- 10 2 #
- 10 11 *
- 11 11 *
- 11 12 /
+ 12 12
}}}

{{attachment:cblock-inputx.svg|C/C++ block comment|width="70%", align="right"}}

== Tokens ==
A token will be a sequence of non-whitespace containing characters from the NFA's alphabet ($\Sigma$).  All the characters of a token must be matched
in order for it to be considered a successful match against the DFA.

= Output =
You will report the following values from your application `NFAMATCH` on the process `stdout` (in this order, and according to the course [[Assignments/Requirements|submission requirements]]):

  1. The match results (non-negative integers) for command line provided arguments 3, 4, $\ldots$

An example of a successful run of your `NFAMATCH` might look like this at a Linux console (the … represents arbitrary console messages that will be ignored
by the grader script):
{{{
$ ls
NFAMATCH  cblock.nfa
$ ./NFAMATCH cblock.nfa _ttable.dat  '/*PPPPPP*/'  '*/**/'  ""  '/*******/'  '/*P/*/'  '/*PP'
…
OUTPUT :M:
…
OUTPUT 1
…
OUTPUT 0
… 
OUTPUT :M:
… 
OUTPUT :M:
… 
OUTPUT 5
…
$ cat _ttable.dat
- 0 E 1 E
- 1 2 E E
- 2 3 2 2
- 3 3 4 2
+ 4 E E E
}}}

The match results could also be written on one line:
{{{
OUTPUT :M: 1 0 :M: :M: 5
}}}

(And merely for completeness sake)

{{attachment:cblock-optdfax.svg|C/C++ block optimized|width="70%", align="center"}}

= Examples =
<<Anchor(examples)>>Data file tarballs of this <<DjHR(cblock.tar.bz2,C/C++ block comment)>> (<<DjHR(show_chl-comment.pdf,group question)>>) and other examples (<<DjHR(3_id.tar.bz2,three underscored identifiers)>>,
<<DjHR(float.tar.bz2,floating points)>>) that have been part of previous LGAs and we ''may have'' discussed in lecture are available.


= grader.sh =
<<Anchor(compgrading)>> I am providing to students the same tarball the grader will use for testing your `NFAMATCH`.  
is how to use it (a [[https://youtu.be/i6CnkWa8CcAHere|video is posted here]]):

First, download <<DjHR(nfamatch-student.tar.bz2,this tarball)>> to your Mines Linux account ("alamode" machines!) and unroll it in a temporary directory.
{{{
$ ls nfamatch-student.*
nfamatch-student.tar.bz2
$ mkdir ~/tmp
$ cd ~/tmp
$ tar xjf ../nfamatch-student.tar.bz2
:
$ ls
nfamatch/
}}}

Second, set the COMPGRADING environmental variable with:
{{{
$ source ~khellman/COMPGRADING/setup.sh ~khellman/COMPGRADING
}}}

Now go to the directory holding your `NFAMATCH` and execute the `grader.sh` script from the `nfamatch-student.tar.bz2` resource.
{{{
$ cd ~/compilers/nfaMatch
$ ls NFAMATCH
NFAMATCH
$ ~/tmp/nfamatch/grader.sh
:
:
:
}}}

You will need to read any messages from the script carefully, and hit `ENTER` several 
times throughout its course.  This script checks for:
 a. missing data files
 a. truncated data files
 a. and the difference between your `NFAMATCH` results and expected results.
The latter test results are displayed on the terminal screen, but they whip by pretty quickly so you 
may need to scroll up and make sure you see them all.  

When there is a discrepancy in expected results the script either:
 * points you to a data file along side your `NFAMATCH` where more failure details are available, 
 * or all the details are displayed on screen.

Before the `grader.sh` terminates, it will show a summary of some specific rubric line results for the programming project.

{{{#!wiki important
The grader will use this very same script, along with some additional testing data to check your submitted work.  If your `NFAMATCH` flies
through without a hitch, you can likely suspect a good grade for the assignment.
}}}
{{{#!wiki tip
In the `nfamatch-student.tar.bz2` tarball are also PDFs of initial `.nfa` files as well as their optimized DFAs, I find these much easier
to "visualize" than the textual `.nfa` input files or the transition table output files.  You might like them as well.
}}}


= Submit Your Work =
<<Include(../PartnersAllowed)>>

<<Include(../SubmitOnDjComp)>>

= Rubric =
This work is worth 60 points.
/* See assignments/nfamatch/.AddRubricSubcats.sh */
||<tableclass="rubric">'''Requirements'''                            || '''Points''' || '''Notes''' ||
|| Meets [[Assignments/Requirements|compilers course project requirements]]  ||  10 ||  ||
|| Exit status of `1` when NFA definition file not found                     ||  5  ||  ||
|| Exit status of `1` if NFA definition does not contain data                ||  5  ||  ||
|| DFAs pruned of dead states                                                ||  5  ||  ||
|| DFAs pruned of unreachable states                                         ||  5  ||  ||
|| DFAs optimized to minimal number of nodes                                 ||  10 ||  ||
|| Token matching                                                            ||  20 ||  ||