Scanning represents a common class of algorithms that is widely used by programmers. In this assignment you will write a scanner. The project will utilize your knowledge of finite state machines and text files.
Scanners process test by translating a stream of input characters into tokens. Each token represents a contiguous group of characters. For example, the scanner in a compiler would input the following line of Java
while (myInt > 1000)
and discover the following six tokens:
1) the keyword while
2) the left paren
3) the variable name myInt
4) the less than symbol
5) the integer 1000
6) the right paren
Write a scanner that translates logical expressions. There are many different notations for logical expressions. For example, Java uses the constants "true" and "false", while philosophers often abbreviate these as "T" and "F" and Boolean algebraists use "1" and "0". Your scanner is going to produce logical expressions in a compact form that uses one symbol per constant and one symbol per operator. Below is a table of all of the allowable output symbol tokens, and associated logical semantics, for your scanner.
Token
|
Meaning
|
T
|
logical value of TRUE
|
F
|
logical value of FALSE
|
~
|
NOT operator
|
^
|
AND operator
|
V
|
inclusive OR operator
|
x
|
exclusive or (XOR) operator
|
>
|
IMPLIES operator
|
=
|
EQUIVALENCE operator
|
(
|
a left paren
|
)
|
a right paren
|
?
|
this symbol denotes a syntax error
|
The program needs to be more flexible in the acceptable form of the syntax that is translated. FOr example, unlimited blanks are permitted (but not required) before, after or between tokens. No blanks will result in the output. The permissible input tokens prior to translation are explained below:
TRUE value -- any unsigned integer with an odd value
FALSE value -- any unsigned integer with an even value
NOT -- #
!
AND -- &
*
OR -- +
XOR -- <>
!=
IMPLIES -- =>
any string of zero or more consecutive dashes (-) followed by a > symbol
EQUIVALENCE -- =
==
left paren - (
right paren -- )
Note that all of the preceding notations are used in some settings for the associated tokens, but these particular tokens were selected in order to work better with FSMs. Anything other than what was described should be considered incorrect syntax and result in a ? symbol for a token.
You will be graded largely on how effectively you utilize a finite state machine and the associated table-driven code to solve this problem. You should view the problem as an FSM that scans one character at a time in order to transition from state to state. Each state transition has one of three associated actions:
1) add another token to the output string
2) read another character
3) add another token to the output string AND read another character.
Since the "add another token" action does not always add the same token, it is helpful to view your FSM as using three, rather than two, tables -- namely, a next state table, an action table and an output token table. Also, when building these tables you can take advantage of the fact that all valid input characters fall in the range from a blank to the greater than character.
Your final submission also needs to follow these guidelines:
1) You are reuired to submit a graphical representation (states with labeled arcs) to describe your FSM.
2) The FSM tables must be read from a text file that is included with your src folder files when you submit the program.
3) The input string to be tokenized must be read from a text file and you must use a JFileChooser object to allow the user to select this input file.
4) The output sequence of tokens must be displayed in a JFrame that pops upt when your program completes the scan.