Task
Your task is to write a program, words, that reports
information about the number of words read from its standard
input. For example, if the file qbf.txtcontains the text
the quick brown fox jumps
over the lazy dog
then running the program with is input redirected to come
from that file will report 9 words
$ words < qbf.txt
Total: 9
Automatic Testing
This task is available for automatic testing using the name words. You can run the demo using
demo wordsand you can test your work using try words <>.
Background
Use the following skeleton for the program:
int main(int argc, char** argv) {
enum { total, unique } mode = total;
for (int c; (c = getopt(argc, argv, "tu")) != -1;) {
switch(c) {
case ''t'': mode = total; break;
case ''u'': mode = unique; break;
}
}
argc -= optind;
argv += optind;
string word;
int count = 0;
while (cin >> word) {
count += 1;
}
switch (mode) {
case total: cout << "Total: " << count << endl; break;
case unique: cout << "Unique: " << "** missing **" << endl; break;
}
}
The getoptfunction (#include ) provides a standard way of handling option
values in command-line arguments to programs. It analyses the command-line parameters argc
and argv, looking for arguments that begin with ''-''. It then examines all such arguments for
specified option letters, returning individual letters on successive calls and adjusting the variable
optindto indicate which arguments it has processed. Consult getoptdocumentation for details.
In this case, the option processing code is used to optionally modify a variable that determines what
output the program should produce. By default, modeis set to total(indicating that it should
display the total number of words read). The getoptcode looks for the tand uoptions (which
would be specified on the command line as -tor -u) and overwrites the modevariable
accordingly. When there are no more options (indicated by getoptreturning -1), argcand
argvare adjusted to remove the option arguments that getopthas processed.
Computer Programming 2 Prac 4
Flinders University / School of Computer Science, Engineering, and Mathematics 1
Level 1: Unique words
Extend the program so that if run with the uoption (specified by the command-line argument
-u), it displays the number of unique words. To count unique words, use the STL vector
class to keep track of which words you''ve already seen. When each word is read, check to see
if it''s already in the vector; if it''s not, insert it. The number of unique words will then be the
size of the vector.
Level 2: Your Own Vector
Modify your code so that it does not use the STL classes. You''ll need to write your own class
to keep track of the list of words. At this level, you can assume that there will be no more that
1000 unique words, so you can use a statically allocated array to store the items.
A minimal STL-compliant class (which will minimise the changes you need to make to your
main program) would have an interface like this:
template class Vector {
public:
typedef T* iterator;
Vector();
iterator begin();
iterator end();
int size();
iterator insert(iterator position, const T& item);
private:
T items[1000];
int used;
};
Level 3: Individual Word Counts
Extend the program so that if run with the ioption it displays
the counts of individual words in alphabetical order. For
example the command
words-i < qbf.txt
should result in the output shown.
Your program will need to store two pieces of information for each
word. Define a struct called WordInfothat contains a string,
text, for the word''s text, and an int, count, for the number of times it''s been seen. Then
create a vector of WordInfoinstead of a vector of string. When you add a new word,
give it a count of 1. When you see a word that''s already in the vector, increment its count.
The simplest way to display the result in alphabetic order is to keep the vector in sorted order
as words are added, then simply iterate through the vector at the end. The insertion sort
algorithm is a suitable way to find where a new word should be inserted. Iterate through the
list until you find a word that''s alphabetically after the new word, then insert at that position.
Level 4: Large Data Sets
Make sure that your program works correctly (and efficiently!) even if it is run with large data
sets. Since you don''t know how large the collection of words might become, you''ll need to
make the vector grow dynamically. A suitable strategy is to allocate space for a small number
of items initially, and then check at each insert whether or not there''s still enough space.
When the space runs out, allocate a new block that''s twice as large, copy all of the old values
into the new space, then delete the old block.