Overview
For the first programming assignment you will work with your assigned partner to create a program that is able to "learn" human language and generate new sentences in the language. Its concept of language is somewhat limited: it knows only an estimate of the probability that a word w_t will follow a sequence of (n - 1) words w_{t-n+1}, w_{t-n+2}, ..., w_{t-1}. In the natural language processing community, these are known as "n-gram" models, and they have proven to be very useful. You will observe that the order of the model, n, has a big effect on the kinds of "sentences" it generates. When it learns from a collection of Grimm's fairy tales, it produces things like:
- When n = 2: i not upon the cat 'and as she was properly heated the peasants heard it grieves me to his nose to the tree but all that was about the stairs
- When n = 3: and the horse and rode into the jug at his cap on with the sack of meal that we should at least you might take a draught the maid 'what a maypole!' said she and was forced at last the boy 'or take yourself off out of her
- When n = 4: where have you been' said his father 'i have been travelling so long that i should like very well to find out where she is however' said the star-gazer as he looked through his maps but the castle was a garden and around it was a great deal better
- When n = 5: worse still she neglected to make the old woman's bed properly and forgot to shake it so that the boar jumped up and grunted and ran away roaring out 'look up in the tree there sits the one who is to blame' so they looked up and espied the wolf sitting amongst the branches and they called him a cowardly rascal and would not suffer him to come down till he was heartily ashamed of himself and had promised to be good friends again with old sultan the straw the coal and the bean in a village dwelt a poor old woman who had dim eyes could not see it and thought it very beautiful and said to himself 'i will not lose her this time' but however she again slipped away from him unawares and ran off towards home and as the prince followed her she jumped up into the pigeon-house and shut the door behind it and then the raging beast which was much too heavy and awkward to leap out of the window was made of fine black ebony and as she sat looking out upon the snow she pricked her finger and three drops of blood fell upon it
(The model also uses a special "start-of-sentence" token and a special "end-of-sentence" token to mark sentence boundaries.) The program "learns" by reading and gathering statistics from an input training text file.
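To make the idea of gathering statistics concrete, here is a minimal sketch of an n-gram model in Java. It is not the provided skeleton and does not follow its method signatures: the class name NGramSketch, its methods, and the boundary markers "<s>" and "</s>" are all hypothetical choices made for illustration. It also assumes the simplest possible estimate, namely that the probability of a word given its context is the number of times the word followed that context divided by the number of times the context was seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class NGramSketch {
    // Hypothetical boundary markers; the assignment's actual tokens may differ.
    static final String START = "<s>";
    static final String END = "</s>";

    private final int n;
    // Maps a context of (n - 1) words to the counts of each word seen after it.
    private final Map<List<String>, Map<String, Integer>> counts = new HashMap<>();

    public NGramSketch(int n) {
        if (n < 2) throw new IllegalArgumentException("n must be at least 2");
        this.n = n;
    }

    /** Records every n-gram in one whitespace-separated training sentence. */
    public void train(String sentence) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < n - 1; i++) words.add(START); // pad the left context
        String trimmed = sentence.trim();
        if (!trimmed.isEmpty()) words.addAll(Arrays.asList(trimmed.split("\\s+")));
        words.add(END);
        for (int t = n - 1; t < words.size(); t++) {
            List<String> context = new ArrayList<>(words.subList(t - n + 1, t));
            counts.computeIfAbsent(context, k -> new HashMap<>())
                  .merge(words.get(t), 1, Integer::sum);
        }
    }

    /** Relative-frequency estimate of P(word | context); 0 for an unseen context. */
    public double probability(List<String> context, String word) {
        Map<String, Integer> followers = counts.get(context);
        if (followers == null) return 0.0;
        int total = 0;
        for (int c : followers.values()) total += c;
        return followers.getOrDefault(word, 0) / (double) total;
    }

    /** Samples one sentence word by word until END is drawn or maxWords is reached. */
    public String generate(Random rng, int maxWords) {
        List<String> context = new ArrayList<>();
        for (int i = 0; i < n - 1; i++) context.add(START);
        List<String> out = new ArrayList<>();
        while (out.size() < maxWords) {
            Map<String, Integer> followers = counts.get(context);
            if (followers == null) break; // nothing was ever seen after this context
            int total = 0;
            for (int c : followers.values()) total += c;
            int r = rng.nextInt(total); // pick a follower in proportion to its count
            String next = null;
            for (Map.Entry<String, Integer> e : followers.entrySet()) {
                r -= e.getValue();
                if (r < 0) { next = e.getKey(); break; }
            }
            if (next.equals(END)) break;
            out.add(next);
            context.remove(0); // slide the (n - 1)-word window forward
            context.add(next);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        NGramSketch model = new NGramSketch(2);
        model.train("the cat sat");
        model.train("the cat ran");
        model.train("the dog sat");
        // "cat" followed "the" in 2 of the 3 training sentences.
        System.out.println(model.probability(Arrays.asList("the"), "cat"));
        System.out.println(model.generate(new Random(), 20));
    }
}
```

With a small n the context window is short, so the sampler can wander between unrelated sentences; with a larger n most contexts have only one or two observed followers, which is why the n = 5 sample above reads like long verbatim stretches of the training text.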
You will implement this functionality in the LanguageModel class; a skeleton .java file with several dummy methods has been provided to you. The pre- and post-conditions for each method are specified; it will be your job to complete all of the empty methods according to those conditions. You are free to add as many additional private helper methods as you would like. Now that you have a sense of what you will be doing, let's take a look at how you will be working on the assignment.
Program 1 Specifications
File and Directory Naming Requirements
- The writeup and all of your source code should be found directly in yourArchive/prog1, where yourArchive is replaced with the name of your actual zip archive.
- Your source code will be written in a Java file named LanguageModel.java (a skeleton of this file has been provided to you).
- Your writeup must be a plain-text file named writeup.txt.