Implement a letter bigram model that learns letter bigram probabilities from the training data. Train a separate bigram model for each language.
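
As a rough illustration of what such a model might look like, here is a minimal Python sketch. It assumes the training data has already been read into a list of sentence strings (the file format is not specified here), and it uses plain relative-frequency estimates with no smoothing, so any bigram not seen in training gets probability 0. The name train_bigram_model is hypothetical.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Return a dict mapping (prev, cur) letter pairs to P(cur | prev).

    Assumption: unsmoothed relative-frequency estimates, i.e.
    P(cur | prev) = count(prev, cur) / count(prev as a left context).
    """
    bigram_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        # Pair each character with the character that follows it.
        for prev, cur in zip(sentence, sentence[1:]):
            bigram_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }

# Toy training data standing in for one language's training file.
model = train_bigram_model(["abab", "abb"])
```

In this toy example "a" is always followed by "b", so P(b | a) is 1, while "b" is followed by "a" once and by "b" once, so each gets 0.5. One such dictionary would be built per language.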
Apply the models to determine the most likely language for each sentence in the test file: compute the probability of each sentence under each of the three language models, and choose the language whose model assigns the highest probability. Compare your output file with the solution file (LangId.sol). How many sentences did your program classify correctly?
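
The scoring and comparison step could be sketched as follows. This is only an illustration under several assumptions: each model is a dict mapping (prev, cur) letter pairs to P(cur | prev), the language labels "A", "B", "C" and the toy probabilities are made up, log probabilities are summed to avoid numeric underflow on long sentences, and an unseen bigram gives the whole sentence probability 0. The formats of the test file and of LangId.sol are not specified here, so the sketch works on in-memory lists instead of files.

```python
import math

def sentence_log_prob(sentence, model):
    """Sum of log P(cur | prev) over all adjacent letter pairs."""
    total = 0.0
    for pair in zip(sentence, sentence[1:]):
        p = model.get(pair, 0.0)
        if p == 0.0:
            # Unseen bigram in an unsmoothed model: probability 0.
            return float("-inf")
        total += math.log(p)
    return total

def identify_language(sentence, models):
    """Return the label whose model scores the sentence highest.

    If several models tie (for example, all give -inf), the first
    label in the dict wins.
    """
    return max(models, key=lambda lang: sentence_log_prob(sentence, models[lang]))

def count_correct(predictions, gold):
    """Number of positions where the prediction matches the solution."""
    return sum(p == g for p, g in zip(predictions, gold))

# Hypothetical toy models for three languages.
models = {
    "A": {("a", "b"): 1.0, ("b", "a"): 0.5, ("b", "b"): 0.5},
    "B": {("a", "a"): 0.9, ("a", "b"): 0.1, ("b", "a"): 1.0},
    "C": {("a", "a"): 0.5, ("a", "b"): 0.5, ("b", "a"): 0.2, ("b", "b"): 0.8},
}

test_sentences = ["aba", "aab"]
gold_labels = ["A", "B"]  # stand-in for the labels in the solution file

predictions = [identify_language(s, models) for s in test_sentences]
correct = count_correct(predictions, gold_labels)
```

Because a single unseen bigram drives a sentence's score to negative infinity, an unsmoothed model can rule out a language entirely on the basis of one letter pair; that behaviour is worth keeping in mind when interpreting the accuracy count.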