Programming and Data Structures Assignment: Remove Duplicated Records
1 Introduction
You will write a C++ program that reads records from a plain text file, removes duplicated records, and writes the unique ones to another plain text file. You may use any algorithm and data structures you prefer, as long as the results are correct. It is preferred, but not required, that your algorithm be as efficient as possible in both processing time and memory use.
2 Input and Output Specification
The input is one text file containing 0 to 10000 records. Each record is enclosed in a pair of braces {} and does not itself contain '{' or '}', but it may contain other symbols such as spaces, commas, colons, single or double quotes, and so on. In both the input and output files, each line ends with a '\n' character. A record in the input file is not necessarily confined to one line. In the output, however, each unique record must appear on its own line without any spaces. The output records may be in any order. A trailing empty line at the end of the output file is optional.
Examples of input and output files:
input1.txt
{id:1234567,first:Mary,last:Green} {id:1234568, first:Peter, last:Morgan} {id:1234567, first:Mary, last:Green}
input2.txt
{id:1234567,
first:Mary,last:Green,GPA:4.0} {id:1234568, first:Peter,
last:White , GPA:3.8}
{id:1234567, first:Mary, last:Green, GPA:3.9}
output1.txt
{id:1234567,first:Mary,last:Green}
{id:1234568,first:Peter,last:Morgan}
output2.txt
{id:1234567,first:Mary,last:Green,GPA:4.0}
{id:1234568,first:Peter,last:White,GPA:3.8}
{id:1234567,first:Mary,last:Green,GPA:3.9}
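The rules in this section could be sketched roughly as follows. This is only one possible approach, not a required design: the function name uniqueRecords is hypothetical, and the sketch assumes that "without any spaces" means every whitespace character inside a record is dropped, and that anything outside a pair of braces can be ignored. It keeps the first occurrence of each record, which is allowed because the output may be in any order.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: extracts every {...} record from `text`, drops all
// whitespace inside it, and returns the distinct records in order of first
// appearance. Characters outside braces are skipped.
std::vector<std::string> uniqueRecords(const std::string& text) {
    std::vector<std::string> result;
    std::unordered_set<std::string> seen;  // normalized records already emitted
    std::string current;
    bool inside = false;                   // true while between '{' and '}'

    for (char ch : text) {
        if (ch == '{') {
            inside = true;
            current = "{";
        } else if (ch == '}' && inside) {
            current += '}';
            inside = false;
            // insert().second is true only the first time a record is seen
            if (seen.insert(current).second) {
                result.push_back(current);
            }
        } else if (inside && !std::isspace(static_cast<unsigned char>(ch))) {
            current += ch;
        }
    }
    return result;
}
```

Because the scan works on characters rather than lines, records that span several lines, leading spaces, and empty lines need no special handling. Writing each returned string followed by '\n' would produce output in the required shape.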
3 Program Specification
The main program should be called "removeduplicated". Call syntax is as follows (from the OS prompt):
./removeduplicated input=input1.txt output=output1.txt
Note that the file names will not necessarily be the same on every run, so your program must take them from the command-line arguments rather than hard-coding them.
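One way to read the input= and output= arguments is sketched below. The helper name parseArgs is hypothetical, and the sketch assumes every argument has the form key=value; arguments without an '=' are simply ignored.

```cpp
#include <map>
#include <string>

// Hypothetical helper: splits each "key=value" command-line argument into a
// map entry. argv[0] (the program name) is skipped, and arguments that do
// not contain '=' are ignored.
std::map<std::string, std::string> parseArgs(int argc, const char* const argv[]) {
    std::map<std::string, std::string> args;
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        const std::string::size_type eq = arg.find('=');
        if (eq != std::string::npos) {
            args[arg.substr(0, eq)] = arg.substr(eq + 1);
        }
    }
    return args;
}
```

From main(int argc, char* argv[]), the call parseArgs(argc, argv) would return a map in which the keys "input" and "output" hold the two file names. Checking that both keys are present before opening any file avoids surprises when the program is started with missing arguments.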
4 Requirements
• Homework is individual. Your homework will be automatically screened for code plagiarism against code from other students and from external sources. If you copy or download source code from the Internet or a book, it is better to acknowledge it in your comments than to have the TAs detect it. Code that is detected to be copied from another student (for instance, by renaming variables, changing for and while loops, changing indentation, etc.) will result in a "Fail" in the course and a report to UH upper administration.
• You can develop your program on any C++ compiler (MS Visual C++, Borland C++, Intel C++), BUT you must test your program in GNU C++. The TAs have no obligation to test your program on any compiler other than GNU C++.
• Don't forget to comment your code, especially complicated code. You must include a short summary of your main functions in the main cpp file.
• In input files, lines might begin with spaces, and there might be empty lines, lines containing only spaces, and so on. Your program needs to handle these cases robustly. It should not crash, halt unexpectedly, or produce unhandled exceptions.
• A program that compiles earns 10 points. Your program will then be tested with 9 test cases (10 points each), going from easy to difficult.
• Correctness is more important than speed. Err on the side of caution: it is better to submit a slow program that works correctly than a fast one that fails in many cases.
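As a small illustration of the robustness requirement, the sketch below reads an entire file into memory and reports failure through its return value instead of crashing when the file cannot be opened. The function name readFile is hypothetical, and loading the whole file at once is an assumption that seems reasonable for at most 10000 records.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file at `path` into `contents`.
// Returns false (leaving `contents` untouched) if the file cannot be opened,
// so the caller can print a message and exit cleanly.
bool readFile(const std::string& path, std::string& contents) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copies every byte, including blank lines
    contents = buffer.str();
    return true;
}
```

An empty file simply yields an empty string, which corresponds to the 0-record case and should lead to an empty output file.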
Attachment:- Input-Output-Files.rar