The implementation must be in C or C++, as these are the languages most often recommended (for their combination of performance and user convenience) for implementing cryptographic solutions in the real world. The project comes with a minimal assignment and requires the submission of both software and a report, which will be graded mainly by the TAs according to the scoring criteria defined below; any additional work you perform, including a project presentation, will be considered extra credit (specifically, Extra Credit 3) if clearly identified in the report.
Project 1 Cryptanalysis of substitution ciphers:
This cryptanalysis project consists of a software implementation of an algorithm that tries to decrypt an L-symbol challenge ciphertext using a plaintext dictionary (containing a number q of English words or plaintexts obtained as a sequence of English words), with only partial knowledge of the encryption algorithm used and no knowledge of any keys involved. Your program's goal is to find the plaintext (as one of the dictionary plaintexts or as a sequence of English words from those used for the dictionary) within a reasonable amount of time.
Your program should take as input from stdin:
- the number t of key symbols,
- an L-symbol challenge ciphertext,
where each symbol is either a space or one of the 26 lower-case letters of the English alphabet (thus, not a special character, punctuation symbol or upper-case letter). Your program should return as output a guess for which L-symbol plaintext was encrypted, where again each symbol is either a space or one of the 26 lower-case letters. Two text files will be provided to you (as attachments at the top of this page), and you should feel free to use their content as part of your code: Dictionary1, containing a number u of L-symbol candidate plaintexts, and Dictionary2, containing a number v of English words.
Your executable file should be named "--decrypt". Upon execution, it should obtain the two inputs listed above from stdin, and then return the output plaintext on stdout within x minutes (otherwise its answer will default to an incorrect guess); most likely, we will choose x = 2.
Your program will be run using different parameters (most likely: L between 50 and 150, u between 100 and 150, v between 100 and 200, and t between 1 and 40), and on a number of challenge ciphertexts, each computed using a potentially different encryption scheme. Each ciphertext will be computed from a plaintext selected in one of the following two ways:
- randomly and independently choosing one of the L-symbol plaintexts in Dictionary1 or
- concatenating words randomly and independently chosen from Dictionary2 (any two words being separated by a space, until one has an L-symbol plaintext).
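To build your own test plaintexts of the second kind, the word-concatenation procedure can be sketched as below. The function name random_plaintext is hypothetical, and one detail is an assumption: when the last word overshoots L symbols, this sketch simply cuts the text at L symbols, which the description above does not confirm.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Concatenates words drawn uniformly and independently from `words`,
// separated by single spaces, until at least L symbols are available.
// ASSUMPTION: the result is then cut to exactly L symbols, so the last
// word may be truncated.
std::string random_plaintext(const std::vector<std::string>& words,
                             std::size_t L, std::mt19937& rng) {
    std::string text;
    if (words.empty()) return text;
    std::uniform_int_distribution<std::size_t> pick(0, words.size() - 1);
    bool first = true;
    while (text.size() < L) {
        if (!first) text += ' ';  // separator between consecutive words
        text += words[pick(rng)];
        first = false;
    }
    text.resize(L);
    return text;
}
```

Passing the generator by reference lets a test harness fix the seed, so that a failing challenge can be reproduced exactly.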
All the encryption schemes used have the following common features:
- The message space and ciphertext space are the set {<space>,a,..,z}^L. In other words, the message m can be written as m[1],...,m[L], where each m[i] is in {<space>,a,..,z}, and the ciphertext c can be written as c[1],...,c[L], where each c[i] is in {<space>,a,..,z}.
- The key space is the set {0,..,26}^t. In other words the key k can be written as k[1],...,k[t], where each k[j] is in {0,..,26}, for j=1,..,t.
- The encryption algorithm computes each c[i] as the (lexicographic) shift of m[i] by k[j(i)] positions, where the computation of each j(i) is left unspecified and may depend on i, t and L. In other words, each ciphertext symbol c[i] is the shift of the plaintext symbol m[i] by a number of positions equal to one of the key symbols; which key symbol is used is determined by an undisclosed, deterministic, non-key-based scheduling algorithm that is a function of i, t and L.
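The common features above can be illustrated with a small sketch. Several points in it are assumptions rather than facts from this description: the symbol order (space = 0, a = 1, ..., z = 26), the shift direction (addition modulo 27), and the schedule j(i) = i mod t used by encrypt_demo, which is a stand-in for testing only since the real schedule is undisclosed. All function names are hypothetical. The helper distinct_shifts relies on one consequence of the description that holds for any schedule: every per-position shift equals one of the t key symbols, so a correct candidate plaintext produces at most t distinct shifts.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

const int kAlphabet = 27;  // space plus 26 lower-case letters

// ASSUMPTION: space = 0, 'a' = 1, ..., 'z' = 26.
int to_index(char ch) { return ch == ' ' ? 0 : ch - 'a' + 1; }
char to_symbol(int idx) {
    return idx == 0 ? ' ' : static_cast<char>('a' + idx - 1);
}

// ASSUMPTION: a shift by k positions is addition modulo 27.
char shift_symbol(char m, int k) {
    return to_symbol((to_index(m) + k) % kAlphabet);
}

// Test-only encryption with the HYPOTHETICAL schedule j(i) = i mod t.
std::string encrypt_demo(const std::string& m, const std::vector<int>& key) {
    if (key.empty()) return m;
    std::string c(m.size(), ' ');
    for (std::size_t i = 0; i < m.size(); ++i) {
        c[i] = shift_symbol(m[i], key[i % key.size()]);
    }
    return c;
}

// Counts the distinct values of c[i] - m[i] (mod 27) over all positions.
// Both strings are expected to have the same length.
std::size_t distinct_shifts(const std::string& m, const std::string& c) {
    std::set<int> shifts;
    for (std::size_t i = 0; i < m.size() && i < c.size(); ++i) {
        shifts.insert((to_index(c[i]) - to_index(m[i]) + kAlphabet) % kAlphabet);
    }
    return shifts.size();
}

// A candidate plaintext can only be correct if it needs at most t shifts.
bool consistent_with_key(const std::string& m, const std::string& c,
                         std::size_t t) {
    return m.size() == c.size() && distinct_shifts(m, c) <= t;
}
```

This check is a necessary condition only, and it becomes uninformative once t reaches 27, because at most 27 distinct shifts exist. For larger t, a program would need additional evidence, such as how often each shift value occurs across positions.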
Your accompanying report should at least include the following sections:
- team member names; the list of project tasks performed by each student in the team; any modifications you made with respect to the above specifications
- a detailed explanation of the cryptanalysis approach used in your program
Allowed extensions (to be considered as extra credit) include any one among the following:
- a report section containing a brief (i.e., <= 2 pages) survey on substitution ciphers
- a report section containing a brief (i.e., <= 2 pages) survey on cryptanalysis approaches for substitution ciphers
- an approach to decrypt an arbitrary random English plaintext (that is, an L-symbol sequence of random English words separated by spaces, with no punctuation or upper-case letters), possibly using English words not in the dictionary files
- anything else you want to add