Foundations of Cybersecurity Project: Anti-virus
Description and Deliverables - In this project, you will gain hands-on experience with a core technique in defensive cybersecurity: signature matching. You will develop a simple anti-virus that (1) creates signatures that match known malware, and then (2) examines unknown binaries to determine whether they contain a malware signature. You will be provided with malware and benign binaries to help train your anti-virus.
To receive full credit for this project, you will turn in (at least) three things:
1. A program named av-train that analyzes some given binaries and produces signatures of malware.
2. A program named av-detect that analyzes some given binaries and determines whether or not each one matches a malware signature.
3. A Makefile that compiles your two programs (or is empty and does nothing, if you're using a language that doesn't require compilation).
Goals and Datasets - In this assignment, your goal is to develop a complete anti-virus system that maximizes true positives (malware that is detected) and true negatives (benign binaries that are not flagged), while minimizing false negatives (malware that is missed) and false positives (benign binaries that are mistaken for malware). You will develop two programs, av-train and av-detect: the former creates signatures from known binaries, and the latter uses those signatures to classify unknown binaries.
To achieve these goals, we have produced four datasets:
- safe_pub.tar.gz: 3673 benign binaries (true negatives). Your anti-virus should never detect one of these binaries as malware (false positive).
- malware_pub.tar.gz: 1360 malware binaries. Your anti-virus should create signatures from these binaries. It should also be able to detect all of them as malware (true positives) and miss none of them (false negatives).
- safe_priv.tar.gz: An unknown number of benign binaries that we will use to evaluate your anti-virus.
- malware_priv.tar.gz: An unknown number of malware binaries that we will use to evaluate your anti-virus.
In other words, you will use the two public datasets to develop, debug, and test your anti-virus system. In turn, we will evaluate and grade your system based on the two private datasets.
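When testing against the public datasets, it can help to tally the four outcomes named above. The following is a minimal sketch, not part of the required deliverables; the function name confusion_counts and the use of the strings "MALWARE" and "SAFE" as ground-truth labels are assumptions made for illustration.

```python
from collections import Counter


def confusion_counts(truth, predicted):
    """Tally TP/TN/FP/FN from two parallel lists of 'MALWARE'/'SAFE' labels.

    truth     -- the known label of each binary (hypothetical input)
    predicted -- the verdict your anti-virus printed for the same binary
    """
    counts = Counter()
    for actual, guess in zip(truth, predicted):
        if actual == "MALWARE":
            # Malware that was flagged is a true positive, otherwise a miss.
            counts["TP" if guess == "MALWARE" else "FN"] += 1
        else:
            # A benign binary that was flagged is a false positive.
            counts["FP" if guess == "MALWARE" else "TN"] += 1
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}
```

On the public data, an ideal result would show zero for both FP and FN.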
av-train -
The first program you will develop is av-train. This program takes three parameters as input: (1) a directory containing malware binaries, (2) a directory containing benign binaries, and (3) the name of the file to which it will write the set of malware signatures derived from the given directory of malware. Your goal is to produce signatures that maximize true positives and true negatives, while minimizing false positives and false negatives.
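The assignment does not prescribe how signatures are derived. As one possible starting point, the sketch below treats a signature as a fixed-length byte sequence that occurs in a malware binary but in none of the benign binaries. The 8-byte length, the hex-per-line signature file format, and all function names are assumptions chosen for illustration. Holding every benign byte sequence in memory may not scale to the full dataset.

```python
import os
import sys

NGRAM = 8  # assumed signature length in bytes


def ngrams(data, n=NGRAM):
    """Return the set of all n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def read_dir(path):
    """Yield the contents of every regular file directly inside path."""
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            with open(full, "rb") as fh:
                yield fh.read()


def train(malware_blobs, benign_blobs, n=NGRAM):
    """Pick, per malware sample, one n-gram never seen in benign data."""
    benign = set()
    for blob in benign_blobs:
        benign |= ngrams(blob, n)
    signatures = set()
    for blob in malware_blobs:
        candidates = ngrams(blob, n) - benign
        # Skip samples already covered by an earlier signature.
        if candidates and not (candidates & signatures):
            signatures.add(min(candidates))  # deterministic choice
    return signatures


def main(argv):
    """Entry point: av-train MALWARE_DIR BENIGN_DIR SIGNATURE_FILE."""
    if len(argv) != 4:
        print("usage: av-train MALWARE_DIR BENIGN_DIR SIGNATURE_FILE",
              file=sys.stderr)
        return 1
    signatures = train(read_dir(argv[1]), read_dir(argv[2]))
    with open(argv[3], "w") as out:
        for sig in sorted(signatures):
            out.write(sig.hex() + "\n")  # one hex-encoded signature per line
    return 0
```

A real script would finish by calling sys.exit(main(sys.argv)). A malware sample whose every byte sequence also appears in some benign binary gets no signature under this scheme, so it would be missed.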
av-detect -
The second program you will develop is av-detect. This program takes at least one, and possibly more, command line parameters:
$ ./av-detect [signature file] [unknown binary 1] [unknown binary 2] ... [unknown binary n]
The first parameter is the signature file produced by your av-train program. All of the other parameters are unknown binaries: for each given unknown binary, your av-detect program should print to STDOUT (1) the name of the file and (2) whether it is "MALWARE" or "SAFE". Note that the first parameter (the signature file) is required; the list of unknown binaries is not required, and can be of any length.
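The matching step can be sketched as follows. This is a minimal illustration under stated assumptions: the signature file is taken to hold one hex-encoded byte sequence per line, the output puts the file name and the verdict on one line separated by a space, and the function names are hypothetical. None of these details is fixed by the assignment.

```python
import sys


def load_signatures(path):
    """Read one hex-encoded signature per line (assumed file format)."""
    with open(path) as fh:
        return [bytes.fromhex(line.strip()) for line in fh if line.strip()]


def classify(data, signatures):
    """Return 'MALWARE' if any signature occurs in data, else 'SAFE'."""
    return "MALWARE" if any(sig in data for sig in signatures) else "SAFE"


def main(argv):
    """Entry point: av-detect SIGNATURE_FILE [BINARY ...]."""
    if len(argv) < 2:
        print("usage: av-detect SIGNATURE_FILE [BINARY ...]", file=sys.stderr)
        return 1
    signatures = load_signatures(argv[1])
    # The list of binaries may be empty; then nothing is printed.
    for name in argv[2:]:
        with open(name, "rb") as fh:
            print(name, classify(fh.read(), signatures))
    return 0
```

A real script would finish by calling sys.exit(main(sys.argv)). Checking each signature with a separate substring search is simple but slow when there are many signatures.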