One of the biggest challenges with log systems, such as those that store Intrusion Detection System (IDS) data, is the amount of disk space they require. For example, a message like "User1 login success" contains 19 characters. At 8 bits per character that is 19 x 8 = 152 bits, or 19 bytes. Storing records of this kind continuously (for example, every event, every hour) can accumulate gigabytes of data, making log files hard to archive.
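As a quick check of the arithmetic, here is a minimal Python sketch. It assumes one byte (8 bits) per ASCII character, which is how the 19-character message is counted above.

```python
# Size of one textual log record, assuming 1 byte (8 bits) per ASCII character
message = "User1 login success"
chars = len(message)                        # number of characters
bits = chars * 8                            # 8 bits per character
size_bytes = len(message.encode("ascii"))   # bytes actually needed in ASCII
print(chars, bits, size_bytes)              # 19 152 19
```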
Instead, a numeric encoding is proposed so that the information can be stored more compactly. For example, the previous message could be represented as UserId = 1, Event = 0, and Outcome = 1. A record in the file is then the tuple <1, 0, 1>, stored as a bit stream of 0s and 1s.
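The mapping from text to numbers can be sketched as a pair of lookup tables. This is a minimal illustration, assuming each code is the position of the value in the Event and Outcome definitions of this exercise (login = 0, access = 1, ..., Fail = 0, Success = 1); the names `EVENTS`, `OUTCOMES`, and `to_tuple` are hypothetical.

```python
# Hypothetical lookup tables: the index of each word is its numeric code
EVENTS = ["login", "access", "logout", "download", "upload"]
OUTCOMES = ["fail", "success"]

def to_tuple(message: str) -> tuple[int, int, int]:
    """Turn a record like 'User1 login success' into <UserId, Event, Outcome>."""
    user, event, outcome = message.split()
    user_id = int(user.removeprefix("User"))   # 'User1' -> 1 (Python 3.9+)
    return user_id, EVENTS.index(event.lower()), OUTCOMES.index(outcome.lower())

print(to_tuple("User1 login success"))  # (1, 0, 1)
```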
Given the following definitions:
•User = {id | 0 <= id <= 500}
•Event = {login, access, logout, download, upload}
•Outcome = {Fail, Success}
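For reference, a field that must hold n distinct values needs ceil(log2(n)) bits. The Python sketch below computes this with the integer expression `(n - 1).bit_length()`, which gives the same result as ceil(log2(n)) for n >= 2 without floating-point rounding; the dictionary `field_sizes` is illustrative only.

```python
# Number of distinct values each field must hold (ids 0..500 -> 501 values)
field_sizes = {"User": 501, "Event": 5, "Outcome": 2}

# Minimum bits for n values: ceil(log2(n)) == (n - 1).bit_length() for n >= 2
widths = {name: (n - 1).bit_length() for name, n in field_sizes.items()}
print(widths)                # {'User': 9, 'Event': 3, 'Outcome': 1}
print(sum(widths.values()))  # 13
```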
Use the minimum number of bits to represent the data. Also give a procedure to read and write a record (hint: use AND, OR, and shift operations). Finally, use this approach to:
•Interpret the record (15B7)16, i.e., hexadecimal 15B7, by writing it as a textual event (hint: recall the earlier example "User1 login success")
•Encode the phrase "User157 login fail" in the proposed format
•Compute and compare the size of 1 million records in the previous textual format (around 20 bytes per record) and in the proposed format.
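One possible worked solution is sketched below in Python. It assumes a 13-bit record with the fields in the same order as the tuple <UserId, Event, Outcome>, User in the most significant bits; this layout, and the names `write_record`, `read_record`, and `to_text`, are illustrative assumptions, and a different field order would decode (15B7)16 differently. Writing shifts each field into place and combines them with OR; reading shifts each field down and isolates it with an AND mask.

```python
# Assumed 13-bit layout (most significant bit first), one possible choice:
#   bits 12..4 : User    (9 bits, 0..500)
#   bits  3..1 : Event   (3 bits, 0..4)
#   bit      0 : Outcome (1 bit,  0..1)
EVENTS = ["login", "access", "logout", "download", "upload"]
OUTCOMES = ["fail", "success"]
USER_SHIFT, EVENT_SHIFT = 4, 1
USER_MASK, EVENT_MASK, OUTCOME_MASK = 0x1FF, 0x7, 0x1

def write_record(user: int, event: int, outcome: int) -> int:
    """Write: shift each field into place and combine with OR."""
    if not (0 <= user <= 500 and 0 <= event < len(EVENTS) and 0 <= outcome < len(OUTCOMES)):
        raise ValueError("field out of range")
    return (user << USER_SHIFT) | (event << EVENT_SHIFT) | outcome

def read_record(record: int) -> tuple[int, int, int]:
    """Read: shift each field down and isolate it with an AND mask."""
    user = (record >> USER_SHIFT) & USER_MASK
    event = (record >> EVENT_SHIFT) & EVENT_MASK
    outcome = record & OUTCOME_MASK
    return user, event, outcome

def to_text(record: int) -> str:
    """Decode a record back into the textual form 'User<id> <event> <outcome>'."""
    user, event, outcome = read_record(record)
    return f"User{user} {EVENTS[event]} {OUTCOMES[outcome]}"

# 1) Interpret (15B7)16
print(to_text(0x15B7))  # User347 download success

# 2) Encode "User157 login fail"
encoded = write_record(157, EVENTS.index("login"), OUTCOMES.index("fail"))
print(hex(encoded), format(encoded, "013b"))  # 0x9d0 0100111010000

# 3) Size of 1 million records
N = 1_000_000
text_bytes = N * 20          # previous format, about 20 bytes per record
packed_bytes = N * 13 // 8   # 13-bit records packed back to back
padded_bytes = N * 2         # each record padded to a 16-bit word
print(text_bytes, packed_bytes, padded_bytes)  # 20000000 1625000 2000000
```

Under these assumptions, packing records back to back takes about 1.6 MB instead of 20 MB (roughly 12 times smaller); padding each record to 2 bytes wastes 3 bits per record but keeps records byte-aligned and still gives a 10x reduction.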