Read the following case study. Then, write an essay responding to the question prompts.
• The essay should follow APA format (i.e., cover page, in-text citations, references page, etc.).
• You need at least two sources (the book counts!).
• Make sure to provide correct in-text citations when necessary.
• Essays should be approximately two pages in length.
Case Study: Fail Away with Dynamo, Bigtable, and Cassandra
During its holiday season, Amazon.com receives nearly 200 order items per second! To support such an enormous workload, it processes customer transactions on tens of thousands of servers. Unfortunately, with that many computers, failure is inevitable. Even if the probability of any one server failing is .0001, the likelihood that not one of 10,000 servers fails is .9999 raised to the 10,000th power, which is about .37. Thus, under these assumptions, the likelihood of at least one failure is 63 percent. For reasons that go beyond the scope of this discussion, the likelihood of failure is actually much greater.
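The arithmetic in the preceding paragraph can be checked with a few lines of Python. This is a minimal sketch that assumes, as the paragraph does, that each of the 10,000 servers fails independently with probability .0001.

```python
# Assumes independent failures, as in the case study's simplified example.
SERVERS = 10_000
P_FAIL = 0.0001

p_none_fail = (1 - P_FAIL) ** SERVERS  # every single server survives
p_at_least_one = 1 - p_none_fail       # complement: one or more failures

print(f"P(no failures)          = {p_none_fail:.2f}")
print(f"P(at least one failure) = {p_at_least_one:.2f}")
```

Because (1 - p)^n is close to e^(-np) when p is small, the first result lands near 1/e, which is why the figure comes out at about .37.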
Amazon.com must be able to thrive in the presence of such constant failure. Or, as Amazon.com engineers put it: "Customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados."
The only way to deal with such failure is to replicate the data on multiple servers. When a customer stores a Wish List, for example, that Wish List needs to be stored on different, geographically separated servers. Then, when (notice when, not if) a server with one copy of the Wish List fails, Amazon.com applications obtain it from another server.
Such data replication solves one problem but introduces another. Suppose that the customer's Wish List is stored on servers A, B, and C and server A fails. While server A is down, server B or C can provide a copy of the Wish List, but if the customer changes it, that Wish List can only be rewritten to servers B and C. It cannot be written to A because A is not running. When server A comes back into service, it will have the old copy of the Wish List. The next day, when the customer reopens his or her Wish List, two different versions exist: the most recent one on servers B and C and an older one on server A. The customer wants the most current one. How can Amazon.com ensure that it will be delivered? Keep in mind that 9 million orders are being shipped while this goes on.
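One way to decide which copy is current is to attach version information to every copy. The Dynamo design published by Amazon.com engineers uses vector clocks for this purpose; the sketch below is a simplified, hypothetical illustration of that idea (the names `Version`, `bump`, `descends`, and `reconcile` are invented here) and not Amazon.com's code. Each copy records how many updates each server has coordinated, and a copy is discarded when another copy has seen all of its updates.

```python
from dataclasses import dataclass
from typing import Dict, List

Clock = Dict[str, int]  # server name -> number of updates it coordinated


@dataclass
class Version:
    items: List[str]
    clock: Clock


def bump(clock: Clock, server: str) -> Clock:
    """Return a copy of clock with one more update recorded for server."""
    new_clock = dict(clock)
    new_clock[server] = new_clock.get(server, 0) + 1
    return new_clock


def descends(a: Clock, b: Clock) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(server, 0) >= count for server, count in b.items())


def reconcile(versions: List[Version]) -> List[Version]:
    """Keep only versions that no other version strictly supersedes.

    One survivor means the current copy is known; several survivors are
    concurrent edits that the application itself must merge.
    """
    survivors: List[Version] = []
    for v in versions:
        superseded = any(
            descends(other.clock, v.clock) and other.clock != v.clock
            for other in versions
        )
        if not superseded and all(s.clock != v.clock for s in survivors):
            survivors.append(v)
    return survivors


# The Wish List is first written through server A and copied to A, B, and C.
original = Version(["book"], bump({}, "A"))
replicas = {"A": original, "B": original, "C": original}

# A is down, so the customer's change is written through B to B and C only.
updated = Version(["book", "lamp"], bump(original.clock, "B"))
replicas["B"] = updated
replicas["C"] = updated

# A recovers with its stale copy; a read gathers all three copies.
current = reconcile(list(replicas.values()))
print(current[0].items)  # ['book', 'lamp']
```

Server A's copy is dropped because the copy on B and C has seen every update A's copy has seen, plus one more. If B and C had instead accepted different edits independently, `reconcile` would return both versions, and the application would have to merge them.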
None of the current relational DBMS products was designed for problems like this. Consequently, Amazon.com engineers developed Dynamo, a specialized data store for reliably processing massive amounts of data on tens of thousands of servers. Dynamo provides an always-on experience for Amazon.com's retail customers; Amazon.com also sells Dynamo store services to others via its S3 Web Services product offering.
Meanwhile, Google was encountering similar problems that could not be solved by commercially available relational DBMS products. In response, Google created Bigtable, a data store for processing petabytes of data on hundreds of thousands of servers. Bigtable supports a richer data model than Dynamo, which means that it can store a greater variety of data structures.
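The difference in data models can be illustrated with a small sketch. As their designers describe them, Dynamo stores an opaque value under each key, whereas Bigtable is a sparse map indexed by row key, column key, and timestamp. The Python below is a hypothetical in-memory illustration only (the `SparseTable` class and the row and column names are invented here); it is not either product's API, and it ignores sorting, distribution, and persistence.

```python
from typing import Dict, List, Optional, Tuple

# Dynamo-style model (illustrative): one key maps to one opaque value.
kv_store: Dict[str, bytes] = {}
kv_store["wishlist:42"] = b'["book", "lamp"]'  # the store cannot see inside


class SparseTable:
    """Illustrative Bigtable-style map: (row, column, timestamp) -> value."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str, int], bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Store one cell version; older versions are kept, not overwritten."""
        self._cells[(row, column, timestamp)] = value

    def get_latest(self, row: str, column: str) -> Optional[bytes]:
        """Return the newest version of one cell, or None if it is absent."""
        versions = [(ts, v) for (r, c, ts), v in self._cells.items()
                    if r == row and c == column]
        return max(versions)[1] if versions else None

    def columns(self, row: str) -> List[str]:
        """List the columns present in a row; each row may differ (sparse)."""
        return sorted({c for (r, c, _ts) in self._cells if r == row})


table = SparseTable()
table.put("user:42", "wishlist:book", 1, b"1")
table.put("user:42", "wishlist:lamp", 1, b"1")
table.put("user:42", "wishlist:lamp", 2, b"2")  # quantity changed later
table.put("user:42", "profile:name", 1, b"Pat")
print(table.get_latest("user:42", "wishlist:lamp"))  # b'2'
print(table.columns("user:42"))
```

Because each cell is addressed individually and keeps its history, the richer model lets one row hold many differently shaped pieces of data, while the key-value model treats the whole Wish List as a single blob.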
Both Dynamo and Bigtable are designed to be elastic; this term means that the number of servers can dynamically increase and decrease without disrupting performance.
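One technique that makes this kind of elasticity possible is consistent hashing, which the published Dynamo design uses to spread keys across servers. The sketch below is an illustrative, hypothetical Python version (the `HashRing` class, the server names, MD5 as the hash, and 50 virtual points per server are all assumptions for the example). Each server is hashed to several points on a ring, and a key belongs to the first server point clockwise from the key's own hash, so adding a server moves only the keys that now fall to it.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_position(name: str) -> int:
    """Map a string to a point on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers: List[str], points_per_server: int = 50) -> None:
        self.points_per_server = points_per_server
        self._ring: List[Tuple[int, str]] = []  # sorted (position, server) pairs
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        """Place several virtual points for a new server on the ring."""
        for i in range(self.points_per_server):
            bisect.insort(self._ring, (ring_position(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        """Take a server's points off the ring; its keys pass to neighbors."""
        self._ring = [(pos, s) for pos, s in self._ring if s != server]

    def owner(self, key: str) -> str:
        """Return the first server clockwise from the key's ring position."""
        idx = bisect.bisect_right(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["server-1", "server-2", "server-3", "server-4"])
keys = [f"wishlist-{n}" for n in range(10_000)]
before = {k: ring.owner(k) for k in keys}

ring.add("server-5")  # scale out without reshuffling everything
moved = [k for k in keys if ring.owner(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # typically near one fifth
```

Every key that moves goes to the new server, and removing that server again restores the original assignments. By contrast, a naive scheme such as hash(key) % number_of_servers would reassign most keys whenever a server is added or removed.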
In 2007, Facebook encountered similar data storage problems: massive amounts of data, the need to be elastically scalable, tens of thousands of servers, and high volumes of traffic. In response to this need, Facebook began development of Cassandra, a data store that provides storage capabilities like Dynamo's together with a richer data model like Bigtable's. Initially, Facebook used Cassandra to power its Inbox Search. By 2008, Facebook realized that it had a bigger project on its hands than it wanted and gave the source code to the open source community. As of 2011, Cassandra is used by Facebook, Twitter, Digg, Reddit, Cisco, and many others.
Cassandra, by the way, is a fascinating name for a data store. In Greek mythology, Cassandra was so beautiful that Apollo fell in love with her and gave her the power to see the future. Alas, Apollo's love was unrequited and he cursed her so that no one would ever believe her predictions. The name was apparently a slam at Oracle.
Cassandra is elastic and fault-tolerant; it supports massive amounts of data on thousands of servers and provides durability, meaning that once data is committed to the data store, it won't be lost, even in the presence of failure. One of the most interesting characteristics of Cassandra is that clients (meaning the programs that run Facebook, Twitter, etc.) can select the level of consistency that they need. If a client requests that all servers always be current, Cassandra will ensure that happens, but performance will be slow. At the other end of the trade-off spectrum, clients can require no consistency, whereby performance is maximized. In between, clients can require that a majority of the servers that store a data item be consistent.
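This trade-off is often reasoned about by counting replicas. If a data item is stored on N servers, a write waits for W of them to acknowledge and a read consults R of them; whenever R + W > N, every read overlaps at least one server that holds the latest write. The sketch below models that rule in Python. The level names ONE, QUORUM, and ALL mirror consistency levels that Cassandra offers, but the functions are hypothetical illustrations rather than Cassandra's client API, and a replication factor of 3 is assumed.

```python
from enum import Enum


class Level(Enum):
    ONE = "ONE"        # fastest: wait for a single replica
    QUORUM = "QUORUM"  # wait for a majority of replicas
    ALL = "ALL"        # slowest: wait for every replica


def acks_required(level: Level, replicas: int) -> int:
    """How many replicas must respond before the operation completes."""
    if level is Level.ONE:
        return 1
    if level is Level.QUORUM:
        return replicas // 2 + 1
    return replicas


def read_sees_latest_write(read: Level, write: Level, replicas: int) -> bool:
    """True if every read set must overlap every write set (R + W > N)."""
    return acks_required(read, replicas) + acks_required(write, replicas) > replicas


def replicas_that_may_be_down(level: Level, replicas: int) -> int:
    """How many replicas can be unavailable while the operation still succeeds."""
    return replicas - acks_required(level, replicas)


N = 3  # assumed replication factor
for level in Level:
    print(level.value, acks_required(level, N), replicas_that_may_be_down(level, N))
print(read_sees_latest_write(Level.QUORUM, Level.QUORUM, N))  # True: 2 + 2 > 3
print(read_sees_latest_write(Level.ONE, Level.ONE, N))        # False: 1 + 1 <= 3
```

Waiting for ALL replicas is what makes the fully consistent end of the spectrum slow, and it also means a single unavailable replica blocks the operation; ONE keeps working with two of three replicas down but may return stale data, and QUORUM sits between the two.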
Cassandra's performance is vastly superior to that of relational DBMS products. In one comparison, Cassandra was found to be 2,500 times faster than MySQL for write operations and 23 times faster for read operations on massive amounts of data on hundreds of thousands of possibly failing computers!
Questions
4-9. Clearly, Dynamo, Bigtable, and Cassandra are critical technology to the companies that created them. Why did they allow their employees to publish academic papers about them? Why did they not keep them as proprietary secrets?
4-10. What do you think this movement means to the existing DBMS vendors? How serious is the NoSQL threat? Justify your answer.
4-11. Search the Web to determine what existing vendors such as Oracle and Microsoft are doing with regard to NoSQL databases. Also, search to see what support such vendors are providing for Big Data data stores. Summarize your findings.
4-12. Amazon.com offers cloud services known as Amazon Web Services (AWS). Within the AWS offering is a set of services for accessing Dynamo. Search the Web for the term AWS Dynamo cases and find two examples of companies that are using AWS Dynamo. Why did those companies choose AWS Dynamo? Note that the answer to this question has two parts: Why did they use a cloud service? And why did they choose NoSQL rather than a relational database?
4-13. The text describes how organizations need to create information from Big Data data stores but are challenged to do so because NoSQL and Hadoop are difficult to use. Search the Web for easier-to-use query and reporting products for Big Data data stores. Investigate the top two products and determine if they are for you. Summarize your findings.