The Problem
Big data could generate significant value in the field of infectious disease, given the amount of data available in the world. Researchers claim that "the increased use of social media provides an opportunity to improve public health surveillance systems and to develop predictive models" (National Academies of Sciences & Medicine, 2016).
Furthermore, "advances in machine learning and crowdsourcing may also offer the possibility to gather information about disease dynamics, such as contact patterns and the impact of the social environment.
New, rapid, point-of-care diagnostics may make it possible to capture not only diagnostic information but also other potentially epidemiologically relevant information in real time. With a wide range of data available for analysis, decision-making and policy-making processes could be improved" (National Academies of Sciences & Medicine, 2016).
Potash (2016) explains that big data derived from electronic health records, social media, and the internet has the potential to provide more timely and detailed information on infectious disease threats or outbreaks than traditional surveillance methods.
Potash also says "traditional infectious disease surveillance, typically based on laboratory tests and other epidemiological data collected by public health institutions, is the gold standard." Nevertheless, Potash claims that this standard has drawbacks: the epidemiological data include time lags, are expensive to produce, and typically lack the resolution needed for accurate monitoring.
Moreover, that approach can be cost-prohibitive in low-income countries. In contrast, big data streams from internet queries, for example, are available in real time and can track disease activity locally, but they have their own biases. Hybrid tools that combine traditional surveillance and big data sets may provide a way forward, the scientist suggests, serving to complement, rather than replace, existing methods.
From Potash's perspective, the ultimate goal is to be able to forecast the size, peak, or trajectory of an outbreak weeks or months in advance to better respond to infectious disease threats. Therefore, the use of big data in surveillance could be considered the first step toward this long-term goal.
The challenges associated with the use of big data for infectious disease surveillance are linked to data heterogeneity. A big data system can be fed with three kinds of sources. The first is medical encounter files, such as records from health care facilities and insurance claim forms. The second is crowdsourced data collected from volunteers who self-report symptoms in near real time (part of the "citizen science" movement). The third is data generated by the use of social media, the internet, and mobile phones, which may include self-reported health, behavior, and travel information.
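The heterogeneity described above can be made concrete with a small sketch. It assumes, purely for illustration, that every record carries a shared envelope (source, observation date, region) plus a source-specific payload. All field names, identifiers, and values below are hypothetical and do not come from the cited sources, and the sketch does not imply any particular database choice.

```python
from collections import Counter
from datetime import date

# Hypothetical records, one per source type described in the text.
# Every field name and value here is invented for illustration.
records = [
    {
        "source": "medical_encounter",
        "observed_on": date(2016, 1, 4),
        "region": "region-a",
        "payload": {"facility_id": "F-001", "diagnosis": "DX-EXAMPLE", "claim_amount": 120.0},
    },
    {
        "source": "crowdsourced",
        "observed_on": date(2016, 1, 4),
        "region": "region-a",
        "payload": {"volunteer_id": "V-042", "symptoms": ["fever", "cough"]},
    },
    {
        "source": "social_media",
        "observed_on": date(2016, 1, 5),
        "region": "region-b",
        "payload": {"text": "home sick today", "travel": "region-a"},
    },
]


def payload_fields_by_source(records):
    """Map each source type to the set of payload field names it uses."""
    fields = {}
    for record in records:
        # Updating a set with a dict adds the dict's keys.
        fields.setdefault(record["source"], set()).update(record["payload"])
    return fields


def count_by_region(records):
    """Count records per region using only the shared envelope fields."""
    return Counter(record["region"] for record in records)
```

In this sketch the three payloads share no field names, which is the heterogeneity problem in miniature: a query such as `count_by_region(records)` works only because of the assumed common envelope, while anything that reaches into `payload` has to handle each source separately.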
The Questions
Based on the NoSQL technologies reviewed so far (e.g., HBase, RDB, Cloudera, MemcacheDB), please answer: What should your application use as a big data model for infectious disease surveillance, and why?
To answer this question, please evaluate the criteria presented in the articles listed below, which can be found through a Google search.
For example, if you select a document or graph database, you must justify your answer from a problem domain perspective, processing capabilities, and other relevant aspects.
• 35+ Use Cases For Choosing Your Next NoSQL Database
• Types of NoSQL databases and key criteria for choosing them