Mining for Big Data in the Health Enterprise

Dwight Raum, CTO, Johns Hopkins Medicine

Much has been said on the topic of Big Data and Healthcare IT. The Human Genome Project, completed in 2003, successfully mapped the human genome by sequencing over 3 billion base pairs. This massive effort took the better part of 13 years and was coordinated among twenty research institutes and universities. As the seminal Big Data event in healthcare, its achievement has been compared to putting a man on the moon. Now, with the widespread availability of open source and vendor tools, efforts of similar scale can be applied to a wide variety of endeavors. The democratization of Big Data technology promises to revolutionize healthcare with new discoveries in 3D imaging, genomics and epidemiology. Big Data Analytics also promises to revolutionize population health, as it is now possible to analyze risk cohorts and interventions at a granular level. While much of this is exciting and important, there is often a gap, and sometimes a chasm, between the research bench and the patient bedside. It is a perennial problem in all areas of research, yet fortunately it is being addressed at the policy level. Most federally mandated investments in Healthcare IT are focused not on new discovery, but on the application of Information Technology to consistently deliver quality care, share medical records and integrate ancillary services. Federal Meaningful Use incentives have driven technology adoption with near single-mindedness.

Yet demographics and quality measures do not tell the whole story. Healthcare has substrata of data that are only now being excavated, and these could provide an even greater benefit to organizations, clinicians and, ultimately, patients. Hospitals are bristling with network-connected devices. Our clinical hallways and rooms are clogged with workstations, monitors and devices. Clinicians carry all kinds of smartphones, pagers, tablets and other devices. Patients carry smartphones connected to guest WiFi networks. CT scanners, infusion pumps, monitoring units, pulse oximeters, wheelchairs and patient beds are increasingly mobile and network-enabled. Data is flowing to and from devices and the scores of clinical applications that are common in hospitals. The data flow contains orders, physiological measurements, events and transactions; each is interfaced using the HL7 messaging protocol and recorded in the Electronic Health Record (EHR). As the system of record for patients, modern and interconnected EHRs allow providers access to an unprecedented volume of patient data with relative ease. EHR vendors continue to develop new data connections and techniques for interpreting and presenting data.

“Text messages, emails, pages, phone calls and EHR communications leave behind digital metadata recorded in logs”

However, EHRs and care delivery organizations are largely ignoring another source of potentially valuable information: system logs. Nearly every network-connected device emits a log. These logs are effectively a distributed sensor network, containing time-stamped metadata about events of all types. Administrators can vary the level of logging from almost nothing to painfully verbose, although the general practice is to log only what is needed for occasional troubleshooting, which is usually not enough to support continuous analysis. Logs are a hidden resource that IT leaders rarely tap. Individually, these logs tell us about the immediate system and are typically exploited only by those with technical responsibility. When log data is combined with information from the EHR, human resources, facilities and other information systems, it gives context to data that would otherwise be mostly meaningless. Only by shifting to a program of promiscuous log collection and aggressive correlation can care delivery organizations hope to quantitatively answer complex usability and workflow questions.
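To make the idea of adding context to a raw log concrete, here is a minimal Python sketch. The log line format, the `STAFF` and `DEVICES` lookup tables, and every user and device name in it are hypothetical stand-ins for real HR and facilities systems, not a description of any particular product.

```python
from datetime import datetime

# Hypothetical syslog-style lines; the format and field names are assumptions.
RAW_LOG = [
    "2015-03-02T08:15:04 ws-4412 login user=jdoe",
    "2015-03-02T08:17:40 pump-0031 alarm code=occlusion",
]

# Hypothetical context tables standing in for HR and facilities systems.
STAFF = {"jdoe": {"role": "RN", "unit": "Cardiology"}}
DEVICES = {"ws-4412": "Room 512", "pump-0031": "Room 512"}


def parse_line(line):
    """Split one log line into timestamp, device, event and key=value details."""
    stamp, device, event, *pairs = line.split()
    details = dict(pair.split("=", 1) for pair in pairs)
    return {
        "time": datetime.fromisoformat(stamp),
        "device": device,
        "event": event,
        **details,
    }


def enrich(record):
    """Attach device location and, when a user is named, staff context."""
    enriched = dict(record)
    enriched["location"] = DEVICES.get(record["device"], "unknown")
    enriched.update(STAFF.get(record.get("user"), {}))
    return enriched


events = [enrich(parse_line(line)) for line in RAW_LOG]
```

On its own, the first line says only that someone logged in to a workstation. After enrichment it says that a cardiology nurse logged in at a workstation in a particular room, minutes before a pump alarm in that same room.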

Google, Facebook, Amazon and the other leading e-commerce firms have created entire markets on the collection of metadata. The current technology boom is in large part built on metadata analytics and management, and has spawned new tools such as Hadoop and NoSQL databases. It is commonly understood that logs are “noisy,” that is, often fragmentary and ambiguous. Yet when combined with time-correlated data sources, a surprisingly complete picture of activity can emerge from the noise. Most private industry purveyors of big data use this picture to advance marketing and sales. The consumer privacy implications of these technologies are disconcerting. However, when modeled and applied in healthcare, this hyperawareness can be used to discern activity patterns that affect efficiency, patient safety and provider satisfaction.

From EHR access logs it is possible to infer which providers are involved with the care of a specific patient. By joining these with wireless access point logs and device inventory data, we can arrive at an accurate chronology of one or more care episodes. This situational awareness speeds response in emergencies, improves workflow routing and can be used retrospectively. Issues with a patient could be detected in advance based on deviations from statistical norms, and personnel proactively alerted to an impending concern. This example demonstrates how seemingly disconnected data can be brought together and collectively used to improve patient safety and care.
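The join described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the record layouts, the user and access point names, and the five-minute matching window. Real EHR audit logs and wireless controller logs would need their own parsing and identity matching.

```python
from datetime import datetime, timedelta


def ts(text):
    """Parse an ISO 8601 timestamp string."""
    return datetime.fromisoformat(text)


# Hypothetical EHR access log: who opened which patient's chart, and when.
EHR_ACCESS = [
    {"time": ts("2015-03-02T09:30:00"), "user": "jdoe", "patient": "P001"},
    {"time": ts("2015-03-02T09:00:10"), "user": "drlee", "patient": "P001"},
]

# Hypothetical wireless log: which user's device joined which access point.
WIFI = [
    {"time": ts("2015-03-02T08:58:00"), "user": "drlee", "ap": "ap-5-east"},
    {"time": ts("2015-03-02T09:29:30"), "user": "jdoe", "ap": "ap-5-west"},
    {"time": ts("2015-03-02T07:00:00"), "user": "jdoe", "ap": "ap-lobby"},
]


def last_known_ap(user, when, window=timedelta(minutes=5)):
    """Return the most recent access point seen for user within window before when."""
    candidates = [
        w for w in WIFI
        if w["user"] == user and timedelta(0) <= when - w["time"] <= window
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda w: w["time"])["ap"]


def chronology(patient):
    """List (time, user, access point) for each chart access, in time order."""
    rows = sorted(
        (a for a in EHR_ACCESS if a["patient"] == patient),
        key=lambda a: a["time"],
    )
    return [
        (a["time"].isoformat(), a["user"], last_known_ap(a["user"], a["time"]))
        for a in rows
    ]
```

Calling `chronology("P001")` returns the chart accesses in time order, each tagged with the access point the provider's device was last seen on. The window is a design choice: too short and many accesses have no location, too long and stale locations creep in.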

Good communication among the many providers and departments delivering care is critical too. Increasingly, these conversations are not in person, and in many cases individual providers may not be personally acquainted. Fortunately, text messages, emails, pages, phone calls and EHR communications leave behind digital metadata recorded in logs. Of course, one of the challenges is wading through the metadata and matching the appropriate parts with information from personnel, department and EHR systems. Yet when these disparate data streams are brought together, the whole analysis often exceeds the sum of its parts. We can begin to identify not just the flows of information but also patterns of collaboration and shared practice. With an established baseline, it is possible to design interventions and deploy them simultaneously in trial runs. The ability to quickly measure the impact of an intervention against the baseline can indicate the best solution, and fosters an iterative design approach.
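As a rough illustration of turning communication metadata into a picture of collaboration, the Python sketch below counts contacts between pairs of providers. The message tuples and names are hypothetical, and only metadata (sender, recipient, channel) is used, never message content.

```python
from collections import Counter

# Hypothetical message metadata: (sender, recipient, channel).
MESSAGES = [
    ("drlee", "jdoe", "page"),
    ("jdoe", "drlee", "text"),
    ("drlee", "pharm1", "email"),
    ("jdoe", "drlee", "phone"),
]


def collaboration_counts(messages):
    """Count contacts per unordered pair, regardless of direction or channel."""
    return Counter(frozenset((sender, recipient)) for sender, recipient, _ in messages)


counts = collaboration_counts(MESSAGES)
```

Running the same count over a baseline period and again over a trial period gives a simple before-and-after measure of how an intervention changes who works with whom.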

In all of the discussions of Big Data and healthcare, it’s important not to ignore the data we already have but fail to exploit fully. Now is the time to review what data you have and what you could have, and to design a program to collect, analyze, monitor and report on it.
