When Is An Alarm Not An Alarm?

What is the sound of one hand clapping?  If a tree falls in a forest, does it make a sound?  If a healthcare alarm is in no fashion alarming, what judgement ought we make of its existence?

The authors of this study, from UCSF, compose a beautiful, concise introduction to their study, which I will simply reproduce, rather than unimpressively paraphrase:

“Physiologic monitors are plagued with alarms that create a cacophony of sounds and visual alerts causing ‘alarm fatigue’ which creates an unsafe patient environment because a life-threatening event may be missed in this milieu of sensory overload.”

We all, intuitively, know this to be true.  Even the musical mating call of the ventilator, the “life support” of the critically ill, barely raises us from our chairs until such sounds become insistent and sustained.  But, these authors quantified such sounds – and look upon such numbers, ye Mighty, and despair:

2,558,760 alarms on 461 adults over a 31-day study period.

Most alarms – 1,154,201 of them – were due to monitor detection of “arrhythmias”, with the remainder split between vital sign parameters and other technical alarms.  These authors note that, in an effort to combat alarm fatigue, audible alerts had already been restricted to those considered clinically important – which reduced the overall burden to a mere 381,050 audible alarms, or only 187 audible alarms per bed per day.

Of course, this is the ICU – many of these audible alarms may, in fact, have represented true positives.  And, many did – nearly 60% of the ventricular fibrillation alarms were true positives.  However, next up was asystole at 33% true positives, and it just goes downhill from there – with a mere 3.3% of the 1,299 reviewed ventricular bradycardia alarms classified as true positives.

Dramatic redesign of healthcare alarms is clearly necessary, so as not to detract from high-quality care.  Physicians are obviously tuning out vast oceans of alerts, alarms, and reminders – and some of them might even be important.

“Insights into the Problem of Alarm Fatigue with Physiologic Monitor Devices: A Comprehensive Observational Study of Consecutive Intensive Care Unit Patients”
http://www.ncbi.nlm.nih.gov/pubmed/25338067

Clinical Informatics Exam Post-Mortem

I rarely break from literature review in my blog posts (although, I used to make the occasional post about Scotch).  However, there are probably enough folks out there in academia planning on taking this examination, or considering an Emergency Medicine Clinical Informatics fellowship – like the ones at Mt. Sinai, BIDMC, and Arizona – to make this diversion of passing interest to a few.

Today is the final day of the 2015 testing window, so everyone taking the test this year has already sat for it or is suffering through it at this moment.  Of course, I’m not going to reveal any specific questions, or talk about a special topic to review (hint, hint), but more my general impressions of the test – as someone who has taken a lot of tests.

The day started out well, as the Pearson Vue registration clerk made a nice comment that I’d gone bald since my picture at my last encounter with them, presumably for USMLE Step 3.  After divesting myself of Twitter-enabled devices, the standard computer-based multiple-choice testing commenced.

First of all, for those who aren’t aware, this is only the second time the American Board of Preventive Medicine has administered the Clinical Informatics board examination.  Furthermore, there are few – probably zero – clinicians currently taking this examination who have completed an ACGME Clinical Informatics fellowship.  They simply don’t exist.  Thus, it is a bit of a perfect storm: none of us have undergone a specific training curriculum preparing us for this test, there is minimal hearsay or experience from folks who have taken it before, and the test itself is essentially still experimental.

Also, the majority (>90%) of folks taking the test use one of AMIA’s review courses – either the in-person session or the online course and assessment.  These courses step through the core content areas described for the subspecialty of Clinical Informatics, and, in theory, review the necessary material to obtain a passing score.  After all, presumably, the folks at AMIA designed the subspecialty and wrote most of the questions – they ought to know how to prep for it, right?

Except, as you progress through the computer-based examination, you find the board review course has given you an apparently uselessly superficial overview of many topics.  Most of us taking the examination today, I assume, are current or former clinicians, with some sort of computer science background, and are part-time researchers in a subset of clinical informatics.  This sort of experience gets you about half the questions on the exam in the bag.  Then, for about another quarter of the exam – if you know every detail of what’s presented in the review course regarding certification organizations, standards terminologies, process change, and leadership – that’s another 50 of 200 questions you can safely answer.  But, you will need to have pointlessly memorized a pile of acronyms and their various relationships to get there.  Indeed, the use of acronyms is pervasive enough it’s almost as though the intention is more to weed out those who don’t know the secret handshake of Clinical Informatics than to truly assess your domain expertise.

The last quarter of the exam?  The ABPM study guide for the examination states 40% of the exam covers “Health Information Systems” and 20% covers “Leading and Managing Change”.  And, nearly every question I was reduced to guessing at came from those two areas – and covered details either absent from, or addressed only with passing vagueness by, the AMIA study course.  And, probably as some consequence of this being one of the first administrations of this test, I wasn’t particularly impressed with the questions – which were heavy on specific recall, and light on application of knowledge or problem solving.  I’m not sure exactly what resources I’d use to study prior to retaking if I failed, but most of the difference would come down to rote memorization.

However, because the pass rate was 92% last year, and nearly everyone taking the test used the AMIA course, an average examinee with average preparation ought still to be in good shape.  So, presumably, despite my distasteful experience overall – one likely shared by many – we’ll all receive passing scores.

Check back mid-December for the exciting conclusion to this tale.

Update (as noted in comments below):  Passed!

Hopefully future editions of prep courses will gradually attune themselves to the board content, once a few iterations have progressed.  Individuals taking this exam, in the meantime, will need to rely heavily on their medical or prior technical experience, particularly as the curricula for fellowships are fleshed out.  Additionally, the CI exam content is so broad, fellowship trainees will need to specifically target their coursework to areas they lack – for example, “Leading and Managing Change”, as a major content area of the examination, will definitely expose a knowledge gap for many informaticians.

Interesting times!

Using Patient-Similarity to Predict Pulmonary Embolism

Topological data analysis is one of the many “big data” buzzphrases being thrown about, with roots in non-parametric statistical analysis, and promoted by the Palo Alto startup, Ayasdi.  I’ve done a little experimentation with it, and used it mostly to show the underlying clustering and heterogeneity of the PECARN TBI data set.  My ultimate hypothesis, based on these findings, would be that patient-similarity is a more useful predictor of individual patient risk than the partition analysis used in the original PECARN model.  This technique is similar to the “attribute matching” demonstrated by Jeff Kline in Annals, but with much greater granularity and sophistication.

So, I should be excited to see this paper – using the TDA output to train a neural network classifier for suspected pulmonary embolism.  Using 152 patients, 101 of whom were diagnosed with PE, the authors develop a topological network with clustered distributions of diseased and non-diseased individuals, and compare the output from this network to the Wells and Revised Geneva scores.

The AUC for the neural network was 0.8911, for Wells 0.74, and for the Revised Geneva 0.55.  And this sounds fabulous – until it’s noted the neural network was derived and tested on the same, tiny sample.  There’s no validation set, and, given such a small sample, the likelihood of overfitting is substantial.  I expect performance will degrade substantially when applied to other data sets.
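
As a rough illustration of that concern, the sketch below – assuming scikit-learn, with synthetic data merely standing in for the 152-patient cohort, and in no way the authors’ pipeline – compares a resubstitution AUC against a cross-validated AUC for a small neural network.

```python
# A minimal sketch (not the authors' pipeline) of why an AUC computed on the
# same small sample used for training is optimistic.  Assumes scikit-learn;
# the synthetic data merely stands in for a 152-patient, ~66%-positive cohort.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=152, n_features=20, n_informative=3,
                           weights=[0.34, 0.66], random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)

# Resubstitution: train and score on the same sample
clf.fit(X, y)
auc_resub = roc_auc_score(y, clf.predict_proba(X)[:, 1])

# Cross-validation: each prediction comes from a model that never saw that patient
proba_cv = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
auc_cv = roc_auc_score(y, proba_cv)

print(f"Resubstitution AUC:  {auc_resub:.2f}")  # typically near 1.0
print(f"Cross-validated AUC: {auc_cv:.2f}")     # typically far lower
```

On a sample this size, the gap between those two numbers is usually dramatic – which is exactly why an external validation set matters.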

However, even simply as a scientific curiosity, I hope to see further testing and refinement of this potentially valuable approach.

“Using Topological Data Analysis for diagnosis pulmonary embolism”
http://arxiv.org/abs/1409.5020
http://www.ayasdi.com/_downloads/A_Data_Driven_Clinical_Predictive_Rule_CPR_for_Pulmonary_Embolism.pdf

How Electronic Health Records Sabotage Care

Our new information overlords bring many benefits to patient care.  No, really, they do.  I’m sure you can come up with one or two aspects of patient safety improved by modern health information technology.  However, it’s been difficult to demonstrate benefits associated with electronic health records in terms of patient-oriented outcomes because, as we are all well aware, many EHRs inadvertently detract from efficient processes of care.

However, while we intuitively recognize the failings of EHRs, there is still work to be done in cataloguing these errors.  To that end, this study is a review of 100 consecutive closed patient safety investigations in the Veterans Health Administration relating to information technology.  The authors reviewed each case narrative in detail, and divided the errors into a sociotechnical classification of EHR implementation and use.  Unsurprisingly, the most common failures of EHRs related to failures to provide the correct information in the correct context.  Following that, again unsurprisingly, were simple software malfunctions and misbehaviors.  A full accounting and examples are provided in Table 2 of the paper.

Yes, EHRs – the solution to, and cause of, all our problems.

“An analysis of electronic health record-related patient safety concerns”
http://jamia.bmj.com/content/early/2014/05/20/amiajnl-2013-002578.full

Build a New EDIS, Advertise it in Annals for Free

As everyone who has switched from paper to electronic charting and ordering has witnessed, despite some improvements, many processes have become far less efficient.  And – it doesn’t matter which Emergency Department information system you use.  Each vendor has its own special liabilities.  Standalone vendors have interoperability issues.  Integrated systems appear to have been designed as an afterthought to the inpatient system.  We have, begrudgingly, learned to tolerate our new electronic masters.

This study, in Annals of Emergency Medicine, describes the efforts of three authors to design an alternative to one of the vendor systems:  Cerner’s FirstNet product.  I have used this product.  I feel their pain.  And, I am in no way surprised these authors are able to design alternative, custom workflows that are faster (as measured in seconds) and more efficient (as measured in clicks) for their prototype system.  It is, essentially, a straw man comparator – as any thoughtful, user-centric, iterative design process could improve upon the current state of most EDIS.

With the outcome never in doubt, the results demonstrated are fundamentally unremarkable and of little scientific value.  And, it all finally makes sense as the same sad refrain rears its ugly head in the conflict-of-interest declaration:

Dr. Patrick and Mr. Besiso are employees of iCIMS, which is marketing the methodology described in this article.

Cheers to Annals for enabling these authors to use the pages of this journal as a vehicle to sell their consulting service.

“Efficiency Achievements From a User-Developed Real-Time Modifiable Clinical Information System”
http://www.ncbi.nlm.nih.gov/pubmed/24997563

Shared Decision-Making to Reduce Overtesting

Medicine, like life, is full of uncertainty.  Every action or inaction has costs and consequences, both anticipated and unintended.  For many reasons – not least the proliferation of tests available – a decreased tolerance for this uncertainty, and the rise of “zero-miss” medicine, have permeated medical culture.  However, some tests carry enough cost and risk that the population harms of testing outweigh the harms of the missed diagnoses.  CTPA for pulmonary embolism is one of those tests.

In this study, these authors attempt to reduce testing for pulmonary embolism by creating a shared decision-making framework to discuss the necessity of testing with patients.  They prospectively enrolled 203 patients presenting to the Emergency Department with dyspnea and, independent of their actual medical evaluation, attempted to ascertain their hypothetical actions were they to be evaluated for PE.  Specifically, they were interested in the “low clinical probability” population whose d-Dimer was elevated above the abnormal threshold – but still below twice the normal threshold.  For these “borderline” abnormal d-Dimers, the authors created a visual decision tool describing their estimate of the benefit and risk of undergoing CTPA in this specific clinical scenario.

After viewing the benefits and risks of CTPA, 36% of patients in this study stated they would hypothetically decline testing for PE.  Most of the patients (85%) who planned to follow through with the CTPA did so because they were concerned about a possible missed diagnosis of PE, while the remainder hoped the CT would at least provide additional information regarding their actual diagnosis.  The authors conclude, based on a base case of 2.6 million possible PE evaluations annually, this strategy might save 100,000 CTPAs.
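
For what it’s worth, the headline estimate only hangs together if a particular slice of those 2.6 million evaluations falls into the “borderline d-Dimer” group – a fraction not spelled out above.  A back-of-the-envelope sketch (my arithmetic, not the authors’):

```python
# Back-of-the-envelope check of the authors' projection (my arithmetic, not theirs).
# The fraction of PE workups falling into the "borderline d-Dimer" group is
# back-calculated here; it is not a number reported in the text above.
annual_pe_evaluations = 2_600_000
hypothetical_decline_rate = 0.36      # reported by the study
projected_ctpas_saved = 100_000       # the authors' headline estimate

implied_eligible_fraction = projected_ctpas_saved / (
    annual_pe_evaluations * hypothetical_decline_rate)
print(f"Implied 'borderline' subgroup: {implied_eligible_fraction:.1%}")  # ~10.7%
```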

I think the approach these authors promote is generally on the right track.  The challenge, however, is the data used to discuss risks with patients.  From their information graphics, the risks of CTPA – cancer, IV contrast reaction, kidney injury, and false positives – are all fair to include, but their clinical relevance can be vigorously debated.  Is a transient 25% increase in serum creatinine in a young, healthy person clinically significant?  Is it the same as a cancer diagnosis?  Is it enough to mention there are false-positives from the CTPA without mentioning the risk of a severe bleeding event from anticoagulation?  Then, in their graphic on the risk of not having the CTPA, they devote the bulk of that risk to a 15% chance of the CT identifying a diagnosis that would otherwise have been missed.  I think that significantly overstates the number of additional, clinically important findings requiring urgent treatment that might be identified.  Finally, the risks presented are for the “average” patient – and may be entirely inaccurate across the heterogeneous population presenting with dyspnea.

But, any quibbles over the information graphic, limitations, and magnitude of effect are outweighed by the importance of advancing this approach in our practice.  Paternalism is dead, and new tools for communicating with patients will be critical to the future of medicine.

“Patient preferences for testing for pulmonary embolism in the ED using a shared decision-making model”
http://www.ncbi.nlm.nih.gov/pubmed/24370071

Strep Throat? Stay Home!

NBC News covered this useful-seeming innovation last week – a predictive score to help patients decide whether their sore throat might be caused by Group A Strep.  It seems quite a reasonable proposition on the surface – if patients can receive guidance on their pretest likelihood of disease, they might forgo unnecessary medical care.  For the 12 million physician visits every year for sore throat, putting a dent in this number would generate sizable cost savings.

This study describes the retrospective development of a “Home Score” for use by patients, based on a MinuteClinic database of 71,000 sore throat presentations for which a strep swab was performed.  The authors split the data into derivation and validation sets, and produced a complex mathematical scoring system, from 1 to 100, based on age, fever, cough, and biosurveillance data.  Using a score of 10 as a cut-off, the validation set sensitivity was 99%, specificity was 1%, and the prevalence data used resulted in a validation negative predictive value of 87%.  This NPV, the authors say, is the important number for advising patients whether they ought to seek care for GAS.
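
For the curious, NPV is a straightforward function of sensitivity, specificity, and prevalence – a minimal sketch below, where the prevalence is chosen simply to reproduce the reported 87%, not taken from the paper:

```python
# A minimal sketch of how NPV follows from sensitivity, specificity, and prevalence.
# The 13% prevalence below is chosen simply to reproduce the reported 87% NPV;
# it is an illustration, not a figure taken from the paper.
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    true_negatives = specificity * (1 - prevalence)
    false_negatives = (1 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)

# With sensitivity 99% and specificity 1%, the result is driven almost
# entirely by prevalence.
print(f"{npv(0.99, 0.01, 0.13):.2f}")  # ~0.87
```

With a specificity this low, the NPV is essentially a restatement of the underlying prevalence – worth remembering if the score is applied to a population unlike the MinuteClinic one.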

There are a few issues with this derivation, of course.  First of all, the derivation population is subject to selection bias – only patients who received strep swabs are included.  Then, the MinuteClinic data has to be generalizable to the remaining adult population.  The use of the Home Score also depends on the availability of biosurveillance data for their specialized algorithm.  Finally, their NPV cut-off of 90% would theoretically obviate clinic visits for only 230,000 of the 12 million patients seeking care for sore throat – a large number, but still only a drop in the bucket.

And, the elephant in the room: Group A Strep doesn’t need antibiotics in the United States.  The likelihood of adverse reactions to treatment of GAS exceeds the chance of benefit – whether progression to peritonsillar abscess or rheumatic fever is considered.  A few folks on Twitter chimed in to echo this sentiment when this story was discussed:

@embasic @DrLeanaWen @MDaware @NBCNewsHealth just need to redesign app to say “no you don’t” regardless of sx
— Anand Swaminathan (@EMSwami) November 10, 2013

There are legitimate reasons to visit a physician for sore throat – but, in the U.S., nearly all uncomplicated pharyngitis can safely stay home, GAS or not.

“Participatory Medicine: A Home Score for Streptococcal Pharyngitis Enabled by Real-Time Biosurveillance”
http://www.ncbi.nlm.nih.gov/pubmed/24189592

Another Taste of the Future

Putting my Emergency Informatics hat back on for a day, I’d like to highlight another piece of work that brings us, yet again, another step closer to being replaced by computers.

Or, at the minimum, being highly augmented by computers.

There are multitudinous clinical decision instruments available to supplement physician decision-making.  However, the unifying element of most instruments is the requirement for physician input.  This interruption of clinical flow reduces their acceptability and impedes knowledge translation through the use of these tools.

However, since most clinicians are utilizing Electronic Health Records, we’re already entering the information required for most decision instruments into the patient record.  Usually, this is a combination of structured (click click click) and unstructured (type type type) data.  Structured data is easy for clinical calculators to work with, but has none of the richness communicated by freely typed narrative.  Therefore, clinicians much prefer to utilize typed narrative, at the expense of EHR data quality.

This small experiment out of Cincinnati implemented a natural-language processing and machine-learning automated method to collect information from the EHR.  Structured and unstructured data from 2,100 pediatric patients with abdominal pain were analyzed to extract the elements needed to calculate the Pediatric Appendicitis Score.  Appropriateness of the Pediatric Appendicitis Score aside, their method performed reasonably well.  It picked up about 87% of the elements of the Score from the record, and was correct in about 86% of those extractions.  However, this was performed retrospectively – and the authors state the processing would still lag hours behind the initial encounter.
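
To make the task concrete, here is a deliberately crude sketch of element extraction from free text – simple keyword rules, not the authors’ NLP/machine-learning pipeline, with element names and trigger phrases that are purely illustrative:

```python
# A deliberately crude sketch of element extraction from free text: keyword
# rules only, NOT the authors' NLP/machine-learning pipeline.  Element names
# and trigger phrases are purely illustrative.
import re

ELEMENT_PATTERNS = {
    "rlq_tenderness":     r"right lower quadrant tender|rlq tender",
    "pain_with_movement": r"pain (with|on) (cough|hopping|movement)",
    "anorexia":           r"anorexia|decreased appetite|not eating",
    "fever":              r"fever|febrile",
    "nausea_vomiting":    r"nausea|vomit|emesis",
}

def extract_elements(note: str) -> dict:
    """Flag which score elements a free-text note appears to document."""
    text = note.lower()
    return {name: bool(re.search(pattern, text))
            for name, pattern in ELEMENT_PATTERNS.items()}

# Real clinical text also needs negation handling ("denies emesis"),
# abbreviation expansion, and temporal context -- this toy skips all of that.
note = "10-year-old with RLQ tenderness, pain with hopping, and decreased appetite."
print(extract_elements(note))
```

The hard part, of course, is everything the toy skips – negation, abbreviations, and context – which is much of what makes the authors’ approach more difficult than it looks.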

So, we’re not quite yet at the point where a parallel process monitors system input and provides real-time diagnostic guidance – but, clearly, this is a window into the future.  The theory: if an automated process could extract the data required to calculate the score, physicians might be more likely to integrate the score into their practice – and thus deliver higher-quality care through more accurate risk-stratification.

I, for one, welcome our new computer overlords.

“Developing and evaluating an automated appendicitis risk stratification algorithm for pediatric patients in the emergency department”

Replace Us With Computers!

In a preview to the future – who performs better at predicting outcomes, a physician, or a computer?

Unsurprisingly, it’s the computer – and the unfortunate bit is we’re not exactly going up against Watson or the hologram doctor from the U.S.S. Voyager here.

This is Jeff Kline, showing off his rather old, not terribly sophisticated “attribute matching” software.  This software, created back in 2005 or so, is based on a database he created of acute coronary syndrome and pulmonary embolism patients.  He determined a handful of the most predictive variables from this set, and then created a tool that allows physicians to input those specific variables for a newly evaluated patient.  The tool then finds the exact matches in the database and spits back a probability estimate based on the historical reference set.
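
The core idea is simple enough to fit in a few lines.  A minimal sketch – with entirely hypothetical variables and records, not Kline’s actual database or code – looks something like this:

```python
# A minimal sketch of "attribute matching" as described above: find prior
# patients whose selected attributes exactly match the new patient, and report
# the outcome rate among those matches.  Variables and records are hypothetical.
from typing import Dict, List

def attribute_match_probability(new_patient: Dict, database: List[Dict],
                                attributes: List[str], outcome_key: str) -> float:
    """Pretest probability = outcome rate among exact attribute matches."""
    matches = [row for row in database
               if all(row[a] == new_patient[a] for a in attributes)]
    if not matches:
        raise ValueError("No exact matches in the reference database")
    return sum(row[outcome_key] for row in matches) / len(matches)

# Hypothetical reference set and query
reference = [
    {"age_band": "40-49", "chest_pain": True, "dyspnea": False, "acs": 0},
    {"age_band": "40-49", "chest_pain": True, "dyspnea": False, "acs": 1},
    {"age_band": "40-49", "chest_pain": True, "dyspnea": False, "acs": 0},
    {"age_band": "60-69", "chest_pain": True, "dyspnea": True,  "acs": 1},
]
patient = {"age_band": "40-49", "chest_pain": True, "dyspnea": False}
print(attribute_match_probability(patient, reference,
                                  ["age_band", "chest_pain", "dyspnea"], "acs"))  # ~0.33
```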

He sells software based on the algorithm and probably would like to see it perform well.  Sadly, it only performs “okay”.  But, it beats physician gestalt, which is probably better ranked as “poor”.  In their prospective evaluation of 840 cases of acute dyspnea or chest pain of uncertain immediate etiology, physicians (mostly attendings, then residents and midlevels) grossly over-estimated the prevalence of ACS and PE.  Physicians had a mean and median pretest estimate for ACS of 17% and 9%, respectively, and the software guessed 4% and 2%.  Actual retail price:  2.7%.  For PE, physicians were at mean 12% and median 6%, with the software at 6% and 5%.  True prevalence: 1.8%.

I don’t choose this article to highlight Kline’s algorithm, nor the comparison between the two.  Mostly, it’s a fascinating observational study of how poor physician estimates are – far over-stating risk.  Certainly, with this foundation, it’s no wonder we’re over-testing folks in nearly every situation.  The future of medicine involves the next generation of similar decision-support instruments – and we will all benefit.

“Clinician Gestalt Estimate of Pretest Probability for Acute Coronary Syndrome and Pulmonary Embolism in Patients With Chest Pain and Dyspnea.”
http://www.ncbi.nlm.nih.gov/pubmed/24070658

Death From a Thousand Clicks

The modern physician – one of the most highly-skilled, highly-compensated data-entry technicians in history.

This is a prospective, observational evaluation of physician activity in the Emergency Department, focusing mostly on the time spent interacting with the electronic health record.  Specifically, they counted mouse clicks during various documentation, order-entry, and other patient care activities.  The observations were conducted in 60-minute blocks, and then extrapolated out to an entire shift, based on multiple observations.

The observations were taken from a mix of residents, attendings, and physician extenders, and offer a lovely glimpse into the burdensome overhead of modern medicine: 28% of time was spent in patient contact, while 44% was spent performing data-entry tasks.  It requires 6 clicks to order an aspirin, 47 clicks to document a physical examination of back pain, and 187 clicks to complete an entire patient encounter for an admitted patient with chest pain.  This extrapolates out, at a pace of 2.5 patients per hour, to ~4000 clicks for a 10-hour shift.
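
The extrapolation itself is simple arithmetic – my back-calculation below, not a table from the paper – and implies an average of roughly 160 clicks per encounter across all patients, below the 187 quoted for an admitted chest pain patient:

```python
# Rough arithmetic behind the extrapolation above -- my back-calculation,
# not a table from the paper.
patients_per_hour = 2.5
shift_hours = 10
clicks_per_shift = 4000          # figure quoted above

encounters_per_shift = patients_per_hour * shift_hours          # 25 encounters
implied_clicks_per_encounter = clicks_per_shift / encounters_per_shift
print(implied_clicks_per_encounter)   # 160.0 -- the implied average per encounter
```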

The authors propose a more efficient documentation system would result in increased time available for patient care, increased patients per hour, and increased RVUs per hour.  While the numbers they generate from this sensitivity analysis for productivity gains are essentially fantastical, the underlying concept is valid: the value proposition for these expensive, inefficient electronic health records is based on maximizing reimbursement and charge capture, not on empowering providers to become more productive.

The EHR in use in this study is McKesson Horizon – but, I’m sure these results are generalizable to most EHRs in use today.

“4000 Clicks: a productivity analysis of electronic medical records in a community hospital ED”
http://www.ncbi.nlm.nih.gov/pubmed/24060331