Oh, The Things We Can Predict!

Philip K. Dick presented us with a short story about the “precogs”, three mutants that foresaw all crime before it could occur. “The Minority Report” was written in 1956 – and, now, 60 years later we do indeed have all manner of digital tools to predict outcomes. However, I doubt Steven Spielberg will be adapting a predictive model for hospitalization for cinema.

This is a rather simple article looking at a single-center experience with using multivariate logistic regression to predict hospitalization. This differs, somewhat, from the existing art in that it uses data available at 10, 60, and 120 minutes from arrival to the Emergency Department as the basis for its “progressive” modeling.

Based on 58,179 visits ending in discharge and 22,683 resulting in hospitalization, the specificity of their prediction method was 90% with a sensitivity of 96%, for an AUC of 0.97. Their work exceeds prior studies mostly on account of improved specificity, compared with the AUCs of a sample of other predictive models generally between 0.85 and 0.89.
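
For a rough sense of what those percentages mean in bodies, here is a back-of-envelope sketch. It assumes the quoted sensitivity and specificity apply to the whole cohort, which the article may not intend, so treat the counts as illustrative rather than as the authors' results.

```python
# Back-of-envelope confusion matrix implied by the figures quoted above.
# Assumption: the quoted sensitivity and specificity apply to the full cohort.
admitted = 22_683
discharged = 58_179
sensitivity = 0.96
specificity = 0.90

true_pos = round(sensitivity * admitted)    # admissions correctly predicted
false_neg = admitted - true_pos             # admissions the model missed
true_neg = round(specificity * discharged)  # discharges correctly predicted
false_pos = discharged - true_neg           # discharges flagged as admissions

# Predictive values: how often a given prediction turns out to be right.
ppv = true_pos / (true_pos + false_pos)
npv = true_neg / (true_neg + false_neg)

print(f"Predicted admissions actually admitted (PPV): {ppv:.1%}")
print(f"Predicted discharges actually discharged (NPV): {npv:.1%}")
```

Under that assumption, roughly one in five predicted admissions would still go home, which matters if beds are being requested on the strength of the prediction.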

Of course, their model is of zero value to other institutions as it will overfit not only on this subset of data, but also the specific practice patterns of physicians in their hospital. Their results also conceivably could be improved, as they do not actually take into account any test results – only the presence of the order for such. That said, I think it is reasonable to suggest similar performance from temporal models for predicting admission including these earliest orders and entries in the electronic health record.

For hospitals interested in improving patient flow and anticipating disposition, there may be efficiencies to be developed from this sort of informatics solution.

“Progressive prediction of hospitalisation in the emergency department: uncovering hidden patterns to improve patient flow”
http://emj.bmj.com/content/early/2017/02/10/emermed-2014-203819

Excitement and Ennui in the ED

It goes without saying some patient encounters are more energizing and rewarding than others.  As a corollary, some chief complaints similarly suck the joy out of the shift even before beginning the patient encounter.

This entertaining study simply looks for any particular time differential relating to physician self-assignment on the electronic trackboard between presenting chief complaints.  The general gist of this study would be that time-to-assignment reflects a surrogate of a composite of prioritization and/or desirability.

These authors looked at 30,382 presentations unrelated to trauma activations, and there were clear winners and losers.  This figure of the shortest and longest 10 complaints is a fairly concise summary of findings:

[Figure: door to eval times]

Despite consistently longer self-assignment times for certain complaints, the absolute difference in minutes is still quite small.  Furthermore, there are always issues with relying on these time stamps, particularly for higher-acuity patients; the priority of “being at the patient’s bedside” always trumps such housekeeping measures.  I highly doubt ankle sprains and finger injuries are truly seen more quickly than overdoses and stroke symptoms.

Vaginal bleeding, on the other hand … is deservedly bringing up the rear.

“Cherry Picking Patients: Examining the Interval Between Patient Rooming and Resident Self-assignment”
http://www.ncbi.nlm.nih.gov/pubmed/26874338

Informatics Trek III: The Search For Sepsis

Big data!  It’s all the rage with tweens these days.  Hoverboards, Yik Yak, and predictive analytics are all kids talk about now.

This “big data” application, more specifically, involves the use of an institutional database to derive predictors for mortality in sepsis.  Many decision instruments for various sepsis syndromes already exist – CART, MEDS, mREMS, CURB-65, to name a few – but all suffer from the same flaw: how reliable can a rule with just a handful of predictors be when applied to the complex heterogeneity of humanity?

Machine-learning applications of predictive analytics attempt to create, essentially, Decision Instruments 2.0.  Rather than using linear statistical methods to simply weight a small handful of different predictors, most of these applications utilize the entire data set and some form of clustering.  Most generally, these models replace typical variable weighted scoring with, essentially, a weighted neighborhood scheme, in which similarity to other points helps predict outcomes.
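
To make the “weighted neighborhood” idea concrete, here is a minimal sketch of similarity-weighted prediction. It is not a random forest and not the method from any study discussed here; the patients, the three features, and the `bandwidth` parameter are all invented for illustration.

```python
import math

# Hypothetical prior patients: (heart rate, lactate, age) and whether they died.
# These values are invented purely for illustration.
history = [
    ((88, 1.1, 54), 0),
    ((95, 1.8, 61), 0),
    ((121, 4.2, 77), 1),
    ((130, 5.0, 82), 1),
    ((102, 2.0, 45), 0),
    ((118, 3.9, 70), 1),
]

def neighborhood_risk(patient, history, bandwidth=15.0):
    """Average past outcomes, weighting each prior patient by similarity."""
    weighted_sum = 0.0
    total_weight = 0.0
    for features, died in history:
        # Raw Euclidean distance; a real tool would scale the features first.
        distance = math.dist(patient, features)
        weight = math.exp(-(distance / bandwidth) ** 2)  # closer = heavier
        weighted_sum += weight * died
        total_weight += weight
    return weighted_sum / total_weight

sick = neighborhood_risk((125, 4.5, 80), history)
well = neighborhood_risk((90, 1.2, 50), history)
print(f"Sick-looking patient: {sick:.2f}, well-looking patient: {well:.2f}")
```

No predictor gets a fixed point value here; the estimate comes entirely from how the most similar prior patients fared.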

Long story short, this study out of Yale utilized 5,278 visits for acute sepsis, divided into a training set and a validation set, to build a random forest model.  The random forest model included all available data points from the electronic health record, while other models used up to 20 predictors based on expert input and prior literature.  For their primary outcome of predicting in-hospital death, the AUC for the random forest model was 0.86 (CI 0.82-0.90), while none of the rest of the models exceeded an AUC of 0.76.
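
For anyone hazy on what an AUC actually measures: it is the probability that a randomly chosen patient who died was assigned a higher risk than a randomly chosen survivor. A minimal sketch follows, using invented risk scores rather than anything from the study.

```python
def auc(scores_died, scores_survived):
    """Probability a random death outranks a random survivor (ties count half)."""
    wins = 0.0
    for d in scores_died:
        for s in scores_survived:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(scores_died) * len(scores_survived))

# Hypothetical predicted risks, invented for illustration only.
died = [0.9, 0.7, 0.4]
survived = [0.6, 0.3, 0.2, 0.1]
example_auc = auc(died, survived)
print(f"AUC: {example_auc:.2f}")
```

An AUC of 0.5 is a coin flip and 1.0 is perfect ranking, so the gap between 0.76 and 0.86 is larger than it looks.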

This is still simply at the technology demonstration phase, and requires further development to become actionable clinical information.  However, I believe models and techniques like this are our next best paradigm in guiding diagnostic and treatment decisions for our heterogeneous patient population.  Many challenges yet remain, particularly in the realm of data quality, but I am excited to see more teams engaged in development of similar tools.

“Prediction of In-hospital Mortality in Emergency Department Patients with Sepsis: A Local Big Data Driven, Machine Learning Approach”
http://www.ncbi.nlm.nih.gov/pubmed/26679719

Hi Ur Pt Has AKI For Totes

Do you enjoy receiving pop-up alerts from your electronic health record?  Have you instinctively memorized the fastest series of clicks to “Ignore”?  “Not Clinically Significant”?  “Benefit Outweighs Risk”?  “Not Sepsis”?

How would you like your EHR to call you at home with more of the same?

Acute kidney injury, to be certain, is associated with poorer outcomes in the hospital – mortality, dialysis-dependence, and other morbidities.  Therefore, it makes sense – if an automated monitoring system can easily detect changes and trends, why not alert clinicians to such changes so that nephrotoxic therapies can be avoided?

Interestingly – for both good and bad – the outcomes measured were patient-oriented, randomizing 2393 patients to either “usual care” or text message alerts for changes in serum creatinine.  The goal, overall, was detection of reductions in death, dialysis, or progressive AKI.  While patient-oriented outcomes are, after all, the most important outcomes in medicine – it’s only plausible to improve outcomes if clinicians improve care.  Therefore, measuring the most direct consequence of the intervention might be a better outcome – renal-protective changes in clinician behavior.

Because, unfortunately, despite sending text messages and e-mails directly to responsible clinicians and pharmacists – the only notable change in behavior between the “alert” group and “usual care group” was increased monitoring of serum creatinine.  Chart documentation of AKI, avoidance of intravenous contrast, avoidance of NSAIDs, and other renal-protective behaviors were unchanged, excepting a non-significant trend towards decreased aminoglycoside use.

No change in behavior, no change in outcomes.  Text message and e-mail alerts!  Can shock collars be far behind?

“Automated, electronic alerts for acute kidney injury: a single-blind, parallel-group, randomised controlled trial”
http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(15)60266-5/fulltext

Replace Us With Computers!

In a preview to the future – who performs better at predicting outcomes, a physician, or a computer?

Unsurprisingly, it’s the computer – and the unfortunate bit is we’re not exactly going up against Watson or the hologram doctor from the U.S.S. Voyager here.

This is Jeff Kline, showing off his rather old, not terribly sophisticated “attribute matching” software.  This software, created back in 2005-ish, is based off a database he created of acute coronary syndrome and pulmonary embolism patients.  He determined a handful of most-predictive variables from this set, and then created a tool that allows physicians to input those specific variables from a newly evaluated patient.  The tool then finds the exact matches in the database and spits back a probability estimate based on the historical reference set.
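
The matching step described above is simple enough to sketch in a few lines. This is only an illustration of exact-match lookup as I understand it from the description; the attribute names, the records, and the function are hypothetical and are not taken from Kline's software.

```python
# Hypothetical reference database; each record holds the matching attributes
# and whether the patient turned out to have the outcome (e.g. ACS).
reference = [
    {"age_band": "40-49", "sex": "F", "ecg_abnormal": False, "outcome": False},
    {"age_band": "40-49", "sex": "F", "ecg_abnormal": False, "outcome": False},
    {"age_band": "40-49", "sex": "F", "ecg_abnormal": False, "outcome": True},
    {"age_band": "40-49", "sex": "F", "ecg_abnormal": False, "outcome": False},
    {"age_band": "60-69", "sex": "M", "ecg_abnormal": True, "outcome": True},
    {"age_band": "60-69", "sex": "M", "ecg_abnormal": True, "outcome": False},
]
ATTRIBUTES = ("age_band", "sex", "ecg_abnormal")

def pretest_probability(patient, reference):
    """Return (estimate, n_matches) from exact matches, or (None, 0) if none."""
    matches = [r for r in reference
               if all(r[a] == patient[a] for a in ATTRIBUTES)]
    if not matches:
        return None, 0
    positives = sum(r["outcome"] for r in matches)
    return positives / len(matches), len(matches)

new_patient = {"age_band": "40-49", "sex": "F", "ecg_abnormal": False}
estimate, n = pretest_probability(new_patient, reference)
print(f"{estimate:.0%} of {n} matched historical patients had the outcome")
```

The obvious weakness of exact matching is that every added attribute shrinks the pool of matches, so the estimate gets noisier as the patient description gets more specific.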

He sells software based on the algorithm and probably would like to see it perform well.  Sadly, it only performs “okay”.  But, it beats physician gestalt, which is probably better ranked as “poor”.  In their prospective evaluation of 840 cases of acute dyspnea or chest pain of uncertain immediate etiology, physicians (mostly attendings, then residents and midlevels) grossly over-estimated the prevalence of ACS and PE.  Physicians had a mean and median pretest estimate for ACS of 17% and 9%, respectively, and the software guessed 4% and 2%.  Actual retail price:  2.7%.  For PE, physicians were at mean 12% and median 6%, with the software at 6% and 5%.  True prevalence: 1.8%.

I don’t choose this article to highlight Kline’s algorithm, nor the comparison between the two.  Mostly, it’s a fascinating observational study of how poor physician estimates are – far over-stating risk.  Certainly, with this foundation, it’s no wonder we’re over-testing folks in nearly every situation.  The future of medicine involves the next generation of similar decision-support instruments – and we will all benefit.

“Clinician Gestalt Estimate of Pretest Probability for Acute Coronary Syndrome and Pulmonary Embolism in Patients With Chest Pain and Dyspnea.”
http://www.ncbi.nlm.nih.gov/pubmed/24070658

Death From a Thousand Clicks

The modern physician – one of the most highly-skilled, highly-compensated data-entry technicians in history.

This is a prospective, observational evaluation of physician activity in the Emergency Department, focusing mostly on the time spent in interaction with the electronic health record.  Specifically, they counted mouse clicks during various documentation, order-entry, and other patient care activities.  The observations were conducted for 60-minute time periods, and then extrapolated out to an entire shift, based on multiple observations.

The observations were taken from a mix of residents, attendings, and physician extenders, and offer a lovely glimpse into the burdensome overhead of modern medicine: 28% of time was spent in patient contact, while 44% was spent performing data-entry tasks.  It requires 6 clicks to order an aspirin, 47 clicks to document a physical examination of back pain, and 187 clicks to complete an entire patient encounter for an admitted patient with chest pain.  This extrapolates out, at a pace of 2.5 patients per hour, to ~4000 clicks for a 10-hour shift.
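
A quick sanity check on that extrapolation, using only the figures quoted above. The variable names are mine, and the “implied average” is simply the shift total divided by the number of encounters, not a number the authors report.

```python
patients_per_hour = 2.5
shift_hours = 10
clicks_admitted_chest_pain = 187  # per-encounter figure quoted above
shift_clicks_reported = 4000      # approximate shift total quoted above

encounters = patients_per_hour * shift_hours

# Average clicks per encounter implied by the ~4000 total.
implied_clicks_per_encounter = shift_clicks_reported / encounters

# Upper bound if every patient were an admitted chest pain.
all_chest_pain_total = clicks_admitted_chest_pain * encounters

print(f"{encounters:.0f} encounters per shift")
print(f"Implied average: {implied_clicks_per_encounter:.0f} clicks per encounter")
print(f"If all were admitted chest pain: {all_chest_pain_total:.0f} clicks")
```

So the ~4000 figure works out to about 160 clicks per patient, a bit below the admitted chest pain encounter, which seems plausible for a mixed shift.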

The authors propose a more efficient documentation system would result in increased time available for patient care, increased patients per hour, and increased RVUs per hour.  While the numbers they generate from this sensitivity analysis for productivity increase are essentially fantastical, the underlying concept is valid: the value proposition for these expensive, inefficient electronic health records is based on maximizing reimbursement and charge capture, not on empowering providers to become more productive.

The EHR in use in this study is McKesson Horizon – but, I’m sure these results are generalizable to most EHRs in use today.

“4000 Clicks: a productivity analysis of electronic medical records in a community hospital ED”
http://www.ncbi.nlm.nih.gov/pubmed/24060331

New South Wales Dislikes Cerner

The grass is clearly greener on the other side for these folks at Nepean Hospital in New South Wales, AUS.  This study details the before-and-after Emergency Department core measures as they transitioned from the EDIS system to Cerner’s FirstNet.  As they state in their introduction, “Despite limited literature indicating that FirstNet has decreased performance” and “reports of problems with Cerner programs overseas”, FirstNet was foisted upon them – so it’s clear they have an agenda with this publication.


And, a retrospective, observational study is the perfect vehicle for an agenda.  You pick the criteria you want to measure, the most favorable time period, and voilà!  These authors picked a six-month pre-intervention period and a six-month post-intervention period.  Triage categories were similar for that six-month period.  And then…they present data on a three-month subset.  Indeed, all their descriptive statistics are of only a three-month subset excepting ambulance offload waiting time – for which they have full six-month data.  Why choose a study period fraught with missing data?

Then, yes, by every measure they are less efficient at seeing patients with the Cerner product.  The FirstNet system had been in place for six months by the time they report data – but, it’s still not unreasonable to suggest they’re somewhat suffering the growing pains of inexperience.  Then, they also understaff the ED by 3.2 resident shifts and 3.5 attending shifts per week.  An under-staffed ED for a relatively new implementation of a product with low physician acceptance?  

As little love as I have for Cerner FirstNet, I’m not sure this study gives it a fair shot.

“Effect of an electronic medical record information system on emergency department performance”
www.ncbi.nlm.nih.gov/pubmed/23451963

TPA is Dead, Long Live TPA

I’m sure this is saturating the medical airwaves this morning, but yesterday’s NEJM published a study which they succinctly summarize on Twitter as “In trial of 75 pts w/ acute ischemic #stroke, tenecteplase assoc w/ better reperfusion, clin outcomes than alteplase.”


Well, that’s very exciting!  It’s still smashing a teacup with a sledgehammer, but it does appear to be a more functional sledgehammer.  Particularly encouraging were the rates of sustained complete recanalization – which were 36% at 24 hours for alteplase and 58% for tenecteplase – and the rates of intracranial hemorrhage – which were 20% for alteplase and 6% for tenecteplase.


However, the enthusiasm promoted by NEJM, and likely the rest of the internet, should be tempered by the fact that there were only 25 patients in each arm, and there is enough clinical variability between groups that it is not yet practice changing.  This was a phase 2B trial, and it is certainly reasonable evidence to proceed with a phase III trial.
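
To put a number on how little 25 patients per arm can tell us, here is a sketch of a Wilson score interval for the hemorrhage rates. The event counts are my assumption: 20% of 25 alteplase patients would be 5, and I am guessing the 6% tenecteplase figure pools both tenecteplase arms (3 of 50). Check the paper before leaning on these.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Assumed counts (see lead-in): 5 of 25 for alteplase, 3 of 50 for tenecteplase.
alteplase_ich = wilson_interval(5, 25)
tenecteplase_ich = wilson_interval(3, 50)

print(f"Alteplase ICH:    {alteplase_ich[0]:.1%} to {alteplase_ich[1]:.1%}")
print(f"Tenecteplase ICH: {tenecteplase_ich[0]:.1%} to {tenecteplase_ich[1]:.1%}")
```

With those assumed counts the two intervals overlap, which is the arithmetic behind calling this hypothesis-generating rather than practice-changing.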


Unfortunately, in a replay of prior literature, the authors are all affiliated with Boehringer Ingelheim, the manufacturer of tenecteplase.


“A Randomized Trial of Tenecteplase versus Alteplase for Acute Ischemic Stroke”
http://www.nejm.org/doi/full/10.1056/NEJMoa1109842

Addendum:  As Andy Neil appropriately points out, tenecteplase has been studied before – 112 patients over several years, terminated early due to slow enrollment – without seeing a significant advantage.

ED Geriatric CPOE Intervention – Win?

It does seem as though this intervention had a measure of success – based on their primary outcome – but there are more shades of grey throughout the article.

This is a prospective, controlled trial of a contextual computerized decision-support (CDS) tool incorporated into the computerized provider order entry (CPOE) system of their electronic health record (EHR).  They do a four-phase On/Off intervention where the CPOE either suggests alternative medications or dose reductions in patients >65 years of age.  They look at whether the intervention changed the rate at which medication ordering was compliant with medication safety in the elderly, and then, secondarily, at the rate of 10-fold errors, medication cancellations, and adverse drug event reports.

The oddest part of this study is their choice of primary outcome measure.  Ideally, the most relevant outcome is the patient-oriented outcome – which, in this case, ought to be a specific decrease in adverse drug events in the elderly.  However, and I can understand where they’re coming from, they chose to specifically evaluate the usability/acceptability of the CDS intervention to verify the mechanism of intervention.  There are lots of studies out there documenting “alert fatigue”, resulting in either no change or even increasing error rates.

As far as the main outcome measure goes, they had grossly positive findings – 31% of orders were compliant during the intervention periods vs. 23% of orders during the control periods.  But, 92.5% of recommendations for alternative medications were ignored during the intervention periods – most commonly triggered by diazepam, clonazepam, and indomethacin.  The intervention was successful in reducing doses for NSAIDs and for opiates, but had no significant effect on benzodiazepine or sedative-hypnotic dosing.

However, bizarrely, even though there was just a small difference in guideline-concordant ordering, there was a 4-fold reduction in adverse drug events – most of which occurred during the initial “off” period.  As a secondary outcome, there’s not much to say about it other than “huh”.  None of their other secondary outcomes demonstrated any differences.

So, it’s an interesting study.  It is consistent with a lot of previous studies – most alerts are ignored, but occasionally small positive effect sizes are seen.  Their primary outcome measure is one of mostly academic interest – it would be better if they had chosen more clinically relevant outcomes.  But, no doubt, if you’re not already seeing a deluge of CDS alerts, just wait a few more years….

“Guided medication dosing for elderly emergency patients using real-time, computerized decision support”
http://www.ncbi.nlm.nih.gov/pubmed/22052899