Quick Hit: Elders Risk Assessment

A few words on an article highlighted in one of my daily e-mails – a report regarding the Elders Risk Assessment (ERA) tool from the Mayo Clinic.

The key to the highlight is the assertion that this score can be easily calculated and presented in-context to clinicians during primary care visits, allowing patients with higher scores to be readily flagged for preventive interventions. With an AUC of 0.84, the authors are rather chuffed about the overall performance. In fact, they close their discussion with this rosy outlook:

The adoption of a proactive approach in primary care, along with the implementation of a predictive clinical score, could play a pivotal role in preventing critical illnesses, benefiting patients and optimizing healthcare resource allocation.

Completely missed by their limitations is that prognostic scores are not prescriptive. The ERA is based on age, recent hospitalizations, and chronic illness. The extent to which any of these issues can be managed “proactively” in the current primary care environment, with a demonstrable positive impact on patient-oriented outcomes, remains to be seen.
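
Purely as an illustration of what an “easily calculated” in-context score looks like mechanically, here is a toy sketch against structured EHR fields – the field names and point weights are invented for this example and are not the published ERA weights:

```python
# Hypothetical sketch of an ERA-style score computed from structured EHR fields.
# Field names and point values are invented for illustration -- they are NOT the
# validated ERA weights.
def era_like_score(age, hospital_days_past_2y, chronic_conditions):
    score = 0
    if age >= 75:
        score += 2                      # older age contributes more points
    elif age >= 65:
        score += 1
    if hospital_days_past_2y > 0:
        score += 2                      # any recent hospitalization
    score += len(chronic_conditions)    # one point per chronic illness on the problem list
    return score

print(era_like_score(82, 5, {"diabetes", "COPD", "CHF"}))  # -> 7
```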

To claim a scoring system is going to better the world, it is necessary to compare decisions made with formal prompting by the score to decisions made without – several steps removed from performing a retrospective evaluation to generate an AUC. It ought also to be appreciated that some decisions based on high ERA scores will increase resource utilization without a corresponding benefit to health, while lower scores may likewise inappropriately bias clinical judgement.

This article has only passing applicability to emergency medicine, but the same issues regarding the disutility of “prognosis” apply widely.

“Individualized prediction of critical illness in older adults: Validation of an elders risk assessment model”
https://agsjournals.onlinelibrary.wiley.com/doi/abs/10.1111/jgs.18861

The United Colors of Sepsis

Here it is: sepsis writ Big Data.

And, considering it’s Big Data, it’s also a big publication: a 15-page primary publication, plus 90+ pages of online supplement – dense with figures, raw data, and methods both routine and novel for the evaluation of large data sets.

At a minimum, to put a general handle on it, this work primarily demonstrates the heterogeneity of sepsis. As any clinician knows, “sepsis” – with its ever-morphing definition – ranges widely from those generally well in the Emergency Department to those critically ill in the Intensive Care Unit. In an academic sense, this means the patients enrolled and evaluated in various trials for the treatment of sepsis may be quite different from one another, and results seen in one trial or setting may generalize poorly to another. This has obvious implications when trying to determine a general set of care guidelines from these disparate bits of data, and results in further issues down the road when said guidelines become enshrined in quality measures.

Overall, these authors ultimately define four phenotypes of sepsis, helpfully assigned descriptive labels using letters of the Greek alphabet. These four phenotypes are derived from retrospective administrative data, then validated on additional retrospective administrative data, and finally on the raw data from several prominent clinical trials in sepsis, including ACCESS, PROWESS, and ProCESS. The four phenotypes were derived by clustering and refinement, and are described by the authors as, effectively: a mild type with low mortality; a cohort of those with chronic illness; a cohort with systemic inflammation and pulmonary disease; and a final cohort with liver dysfunction, shock, and high mortality.

We are quite far, however, from needing to apply these phenotypes in a clinical fashion. Any classification model is highly dependent upon its inputs, and in this study the inputs are the sorts of routine clinical data available from the electronic health record: vital signs, demographics, and basic labs. Missing data were common – lactate levels, for example, were not available for 80% of patients in their model. These inputs then dictate how many different clusters you obtain, how the relative accuracy of classification diminishes with greater numbers of clusters, and whether the model begins to overfit the derivation data set.
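
To make the cluster-count trade-off concrete, here is a generic k-means sketch over standardized routine variables on synthetic data – not the authors’ actual consensus-clustering pipeline, just the general mechanics of choosing a number of clusters:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for routine EHR variables (vitals, demographics, basic labs).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))

X_scaled = StandardScaler().fit_transform(X)

# More clusters fit the derivation data more closely, but classification of new
# patients becomes less reliable -- the trade-off described above.
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    print(k, round(silhouette_score(X_scaled, labels), 3))
```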

Then, this is a bit of a fuzzy application in the sense that these data represent different types of patients with sepsis as much as they represent different types of sepsis. Consider the varying etiologies of sepsis, including influenza pneumonia, streptococcal toxic shock, or gram-negative bacteremia. These different etiologies would obviously result in different host responses depending on individual patient features. The phenotypes derived here effectively mash up causative agent with underlying host, muddying clinical application.

If clinical utility is limited, then what might be the best use of this work? Well, this goes back to the idea above regarding translating work from clinical trials to different settings. A community Emergency Department might primarily see alpha-sepsis, a community ICU might see a lot of beta-sepsis, while an academic ICU might see predominantly delta-sepsis. These are important concepts to consider – and potentially subgroup analyses to perform – when evaluating the outcomes of clinical trials. These authors run several simulations of clinical trials while varying the composition of sepsis phenotypes, and note potentially important effects on primary outcomes. Pathways of care or resuscitation protocols could be more readily compared between trial populations if these phenotypes were calculated.

This is a challenging work to process – but an important first step in better recognizing the heterogeneity in potential benefits and harms resulting from various interventions. The accompanying editorial also does an excellent job of describing the methods, outcomes, and utility.

“Derivation, Validation, and Potential Treatment Implications of Novel Clinical Phenotypes for Sepsis”
https://jamanetwork.com/journals/jama/fullarticle/2733996

“New Phenotypes for Sepsis”
https://jamanetwork.com/journals/jama/fullarticle/2733994

OK, Google: Discharge My Patient

Within my electronic health record, I have standardized discharge instructions in many languages. Some of these – such as Spanish – I can read or edit with some fluency; in others – such as Vietnamese – I have no facility whatsoever. These function adequately as general reading material regarding any specific diagnosis made in the Emergency Department.

However, frequently, additional free text clarification is necessary regarding a treatment plan – whether it be time until suture removal, specifics about follow-up, or clarifications relevant to an individual patient. This level of language art is beyond my capacity in Spanish, let alone any sort of logographic or morphographic writing.

These authors performed a simple study in which they processed 100 free-text Emergency Department discharge instructions through the Google Translate blender to produce Spanish- and Chinese-language editions. The accuracy of the Spanish translation was 92%, as measured by the number of sentences preserving meaning and readability. Chinese fared less well, at 81%. Finally, the authors assessed the errors for clinical relevance and potential harm – and found 2% of Spanish instructions and 8% of Chinese met their criteria.

Of course, there are a couple of potential strategies to mitigate these issues – including back-translating the text from the foreign language into English, as these authors did as part of their methods, or spending time verbally confirming the clarity of the written instructions with the patient. Instructions can also be improved prior to translation by avoiding abbreviations and utilizing simple sentence structures.
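
The back-translation check itself is trivial to sketch; in the snippet below, the translate function is a hypothetical placeholder for whatever machine-translation service is in use, and the word-overlap threshold is an arbitrary illustration, not a validated cut-off:

```python
# Sketch of a back-translation sanity check. `translate` is a hypothetical
# placeholder for a machine-translation call; no specific API is implied, and
# the 0.8 word-overlap threshold is arbitrary.
def needs_human_review(instruction_en: str, target_lang: str, translate) -> bool:
    forward = translate(instruction_en, src="en", dest=target_lang)
    back = translate(forward, src=target_lang, dest="en")
    original_words = set(instruction_en.lower().split())
    returned_words = set(back.lower().split())
    overlap = len(original_words & returned_words) / max(len(original_words), 1)
    # Flag for verbal confirmation or interpreter review if the round trip
    # loses too much of the original wording.
    return overlap < 0.8
```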

Imperfect as they may be, using a translation tool is still likely better than giving no written instruction at all.

“Assessing the Use of Google Translate for Spanish and Chinese Translations of Emergency Department Discharge Instructions”
https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2725080

Don’t Rely on the EHR to Think For You

“The Wells and revised Geneva scores can be approximated with high accuracy through the automated extraction of structured EHR data elements in patients who underwent CTPA in the emergency department.”

Can it be done? Can the computer automatically discern your intent and extract pulmonary embolism risk-stratification from the structured data? And, with “high accuracy” as these authors tout in their conclusion?

IFF: “high accuracy” means ~90%. That is, one out of every ten patients in their sample was misclassified as low- or high-risk for PE. This is clinically useless.

The Wells classification, of course, depends heavily upon the 3 points assigned for “PE is most likely diagnosis” – so these authors simply assigned those 3 points to every case.  This probably sort of works in a population selected explicitly because they underwent CTPA in the ED, but it is obviously a foundationally broken kludge.  The revised Geneva score does not have a “gestalt” element, but there are still subjective examination features that may not make it into structured data – and, notably, it performed just as well (poorly) as the Wells tool.
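
To make the kludge concrete, a sketch of this sort of automated Wells approximation is below – the structured field names are hypothetical, the hard-coded 3 points mirrors the shortcut described above, and the >4 cut-point is the usual dichotomized threshold:

```python
# Sketch of an automated Wells approximation from structured EHR data.
# Field names are hypothetical; the hard-coded 3 points reproduces the
# authors' shortcut of marking "PE is most likely diagnosis" positive for
# every CTPA patient.
def wells_from_structured_ehr(pt):
    score = 3.0                                   # "PE most likely" -- assumed for all
    if pt["heart_rate_max"] > 100:
        score += 1.5
    if pt["recent_surgery_or_immobilization"]:
        score += 1.5
    if pt["prior_dvt_or_pe"]:
        score += 1.5
    if pt["hemoptysis"]:
        score += 1.0
    if pt["active_malignancy"]:
        score += 1.0
    if pt["clinical_signs_of_dvt"]:
        score += 3.0
    return score, ("high" if score > 4 else "low")

pt = {"heart_rate_max": 112, "recent_surgery_or_immobilization": False,
      "prior_dvt_or_pe": False, "hemoptysis": False,
      "active_malignancy": True, "clinical_signs_of_dvt": False}
print(wells_from_structured_ehr(pt))  # -> (5.5, 'high')
```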

To put it mildly, these authors are overselling their work a little bit. The electronic health record will always depend on the data entered – and it’s setting itself up for failure if it depends on specific elements entered by the clinician contemporaneously during the evaluation. Tools such as these have promise – but perhaps not this specific application.

“Automated Pulmonary Embolism Risk Classification and Guideline Adherence for Computed Tomography Pulmonary Angiography Ordering”
https://onlinelibrary.wiley.com/doi/abs/10.1111/acem.13442

It’s Sepsis-Harassment!

The computer knows all in modern medicine. The electronic health record is the new Big Brother, all-seeing, never un-seeing. And it sees “sepsis” – a lot.

This is a report on the downstream effects of an electronic sepsis alert system at an academic medical center. Their sepsis alert system was based loosely on the systemic inflammatory response syndrome (SIRS) criteria for the initial warning to nursing staff, followed by additional alerts triggered by hypotension or elevated lactate. These alerts prompted use of sepsis order sets or triggering of internal “sepsis alert” protocols. The outcomes of interest in their analysis were length-of-stay and in-hospital mortality.
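
For context, a SIRS-style trigger is about as simple as alerting logic gets – a sketch using simplified standard SIRS criteria is below; the exact thresholds and data elements in this institution’s alert may well have differed:

```python
def sirs_alert(temp_c, heart_rate, resp_rate, wbc_k):
    """Generic SIRS screen: two or more criteria fire the alert.
    (A simplified sketch of this style of rule, not the institution's exact logic.)"""
    criteria = [
        temp_c > 38.0 or temp_c < 36.0,
        heart_rate > 90,
        resp_rate > 20,
        wbc_k > 12.0 or wbc_k < 4.0,
    ]
    return sum(criteria) >= 2

# A febrile, mildly tachycardic patient with a normal WBC already triggers --
# a recipe for a very high alert burden.
print(sirs_alert(temp_c=38.4, heart_rate=104, resp_rate=18, wbc_k=9.5))  # True
```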

At first glance, the alert appears to be a success – length of stay dropped from 10.1 days to 8.6, and in-hospital mortality from 8.5% to 7.0%. It would have been quite simple to stop there and trumpet these results as favoring the alerts, but the additional analyses performed by these authors demonstrate otherwise. Both length-of-stay and mortality were already trending downward independently of the intervention, and in the adjusted analyses none of the improvements could be conclusively tied to the sepsis alerts – with some of the apparent improvement relating to diagnoses of less-severe cases of sepsis probably prompted by the alert itself.

What is not debatable, however, is the burden on clinicians and staff. During their ~2.5 year study period, the sepsis alerts were triggered 97,216 times – 14,207 of them in the 2,144 patients subsequently receiving a final diagnosis of sepsis. The SIRS-based alerts comprised most (83,385) of these, but captured only 73% of those with an ultimate diagnosis of sepsis while carrying only a 13% true-positive rate. The authors’ conclusion gets it right:

Our results suggest that more sophisticated approaches to early identification of sepsis patients are needed to consistently improve patient outcomes.

“Impact of an emergency department electronic sepsis surveillance system on patient mortality and length of stay”
https://academic.oup.com/jamia/article-abstract/doi/10.1093/jamia/ocx072/4096536/Impact-of-an-emergency-department-electronic

No Change in Ordering Despite Cost Information

Everyone hates the nanny state. When the electronic health record alerts and interrupts clinicians incessantly with decision-“support”, it results in all manner of deleterious unintended consequences. Passive, contextual decision-support has the advantage of avoiding this intrusiveness – but is it effective?

It probably depends on the application, but in this trial, it was not. This is the PRICE (Pragmatic Randomized Introduction of Cost data through the Electronic health record) trial, in which 75 inpatient laboratory tests were randomized to display with usual ordering, or with contextual Medicare cost information. The hope, and the study hypothesis, was that the availability of this cost information would exert a cultural pressure of sorts on clinicians to order fewer tests, particularly those with high costs.

Across three Philadelphia-area hospitals comprising 142,921 hospital admissions over a two-year study period, there were no meaningful differences in lab tests ordered per patient-day between the intervention and control arms. Looking at various subgroups of patients, it is also unlikely there were particularly advantageous effects in any specific population.

Interestingly, one piece of feedback the authors report is that residents suggested most of their routine lab test ordering resulted from admission order sets. “Routine” daily labs are set in motion at the time of admission, not as part of a daily assessment of need, and are thus a natural impediment to reducing low-value testing. However, the authors also note – and this is probably most accurate – that because the cost information was displayed ubiquitously, physicians likely became numb to the intervention. It is reasonable to expect substantially more selective cost information could have focused effects on an area of particularly high cost or low value.

“Effect of a Price Transparency Intervention in the Electronic Health Record on Clinician Ordering of Inpatient Laboratory Tests”
http://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2619519

Oh, The Things We Can Predict!

Philip K. Dick presented us with a short story about the “precogs”, three mutants that foresaw all crime before it could occur. “The Minority Report” was written in 1956 – and, now, 60 years later we do indeed have all manner of digital tools to predict outcomes. However, I doubt Steven Spielberg will be adapting a predictive model for hospitalization for cinema.

This is a rather simple article looking at a single-center experience using multivariate logistic regression to predict hospitalization. This differs somewhat from the existing art in that it uses data available at 10, 60, and 120 minutes after arrival to the Emergency Department as the basis for its “progressive” modeling.

Based on 58,179 visits ending in discharge and 22,683 resulting in hospitalization, the specificity of their prediction method was 90% with a sensitivity of 96%, for an AUC of 0.97. Their work exceeds prior studies mostly on account of improved specificity, compared with AUCs from a sample of other predictive models generally between 0.85 and 0.89.
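
For a sense of the mechanics only, here is a generic logistic-regression sketch on synthetic “early visit” features – not the authors’ model, variables, or data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for features available early in the ED visit:
# triage vitals, demographics, and indicators for early orders.
rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 12))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000)) > 0.8   # synthetic "admitted" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
```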

Of course, their model is of zero value to other institutions, as it overfits not only to this subset of data but also to the specific practice patterns of physicians in their hospital. Their results could also conceivably be improved, as they do not actually take into account any test results – only the presence of the order for such. That said, I think it is reasonable to expect similar performance from temporal models for predicting admission built on these earliest orders and entries in the electronic health record.

For hospitals interested in improving patient flow and anticipating disposition, there may be efficiencies to be developed from this sort of informatics solution.

“Progressive prediction of hospitalisation in the emergency department: uncovering hidden patterns to improve patient flow”
http://emj.bmj.com/content/early/2017/02/10/emermed-2014-203819

Excitement and Ennui in the ED

It goes without saying some patient encounters are more energizing and rewarding than others.  As a corollary, some chief complaints similarly suck the joy out of the shift even before beginning the patient encounter.

This entertaining study simply looks for any time differential in physician self-assignment on the electronic trackboard between presenting chief complaints.  The general gist is that time-to-assignment serves as a surrogate for some composite of prioritization and/or desirability.

These authors looked at 30,382 presentations unrelated to trauma activations, and there were clear winners and losers.  This figure of the shortest and longest 10 complaints is a fairly concise summary of findings:

[Figure: door-to-evaluation times for the ten shortest and ten longest chief complaints]

Despite consistently longer self-assignment times for certain complaints, the absolute difference in minutes is still quite small.  Furthermore, there are always issues with relying on these time stamps, particularly for higher-acuity patients; the priority of “being at the patient’s bedside” always trumps such housekeeping measures.  I highly doubt ankle sprains and finger injuries are truly seen more quickly than overdoses and stroke symptoms.

Vaginal bleeding, on the other hand … is deservedly pulling up the rear.

“Cherry Picking Patients: Examining the Interval Between Patient Rooming and Resident Self-assignment”
http://www.ncbi.nlm.nih.gov/pubmed/26874338

Informatics Trek III: The Search For Sepsis

Big data!  It’s all the rage with tweens these days.  Hoverboards, Yik Yak, and predictive analytics are all the kids talk about now.

This “big data” application, more specifically, involves the use of an institutional database to derive predictors for mortality in sepsis.  Many decision instruments for various sepsis syndromes already exist – CART, MEDS, mREMS, CURB-65, to name a few – but all suffer from the same flaw: how reliable can a rule with just a handful of predictors be when applied to the complex heterogeneity of humanity?

Machine-learning applications of predictive analytics attempt to create, essentially, Decision Instruments 2.0.  Rather than using linear statistical methods to simply weight a small handful of different predictors, most of these applications utilize the entire data set and some form of clustering.  Most generally, these models replace typical variable weighted scoring with, essentially, a weighted neighborhood scheme, in which similarity to other points helps predict outcomes.

Long story short, this study out of Yale utilized 5,278 visits for acute sepsis, split into training and validation sets, to develop a random forest model.  The random forest model included all available data points from the electronic health record, while comparator models used up to 20 predictors based on expert input and prior literature.  For the primary outcome of predicting in-hospital death, the AUC for the random forest model was 0.86 (CI 0.82-0.90), while none of the other models exceeded an AUC of 0.76.
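
A minimal sketch of the comparison being described – a random forest given the full feature set versus a conventional model restricted to a smaller expert-chosen subset – using purely synthetic data rather than the Yale cohort:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X_full = rng.normal(size=(5000, 50))                          # "all available" EHR fields (synthetic)
y = (X_full[:, :5].sum(axis=1) + rng.normal(size=5000)) > 2   # synthetic mortality label
X_small = X_full[:, :20]                                      # ~20 expert-chosen predictors

rf = RandomForestClassifier(n_estimators=200, random_state=0)
lr = LogisticRegression(max_iter=1000)
print("RF AUC:", cross_val_score(rf, X_full, y, cv=5, scoring="roc_auc").mean().round(3))
print("LR AUC:", cross_val_score(lr, X_small, y, cv=5, scoring="roc_auc").mean().round(3))
```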

This is still simply at the technology demonstration phase, and requires further development to become actionable clinical information.  However, I believe models and techniques like this are our next best paradigm for guiding diagnostic and treatment decisions in our heterogeneous patient population.  Many challenges yet remain, particularly in the realm of data quality, but I am excited to see more teams engaged in the development of similar tools.

“Prediction of In-hospital Mortality in Emergency Department Patients with Sepsis: A Local Big Data Driven, Machine Learning Approach”
http://www.ncbi.nlm.nih.gov/pubmed/26679719

Hi Ur Pt Has AKI For Totes

Do you enjoy receiving pop-up alerts from your electronic health record?  Have you instinctively memorized the fastest series of clicks to “Ignore”?  “Not Clinically Significant”?  “Benefit Outweighs Risk”?  “Not Sepsis”?

How would you like your EHR to call you at home with more of the same?

Acute kidney injury, to be certain, is associated with poorer outcomes in the hospital – mortality, dialysis-dependence, and other morbidities.  Therefore, it makes sense: if an automated monitoring system can easily detect changes and trends, why not alert clinicians to those changes so nephrotoxic therapies can be avoided?
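
Detection itself is the easy part – a KDIGO-style creatinine-change screen can be sketched in a few lines; this is a generic rule for illustration, not necessarily the trial’s exact algorithm:

```python
from datetime import datetime, timedelta

def aki_alert(creatinine_series, baseline):
    """KDIGO-style screen (a generic sketch; the trial's actual algorithm may differ):
    flag a rise of >= 0.3 mg/dL within 48 hours, or any value >= 1.5x baseline."""
    for i, (t_i, cr_i) in enumerate(creatinine_series):
        if cr_i >= 1.5 * baseline:
            return True
        for t_j, cr_j in creatinine_series[:i]:
            if t_i - t_j <= timedelta(hours=48) and cr_i - cr_j >= 0.3:
                return True
    return False

series = [(datetime(2015, 3, 1, 8, 0), 1.0), (datetime(2015, 3, 2, 8, 0), 1.4)]
print(aki_alert(series, baseline=1.0))  # True: 0.4 mg/dL rise within 48 hours
```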

Interestingly – for both good and bad – the outcomes measured were patient-oriented: 2,393 patients were randomized to either “usual care” or text-message alerts for changes in serum creatinine, with the goal of detecting reductions in death, dialysis, or progressive AKI.  While patient-oriented outcomes are, after all, the most important outcomes in medicine, it is only plausible to improve outcomes if clinicians improve care.  Therefore, measuring the most direct consequence of the intervention – renal-protective changes in clinician behavior – might have been a better outcome.

Because, unfortunately, despite text messages and e-mails sent directly to responsible clinicians and pharmacists, the only notable change in behavior between the “alert” group and “usual care” group was increased monitoring of serum creatinine.  Chart documentation of AKI, avoidance of intravenous contrast, avoidance of NSAIDs, and other renal-protective behaviors were unchanged, excepting a non-significant trend towards decreased aminoglycoside use.

No change in behavior, no change in outcomes.  Text-message and e-mail alerts!  Can shock collars be far behind?

“Automated, electronic alerts for acute kidney injury: a single-blind, parallel-group, randomised controlled trial”
http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(15)60266-5/fulltext