The Precogs Take On Sepsis

It seems like every week there’s another publicized instance of our impending replacement by artificial intelligence. Big Data, they say, is going to free us of the cognitive burdens of complex thought while maximizing healthcare outcomes. The latest entry is the “AI Clinician”, created as a demonstration for the treatment of sepsis – or, rather more narrowly, for prescribing the balance of intravenous fluids and vasopressors.

In this predictive feat of strength, decision models were created from retrospective data sets comprising tens of thousands of patients meeting Sepsis-3 criteria. Each patient’s clinical trajectory was described by their receipt of intravenous fluids or vasopressors in four-hour blocks, and the ultimate outcome of 90-day survival was designated as the reward or penalty for the model. It’s rather beyond the scope of my statistical expertise to precisely describe their value comparison between the AI and clinicians, but suffice it to say their results favor their models.
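For the unfamiliar, this is classic reinforcement-learning territory. Below is a minimal sketch of the general shape of such a model – tabular, offline learning over discretized patient states – where the dose grid, the opaque state labels, and the ±100 terminal reward are my own illustrative assumptions standing in for the authors’ actual pipeline:

```python
from collections import defaultdict

# Toy offline Q-learning over retrospective sepsis trajectories.
# The 5x5 fluid/vasopressor dose grid, opaque state labels, and the
# +/-100 terminal reward for 90-day survival are illustrative
# assumptions, not the paper's exact implementation.

GAMMA = 0.99   # discount factor per 4-hour block
ALPHA = 0.1    # learning rate
ACTIONS = [(fluid, pressor) for fluid in range(5) for pressor in range(5)]

Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def learn_from(trajectory, survived_90d):
    """trajectory: list of (state, action) pairs, one per 4-hour block."""
    reward = 100.0 if survived_90d else -100.0
    for t, (state, action) in enumerate(trajectory):
        if t + 1 < len(trajectory):
            next_state = trajectory[t + 1][0]
            target = GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        else:
            target = reward  # the reward/penalty arrives only at the end
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# e.g. learn_from([("state_12", (2, 0)), ("state_47", (3, 1))], True)
```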

We are rather far from this sort of software being validated as a management adjunct in sepsis, but what’s most interesting is their incidental description of how deviations from their model affected mortality. Effectively by definition, of course, they find patients receiving IV fluids or vasopressors in doses most similar to the AI model had the lowest mortality. Greater variance from these optimal doses tended to increase mortality – most prominently with excesses of IV fluids, rather than restrictive fluid strategies. Vasopressors, on the other hand, showed a more symmetric distribution of poor outcomes with deviation from the optimal model.

The implication here mostly ties into the oft-repeated concern that high-volume fluid resuscitation is not necessarily the magic bullet in sepsis, and there is likely a point at which returns diminish, or turn harmful. This is virtually the exact hypothesis addressed by the CLOVERS trial. It will be quite interesting to see if these model findings are validated by the trial.

“The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care”
https://www.nature.com/articles/s41591-018-0213-5

Who Wants to Be Famous!

And by fame, I mean Twitter fame.

Based on SCIENCE!

This is a retrospective study of you, all you, you lab rats you, spinning your wheels about on the Twitter. This lovely study took four years of #FOAMed, chopped up all the tweets into little pieces, and ran the pieces through R. Specifically, they chopped up the tweets of a cluster of 238 heavily-retweeted tweeters (twits?), and analyzed their various attributes and flavors to determine those with the greatest likelihood of being retweeted.
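For flavor, the skeleton of such an analysis looks something like this sketch – in Python rather than the authors’ R, with simulated data and hypothetical feature names standing in for the attributes the authors coded:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy retweet-prediction model. Feature names are hypothetical
# stand-ins for the attributes coded in the study, and the data are
# simulated - this only illustrates the shape of the analysis.
rng = np.random.default_rng(0)
n = 2000
tweets = pd.DataFrame({
    "has_image": rng.integers(0, 2, n),
    "is_research_critique": rng.integers(0, 2, n),
    "topic_mental_health": rng.integers(0, 2, n),
    "has_blog_link": rng.integers(0, 2, n),
    "log_followers": rng.normal(9.0, 1.5, n),  # e^9 is roughly 8,000
})
logit = (-6.0 + 0.5 * tweets["has_image"] - 0.4 * tweets["has_blog_link"]
         - 0.3 * tweets["topic_mental_health"]
         + 0.6 * tweets["log_followers"])
tweets["retweeted"] = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(
    tweets.drop(columns="retweeted"), tweets["retweeted"])
print(dict(zip(tweets.columns[:-1], model.coef_[0].round(2))))
```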

And, the #1 determinant of whether a tweet would be retweeted and make you famous …

… was to be famous already – specifically, a tweet from one of 21 accounts with large (mostly >15,000) numbers of Twitter followers.

For the rest of us on the fringe of fame or worse, here were predictors of high retweet volume:

  • Tweets on resuscitation, trauma, neurology, infectious disease, pulmonary topics, and ultrasound.
  • Tweets with images, advertisements, or research critiques.
  • Avoidance of mental health topics, blog links, or “questions”.

The appendices are full of entertaining nuggets, including the top tweets of the study period – dominated by @EM_RESUS. Splendid work!

“Trends and Predictors of Retweets in Free Open Access Medical Education (#FOAMed) on Twitter (2013-7)”
https://www.ncbi.nlm.nih.gov/pubmed/30343518

Don’t Rely on the EHR to Think For You

“The Wells and revised Geneva scores can be approximated with high accuracy through the automated extraction of structured EHR data elements in patients who underwent CTPA in the emergency department.”

Can it be done? Can the computer automatically discern your intent and extract pulmonary embolism risk-stratification from the structured data? And, with “high accuracy” as these authors tout in their conclusion?

IFF: “high accuracy” means ~90%. That is, one out of every ten patients in their sample was misclassified as low- or high-risk for PE. This is clinically useless.

The Wells classification, of course, depends heavily upon the 3 points assigned for “PE is most likely diagnosis.” So, these authors simply assigned those 3 points for every case. This sort of works in a population selected explicitly because they underwent CTPA in the ED, but it is a foundationally broken kludge. The revised Geneva score does not have a “gestalt” element, but there are still subjective examination features that may not make it into structured data – and, obviously, it performed just as well (poorly) as the Wells tool.
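To make the kludge concrete, here is roughly what “automated Wells” has to reduce to when the gestalt item cannot be extracted – the field names are hypothetical, the point values are the standard Wells items:

```python
# Hypothetical sketch of automated Wells scoring from structured EHR
# fields. The 3-point "PE is most likely diagnosis" gestalt item has
# no structured source, so - as in the study - every patient gets it.

def automated_wells(pt: dict) -> float:
    score = 3.0  # gestalt item hard-coded positive for every case
    if pt["clinical_signs_of_dvt"]:
        score += 3.0
    if pt["heart_rate"] > 100:
        score += 1.5
    if pt["immobilization_or_recent_surgery"]:
        score += 1.5
    if pt["prior_dvt_or_pe"]:
        score += 1.5
    if pt["hemoptysis"]:
        score += 1.0
    if pt["active_malignancy"]:
        score += 1.0
    return score  # > 4 is conventionally classified as "PE likely"

print(automated_wells({
    "clinical_signs_of_dvt": False, "heart_rate": 88,
    "immobilization_or_recent_surgery": False, "prior_dvt_or_pe": False,
    "hemoptysis": False, "active_malignancy": False,
}))  # 3.0 - the floor every single patient starts from
```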

To put it mildly, these authors are overselling their work a little bit. The electronic health record will always depend on the data entered – and any tool sets itself up for failure if it depends on specific elements entered contemporaneously by the clinician during the evaluation. Tools such as these have promise – but perhaps not this specific application.

“Automated Pulmonary Embolism Risk Classification and Guideline Adherence for Computed Tomography Pulmonary Angiography Ordering”
https://onlinelibrary.wiley.com/doi/abs/10.1111/acem.13442

All Sepsis Is Not the Same

This is a fairly dense informatics evaluation of sepsis, but it boils down to a general hypothesis with some face validity: all sepsis is not the same! This is abundantly obvious from the various clinical manifestations of response to infection, with a spectrum ranging from Group A Streptococcal pharyngitis to gram-negative bacteremia and distributive shock.

This analysis uses gene expression sampled from whole blood to perform unsupervised machine learning and clustering, identifying three subtypes the authors term “Inflammopathic, Adaptive, and Coagulopathic”. Whether these are terribly illustrative of the underlying pathology is unclear, but, if you want to be in one of these clusters, you want to be in “Adaptive” with its 8.1% mortality – compared with 29.8% in Inflammopathic and 25.4% in Coagulopathic.
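Stripped of the ensemble machinery, the core of the approach looks something like this sketch – plain k-means on simulated expression data standing in for the authors’ multi-algorithm consensus:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Simplified sketch of unsupervised clustering of whole-blood gene
# expression. The authors pooled multiple co-normalized datasets and
# chose the cluster count by consensus across several algorithms;
# plain k-means with k=3 on fake data is only a stand-in for that.

rng = np.random.default_rng(0)
expression = rng.normal(size=(700, 500))  # patients x genes, simulated

X = StandardScaler().fit_transform(expression)
subtype = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
# 'subtype' assigns each patient to one of three candidate clusters,
# which the authors then characterized by pathway enrichment and
# labeled Inflammopathic, Adaptive, and Coagulopathic.
```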

Validity of this specific analysis aside, it’s an interesting example of what may ultimately be a useful approach to treating sepsis – targeting the specific underlying genetic expressions associated with dysregulated immune response or underlying end-organ dysfunction. The best thing about this paper, however, is the acronyms reported for some of the statistical methods: “COmbined Mapping of Multiple clUsteriNg ALgorithms”, or COMMUNAL, and “COmbat CO-Normalization Using conTrols”, or COCONUT.

“Unsupervised Analysis of Transcriptomics in Bacterial Sepsis Across Multiple Datasets Reveals Three Robust Clusters”
https://www.ncbi.nlm.nih.gov/pubmed/29537985

It’s Sepsis-Harassment!

The computer knows all in modern medicine. The electronic health record is the new Big Brother, all-seeing, never un-seeing. And it sees “sepsis” – a lot.

This is a report on the downstream effects of an electronic sepsis alert system at an academic medical center. Their alert system was based loosely on systemic inflammatory response syndrome (SIRS) criteria for the initial warning to nursing staff, followed by additional alerts triggered by hypotension or elevated lactate. These alerts prompted use of sepsis order sets or triggering of internal “sepsis alert” protocols. The outcomes of interest in their analysis were length-of-stay and in-hospital mortality.
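Paraphrased loosely as code, the two-stage logic looks something like the sketch below – with the caveat that these are the conventional SIRS cutoffs and common hypotension/lactate triggers, assumed rather than taken from the paper:

```python
# Loose sketch of a two-stage, SIRS-based sepsis alert. Thresholds are
# conventional textbook cutoffs, not the paper's actual implementation.

def sirs_count(v: dict) -> int:
    return sum([
        v["temp_c"] > 38.0 or v["temp_c"] < 36.0,
        v["heart_rate"] > 90,
        v["resp_rate"] > 20,
        v["wbc"] > 12.0 or v["wbc"] < 4.0,  # x10^3 cells/uL
    ])

def sepsis_alerts(v: dict) -> list:
    alerts = []
    if sirs_count(v) >= 2:
        alerts.append("SIRS alert -> notify nursing")
        if v["sbp"] < 90:
            alerts.append("hypotension -> sepsis order set")
        if v.get("lactate") is not None and v["lactate"] > 2.0:
            alerts.append("elevated lactate -> internal sepsis alert")
    return alerts

print(sepsis_alerts({"temp_c": 38.6, "heart_rate": 112, "resp_rate": 24,
                     "wbc": 14.2, "sbp": 84, "lactate": 3.1}))
```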

At first glance, the alert appears to be a success – length of stay dropped from 10.1 days to 8.6, and in-hospital mortality from 8.5% to 7.0%. It would have been quite simple to stop there and trumpet these results as favoring the alerts, but the additional analyses performed by these authors demonstrate otherwise. Both length-of-stay and mortality were already trending downward independent of the intervention, and in the adjusted analyses, none of the improvements could be conclusively tied to the sepsis alerts – with some of the apparent benefit probably relating to less-severe cases of sepsis diagnosed as a result of the alert itself.

What is not debatable, however, is the burden on clinicians and staff. During their ~2.5 year study period, the sepsis alerts were triggered 97,216 times – 14,207 of them in the 2,144 patients who subsequently received a final diagnosis of sepsis. The SIRS-based alerts comprised most (83,385) of these, but captured only 73% of those with an ultimate diagnosis of sepsis while carrying a mere 13% true positive rate. The authors’ conclusion gets it right:

Our results suggest that more sophisticated approaches to early identification of sepsis patients are needed to consistently improve patient outcomes.

“Impact of an emergency department electronic sepsis surveillance system on patient mortality and length of stay”
https://academic.oup.com/jamia/article-abstract/doi/10.1093/jamia/ocx072/4096536/Impact-of-an-emergency-department-electronic

Even the Best EHR Still Causes Pain

Paper is gone; there’s no going back. We’re all on electronic health record systems (cough, Epic) now, with all the corresponding frustrations and inefficiencies. Some have said, however, the blame lies not with the computer, but with the corporate giant whose leviathan was designed not to meet the needs of physicians in the Emergency Department, but rather to support the larger hospital and primary-care enterprise. Unfortunately, as we see here, even a “custom” design doesn’t solve all the issues.

These authors report on their experience with their own homegrown system, eDoc, designed to replace their paper system and built using feedback from health technology experts and their own emergency medicine clinicians. Their hypothesis, in this case, was that throughput would be maintained, with ED length-of-stay as a proxy for operational efficiency. The interrupted time series analyses performed before and after the transition are rather messy, with various approaches and adjustments, including “coarsened exact matching”, but the outcome is consistent across all their models: the computer made things worse. The estimated difference per patient is small – about 6 additional minutes – but, as the authors note, in a mid-size ED handling about 165 patients a day, that works out to roughly 16 hours of additional boarding time per day (6 minutes × 165 patients ≈ 990 minutes), or the effect of shrinking your ED by up to two-thirds of a room.
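For the curious, an interrupted time series boils down to a segmented regression; here is a bare-bones sketch on simulated data, without the authors’ adjustments or coarsened exact matching:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Minimal interrupted time series: daily mean ED length-of-stay
# regressed on time, an indicator for the go-live date, and time since
# go-live. Data are simulated with a built-in 6-minute level change.

rng = np.random.default_rng(0)
days = np.arange(365)
golive = 180
df = pd.DataFrame({
    "day": days,
    "post": (days >= golive).astype(int),
    "days_since": np.clip(days - golive, 0, None),
})
df["los_min"] = (330 + 0.02 * df["day"] + 6 * df["post"]
                 + rng.normal(0, 10, size=days.size))

fit = smf.ols("los_min ~ day + post + days_since", data=df).fit()
print(fit.params["post"])  # estimated level change at go-live, ~6 min
```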

It is probably erroneous to simply blame “computers” as the culprit for our woes. Rather, it is the computer-as-vehicle for onerous documentation requirements and regulatory flaming hoops. If the core function of the EHR were solely to meet the information and workflow needs of physicians, rather than the entire Christmas buffet of modern administrative and billing workflow, it would be reasonable to expect a moderation in the level of suffering.

But, I think that ship has sailed.

“A Custom-Developed Emergency Department Provider Electronic Documentation System Reduces Operational Efficiency”
https://www.ncbi.nlm.nih.gov/pubmed/28712608

What Does a Sepsis Alert Gain You?

The Electronic Health Record is no longer simply that – a recording of events and clinical documentation. Decision-support has, for good or ill, morphed it into a digital nanny, a vehicle for all manner of burdensome nagging. Many systems have implemented a “sepsis alert”, typically based off vital signs collected at initial assessment. The very reasonable goal is early detection of sepsis and early initiation of appropriately directed therapy. The downside, unfortunately, is that such alerts are rarely true positives for severe sepsis in the broadest sense – alerts far outnumber the instances in which a change in clinical practice results in a change in outcome.

So, what to make of this?

This study describes the before-and-after performance of a quality improvement intervention to reduce missed diagnoses of sepsis, part of which was the introduction of a triage-based EHR alert. These alerts fired during initial assessment based on abnormal vital signs and the presence of high-risk features. The article describes a pre-intervention phase of 86,037 Emergency Department visits and a post-intervention phase of 96,472 visits. During the post-intervention phase, there were 1,112 electronic sepsis alerts, 265 of which resulted in initiation of the sepsis protocol after attending physician consultation. The authors, generally, report fewer missed or delayed diagnoses during the post-intervention period.

But, the evidence underpinning conclusions from these data – relating to improvements in clinical care or outcomes, or even the magnitude of the process improvement publicized by the authors – is fraught. The alert is reported as having a sensitivity of 86.2%, and routine clinical practice picked up nearly all of the remaining alert-negative cases, for a combined sensitivity of 99.4%. The specificity appears excellent as well, at 99.1% – but, for such an infrequent diagnosis, even using their most generous classification for true positives, the false alerts outnumbered the true alerts nearly 3 to 1.
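A quick back-of-the-envelope from their own counts shows why, treating the 265 protocol initiations as the true positives:

```python
# Counts from the post-intervention phase reported above.
visits = 96_472
alerts = 1_112
true_alerts = 265  # sepsis protocol initiated after physician review

print(f"alert rate: {alerts / visits:.2%}")            # ~1.2% of visits
print(f"PPV: {true_alerts / alerts:.1%}")              # ~24%
print(f"false per true: {(alerts - true_alerts) / true_alerts:.1f}")  # ~3.2
```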

And, that classification scheme is the crux of determining the value of this approach. The primary outcome was defined as either treatment on the ED sepsis protocol or pediatric ICU care for sepsis. Clearly, part of the primary outcome is directly contaminated by the intervention – an alert encouraging use of a protocol will increase initiation, regardless of appropriateness. This will not impact sensitivity, but will effectively increase specificity and directly inflate PPV.

This, importantly, led the authors to include a sensitivity analysis of their primary outcome, looking at the overall performance under stricter definitions. These analyses evaluate the predictive value of the alert if true positives are restricted to those eventually requiring vasoactive agents or pediatric ICU care – and, unsurprisingly, even this small decline in specificity results in dramatic drops in PPV, down to 2.4% for the alert alone.

This number better matches the face validity we’re most familiar with for these simplistic alerts – the vast majority triggered have no chance of impacting clinical care and improving outcomes. It should further be recognized the effect size of early recognition and intervention for sepsis is real, but quite small – and becomes even smaller as the definition broadens to include cases of lower severity. With nearly 100,000 ED visits in both the pre-intervention and post-intervention periods, there was no detectable effect on ICU admission or mortality. Finally, the authors focus on their “hit rate” of 1:4 in their discussion – but I think it is more likely the number of alerts fired for each case of reduced morbidity or mortality is on the order of hundreds, or possibly thousands.

Ultimately, the reported and publicized magnitude of the improvement in clinical practice likely represents more smoke and mirrors than objective improvements in patient outcomes, and in the zero-sum game of ED time and resources, these sorts of alerts and protocols may represent important subtractions from the care of other patients.

“Improving Recognition of Pediatric Severe Sepsis in the Emergency Department: Contributions of a Vital Sign–Based Electronic Alert and Bedside Clinician Identification”
http://www.annemergmed.com/article/S0196-0644(17)30315-3/abstract

You’ve Got (Troponin) Mail

It’s tragic, of course, that no one in this generation will understand the epiphany of logging on to America Online and being greeted by its nearly synonymous “You’ve got mail!” But we and future generations may bear witness to the advent of something almost as profoundly uplifting: text-message troponin results.

These authors conceived and describe a fairly simple intervention in which test results – in this case, troponin – were pushed to clinicians’ phones as text messages. In a pilot and cluster-randomized trial with 1,105 patients for final analysis, these authors find the median interval from troponin result to disposition decision was 94 minutes in a control group, as compared with 68 minutes in the intervention cohort. However, a smaller difference in median overall length of stay did not reach statistical significance.

Now, I like this idea – even though this is clearly not the study showing generalizable, definitive benefit. For many patient encounters, there is some readily identifiable bottleneck result of greatest importance for disposition. If a reasonable, curated list of these results is pushed to a mobile device, there is an obvious time savings compared with manually pulling them from the electronic health record.

In this study, however, the median LOS for these patients was over five hours – and the median LOS for all patients receiving at least one troponin was nearly 7.5 hours. The relative effect size, then, is really quite small. Next, there are always concerns relating to interruptions and unintended consequences on cognitive burden. Finally, it logically follows that, if this text message derives some of its beneficial effect by altering task priorities, some other process in the Emergency Department is having its completion time increased.

I expect, if implemented in a typically efficient ED, the net result of any improvement might only be a few minutes saved across all encounter types – but multiplied across thousands of patient visits for chest pain, it’s still worth considering.

“Push-Alert Notification of Troponin Results to Physician Smartphones Reduces the Time to Discharge Emergency Department Patients: A Randomized Controlled Trial”
http://www.annemergmed.com/article/S0196-0644(17)30317-7/abstract

Correct, Endovascular Therapy Does Not Benefit All Patients

Unfortunately, that headline is the strongest takeaway available from these data.

Currently, endovascular therapy for stroke is recommended for all patients with a proximal arterial occlusion who can be treated within six hours. The much-ballyhooed “number needed to treat” for benefit is approximately five, and we have authors generating nonsensical literature with titles such as “Endovascular therapy for ischemic stroke: Save a minute—save a week” based on statistical calisthenics from this treatment effect.

But, anyone actually responsible for making decisions for these patients understands this is an average treatment effect. The profound improvements of a handful of patients with the most favorable treatment profiles obfuscate the limited benefit derived by the majority of those potentially eligible.

These authors have endeavored to apply a bit of precision medicine to the decision regarding endovascular intervention. Using ordinal logistic regression, these authors used the MR CLEAN data to create a predictive model for good outcome (mRS score 0-2 at 90 days), with the IMS-III data as their validation cohort. The final model displayed a C-statistic of 0.69 for the ordinal model and 0.73 for good functional outcome – which is to say, the output is closer to a coin flip than an informative prediction for use in clinical practice.
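For reference, the C-statistic is simply the probability the model ranks a randomly chosen patient with a good outcome above one without – 0.5 is a coin flip, 1.0 is perfect discrimination. Here is a minimal sketch of the derive-then-externally-validate workflow, dichotomized to good outcome rather than the authors’ full ordinal model, and run on synthetic data rather than the MR CLEAN and IMS-III variables:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Sketch of deriving a model on one cohort and reporting the
# C-statistic (ROC AUC) on an external validation cohort. Synthetic
# data; cohort sizes mirror MR CLEAN (500) and IMS-III (260). The
# authors' actual model was ordinal logistic regression.

rng = np.random.default_rng(0)
X_derive = rng.normal(size=(500, 8))   # 8 made-up baseline predictors
X_valid = rng.normal(size=(260, 8))
beta = rng.normal(size=8)              # "true" underlying effects
y_derive = rng.random(500) < 1 / (1 + np.exp(-(X_derive @ beta)))
y_valid = rng.random(260) < 1 / (1 + np.exp(-(X_valid @ beta)))

model = LogisticRegression(max_iter=1000).fit(X_derive, y_derive)
c_stat = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
print(f"C-statistic: {c_stat:.2f}")
```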

More important, however, is whether the substrate for the model is anachronistic, limiting its generalizability to modern practice. Beyond MR CLEAN, subsequent trials have demonstrated the importance of underlying tissue viability, using either CT perfusion or MRI-based selection criteria, when making treatment decisions. Their model includes only a measure of collateral circulation on angiogram, which is merely a surrogate for potential tissue viability. Furthermore, the MR CLEAN cohort comprises only 500 patients, and the IMS-III validation only 260 – a sample far too small to properly develop a model for such a heterogeneous population as those presenting with proximal cerebrovascular occlusion. Finally, the choice of logistic regression can be debated from a modeling standpoint, given its assumptions about underlying linear relationships in the data.

I appreciate the attempt to improve outcomes prediction for individual patients, particularly for a resource-intensive therapy such as endovascular intervention in stroke. Unfortunately, I feel the fundamental limitations of their model invalidate its clinical utility.

“Selection of patients for intra-arterial treatment for acute ischaemic stroke: development and validation of a clinical decision tool in two randomised trials”
http://www.bmj.com/content/357/bmj.j1710

No Change in Ordering Despite Cost Information

Everyone hates the nanny state. When the electronic health record alerts and interrupts clinicians incessantly with decision-“support”, it results in all manner of deleterious unintended consequences. Passive, contextual decision-support has the advantage of avoiding this intrusiveness – but is it effective?

It probably depends on the application, but in this trial, it was not. This is the PRICE (Pragmatic Randomized Introduction of Cost data through the Electronic health record) trial, in which 75 inpatient laboratory tests were randomized to display of usual ordering, or ordering with contextual Medicare cost information. The hope, and study hypothesis, was that the availability of this cost information would exert a cultural pressure of sorts on clinicians to order fewer tests, particularly the most expensive ones.

Across three Philadelphia-area hospitals comprising 142,921 hospital admissions over a two-year study period, there were no meaningful differences in lab tests ordered per patient-day between the intervention and control groups. Looking at various subgroups of patients, it is also unlikely there were particularly advantageous effects in any specific population.

Interestingly, one piece of feedback the authors report is that residents suggested most of their routine lab test ordering resulted from admission order sets. “Routine” daily labs are set in motion at the time of admission rather than as part of a daily assessment of need – a natural impediment to reducing low-value testing. However, the authors also note – and this is probably most accurate – that because the cost information was displayed ubiquitously, physicians likely became numb to the intervention. It is reasonable to expect substantially more selective cost information could have focused effects on an area of particularly high cost or low value.

“Effect of a Price Transparency Intervention in the Electronic Health Record on Clinician Ordering of Inpatient Laboratory Tests”
http://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2619519