It’s Sepsis-Harassment!

The computer knows all in modern medicine. The electronic health record is the new Big Brother, all-seeing, never un-seeing. And it sees “sepsis” – a lot.

This is a report on the downstream effects of an electronic sepsis alert system at an academic medical center. Their alert system was based loosely on the systemic inflammatory response syndrome (SIRS) for the initial warning to nursing staff, followed by additional alerts triggered by hypotension or elevated lactate. These alerts prompted use of sepsis order sets or activation of internal “sepsis alert” protocols. The outcomes of interest were length-of-stay and in-hospital mortality.
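
For concreteness, here is a minimal sketch of what such a tiered alert might look like – the paper says only that the first stage was based “loosely” on SIRS, so the specific thresholds (classic SIRS criteria, SBP <90, lactate >2.0) are my assumptions, not the study’s actual logic:

```python
# Hypothetical sketch of a tiered sepsis alert. First stage assumes the
# classic SIRS criteria; hypotension and lactate cut-offs are assumed.

def sirs_count(temp_c, hr, rr, wbc):
    """Count how many of the four classic SIRS criteria are met."""
    return sum([
        temp_c > 38.0 or temp_c < 36.0,  # temperature derangement
        hr > 90,                         # tachycardia
        rr > 20,                         # tachypnea
        wbc > 12.0 or wbc < 4.0,         # leukocytosis or leukopenia (x10^9/L)
    ])

def sepsis_alerts(temp_c, hr, rr, wbc, sbp, lactate):
    """Return the list of alerts this patient would trigger."""
    alerts = []
    if sirs_count(temp_c, hr, rr, wbc) >= 2:
        alerts.append("first-stage SIRS alert (nursing)")
        if sbp < 90:
            alerts.append("second-stage alert: hypotension")
        if lactate is not None and lactate > 2.0:
            alerts.append("second-stage alert: elevated lactate")
    return alerts

print(sepsis_alerts(temp_c=38.6, hr=112, rr=24, wbc=14.1, sbp=84, lactate=3.2))
```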

At first glance, the alert appears to be a success – length of stay dropped from 10.1 days to 8.6, and in-hospital mortality from 8.5% to 7.0%. It would have been quite simple to stop there and trumpet these results as favoring the alerts, but the additional analyses performed by these authors demonstrate otherwise. Both length-of-stay and mortality were already trending downward independent of the intervention, and in the adjusted analyses, none of the improvement could be conclusively tied to the sepsis alerts – with some of the apparent benefit likely relating to diagnoses of less-severe cases of sepsis prompted by the alert itself.

What is not debatable, however, is the burden on clinicians and staff. During their ~2.5 year study period, the sepsis alerts were triggered 97,216 times – 14,207 of which occurred in the 2,144 patients subsequently receiving a final diagnosis of sepsis. The SIRS-based alerts comprised most (83,385) of these, but captured only 73% of those with an ultimate diagnosis of sepsis, while having a true positive rate of just 13%. The authors’ conclusion gets it right:
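
The arithmetic of that burden is worth writing out (the counts are from the paper; the derived ratios are my own):

```python
total_alerts = 97_216
alerts_in_sepsis_patients = 14_207
sepsis_diagnoses = 2_144

# Roughly 45 alerts fired for every patient ultimately diagnosed with sepsis...
print(total_alerts / sepsis_diagnoses)               # ~45.3
# ...and about 85% of all alerts fired in patients never diagnosed with sepsis.
print(1 - alerts_in_sepsis_patients / total_alerts)  # ~0.85
```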

Our results suggest that more sophisticated approaches to early identification of sepsis patients are needed to consistently improve patient outcomes.

“Impact of an emergency department electronic sepsis surveillance system on patient mortality and length of stay”
https://academic.oup.com/jamia/article-abstract/doi/10.1093/jamia/ocx072/4096536/Impact-of-an-emergency-department-electronic

Predicting Poor Outcomes After Syncope

Syncope is a classic good news/bad news presenting complaint. It can be highly distressing to patients and family members, but rarely does it relate to an acutely serious underlying cause. That’s the good news. The bad news, however, is that for those with the worst prognosis, most of the poor prognostic features are unmodifiable.

This is a prospective, observational study of patients presenting with syncope to Emergency Departments in Canada, with the stated goal of developing a risk model for poor outcomes after syncope. The composite outcome of interest was death, arrhythmia, or interventions to treat arrhythmias within 30 days of ED disposition. Follow-up was performed by structured telephone interview, networked hospital record review, and Coroner’s Office record search.

To achieve a lower bound of the 95% confidence interval for sensitivity of 96.4%, these authors targeted a sample size of 5,000 patients, and ultimately enrolled 5,010 with complete outcome assessments. The mean age was 53.4 years, the cohort had a low incidence of comorbid medical conditions, and only 9.5% were admitted to the hospital. Within 30 days, 22 had died – 15 from unknown causes, the others from the pool of 91 patients diagnosed with a “serious arrhythmia”: sinus node dysfunction, atrial fibrillation, AV block, ventricular arrhythmia, supraventricular tachycardia, or an indication for pacemaker insertion.

These authors ride the standard merry-go-round of statistical analysis, bootstrapping, and logistic regression to determine a prediction rule – the Canadian Syncope Arrhythmia Risk Score – an eight-element additive and subtractive scoring system stratifying patients into one of eleven expected risk categories. They report the test characteristics of their proposed clinically useful threshold, a score greater than 0, as a sensitivity of 97.1% and a specificity of 53.4% – which, given the low incidence of the composite outcome, works out to a weak positive predictive value of 4.4%.
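
That positive predictive value falls directly out of Bayes’ theorem. A quick check – assuming a composite outcome incidence of roughly 2%, i.e. approximately 106 events (the 15 deaths of unknown cause plus the 91 serious arrhythmias) among 5,010 patients – reproduces the published figure:

```python
sens = 0.971
spec = 0.534
prevalence = 106 / 5010  # ~2.1%; my estimate of composite events from the figures above

ppv = (sens * prevalence) / (sens * prevalence + (1 - spec) * (1 - prevalence))
print(round(ppv, 3))  # ~0.043, i.e. roughly the 4.4% reported
```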

This is yet another product of obviously excellent work from the risk model machines in Canada, but, again, of uncertain clinical value. The elements of the risk model are frankly obvious: elevated troponin and conduction delays on EKG, along with an absence of classic vasovagal features. These are patients whose cardiac function is obviously impaired, but short of a time machine to go back and fix those hearts before they became sick, it’s a bit difficult to see the path forward. These authors feel their prediction rule aids in the safe discharge of patients with syncope, although these patients are already infrequently admitted to the hospital in Canada. The various members of their composite outcome are not equally serious, preventable, or treatable, limiting the potential management options for even those falling into the high-risk group.

As with any decision instrument, its value remains uncertain until it is demonstrated that clinical decisions supplemented by this rule lead to better patient-oriented outcomes and/or resource utilization than our current management of this cohort.

“Predicting Short-Term Risk of Arrhythmia among Patients with Syncope: The Canadian Syncope Arrhythmia Risk Score”

https://www.ncbi.nlm.nih.gov/pubmed/28791782

Let’s Get Together and Ignore PERC

The “Pulmonary Embolism Rule-Out Criteria” does not, as the name implies, “rule out” PE. It does, however, generally carve out a cohort for whom objective testing may be obviated, with the implication that the costs and harms from false-positives and from anticoagulation outweigh the morbidity from missed PE. It is fairly well popularized and incorporated into guidelines for PE – and, at the least, physicians in an academic center, on the cutting edge of medical knowledge and education, should be applying it appropriately.

Or not.

This is a prospective study enrolling undifferentiated Emergency Department patients with chest pain and shortness of breath. Research staff approached patients with these general chief complaints and collected the baseline variables needed for PERC, Wells, and other clinical and historical data. They collected data on 3,204 patients, 17.5% of whom were PERC-negative. Of these, 25.5% underwent some testing for pulmonary embolism – inclusive of D-dimer, CTPA, or V/Q scanning. In the end, two PERC-negative patients – 0.4% – were ultimately diagnosed with a PE. The authors also present comparative data for the PERC-positive population, with the expected higher frequency of testing and diagnosis associated with the absence of low-risk features.
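
Working backward from the reported percentages (my arithmetic, not counts quoted directly by the authors):

```python
enrolled = 3204
perc_negative = round(0.175 * enrolled)  # ~561 PERC-negative patients
tested = round(0.255 * perc_negative)    # ~143 underwent D-dimer, CTPA, or V/Q
diagnosed = 2

print(perc_negative, tested)
print(f"{diagnosed / perc_negative:.1%}")  # ~0.4% of PERC-negative patients with PE
```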

PERC is, of course, an imperfect tool – an unavoidable consequence of any decision instrument narrowing a complex clinical decision down to a handful of variables. But, at the least, patients meeting PERC ought nearly all to fall into the bucket of “why were you really considering PE in the first place?”, with few exceptions. For nearly a quarter of these to start down the rabbit hole of testing for PE is low-value and harmful medical practice at a population level, regardless of the potential magnitude of individual benefit for those true positives ultimately identified.

Or, more concisely: this is nuts.

“Pulmonary Embolism Testing among Emergency Department Patients who are Pulmonary Embolism Rule-out Criteria Negative”

http://onlinelibrary.wiley.com/doi/10.1111/acem.13270/full

Is The Road to Hell Paved With D-Dimers?

Ah, D-dimers, the exposed crosslink fragments resulting from the cleaving of fibrin mesh by plasmin. They predict everything – and nothing, with poor positive likelihood ratios for scads of pathologic diagnoses, and limited negative likelihood ratios for others.  Little wonder, then, routine D-dimer assays were part of the PESIT trial taking the diagnosis of syncope off the rails. Now, does the YEARS study threaten to make a similar kludge out of the diagnosis of pulmonary embolism?

On the surface, this looks like a promising study. We are certainly inefficient at the diagnosis of PE. Yield for CTPA in the U.S. is typically below 10%, and some of these diagnoses are likely insubstantial enough to be false positives. This study implements a standardized protocol for the evaluation of possible PE, termed the YEARS algorithm. All patients with possible PE are tested using D-dimer. Patients are also risk-stratified for pretest likelihood of PE by three elements: clinical signs of deep vein thrombosis, hemoptysis, or “pulmonary embolism the most likely diagnosis”. Patients with none of those high-risk elements use a D-dimer cut-off of 1,000 ng/mL to determine whether they proceed to CTPA. If a patient has one or more high-risk features, the traditional D-dimer cut-off of 500 ng/mL is used. Of note, this study was initiated before age-adjusted D-dimer became commonplace.
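
In code form, the YEARS triage described above reduces to a single branch – a sketch of the published logic, with function and variable names of my own invention:

```python
def years_disposition(ddimer_ng_ml, dvt_signs, hemoptysis, pe_most_likely):
    """Return 'PE excluded' or 'CTPA' per the YEARS algorithm as described.

    The three YEARS items select which D-dimer threshold applies:
    1000 ng/mL if no items are present, the traditional 500 ng/mL otherwise.
    """
    items = sum([dvt_signs, hemoptysis, pe_most_likely])
    threshold = 1000 if items == 0 else 500
    return "PE excluded" if ddimer_ng_ml < threshold else "CTPA"

# A patient with no YEARS items and a D-dimer of 800 ng/mL avoids CTPA...
print(years_disposition(800, False, False, False))  # PE excluded
# ...but the same D-dimer with any single item present proceeds to imaging.
print(years_disposition(800, False, False, True))   # CTPA
```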

Without going into interminable detail regarding their results, their strategy works. Patients ruled out solely by the D-dimer component of this algorithm had similar 3-month event rates to those ruled out following a negative CTPA. Their strategy, per their discussion, reduces the proportion undergoing CTPA by 14% compared with a Wells-based strategy (CTPA in 52% per-protocol, vs. 66% based on Wells) – although less so against Wells plus age-adjusted D-dimer. Final yield for PE per-protocol with YEARS was 29%, which is at the top end of the range for European cohorts and far superior, of course, to most U.S. practice.

There are a few traps here. Interestingly, physicians were not blinded to the D-dimer result when they assigned the YEARS risk-stratification items. Considering the subjectivity of the “most likely” component, foreknowledge of this result and the subsequent testing assignment could easily influence the clinician’s risk assessment classification. The “most likely” component also has a great deal of inter-physician and general cultural variation that may affect the performance of this rule. The prevalence of PE among all patients considered for the diagnosis was 14% – a little lower than the average of most European populations considered for PE, but easily twice as high as those considered for possible PE in the U.S. It would be quite difficult to generalize any precise effect size from this study to such disparate settings. Finally, considering continuous likelihood ratios for the D-dimer assay, we know the +LR for a test result of 1000 ± ~500 is probably around 1. This suggests a cut-off of 1,000 may hinge a fair bit of management on a test result carrying zero informational value.

This ultimately seems as though the algorithm might have grown out of a need to solve a problem of their own creation – too many potentially actionable D-dimer results being produced by an indiscriminate triage-ordering practice. I remain a little wary of the effect of poisoning clinical judgment with the D-dimer result, and expect it confounds the overall generalizability of this study. As robust as this trial was, I would still recommend waiting for additional prospective validation prior to adoption.

“Simplified diagnostic management of suspected pulmonary embolism (the YEARS study): a prospective, multicentre, cohort study”
http://thelancet.com/journals/lancet/article/PIIS0140-6736(17)30885-1/fulltext

PECARN, CATCH, CHALICE … or None of the Above?

The decision instrument used to determine the need for neuroimaging in minor head trauma is essentially a question of location. If you’re in the U.S., the guidelines feature PECARN. In Canada, CATCH. In the U.K., CHALICE. But there’s a whole big world out there – what ought they use?

This is a prospective observational study from two countries out in that big remainder of the world – Australia and New Zealand. Over approximately 3.5 years, these authors enrolled patients with non-trivial mild head injuries (GCS 13-15) and tabulated various rule criteria and outcomes. Each rule has slightly different entry criteria and purpose, but over the course of the study, 20,317 patients were gathered for their comparative analysis.

And, the winner … is Australian and New Zealand general practice. Of the 20,317 patients included, only 2,106 (10%) underwent CT. It is hard to read between the lines and determine how many of the injuries included in this analysis were missed on the initial presentation, but if rate of neuroimaging is the simplest criterion for winning, there’s no competition. Applying CHALICE to the analysis cohort would have increased the CT rate to approximately 22%, and CATCH would have raised it to 30.2%. Application of PECARN would place 46% of the cohort into either CT or observation – an uncertain final CT rate, but certainly higher than 10%.

Regardless, in their stated comparison, the true winner depends on the value-weighting of sensitivity and resource utilization. PECARN approached 99% to 100% sensitivity, missing only 1 patient with clinically important traumatic brain injury among the ~10,000 to whom it applied. Contrariwise, CATCH and CHALICE missed 13 and 12 patients out of ~13,000 and ~14,000, respectively. Most of these missed injuries did not require neurosurgical intervention, but a couple of those missed by CATCH and CHALICE did. However, as noted above, PECARN is probably substantially less specific than both CATCH and CHALICE, which has a relatively profound effect on utilization for a low-frequency outcome.

Ultimately, however, any of these decision instruments is usable – as a supplement to your clinical reasoning. Each of these rules simplifies a complex decision into one less so, with all the inherent weaknesses of doing so. Fewer than 1% of children with mild head injury need neurosurgical intervention, and these are rarely missed by any typical practice. In settings with high CT utilization rates, any one of these instruments will likely prove beneficial. In Australia and New Zealand – as well as many other places around the world – potentially not so much. This is probably a fine example of the need to compare decision instruments to clinician gestalt.

“Accuracy of PECARN, CATCH, and CHALICE head injury decision rules in children: a prospective cohort study”

http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(17)30555-X/abstract

D-Dimer, It’s Not Just a Cut-Off

It’s certainly simpler to have a world where everything is black or white, right or wrong, positive or negative. Once upon a time, positive cardiac biomarkers meant acute coronary syndrome – now we have more information and shades of grey in between. The D-dimer, bless its heart, is probably like that, too.

This is a simple study that pooled patients from five pulmonary embolism studies to evaluate the diagnostic performance characteristics of the D-dimer assay. Conventional usage is simply to deploy the test as a dichotomous rule-out – a value below our set sensitivity threshold obviates further testing, while above consigns us to the bitter radiologic conclusion. These authors, perhaps anticipating a more sophisticated diagnostic strategy, go about trying to calculate interval likelihood ratios for the test.

Using over 6,000 patients as their substrate for analysis, these authors determine the likelihood ratios for D-dimer levels between 250 ng/mL and greater than 5,000 ng/mL, identifying intervals of gradually increasing width, starting at 250 and building up to 2,500. Based on logistic regression modeling, the fitted and approximate interval likelihood ratios (iLRs) range from 0.0625 for those with a D-dimer less than 250 ng/mL up to 8 for levels greater than 5,000. Interestingly, a D-dimer between 1,000 and 1,499 had an iLR of roughly 1 – meaning those values have essentially no effect on the post-test likelihood of PE.
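
Applying an interval likelihood ratio is the usual Bayesian two-step: convert pretest probability to odds, multiply by the iLR, and convert back. A minimal sketch, using the endpoint and null iLRs quoted above and an arbitrary 10% pretest probability:

```python
def post_test_probability(pretest_p, lr):
    """Apply a likelihood ratio to a pretest probability via odds."""
    pre_odds = pretest_p / (1 - pretest_p)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Assume a 10% pretest probability of PE (illustrative only).
print(post_test_probability(0.10, 0.0625))  # D-dimer <250 ng/mL  -> ~0.7%
print(post_test_probability(0.10, 1.0))     # 1,000-1,499 ng/mL   -> still 10%
print(post_test_probability(0.10, 8.0))     # >5,000 ng/mL        -> ~47%
```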

The general implication of these data would be to inform a more precise accounting of the risk for PE in the decision whether to proceed to CTPA. That said, with our generally inexact tools for otherwise estimating pretest likelihood of disease (Wells, Geneva, gestalt), these data are probably not quite ready for clinical use. I expect further research to develop more sophisticated individual risk prediction models, for which these likelihood ratios may be of value.

“D-Dimer Interval Likelihood Ratios for Pulmonary Embolism”
https://www.ncbi.nlm.nih.gov/pubmed/28370759

The Failing Ottawa Heart

Canada! So many rules! The true north strong and free, indeed.

This latest innovation is the Ottawa Heart Failure Risk Scale – which, if you treat it explicitly as titled, is accurate and clinically interesting. However, it also masquerades as a decision rule – a role in which it stands on shakier ground.

This is a prospective observational derivation of a risk score for “serious adverse events” in an ED population diagnosed with acute heart failure and considered potential candidates for discharge. Of these 1,100 patients, 170 (15.5%) suffered an SAE – death, myocardial infarction, or hospitalization. They used the differences between the groups with and without SAEs to derive a predictive risk score, the elements of which are (a toy implementation follows the list):

• History of stroke or TIA (1)
• History of intubation for respiratory distress (2)
• Heart rate on ED arrival ≥110 (2)
• Room air SaO2 <90% on EMS or ED arrival (1)
• ECG with acute ischemic changes (2)
• Urea ≥12 mmol/L (1)
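
As promised, a toy implementation of the scale as listed above – the mapping from score to risk is the authors’ table, of which only the published endpoints (2.8% at a score of zero, 89.0% at the top) are reproduced here:

```python
# Toy implementation of the Ottawa Heart Failure Risk Scale elements above.
# All arguments are booleans; weights are the point values from the list.

def ottawa_hf_score(stroke_tia, prior_intubation, hr_gte_110,
                    sao2_lt_90, ischemic_ecg, urea_gte_12):
    return (1 * stroke_tia +
            2 * prior_intubation +
            2 * hr_gte_110 +
            1 * sao2_lt_90 +
            2 * ischemic_ecg +
            1 * urea_gte_12)

score = ottawa_hf_score(stroke_tia=True, prior_intubation=False,
                        hr_gte_110=True, sao2_lt_90=False,
                        ischemic_ecg=False, urea_gte_12=True)
print(score)  # 4 of a possible 9; the authors' proposed admission threshold is >=2
```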

This scoring system ultimately provided a prognostic range from 2.8% for a score of zero, up to 89.0% at the top of the scale. This information is – at least within the bounds of generalizability from their study population – interesting from an informational standpoint. However, they then take it to the next level and use this as a potential decision instrument for admission versus discharge – projecting a score ≥2 would decrease admission rates while still maintaining a similar sensitivity for SAEs.

However, the foundational flaw here is the presumption that admission is protective against SAEs – both in this study and in our usual practice. Without a true prospective validation, we have no evidence this change in practice, and its potential decrease in admissions, improves any of many potential outcome measures. Many of their SAEs may not be preventable, nor would any protection conferred by admission likely be durable out to the end of their 14-day follow-up period. Patients were also managed for up to 12 hours in their Emergency Department before disposition, a difficult prospect for many EDs.

Finally, regardless, the complexity of care management and illness trajectory in heart failure is not an ideal candidate for simplification into a dichotomous rule with just a handful of criteria. There were many univariate differences between the two groups – and that’s just among the variables they chose to collect. The decision to admit a patient for heart failure is not appropriately distilled into a “rule” – but this prognostic information may yet be of some value.

“Prospective and Explicit Clinical Validation of the Ottawa Heart Failure Risk Scale, With and Without Use of Quantitative NT-proBNP”

http://onlinelibrary.wiley.com/doi/10.1111/acem.13141/abstract

Outsourcing the Brain Unnecessarily

Clinical decision instruments are all the rage, especially when incorporated into the electronic health record – why let the fallible clinician’s electrical Jello make life-or-death decisions when the untiring, unbiased digital concierge can be similarly equipped? Think about your next shift, and how frequently you consciously or unconsciously use or cite a decision instrument in your practice – HEART, NEXUS, PERC, Wells, PECARN; the list is endless.

We spend a great deal of time deriving, validating, and comparing decision instruments – think HEART vs. TIMI vs. GRACE – but, as this article points out, very little time actually examining their performance compared to clinician judgment.

These authors reviewed all publications in Annals of Emergency Medicine concerned with the performance characteristics of a decision instrument. They identified 171 articles to this effect, 131 of which performed a prospective evaluation. Of these, the authors were able to find only 15 which actually bothered to compare the performance of the objective rule with unstructured physician assessment. With a little extra digging, these authors then identified 6 additional studies evaluating physician assessment in other journals relevant to their original 171.

Then, of these 21 articles, two favored the decision instrument: a 2003 assessment of the Canadian C-Spine Rule, and a 2002 neural network for chest pain. In the remainder, the comparison either favored clinician judgment or was a “toss up” in the sense the performance characteristics were similar and the winner depended on a value-weighting of sensitivity or specificity.

This should not discourage the derivation and evaluation of further decision instruments – the conscious and unconscious biases of human beings are valid concerns. Neither should it be construed from these data that many common decision instruments are of lesser value than our current usage places in them, only that they have not yet been tested adequately. However, many of these simple models are simply that – simple – and the complexity of many clinical questions will, at the least, favor the more information-rich approach of practicing clinicians.

“Structured Clinical Decision Aids Are Seldom Compared With Subjective Physician Judgment, and are Seldom Superior”
http://www.annemergmed.com/article/S0196-0644(16)31520-7/fulltext

Ottawa, the Land of Rules

I’ve been to Canada, but I’ve never been to Ottawa. I suppose, as the capital of Canada, it makes sense they’d be enamored with rules and rule-making. Regardless, it still seems they have a disproportionate burden of rules, for better or worse.

This latest publication describes the “Ottawa Chest Pain Cardiac Monitoring Rule”, which aims to diminish resource utilization in the setting of chest pain in the Emergency Department. These authors posit the majority of chest pain patients presenting to the ED are placed on cardiac monitoring in the interests of detecting a life-threatening malignant arrhythmia, despite such being a rare occurrence. Furthermore, the literature regarding alert fatigue demonstrates greater than 99% of monitor alarms are erroneous and typically ignored.

Using a 796-patient sample of chest pain patients receiving cardiac monitoring, these authors validate their previously described rule for avoiding cardiac monitoring: chest pain free and a normal or non-specific ECG. In this sample, 284 patients met these criteria, and none of them suffered an arrhythmia requiring intervention.
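
As decision instruments go, this one is about as simple as they come – expressed as a predicate (a sketch of the two criteria as described; names are mine):

```python
def can_forgo_monitoring(chest_pain_free, ecg_normal_or_nonspecific):
    """Ottawa Chest Pain Cardiac Monitoring Rule: both criteria must be met."""
    return chest_pain_free and ecg_normal_or_nonspecific

print(can_forgo_monitoring(True, True))   # True: monitoring can be omitted
print(can_forgo_monitoring(True, False))  # False: keep on the monitor
```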

While this represents 100% sensitivity for their rule, as a resource utilization intervention there is obviously room for improvement. Of patients not meeting their rule, only 2.9% suffered an arrhythmia – mostly just atrial fibrillation requiring pharmacologic rate or rhythm control. These criteria probably ought to be considered just a minimum standard; there is plenty of room for additional exclusions.

Anecdotally, not only do most of our chest pain patients in my practice not receive monitoring – many receive their entire work-up in the waiting room!

“Prospective validation of a clinical decision rule to identify patients presenting to the emergency department with chest pain who can safely be removed from cardiac monitoring”
http://www.cmaj.ca/content/189/4/E139.full

The Chest Pain Decision Instrument Trial

This is a bit of an odd trial. Ostensibly, this is a trial about the evaluation and disposition of low-risk chest pain presenting to the Emergency Department. The authors frame their discussion section by describing their combination of objective risk-stratification and shared decision-making in terms of reducing admission for observation and testing at the index visit.

But, that’s not technically what this trial was about. Technically, this was a trial about patient comprehension – the primary outcome is actually the number of questions correctly answered by patients on an immediate post-visit survey. The dual nature of their trial is evident in their power calculation, which starts with: “We estimated that 884 patients would provide 99% power to detect a 16% difference in patient knowledge between decision aid and usual care arms” – an unusual choice of beta and threshold for effect size, basically one additional correct answer on their eight-question survey. The rest of their power calculation, however, makes sense: “… and 90% power to detect a 10% difference in the proportion of patients admitted to an observation unit for cardiac testing.” It appears the trial was powered not for the primary outcome selected by the patient advocates who helped design it, but for the secondary outcomes thought important to the clinicians.

So, it is a little hard to interpret their favorable result with respect to the primary outcome – 3.6 vs 4.2 questions answered correctly. After clinicians spent an extra 1.3 minutes (4.4 vs 3.1) with patients showing them a visual aid specific to their condition, I am not surprised patients had better comprehension of their treatment options – and they probably did not require a multi-center trial to prove this.

Then, the crossover between resource utilization and shared decision-making seems potentially troublesome. An idealized version of shared decision-making allows patients to participate in their treatment when there is substantial individual variation in the perceived value of different risks, benefits, and alternatives. However, I am not certain these patients are being invited to share in a decision between choices of equal value – and the authors seem to express this through their presentation of the results.

These are all patients without known coronary disease, normal EKGs, a negative initial cardiac troponin, and considered by treating clinicians to otherwise fall into a “low-risk” population. This population matches the cohort of interest from Weinstock’s study of 7,266 patients hospitalized for observation from the Emergency Department, none of whom independently suffered a cardiac event while hospitalized. A trial in British Columbia deferred admission for a similar cohort of patients in favor of outpatient stress testing. By placing a fair bit of emphasis on their significant secondary finding of a reduction in observation admissions from 52% to 37%, the authors seem to indicate their underlying bias is consistent with the evidence demonstrating the safety of outpatient disposition in this cohort. In short, it seems to me the authors are not using their decision aid to help patients choose between equally valued clinical pathways, but rather to try to convince more patients to choose to be discharged.

In a sense, it represents offering patients a menu of options on which overtreatment is one of the choices. If a dyspneic patient meets PERC, we don’t offer them a visual aid on which CTPA is an option – and that shouldn’t be our expectation here, either. These authors have put in tremendous effort over many years to integrate many important tools, but it feels like the end result is a demonstration of a shared decision-making instrument intended to nudge patients into choosing the disposition we think they ought – but are somehow afraid to outright tell them.

“Shared decision making in patients with low risk chest pain: prospective randomized pragmatic trial”
http://www.bmj.com/content/355/bmj.i6165.short