Ottawa, the Land of Rules

I’ve been to Canada, but I’ve never been to Ottawa. I suppose, as the capital of Canada, it makes sense they’d be enamored with rules and rule-making. Regardless, it still seems they have a disproportionate burden of rules, for better or worse.

This latest publication describes the “Ottawa Chest Pain Cardiac Monitoring Rule”, which aims to diminish resource utilization in the setting of chest pain in the Emergency Department. These authors posit the majority of chest pain patients presenting to the ED are placed on cardiac monitoring in the interest of detecting a life-threatening malignant arrhythmia, despite such events being rare. Furthermore, the literature regarding alarm fatigue demonstrates that greater than 99% of monitor alarms are erroneous and typically ignored.

Using a sample of 796 chest pain patients receiving cardiac monitoring, these authors validate their previously described rule for avoiding cardiac monitoring: chest-pain free, with a normal or non-specific ECG. In this sample, 284 patients met these criteria, and none of them suffered an arrhythmia requiring intervention.

While this represents 100% sensitivity for their rule, as a resource utilization intervention there is obviously room for improvement. Of the patients not meeting their rule, only 2.9% suffered an arrhythmia – mostly just atrial fibrillation requiring pharmacologic rate or rhythm control. These criteria probably ought to be considered just a minimum standard, and there is plenty of room for additional exclusion.
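
The rule itself is simple enough to express as a predicate – a minimal sketch, assuming a hypothetical ECG label rather than any validated encoding:

```python
def can_forgo_monitoring(chest_pain_free: bool, ecg_reading: str) -> bool:
    """Ottawa Chest Pain Cardiac Monitoring Rule, per the description above.

    `ecg_reading` is a hypothetical label ("normal", "nonspecific", or
    anything else, e.g. "ischemic") -- not a validated encoding.
    """
    return chest_pain_free and ecg_reading in ("normal", "nonspecific")

# Both criteria must be met to forgo monitoring
print(can_forgo_monitoring(True, "nonspecific"))   # True
print(can_forgo_monitoring(True, "ischemic"))      # False
```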

Anecdotally, not only do most of our chest pain patients in my practice not receive monitoring – many receive their entire work-up in the waiting room!

“Prospective validation of a clinical decision rule to identify patients presenting to the emergency department with chest pain who can safely be removed from cardiac monitoring”
http://www.cmaj.ca/content/189/4/E139.full

The Chest Pain Decision Instrument Trial

This is a bit of an odd trial. Ostensibly, this is a trial about the evaluation and disposition of low-risk chest pain presenting to the Emergency Department. The authors frame their discussion section by describing their combination of objective risk-stratification and shared decision-making in terms of reducing admission for observation and testing at the index visit.

But, that’s not technically what this trial was about. Technically, this was a trial about patient comprehension – the primary outcome is actually the number of questions correctly answered by patients on an immediate post-visit survey. The dual nature of their trial is evident in their power calculation, which starts with: “We estimated that 884 patients would provide 99% power to detect a 16% difference in patient knowledge between decision aid and usual care arms” – an unusual choice of beta and threshold for effect size, basically one additional question correct on their eight-question survey. The rest of their power calculation, however, makes sense: “… and 90% power to detect a 10% difference in the proportion of patients admitted to an observation unit for cardiac testing.” It appears the trial was powered not for the primary outcome selected by the patient advocates who helped design it, but for the secondary outcomes thought important to the clinicians.

So, it is a little hard to interpret their favorable result with respect to the primary outcome – 3.6 vs 4.2 questions answered correctly. After clinicians spent an extra 1.3 minutes (4.4 vs 3.1) with patients showing them a visual aid specific to their condition, I am not surprised patients had better comprehension of their treatment options – and they probably did not require a multi-center trial to prove this.

Then, the crossover between resource utilization and shared decision-making seems potentially troublesome. An idealized version of shared decision-making allows patients to participate in their treatment when there is substantial individual variation between the perceived value of different risks, benefits, and alternatives. However, I am not certain these patients are being invited to share in a decision between choices of equal value – and the authors seem to express this through their presentation of the results.

These are all patients without known coronary disease, normal EKGs, a negative initial cardiac troponin, and considered by treating clinicians to otherwise fall into a “low risk” population. This population matches the cohort of interest from Weinstock’s study of patients hospitalized for observation from the Emergency Department – 7,266 patients, none of whom independently suffered a cardiac event while hospitalized.  A trial in British Columbia likewise deferred admission for a similar cohort of patients in favor of outpatient stress tests.  By placing a fair bit of emphasis on their significant secondary finding of a reduction in observation admission from 52% to 37%, the authors seem to indicate their underlying bias is consistent with the evidence demonstrating the safety of outpatient disposition in this cohort.  In short, it seems to me the authors are not using their decision aid to help patients choose between equally valued clinical pathways, but rather to try and convince more patients to choose to be discharged.

In a sense, it represents offering patients a menu of options where overtreatment is one of them.  If a dyspneic patient meets PERC, we don’t offer them a visual aid where a CTPA is an option – and that shouldn’t be our expectation here, either.  These authors have put in tremendous effort over many years to integrate many important tools, but it feels like the end result is a demonstration of a shared decision-making instrument intended to nudge patients into choosing the disposition we think they ought, but are somehow afraid to outright tell them.

“Shared decision making in patients with low risk chest pain: prospective randomized pragmatic trial”
http://www.bmj.com/content/355/bmj.i6165.short

The Machine Can Learn

A couple weeks ago I covered computerized diagnosis via symptom checkers, noting their imperfect accuracy – and grossly underperforming crowd-sourced physician knowledge. However, one area that continues to progress is the use of machine learning for outcomes prediction.

This paper describes advances in the use of “big data” for prediction of 30-day and 180-day readmissions for heart failure. The authors used an existing data set from the Telemonitoring to Improve Heart Failure Outcomes trial as substrate, and then applied several machine-learning models to the data with varying inputs.

There were 236 variables available in the data set for use in prediction, weighted and cleaned to account for missing data. Compared with the C statistic from logistic regression as their baseline comparator, the winner was pretty clearly Random Forests. With a baseline 30-day readmission rate of 17.1% and 180-day readmission of 48.9%, the C statistic for the logistic regression model predicting 30-day readmission was 0.533 – basically no predictive skill. The Random Forest model, however, achieved a C statistic of 0.628 by training on the 180-day data set.
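
The shape of the comparison is easy to reproduce on synthetic data – a toy illustration only, not the trial’s 236-variable data set, and it will not reproduce the published C statistics:

```python
# Compare logistic regression vs. random forest by C statistic (AUC)
# on a synthetic, imbalanced "readmission" problem (~17% event rate).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, n_informative=10,
                           weights=[0.83], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# The C statistic is the area under the ROC curve of predicted probabilities
auc_lr = roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1])
auc_rf = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])
print(f"C statistic, logistic regression: {auc_lr:.3f}")
print(f"C statistic, random forest:       {auc_rf:.3f}")
```

A C statistic of 0.5 is a coin flip – which is why the 0.533 reported for the logistic model represents essentially no predictive skill.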

So, it’s reasonable to suggest there are complex and heterogeneous data for which machine learning methods are superior to traditional models. These are, unfortunately, pretty terrible C statistics, and almost certainly of very limited use for informing clinical care. As with most decision-support algorithms, I would also be curious to see a comparison with a hypothetical C statistic for clinician gestalt. However, for some clinical problems with a wide variety of influential factors, these sorts of models will likely become increasingly prevalent.

“Analysis of Machine Learning Techniques for Heart Failure Readmissions”
http://circoutcomes.ahajournals.org/content/early/2016/11/08/CIRCOUTCOMES.116.003039

Don’t CTPA With Your Gut Alone

Many institutions are starting to see roll-out of some sort of clinical decision-support for imaging utilization. Whether it be NEXUS, Canadian Head CT, or Wells for PE, there is plenty of literature documenting improved yield following implementation.

This retrospective evaluation looks at what happens when you don’t obey your new robot overlords – and perform CTPA for pulmonary embolism outside the guideline-recommended pathway. These authors looked specifically at non-compliance at the low end – patients with a Wells score ≤4 and performed with either no D-dimer ordered or a normal D-dimer.

During their 1.5-year review period, there were 2,993 examinations, and 589 fell out as non-compliant. Most – 563 – of these were low-risk by Wells and omitted the D-dimer. Yield for these was 4.4% positivity, compared with 11.2% for exams ordered following the guidelines. This is probably even a high-end estimate for yield, because it includes 8 (1.4%) patients who had subsegmental or indeterminate PEs but were ultimately anticoagulated, some of whom were undoubtedly false positives. Additionally, none of the 26 patients who were low-risk with a normal D-dimer were diagnosed with PE.
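
The compliance check at the heart of the decision support can be sketched as follows – a hypothetical encoding, with an assumed D-dimer cutoff of 500 ng/mL:

```python
from typing import Optional

def ctpa_guideline_compliant(wells_score: float,
                             d_dimer: Optional[float],
                             d_dimer_cutoff: float = 500.0) -> bool:
    """Was this CTPA ordered within the guideline pathway?

    Wells > 4: proceed directly to CTPA.
    Wells <= 4: CTPA only after an elevated D-dimer (cutoff assumed;
    None means the test was never ordered).
    """
    if wells_score > 4:
        return True
    return d_dimer is not None and d_dimer >= d_dimer_cutoff

print(ctpa_guideline_compliant(3, None))    # False: the 563 low-risk, no-D-dimer scans
print(ctpa_guideline_compliant(3, 850.0))   # True
```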

Now, the Wells criteria are just one tool to help reinforce gestalt for PE, and it is a simple rule that does not incorporate all the various factors with positive and negative likelihood ratios for PE. That said, this study should reinforce that low-risk patients should mostly be given the chance to avoid imaging, and a D-dimer can be used appropriately to rule out PE in those for whom PE is a real, but unlikely, consideration.

“Yield of CT Pulmonary Angiography in the Emergency Department When Providers Override Evidence-Based Clinical Decision Support”
https://www.ncbi.nlm.nih.gov/pubmed/27689922

From Way Too Many CTs to Many CTs

I am always keen to hear reports of successful imaging reduction interventions – and, even more so, in trauma. The typical, modern approach to trauma involves liberal use of advanced imaging – almost to the point of it being a punch line.

This single-center before-and-after report details their experiences between 2006 and 2013. Before 2010, there was no specific protocol regarding CT in trauma – leading to institutional self-examination in the setting of rampant overuse. After 2010, the following protocol was in effect:

[Figure: trauma algorithm]

There isn’t much besides good news presented here. Their primary imaging use outcome, abdominopelvic CT, decreased from 76.7% to 44.6% of all presentations. This was accompanied by an increase in mean ISS for those undergoing CT. When free fluid from non-traumatic causes was individually accounted for, the rate of positivity of these CTs rose from 12.3% to 17.5%. Finally, mortality was unchanged – 3.1% vs. 2.7%.

No doubt, any reduction in imaging will miss some important findings. The net counterbalancing effect, however, is likely a massive reduction in costs and harms from further evaluation of false-positives, renal contrast injury, and radiation. And, after all, they’re still performing CTs on nearly half their patients!

“Effect of an Institutional Triaging Algorithm on the Use of Multidetector CT for Patients with Blunt Abdominopelvic Trauma over an 8-year Period”

http://pubs.rsna.org/doi/abs/10.1148/radiol.2016152021

Shaking Out Stroke Mimics

In a world of continued aggressive guideline- and pharmaceutical-sponsored expansion of stroke treatment with thrombolytics, this article fills an important need – better codifying the predictors of stroke mimics. Other editorials espouse the need to be fast without being sure, but this is frankly irresponsible medicine – and, in resource-constrained environments, unsustainable.

These authors at two academic centers performed a retrospective clinical and imaging review of 784 patients evaluated for potential acute cerebral ischemia. Patients were excluded if they had signs of acute stroke on initial non-contrast imaging, or if they did not subsequently undergo MRI. Based on review of the totality of clinical information for each patient, 41% of this cohort were deemed stroke mimics. The authors then derived a 6-variable scoring system – and when 3 or more variables were present, the chance of a stroke mimic being the cause of the current presentation was 87.2%. Their criteria:

  • Absence of facial droop
  • Age <50 y/o
  • Absence of atrial fibrillation
  • SBP <150 mm Hg
  • Presence of isolated sensory deficit
  • History of seizure disorder

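A sketch of the scoring, using hypothetical variable names for the six criteria listed above:

```python
FABS_CRITERIA = (
    "no_facial_droop",
    "age_under_50",
    "no_atrial_fibrillation",
    "sbp_under_150",
    "isolated_sensory_deficit",
    "seizure_history",
)

def fabs_score(findings: set) -> int:
    """Count how many of the six FABS criteria are present."""
    return sum(1 for criterion in FABS_CRITERIA if criterion in findings)

def likely_stroke_mimic(findings: set) -> bool:
    """Per the study, a score of 3 or more carried an 87.2% chance of mimic."""
    return fabs_score(findings) >= 3

print(likely_stroke_mimic({"no_facial_droop", "age_under_50", "seizure_history"}))  # True
```
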
When the rate of tPA administration to stroke mimics is ~15%, and 30-40% of patients evaluated for stroke are stroke mimics, there is a lot of waste and potential harm occurring here. These authors suggest the use of this score could potentially halve these errant administrations while retaining 94% sensitivity, or cut errant administrations down to 2% at 90% sensitivity. Considering the patients for whom stroke/stroke mimic is an ambiguous diagnosis, it is reasonably likely the symptoms are of lesser severity – and in the range for which tPA is of most tenuously “proven” value. While their rule has not been prospectively validated, some of these elements certainly have face validity, and can be incorporated into current practice at least as a reminder.

“FABS: An Intuitive Tool for Screening of Stroke Mimics in the Emergency Department”

http://stroke.ahajournals.org/content/early/2016/08/04/STROKEAHA.116.013842.abstract

The Extra Head CTs in Trauma, Estimated

In the world of academia and residency training, the spirited debate in trauma is usually regarding the merits of the “pan-scan” – and whether we can all agree it is probably safe to reduce costs and resource utilization by selective scanning. In community practice, it’s about picking up the needle in a haystack – and, hence, preventing the innumerable unnecessary CTs.

This is a retrospective review using electronic health record data to estimate the number of potentially unnecessary head CTs in the setting of trauma. These authors pulled records for all patients for whom a head CT was obtained, and for whom recorded EHR values suggested an encounter for trauma. This cohort was then evaluated for appropriateness of a CT by retrospectively determining the presence of high-risk or exclusion criteria for the Canadian CT Head Rule.

Among 27,240 patients extracted, 11,432 (42.0%) were “discordant” with the CCHR by structured EHR content. However, upon manual review of the chart narrative, the structured EHR content misclassified the CCHR recommendation 12.2% (95% CI 5.6-18.8%) of the time. Thus, the authors then estimate approximately 36.8% (95% CI 34.1-39.6%) of CT head for trauma in a community setting is inappropriate.
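
The headline estimate follows directly from the counts above – reproducing the arithmetic:

```python
# 11,432 of 27,240 head CTs flagged "discordant" by structured EHR data,
# discounted by the 12.2% misclassification rate found on manual review
discordant = 11432 / 27240
inappropriate = discordant * (1 - 0.122)
print(f"{discordant:.1%} discordant, {inappropriate:.1%} likely inappropriate")
# -> 42.0% discordant, 36.8% likely inappropriate
```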

This is probably a reasonable research strategy, warts and all. Due to EHR limitations, they actually only filtered for 3 of the 5 high-risk criteria – basilar skull fracture and open skull fracture are such rare findings in their cohort the impact on overall results would be negligible. Then, Kaiser is probably more aggressive at minimizing CT use than the general community ED population, as routine quality improvement monitors individual and group rates of CT usage.

Bottom line: at least a third of head CTs for trauma in the community can probably be obviated by use of validated criteria.

“Computed Tomography Use for Adults with Head Injury: Describing Likely Avoidable ED Imaging based on the Canadian CT Head Rule”

http://www.ncbi.nlm.nih.gov/pubmed/27473552

The Febrile Infant Step-by-Step

You’ve heard of the Philadelphia Criteria. You’ve heard of the Rochester Criteria. But – Step-by-Step?

This is an algorithm developed by European emergency physicians to identify low-risk infants who can be safely managed without lumbar puncture or empiric antibiotic treatment. After retrospectively validating their algorithm on 1,123 patients, this is their prospective validation in 2,185 more – looking for IBI, or “Invasive Bacterial Infection”, as their primary outcome.

The easiest way to summarize their algorithm and results is by this figure:

[Figure: Step-by-Step algorithm]

Sensitivity and specificity, respectively, were as follows:

  • Rochester – 81.6% and 44.5%
  • Lab-score – 59.8% and 84.0%
  • Step-by-Step – 92.0% and 46.9%
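
For reference, the definitions behind these numbers – with a back-of-the-envelope check on the Step-by-Step figure, assuming roughly 87 total IBIs (an inferred count, not reported here):

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: IBIs correctly flagged / all IBIs."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negative rate: well infants correctly cleared / all well infants."""
    return tn / (tn + fp)

# 92.0% sensitivity with 7 missed IBIs implies roughly 80 caught (80 / 87 = 0.92)
print(round(sensitivity(tp=80, fn=7), 2))  # -> 0.92
```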

The authors attribute 6 of the 7 missed by Step-by-Step to evaluation early in the disease process – presentation within 2 hours of onset of fever.

Their algorithm has reasonable face validity, and could be incorporated into a protocol with close follow-up to re-evaluate those presenting early in their disease process. We still have, however, a long way to go regarding specificity.

“Validation of the “Step-by-Step” Approach in the Management of Young Febrile Infants”
http://www.ncbi.nlm.nih.gov/pubmed/27382134

Next Up in Syncope Prediction

The Great White North is the land of clinical decision instruments.  Canadian Head, Canadian C-Spine, Ottawa Ankle, Ottawa SAH, the list goes on – and now, from the same esteemed group: the Canadian Syncope Risk Score.

The vast majority of patients with syncope have an unrevealing initial – and, if admitted, in-house – evaluation.  That said, any physiologic interruption in the ability to perfuse the brain portends a prognosis worse than the general background rate.  These authors performed an observational study over the course of four years to prospectively derive a decision instrument to support risk-stratification for syncope.

There were 4,030 patients enrolled and eligible for analysis based on 30-day follow-up, and 147 of these suffered a “serious adverse event”.  They identified 43 candidate predictors for prospective collection, and ultimately this resulted in a multivariate logistic regression predictive model with 9 elements.  Scores range from -3, with a 0.4% estimated risk for SAE, to 11, with an 83.6% estimated risk.  Usable confidence intervals, however, were mostly limited to scores <5.
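
The published endpoints imply a roughly logistic relationship between score and risk. As an illustration only – a two-point interpolation on the logit scale through those endpoints, not the authors’ fitted 9-element model:

```python
import math

def estimated_sae_risk(score: int) -> float:
    """Interpolate risk on the logit scale between the published endpoints:
    score -3 -> 0.4% risk, score 11 -> 83.6%. Illustrative only; not the
    authors' fitted multivariate model."""
    logit = lambda p: math.log(p / (1 - p))
    lo, hi = logit(0.004), logit(0.836)
    slope = (hi - lo) / (11 - (-3))
    intercept = lo + 3 * slope
    return 1 / (1 + math.exp(-(intercept + slope * score)))

print(f"{estimated_sae_risk(0):.1%}")  # interpolated risk at a score of zero
```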

There are a few things I would quibble with regarding this study.  The “serious adverse event” definition is rather broad, and includes 30-day events for which the underlying pathology was not present or necessarily preventable at the initial visit.  For example, a patient with a subsequent encounter for a GI bleed or a case of appendicitis fit their criteria of SAE.  This would diminish the instrument’s apparent sensitivity without materially improving its clinical relevance.  Then, there is the oddity of incorporating the final ED diagnosis into the scoring system – where a provisional diagnosis of “vasovagal syncope” is -2, and a diagnosis of “cardiac syncope” is +2.  The authors explicitly defend its inclusion and the methods behind it – but I feel its subjectivity coupled with widespread practice variation will impair this rule’s generalizability and external validation.

Finally, the last issue with these sorts of “rules”: “high risk” is frequently conflated to mean “admit to hospital”.  In many situations close to the end-of-life, the protective effect of hospitalization and medical intervention vanishes – and may have little or no value.  This sort of stratification should be applied within the appropriate medical and social context, rather than simply triggering admission.

“Development of the Canadian Syncope Risk Score to predict serious adverse events after emergency department assessment of syncope”
http://www.ncbi.nlm.nih.gov/pubmed/27378464

Perpetuating the Flawed Approach to Chest Pain

Everyone has their favored chest pain accelerated diagnostic risk-stratification algorithm or pathway these days.  TIMI, HEART, ADAPT, MACS, Vancouver, EDACS – the list goes on and on.  What has become painfully clear from this latest article, however, is this approach is fundamentally flawed.

This is a prospective effectiveness trial comparing ADAPT to EDACS in the New Zealand population.  Each “chest pain rule-out” was randomized to either the ADAPT pathway – using modified TIMI, ECG, and 0- and 2-hour troponins – or the EDACS pathway – which is its own unique scoring system, ECG, and 0- and 2-hour troponins.  The ADAPT pathway classified 30.8% of these patients as “low risk”, while the EDACS classified 41.6% as such.  Despite this, their primary outcome – patients discharged from the ED within 6 hours – non-significantly favored the ADAPT group, 34.4% vs 32.3%.

To me, this represents a few things.

We still have an irrational, cultural fear of chest pain.  Only 11.6% of their total cohort had STEMI or NSTEMI, and another 5.7% received a diagnosis of “unstable angina”.  Thus, potentially greater than 50% of patients were still hospitalized unnecessarily.  Furthermore, this cultural fear of chest pain was strong enough to prevent acceptance of the more-aggressive EDACS decision instrument being tested in this study.  A full 15% of low-risk patients by the EDACS instrument failed to be discharged within 6 hours, despite their evaluation being complete following 2-hour troponin testing.

But, even these observations are a digression from the core hypothesis: ADPs are a flawed approach.  Poor outcomes are such a rarity, and so difficult to predict, that our thought process ought to be predicated on a foundation that most patients will do well regardless, and only the highest-risk should stay in the hospital.  Our decision-making should probably be broken down into three steps:

  • Does this patient have STEMI/NSTEMI/true UA?  This is the domain of inquiry into high-sensitivity troponin assays.
  • Does the patient need any provocative testing at all?  I.e., the “No Objective Testing Rule”.
  • Finally, are there “red flag” clinical features that preclude outpatient provocative testing?  The handful of patients with concerning EKG changes, crescendo symptoms, or other high-risk factors fall into this category.
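
Those three steps could be sketched as a simple triage function – hypothetical flags, not a validated pathway:

```python
def chest_pain_disposition(troponin_positive: bool,
                           meets_no_objective_testing_rule: bool,
                           red_flags: bool) -> str:
    """Sketch of the three-step approach described above."""
    # Step 1: STEMI/NSTEMI/true UA -- the domain of high-sensitivity troponin
    if troponin_positive:
        return "admit: ACS pathway"
    # Step 2: no provocative testing needed at all
    if meets_no_objective_testing_rule:
        return "discharge, no testing"
    # Step 3: red-flag features preclude outpatient provocative testing
    if red_flags:
        return "observe for inpatient testing"
    return "discharge, outpatient provocative testing"

print(chest_pain_disposition(False, False, False))
```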

If we are doing chest pain close to correctly, the numbers from this article would be flipped – rather than ~30% being discharged, we ought to be ~70%.

“Effectiveness of EDACS Versus ADAPT Accelerated Diagnostic Pathways for Chest Pain: A Pragmatic Randomized Controlled Trial Embedded Within Practice”