(Failing to) Identify Severe Sepsis at Triage

This is the holy grail of predictive health informatics in Emergency Medicine – instant identification of serious morbidity, with the theoretical expectation of outcomes improvement due to early intervention.

And, more than for almost any other condition, accurate early identification of severe sepsis remains elusive.

This is an observational evaluation of the Australasian Triage Scale, in combination with infectious keywords, as a tool to identify and manage patients with severe sepsis.  Patients were enrolled at presentation to the Emergency Department and ultimately followed from triage through their ICU stay – where a clinical diagnosis of severe sepsis was used as the gold standard for outcomes.  However, of the 995 patients triaged through the Emergency Department and ultimately diagnosed with severe sepsis, only 534 were identified at triage.  The authors present various diagnostic characteristics for each level of the ATS with regards to acuity, and the AUCs – reflecting the combination of sensitivity and specificity – range from 0.457 to 0.567 (where 0.5 is basically a coin-flip).  So, the authors’ presented rule-based mechanism is nearly as likely to be incorrect as correct.  I’m not exactly certain how they came to the conclusion “the ATS and its categories is a sensitive and moderately accurate and valid tool”, but I tend to disagree.
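As a rough sanity check on the headline numbers – a minimal sketch in Python, assuming the 534-of-995 figure can be read as true positives among all patients ultimately diagnosed with severe sepsis (my assumption, not a calculation presented by the authors):

```python
# Rough sanity check on the triage-identification figures quoted above.
# Assumption: the 534 patients flagged at triage are the true positives
# among the 995 ultimately diagnosed with severe sepsis.
true_positives = 534
all_severe_sepsis = 995

sensitivity = true_positives / all_severe_sepsis
print(f"Sensitivity of triage identification: {sensitivity:.1%}")  # ~53.7%

# And an AUC near 0.5 means the tool ranks a randomly chosen septic patient
# above a randomly chosen non-septic patient essentially no better than chance.
```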

These data are consistent with our a priori expectation for these sorts of tools.  The patients who trigger such rules are generally so obviously septic that the rule-based notifications fire only after clinician identification, serving as redundant alarms and contributing to alarm fatigue.  Conversely, patients with severe sepsis who go undiagnosed upon initial presentation do so because of their atypical nature – and thus tend to fall outside rigid, rule-based constructs.  In short, computers are not physicians … yet.

“Identification of the severe sepsis patient at triage: a prospective analysis of the Australasian Triage Scale”
http://www.ncbi.nlm.nih.gov/pubmed/25504659

Using Patient-Similarity to Predict Pulmonary Embolism

Topological data analysis is one of the many “big data” buzzphrases being thrown about; it has roots in non-parametric statistical analysis and is promoted by the Palo Alto startup Ayasdi.  I’ve done a little experimentation with it, and used it mostly to show the underlying clustering and heterogeneity of the PECARN TBI data set.  My ultimate hypothesis, based on these findings, would be that patient-similarity is a more useful predictor of individual patient risk than the partition analysis used in the original PECARN model.  This technique is similar to the “attribute matching” demonstrated by Jeff Kline in Annals, but with much greater granularity and sophistication.

So, I should be excited to see this paper – using the TDA output to train a neural network classifier for suspected pulmonary embolism.  Using 152 patients, 101 of whom were diagnosed with PE, the authors develop a topological network with clustered distributions of diseased and non-diseased individuals, and compare the output from this network to the Wells and Revised Geneva Scores.

The AUC for the neural network was 0.8911, for Wells was 0.74, and for the Revised Geneva was 0.55.  And this sounds fabulous – until it’s noted the neural network is being derived and tested on the same tiny sample.  There’s no validation set, and, given such a small sample, the likelihood of overfitting is substantial.  I expect performance will degrade substantially when applied to other data sets.
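For illustration, here is a minimal sketch of the sort of out-of-sample check that is missing – cross-validation on a synthetic, similarly sized dataset.  The data, features, and classifier below are stand-ins, not the authors’ actual TDA-derived network:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Stand-in dataset roughly the size of the study cohort (152 patients, ~2/3 "PE").
X, y = make_classification(n_samples=152, n_features=10, weights=[0.33, 0.67],
                           random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)

# "Apparent" AUC: the model is scored on the very same patients it was fit to.
apparent_auc = roc_auc_score(y, clf.fit(X, y).predict_proba(X)[:, 1])

# Cross-validated AUC: each fold is scored on patients the model never saw.
cv_auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()

print(f"Apparent AUC on the training data: {apparent_auc:.2f}")
print(f"Cross-validated AUC:               {cv_auc:.2f}")  # typically noticeably lower
```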

However, even simply as a scientific curiosity, I hope to see further testing and refinement – it may yet prove of greater value.

“Using Topological Data Analysis for diagnosis pulmonary embolism”
http://arxiv.org/abs/1409.5020
http://www.ayasdi.com/_downloads/A_Data_Driven_Clinical_Predictive_Rule_CPR_for_Pulmonary_Embolism.pdf

The 2014 AHA NSTE-ACS Guidelines

One of the best things about Emergency Medicine is the preponderance of guidelines imposed upon our management of patients by non-Emergency Medicine clinicians.  One of the most glorious offenders is the American Heart Association, dictating our care of Stroke and Acute Coronary Syndrome.

But, actually, this most recent update – despite the continued absence of Emergency Medicine from the Writing Committee – contains some interesting subtle shifts.  Out of its 150-odd pages of content and evidence, most of the Emergency Medicine-relevant content is in Section 3: Initial Evaluation and Management.  Many of the guidelines are not controversial – send patients with suspected ACS to the Emergency Department, give aspirin, obtain an ECG, etc.

But, as a Class I recommendation, they note patients with suspected ACS can be risk-stratified based on likelihood of ACS to decide on the need for hospitalization.  They also now include an expanded discussion of tools beyond the old stalwarts TIMI and GRACE, incorporating ED-centric tools such as the Vancouver Rule, the HEART score, and the HEARTS3 score.  This greatly expands guideline-based backing of these rules for shared decision-making with patients, and, frankly, makes the previously “mandatory” observation of patients with chest pain less so.

The next interesting bit relevant to the ED lies in subsection 3.4.1 – the use of biomarkers.  I’ll just reproduce my favorite portion here:

Class III: No Benefit
1. With contemporary troponin assays, creatine kinase myocardial isoenzyme (CK-MB) and myoglobin are not useful for diagnosis of ACS (158-164). (Level of Evidence: A)

The guidelines also imply that, if symptom onset can be reliably determined, a single troponin measurement is reasonable 6 or more hours after onset; for shorter timeframes, a troponin on arrival and a second drawn as little as 3 hours after onset is reasonable to detect rising or falling levels.  And, beautifully, in subsection 3.5.1, all recommendations regarding discharge from the ED are Class IIa, make only weak endorsements of the reasonableness of observation, and acknowledge most patients with chest pain do not have ACS, and most are not at risk of ACS.
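Translated into simple logic – a minimal sketch of the troponin-timing strategy as I read it from the language above; the function and its thresholds are my paraphrase, not an official algorithm from the guideline:

```python
def troponin_strategy(hours_since_symptom_onset: float) -> str:
    """Illustrative paraphrase of the troponin-timing language discussed above."""
    if hours_since_symptom_onset >= 6:
        # Reliable symptom onset 6+ hours prior: a single troponin is reasonable.
        return "single troponin on arrival"
    # Shorter timeframes: troponin on arrival, plus a second level drawn as
    # little as 3 hours after symptom onset to detect a rise or fall.
    return "troponin on arrival, repeat at least 3 hours after symptom onset"

print(troponin_strategy(8))    # single troponin on arrival
print(troponin_strategy(1.5))  # troponin on arrival, repeat at least 3 hours after symptom onset
```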

There is ample further fodder for the interested reader to pick apart recommendations and conflicts of interest – particularly with regards to the incorporation of newer antiplatelet agents – but I’m generally pleased with the direction of this guideline as it applies to our practice.  However, this does not preclude the need for ACEP to develop its own Clinical Policy, to further guide and protect both patients and Emergency Physicians.

“2014 AHA/ACC Guideline for the Management of Patients With Non-ST-Elevation Acute Coronary Syndromes: A Report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines”
http://circ.ahajournals.org/content/early/2014/09/22/CIR.0000000000000134.full.pdf+html

The BATiC Score for Pediatric Trauma – Promising, But Not Prime-Time

Excluding significant intra-abdominal trauma on the basis of clinical evaluation is a lost art in the realm of zero-miss medicine.  Nowhere is this more important than in a pediatric population, considering the small, but real, potential for harm due to exposure to ionizing radiation from CT.

This is the Blunt Abdominal Trauma in Children (BATiC) score, derived in 2009 by a Swiss group.  This rule promotes use of clinical exam, ultrasonographic findings, and laboratory results to determine need for CT.  In this study, authors from the Netherlands retrospectively applied the rule to 216 pediatric trauma patients presenting in a four-year span between 2006 and 2010.  All told, this cohort contained 18 patients in whom intra-abdominal injuries were identified, and a BATiC score cut-off of 6 would have a sensitivity of 100% and specificity of 87%, with an AUC of 0.98.  So, this all sounds splendid.

But, only 34 of these patients even received a CT scan as part of their evaluation – and, with the standard outcome definition being injuries diagnosed on CT or as part of hospitalization, there is potential for a fair number of missed diagnoses.  A reasonable case may be made that any missed injuries were not clinically significant, given the lack of observed morbidity, but it would be difficult to have confidence based on such a small sample.  Furthermore, just as a simple cultural issue, trauma surgeons in the U.S. tend to feel any injury is clinically significant.

Then, 18.5% of observations used to validate this rule were missing from the retrospective data collection and required imputation.  The extent of this missing data further degrades the reliability of the observed diagnostic characteristics.  No confidence intervals are presented along with their results – but, rest assured, they are quite wide.  Ultimately, this decision-instrument may indeed be valid – but requires specific prospective evaluation.
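To put a rough number on just how wide – a minimal sketch computing an exact (Clopper-Pearson) 95% confidence interval around the “perfect” sensitivity observed in only 18 injured patients.  The interval method is my choice for illustration; the paper presents none:

```python
from scipy.stats import beta

def clopper_pearson(successes, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    lower = beta.ppf(alpha / 2, successes, n - successes + 1) if successes > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, successes + 1, n - successes) if successes < n else 1.0
    return lower, upper

# Sensitivity of 100%, observed in the 18 patients with intra-abdominal injury.
lo, hi = clopper_pearson(18, 18)
print(f"Sensitivity 18/18 = 100%, 95% CI {lo:.1%} to {hi:.1%}")  # roughly 81% to 100%
```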

As an interesting Costs of Care side note, the additional charge for such a trauma encounter including a CT scan in the Netherlands?  A mere 148 euros.

“External validation of the Blunt Abdominal Trauma in Children (BATiC) score: Ruling out significant abdominal injury in children”
http://www.ncbi.nlm.nih.gov/pubmed/24747461

A Shared Decision-Making Trial … But Fatally Flawed?

Shared decision-making is developing as the proposed solution to many of the problems with resource utilization today.  Rather than embrace “zero miss” practice without properly involving patients as the decision-makers, we are now encouraged to offer the patient choices regarding their diagnostic and treatment decisions.  By sharing the decision – and the risk – I find patients quite amenable to forgoing much low-yield testing.

To that end, a multi-center trial has begun, evaluating the use of shared decision-making in low-risk chest pain.  The trial is based on an information graphic created by the Mayo Clinic, and individualized risk assessment is supported by Jeff Kline’s attribute-matching algorithms.  This is fabulous, from a conceptual standpoint – as shared decision-making is hardly feasible without proper communication tools and the best available evidence at the point of care.

However, there’s an important missing element from the proposed information graphic:

[Chest Pain Choice decision aid infographic – link to high-resolution version]

The decision tool explains the 45-day risk of myocardial infarction if testing is deferred.  However, the patient-oriented decision is between stress test (or CT coronary angiogram, at the University of Pennsylvania), cardiology follow-up, and primary care follow-up – and the decision aid doesn’t actually address those choices.  It does not describe the relative risks of MI between each option, and, more importantly, it does not describe the risks or benefits of the additional testing offered.  Without information regarding the rates of true positive and false positive test results, the incremental prognostic value of such tests, or the costs associated with additional testing, the patient doesn’t have the appropriate foundational information for their choice.

Conceptually, this is a fantastic trial.  However, I’m not sure the decision aid has been correctly designed and implemented with regard to the choices offered.  Indeed, if the poor test characteristics of stress testing and CTCA in this population were shared with patients, it would probably show even more powerful reductions in resource utilization.

“Effectiveness of the Chest Pain Choice decision aid in emergency department patients with low-risk chest pain: study protocol for a multicenter randomized trial”
http://www.trialsjournal.com/content/15/1/166

SIRS is Rarely Sepsis

You already knew this – but that hasn’t stopped your hospital from purchasing the “Sepsis Alert” tool for your electronic health record.  Now, you and your nurses get blasted with computerized interruptions every time a patient is tachycardic and has an elevated WBC count.  And, you ignore it – because 1) it’s wrong, or 2) you placed a central line and admitted the patient to the ICU half an hour ago.
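For context, here is a minimal sketch of the classic SIRS criteria such alerts key on, using the commonly cited thresholds (the bands and PaCO2 variants are omitted for brevity):

```python
def sirs_criteria_met(temp_c, heart_rate, resp_rate, wbc_k):
    """Count the classic SIRS criteria (bands and PaCO2 variants omitted)."""
    criteria = [
        temp_c > 38.0 or temp_c < 36.0,   # temperature
        heart_rate > 90,                  # tachycardia
        resp_rate > 20,                   # tachypnea
        wbc_k > 12.0 or wbc_k < 4.0,      # WBC, in thousands per mm^3
    ]
    return sum(criteria)

# Two or more criteria "meet SIRS" -- which the vomiting, dehydrated patient
# with gastroenteritis does just as readily as the patient in septic shock.
print(sirs_criteria_met(temp_c=37.2, heart_rate=118, resp_rate=24, wbc_k=13.5) >= 2)  # True
```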

But, just how often do these sepsis alerts, based on systemic inflammatory response criteria, fire erroneously?  That is the question asked by this group from Harbor-UCLA and UC Davis.  Using the National Hospital Ambulatory Medical Care Survey from 2007 to 2010, these authors attempted to estimate the frequency of true infection in the setting of SIRS.  Unfortunately, while the NHAMCS set now includes vital signs obtained at triage, it does not include results of tests, such as the WBC.  Therefore, these authors – and this is where the study breaks down a bit – were required to mathematically conjure up a range of estimates for the frequency with which patients would meet the WBC criterion for SIRS.  Based on minimum and maximum estimates, the percentage of Emergency Department visits estimated to have SIRS ranged from 9.7% to 26.0%, and the authors ultimately split the difference at 17.8% for their analysis.

Based on their estimate, there were approximately 66 million visits to Emergency Departments meeting SIRS criteria, and the largest cohort of eventual diagnoses for these patients was indeed infection – but this constituted a mere 26% of all SIRS.  The remaining diagnoses were scattered among trauma, mental disorders, respiratory diseases, and other non-specific, organ-system dysfunction, catch-all ICD-9 codes.  While the interruptions and low specificity of SIRS alert tools are the obvious problem addressed by this study, the other implication is the troubling scope of the problem:  after trauma and infection are excluded, there are approximately 42 million other ED visits that may erroneously trip institutional protocols, prompting costly unnecessary testing and additional resource utilization targeting sepsis.

This is the sort of decision-support that simply doesn’t add any proven value, and another avenue of encroachment upon efficient and effective care.

“Epidemiology of the Systemic Inflammatory Response Syndrome (SIRS) in the Emergency Department”
http://www.ncbi.nlm.nih.gov/pubmed/24868313

Predicting Past Massive Transfusion Practices

Traumatic resuscitation is evolving – and reasonably so – to an aggressive, early-intervention strategy.  The current evidence seems to suggest patients benefit from early, whole blood volume replacement in the setting of hemorrhage.

But, in order to aggressively intervene early, it’s necessary to predict such need equally early in the initial trauma assessment process.  Therefore, a variety of prediction decision-instruments have been derived, such as this one from Japan.  These authors looked retrospectively at 119 severely injured trauma patients, developed odds ratios for massive transfusion via logistic regression, and then created a scoring system with a cut-off predicting massive transfusion.  They then subsequently validated this score on another retrospective cohort of 113 patients from the same institution.  Their score contains, essentially, the expected elements – age, lactic acid level, systolic blood pressure, FAST exam findings, and pelvic fracture type – and a score of 15 or higher was 97.4% sensitive and 96.2% specific for massive transfusion.
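For readers unfamiliar with how such scores are typically constructed, here is a minimal, generic sketch of turning logistic regression coefficients into integer score points on synthetic data.  The cohort, predictors, and weights below are illustrative stand-ins of my own invention, not the actual Traumatic Bleeding Severity Score:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in cohort with three illustrative predictors of transfusion need.
n = 300
age = rng.integers(18, 90, n)                 # years
lactate = rng.normal(2.5, 1.5, n).clip(0)     # mmol/L
sbp = rng.normal(110, 25, n)                  # mmHg
X = np.column_stack([age, lactate, sbp])

# Simulated outcome from an arbitrary "true" relationship.
logit = -3 + 0.03 * age + 0.8 * lactate - 0.02 * sbp
y = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(X, y)

# One common trick: scale each coefficient to the smallest in magnitude and
# round to an integer "weight".  (Real bedside scores usually assign points
# per category of each predictor, rather than per unit as done crudely here.)
weights = np.round(model.coef_[0] / np.abs(model.coef_[0]).min()).astype(int)
print(dict(zip(["age", "lactate", "sbp"], weights)))
```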

However, what this rule predicts is not the population that needs massive transfusion – because both steps were performed retrospectively, it simply describes the consistency of the authors’ general practice at this single institution.  At the authors’ institution, the patients who looked like the ones described by the rule – elderly, hypotensive, positive FAST, etc. – are the ones who received massive transfusion.  Therefore, when they look back to derive a decision instrument, they’ll find it simply reflects their general practice.  Subsequently, when they validate the instrument – again, provided their practices haven’t changed – the decision rule will simply and accurately reflect the continued practice pattern from which it was derived.  The authors do not mention whether they had a formal early massive transfusion protocol or practice in place, but, if so, this would further skew the decision instrument to reflect the protocol guiding practice, rather than actual patient need.  Finally, for one last hit to external generalizability, a “massive transfusion” was defined as 10 units of PRBCs – which, in Japan, are about one-third the volume of those in the United States.

Despite its reportedly excellent performance, this rule cannot be relied upon until prospective, external study validates its use.

“Predicting the need for massive transfusion in trauma patients: The Traumatic Bleeding Severity Score”
http://www.ncbi.nlm.nih.gov/pubmed/24747455

Damnit, Who Ordered That D-dimer?!

We live in strange, complicated times.  Endemic to our twisted reality are haphazard panels of cardiac biomarkers, ordered unthinkingly via triage protocol or unwittingly by physicians using order sets.  Troponin, myoglobin, creatine kinase, brain natriuretic peptide, and, sometimes, D-dimer results will arrive on patients for whom no suspicion of cardiovascular disease is present.

So, what do you do with that positive D-dimer in a patient who, until just that moment, appeared to be zero-risk for pulmonary embolism – possibly, say, by PERC?

This retrospective chart review from four French hospitals identified all patients undergoing D-dimer testing as part of an evaluation for pulmonary embolism.  Of 2,791 patients screened with complete data, 1,070 were PERC-negative.  Of these 1,070 minimal-risk patients, 167 had a positive D-dimer.  153 of these 167 underwent diagnostic imaging for PE, with 5 PEs detected.  Therefore, in this cohort, a patient who was PERC-negative with a positive D-dimer had an approximately 3.0% incidence of PE.

This result, however, comes absent any other abstracted, objective risk-stratification.  PERC was designed to work in concert with other objective or gestalt risk-stratification to define a low-risk cohort.  So, even though these authors describe a number of the imaging studies as unnecessary, it is likely a handful of these were reasonable tests based on risk factors outside of PERC.

Regardless, please carry on properly ignoring the majority of inadvertent positive D-dimers – if PE is not reasonably in the differential, unlike in this study where it was, the prevalence of PE will still be vanishingly small.

“Pulmonary Embolism Rule-out Criteria vs D-dimer testing in low-risk patients for the diagnosis of pulmonary embolism: a retrospective study in Paris, France.”
http://www.ncbi.nlm.nih.gov/pubmed/24736129

Splicing Up PECARN

The PECARN study is the largest of the prospective evaluations of children with minor head injury for clinically important traumatic brain injury (cTBI).  The derived prediction instruments, for children less than 2 years of age and for children aged 2 to 18 years, generate “very low risk” cohorts whose incidence of important injury is negligible.  However, the overall incidence of cTBI was quite low in the entire study – meaning each positive predictor still only raises the risk of cTBI from negligible to tiny.

One of the predictors, vomiting, is an element in the decision instrument for children aged 2 to 18 years.  The management recommendation for patients with vomiting, then, defaults to “do as is your wont” – and studies suggest most folks are going ahead with CT, rather than using the “observation” option.

This study goes back and looks specifically at the vomiting component – and tries to tease out whether “isolated” or “non-isolated” vomiting prior to enrollment provides additional information.  Of the 5,392 enrolled patients with complete data, 815 had a single episode of vomiting – with 0.2% having cTBI.  The remaining 4,577 with non-isolated vomiting had a 2.5% incidence of cTBI.  The article goes further into the details of the data set, noting patients with vomiting who received CT were more likely to have cTBI – but also had other concomitant comorbid injury.
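As a back-of-the-envelope illustration of the likelihood ratios buried in these data, here is a minimal sketch using approximate cTBI counts back-calculated from the percentages quoted above – so the figures below are my estimates, not the paper’s tabulated numbers:

```python
# Approximate counts reconstructed from the percentages reported above.
isolated, non_isolated = 815, 4577
ctbi_isolated = round(0.002 * isolated)          # ~2 of 815 (0.2%)
ctbi_non_isolated = round(0.025 * non_isolated)  # ~114 of 4,577 (2.5%)

total = isolated + non_isolated
total_ctbi = ctbi_isolated + ctbi_non_isolated

# Likelihood ratio of "isolated" vomiting for cTBI:
# P(isolated | cTBI) / P(isolated | no cTBI)
p_isolated_given_ctbi = ctbi_isolated / total_ctbi
p_isolated_given_no_ctbi = (isolated - ctbi_isolated) / (total - total_ctbi)
lr_isolated = p_isolated_given_ctbi / p_isolated_given_no_ctbi

print(f"Approximate LR of isolated vomiting for cTBI: {lr_isolated:.2f}")  # ~0.11
```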

This is, unfortunately, not terribly profound – and of debatable utility.  The joy – what there is – of PECARN is its use as a decision instrument with which to simplify medical decision-making.  Mining the details of individual +LR and -LR provides more patient-specific information, but increases the complexity of knowledge translation – and ultimately decreases the contextual acceptability of the product.  cTBI is heterogeneously distributed throughout the PECARN set – but the existing rule cannot be improved upon until better tools emerge to offload the cognitive demand required for precision medicine-type applications.

“Association of Traumatic Brain Injuries With Vomiting in Children With Blunt Head Trauma”
http://www.ncbi.nlm.nih.gov/pubmed/24559605

ABCD3 – Better, But Good Enough?

The ABCD2 score for the prediction of stroke after TIA was initially touted as a possible risk-stratification tool geared towards determining which patients could undergo delayed evaluation for modifiable risk factors.  Unfortunately, the “low risk” cohort generated by the ABCD2 score still has an unacceptably high risk of stroke at 7 days, with poor predictive and discriminative power.

These authors try to take it to the next level – the ABCD3 score and the ABCD3-I score.  In this retrospective analysis of the Kyoto Stroke Registry, the third element of D3 is “Dual TIA” – having had another TIA within the prior 7 days.  The “I”, then, depends upon a positive MRI DWI lesion associated with concurrent ipsilateral carotid artery stenosis.

In their retrospective application of ABCD2, ABCD3, and ABCD3-I, as expected, the ABCD2 score showed minimal utility for the outcome of interest to the Emergency Department – the 7-day risk of stroke in the low-risk cohort was ~6%.  The low-risk ABCD3 and ABCD3-I cohorts, however, had a ~1 to 2% risk at 7 days.  If verified prospectively, this begins to approach reliable utility for discharge decision-making in the ED.  Given the ABCD2 score’s checkered past, I would certainly wait for the next bit of evidence.

“ABCD3 and ABCD3-I Scores Are Superior to ABCD2 Score in the Prediction of Short- and Long-Term Risks of Stroke After Transient Ischemic Attack”
http://stroke.ahajournals.org/content/45/2/418.full