Tranexamic Acid & The WOMAN Trial

Tranexamic acid is popular for the treatment of freckles and nosebleeds – oh, and major bleeding in the setting of trauma. But, originally, the drug was developed for use in controlling hemorrhage in obstetrics and gynecology. Finally, then, we have a trial examining its use for its intended purpose.

Comprising 20,060 patients with clinically significant post-partum hemorrhage across 193 hospitals in 21 countries, the WOMAN trial is – inconveniently – negative as originally designed. The initial study design called for 15,000 patients and a composite endpoint of hysterectomy or death within six weeks of childbirth. However, as the study progressed, it became clear that standard practice in the participating settings meant the intervention would have no effect on hysterectomy rates, and the trial was then expanded to examine the effect on mortality.

So, then, with their expanded sample size, does TXA save lives, as reported profusely throughout the lay media?

Nope.

Mortality within 6 weeks was 2.3% in the TXA cohort and 2.6% with placebo, a relative risk of 0.88 (95% CI 0.74-1.05).
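For the curious, the arithmetic is easy to reproduce; here is a minimal sketch using the published event rates, with the usual log-scale normal approximation for the confidence interval. The per-arm sizes are an assumption (~10,030 each), as only the 20,060 total is quoted here.

```python
import math

def relative_risk(p1, n1, p2, n2, z=1.96):
    """Relative risk with a confidence interval from the
    normal approximation on the log scale."""
    rr = p1 / p2
    # Standard error of log(RR)
    se = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi

# WOMAN trial event rates; arm sizes assumed roughly equal
rr, lo, hi = relative_risk(0.023, 10030, 0.026, 10030)
print(f"RR {rr:.2f} ({lo:.2f}-{hi:.2f})")  # -> RR 0.88 (0.74-1.05)
```

Note the upper bound crossing 1.0 – which is precisely why the primary outcome is negative.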

There is, however, some layered complexity in these outcomes. Broken down by cause of death, deaths due to bleeding were 1.5% in the TXA cohort compared with 1.9% with placebo, reaching “statistical significance” with a p-value of 0.045. Then, if you further unpack these results, it seems even within the TXA cohort there is probably a time-to-treatment effect similar to CRASH-2.  Mortality was 1.2% in those receiving their TXA within 3 hours compared with 1.7% treated with placebo. In those treated beyond 3 hours, there was no difference in outcomes – and much higher mortality, regardless (2.6% vs. 2.5%).

So, what should we take away from these data? Is TXA more than just a treatment for freckles, or are these authors and the lay media exaggerating secondary outcomes in the setting of an overall negative trial? As usual, the answer is a little bit of both. The magnitude of the treatment effect, considering the size of this trial, is very, very small. That said, death is a quite meaningful clinical outcome, TXA is fairly inexpensive, and no specific harms were detected in this trial. Therefore, in the settings in which this trial was conducted – Nigeria, Pakistan, Sudan, Albania, etc. – this is likely an important treatment for post-partum hemorrhage.

In more robust clinical settings where additional resources are typically available to support the resuscitation of women suffering bleeding complications from childbirth, the effect size on mortality is likely even smaller. There may be clinically important effects regarding hysterectomy, hemostasis, and reduction in transfusion utilization, but I again suspect they will be very small and difficult to quantify without a similarly large trial. Then, as the NNT increases for clinically important outcomes, even the very rare harms of a treatment become relevant – and the failure of this trial to detect harms may simply be a limit of its statistical power.

Ultimately, as the mortality benefit decreases, the range of acceptable practice variation for protocols incorporating TXA increases. This is an important trial – but, as is typical, not quite as breathlessly important as advertised.

“Effect of early tranexamic acid administration on mortality, hysterectomy, and other morbidities in women with post-partum haemorrhage (WOMAN): an international, randomised, double-blind, placebo-controlled trial”
http://thelancet.com/journals/lancet/article/PIIS0140-6736(17)30638-4/abstract

No Change in Ordering Despite Cost Information

Everyone hates the nanny state. When the electronic health record alerts and interrupts clinicians incessantly with decision-“support”, it results in all manner of deleterious unintended consequences. Passive, contextual decision-support has the advantage of avoiding this intrusiveness – but is it effective?

It probably depends on the application, but in this trial, it was not. This is the PRICE (Pragmatic Randomized Introduction of Cost data through the Electronic health record) trial, in which 75 inpatient laboratory tests were randomized to display with usual ordering, or ordering with contextual Medicare cost information. The hope, and the study hypothesis, was that the availability of this cost information would exert a cultural pressure of sorts on clinicians to order fewer tests, particularly those with high costs.

Across three Philadelphia-area hospitals comprising 142,921 hospital admissions in a two-year study period, there were no meaningful differences in lab tests ordered per patient day between the intervention and the control. Looking at various subgroups of patients, it is also unlikely there were particularly advantageous effects in any specific population.

Interestingly, one piece of feedback the authors report is that residents suggested most of their routine lab test ordering resulted from admission order sets. “Routine” daily labs are set in motion at the time of admission, not as part of a daily assessment of need, and thus a natural impediment to improving low-value testing. However, the authors also note – and this is probably most accurate – because the cost information was displayed ubiquitously, physicians likely became numb to the intervention. It is reasonable to expect substantially more selective cost information could have focused effects on an area of particularly high-cost or low-value testing.

“Effect of a Price Transparency Intervention in the Electronic Health Record on Clinician Ordering of Inpatient Laboratory Tests”

http://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2619519

Dexamethasone Dilemma

Look! On Twitter! Two highly-respected medical minds taking the same trial publication and producing two, very different responses:

The controversy stems from a small study examining the relatively common practice of treating pharyngitis with an oral steroid – usually dexamethasone – for its anti-inflammatory effect. Most pharyngitis does not require antibiotics, and physicians understandably prefer to try something to provide relief from suffering.

This study enrolled 576 patients in a randomized, placebo-controlled, double-blind trial in an outpatient general practice setting. Patients were provided either 10mg of oral dexamethasone or an identical lactose placebo. Patients could only enter into the trial if immediate antibiotics were not prescribed, but physicians were allowed to give a “delayed” prescription for failure to improve.

The trial is statistically negative for the primary outcome, complete resolution of sore throat at 24 hours. Of those assigned to dexamethasone, 22.6% had complete symptom resolution at 24 hours, compared with 17.7% of placebo, an absolute risk difference of 4.7% (-1.8 to 11.2)[sic]. The effect size is slightly larger at 48 hours, 8.7%, which does reach statistical significance – and thus the NNT noted above by Ian Stiell. Nearly all the other secondary outcomes – resource utilization, subsequent antibiotic use, use of pain relief – favor dexamethasone, but generally range in effect size between 1-4%.
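As a quick aside, the NNT referenced above follows directly from the absolute risk difference; a minimal sketch, using the conventional round-up to a whole patient:

```python
import math

def nnt(absolute_risk_difference):
    """Number needed to treat: the reciprocal of the absolute
    risk difference, rounded up to a whole patient."""
    return math.ceil(1 / absolute_risk_difference)

print(nnt(0.087))  # 48-hour ARD of 8.7% -> NNT of 12
print(nnt(0.047))  # 24-hour point estimate of 4.7% -> NNT of 22
```

Of course, the 24-hour figure is only a point estimate from a non-significant difference, so that second NNT should be taken with a grain of salt.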

Does the failure to meet statistical significance for the primary outcome refute this therapy as effective? Not hardly – but it certainly calls into question whether the difference is reproducible or clinically meaningful. Plug these data into Ioannidis’ framework regarding the reliability of research findings, and we see this is precisely the sort of work where both conclusions are reasonable. Is there a signal for a symptomatic benefit? Absolutely. The strength of the signal, however, is not strong enough to overcome whatever pre-study odds you placed on the treatment being successful. If, like many, you feel this is a treatment likely beneficial, this study appears confirmatory. If, like many, you feel systemic steroids for symptomatic pharyngitis are inane, this study does little to change your view of the inadequate risk/benefit ratio.

Another possible interpretation of these data is the possibility of variable effects within subgroups, where the entire small effect size seen in these data results from a more substantial effect size in some fraction of the cohort. For example, the mean duration of symptoms was ~3.9 days, with a SD of ~1.7 days. Could the recency of symptoms be associated with likelihood of benefit? Any secondary analyses such as these, particularly in a small trial like this, would only serve as fodder for future investigations.

I have seen, however, other folks using this as an opportunity to link to the recent BMJ publication regarding adverse events and corticosteroid exposures. Without delving into that publication in detail, it would be a mistake to generalize those data to this population. That said, systemic corticosteroids are certainly not harmless. These authors rather ludicrously state “Short courses of oral steroids have been shown to be safe, in the absence of contraindications” – justified by a citation from 1982.

The final answer is somewhere in between our two friends above. Dexamethasone will help some patients with symptom relief from pharyngitis, and it will harm some. Teasing out a prediction of the optimal risk/benefit for a patient is substantially challenging – and wide practice variation is justifiable from these data, as long as the uncertainty in the evidence base is acknowledged.

“Effect of Oral Dexamethasone Without Immediate Antibiotics vs Placebo on Acute Sore Throat in Adults”
http://jamanetwork.com/journals/jama/fullarticle/2618622

PECARN, CATCH, CHALICE … or None of the Above?

The decision instrument used to determine the need for neuroimaging in minor head trauma is essentially a question of location. If you’re in the U.S., the guidelines feature PECARN. In Canada, CATCH. In the U.K., CHALICE. But, there’s a whole big world out there – what ought they use?

This is a prospective observational study from two countries out in that big remainder of the world – Australia and New Zealand. Over approximately 3.5 years, these authors enrolled patients with non-trivial mild head injuries (GCS 13-15) and tabulated various rule criteria and outcomes. Each rule has slightly different entry criteria and purpose, but over the course of the study, 20,317 patients were gathered for their comparative analysis.

And, the winner … is Australian and New Zealand general practice. Of these 20,000 patients included, only 2,106 (10%) underwent CT. It is hard to read between the lines and determine how many of the injuries included in this analysis were missed on the initial presentation, but if rate of neuroimaging is the simplest criterion for winning, there’s no competition. Applying CHALICE to their analysis cohort would have increased their CT rate to approximately 22%, and CATCH would raise the rate to 30.2%. Application of PECARN would place 46% of the cohort into CT vs. observation – an uncertain range, but certainly higher than 10%.

Regardless, in their stated comparison, the true winner depends on the value-weighting of sensitivity and resource utilization. PECARN approached 99-100% sensitivity, missing only 1 patient with clinically important traumatic brain injury out of ~10,000. Contrariwise, CATCH and CHALICE missed 13 and 12 out of ~13,000 and ~14,000, respectively. Most of these did not undergo neurosurgical intervention, but a couple missed by CHALICE and CATCH would have. However, as noted above, PECARN is probably substantially less specific than both CATCH and CHALICE, which has a relatively profound effect on utilization for a low-frequency outcome.
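That trade-off is easy to make concrete: sensitivity is one minus the miss fraction among clinically important injuries, weighed against the imaging rate. A toy sketch – the counts below are illustrative stand-ins, not the paper’s actual per-rule denominators:

```python
def rule_performance(misses, total_citbi, cts_indicated, cohort_size):
    """Sensitivity and imaging rate for a head-injury decision rule.
    All inputs are counts; the figures used below are hypothetical."""
    sensitivity = 1 - misses / total_citbi
    ct_rate = cts_indicated / cohort_size
    return sensitivity, ct_rate

# Hypothetical: a rule missing 1 of 100 important injuries while
# recommending imaging for 4,600 of a 10,000-patient cohort.
sens, rate = rule_performance(1, 100, 4600, 10000)
print(f"sensitivity {sens:.1%}, CT rate {rate:.0%}")
```

Two rules can post nearly identical sensitivities while differing by thousands of CTs in a cohort this size – which is the entire argument above.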

Ultimately, however, any of these decision instruments is usable – as a supplement to your clinical reasoning. Each of these rules simplifies a complex decision into one less so, with all its inherent weaknesses. Fewer than 1% of children with mild head injury need neurosurgical intervention, and these are rarely missed by any typical practice. In settings with high CT utilization rates, any one of these instruments will likely prove beneficial. In Australia and New Zealand – as well as many other places around the world – potentially not so much. This is probably a fine example of the need to compare decision instruments to clinician gestalt.

“Accuracy of PECARN, CATCH, and CHALICE head injury decision rules in children: a prospective cohort study”

http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(17)30555-X/abstract

Leave the Blood at Home?

In severely injured multi-system trauma patients, the gold standard for volume replacement is blood – in a relatively balanced ratio between PRBCs, plasma, and platelets. Match this need for blood with the conceptual “golden hour” for acute resuscitation, and it is reasonable to hypothesize there might be added benefit to providing blood products as early as feasible – including during emergency transport. Many of the most critically injured patients with time delays to a trauma center require aeromedical evacuation, so blood products on the helicopter may be ideal.

Sounds good, but the outcomes here are unfortunately not.

This is an observational report from nine trauma systems utilizing aeromedical transport, five of whose helicopters carried blood products and four of whose carried only crystalloid. There were 25,118 patients during the study period, 2,341 of whom were transported by helicopter, and 1,058 of whom met “high risk” criteria. Approximately half of these were transported with blood products available, and 142 (24%) of those received transfusion.

Unfortunately, there were vast differences and great heterogeneity between the groups with and without blood products available, including GCS, ISS, and “prehospital lifesaving interventions”. There were similarly profound differences between those receiving blood and those not. The unadjusted mortality outcomes generally followed lower GCS and worse ISS, as one would expect. The authors then attempted a propensity-match analysis to dredge some signal from their data, but only 10% of their cohort could be parsed by their matching algorithm. Owing to only this small sample and the statistical techniques, no reliable difference in outcomes can be demonstrated.

The authors ultimately suggest a multicenter randomized trial will be required to adequately test whether the availability of blood has any mortality benefit. This is clearly the best strategy to improve our answer to this question, although it is prudent to recall non-obvious effect sizes in observational data potentially suggest only a very small magnitude of beneficial effect, if any. This must then be weighed against the important wastage of limited transfusion resources, which would require a non-trivial improvement in outcomes.

“Multicenter Observational Prehospital Resuscitation on Helicopter Study (PROHS)”

https://www.ncbi.nlm.nih.gov/pubmed/28383476

Just the Cost of Doing Business

Good news, everyone!

In the past two decades, for virtually every specialty, the number of paid medical malpractice claims has decreased. Overall, for all specialties, the rate of payment has been halved, compared with the 1992-1996 timeframe. Neurosurgery, unfortunately, is still the “winner”, followed by plastic surgery, thoracic surgery, and obstetrics. The lowest rates were seen in psychiatry and pediatrics. Emergency medicine sits right in the middle, with 18.8 paid claims per 1,000 physician years.

The bad news, unfortunately, was that the claim amounts – including paid claims greater than $1 million – increased. Emergency Medicine paid claim amounts increased 26.1% to a mean of $314,052 in the most recent time period of analysis, an increase in line with the overall mean for all specialties. The largest jump in payout amounts was essentially a tie between dermatology, gastroenterology, pathology, and urology. Neurosurgery actually had one of the lowest payout increases – probably because they started from such lofty heights, already.

The types of malpractice alleged varied by specialty, with the expected split between diagnostic, surgical, and treatment errors across the diagnostic and surgical specialties. Most (63.6%) of the malpractice alleged in emergency medicine fell under diagnostic error, while, logically, 73.3% of alleged error in plastic surgery fell under surgical error.

These data, from the National Provider Data Bank, only document payments made for written claims and do not include settlements or monies paid out by institutions. Whether these actually represent a friendlier environment for physicians, more aggressive approaches to settling claims, or a shifting of liability to corporate proxy is not clear. Regardless, even if it is a little of all three, the trend is probably moving in the right direction.

“Rates and Characteristics of Paid Malpractice Claims Among US Physicians by Specialty, 1992-2014”

http://jamanetwork.com/journals/jamainternalmedicine/article-abstract/2612118

Stem Cells for Stroke Redux

A few months ago, folks at Stanford were claiming miraculous recoveries after implanting stem cells directly into patients’ brains at the site of injury. An interesting concept, to be certain.

Now we have “stem cells lite”, or, at least, the slightly-fewer-holes-in-the-skull version – and it’s apparently just as miraculous.

This is a Phase 2 double-blinded dose-escalation study evaluating treatment with intravenous multipotent adult progenitor cells, with treatment initiated between 24 and 48 hours after stroke onset. Their trial design reflects the nature of a Phase 2 trial, with three cohorts, unbalanced allocation, and dosing differences between groups, but is otherwise fairly straightforward. Until you get to the primary outcome:

“The primary efficacy outcome was the multivariate global stroke recovery at day 90, which assesses global disability, neurological deficit, and activities of daily living and consists of mRS 2 or less; NIHSS total score improvement of 75% or more from baseline; and Barthel index of 95 or more in the multipotent adult progenitor cells treatment group, compared with the placebo treatment.”

Which is to say, they’ve conjured up their own unique black-box composite primary outcome – an outcome they changed midway through the trial.

Why would you need to change the primary efficacy outcome in 2014 for a study that started in 2011? The obvious implication is the results were unfavorable – and, the cursory review of their results table suggests this is a reasonable stance to take.

These authors screened 160 patients at several different sites for eligibility and ultimately randomized 129. Of these, three did not receive the allocated intervention – leaving the remainder for analysis. Patients in each group were generally similar based on NIHSS, time until infusion, and stroke interventions. Sticking to traditional outcomes measured by stroke trials, there was no difference between groups: mRS ≤2 in 37% of the intervention group and 36% of the placebo.  However:

“exploratory analyses suggested an increase in excellent outcome in the multipotent adult progenitor cells arms in the ITT population, and a beneficial clinical effect on long-term 1 year disability.”

This “excellent” outcome is the product of the midstream outcome change combined with their post-hoc data dredging for a feasible positive finding – a combination of patients with mRS ≤1, a NIHSS ≤1, and a Barthel Index ≥95. Then, the bulk of their analysis is further restricted to one year outcomes of those who received their stem cells within 36 hours from stroke onset. With such an obvious “beneficial clinical effect”, is there any question regarding the role of the funding source?

“The funder of the study was involved in study design and in data interpretation. All data collection and analysis were overseen by Medpace. One employee of the funder (RWM) was represented on the writing committee.”

and:

“DCH received grants from Athersys, payments to his university from Medpace for patient enrolment, has a patent on the MultiStem cells through his university and has received licensing revenue through his university. LRW received grants from SanBio and Athersys, and personal fees from SanBio. GAF is a consultant for Athersys; received personal fees from Medpace; and payment from Medpace to his institution for study costs. SS received grants from Athersys. SIS received grants from Athersys, and consulting fees that were paid to the institution from Mesoblast, Aldagen, and Celgene. CAS received grants from Athersys. DC received grants from Athersys.”

The likelihood these results are valid, reproducible, and have a clinically meaningful effect size is nearly zero – but that certainly won’t stop them from throwing good money after bad.

“Safety and efficacy of multipotent adult progenitor cells in acute ischaemic stroke (MASTERS): a randomised, double-blind, placebo-controlled, phase 2 trial”
https://www.ncbi.nlm.nih.gov/pubmed/28320635

D-Dimer, It’s Not Just a Cut-Off

It’s certainly simpler to have a world where everything is black or white, right or wrong, positive or negative. Once upon a time, positive cardiac biomarkers meant acute coronary syndrome – now we have more information and shades of grey in between. The D-dimer, bless its heart, is probably like that, too.

This is a simple study that pooled patients from five pulmonary embolism studies to evaluate the diagnostic performance characteristics of the D-dimer assay. Conventional usage is simply to deploy the test as a dichotomous rule-out – a value below our set sensitivity threshold obviates further testing, while above consigns us to the bitter radiologic conclusion. These authors, perhaps anticipating a more sophisticated diagnostic strategy, go about trying to calculate interval likelihood ratios for the test.

Using over 6,000 patients as their substrate for analysis, these authors determine the various likelihood ratios for D-dimer levels between 250 ng/mL and greater than 5,000 ng/mL, and identify intervals of gradually increasing width, starting at 250 and building up to 2,500. Based on logistic regression modeling, the fitted and approximate iLRs range from 0.0625 for those with a D-dimer less than 250 ng/mL, increasing to 8 for levels greater than 5,000. Interestingly, a D-dimer between 1,000 and 1,499 had an iLR of roughly 1 – meaning those values basically have no effect on the post-test likelihood of PE.
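To see why an iLR of 1 is a diagnostic non-event, run the standard Bayesian update – prior odds times likelihood ratio gives posterior odds. A minimal sketch; the 15% pretest probability is an assumption for illustration, not a figure from the paper:

```python
def post_test_probability(pretest_p, interval_lr):
    """Convert a pretest probability to a post-test probability
    via odds: posterior odds = prior odds * likelihood ratio."""
    prior_odds = pretest_p / (1 - pretest_p)
    posterior_odds = prior_odds * interval_lr
    return posterior_odds / (1 + posterior_odds)

# Fitted iLRs quoted above: <250 ng/mL, the ~1,000-1,499 band, >5,000
for ilr in (0.0625, 1, 8):
    print(f"iLR {ilr}: {post_test_probability(0.15, ilr):.1%}")
```

A 15% pretest probability stays exactly 15% through the 1,000-1,499 band, drops to about 1% below 250 ng/mL, and climbs toward 60% above 5,000 – the shades of grey the dichotomous cut-off throws away.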

The general implication of these data would be to inform more precise accounting of the risk for PE involving the decision to proceed to CTPA. That said, with our generally inexact tools for otherwise estimating pretest likelihood of disease (Wells, Geneva, gestalt), these data are probably not quite ready for clinical use. I expect further research to develop more sophisticated individual risk prediction models, for which these likelihood ratios may be of value.

“D-Dimer Interval Likelihood Ratios for Pulmonary Embolism”
https://www.ncbi.nlm.nih.gov/pubmed/28370759