Ryan Radecki

The Andexxa Showpiece

Every so often a masterclass performance arises in the medical literature. A performance transcending the boundaries of what was once thought possible. A shining exemplar of human achievement.

This is a trial, published in the New England Journal of Medicine, with the following features:

Conducted by an institute sponsored by pharma.
Designed by the first author, a consultant for pharma, and two employees of pharma.
Written by a medical writer employed by pharma.
Replete with authors reporting multiple financial conflicts of interest with pharma.
Substantially modified trial procedures and outcomes two and three years into the trial.
Introduced an interim stopping rule whose analysis was performed by an unblinded statistician affiliated with the funded institute.
Stopped the trial early based on the new interim stopping rule.
Used a surrogate composite primary endpoint.
Allowed the “usual care” arm to include patients who did not receive an active treatment comparator.
Permitted discrepancies in the baseline characteristics favoring the experimental arm.

And, these are solely the reported mechanisms by which pharma has placed their hands on the scales of this trial. It ought to be quite clear these procedures were carefully designed to ensure the (financial) success of this trial, and its ultimate publication is virtually an advertorial for the product in question.

The culprit this go-around? AstraZeneca née Alexion née Portola for Andexxa – better known as “andexanet alfa” (even though the FDA declined their drug naming for this label, properly known as “Coagulation Factor Xa [Recombinant], Inactivated-zhzo”). The trial is ANNEXA-I, which purports to be a comparison between Andexxa and prothrombin complex concentrates (PCCs).

As alluded to above, this trial was not designed to permit Andexxa to fail. With Andexxa sales climbing and approaching $200M annually, it is obviously impermissible to allow a trial to offer a hint of doubt – especially considering Portola/Alexion/AstraZeneca have been investing in “expert guidelines” aimed at elevating Andexxa above PCCs as first-line treatment for Factor Xa-associated bleeding.

So – naturally, Andexxa “succeeds”. On the composite endpoint of “good hemostatic efficacy” – hematoma volume change < 35%, NIHSS change < 7 points, and no use of rescue therapy between 3-12 hours – Andexxa outperformed “usual care” by an adjusted difference of 13.4 percentage points, 67.0% to 53.1%. The primary limiting factor to this composite endpoint was the sub-endpoint of hematoma volume change of < 35%. And, as this composite favours Andexxa, the trial was stopped early – and the favorable press releases roll in. Ideally, this is the point at which our sponsors would like us to stop further analysis and critique.

Interestingly, the main paper presents an efficacy analysis consisting of 452 patients. However, between the initiation of the interim analysis and cessation of trial procedures, the authors enrolled an additional 78 patients. The authors report findings from all 530 in their safety analysis, but exclude them from the primary efficacy analysis – consigning the full cohort analysis to a supplementary appendix. There is no obvious reason to do so – other than the fact the larger cohort demonstrates less favorable results for Andexxa, with the hemostatic efficacy composite dropping from 67.0% to 63.9%. As is frequently cautioned regarding stopping trials early, doing so inflates the confidence intervals, diminishes the precision of an effect size estimate, and precludes the natural propensity of regression to the mean.

Then, there are the trial procedures. Prior to a protocol amendment excluding subdural hematomas, the Andexxa group included 13 patients with SDH, as compared with only 4 in “usual care”. Subdural hematomas, generally speaking, have far less sinister an outcome than intracerebral hemorrhage – an imbalance favoring the Andexxa cohort. Then, bizarrely, only 85% of the “usual care” cohort received anticoagulation reversal using PCCs. Very little data is included regarding these 60 patients receiving “non-PCC” care at the discretion of their treating clinicians. What sort of selection bias led clinicians to withhold an active treatment for ICH? Without concrete data, it is impossible to do more than speculate, but it seems logical to theorize these patients must have been disadvantaged by their lack of treatment.

Next, there are The Downsides. Treatment with Andexxa very clearly causes increased arterial thrombotic events. Ischemic strokes occurred in 6.5% of those treated with Andexxa, as compared to 1.5% receiving “usual care”. Myocardial infarctions occurred in 4.2% of those treated with Andexxa, as compared to 1.5% of those receiving “usual care”. A smaller excess of pulmonary embolism was seen in the “usual care” arm, however.
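To put those thrombotic event rates in more tangible terms, here is a minimal sketch converting them into absolute risk differences and a crude "number needed to harm". The event rates are the ones quoted above; the function names are purely illustrative, and the calculation is unadjusted and ignores confidence intervals, so treat the output as back-of-the-envelope only.

```python
# Crude absolute risk difference and number needed to harm (NNH),
# using only the event rates quoted in the text above.
# Function names are illustrative; no adjustment or confidence
# intervals are attempted here.

def risk_difference(treated_rate: float, control_rate: float) -> float:
    """Absolute risk difference (treated minus control), as a proportion."""
    return treated_rate - control_rate


def number_needed_to_harm(treated_rate: float, control_rate: float) -> float:
    """Reciprocal of the absolute risk increase."""
    diff = risk_difference(treated_rate, control_rate)
    if diff <= 0:
        raise ValueError("No excess risk in the treated group; NNH is undefined.")
    return 1.0 / diff


# (Andexxa rate, "usual care" rate) as reported in the text
events = {
    "ischemic stroke": (0.065, 0.015),
    "myocardial infarction": (0.042, 0.015),
}

for name, (andexxa, usual_care) in events.items():
    diff = risk_difference(andexxa, usual_care)
    nnh = number_needed_to_harm(andexxa, usual_care)
    print(f"{name}: +{diff * 100:.1f} percentage points, NNH of about {nnh:.0f}")
```

On these crude figures, that is roughly one additional ischemic stroke for every 20 patients treated with Andexxa, and one additional myocardial infarction for every 37.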

Lastly, there are the patient-oriented outcomes. Naturally, with a trial stopped early due to a composite surrogate, the authors are quick to mention the trial is underpowered to evaluate these endpoints. However, the overall outcomes of patients included in this trial are grim – and they are more grim for those treated with Andexxa. At 30 days, only 28% of patients treated with Andexxa achieved a modified Rankin scale of 0 to 3, compared with 31% in the “usual care” cohort. Similarly, 27.8% of patients treated with Andexxa had died at 30 days, as compared with 25.5% of those receiving “usual care”.

So, there you have it – such a “success” story of a trial it needed to be stopped early, and we still have no clear evidence Andexxa ought to be favored over “usual care”. The authors merrily cite INTERACT1, the trial upon which the “hematoma growth” surrogate is “validated” – and they will rely on this heavily for marketing purposes. In the end, we have exactly what we ought to have expected from a trial designed to stand on its head to deliver for its product, and we as clinicians are ever-poorer for it.

Andexanet for Factor Xa Inhibitor-Associated Acute Intracerebral Hemorrhage

Cancer Clinical Trials Don’t Benefit Patients

Hearkening back to my former life as the chair of an Institutional Review Board: you do not promise or imply a potential for benefit to clinical trial participants.

Why? Because clinical trials aren’t designed to benefit participants. Participants may be randomized to the “standard of care” arm. The trial drug may not have any improvement in efficacy over the “standard of care”. Worse, the trial drug may, in fact, have greater toxicity than the current options. Finally, there are the frequent – and frequently invasive – trial procedures: blood draws, repeat imaging, and repeat tumor sampling.

The perception remains clinical trials produce better outcomes for some trial participants – but the whole of the literature does not support this conclusion. This systematic review and meta-analysis from JAMA clearly shows the data are insufficient to support a net benefit from cancer clinical trial participation. Small signals of benefit are most likely the result of trial effects and publication bias.

The unquestioned benefit? To pharma – and, distantly, potentially to future patients.

While this study does not exclude such benefits for cancer clinical trial participation, they remain unsubstantiated.

https://pubmed.ncbi.nlm.nih.gov/38767595

When EHR Interventions Succeed … and Fail

This is a bit of a fascinating article with a great deal to unpack – and rightly published in a prominent journal.

The brief summary – this is a “pragmatic”, open-label, cluster-randomized trial in which a set of interventions designed to increase guideline-concordant care were rolled out via electronic health record tools. These interventions were further supported by “facilitators”, persons assigned to each practice in the intervention cohort to support uptake of the EHR tools. In this specific study, the underlying disease state was the triad of chronic kidney disease, hypertension, and type II diabetes. Each of these disease states has well-defined pathways for “optimal” therapy and escalation.

The most notable feature of this trial is the simple, negative topline result – rollout of this intervention had no reliably measurable effect on patient-oriented outcomes relating to disease progression or acute clinical deterioration. Delving below the surface provides a number of insights worthy of comment:

The authors could have easily made this a positive trial by having the primary outcome as change in guideline-concordant care, as many other trials have done. This is a lovely example of how surrogates for patient-oriented outcomes must always be critically appraised for the strength of their association.
The entire concept of this trial is likely passively traumatizing to many clinicians – being bludgeoned by electronic health record reminders and administrative nannying to increase compliance with some sort of “quality” standard. Despite all these investments, alerts, and nagging – patients did no better. As above, since many of these trials simply measure changes in behavior as their endpoints, it likely leaves many clinicians feeling sour seeing results like these where patients are no better off.
The care “bundle” and its lack of effect size is notable, although it ought to be noted the patient-oriented outcomes here for these chronic, life-long diseases are quite short-term. The external validity of findings demonstrated in clinical trials frequently falls short when generalized to the “real world”. The scope of the investment here and its lack of patient-oriented improvement is a reminder of the challenges in medicine regarding evidence of sufficient strength to reliably inform practice.

Not an Emergency Medicine article, per se, but certainly describes the sorts of pressures on clinical practice pervasive across specialties.

“Pragmatic Trial of Hospitalization Rate in Chronic Kidney Disease”
https://www.nejm.org/doi/full/10.1056/NEJMoa2311708

Quick Hit: Elders Risk Assessment

A few words regarding an article highlighted in one of my daily e-mails – a report regarding the Elders Risk Assessment tool (ERA) from the Mayo Clinic.

The key to the highlight is the assertion this score can be easily calculated and presented in-context to clinicians during primary care visits, allowing patients with higher scores to be easily identified for preventive interventions. With an AUC of 0.84, the authors are rather chuffed about the overall performance. In fact, they close their discussion with this rosy outlook:

The adoption of a proactive approach in primary care, along with the implementation of a predictive clinical score, could play a pivotal role in preventing critical illnesses, benefiting patients and optimizing healthcare resource allocation.

Completely missed by their limitations is that prognostic scores are not prescriptive. The ERA is based on age, recent hospitalizations, and chronic illness. The extent to which the management of any of these issues can be addressed “proactively” in the current primary care environment, and demonstrate a positive impact on patient-oriented outcomes, remains to be demonstrated.

To claim a scoring system is going to better the world, it is necessary to compare decisions made with formal prompting by the score to decisions made without – several steps removed from performing a retrospective evaluation to generate an AUC. It ought also be appreciated some decisions based on high ERA scores will increase resource utilization without a corresponding beneficial effect on health, while lower scores may likewise inappropriately bias clinical judgement.

This article has only passing applicability to emergency medicine, but the same issues regarding the disutility of “prognosis” apply widely.

“Individualized prediction of critical illness in older adults: Validation of an elders risk assessment model”
https://agsjournals.onlinelibrary.wiley.com/doi/abs/10.1111/jgs.18861

Update to Start 2024

A brief post collating a few bits of my various work published across the interwebs ….

The Annals of Emergency Medicine Podcast continues to summarise the meatiest articles from each month, featuring a cycle of new co-hosts, as well:

Naturally, there are continuing Journal Club features, covering the following articles:

I should also point out a couple additional new publications with two very different and amazing teams:

Lastly, in ACEPNow, we have:

Enjoy!

Everyone’s Got ChatGPT Fever!

And, most importantly, if you put the symptoms related to your fever into ChatGPT, it will generate a reasonable differential diagnosis.

“So?”

This brief report in Annals describes a retrospective experiment in which 30 written case summaries lifted from the electronic documentation system were fed to either clinician teams or ChatGPT. The clinician teams (either an internal medicine or emergency medicine resident, plus a supervising specialist) and ChatGPT were asked to generate a “top 5” of differential diagnoses, and then settle upon one “most likely” diagnosis. Each case was tested both solely on the recorded narrative, as well as with laboratory results added.

The long and short of this brief report is the lists of diagnoses generated contained the correct final diagnosis with similar frequency – about 80-90% of the time. The correct leading diagnosis was chosen from these lists about 60% of the time by each. Overlap between clinicians and ChatGPT in their lists of diagnoses was, likewise, about 50-60%.

The common reaction: wow! ChatGPT is every bit as good as a team of clinicians. We ought to use ChatGPT to fill in gaps where clinician resources are scarce, or to generally augment clinicians contemporaneously.

This may indeed be a valid reaction, and, looking at the healthcare funding environment, it is clear billions of dollars are being thrown at the optimistic interpretation of these types of studies. However, what is lacking from these studies is any sort of comparison. Prior to ChatGPT, clinicians did not operate in an information resource vacuum, as is frequently the case in these contrived situations. When faced with clinical ambiguity, clinicians (and patients) have used general search engines, in addition to medical knowledge-specific resources (e.g., UpToDate) as augments. These ChatGPT studies are generally, much like many decision-support studies, quite light on testing their clinical utility and implementation in real-world contexts.

Medical applications of large language models are certainly interesting, but it is always valuable to remember LLMs are not “intelligent” – they are simply pattern-matching and generation tools. They may, or may not, provide reliable improvement over current information search strategies available to clinicians.

“ChatGPT and Generating a Differential Diagnosis Early in an Emergency Department Presentation”

Don’t Use Lytics in Mild Stroke, Part 3

Well, PRISMS demonstrated unfavorable results.

MARISS tried to ascertain predictors of poor outcome in mild stroke, and intravenous thrombolysis was not associated with an effect on the primary outcome.

Now, again, we examine thrombolysis in “mild” stroke, in this case, NIHSS ≤3 – and fail.

Like MARISS, this is a retrospective dredge of patients selected by the treating clinicians to receive either intravenous thrombolysis or, in this case, dual-antiplatelet therapy with clopidogrel and aspirin. The population included for analysis is the Austrian Stroke Unit Registry from 2018 until 2019, an original cohort of 53,899 patients. Of these, 29,252 were NIHSS ≤3, but exclusions meant nearly 25,000 were left out – primarily those whose strokes were the result of atrial fibrillation, or whose treating clinicians chose antiplatelet monotherapy instead of dual antiplatelet therapy.

The remaining ~4,000 were analyzed both in their unadjusted cohorts, as well as in propensity-matched cohorts comprising roughly 20% of the original. In the unadjusted cohorts, efficacy and safety outcomes were universally worse in those selected for thrombolysis – but, of course, these patients generally had more severe stroke syndromes. After propensity score matching, these differences generally disappeared – except a preponderance of sICH in the thrombolysis cohort.

The authors here conclude there’s no evidence of superiority for thrombolysis in mild stroke, and their results fit broadly with those from other cohorts. It’s observational and unreliable, but it ought to be a very reasonable stance to withhold thrombolysis for mild strokes pending trials conclusively demonstrating which, if any, mild strokes do improve with thrombolysis.

“IV Thrombolysis vs Early Dual Antiplatelet Therapy in Patients With Mild Noncardioembolic Ischemic Stroke”

Which Sepsis Alert is the Biggest Loser?

It’s a trick question – in the end, all of us have already lost.

This is a short retrospective report evaluating, primarily, the Epic Sepsis Prediction Model, and the mode in which it is deployed. The Epic SPM generates a “prediction of sepsis score” (PSS), calculated at 15 minute intervals, providing a continuous risk score for the development of sepsis. Of course, in modern medicine, this is usually reduced to a trigger threshold at which point an alert is fired. Alerts, alerts, alerts – what are they good for?

In this study, the Epic SPM was evaluated at several different PSS thresholds ranging from ≥5 to ≥10 – and compared, as well, with SIRS, qSOFA, and SOFA. There were two goals for the evaluation: accuracy and timeliness. All prediction tools provided the same age-old tradeoff between sensitivity and specificity, with a PSS of ≥5 being 95% sensitive, but merely 53% specific. Likewise, a more specific cut-off sacrificed sensitivity. SIRS, qSOFA, and SOFA suffered from the same limitations.
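For a sense of what that tradeoff means at the bedside, here is a rough sketch of the positive predictive value of an alert firing at PSS ≥5, using the 95% sensitivity and 53% specificity quoted above. The prevalence values are hypothetical, chosen purely for illustration and not taken from the study, and the function name is likewise an assumption.

```python
# Positive predictive value (PPV) of an alert via Bayes' theorem.
# Sensitivity and specificity are the figures quoted for PSS >= 5;
# the prevalence values below are HYPOTHETICAL, for illustration only.

def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a patient with a positive alert truly has the condition."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


SENSITIVITY = 0.95  # PSS >= 5, as reported in the text
SPECIFICITY = 0.53  # PSS >= 5, as reported in the text

for prevalence in (0.05, 0.10, 0.20):  # assumed sepsis prevalence, not study data
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, "
          f"about {1 / ppv:.1f} alerts per true case")
```

Under an assumed prevalence of 10%, fewer than one in five alerts would correspond to a true case, which is the arithmetic behind alert fatigue.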

The “time to detection” was a bit more interesting, but conclusions are a bit limited by the methods used to determine it. The PSS is calculated at 15 minute intervals, while their calculations of SIRS, qSOFA, and SOFA all happened at hourly intervals. Then, “time zero” for their calculations was actually determined by the time of clinician action – the time at which a clinician suspected sepsis and ordered either antimicrobials or blood cultures. With respect to timeliness, only a minority of patients met threshold scores at “time zero” – except SIRS, where nearly half were at threshold.

So, it’s hard to conclude much from these data – other than, as previously alluded, we are all losers. These alerts are clearly useless, yet they, and the Surviving Sepsis bundle gestapo have trained clinicians to leap at the earliest opportunity to (over)diagnose sepsis and administer broad-spectrum antibiotics. Multiple specialty societies have asked for the SEP-1 measures to be rolled back due to these obvious harms, let alone the administrative costs, and eliminating that “quality” measure would go a long way to putting these useless alerts to bed.

“Sepsis Prediction Model for Determining Sepsis vs SIRS, qSOFA, and SOFA”

End Nail Dogma

In a world of doors, truck beds, furniture, and other finger-crushing nuisances, emergency department visits for injuries involving the distal digits are common. Injuries range from tuft fractures, to degloving injuries, to all manner of nail and nailbed derangement.

Perusing any textbook or online resource will typically advise some manner of repair, including, but not limited to, replacing an avulsed nail back into the proximal nail fold and securing it in place. If the avulsed nail is not available, recommendations include placing a bit of foil into the proximal nail fold. The general idea being that failure to do so will irretrievably scar the germinal matrix, resulting in some disfigured and mutant nail growth.

The NINJA trial tests whether this dogma is valid – and, rather unsurprisingly, finds it is not.

In this trial, children with finger nail and nailbed injuries requiring surgical repair were randomized, at the conclusion of the injury repair, either to replacement of the nail (or foil) into the nail fold, or to discarding the nail and simply leaving on a non-adherent dressing. The “co-primary” outcomes were cosmetic appearance of the nail (using the Oxford Fingernail Appearance Score) and surgical-site infection at 1 week follow-up.

The majority of the 451 children involved were aged younger than 6 and most were crush injuries resulting in avulsion of the nail plate. The primary outcomes were no different between groups – 5 and 2 surgical-site infections in the “nail replacement” and “nail discarded” groups, respectively, and median OFNAS score was 5 (the highest score) in each group. Lest the trial be accused of just failing to demonstrate a difference favoring the “nail replacement” group, it was actually the “nail discarded” group having a non-significantly more favorable distribution of cosmetic scores.

When suggesting these results are unsurprising, it’s rather just a perspective that many clinical encounters in the emergency department are “over-medicalized”, and receive unnecessary tests or treatment simply due to the spectrum bias associated with acute care. Most healthy human substrate is capable of healing from minor injury in a satisfactory fashion; hopefully, these results further inform the care of children with finger nail injuries, and may be reasonably generalized to other nails and healthy adults.

“Effectiveness of nail bed repair in children with or without replacing the fingernail: NINJA multicentre randomized clinical trial”

The Opiates in Back Pain Conundrum

We do love to give out opiates in the emergency department. Kidney stone? Opiates. Broken arm? Opiates. Gunshot wound? Opiates. Sore throat? Dexamethasone. And opiates.

So of course we’re here with opiates for your back pain.

In this modern day, we are far, far more judicious than in times of yore, back when pharma had lobbied for pain to become the “fifth vital sign”. But, nonetheless, those patients who are struggling to manage despite non-opiate analgesia frequently end up with some sort of small supply to try and resolve an acutely painful condition.

The OPAL trial, published in The Lancet, is yet another in a series of trials decrying the disutility of virtually anything for back pain – in the context of prior work diminishing the efficacy of skeletal muscle relaxants, as well as even acetaminophen added to ibuprofen. In this trial, patients with “acute” low back pain were prescribed an oxycodone-based opiate or matching placebo, and their functional recovery was assessed in follow up. Unfortunately, no advantage was seen for patients randomized to oxycodone, while there were small, but likely real, risks for opiate misuse at later intervals.

However, does this trial apply to the emergency department?

Patients were eligible if they had low back pain for up to 3 months. This is not exactly “acute” – especially since early versions of the protocol excluded patients whose back pain had been ongoing for less than 2 weeks.
Modified-release oxycodone-naloxone was the opiate of choice in this Australian trial. The naloxone itself does not exert much influence on the analgesic effect, but the preparation differs from the preparations commonly used in the emergency department.
The follow-up interval was at six weeks, a good patient-oriented timeframe for long-term clinical resolution. However, emergency department treatment tends to choose opiate analgesia with the goal of short-term mobilization and return to activity, so 48- or 72-hour relief or functioning may be more relevant.

The most notable problem with this trial is not, in fact, the trial itself. Rather, the issue remains the paucity of true short-term data regarding any added benefit for the minimally effective quantity of opiates usually dispensed from the emergency department. Spring into action, team!

“Opioid analgesia for acute low back pain and neck pain (the OPAL trial): a randomised placebo-controlled trial”