Combined Assessment of Function and Survival to Demonstrate the Effect of Treatment on Progressive Supranuclear Palsy
Relevant conflicts of interest/financial disclosures: M.G., I.R.M., T.J.B., S.D.B., T.G., H.L.G.V.T., and C.E. were employees of UCB during this study and may hold/have access to stock options. L.I.G. has consultancies with AI Therapeutics, Amylyx, Apellis, Aprinoia, Ferrer, IQVIA, Mitochon, Mitsubishi Tanabe, P3Lab, Roche, Switch, UCB, and Woolsey; participated in advisory boards with Amylyx, Roche, Rossy Centre (University of Toronto), and Springer; board of directors: CurePSP (travel expenses only); inventions: PSP Rating Scale and A Video Guide to the PSP Rating Scale (IP of Rutgers University). A.B. has consultancies with AGTC, Alector, Alzprotect, Amylyx, Arkuda, Arrowhead, Arvinas, Aviado, Eli Lilly, GSK, Humana, Merck, Modalis, Muna, Oligomerix, Oscotec, Pfizer, Roche, Switch, Transposon, and UnlearnAI; institution's research support with Biogen, Eisai, and Regeneron; research support with National Institute on Aging: NIH U19AG063911, R01AG078457, R01AG073482, R56AG075744, R01AG038791, RF1AG077557, P01AG019724, R01AG071756, and U24AG057437; Alzheimer's Association, Alzheimer's Drug Discovery Foundation, Association for Frontotemporal Degeneration, Bluefield Project to Cure FTD, Gates Ventures, GHR Foundation, Rainwater Charitable Foundation, UCSF Parkinson's Spectrum Disorders Center, and the University of California Cures AD Program. G.H. has employment with LMU University Hospital; ongoing research collaborations: AbbVie, Alzprotect, Amylyx, Aprineua, Asceneuron, Bayer, Bial, Biogen, Biohaven, Epidarex, Ferrer, Kyowa Kirin, Lundbeck, Novartis, Retrotope, Roche, Sanofi, Servier, Takeda, Teva, and UCB; honoraria: AbbVie, Bayer, Bial, Biogen, Bristol Myers Squibb, Kyowa Kirin, Pfizer, Roche, Teva, UCB, and Zambon.
Funding agencies: This research was funded by UCB. G.H. was supported by the European Joint Programme on Rare Diseases (Improve-PSP) and the Deutsche Forschungsgemeinschaft DFG, (German Research Foundation) under Germany's Excellence Strategy within the framework of the Munich Cluster for Systems Neurology (EXC 2145 SyNergy ID 390857198).
Full financial disclosures and author roles may be found in the online version of this article.
Abstract
Background
Progressive supranuclear palsy (PSP) is a rare and fatal neurodegenerative disorder for which there are currently no disease-modifying treatments. Recent trials of potential therapies had durations of 12 months, which may be insufficient because of nonrandom missingness due to death. Longer durations, incorporating PSP Rating Scale and survival, can reduce the potential for type II error. Selecting efficacy measures more sensitive to disease modification may facilitate identification of treatment effect.
Objective
The objective of this study was to evaluate, in a simulated phase 3 PSP trial, the effect of a disease-modifying intervention on a novel combined primary endpoint comprising function (PSP Rating Scale) and survival, the Combined Assessment of Function and Survival (CAFS), and to determine the operating characteristics of the CAFS.
Methods
To simulate PSP progression in the trial population, we developed models of PSP Rating Scale and survival using data from published clinical studies. These models were used to define operating characteristics of the CAFS for use in a phase 3 trial.
Results
The sample size determined (N = 384; 1:1 randomization) would provide >80% power to detect significant treatment effects on the CAFS compared with placebo. The CAFS provides good operating characteristics and increased power to detect moderate treatment effects on the PSP Rating Scale. We propose a trial design allowing potential detection of treatment effects at a preplanned interim analysis after participants complete 12 months of treatment, with assessment of effects of treatment (≤24 months) on survival.
Conclusions
Use of the CAFS could provide a comprehensive and robust estimate of the clinical benefit of future therapies. © 2024 UCB. Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.
Progressive supranuclear palsy (PSP) is a rare, progressive neurodegenerative disease.1, 2 The different clinical phenotypes of PSP share a common pathophysiology and may converge to PSP-Richardson's syndrome over time.3, 4 PSP leads to death after an average of 6 to 7 years.5, 6 Neuropathologically, PSP is an archetypical primary four-repeat tauopathy that affects neurons, oligodendrocytes, and astrocytes.7, 8
There are currently no approved disease-modifying treatments that halt or delay PSP progression.9 Recent studies have elucidated the pathogenesis of tauopathies, facilitating the development of disease-modifying interventions.10 However, earlier clinical trials (CTs) that evaluated the efficacy and disease modification potential of investigational therapies for PSP were unsuccessful.11, 12
To date, most PSP studies have used the PSP Rating Scale (PSPRS) as a primary outcome measure and defined change in the PSPRS over a 12-month observation period as the primary endpoint.11-13 The PSPRS is a prospectively validated, physician-rated measure of disease severity in PSP comprising 28 items in six categories (daily activities, mentation, bulbar, ocular motor, limb motor, gait/midline examination).14 The rater assigns a score of 0 to 2 or 0 to 4 points for each item, yielding a total PSPRS score from 0 to 100. Higher PSPRS scores indicate greater disease severity, with more functional impairment.14 Previous studies that used the PSPRS report annual increases in total PSPRS score of 10 to 12 points in patients with PSP.14-19 A ~40% reduction in the rate of PSPRS progression is considered clearly clinically meaningful20 and would likely increase survival time.
Results from past CTs suggest that the investigational therapies evaluated were not efficacious. Nevertheless, trial protocols still need to be optimized to increase sensitivity to changes in functional decline, obtain additional information on potential effects on mortality, and detect delayed effects of treatment (by increasing trial duration). Previously, data missing after patient death were treated as missing at random.11 Because death in PSP is related to disease severity, this assumption could bias estimates of treatment effect, reduce power to detect efficacy, and cause potential reductions in mortality to be overlooked. This could lead to a lack of definitive data to support the approval of novel therapies.
Based on the Joint Rank Test (JRT),21 the Combined Assessment of Function and Survival (CAFS) was developed for analysis of function adjusted for mortality in patients with amyotrophic lateral sclerosis.22 Data from recent PSP trials, as well as recent FDA guidance for amyotrophic lateral sclerosis,23 prompted us to apply this methodology to PSP. We describe simulation work used to define operating characteristics of the CAFS and enable use of the CAFS in PSP trials.
Subjects and Methods
Two reference datasets were used to obtain the relevant parameters to inform trial simulations. Data from a phase 2/3 CT of davunetide24 were used to determine parameters for assessment of disease progression measured by PSPRS, and parameters for survival were assessed using data from the research conducted to develop and validate the PSPRS.14 Data from trials of riluzole25 and tideglusib16 were also used to calculate sample sizes required for assessment of impact on overall survival.
Phase 3 Clinical Trial Design
The proposed design of a hypothetical phase 3 PSP trial was a double-blind, placebo-controlled study with a treatment duration of up to 24 months, assessing the efficacy, safety, and tolerability of a disease-modifying therapy. The final proposed design included a preplanned interim efficacy analysis after the last participant reached 12 months post-randomization, to allow early stopping for a positive result. The proposed population comprised participants ≥40 years of age with probable/possible PSP-Richardson's syndrome (meeting Movement Disorder Society-PSP diagnostic criteria).2 The primary endpoint was the CAFS, comprising change from baseline to month 24 in the signs and symptoms of PSP (assessed using the PSPRS) and time to death up to month 24.
The Item Response Theory Model to Simulate Longitudinal PSPRS Data
Longitudinal PSPRS data were simulated with an item response theory (IRT) model under three treatment-effect scenarios:
- Absence of efficacy (null): annual change from baseline in PSPRS total score as observed in the davunetide trial24 (~11.1 points).
- Clinically significant effect: treated participants' annual change from baseline reduced by 40% compared with the null.
- Moderate effect: treated participants' annual change from baseline reduced by 20% compared with the null.
The following visit schedule was assumed: baseline, six visits during year 1, and four visits during year 2.
The IRT model developed was qualified for use in simulations using the Visual Predictive Check approach. This approach compared the predicted versus observed mean score for each item, the predicted versus observed probabilities for each item score, and the predicted versus observed total score.
In neurodegenerative disease, moderate treatment effects (ie, 20% to 35% reduction in disease progression) often have no impact on cognitive impairment.27, 28 The IRT model allowed disease progression to be estimated without assuming a particular form for the relationship between time and the increase in total PSPRS score. The only parametric assumption in the IRT model is the link between time and disease progression (disease severity), which defines the latent variable.
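The IRT idea above can be illustrated with a short sketch. A single latent severity increases with time and drives the probability of each item score, and PSPRS-like item scores are drawn from those probabilities. The total score then depends on time only through the item response curves. The graded response model, the linear latent trajectory, the hypothetical item parameters in `ITEMS`, and the choice to apply the 20% and 40% scenarios as a slowing of the latent slope are all assumptions made for illustration. The article does not report the functional form or the parameter estimates of its IRT model.

```python
import math
import random

def grm_category_probs(theta, a, thresholds):
    """Graded response model: P(score = k | theta) for k = 0..len(thresholds).

    P(score >= k) = 1 / (1 + exp(-a * (theta - b_k))) with increasing thresholds b_k.
    """
    # Cumulative probabilities, padded with P(score >= 0) = 1 and P(score > max) = 0
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def latent_severity(theta0, annual_slope, years, effect=0.0):
    """Latent severity at `years`; `effect` is the fractional slowing (0.2 = 20%, 0.4 = 40%)."""
    return theta0 + annual_slope * (1.0 - effect) * years

def expected_total(theta, items):
    """Model-implied expected total score (sum of item expectations) at severity theta."""
    return sum(sum(k * p for k, p in enumerate(grm_category_probs(theta, a, b))) for a, b in items)

def simulate_item_scores(theta, items, rng):
    """Draw one score per item from its category probabilities."""
    return [rng.choices(range(len(b) + 1), weights=grm_category_probs(theta, a, b))[0]
            for a, b in items]

# Hypothetical items: (discrimination a, increasing thresholds b); 0-4 and 0-2 scored items
ITEMS = [(1.5, [-1.0, 0.0, 1.0, 2.0]), (2.0, [-0.5, 0.5]), (1.2, [0.0, 1.0, 2.0, 3.0])]

rng = random.Random(2024)
for effect in (0.0, 0.2, 0.4):  # null, moderate, and clinically significant scenarios
    theta_2y = latent_severity(theta0=-0.5, annual_slope=0.8, years=2.0, effect=effect)
    print(effect, round(expected_total(theta_2y, ITEMS), 2), simulate_item_scores(theta_2y, ITEMS, rng))
```

Slowing the latent slope lowers the expected total score at 2 years. The size of that drop in points depends on where the item thresholds lie, which is why the total-score trajectory need not be linear in time.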
Scores generated by the IRT model, representing the virtual dataset, were analyzed using a linear mixed-effects model to compute the mean percentage change in total PSPRS score between treatment groups. This two-step simulation process ensures the use of different models for virtual data creation and analysis.
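The article's analysis step uses a linear mixed-effects model. The sketch below is a simplified stand-in that is easy to follow, and it is not the authors' model. It estimates each participant's annual progression by ordinary least squares and reports the percentage slowing of mean progression in the treated arm relative to placebo. The visit times are an assumption consistent with the schedule described above: visits every 2 months in year 1 and every 3 months in year 2. The source gives only the number of visits per year.

```python
from statistics import mean

def ols_slope(times, scores):
    """Least-squares slope of total PSPRS score on time (points per year) for one participant."""
    t_bar, y_bar = mean(times), mean(scores)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx

def percent_slowing(placebo, treated):
    """Percentage reduction in mean annual progression (treated vs placebo).

    Each argument is a list of (visit_times_years, total_scores) pairs, one per participant.
    """
    placebo_rate = mean(ols_slope(t, y) for t, y in placebo)
    treated_rate = mean(ols_slope(t, y) for t, y in treated)
    return 100.0 * (placebo_rate - treated_rate) / placebo_rate

# Assumed visit times (years): baseline, six visits in year 1, four visits in year 2
VISITS = [0.0] + [m / 12 for m in range(2, 13, 2)] + [1.0 + m / 12 for m in range(3, 13, 3)]

# Toy noise-free trajectories: placebo ~11.1 points/year, treated progression slowed by 40%
placebo = [(VISITS, [b + 11.1 * t for t in VISITS]) for b in (30.0, 38.0, 45.0)]
treated = [(VISITS, [b + 11.1 * 0.6 * t for t in VISITS]) for b in (32.0, 36.0, 44.0)]
print(round(percent_slowing(placebo, treated), 1))
```

A mixed-effects model additionally pools information across participants and accommodates incomplete follow-up, which this two-stage shortcut does not.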
Simulation of Overall Survival
The largest CT to evaluate survival for up to 3 years in people with PSP-Richardson's syndrome was the riluzole phase 3 trial,25 which estimated the probability of survival for 3 years from enrollment to be approximately 50%. A 2-year mortality rate of approximately 30% is expected with placebo or ineffective treatment. A strong association between higher PSPRS scores and higher probability of death has previously been demonstrated,14 and a score of >70 equated to >50% probability of mortality within 1 year.
Data from research completed to develop the PSPRS14 indicated that a higher baseline score (above the median of 40) and a greater increase in PSPRS score during the first year (exceeding the median 1-year slope of 9.55) were associated with a reduction in median survival of ≥1 year. This association was incorporated into the simulation design: a higher simulated baseline score or faster progression was associated with a higher risk of death.
Survival data were simulated based on the hazard function determined in the placebo arm of the riluzole trial,25 including the simulated individual PSPRS scores as covariates in the model. The simsurv package from R was used to perform the simulation, and hazard ratios (HR) for survival were used to compare survival between the treated and placebo groups (eg, HR of 2 would indicate risk of death in the treated group was twice that in the control group).29 The baseline hazard was obtained using a flexible spline model with three knots, based on the time-to-death data in the riluzole study. At each simulation replication, the simulated PSPRS baseline and 1-year individual slope were included as proportional-hazard covariates in the survival simulation model.
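Direct treatment effects on survival were simulated at three levels, in addition to the null scenario (HR = 1.0):
- HR = 1.1: Treatment increased mortality risk.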
- HR = 0.8: Treatment moderately reduced mortality risk.
- HR = 0.5: Treatment greatly reduced mortality risk.
The indirect treatment effect (ie, the effect on survival mediated by slower PSPRS progression) was introduced via covariate adjustment for the individual baseline score and annual slope of the simulated PSPRS data.
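The survival step can be sketched as a proportional-hazards simulation. The direct treatment effect enters as log(HR), and the indirect effect enters through the simulated PSPRS baseline and 1-year slope. The article used the R package simsurv with a three-knot spline baseline hazard fitted to the riluzole placebo arm. This sketch swaps in a Weibull baseline hazard so that death times can be drawn by inverting the cumulative hazard in closed form. The following choices are all hypothetical:
- the Weibull shape;
- the calibration to ~30% 2-year mortality at the reference covariate values;
- the centering constants;
- the covariate coefficients `beta_baseline` and `beta_slope`.

```python
import math
import random

def simulate_death_time(rng, lam, gamma, log_hr_treatment, treated,
                        baseline_psprs, slope_psprs, beta_baseline, beta_slope):
    """Draw a death time (years) from a Weibull proportional-hazards model by inversion.

    Cumulative hazard: H(t | x) = lam * t**gamma * exp(eta). The linear predictor eta holds
    the direct treatment effect plus PSPRS covariates centered at hypothetical reference values.
    """
    eta = (log_hr_treatment * treated
           + beta_baseline * (baseline_psprs - 40.0) / 10.0
           + beta_slope * (slope_psprs - 10.0) / 5.0)
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    return (-math.log(u) / (lam * math.exp(eta))) ** (1.0 / gamma)

def censor_at(death_time, follow_up=2.0):
    """Administrative censoring at the end of the 24-month treatment period."""
    return (death_time, True) if death_time <= follow_up else (follow_up, False)

def two_year_mortality(n, seed, **model):
    """Proportion of n simulated participants who die within 2 years."""
    rng = random.Random(seed)
    return sum(censor_at(simulate_death_time(rng, **model))[1] for _ in range(n)) / n

GAMMA = 1.5                                  # hypothetical Weibull shape (hazard rising with time)
LAM = -math.log(0.7) / 2.0 ** GAMMA          # gives S(2 years) = 0.7 when eta = 0
reference = dict(lam=LAM, gamma=GAMMA, baseline_psprs=40.0, slope_psprs=10.0,
                 beta_baseline=0.3, beta_slope=0.3)

placebo = two_year_mortality(20000, 1, log_hr_treatment=0.0, treated=0, **reference)
active = two_year_mortality(20000, 2, log_hr_treatment=math.log(0.5), treated=1, **reference)
print(round(placebo, 3), round(active, 3))  # roughly 0.30 and 0.16 under these assumptions
```

Because the PSPRS covariates sit in the same linear predictor as the treatment term, a therapy that slows PSPRS progression lowers mortality even when the direct HR is 1.0.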
Calculation of the CAFS
To calculate the CAFS, each participant (person A) was compared with every other participant in the trial (person B), and each pairwise comparison was scored as follows:
- If person A died and person B was alive, −1 was assigned to the former and 1 to the latter.
- If both participants died, the comparison was based on time to death: the longer survival time was assigned a score of 1 and the shorter −1; if both had the same survival time, a score of 0 was assigned to both.
- If both participants were alive at their last visit, the changes in PSPRS score between baseline and the last common visit were compared: the smaller increase was assigned a score of 1 and the larger −1; if both had the same change in PSPRS, a score of 0 was assigned to both.
For each participant, the N − 1 pairwise scores are summed, and the rank of this sum among all participants is that participant's CAFS.
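The pairwise rules above can be sketched as follows. Each participant is compared with every other participant, the N − 1 scores are summed, and the sums are ranked. Several details are assumptions made for this illustration:
- the `Participant` structure;
- PSPRS follow-up represented as a list of changes from baseline, starting with 0.0 at baseline;
- average ranks for tied sums.

The source does not specify how ties in the summed scores are ranked.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Participant:
    death_time: Optional[float]  # years from randomization; None if alive at last visit
    psprs_change: List[float] = field(default_factory=list)  # change from baseline per visit; [0] = baseline (0.0)

def pairwise_score(a, b):
    """+1 if a fared better than b, -1 if worse, 0 if tied."""
    a_died, b_died = a.death_time is not None, b.death_time is not None
    if a_died != b_died:
        return -1 if a_died else 1
    if a_died:  # both died: longer survival is better
        diff = a.death_time - b.death_time
    else:       # both alive: smaller PSPRS increase at the last common visit is better
        last = min(len(a.psprs_change), len(b.psprs_change)) - 1
        diff = b.psprs_change[last] - a.psprs_change[last]
    return (diff > 0) - (diff < 0)

def midranks(values):
    """1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def cafs(participants):
    """Sum each participant's N - 1 pairwise scores, then rank the sums (higher rank = better)."""
    sums = [sum(pairwise_score(p, q) for j, q in enumerate(participants) if j != i)
            for i, p in enumerate(participants)]
    return midranks(sums)

trial = [
    Participant(death_time=None, psprs_change=[0.0, 5.0, 9.0]),  # alive, slow progression
    Participant(death_time=None, psprs_change=[0.0, 8.0]),       # alive, follow-up ended after visit 1
    Participant(death_time=1.5, psprs_change=[0.0, 10.0]),       # died at 1.5 years
    Participant(death_time=0.8, psprs_change=[0.0]),             # died at 0.8 years
]
print(cafs(trial))  # [4.0, 3.0, 2.0, 1.0]
```

In this toy trial the first two participants are compared at visit 1, their last common visit. Every survivor outranks every participant who died.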
Sample Size
The simulation was based on participants with PSP-Richardson's syndrome being randomized to investigational therapies or placebo (1:1). A total of 384 participants were required to provide >80% power to detect significant treatment effects on either of the subcomponents of the CAFS. Based on the survival probabilities determined in the riluzole trial placebo arm, the simulation identified that 384 participants would provide >80% power to detect a minimum 50% reduction in relative risk of death (HR = 0.5) in the investigational therapies group compared with placebo, after 24 months of treatment. This sample size would account for an expected 20% random censoring by the end of the 2-year double-blind treatment period, as observed in the riluzole trial. Additionally, assuming a mean annual change from baseline in PSPRS score of 11.24 (standard deviation: 9.95),10 as in the davunetide and tideglusib trials,16, 24 384 participants would provide >90% power to detect ≥40% reduction (4.6-point difference) in annual disease progression, with 1% type I error (assuming up to 10% annual random dropout and ~15% annual mortality).
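The article derived power by simulation. Schoenfeld's approximation offers a rough, independent cross-check of the survival component, though it is not the authors' method. It gives the number of deaths needed for a two-sided Cox/log-rank test of a given HR with 1:1 allocation. The inputs below are assumptions:
- ~30% 2-year mortality on placebo, the rate expected from the riluzole trial as described above;
- proportional hazards with HR = 0.5;
- a crude 20% reduction in observed deaths to stand in for random censoring.

```python
import math
from statistics import NormalDist

def schoenfeld_events(hr, alpha=0.05, power=0.80, allocation=0.5):
    """Deaths required for a two-sided log-rank/Cox test of `hr` (Schoenfeld approximation)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2 / (allocation * (1 - allocation) * math.log(hr) ** 2)

def approx_power(events, hr, alpha=0.05, allocation=0.5):
    """Approximate power of the same test for a given number of observed deaths."""
    nd = NormalDist()
    drift = math.sqrt(events * allocation * (1 - allocation)) * abs(math.log(hr))
    return nd.cdf(drift - nd.inv_cdf(1 - alpha / 2))

n_per_arm = 192
placebo_deaths = n_per_arm * 0.30                         # assumed 2-year mortality on placebo
active_deaths = n_per_arm * (1 - 0.70 ** 0.5)             # S_active(t) = S_placebo(t) ** HR under PH
observed_deaths = 0.8 * (placebo_deaths + active_deaths)  # crude allowance for 20% censoring

print(round(schoenfeld_events(0.5), 1))                   # ~65 deaths needed for 80% power
print(round(observed_deaths), round(approx_power(observed_deaths, 0.5), 2))
```

Under these simplifications, the expected number of deaths exceeds the ~65 required, which is in line with the >80% power reported for HR = 0.5. The check is only as good as the assumed mortality and censoring.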
The difference between treatment groups in the CAFS was tested using analysis of covariance (ANCOVA), adjusted for baseline PSPRS score, with treatment group as a fixed effect. Statistical significance of the treatment effect was evaluated using the P value corresponding to the Wald-test statistic of the fixed effect of treatment.
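The ANCOVA described above can be sketched without external libraries. It fits ordinary least squares of the outcome on an intercept, a 0/1 treatment indicator, and baseline PSPRS, followed by a Wald test of the treatment coefficient. The sketch makes two simplifying assumptions. First, the P value uses a normal approximation to the Wald statistic, which is reasonable at N = 384, where the t distribution is close to normal. Second, the toy outcome is a continuous stand-in rather than actual CAFS ranks.

```python
import math
import random
from statistics import NormalDist

def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ancova_treatment_test(outcome, treated, baseline):
    """OLS of outcome on intercept, treatment (0/1) and baseline; Wald test of treatment."""
    X = [[1.0, float(t), float(b)] for t, b in zip(treated, baseline)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, outcome)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(X, outcome)]
    sigma2 = sum(e * e for e in resid) / (len(outcome) - k)  # residual variance
    se = math.sqrt(sigma2 * xtx_inv[1][1])                    # standard error of treatment effect
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(beta[1] / se)))
    return beta[1], se, p_value

rng = random.Random(7)
n = 384
treated = [i % 2 for i in range(n)]                 # 1:1 allocation
baseline = [rng.uniform(20.0, 60.0) for _ in range(n)]
outcome = [2.0 + 3.0 * t + 0.5 * b + rng.gauss(0.0, 1.0) for t, b in zip(treated, baseline)]
effect, se, p = ancova_treatment_test(outcome, treated, baseline)
print(round(effect, 2), round(se, 3), p)
```

In a full simulation, `outcome` would be the rank scores returned by a CAFS calculation for the 384 virtual participants.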
Clinical Trial Simulations and Analysis
Effect sizes for the PSPRS and survival were simulated; Figure 2 shows the sequence followed in the simulation study. For both variables, a null scenario (no difference between placebo and the investigational therapy) was generated. Two levels of effectiveness were also generated: one corresponding to the minimum detectable differences in the endpoints (survival HR = 0.5 and 40% reduction in PSPRS progression), and an intermediate effect (survival HR = 0.8 and 20% reduction in PSPRS progression). The intermediate effect was included to assess the power of the JRT when moderate effects on both endpoints are combined. A further effect size simulated a slightly deleterious effect of treatment on survival (HR = 1.1), to evaluate the corresponding loss of power when using the CAFS in this scenario.
A total of 5000 participants per scenario provided stable results, and each step was run in a loop 5000 times. At each of the 5000 replications of the 24-month trial, a new CT sample was drawn with replacement (bootstrap sample) from the IRT model-generated PSPRS “population data” for each effect size. Each sample comprised 192 placebo participants and 192 participants treated with investigational therapies. Survival data were then generated based on the randomly drawn sample of PSPRS data. An interim dataset was created after all participants completed the 12-month treatment period to evaluate performance of the preplanned interim analysis. Each virtual trial was analyzed using both interim and final simulated data, according to the stopping rules and alpha-spending strategy (Fig. 3).
To compare operating characteristics of the CAFS with the univariate analyses of its subcomponents, we used a linear mixed-effects model to analyze treatment effects on the PSPRS total score, and a Cox proportional hazards model was used to analyze treatment effects on overall survival.
Results
The Item Response Theory Model of Longitudinal PSPRS Data
The davunetide CT did not show statistically significant differences between treatment arms for any clinical endpoints; therefore, IRT analysis was performed in the complete sample without compromising the validity of the results. The predicted versus observed PSPRS total scores comparing PSPRS disease progression with IRT modeling are presented in Supporting Information Figure S1. The IRT model and the derived final parameters were used to simulate the PSPRS total score for 384 virtual participants.
Clinical Trial Simulation
The proposed trial design (Supporting Information Fig. S2) included an interim analysis after the last participant reached 12 months post-randomization, to allow early stopping for efficacy. A futility analysis was not planned because, by the time of the interim analysis, all participants would already be enrolled and would have been treated for a long time. Continuation to the final analysis would be of interest to provide definitive evidence of efficacy and comprehensive safety data. The expected timeline rests on the following assumptions:
- 384 participants;
- a recruitment rate of 0.25 participant per site per month (Supporting Information Methods);
- 180 sites open by month 10 (~90 sites by month 6);
- 10% random annual dropout.

Under these assumptions, a total duration of ~38 months is expected. The interim analysis could therefore be projected at month 26 (12 months after the end of ~14 months of recruitment), 1 year earlier than the completion date. Under this strategy, success could be declared at the interim analysis when the P value of the JRT (ANCOVA) was <0.025. If success was declared at the interim, the trial was not analyzed at month 24. Final analyses were completed only for trials in which the JRT P value for the treatment effect was ≥0.025 at the interim.
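The interim/final decision rule can be sketched as a small function applied to each simulated trial's pair of JRT P values. The same sketch shows how power is summarized in Table 1 (footnote c). The 0.025 threshold at each analysis comes from the text. The function names and the toy P values are illustrative.

```python
ALPHA = 0.025  # success threshold at both the interim and the final analysis

def trial_outcome(p_interim, p_final):
    """Classify one simulated trial: stop at interim if significant, else analyze at final."""
    if p_interim < ALPHA:
        return "interim"          # trial stops; the final P value is never used
    if p_final < ALPHA:
        return "final"
    return "not significant"

def operating_characteristics(p_value_pairs):
    """Percent successful at interim, percent successful at final, and overall power."""
    outcomes = [trial_outcome(p_i, p_f) for p_i, p_f in p_value_pairs]
    n = len(outcomes)
    at_interim = 100.0 * outcomes.count("interim") / n
    at_final = 100.0 * outcomes.count("final") / n
    return at_interim, at_final, (at_interim + at_final) / 100.0

# Toy (interim, final) JRT P values for four simulated trials
pairs = [(0.010, 0.500), (0.200, 0.010), (0.300, 0.400), (0.020, 0.900)]
print(operating_characteristics(pairs))  # (50.0, 25.0, 0.75)
```

Suppose this is applied to the 5000 simulated trials of one scenario. The first two values then correspond to the "Success at Interim" and "Success at Final" columns of Table 1, and the third corresponds to the power column.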
Across conditions, the operating characteristics of the JRT in the final trial design indicate that type I error was well controlled by the Bonferroni correction, which kept the familywise type I error rate <5% (Table 1). The trial design has high power at the interim analysis whenever a 40% effect on the PSPRS is present, irrespective of survival effects. Improved survival in the active arm increased power substantially (>90%) and was particularly valuable when the effect of treatment on the PSPRS was moderate (20%). Reduced survival in the active arm, even when mild, reduced power. This reflects a protective property of the JRT: deleterious effects on survival count against the treatment, providing a comprehensive evaluation of the treatment effect.
HRa | PSPRS ES (%) | Success at Interim (%) | Success at Final (%)b | Powerc |
---|---|---|---|---|
0.5 | Null | 30.1 | 23.24 | 0.53 |
0.5 | 20 | 89.8 | 6.56 | 0.96 |
0.5 | 40 | 99.84 | 0.16 | 1.00 |
0.8 | Null | 8.48 | 3.36 | 0.12 |
0.8 | 20 | 63.18 | 6.56 | 0.69 |
0.8 | 40 | 96.6 | 1.22 | 0.97 |
1.0 | Null | 3.4 | 1.02 | 0.04d |
1.0 | 20 | 40.56 | 2.78 | 0.43 |
1.0 | 40 | 89.02 | 1.24 | 0.90 |
1.1 | Null | 2.64 | 1.62 | 0.04 |
1.1 | 20 | 30.06 | 1.26 | 0.31 |
1.1 | 40 | 82.64 | 1.08 | 0.84 |
- Success is determined by P < 0.025.
- a True direct effect simulated on survival.
- b Success at final analysis shows the percentage of trials with a negative result at interim analysis and positive result at final analysis.
- c Power reflects the proportion of trials that are either successful at the interim or at final analysis, simulating a real trial; thus, trials successful at the interim are not reanalyzed at the final analysis.
- d Two-sided type I error.
- Abbreviations: ANCOVA, analysis of covariance; PSPRS, Progressive Supranuclear Palsy Rating Scale; HR, hazard ratio; ES, effect size.
The current simulation design showed that a small number of trials would be expected to be negative at the interim analysis but positive at the final analysis. This number, however, was much larger when PSPRS effects were null and effects on survival were large (HR = 0.5). This pattern of results was partly due to the properties of the JRT but largely due to the assumptions of the simulation design. The final analysis is expected to provide additional value when these assumptions fail, eg, with delayed treatment effects, a slower rate of death, faster recruitment, and/or larger variability at the interim analysis.
Table 2 shows the operating characteristics of the univariate analyses of the PSPRS total score as a function of the outcome of the primary JRT analysis. For each clinical scenario, defined by the survival HR and the PSPRS effect size, the probability of each potential JRT outcome was determined; the outcome with the highest probability is the most likely for that scenario. Although the rank score cannot be interpreted by itself, descriptive statistics of the CAFS for the 10-item and total PSPRS scores across simulation conditions are available in Supporting Information Tables S2 and S3.
HRa | PSPRS ES (%) | Result of JRT | Interim %-sig PSPRS 12 M Effect | Interim %-sig PSPRS 24 M Effect | Final %-sig PSPRS 12 M Effect | Final %-sig PSPRS 24 M Effect | Probability of JRT Result (%) |
---|---|---|---|---|---|---|---|
0.5 | 20 | NSb | 0.3 | 0.5 | 0.3 | 0.6 | 3.6 |
0.5 | 20 | Significant interim | 58.0 | 19.3 | 59.8 | 67.4 | 89.8 |
0.5 | 20 | Significant finalc | 0.8 | 0.4 | 0.9 | 2.0 | 6.6 |
0.5 | 40 | NSb | 0 | ||||
0.5 | 40 | Significant interim | 97.9 | 45.8 | 98.5 | 99.6 | 99.8 |
0.5 | 40 | Significant final | 0.1 | 0.0 | 0.1 | 0.1 | 0.2 |
0.8 | 20 | NS | 8.7 | 3.6 | 9.6 | 11.6 | 30.3 |
0.8 | 20 | Significant interim | 47.6 | 14.3 | 49.4 | 50.9 | 63.2 |
0.8 | 20 | Significant final | 2.8 | 0.8 | 3.1 | 4.3 | 6.6 |
0.8 | 40 | NS | 1.8 | 0.4 | 1.9 | 2.0 | 2.2 |
0.8 | 40 | Significant interim | 94.7 | 41.6 | 95.8 | 96.3 | 96.6 |
0.8 | 40 | Significant final | 1.0 | 0.1 | 1.0 | 1.2 | 1.2 |
1.0 | 20 | NS | 24.1 | 7.3 | 26.2 | 28.1 | 56.7 |
1.0 | 20 | Significant interim | 33.8 | 9.7 | 35.0 | 35.1 | 40.6 |
1.0 | 20 | Significant final | 1.3 | 0.3 | 1.5 | 2.1 | 2.8 |
1.0 | 40 | NS | 8.6 | 2.1 | 9.0 | 9.4 | 9.7 |
1.0 | 40 | Significant interim | 86.9 | 37.0 | 88.6 | 88.8 | 89.0 |
1.0 | 40 | Significant final | 1.1 | 0.3 | 1.1 | 1.2 | 1.2 |
- Significance of the PSPRS was evaluated at P < 0.05 because no multiple testing correction will be applied to this supplementary analysis. All calculations assume that once the interim analysis is significant, the trial stops (and therefore there is no final analysis). A small proportion of models (0% to 1.6%) failed to converge at the interim analysis because of missing follow-up data.
- a True direct effect simulated on survival.
- b Not significant either at the interim or the final analyses.
- c Significant JRT at final analysis when the interim JRT was not significant.
- Abbreviations: PSPRS, Progressive Supranuclear Palsy Rating Scale; HR, hazard ratio; ES, effect size; JRT, Joint Rank Test; sig, significant; M, month; %-sig PSPRS 12 M effect, percentage of significance for the Progressive Supranuclear Palsy Rating Scale at 12 months; %-sig PSPRS 24 M effect, percentage of significance for the Progressive Supranuclear Palsy Rating Scale at 24 months; NS, not significant.
Overall, the power of the JRT was largely driven by effects on the PSPRS; thus, a significant JRT was commonly accompanied by a significant effect in the univariate PSPRS analysis. When treatment effects on the PSPRS are moderate (ie, 20% reduction in annual progression), the linear mixed-effects model is always underpowered (Table 2). In contrast, the JRT allows detection of moderate treatment effects on the PSPRS if they are accompanied by moderate (HR = 0.8) or strong (HR = 0.5) treatment effects on survival, with powers of 0.69 and 0.96, respectively (Table 1). Greater discrepancies between the results of the JRT and of PSPRS change alone appeared when a significant JRT was driven by treatment effects on survival, accompanied by a moderate treatment effect (20%) on the PSPRS. In the absence of a treatment effect on the PSPRS, when direct effects on survival were strong (HR = 0.5), many trials (30% to 40%) resulted in a nonsignificant JRT (Table 3). However, the Cox proportional hazards analysis of survival would be statistically significant at the end of month 24 in >80% of trials. This simulated scenario is artificial and unlikely. Even so, these findings highlight that the operating characteristics of the JRT are largely driven by effects on function (PSPRS): survival is taken into account but does not drive the trial outcome.
HRa | PSPRS ES (%) | Result of JRT (Total) | % Survival Significant (P < 0.05) Interim | % Survival Significant (P < 0.05) Final | Probability of JRT Outcome (%) | Power Survival Interim | Power Survival Final |
---|---|---|---|---|---|---|---|
0.5 | Null | NS | 27.7 | 38.9 | 47 | 0.59 | 0.83 |
0.5 | Null | Significant interim | 29 | 29.8 | 30 | 0.96 | 0.99 |
0.5 | Null | Significant final | 20.6 | 23.2 | 23 | 0.88 | 1 |
0.5 | 20 | NSb | 1.5 | 2.5 | 4 | 0.4 | 0.7 |
0.5 | 20 | Significant interim | 83.1 | 88.5 | 90 | 0.92 | 0.99 |
0.5 | 20 | Significant finalc | 4.4 | 6.4 | 7 | 0.66 | 0.98 |
0.5 | 40 | Significant interim | 94.9 | 99.2 | 100 | 0.95 | 0.99 |
0.5 | 40 | Significant final | 0.1 | 0.1 | 0 | 0.38 | 0.88 |
0.8 | Null | NS | 9.3 | 13.1 | 88 | 0.11 | 0.15 |
0.8 | Null | Significant interim | 5.3 | 5.8 | 8 | 0.62 | 0.69 |
0.8 | Null | Significant final | 1.5 | 2.8 | 3 | 0.44 | 0.85 |
0.8 | 20 | NS | 1.7 | 3.2 | 30 | 0.06 | 0.11 |
0.8 | 20 | Significant interim | 29.1 | 38.1 | 63 | 0.46 | 0.6 |
0.8 | 20 | Significant final | 1.3 | 4 | 7 | 0.2 | 0.61 |
0.8 | 40 | NS | 0 | 0.1 | 2 | 0 | 0.03 |
0.8 | 40 | Significant interim | 47.6 | 65.2 | 97 | 0.49 | 0.67 |
0.8 | 40 | Significant final | 0.1 | 0.3 | 1 | 0.07 | 0.26 |
- All calculations assume that once the interim is significant, the trial stops, and therefore no final analysis is performed.
- a True direct effect simulated on survival.
- b Not significant either at the interim or the final analyses.
- c Significant at final analysis when the interim was not significant.
- Abbreviations: HR, hazard ratio; PSPRS, Progressive Supranuclear Palsy Rating Scale; ES, effect size; JRT, Joint Rank Test; NS, not significant.
Discussion
Clinical scores are key to assessing disease progression in neurodegenerative conditions, because they represent outcomes that cannot be assessed directly. Based on the simulated results, a phase 3 trial design has been recommended, with 384 participants with PSP-Richardson's syndrome, randomized 1:1, over a treatment duration of 12 to 24 months. A trial design was proposed based on the CAFS and included an interim analysis after all participants completed 12 months of treatment. The sample size determined would provide >80% power to detect significant treatment effects on either of the two subcomponents of the CAFS (change from baseline in PSPRS and time to death).
In this new phase 3 trial design concept, two major design changes from previous studies in PSP11-13 were suggested to optimize decision making. First, the treatment period could be increased from 12 to 24 months. This would allow detection of potentially delayed treatment effects and provide additional time for biological effects to translate into clinical benefit, boosting power when combined with treatment effects on the PSPRS. Second, ~20% mortality would be expected over 24 months, resulting in nonrandom missing data and consequent bias. We therefore proposed switching the primary endpoint from the PSPRS to a combined assessment of function and survival using the CAFS. The CAFS treats death as an important outcome. When the PSPRS is used alone, death is instead treated as a random event and a loss of information. This approach would also allow detection of moderate treatment effects and is currently the FDA-recommended approach in amyotrophic lateral sclerosis, including for ongoing phase 3 studies.23, 30
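The final analysis at month 24 would be expected to provide additional value in the following situations:
- The effect on the PSPRS is moderate and an effect on survival is present.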
- Treatment effects are delayed (eg, differences in PSPRS progression are observed only from month 6 onward).
- Participant survival is higher than predicted.
- Significant differences in PSPRS are required/desirable at 24 months, regardless of the outcomes at 12 months.
- Data collected at interim analysis have higher variability than predicted.
- Recruitment occurs at an accelerated rate, in which case direct treatment effects on survival would be key to a significant result.
IRT modeling was useful in simulating a CT without any underlying hypothesis about the shape of the disease progression profile versus time. In addition, IRT models can be used as psychometric tools to describe performance of the PSPRS in assessing disease progression. Based on IRT modeling results, different items in the PSPRS can be ranked indicating which items are most sensitive when describing disease severity.
The design was conceived as a single pivotal study, to provide definitive evidence of efficacy and a comprehensive safety package, in the absence of any prior proof of concept/efficacy study, and assuming no parallel phase 3 study would be feasible due to the rarity of PSP. The study would provide strong evidence to support a regulatory submission in the context of high unmet need in a rare disease. Our proposal is an innovative and scientifically robust approach to provide PSP patients accelerated access to a much-needed treatment.
In general, IRT modeling results confirmed expected limitations of the PSPRS scoring approach for measuring disease progression. Intermediate scores were often problematic, including for many of the most sensitive items, suggesting that investigators tended to treat these items as dichotomous. This reinforces the need to revise the answer categories, or at least the scoring, of those items. Revising the PSPRS was not an objective of this study. However, the JRT approach presented here was also tested on a smaller scale comprising the 10 items considered most clinically relevant by the FDA.31 The IRT model confirmed that the selected items were among the most informative. Nevertheless, the 10-item scale had a negligible impact on operating characteristics compared with the 28-item scale (Supporting Information Table S4).
The expected mortality rate was derived from the Natural History and riluzole trial,25 which was conducted ~20 years ago. Since then, there have been advances in medical treatment and earlier recognition of possible PSP.32 Participants would therefore be referred to a trial earlier in the disease course, when mortality is lower, resulting in longer post-enrollment survival than the model anticipates.
In addition, the mortality rate could be distorted if another pandemic heavily affected elderly or disabled people. Such an event would affect only part of each patient's clinical course, injecting statistical noise and reducing the statistical power of the CAFS approach.
Switching the primary endpoint from change in the PSPRS to the CAFS mitigates nonrandom “missingness” due to participant mortality and the consequent bias. The CAFS could provide a comprehensive and robust estimate of the clinical benefit of potential future therapies for people living with PSP and increase the probability of clinical success (if the drug is effective). These results provide guidance for the PSP community on the design and interpretation of future CTs.
Acknowledgments
Medical writing support for the development of this manuscript, under the direction of the authors, was provided by Dolapo Odujinrin, PhD Researcher, and Sarah Hibbert, PhD, of Ashfield MedComms, an Inizio company. Medical writing support was funded by UCB in accordance with Good Publication Practice (GPP 2022) guidelines (http://www.ismpp.org/gpp-2022).
Author Roles
(1) Study: (A) Conception, (B) Organization, (C) Execution;
(2) Statistical Analysis: (A) Design, (B) Execution, (C) Review and Critique;
(3) Manuscript Preparation: (A) Writing First Draft; (B) Review and Critique.
M.G.: 3A, 3B
I.R.M.: 1A, 1B, 1C, 2A, 2B, 2C, 3A, 3B
T.J.B.: 3A, 3B
S.D.B.: 3A, 3B
T.G.: 3A, 3B
H.L.G.V.T.: 3A, 3B
C.E.: 3A, 3B
L.I.G.: 1A, 2C, 3A, 3B
A.B.: 3A, 3B
G.H.: 1A, 2C, 3A, 3B
Open Research
Data Availability Statement
Data from nonclinical studies are outside UCB's data sharing policy and are unavailable for sharing.