Novartis Pharmaceuticals Corporation
When data are collected adaptively, as in bandit algorithms, classical statistical approaches such as ordinary least squares and MM-estimation often fail to achieve asymptotic normality. Although recent work has modified these classical approaches to ensure valid inference on adaptively collected data, most of it assumes that the model is correctly specified. We propose a method that provides valid inference for M-estimators that use adaptively collected bandit data with a (possibly) misspecified working model. A key ingredient in our approach is the use of flexible machine learning methods to stabilize the variance induced by adaptive data collection. A major novelty is that our procedure enables the construction of valid confidence sets even in settings where treatment policies are unstable and non-converging, such as when there is no unique optimal arm and standard bandit algorithms are used. Empirical results on semi-synthetic datasets constructed from the Osteoarthritis Initiative demonstrate that the method maintains type I error control, while existing methods for inference in adaptive settings fail to attain nominal coverage in the misspecified case.
Recurrent events are common and important clinical trial endpoints in many disease areas, e.g., cardiovascular hospitalizations in heart failure, relapses in multiple sclerosis, or exacerbations in asthma. During a trial, patients may experience intercurrent events, that is, events after treatment assignment which affect the interpretation or existence of the outcome of interest. In many settings, a treatment effect in the scenario in which the intercurrent event would not occur is of clinical interest. A proper estimation method of such a hypothetical treatment effect has to account for all confounders of the recurrent event process and the intercurrent event. In this paper, we propose estimators targeting hypothetical estimands in recurrent events with proper adjustments of baseline and internal time-varying covariates. Specifically, we apply inverse probability weighting (IPW) to the commonly used Lin-Wei-Yang-Ying (LWYY) and negative binomial (NB) models in recurrent event analysis. Simulation studies demonstrate that our approach outperforms alternative analytical methods in terms of bias and power.
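As a rough illustration of the weighting step, the sketch below combines a logistic model for remaining free of the intercurrent event with a weighted negative binomial rate model in statsmodels. The column names, the single binary intercurrent-event indicator, and the omission of robust (sandwich) variance estimation are simplifying assumptions for this sketch, not the authors' implementation; the LWYY analysis would additionally require a robust-variance proportional rates model.

```python
# Illustrative IPW + negative binomial analysis for a hypothetical estimand in
# recurrent events. Column names and model forms are assumptions for this sketch.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

def ipw_negative_binomial(df):
    # 1) Probability of remaining free of the intercurrent event given treatment
    #    and confounders (baseline and summarized time-varying covariates).
    X_ice = df[["treatment", "baseline_risk", "tv_covariate"]]
    ice_fit = LogisticRegression(max_iter=1000).fit(X_ice, df["no_intercurrent_event"])
    p_stay = ice_fit.predict_proba(X_ice)[:, 1]

    # 2) Keep patients without the intercurrent event, weighted by 1 / P(no event).
    kept = df["no_intercurrent_event"].to_numpy() == 1
    weights = 1.0 / p_stay[kept]

    # 3) Weighted negative binomial rate model with log follow-up time as offset.
    #    (Robust variance estimation, needed for proper inference, is omitted here.)
    X = sm.add_constant(df.loc[kept, ["treatment", "baseline_risk"]])
    return sm.GLM(
        df.loc[kept, "event_count"],
        X,
        family=sm.families.NegativeBinomial(alpha=1.0),
        offset=np.log(df.loc[kept, "followup_years"]),
        freq_weights=weights,
    ).fit()
```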
This paper reviews and compares methods to assess treatment effect heterogeneity in the context of parametric regression models. These methods include the standard likelihood ratio tests, bootstrap likelihood ratio tests, and Goeman's global test motivated by testing whether the random effect variance is zero. We place particular emphasis on tests based on the score-residual of the treatment effect and explore different variants of tests in this class. All approaches are compared in a simulation study, and the approach based on residual scores is illustrated in a study comparing multiple doses versus placebo. Our findings demonstrate that score-residual based methods provide practical, flexible and reliable tools for identifying treatment effect heterogeneity and treatment effect modifiers, and can provide useful guidance for decision making around treatment effect heterogeneity.
This paper proposes a Workflow for Assessing Treatment effeCt Heterogeneity (WATCH) in clinical drug development targeted at clinical trial sponsors. WATCH is designed to address the challenges of investigating treatment effect heterogeneity (TEH) in randomized clinical trials, where sample size and multiplicity limit the reliability of findings. The proposed workflow includes four steps: Analysis Planning, Initial Data Analysis and Analysis Dataset Creation, TEH Exploration, and Multidisciplinary Assessment. The workflow offers a general overview of how treatment effects vary by baseline covariates in the observed data, and guides interpretation of the observed findings based on external evidence and best scientific understanding. The workflow is exploratory and not inferential/confirmatory in nature, but should be pre-planned before database lock and the start of analysis. It is focused on providing a general overview rather than a single specific finding or subgroup with differential effect.
In clinical trials, patients may discontinue treatments prematurely, breaking the initial randomization and, thus, challenging inference. Stakeholders in drug development are generally interested in going beyond the Intention-To-Treat (ITT) analysis, which provides valid causal estimates of the effect of treatment assignment but does not inform on the effect of the actual treatment receipt. Our study is motivated by a randomized controlled trial (RCT) in oncology, where patients assigned to the investigational treatment may discontinue it due to adverse events. We propose adopting a principal stratum strategy and decomposing the overall ITT effect into principal causal effects for groups of patients defined by their potential discontinuation behavior. We first show how to implement a principal stratum strategy to assess causal effects on a survival outcome in the presence of continuous-time treatment discontinuation, discuss its advantages, and outline the conclusions one can draw. Our strategy deals with the time-to-event intermediate variable that may not be defined for patients who would not discontinue; moreover, discontinuation time and the primary endpoint are subject to censoring. We employ a flexible model-based Bayesian approach to tackle these complexities, providing easily interpretable results. We apply this Bayesian principal stratification framework to analyze synthetic data of the motivating oncology trial. We simulate data under different assumptions that reflect real scenarios where patients' behavior depends on critical baseline covariates. Supported by a simulation study, we shed light on the role of covariates in this framework: beyond making structural and parametric assumptions more credible, they lead to more precise inference and can be used to characterize patients' discontinuation behavior, which could help inform clinical practice and future protocols.
Tables, figures, and listings (TFLs) are essential tools for summarizing clinical trial data. Creation of TFLs for reporting activities is often a time-consuming task encountered routinely during the execution of clinical trials. This study explored the use of large language models (LLMs) to automate the generation of TFLs through prompt engineering and few-shot transfer learning. Using public clinical trial data in ADaM format, our results demonstrated that LLMs can efficiently generate TFLs with prompt instructions, showcasing their potential in this domain. Furthermore, we developed a conversational agent named the Clinical Trial TFL Generation Agent: an app that matches user queries to predefined prompts, which in turn produce customized programs to generate specific predefined TFLs.
In the era of big data, standard analysis tools may be inadequate for making inference and there is a growing need for more efficient and innovative ways to collect, process, analyze and interpret the massive and complex data. We provide an overview of challenges in big data problems and describe how innovative analytical methods, machine learning tools and metaheuristics can tackle general healthcare problems with a focus on the current pandemic. In particular, we give applications of modern digital technology, statistical methods, data platforms and data integration systems to improve diagnosis and treatment of diseases in clinical research and novel epidemiologic tools to tackle infection source problems, such as finding Patient Zero in the spread of epidemics. We make the case that analyzing and interpreting big data is a very challenging task that requires a multi-disciplinary effort to continuously create more effective methodologies and powerful tools to transfer data information into knowledge that enables informed decision making.
TorchSurv is a Python package that serves as a companion tool to perform deep survival modeling within the PyTorch environment. Unlike existing libraries that impose specific parametric forms, TorchSurv enables the use of custom PyTorch-based deep survival models. With its lightweight design, minimal input requirements, full PyTorch backend, and freedom from restrictive survival model parameterizations, TorchSurv facilitates efficient deep survival model implementation and is particularly beneficial for high-dimensional and complex input data scenarios.
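To give a flavor of the workflow such a package targets, here is a generic, minimal PyTorch sketch: a small network outputs a log relative hazard and is trained with a hand-coded Cox negative partial log-likelihood. This illustrates the modeling style rather than TorchSurv's own API, and the loss below does not specially handle tied event times.

```python
# Generic deep survival sketch in PyTorch (not TorchSurv's API): a network maps
# covariates to a log relative hazard, trained with a Cox negative partial
# log-likelihood coded by hand (ties not specially handled).
import torch
import torch.nn as nn

class DeepCox(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)  # log relative hazard per subject

def neg_partial_log_likelihood(log_hz, time, event):
    # Sort by descending time so each risk set is a cumulative log-sum-exp.
    order = torch.argsort(time, descending=True)
    log_hz, event = log_hz[order], event[order]
    log_risk = torch.logcumsumexp(log_hz, dim=0)
    return -torch.sum((log_hz - log_risk) * event) / event.sum()

# Minimal training loop on random data, for illustration only.
torch.manual_seed(0)
x = torch.randn(200, 5)
time = torch.rand(200)
event = (torch.rand(200) < 0.7).float()
model = DeepCox(5)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = neg_partial_log_likelihood(model(x), time, event)
    loss.backward()
    opt.step()
```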
There has been a growing trend for activities relating to clinical trials to take place at locations other than traditional trial sites (hence decentralized clinical trials, or DCTs), some of them in settings of real-world clinical practice. Although DCTs offer numerous benefits, they also have implications for a number of issues relating to their design, conduct, and analysis. The Real-World Evidence Scientific Working Group of the American Statistical Association Biopharmaceutical Section has been reviewing the field of DCTs and provides in this paper considerations for decentralized trials from a statistical perspective. The paper first discusses selected critical decentralized elements that may have statistical implications for the trial and then summarizes regulatory guidance, frameworks, and initiatives on DCTs. Further discussion focuses on the design (including construction of the estimand), implementation, statistical analysis plan (including missing data handling), and reporting of safety events. Some additional considerations (e.g., ethical considerations, technology infrastructure, study oversight, data security and privacy, and regulatory compliance) are also briefly discussed. This paper is intended to provide statistical considerations for decentralized trials of medical products to support regulatory decision-making.
Dose selection is critical in pharmaceutical drug development, as it directly impacts therapeutic efficacy and patient safety of a drug. The Generalized Multiple Comparison Procedures and Modeling (MCP-Mod) approach is commonly used in Phase II trials for testing and estimation of dose-response relationships. However, its effectiveness in small sample sizes, particularly with binary endpoints, is hindered by issues like complete separation in logistic regression, leading to non-existence of estimates. Motivated by an actual clinical trial using the MCP-Mod approach, this paper introduces penalized maximum likelihood estimation (MLE) and randomization-based inference techniques to address these challenges. Randomization-based inference allows for exact finite sample inference, while population-based inference for MCP-Mod typically relies on asymptotic approximations. Simulation studies demonstrate that randomization-based tests can enhance statistical power in small to medium-sized samples while maintaining control over type-I error rates, even in the presence of time trends. Our results show that residual-based randomization tests using penalized MLEs not only improve computational efficiency but also outperform standard randomization-based methods, making them an adequate choice for dose-finding analyses within the MCP-Mod framework. Additionally, we apply these methods to pharmacometric settings, demonstrating their effectiveness in such scenarios. The results in this paper underscore the potential of randomization-based inference for the analysis of dose-finding trials, particularly in small sample contexts.
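As a rough sketch of how such a randomization test can be organized, the function below computes a residual-based trend statistic and its permutation reference distribution. A ridge-penalized logistic fit stands in for the penalized MLE, a single linear contrast stands in for the full set of MCP-Mod candidate contrasts, and simple permutation stands in for re-randomization under the trial's actual scheme; all three are simplifications of the paper's approach.

```python
# Sketch of a residual-based randomization test for a dose-response trend with
# a binary endpoint. Ridge-penalized logistic regression (guarding against
# complete separation) stands in for the penalized MLE used in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def randomization_trend_test(dose, y, x_base, n_perm=10000, seed=1):
    rng = np.random.default_rng(seed)
    # Null model: response depends on baseline covariates only (no dose effect).
    null_fit = LogisticRegression(C=1.0).fit(x_base, y)
    resid = y - null_fit.predict_proba(x_base)[:, 1]
    # One candidate contrast (linear in dose); MCP-Mod would take the maximum
    # over several candidate contrasts (Emax, logistic, ...).
    obs = abs(np.sum((dose - dose.mean()) * resid))
    perm = np.array([
        abs(np.sum((rng.permutation(dose) - dose.mean()) * resid))
        for _ in range(n_perm)
    ])
    return (1 + np.sum(perm >= obs)) / (1 + n_perm)
```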
Treatment of cancer has evolved rapidly and in quite dramatic ways, for example from chemotherapies and targeted therapies to immunotherapies and chimeric antigen receptor T-cell therapies. Nonetheless, the basic design of early phase I trials in oncology still predominantly follows a dose-escalation design. These trials monitor safety over the first treatment cycle in order to escalate the dose of the investigated drug. Over time, however, studying additional factors such as drug combinations and/or variation in the timing of dosing has become important as well. Existing designs were continuously enhanced and expanded to account for increased trial complexity. With toxicities occurring at later stages beyond the first cycle and the need to treat patients over multiple cycles, a focus on the first treatment cycle only is becoming a limitation for today's multi-cycle treatment therapies. Here we introduce a multi-cycle time-to-event model (TITE-CLRM: Time-Interval-To-Event Complementary-Loglog Regression Model) to guide dose-escalation trials studying multi-cycle therapies. The challenge lies in balancing the need to monitor the safety of longer treatment periods with the need to continuously enroll patients safely. The proposed multi-cycle time-to-event model is formulated as an extension of established concepts such as the escalation with overdose control principle. The model is motivated by a current drug development project and evaluated in a simulation study.
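A toy illustration of the likelihood building block is given below: with the data in long format (one row per patient-cycle), the per-cycle dose-limiting toxicity probability can be modeled on the complementary log-log scale. The numbers are invented, and the actual TITE-CLRM additionally includes cycle effects, a Bayesian prior, and overdose-control escalation rules.

```python
# Toy complementary log-log regression on patient-cycle data; values are invented.
import numpy as np
import pandas as pd
import statsmodels.api as sm

long = pd.DataFrame({
    "patient": [1, 1, 1, 2, 2, 3, 4, 4, 4],
    "cycle":   [1, 2, 3, 1, 2, 1, 1, 2, 3],
    "dose":    [10, 10, 10, 20, 20, 40, 40, 40, 40],   # mg
    "dlt":     [0, 0, 0, 0, 1, 0, 0, 0, 1],            # DLT in that cycle
})
X = sm.add_constant(np.log(long[["dose"]]))
fit = sm.GLM(long["dlt"], X,
             family=sm.families.Binomial(link=sm.families.links.CLogLog())).fit()
print(fit.params)  # intercept and log-dose slope on the cloglog scale
```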
Assessing treatment effect heterogeneity (TEH) in clinical trials is crucial, as it provides insights into the variability of treatment responses among patients, influencing important decisions related to drug development. Furthermore, it can lead to personalized medicine by tailoring treatments to individual patient characteristics. This paper introduces novel methodologies for assessing treatment effects using the individual treatment effect as a basis. To estimate this effect, we use a Double Robust (DR) learner to infer a pseudo-outcome that reflects the causal contrast. This pseudo-outcome is then used to perform three objectives: (1) a global test for heterogeneity, (2) ranking covariates based on their influence on effect modification, and (3) providing estimates of the individualized treatment effect. We compare our DR-learner with various alternatives and competing methods in a simulation study, and also use it to assess heterogeneity in a pooled analysis of five Phase III trials in psoriatic arthritis. By integrating these methods with the recently proposed WATCH workflow (Workflow to Assess Treatment Effect Heterogeneity in Drug Development for Clinical Trial Sponsors), we provide a robust framework for analyzing TEH, offering insights that enable more informed decision-making in this challenging area.
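For orientation, a compact (non-cross-fitted) version of the pseudo-outcome construction is sketched below with generic scikit-learn learners; the paper's DR-learner would use cross-fitting, and the learner choices and array-based data layout here are assumptions for the sketch.

```python
# Sketch of a DR-learner pseudo-outcome for the individual treatment effect.
# X: covariate array, a: binary treatment array, y: outcome array.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

def dr_pseudo_outcome(X, a, y):
    # Propensity score e(x) and arm-specific outcome regressions mu_a(x).
    e = GradientBoostingClassifier().fit(X, a).predict_proba(X)[:, 1]
    e = np.clip(e, 0.01, 0.99)      # guard against extreme propensities
    mu1 = GradientBoostingRegressor().fit(X[a == 1], y[a == 1]).predict(X)
    mu0 = GradientBoostingRegressor().fit(X[a == 0], y[a == 0]).predict(X)
    mu_a = np.where(a == 1, mu1, mu0)
    # Doubly robust pseudo-outcome: its conditional mean is the ITE tau(x).
    return (a - e) / (e * (1 - e)) * (y - mu_a) + mu1 - mu0

# Regressing the pseudo-outcome on X with any learner estimates tau(x); its mean
# gives the average effect, and variable importance of that regression can be
# used to rank candidate effect modifiers.
```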
Adaptive sample size re-estimation (SSR) is a well-established strategy for improving the efficiency and flexibility of clinical trials. Its central challenge is determining whether, and by how much, to increase the sample size at an interim analysis. This decision requires a rational framework for balancing the potential gain in statistical power against the risk and cost of further investment. Prevailing optimization approaches, such as the Jennison and Turnbull (JT) method, address this by maximizing power for a fixed cost per additional participant. While statistically efficient, this paradigm assumes the cost of enrolling another patient is constant, regardless of whether the interim evidence is promising or weak. This can lead to impractical recommendations and inefficient resource allocation, particularly in weak-signal scenarios. We reframe SSR as a decision problem under dynamic costs, where the effective cost of additional enrollment reflects the interim strength of evidence. Within this framework, we derive two novel rules: (i) a likelihood-ratio based rule, shown to be Pareto optimal in achieving smaller average sample size under the null without loss of power under the alternative; and (ii) a return-on-investment (ROI) rule that directly incorporates economic considerations by linking SSR decisions to expected net benefit. To unify existing methods, we further establish a representation theorem demonstrating that a broad class of SSR rules can be expressed through implicit dynamic cost functions, providing a common analytical foundation for their comparison. Simulation studies calibrated to Phase III trial settings confirm that dynamic-cost approaches improve resource allocation relative to fixed-cost methods.
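To make the flavor of such rules concrete, the sketch below triggers a sample size increase only when a likelihood ratio comparing the design alternative to the null at the interim analysis is large enough. The thresholds, the conditional-power target, and the mapping from the interim estimate to a new sample size are illustrative and are not the paper's derived rules.

```python
# Illustrative interim SSR decision driven by a likelihood ratio (two-arm trial,
# 1:1 allocation, unit-variance outcome); all tuning constants are assumptions.
import numpy as np
from scipy import stats

def ssr_decision(z_interim, info_frac, n_planned, delta_std,
                 lr_threshold=1.0, n_max_mult=1.5):
    # Interim estimate of the standardized effect and its standard error.
    n_interim = info_frac * n_planned
    se = np.sqrt(4 / n_interim)
    theta_hat = z_interim * se
    # Likelihood ratio of the design alternative (delta_std) versus the null (0).
    lr = (stats.norm.pdf(theta_hat, loc=delta_std, scale=se)
          / stats.norm.pdf(theta_hat, loc=0.0, scale=se))
    if lr < lr_threshold:
        return n_planned            # weak signal: do not invest further
    # Promising signal: re-size toward ~90% power at the interim estimate,
    # capped at n_max_mult * n_planned.
    n_cp = 4 * ((stats.norm.ppf(0.975) + stats.norm.ppf(0.9))
                / max(theta_hat, 1e-6)) ** 2
    return int(min(max(n_planned, n_cp), n_max_mult * n_planned))
```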
There are several steps to confirming the safety and efficacy of a new medicine. A sequence of trials, each with its own objectives, is usually required. Quantitative risk metrics can be useful for informing decisions about whether a medicine should transition from one stage of development to the next. To obtain an estimate of the probability of regulatory approval, pharmaceutical companies may start with industry-wide success rates and then apply subjective adjustments to reflect program-specific information. However, this approach lacks transparency and fails to make full use of data from previous clinical trials. We describe a quantitative Bayesian approach for calculating the probability of success (PoS) at the end of phase II which incorporates internal clinical data from one or more phase IIb studies, industry-wide success rates, and expert opinion or external data if needed. Using an example, we illustrate how PoS can be calculated accounting for differences between the phase IIb data and future phase III trials, and discuss how the methods can be extended to accommodate accelerated drug development pathways.
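A minimal Monte Carlo sketch of such a calculation is shown below: a normal posterior for the treatment effect from phase IIb is propagated through the planned phase III design, with a simple additive shift standing in for phase II-to-phase III differences. All numbers, and the normal-normal setup itself, are illustrative assumptions rather than the paper's model.

```python
# Minimal assurance-style probability-of-success calculation; values are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Posterior for the true effect after phase IIb (e.g., from a normal-normal model).
post_mean, post_sd = 0.30, 0.12
# Planned phase III: standard error of the effect estimate and success criterion.
n_sims, se_ph3, z_crit = 100_000, 0.10, stats.norm.ppf(0.975)
discount = 0.05   # crude allowance for phase IIb-to-phase III differences

theta = rng.normal(post_mean - discount, post_sd, n_sims)   # draw true effects
theta_hat = rng.normal(theta, se_ph3)                       # simulate phase III
pos = np.mean(theta_hat / se_ph3 > z_crit)
print(f"Probability of success: {pos:.2f}")
```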
Pharmaceutical companies regularly need to make decisions about drug development programs based on the limited knowledge from early stage clinical trials. In this situation, eliciting the judgements of experts is an attractive approach for synthesising evidence on the unknown quantities of interest. When calculating the probability of success for a drug development program, multiple quantities of interest - such as the effect of a drug on different endpoints - should not be treated as unrelated. We discuss two approaches for establishing a multivariate distribution for several related quantities within the SHeffield ELicitation Framework (SHELF). The first approach elicits experts' judgements about a quantity of interest conditional on knowledge about another one. For the second approach, we first elicit marginal distributions for each quantity of interest. Then, for each pair of quantities, we elicit the concordance probability that both lie on the same side of their respective elicited medians. This allows us to specify a copula to obtain the joint distribution of the quantities of interest. We show how these approaches were used in an elicitation workshop that was performed to assess the probability of success of the registrational program of an asthma drug. The judgements of the experts, which were obtained prior to completion of the pivotal studies, were well aligned with the final trial results.
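For the second approach, the elicited concordance probability pins down the copula parameter in closed form if a Gaussian copula is used: P(same side of the medians) = 1/2 + arcsin(rho)/pi, so rho = sin(pi (p - 1/2)). The sketch below applies this and samples the joint distribution for two marginals; the endpoint names and marginal distributions are invented for illustration and are not the workshop's elicited values.

```python
# Worked example: concordance probability -> Gaussian copula correlation ->
# joint samples of two quantities with (invented) elicited marginals.
import numpy as np
from scipy import stats

def gaussian_copula_corr(concordance_prob):
    # For a Gaussian copula, P(same side of medians) = 1/2 + arcsin(rho)/pi.
    return np.sin(np.pi * (concordance_prob - 0.5))

rho = gaussian_copula_corr(0.7)            # elicited concordance probability
cov = [[1.0, rho], [rho, 1.0]]
z = stats.multivariate_normal(mean=[0, 0], cov=cov).rvs(10_000, random_state=1)
u = stats.norm.cdf(z)                      # copula samples on the unit square

# Illustrative elicited marginals for effects on two endpoints.
effect_endpoint_1 = stats.norm(0.10, 0.05).ppf(u[:, 0])
effect_endpoint_2 = stats.lognorm(s=0.2, scale=0.8).ppf(u[:, 1])
print(np.corrcoef(effect_endpoint_1, effect_endpoint_2)[0, 1])
```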
Participants in clinical trials are often viewed as a unique, finite population. Yet, statistical analyses often assume that participants were randomly sampled from a larger population. Under Complete Randomization, Randomization-Based Inference (RBI; a finite population inference) and Analysis of Variance (ANOVA; a random sampling inference) provide asymptotically equivalent difference-in-means tests. However, sequentially-enrolling trials typically employ restricted randomization schemes, such as block or Maximum Tolerable Imbalance (MTI) designs, to reduce the chance of chronological treatment imbalances. The impact of these restrictions on RBI and ANOVA concordance is not well understood. With real-world frames of reference, such as rare and ultra-rare diseases, we review full versus random sampling of finite populations and empirically evaluate finite population Type I error when using ANOVA following randomization restrictions. Randomization restrictions strongly impacted ANOVA Type I error, even for trials with 1,000 participants. Properly adjusting for restrictions corrected Type I error. We corrected for block randomization, yet leave open how to correct for MTI designs. More directly, RBI accounts for randomization restrictions while ensuring correct finite population Type I error. Novel contributions are: 1) deepening the understanding and correction of RBI and ANOVA concordance under block and MTI restrictions and 2) using finite populations to estimate the convergence of Type I error to a nominal rate. We discuss the challenge of specifying an estimand's population and reconciling with sampled trial participants.
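A bare-bones version of randomization-based inference under permuted-block randomization is sketched below: the reference distribution re-randomizes treatment within each block, mirroring the actual assignment mechanism. The array-based data layout is an assumption, and handling MTI designs would require replacing the within-block permutation with draws from the MTI procedure itself.

```python
# Sketch of a randomization test honoring permuted-block randomization.
# y: outcomes, treat: 0/1 assignments, block: block labels (numpy arrays).
import numpy as np

def block_randomization_test(y, treat, block, n_draws=10000, seed=7):
    rng = np.random.default_rng(seed)
    obs = y[treat == 1].mean() - y[treat == 0].mean()
    stats_null = np.empty(n_draws)
    for b in range(n_draws):
        t = treat.copy()
        for blk in np.unique(block):
            idx = np.where(block == blk)[0]
            t[idx] = rng.permutation(t[idx])   # re-randomize within the block
        stats_null[b] = y[t == 1].mean() - y[t == 0].mean()
    return (1 + np.sum(np.abs(stats_null) >= abs(obs))) / (1 + n_draws)
```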
Presenting a predictive model's performance is a communication bottleneck that threatens collaborations between data scientists and subject matter experts (SMEs). Accuracy and error metrics alone fail to tell the whole story of a model - its risks, strengths, and limitations - making it difficult for SMEs to feel confident in their decision to use a model. As a result, models may fail in unexpected ways or go entirely unused, as SMEs disregard poorly presented models in favor of familiar, yet arguably substandard, methods. In this paper, we describe an iterative study conducted with both SMEs and data scientists to understand the gaps in communication between these two groups. We find that, while the two groups share the common goals of understanding the data and the predictions of the model, friction can stem from unfamiliar terms, metrics, and visualizations - limiting the transfer of knowledge to SMEs and discouraging clarifying questions from being asked during presentations. Based on our findings, we derive a set of communication guidelines that use visualization as a common medium for communicating the strengths and weaknesses of a model. We provide a demonstration of our guidelines in a regression modeling scenario and elicit feedback on their use from SMEs. From our demonstration, SMEs were more comfortable discussing a model's performance, more aware of the trade-offs for the presented model, and better equipped to assess the model's risks - ultimately informing and contextualizing the model's use beyond text and numbers.
Randomized discontinuation design (RDD) is an enrichment strategy commonly used to address limitations of traditional placebo-controlled trials, particularly the ethical concern of prolonged placebo exposure. RDD consists of two phases: an initial open-label phase in which all eligible patients receive the investigational medicinal product (IMP), followed by a double-blind phase in which responders are randomized to continue with the IMP or switch to placebo. This design tests whether the IMP provides benefit beyond the placebo effect. The estimand framework introduced in ICH E9(R1) strengthens the dialogue among clinical research stakeholders by clarifying trial objectives and aligning them with appropriate statistical analyses. However, its application in oncology trials using RDD remains unclear. This manuscript uses the phase III JAVELIN Gastric 100 trial and the phase II trial of sorafenib (BAY 43-9006) as case studies to propose an estimand framework tailored for oncology trials employing RDD in phase III and phase II settings, respectively. We highlight some similarities and differences between RDDs and traditional randomized controlled trials in the context of ICH E9(R1). This approach aims to support more efficient regulatory decision-making.
Indolent cancers are characterized by long overall survival (OS) times. Therefore, powering a clinical trial to provide definitive assessment of the effects of an experimental intervention on OS in a reasonable timeframe is generally infeasible. Instead, the primary outcome in many pivotal trials is an intermediate clinical response such as progression-free survival (PFS). In several recently reported pivotal trials of interventions for indolent cancers that yielded promising results on an intermediate outcome, however, more mature data or post-approval trials showed concerning OS trends. These problematic results have prompted a keen interest in quantitative approaches for monitoring OS that can support regulatory decision-making related to the risk of an unacceptably large detrimental effect on OS. For example, the US Food and Drug Administration, the American Association for Cancer Research, and the American Statistical Association recently organized a one-day multi-stakeholder workshop entitled 'Overall Survival in Oncology Clinical Trials'. In this paper, we propose OS monitoring guidelines tailored for the setting of indolent cancers. Our pragmatic approach is modeled, in part, on the monitoring guidelines the FDA has used in cardiovascular safety trials conducted in Type 2 Diabetes Mellitus. We illustrate our proposals through application to several examples informed by actual case studies.
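As a point of reference for the style of criterion borrowed from the diabetes cardiovascular safety setting, the sketch below flags a concern when the upper confidence bound for the OS hazard ratio exceeds a pre-specified margin at a given event count. The margin, confidence level, and standard error approximation are illustrative and are not the guidelines proposed in the paper.

```python
# Margin-based OS monitoring check (illustrative, 1:1 randomization assumed):
# flag when the upper CI bound for the OS hazard ratio exceeds a margin.
import numpy as np
from scipy import stats

def os_monitoring_flag(log_hr_hat, n_events, margin=1.3, level=0.95):
    se = 2 / np.sqrt(n_events)   # approximate SE of the log hazard ratio
    upper = np.exp(log_hr_hat + stats.norm.ppf(0.5 + level / 2) * se)
    return upper > margin

# Example: interim estimate HR = 1.10 based on 120 OS events.
print(os_monitoring_flag(np.log(1.10), 120))
```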
Shape constraints yield flexible middle grounds between fully nonparametric and fully parametric approaches to modeling distributions of data. The specific assumption of log-concavity is motivated by applications across economics, survival modeling, and reliability theory. However, no valid tests have previously existed for whether the underlying density of given data is log-concave. The recent universal inference methodology provides such a test. The universal test relies on maximum likelihood estimation (MLE), and efficient methods already exist for finding the log-concave MLE. This yields the first test of log-concavity that is provably valid in finite samples in any dimension, for which we also establish asymptotic consistency results. Empirically, we find that a random projections approach that converts the d-dimensional testing problem into many one-dimensional problems can yield high power, leading to a simple procedure that is statistically and computationally efficient.
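A skeleton of the split-likelihood-ratio construction in one dimension is sketched below. The function `logconcave_mle_density` is a hypothetical placeholder standing in for an existing log-concave MLE solver, and the kernel density estimate fit on the held-out half is just one admissible choice for the alternative fit.

```python
# Skeleton of the universal (split-LRT) test of log-concavity in one dimension.
import numpy as np
from scipy.stats import gaussian_kde

def universal_logconcavity_test(x, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    d0, d1 = x[idx[: len(x) // 2]], x[idx[len(x) // 2:]]
    f_alt = gaussian_kde(d1)             # any density estimate fit on D1
    f_null = logconcave_mle_density(d0)  # hypothetical helper: log-concave MLE on D0
    log_stat = np.sum(np.log(f_alt(d0)) - np.log(f_null(d0)))
    return log_stat > np.log(1 / alpha)  # True = reject log-concavity at level alpha
```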