**IWSM 2021: Invited Speakers**

- Maria Durban (Professor of Statistics. University Carlos III of Madrid, Spain).
**A general framework for prediction in generalized additive mixed models**

Smoothing techniques have become one of the most popular modelling approaches in recent decades. However, out-of-sample prediction in the context of smoothing models is still, to some extent, an open problem. Building on the seminal work by Currie, Durban and Eilers (2004), we present two approaches to carrying out prediction in the context of penalized regression: with low-rank basis and penalties (one-step approach), or through smooth mixed models (two-step approach). We give further insight into the case of P-splines, showing the influence of the penalty on the prediction and, in the context of mixed models, we connect the new predicted values to the observed ones through a joint distribution, which allows us to compute prediction intervals. The task becomes more challenging when smooth interaction terms are present in the models (a very common situation, for example, with data collected in space and/or time). Our proposal fits the data and predicts the new observations simultaneously, and uses constraints to ensure a coherent fit or to impose further restrictions on the predictions. We also develop this methodology for the so-called smooth-ANOVA models, which allow the interaction terms to be decomposed as a sum of several smooth functions. We illustrate these methods with several real data examples.

- Montserrat Fuentes (President. St. Edward’s University, USA).
**Spatial Statistical Modelling of Neuroimaging Data**

Imaging data with thousands of spatially-correlated data points are common in many fields. In neuroscience, magnetic resonance imaging (MRI) is a primary modality for studying brain structure and activity. Modeling spatial dependence of MRI data at different scales is one of the main challenges of contemporary neuroimaging, and it could allow for accurate testing for significance in neural activity. The high dimensionality of this type of data (millions of voxels) presents modeling challenges and serious computational constraints. Methods that account for spatial correlation often require very cumbersome matrix evaluations which are prohibitive for data of this size, and thus current methods typically reduce dimensionality by modeling covariance among regions of interest (coarser or larger spatial units) rather than among voxels. However, ignoring spatial dependence at different scales could drastically reduce our ability to detect important activation patterns in the brain and hence produce misleading results. To overcome these problems, we introduce a novel Bayesian tensor approach, treating the brain image as the response with a vector of predictors. Our method provides estimates of the parameters of interest using a generalized sparsity principle. This method is implemented using a fully Bayesian approach to characterize different sources of uncertainty. We demonstrate posterior consistency and develop a computationally efficient algorithm. The effectiveness of our approach is illustrated through simulation studies and the analysis of the effects of drug addiction on brain structure. We apply this method to identify the effects of demographic information and cocaine addiction on the functioning of the brain.

- Virginie Rondeau (Directeur de Recherche, INSERM Bordeaux, France).
**The use of joint modelling to validate surrogate failure-time endpoints**

In many biomedical areas, the identification and validation of surrogate endpoints is of prime interest to reduce the duration and/or size of clinical trials. Numerous validation methods have been proposed; the most popular is based on a two-step analysis strategy in the context of meta-analysis. For two failure time endpoints, two association measurements are usually considered, one at the individual level and one at the trial level. However, this approach is not always applicable, mainly due to convergence or estimation problems in clinical trials. We present here different approaches based on joint frailty models and a one-step validation method, with new, attractive and well-developed tools for the validation of failure time surrogate endpoints. Both individual- and trial-level surrogacy are evaluated using a new definition of Kendall's tau and the coefficient of determination.

We aim in this work to popularize these new surrogate endpoint validation approaches by making the methods available in a user-friendly R package. Thus, we provide in the frailtypack R package numerous tools, including more flexible functions, for the validation of candidate surrogate endpoints, using data from multiple randomized clinical trials. In particular, we provide the surrogate threshold effect, which is used in combination with R^2_{trial} to make a decision concerning the validity of the surrogate endpoints. frailtypack also makes it possible to predict the treatment effect on the true endpoint in a new trial using the treatment effect observed on the surrogate endpoint. Leave-one-out cross-validation is available for assessing the accuracy of the prediction using the joint surrogate model. Other tools cover data generation, simulation studies and graphical representations. We illustrate the use of the new functions with both real data and simulated data.

- Stijn Vansteelandt (Professor of Statistics. Ghent University, Belgium).
**Assumption-lean Inference for Generalised Linear Model Parameters**

Inference for the parameters indexing generalised linear models is routinely based on the assumption that the model is correct and a priori specified. This is unsatisfactory because the chosen model is usually the result of a data-adaptive model selection process, which induces bias and excess uncertainty that is not usually acknowledged; moreover, the assumptions encoded in the resulting model rarely represent some a priori known ground truth. Standard inferences may therefore lead to bias in effect estimates, and may moreover fail to give a pure reflection of the information that is contained in the data. Inspired by developments on assumption-free inference for so-called projection parameters, we here propose nonparametric definitions of main effect estimands and effect modification estimands. These reduce to standard main effect and effect modification parameters in generalised linear models when these models are correctly specified, but continue to capture the primary (conditional) association between two variables, or the degree to which two variables interact (in a statistical sense) in their effect on the outcome, even when these models are misspecified. We achieve an assumption-lean inference for these estimands by deriving their influence curve under the nonparametric model and invoking flexible data-adaptive (e.g., machine learning) procedures. This talk aims to be broadly accessible, focussing on concepts more than technicalities. This is joint work with Oliver Dukes, Ghent University.