Keynote Lectures

We are delighted to announce that the following distinguished speakers have accepted our invitation to deliver keynote lectures at the Applied Statistics 2026 conference.

Parametric sparse models for distributional data

Prof. Paula Brito, Ph.D.
Professor of Statistics
University of Porto, Porto, Portugal

Abstract [Monday, September 21, 9.00–10.00]

In classical multivariate statistics and machine learning, data are typically organized in a tabular format, where each row corresponds to an individual unit and each column records a single value for a given variable. However, this representation becomes inadequate when the data inherently involve variability. This situation arises when the units of analysis are not individual entities but rather abstract concepts—such as diseases instead of specific patients—or groups formed on the basis of shared characteristics. In such cases, for each descriptive variable, the variability observed within each concept or group should be taken into account, rather than relying solely on central tendencies (e.g., means, medians, or modes), in order to preserve potentially relevant information. Symbolic Data Analysis offers a framework for representing and analyzing such complex data, enabling aggregation at various levels of detail while retaining the associated variability. New types of variables have been introduced, where observations take the form of sets, intervals, or distributions over a given domain.

In this work, we focus on numerical data described by empirical distributions. We propose parametric models based on representing each distribution using a central statistic and the logarithms of inter-quantile ranges for a selected set of quantiles. Multivariate normal distributions are assumed for the full set of indicators, considering alternative sparse structures for the covariance matrix. Interval-valued data are a particular case within this framework. The proposed model enables multivariate parametric analysis of distributional data, including analysis of variance, discriminant analysis, and model-based clustering. Applications to real-world data illustrate the relevance and usefulness of the approach.
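As a rough illustration of the representation described in this abstract, the sketch below turns one empirical distribution into a vector made of a central statistic followed by log inter-quantile ranges. It is not the speaker's implementation. The function names, the choice of the median as the central statistic, the default quantile levels, and the use of ranges between consecutive quantiles are all assumptions made for this example.

```python
import math


def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already sorted list, 0 <= p <= 1."""
    pos = p * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def distribution_indicators(values, probs=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Hypothetical helper: summarise one empirical distribution.

    Returns [median, log(q2 - q1), log(q3 - q2), ...], where q1, q2, ...
    are the quantiles at the levels in `probs`. Both the median and the
    consecutive-quantile ranges are assumptions of this sketch.
    """
    xs = sorted(values)
    qs = [quantile(xs, p) for p in probs]
    central = quantile(xs, 0.5)
    log_ranges = []
    for a, b in zip(qs, qs[1:]):
        if b <= a:
            # The logarithm is undefined for a zero-width range (tied quantiles).
            raise ValueError("consecutive quantiles must be strictly increasing")
        log_ranges.append(math.log(b - a))
    return [central] + log_ranges


# One unit (e.g. one group) observed through five raw values.
print(distribution_indicators([1, 2, 3, 4, 5]))  # [3.0, 0.0, 0.0, 0.0, 0.0]
```

Taking logarithms maps the strictly positive ranges onto the whole real line, which is what makes a multivariate normal assumption on the indicator vector plausible. Stacking one such vector per unit would give the data matrix on which the covariance structures mentioned in the abstract could be fitted.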

Towards evidence-based guidance on variable selection methods for multivariable regression models

Prof. Georg Heinze, Ph.D.
Professor of Biostatistics
Medical University of Vienna, Vienna, Austria

Abstract [Tuesday, September 22, 9.00–10.00]

Multivariable regression modelling has a central role in empirical research and is used to answer descriptive, predictive, or explanatory research questions. Data-driven variable selection methods are often used to separate relevant from irrelevant variables, but they may also lead to false omission of relevant covariates, inclusion of irrelevant variables, biased coefficient estimates, poorly calibrated predictions, and unstable models. Alternatively, outcome-ignorant screening of variables based on the results of Initial Data Analysis can often reduce the number of predictors without compromising model stability (Heinze et al., 2024). Although some methodological recommendations exist, only limited evidence is available about the relative and absolute performance of these methods (Sauerbrei et al., 2020). The aim of the STRATOS initiative is to give evidence-based guidance on the design and analysis of observational studies (https://www.stratos-initiative.org/).

I will present some recent activities of the STRATOS topic group on selection of variables and functional forms for multivariable models (see also https://stratostg2.github.io). First, I will clarify the role of data-driven variable selection in different types of research questions. Second, I will discuss a principled approach to data screening as an invaluable preliminary step in model building. Third, I will report on a systematic methodological review of the practice of variable and functional form selection in COVID-19 prognosis models, which revealed a huge gap between the state of the art and analysis practice. Fourth, I will report on our simulation studies to evaluate competing methods neutrally, and provide interactive access to their results (Ullmann et al., 2024). Lastly, I will provide an outlook on our activities on evaluating methods that allow simultaneous variable and functional form selection.

Depending on the purpose of data-driven variable selection, these results allow us to conclude under which conditions specific methods may be applicable and where they are better avoided. In summary, data-driven selection can only complement, not replace, selection driven by substantive knowledge.