Conformal Prediction Intervals and Predictive Distributions


Speaker: Jing Qin (National Institute of Allergy and Infectious Diseases, USA)

Time: Thursday, October 10, 2024, 16:00-17:00

Venue: Lecture Hall West 202, School of Mathematics

Abstract: Conformal prediction (CP) is a machine learning framework for uncertainty quantification that produces statistically valid prediction regions (prediction intervals) for any underlying point predictor (whether based on classical statistics, machine learning, or deep learning), assuming only exchangeability of the data. Consider a scenario in which we possess training data containing both the feature variable X and the outcome Y, together with test data containing only the feature variable X. The objective is to construct a 95% prediction interval for the outcome Y in the test data. Lawless and Fredette (2005) addressed this challenge within parametric frameworks, employing a pivotal-based approach. Their method yields prediction intervals and predictive distributions with well-calibrated frequentist probability interpretations. However, as the dimension of the feature variable grows large, modeling the conditional distribution of Y|X becomes increasingly challenging.

In this talk, we aim to extend their work by removing the parametric assumption for the predictive interval. Unfortunately, without parametric assumptions on the conditional distribution of Y|X, guaranteeing accurate conditional coverage becomes impossible. Instead, we leverage ideas from conformal inference (Vovk et al. 2005), which requires only accurate unconditional coverage. While the conformal predictive interval is inherently distribution-free, the choice of a good working conditional model can significantly affect the resulting interval length. In essence, a well-designed conditional model yields shorter intervals, highlighting the practical importance of thoughtful and effective modeling even in distribution-free settings.
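To make the mechanics concrete, the following is a minimal sketch of split conformal prediction, the simplest instance of the framework described above (not necessarily the speaker's specific construction); the toy data, the least-squares point predictor, and the absolute-residual score are all illustrative assumptions:

```python
# Split conformal prediction: hold out a calibration set, compute
# nonconformity scores, and take a finite-sample-corrected quantile.
# Under exchangeability, the interval attains >= 95% unconditional coverage.
import numpy as np

rng = np.random.default_rng(0)

# Toy exchangeable data: Y = 2*X + noise (an assumption for illustration)
n = 1000
X = rng.uniform(0, 1, n)
Y = 2 * X + rng.normal(0, 0.1, n)

# Split into a proper training set and a calibration set
X_tr, Y_tr = X[:500], Y[:500]
X_cal, Y_cal = X[500:], Y[500:]

# Any point predictor works; here a simple least-squares line
a, b = np.polyfit(X_tr, Y_tr, 1)
predict = lambda x: a * x + b

# Nonconformity scores on the calibration set: absolute residuals
scores = np.abs(Y_cal - predict(X_cal))

# Conformal quantile with the (n_cal + 1) finite-sample correction
alpha = 0.05
n_cal = len(scores)
level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
q = np.quantile(scores, level, method="higher")

# Distribution-free 95% prediction interval for a new test point x0
x0 = 0.5
lo, hi = predict(x0) - q, predict(x0) + q
```

The interval width 2q is driven entirely by the calibration residuals, which is why a better-fitting working model (smaller residuals) produces shorter intervals while the coverage guarantee is unaffected.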

Furthermore, we will delve into the application of conformal prediction intervals in more intricate scenarios, including situations where there is a covariate shift between the training and test data, as well as cases where the outcome Y may be right-censored.
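One standard way to handle covariate shift is to reweight the calibration scores by the covariate likelihood ratio, as in weighted conformal prediction (Tibshirani et al., 2019). The sketch below assumes the density ratio is known in closed form; in practice it must be estimated, and the predictor and distributions are toy assumptions:

```python
# Weighted conformal prediction under covariate shift: calibration scores
# are weighted by w(x) = dP_test(x) / dP_train(x) before taking the quantile.
import numpy as np

rng = np.random.default_rng(1)

# Calibration covariates ~ N(0,1); test covariates ~ N(1,1) (covariate shift)
n = 2000
X_cal = rng.normal(0, 1, n)
Y_cal = X_cal + rng.normal(0, 0.2, n)

predict = lambda x: x                      # assumed known point predictor
scores = np.abs(Y_cal - predict(X_cal))    # absolute-residual scores

def lik_ratio(x):
    # Known density ratio N(1,1) / N(0,1); in practice this is estimated
    return np.exp(x - 0.5)

def weighted_interval(x0, alpha=0.05):
    # Normalized weights over calibration points plus the test point
    w = np.append(lik_ratio(X_cal), lik_ratio(x0))
    p = w / w.sum()
    # Test point's own score is treated as +inf (worst case)
    s = np.append(scores, np.inf)
    order = np.argsort(s)
    cum = np.cumsum(p[order])
    # Smallest score whose cumulative weight reaches 1 - alpha
    q = s[order][np.searchsorted(cum, 1 - alpha)]
    return predict(x0) - q, predict(x0) + q

lo, hi = weighted_interval(1.0)
```

When the likelihood ratio is identically 1 (no shift), the weighted quantile reduces to the ordinary split conformal quantile, so this construction strictly generalizes the exchangeable case.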

About the speaker: Dr. Jing Qin is a Mathematical Statistician at the Biostatistics Research Branch of the National Institute of Allergy and Infectious Diseases (NIAID). He earned his Ph.D. in 1992 from the University of Waterloo and subsequently became an Assistant Professor at the University of Maryland, College Park.

Before joining the National Institutes of Health (NIH) in 2004, Dr. Qin spent five years at the Memorial Sloan-Kettering Cancer Center. His research interests encompass a wide range of topics, including empirical likelihood methods, case-control studies, biased-sampling problems, econometrics, survival analysis, missing data, causal inference, genetic mixture models, generalized linear models, survey sampling, and microarray data analysis.

Recently, Dr. Qin's work has focused on conformal inference for quantifying uncertainty in machine learning. In 2006, he was elected a Fellow of the American Statistical Association. He is also the author of a 2017 monograph titled Biased Sampling, Over-identified Parametric Problems, and Beyond (Springer, ICSA Book Series in Statistics).
