Concordance Measure-based Feature Screening and Variable Selection

[Math. Dept.]

April 25, 2018  16:00-17:00

E409  School of Mathematics


SPEAKER

Huazhen Lin (Southwestern University of Finance and Economics)

ABSTRACT

The $C$-statistic, which measures the rank concordance between predictors and outcomes, has become a standard metric of predictive accuracy and is therefore a natural criterion for variable screening and selection. However, because the $C$-statistic is a step function, its optimization requires brute-force search, which prohibits its direct use with high-dimensional predictors. We develop a smoothed version of the $C$-statistic to facilitate variable screening and selection. Specifically, we propose a smoothed $C$-statistic sure screening (C-SS) method for screening ultrahigh-dimensional data, and a penalized $C$-statistic (PSC) variable selection method for regularized modeling based on the screening results. We show that these two coherent procedures form an integrated framework for screening and variable selection: the C-SS possesses the sure screening property, and the PSC possesses the oracle property. In particular, the PSC achieves the oracle property if $m_n = o(n^{1/4})$, where $m_n$ is the cardinality of the set of predictors captured by the C-SS. Our extensive simulations reveal that, compared to existing procedures, our proposal is more robust and efficient. Our procedure has been applied to analyze a multiple myeloma study, and has identified several novel genes that can predict patients' response to treatment.

This is joint work with Yunbei MA, Yi Li and Yi Li.
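
To illustrate why smoothing helps, the following is a minimal sketch, not the speaker's implementation. It computes the empirical $C$-statistic for a single predictor (a step function of the data) and a smoothed variant in which the indicator is replaced by a sigmoid with bandwidth `h`. The sigmoid kernel, the bandwidth value, the function names, and the marginal ranking rule `|C - 1/2|` are all assumptions made for illustration; the C-SS and PSC procedures described in the abstract may use a different smoother, criterion, and tuning.

```python
import numpy as np


def c_statistic(x, y):
    """Empirical C-statistic of one predictor x for outcome y.

    Among all ordered pairs (i, j) with y_i > y_j, returns the fraction
    with x_i > x_j. The indicator makes this a step function.
    Returns nan if no pair has y_i > y_j.
    """
    dx = x[:, None] - x[None, :]   # pairwise differences x_i - x_j
    dy = y[:, None] - y[None, :]   # pairwise differences y_i - y_j
    comparable = dy > 0
    return np.mean(dx[comparable] > 0)


def smoothed_c_statistic(x, y, h=0.1):
    """Smoothed C-statistic: indicator 1{x_i > x_j} replaced by a sigmoid.

    h is a hypothetical bandwidth; as h -> 0 the sigmoid approaches
    the indicator and the empirical C-statistic is recovered.
    """
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    comparable = dy > 0
    # sigmoid(z) written as 0.5 * (1 + tanh(z / 2)) for numerical stability
    return np.mean(0.5 * (1.0 + np.tanh(dx[comparable] / (2.0 * h))))


def marginal_screen(X, y, d, h=0.1):
    """Illustrative marginal screening (an assumption, not the C-SS rule).

    Ranks the columns of X by |smoothed C - 1/2| and returns the indices
    of the d highest-ranked predictors, strongest first.
    """
    scores = np.array([
        abs(smoothed_c_statistic(X[:, k], y, h) - 0.5)
        for k in range(X.shape[1])
    ])
    return np.argsort(scores)[::-1][:d]
```

Because the smoothed version is differentiable in the predictor values, it can in principle be optimized with gradient-based methods instead of brute-force search, which is the motivation stated in the abstract. The pairwise computation here uses $O(n^2)$ memory and is intended only for small examples.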

SUPPORTED BY

School of Mathematics, Sichuan University