Concordance Measure-based Feature Screening and Variable Selection
[Math. Dept.]
April 25, 2018 16:00-17:00
E409 School of Mathematics
SPEAKER
Huazhen Lin (Southwestern University of Finance and Economics)
ABSTRACT
The $C$-statistic, which measures the rank concordance between predictors and outcomes, has become a standard metric of predictive accuracy and is therefore a natural criterion for variable screening and selection. However, because the $C$-statistic is a step function, optimizing it requires brute-force search, which prohibits its direct use with high-dimensional predictors. We develop a smoothed version of the $C$-statistic to facilitate variable screening and selection. Specifically, we propose a smoothed $C$-statistic sure screening (C-SS) method for screening ultrahigh-dimensional data, and a penalized $C$-statistic (PSC) variable selection method for regularized modeling based on the screening results. We show that these two coherent procedures form an integrated framework for screening and variable selection: the C-SS possesses the sure screening property, and the PSC possesses the oracle property. In particular, the PSC achieves the oracle property if $m_n = o(n^{1/4})$, where $m_n$ is the cardinality of the set of predictors captured by the C-SS. Our extensive simulations show that, compared with existing procedures, our proposal is more robust and efficient. We have applied the procedure to a multiple myeloma study and identified several novel genes that can predict patients' response to treatment.
This is joint work with Yunbei Ma, Yi Li and Yi Li.
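ILLUSTRATION
To make the step-function issue in the abstract concrete, the following is a minimal Python sketch, not the speakers' implementation. It computes the empirical $C$-statistic of a score against a continuous outcome, then a smoothed version in which the indicator is replaced by a logistic sigmoid, and finally ranks predictors marginally by how far their smoothed $C$-statistic lies from 0.5. The sigmoid smoother, the bandwidth `h`, the function names, and the ranking rule are all assumptions made for illustration; the abstract does not specify the smoother or the screening rule used in C-SS.

```python
import numpy as np

def _pairwise(score, y):
    """Return score differences over all ordered pairs (i, j) with y_i > y_j."""
    score = np.asarray(score, dtype=float)
    y = np.asarray(y, dtype=float)
    dy = y[:, None] - y[None, :]
    ds = score[:, None] - score[None, :]
    # Assumes at least one pair of distinct outcomes exists.
    return ds[dy > 0]

def c_statistic(score, y):
    """Empirical C-statistic: share of comparable pairs ranked concordantly.

    This is a step function of the score, so it has no useful gradient.
    """
    return float(np.mean(_pairwise(score, y) > 0))

def smoothed_c_statistic(score, y, h=0.1):
    """Smoothed C-statistic: the indicator is replaced by a sigmoid.

    The sigmoid and the bandwidth h are illustrative assumptions.
    sigmoid(t) is written as 0.5 * (1 + tanh(t / 2)) for numerical stability.
    """
    t = _pairwise(score, y) / h
    return float(np.mean(0.5 * (1.0 + np.tanh(t / 2.0))))

def marginal_screen(X, y, top_k, h=0.1):
    """Rank columns of X by |smoothed C - 0.5| and keep the top_k indices.

    A hypothetical marginal screening rule, used here only as a sketch.
    """
    X = np.asarray(X, dtype=float)
    utility = np.array(
        [abs(smoothed_c_statistic(X[:, j], y, h) - 0.5) for j in range(X.shape[1])]
    )
    return np.argsort(-utility)[:top_k]
```

As `h` shrinks toward zero the smoothed value approaches the empirical $C$-statistic, while a larger `h` gives a smoother but more biased surrogate. The pairwise computation costs $O(n^2)$ memory per predictor, which is acceptable for a sketch but would need batching for large samples.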
SUPPORTED BY
School of Mathematics, Sichuan University