Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis with Limited Computational Resources


Speaker: Hansheng Wang (王汉生), Peking University

Time: July 19, 2022, 14:30

Venue: Tencent Meeting, meeting ID 579-8970-5000, password 202278

 

Abstract: Modern statistical analysis often involves large data sets, for which conventional estimation methods are not suitable, owing to limited computational resources. To solve this problem, we propose a novel subsampling-based method with jackknifing. The key idea is to treat the whole sample as if it were the population. Then, we obtain multiple subsamples with greatly reduced sizes using simple random sampling with replacement. We do not recommend sampling methods without replacement, because they would incur a significant data processing cost when the processing occurs on a hard drive. However, such a cost does not exist if the data are processed in memory. Because subsampled data have relatively small sizes, they can be comfortably read into computer memory and processed. Based on the subsampled data sets, jackknife-debiased estimators can be obtained for the target parameter. The resulting estimators are statistically consistent, with an extremely small bias. Finally, the jackknife-debiased estimators from different subsamples are averaged to form the final estimator. We show theoretically that the final estimator is consistent and asymptotically normal. Furthermore, its asymptotic statistical efficiency can be as good as that of the whole sample estimator under very mild conditions. The proposed method is easily implemented on most computer systems, and thus is widely applicable.
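The three steps in the abstract (subsample with replacement, jackknife-debias each subsample, average the results) can be illustrated with a minimal Python sketch. This is not the speaker's implementation. The function names `jackknife_debiased` and `subsample_jackknife`, the parameters `n_sub` and `k`, and the use of an in-memory NumPy array as a stand-in for data stored on a hard drive are all assumptions made for illustration. The debiasing step uses the textbook delete-one jackknife formula, which may differ in detail from the estimator presented in the talk.

```python
import numpy as np


def jackknife_debiased(x, estimator):
    """Textbook delete-one jackknife bias correction of estimator(x).

    Returns n * theta_hat - (n - 1) * mean(leave-one-out estimates).
    """
    n = len(x)
    theta_full = estimator(x)
    # Recompute the estimator n times, leaving out one observation each time.
    loo = np.array([estimator(np.delete(x, i)) for i in range(n)])
    return n * theta_full - (n - 1) * loo.mean()


def subsample_jackknife(data, estimator, n_sub, k, rng):
    """Average jackknife-debiased estimates over k small subsamples.

    `data` is an in-memory array here (an assumption); with real large data
    each subsample would instead be read from disk by random record position.
    """
    big_n = len(data)
    estimates = []
    for _ in range(k):
        # Simple random sampling WITH replacement: indices may repeat.
        idx = rng.integers(0, big_n, size=n_sub)
        estimates.append(jackknife_debiased(data[idx], estimator))
    return float(np.mean(estimates))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    whole_sample = rng.normal(loc=0.0, scale=1.0, size=100_000)
    # np.var with its default ddof=0 is a biased plug-in variance estimator.
    final = subsample_jackknife(whole_sample, np.var, n_sub=50, k=20, rng=rng)
    print("final estimate of the variance:", final)
```

Each subsample holds only `n_sub` observations, so the leave-one-out loop stays cheap even when the whole sample is very large. The plug-in variance is used as the example estimator because its bias is well understood: the delete-one jackknife correction turns it into the usual unbiased sample variance.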


VIDEOS

  • Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis with Limited Computational Resources
  • 14:30 - 17:00, 2022-07-19 at Tencent Meeting
  • Hansheng Wang (王汉生)