Research Spending & Results

Award Detail

Doing Business As Name: University of Pittsburgh
  • Zhao Ren
  • (412) 624-7400
Award Date: 06/11/2021
Estimated Total Award Amount: $235,948
Funds Obligated to Date: $235,948
  • FY 2021 = $235,948
Start Date: 07/01/2021
End Date: 06/30/2024
Transaction Type: Grant
Awarding Agency Code: 4900
Funding Agency Code: 4900
CFDA Number: 47.049
Primary Program Source: 040100 NSF RESEARCH & RELATED ACTIVIT
Award Title or Description: New Frontiers of Robust Statistics in the Era of Big Data
Federal Award ID Number: 2113568
DUNS ID: 004514360
Parent DUNS ID: 004514360
Program Officer:
  • Gabor Szekely
  • (703) 292-8869

Awardee Location

Street: 300 Murdoch Building
Awardee Cong. District: 18

Primary Place of Performance

Organization Name: University of Pittsburgh
Street: 300 Murdoch Building
Cong. District: 18

Abstract at Time of Award

Modern technologies have facilitated the collection of an unprecedented number of features with complex structures. Although extensive progress has been made towards extracting useful information from massive data, statistical analysis typically assumes that data are drawn without any contamination. In reality, however, the data sets arising in applications such as genomics and medical imaging are usually more inhomogeneous, due to either the data collection process or the intrinsic nature of the data in the era of big data. For instance, in gene expression data analysis, outliers frequently arise in microarray experiments due to array chip artifacts such as uneven spray of reagents within arrays. Compared to the recent advances in the era of big data, research in modeling and theoretical foundations for robust procedures under contamination models has fallen behind. To bridge this gap, this project seeks to develop new robust estimation and inference procedures that are rate-optimal for various contamination models, as building blocks to address the modeling, theory and computational challenges. Upon completion, this work will lead to a comprehensive understanding of contamination models and have an immediate impact on various disciplines such as biology, genomics, astronomy and finance. The project also provides training opportunities for undergraduate and graduate students, and will be used to enrich courses and outreach educational materials in statistics and data science.

This project aims to address some of the most pressing challenges faced by robust procedures in high-dimensional and nonparametric contamination models. Specifically, (I) the research begins with statistical inference of low-dimensional parameters in both increasing-dimensional and high-dimensional regressions under contamination models. The PI will study the influence of the contamination proportion on obtaining root-n consistency results. Robust large-scale simultaneous inference under contamination models is also considered. (II) Next, the PI will revisit some classical nonparametric density estimation problems under both arbitrary and structured contamination distributions. The PI plans to propose rate-optimal procedures and carefully study the effect of contamination on estimation through various model indices, including the contamination proportion, the structure of the contamination and the choice of loss function. (III) The PI will develop a U-type robust covariance estimator under structured contamination models and provide rigorous theoretical guarantees on its rate optimality. This general robust estimator can serve as a building block for establishing many rate-optimal procedures for structured large covariance/precision matrix estimation problems. User-friendly R packages will be developed to implement the proposed methods.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
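
The abstract refers to "contamination models" and a "contamination proportion" without defining them. One common formalization, assumed here only for illustration and not confirmed by the abstract, is a mixture in which each observation comes from a target distribution P with probability 1 - eps and from a contaminating distribution Q with probability eps. The minimal Python sketch below draws data from such a mixture and compares the sample mean with the sample median. The function name `sample_contaminated`, the choices P = N(0, 1) and Q = N(50, 1), and eps = 0.1 are all hypothetical; this is not one of the procedures proposed in the project.

```python
import random
import statistics


def sample_contaminated(n, eps, seed=0):
    """Draw n points from the mixture (1 - eps) * N(0, 1) + eps * N(50, 1).

    Illustrative assumption: P = N(0, 1) is the target distribution and
    Q = N(50, 1) is the contaminating distribution.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        if rng.random() < eps:
            data.append(rng.gauss(50.0, 1.0))  # outlier drawn from Q
        else:
            data.append(rng.gauss(0.0, 1.0))   # inlier drawn from P
    return data


# eps plays the role of the contamination proportion.
data = sample_contaminated(n=2000, eps=0.1)

mean_est = statistics.fmean(data)      # pulled towards the outliers
median_est = statistics.median(data)   # stays near the center of P

print(f"sample mean   = {mean_est:.3f}")
print(f"sample median = {median_est:.3f}")
```

In this toy setting the mean shifts by roughly eps times the distance between the centers of P and Q, while the median moves far less. That gap is the kind of sensitivity to the contamination proportion that robust procedures are meant to control.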
