Research Spending & Results

Award Detail

Awardee:UNIVERSITY OF SOUTHERN CALIFORNIA
Doing Business As Name:University of Southern California
PD/PI:
  • Xin Tong
  • (213) 740-0172
  • xint@marshall.usc.edu
Award Date:06/11/2021
Estimated Total Award Amount: $ 120,000
Funds Obligated to Date: $ 120,000
  • FY 2021=$120,000
Start Date:07/01/2021
End Date:06/30/2024
Transaction Type:Grant
Agency:NSF
Awarding Agency Code:4900
Funding Agency Code:4900
CFDA Number:47.049
Primary Program Source:040100 NSF RESEARCH & RELATED ACTIVIT
Award Title or Description:Collaborative Research: Development of Classification Theory and Methods for Objective Asymmetry, Sample Size Limitation, Labeling Ambiguity, and Feature Importance
Federal Award ID Number:2113500
DUNS ID:072933393
Parent DUNS ID:072933393
Program:STATISTICS
Program Officer:
  • Yong Zeng
  • (703) 292-7902
  • yzeng@nsf.gov

Awardee Location

Street:University Park
City:Los Angeles
State:CA
ZIP:90089-0001
County:Los Angeles
Country:US
Awardee Cong. District:37

Primary Place of Performance

Organization Name:Univ of Southern California
Street:3670 Trousdale Pkwy
City:Los Angeles
State:CA
ZIP:90089-0806
County:Los Angeles
Country:US
Cong. District:37

Abstract at Time of Award

Classification is a popular data analysis technique in disciplines ranging from biomedical sciences to information technologies. This project will develop theory-backed statistical methods and algorithms to address pressing challenges in the application of classification. These challenges stem from imperfections in training data, which are widespread in high-stakes applications such as disease diagnosis and cybersecurity. In particular, this project will focus on so-called asymmetric classification problems, where one class is of greater importance than the others, and the methods and algorithms will aim to control the error of missing the most important class in the population, not just in a particular dataset. This property will make the methods and algorithms powerful for medical diagnosis, for which the primary goal is diagnostic accuracy in the population. Moreover, this project will provide a suite of projects, ranging from theory to applications, that are suitable for training graduate and undergraduate students. The interdisciplinary nature of this project is expected to attract students from diverse backgrounds to join the PIs’ efforts.

The PIs will develop a suite of application-driven, theory-backed methods and algorithms to address pressing data challenges including sample size limitations, sampling biases, and ambiguous class labels. The development will be primarily under the Neyman-Pearson (NP) classification paradigm, which was designed to keep the population-level false-negative rate (p-FNR) below a desired level while minimizing the population-level false-positive rate (p-FPR). This project will integrate NP classification into cutting-edge statistical learning tasks and enable it to address the aforementioned real-world data challenges. Specifically, this project has the following four overarching goals.

First, the PIs will use random matrix theory to address a long-standing problem in NP classification methodology: whether NP classifiers can be constructed without a sample-splitting step, which would improve data efficiency. Second, because the NP paradigm is invariant to sampling bias, the PIs will develop NP classifiers to address the sampling bias issue in biomedical applications. These classifiers can be trained on biased samples and still achieve p-FNR control. Third, the PIs will develop a model-free feature ranking framework that incorporates multiple classification paradigms, including the NP paradigm, and reflects prediction objectives. Fourth, the PIs will develop the first NP umbrella algorithm under the label noise setting and the first information-theoretic criteria for combining ambiguous classes in multi-class classification.

To disseminate the project outcomes, the PIs will give research talks, organize conference sessions, share open-source software packages with tutorials, and reach out to practitioners of classification methods.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
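
To make the population-level error control described in the abstract concrete, here is a minimal sketch of an order-statistic thresholding rule in the style of the published NP umbrella algorithm. It is not taken from this award record and is not the PIs' software. It assumes a scoring function has already been trained on separate data, that higher scores point away from the prioritized class, and that the scores passed in come from held-out prioritized-class observations (the sample-splitting step mentioned in the first goal). The function names `np_threshold_rank` and `np_threshold`, the target error level `alpha`, and the tolerance `delta` are all hypothetical names chosen for illustration.

```python
import math


def np_threshold_rank(n, alpha, delta):
    """Return the smallest 1-indexed rank k whose order statistic, used as a
    threshold, keeps the prioritized-class error above alpha with probability
    at most delta. Returns None when n is too small for any rank to qualify.

    Assumes n i.i.d. held-out scores with a continuous distribution, so that
    P(error > alpha) equals P(Binomial(n, 1 - alpha) >= k).
    """
    for k in range(1, n + 1):
        violation_bound = sum(
            math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
            for j in range(k, n + 1)
        )
        if violation_bound <= delta:
            return k
    return None


def np_threshold(held_out_scores, alpha, delta):
    """Pick the score threshold from held-out prioritized-class scores.

    An observation is assigned to the other class when its score exceeds the
    returned threshold, so a higher threshold means fewer missed
    prioritized-class cases.
    """
    k = np_threshold_rank(len(held_out_scores), alpha, delta)
    if k is None:
        raise ValueError("too few held-out scores for this alpha and delta")
    return sorted(held_out_scores)[k - 1]


if __name__ == "__main__":
    # Hypothetical held-out scores: the integers 1..59 stand in for real output.
    scores = list(range(1, 60))
    print(np_threshold_rank(len(scores), alpha=0.05, delta=0.05))
    print(np_threshold(scores, alpha=0.05, delta=0.05))
```

Under these assumptions the rule needs a minimum number of held-out prioritized-class observations before any rank qualifies, because even the largest score must satisfy (1 - alpha)^n <= delta. That requirement is one way to see why sample size limitations and the sample-splitting step matter for this paradigm.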
