

Research Spending & Results

Award Detail

Doing Business As Name:Yale University
PD/PI:
  • Heping Zhang
  • (203) 785-5185
Award Date:06/11/2021
Estimated Total Award Amount: $ 299,993
Funds Obligated to Date: $ 299,993
  • FY 2021=$299,993
Start Date:07/01/2021
End Date:06/30/2024
Transaction Type:Grant
Awarding Agency Code:4900
Funding Agency Code:4900
CFDA Number:47.049
Primary Program Source:040100 NSF RESEARCH & RELATED ACTIVIT
Award Title or Description:Measure of Heterogeneity for Complex Data Objects
Federal Award ID Number:2112711
DUNS ID:043207562
Parent DUNS ID:043207562
Program Officer:
  • Huixia Wang
  • (703) 292-2279

Awardee Location

Street:Office of Sponsored Projects
City:New Haven
County:New Haven
Awardee Cong. District:03

Primary Place of Performance

Organization Name:Yale University
Street:Office of Sponsored Projects
City:New Haven
County:New Haven
Cong. District:03

Abstract at Time of Award

Technological advances have led to the routine collection of large and complex data objects such as matrices, tensors, functions, and manifolds. Understanding and drawing scientifically sound conclusions from large and complex data sources is challenging, and the difficulty generally increases with the size and complexity of the data. Most existing statistical tools treat a single number as the unit of information. They become inadequate for such data because a human face, for example, can be digitized into many numbers but must be analyzed as a whole to retain its essential information. The goal of this project is to develop proper statistical methods and software to meet this need. The concepts and methods to be developed will provide vital tools for the analysis of complex data, with broad applications in both science and engineering. This project will not only advance statistical methodology and disseminate computing software but also offer solutions to important and challenging scientific and public health problems. Applications in understanding and diagnosing Alzheimer's disease and breast cancer are two examples of major public health significance. Furthermore, by engaging doctoral students, postdoctoral researchers, junior faculty members, and summer interns, the PI will use this project to train new generations of statisticians and data scientists.

To account for complex data structures and retain their essential information during statistical analysis, it is important to treat certain structures as the unit of observation. One example is the functional magnetic resonance imaging (fMRI) scan collected from a person at a given time. Such high-dimensional observations are referred to as tensors, which may or may not fit in a traditionally defined Euclidean space. This project aims to develop statistical methods that analyze tensors as data objects and classify such data objects in a possibly non-Euclidean space.
In particular, this project introduces the concept of ball impurity as a measure of heterogeneity in the distribution of complex data objects. It will investigate the use of ball impurity in developing tree-based methods to classify data objects in non-Euclidean spaces. Efforts will be made to gain an in-depth understanding of the theoretical and empirical properties of ball impurity, and software will be developed and distributed alongside the methods. The developed methods will be used to analyze large-scale public databases from the UK Biobank and dbGaP, such as the Human Connectome Project. Analyses of these important datasets can not only assess the utility of the novel methods but also lead to new and insightful scientific discoveries. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
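The abstract does not define ball impurity, so the sketch below does not implement it. As a hedged illustration only, it shows the general idea the abstract relies on: a heterogeneity measure for objects in a non-Euclidean space can be computed from pairwise distances alone and then used to score a candidate split in a tree. The sketch swaps in a much simpler stand-in measure, the mean pairwise distance. The objects are angles on the unit circle with arc-length distance, chosen here only as a small non-Euclidean example. All function names (circle_distance, mean_pairwise_distance, split_gain) are hypothetical.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def circle_distance(a: float, b: float) -> float:
    """Arc-length (geodesic) distance between two angles, in radians, on the unit circle."""
    diff = abs(a - b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)


def mean_pairwise_distance(objs: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Average of dist(x_i, x_j) over all ordered pairs with i != j.

    A distance-only dispersion used as a hypothetical stand-in for a
    heterogeneity measure; it is NOT the ball impurity proposed in the award.
    """
    n = len(objs)
    if n < 2:
        return 0.0  # zero or one object shows no heterogeneity
    total = sum(dist(objs[i], objs[j]) for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def split_gain(objs: Sequence[T], dist: Callable[[T, T], float], go_left: Sequence[bool]) -> float:
    """Parent heterogeneity minus the size-weighted heterogeneity of the two child nodes.

    Larger values mean the split yields more homogeneous groups; with this
    stand-in measure the value can be negative.
    """
    left = [o for o, g in zip(objs, go_left) if g]
    right = [o for o, g in zip(objs, go_left) if not g]
    n = len(objs)
    children = (len(left) / n) * mean_pairwise_distance(left, dist) + (
        len(right) / n
    ) * mean_pairwise_distance(right, dist)
    return mean_pairwise_distance(objs, dist) - children


# 0.1 and 2*pi - 0.1 are only 0.2 apart on the circle, though far apart as real numbers.
angles = [0.1, 2 * math.pi - 0.1, 3.0, 3.3]
print(split_gain(angles, circle_distance, [True, True, False, False]))  # positive: groups neighbours
print(split_gain(angles, circle_distance, [True, False, True, False]))  # negative: mixes clusters
```

The split that keeps 0.1 and 2π − 0.1 together scores higher because the measure uses only the circle's own distance. A rule that treated the angles as ordinary real numbers would place them far apart. Any distance function on matrices, curves, or other objects could be passed as dist in the same way.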
