Research Spending & Results

Award Detail

Doing Business As Name:University of Utah
  • Anna V Little
  • (919) 605-6372
Award Date:05/04/2021
Estimated Total Award Amount: $ 150,000
Funds Obligated to Date: $ 103,553
  • FY 2019=$103,553
Start Date:01/01/2021
End Date:06/30/2022
Transaction Type:Grant
Awarding Agency Code:4900
Funding Agency Code:4900
CFDA Number:47.049
Primary Program Source:040100 NSF RESEARCH & RELATED ACTIVIT
Award Title or Description:Collaborative Research: Data-driven Path Metrics for Machine Learning
Federal Award ID Number:2131292
DUNS ID:009095365
Parent DUNS ID:009095365
Program Officer:
  • Yuliya Gorb
  • (703) 292-2113

Awardee Location

Street:75 S 2000 E
County:Salt Lake City
Awardee Cong. District:02

Primary Place of Performance

Organization Name:University of Utah
City:Salt Lake City
County:Salt Lake City
Cong. District:02

Abstract at Time of Award

The era of big data has introduced unprecedented computational and mathematical challenges. Traditional machine learning algorithms often lack scalable computational complexity, while modern approaches lack solid mathematical foundations. Moreover, high data dimensionality creates challenges for traditional methods of data analysis. The principal investigators (PIs) propose to combine classic dimension reduction methods with data-driven distances, so that both the distance and the embedding procedure are data dependent. This novel approach allows for greater flexibility in balancing the density-based and geometric features of the data, achieves a density-based simplification of geometry, and insightfully represents the data in a small number of dimensions. In contrast to black-box methods such as deep learning, the developed methodology can be rigorously analyzed to derive strong theoretical guarantees for several statistical and machine learning tasks. This research will contribute computational tools for cancer immunogenomics, and the investigators will consult with the Rogel Cancer Center at the University of Michigan on scientific questions related to tumor immunology and T-cell biology. In addition, new data analysis tools will be made publicly available in an open-source software package.

The investigators' approach is driven by the analysis of a family of data-dependent path metrics. These metrics are both density-sensitive and geometry-preserving, with the balance governed by the choice of a single parameter p. By utilizing the space of paths through data, the PIs will obtain density-based metrics and embeddings while avoiding the explicit computation of a density estimator, which may be unreliable in a large number of dimensions. The PIs will propose a simple yet highly flexible data model which does not assume the data is sampled from a manifold or collection of manifolds, and will investigate the continuous limit of these metrics and an associated graph Laplacian operator. By continuously varying the parameter p, the PIs propose to create data videos which represent the data from multiple perspectives. The PIs will investigate both multidimensional scaling and graph Laplacian embeddings as mechanisms for obtaining path-based low-dimensional representations, and will explore fast algorithms with scalable computational complexity for approximating these metrics. The PIs will contextualize path metrics in the larger framework of data-driven metrics and focus specifically on the analysis of biological data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
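
The abstract does not state a formula for the path metrics, so the following is only an illustrative sketch under an assumed definition: the distance between two data points is the minimum, over all paths through the data, of the p-th root of the sum of p-th powers of the Euclidean step lengths. The function name `path_metric`, the toy data, and the brute-force Floyd-Warshall search are all hypothetical choices for illustration, not the investigators' algorithm or software.

```python
import math

def path_metric(points, p):
    """All-pairs path distances on the complete graph over `points`.

    Assumed definition (not given in the abstract):
        l_p(x, y) = min over paths x = x_0, ..., x_k = y
                    of (sum_i ||x_{i+1} - x_i||^p)^(1/p)
    """
    n = len(points)
    # Edge costs: Euclidean distance raised to the power p.
    cost = [[math.dist(a, b) ** p for b in points] for a in points]
    # Floyd-Warshall shortest paths: O(n^3), suitable only for toy inputs.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = cost[i][k] + cost[k][j]
                if via < cost[i][j]:
                    cost[i][j] = via
    # Undo the power so the result is in the units of the original distance.
    return [[c ** (1.0 / p) for c in row] for row in cost]

# Toy data: a densely sampled segment from (0, 0) to (1, 0),
# plus one isolated point above its middle.
chain = [(0.1 * i, 0.0) for i in range(11)]
points = chain + [(0.5, 1.0)]

d1 = path_metric(points, 1)  # p = 1 recovers the Euclidean distance
d2 = path_metric(points, 2)  # p = 2 rewards many short hops through dense regions
```

In this toy example, the two ends of the dense segment are at Euclidean distance 1, but at p = 2 their path distance drops to about 0.316 because the path can take ten short hops. The isolated point, which can only be reached by one long hop, stays at roughly 1.02 from the origin. That contrast is the density sensitivity the abstract attributes to the parameter p.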

Publications Produced as a Result of this Research

Hirn, Matthew and Little, Anna "Wavelet invariants for statistically robust multi-reference alignment" Information and Inference: A Journal of the IMA, 2020.

Little, Anna and Maggioni, Mauro and Murphy, James M "Path-Based Spectral Clustering: Guarantees, Robustness to Outliers, and Fast Algorithms" Journal of Machine Learning Research, v.21, 2020.