

Research Spending & Results

Award Detail

Doing Business As Name:Tulane University
PD/PI:
  • Brian Summa
  • (504) 865-4000
Award Date:07/22/2021
Estimated Total Award Amount: $ 180,000
Funds Obligated to Date: $ 180,000
  • FY 2021=$180,000
Start Date:10/01/2021
End Date:09/30/2023
Transaction Type:Grant
Awarding Agency Code:4900
Funding Agency Code:4900
CFDA Number:47.070
Primary Program Source:040100 NSF RESEARCH & RELATED ACTIVIT
Award Title or Description:EAGER: Scalable, Content-Based, Domain-Agnostic Search of Scientific Data through Concise Topological Representations
Federal Award ID Number:2136744
DUNS ID:053785812
Parent DUNS ID:053785812
Program:Info Integration & Informatics
Program Officer:
  • Hector Munoz-Avila
  • (703) 292-4481

Awardee Location

County:New Orleans
Awardee Cong. District:01

Primary Place of Performance

Organization Name:Tulane University
Street:6823 St Charles Avenue
City:New Orleans
County:New Orleans
Cong. District:01

Abstract at Time of Award

Cutting-edge science relies on scientists’ ability to sift through and access the massive amounts of data being produced by the latest research. Much of that data is stored in online databases and is searchable only through specific scientific terms, such as keywords, tags, or descriptions. Researchers who don’t know exactly the right terms often can’t find all the data that might be useful for their work. By applying mathematical approaches for information retrieval in a new way, this project will study whether a powerful search tool, content-based search, can be adapted to scientific data. If successful, this project will free data users from needing to know exactly which keywords to use, transforming how scientists access and share data and creating new opportunities for scientists with vastly different expertise to work together. One particularly promising way to describe the content of scientific data is through a dataset’s topology. Therefore, this project will develop approaches to computing topological similarity that are smaller, faster, and more scalable than previously thought possible, with the goal of enabling cross-cutting, content-based search of scientific data. Specifically, the investigators will develop a learned hash function that converts a dataset’s persistence diagram (the common encoding of its topology) into a simple binary code. The hash will be trained so that the bitwise (Hamming) distance between codes preserves a measure of topological similarity between datasets. This will turn topological comparison from an expensive bottleneck into an operation with nominal processing cost that can scale to large database queries. Initially, the project will focus on binary codes that preserve clusters and neighborhoods, ultimately developing codes that are rank-preserving or semi-metric-preserving.
The investigators will also explore strategies for training a learned hash function on synthetic data, with the goal of developing a fully domain-oblivious approach to content-based search.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
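The pipeline the abstract describes (persistence diagram, then binary code, then bitwise comparison) can be illustrated with a small sketch. It rests on several assumptions that are not part of the award: the data are 1D signals, whose 0-dimensional sublevel-set persistence diagram is computed with union-find; each diagram is vectorized as a Betti curve on a hypothetical threshold grid; and, in place of the project's learned hash, it uses an untrained random-hyperplane (SimHash) projection, so the bits preserve the angular similarity of Betti curves rather than any trained topological measure. All function names (`persistence_0d`, `betti_curve`, `make_random_hash`, `hamming`, `search`) are hypothetical.

```python
import math
import random


def persistence_0d(signal):
    """0-dimensional sublevel-set persistence of a non-empty 1D signal.

    Returns (birth, death) pairs; the oldest component never dies (death = inf).
    """
    n = len(signal)
    order = sorted(range(n), key=lambda i: signal[i])
    parent = [-1] * n   # -1 marks a sample not yet in the filtration
    birth = [0.0] * n   # birth value of the component rooted at each index

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        birth[i] = signal[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                if birth[ri] < birth[rj]:
                    ri, rj = rj, ri        # elder rule: ri is the younger component
                if signal[i] > birth[ri]:  # skip zero-persistence pairs
                    pairs.append((birth[ri], signal[i]))
                parent[ri] = rj
    pairs.append((birth[find(order[0])], math.inf))
    return pairs


def betti_curve(pairs, grid):
    """Fixed-length vector: number of features alive at each threshold t."""
    return [sum(1 for b, d in pairs if b <= t < d) for t in grid]


def make_random_hash(dim, n_bits, seed=0):
    """Random-hyperplane (SimHash) encoder; a stand-in for a *learned* hash."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def hash_vector(vec):
        code = 0
        for k, w in enumerate(planes):
            if sum(wi * vi for wi, vi in zip(w, vec)) >= 0:
                code |= 1 << k  # bit k records which side of hyperplane k
        return code

    return hash_vector


def hamming(a, b):
    """Bitwise (Hamming) distance between two integer-packed codes."""
    return bin(a ^ b).count("1")


def search(query_code, db_codes, k=3):
    """Indices of the k database codes nearest the query (linear scan for clarity)."""
    return sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))[:k]


# Usage: encode a tiny hypothetical "database" of signals and query it.
grid = [0.5 * t for t in range(10)]  # hypothetical threshold grid
encode = make_random_hash(len(grid), n_bits=32, seed=42)


def code_of(signal):
    return encode(betti_curve(persistence_0d(signal), grid))


database = [[0, 3, 1, 4, 2], [0, 1, 2, 3, 4], [4, 0, 4, 0, 4, 0]]
db_codes = [code_of(s) for s in database]
print(search(code_of([0, 3, 1, 4, 2]), db_codes, k=1))  # [0]: identical signal
```

In this sketch each comparison is a single XOR plus a popcount on an integer, which is the kind of cost reduction the abstract targets. A trained hash would replace `make_random_hash`, while the encoding and search steps could stay the same.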
