Research Spending & Results

Award Detail

Awardee:RECTOR & VISITORS OF THE UNIVERSITY OF VIRGINIA
Doing Business As Name:University of Virginia Main Campus
PD/PI:
  • Aidong Zhang
  • (716) 696-2562
  • aidong@virginia.edu
Co-PD(s)/co-PI(s):
  • Kevin A Janes
Award Date:09/15/2019
Estimated Total Award Amount: $ 342,000
Funds Obligated to Date: $ 168,589
  • FY 2019=$168,589
Start Date:09/01/2019
End Date:08/31/2021
Transaction Type:Grant
Agency:NSF
Awarding Agency Code:4900
Funding Agency Code:4900
CFDA Number:47.070
Primary Program Source:040100 NSF RESEARCH & RELATED ACTIVIT
Award Title or Description:Collaborative Research: Knowledge Guided Machine Learning: A Framework for Accelerating Scientific Discovery
Federal Award ID Number:1934600
DUNS ID:065391526
Parent DUNS ID:065391526
Program:HDR-Harnessing the Data Revolu
Program Officer:
  • Eva Zanzerkia
  • (703) 292-4734
  • ezanzerk@nsf.gov

Awardee Location

Street:P.O. BOX 400195
City:CHARLOTTESVILLE
State:VA
ZIP:22904-4195
County:Charlottesville
Country:US
Awardee Cong. District:05

Primary Place of Performance

Organization Name:University of Virginia Main Campus
Street:85 Engineer's Way
City:Charlottesville
State:VA
ZIP:22904-4745
County:Charlottesville
Country:US
Cong. District:05

Abstract at Time of Award

The success of machine learning (ML) in many applications where large-scale data is available has led to a growing anticipation of similar accomplishments in scientific disciplines. The use of data science is particularly promising in scientific problems involving processes that are not completely understood. However, a purely data-driven approach to modeling a physical process can be problematic. For example, it can create a complex model that is neither generalizable beyond the data on which it was trained nor physically interpretable. This problem becomes worse when there is not enough training data, which is quite common in science and engineering domains. A machine learning model that is grounded in explainable theories stands a better chance of avoiding spurious patterns in the data that lead to non-generalizable performance. This is especially important when dealing with problems that are critical and associated with high risks (e.g., extreme weather or collapse of an ecosystem). Hence, neither an ML-only nor a scientific knowledge-only approach can be considered sufficient for knowledge discovery in complex scientific and engineering applications. This project is developing novel techniques to explore the continuum between knowledge-based and ML models, where both scientific knowledge and data are integrated synergistically. Such integrated methods have the potential to accelerate discovery in a range of scientific and engineering disciplines. This project will train interdisciplinary scientists who are well versed in such methods and will disseminate results of the project via peer-reviewed publications, open-source software, and a series of workshops to engage the broader scientific community. This project aims to develop a framework that uses the unique capability of data science models to automatically learn patterns and models from data, without ignoring the wealth of accumulated scientific knowledge.
Specifically, the project builds the foundations of knowledge-guided machine learning (KGML) by exploring several ways of bringing scientific knowledge and machine learning models together using pilot applications from four domains: aquatic ecodynamics, climate and weather, hydrology, and translational biology. These pilot applications were selected because they are at tipping points where knowledge-guided machine learning can have a transformative effect. KGML has the potential to provide scientists and engineers with new insights into their domains of interest and will require the development of new machine learning approaches and architectures that can incorporate scientific principles. Scientific knowledge, KGML methods, and software developed in this project could be extended to a wide range of scientific applications where mechanistic (also known as process-based) models are used. This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.