Research Spending & Results

Award Detail

Doing Business As Name:Harvard University
  • Edward Kohler
  • (617) 496-2630
Award Date:11/30/2017
Estimated Total Award Amount: $ 400,000
Funds Obligated to Date: $ 123,344
  • FY 2018=$123,344
Start Date:01/15/2018
End Date:12/31/2021
Transaction Type:Grant
Awarding Agency Code:4900
Funding Agency Code:4900
CFDA Number:47.070
Primary Program Source:040100 NSF RESEARCH & RELATED ACTIVIT
Award Title or Description:CSR: Medium: Collaborative Research: Soup: Flexible Storage and Processing for On-Line Applications
Federal Award ID Number:1704376
DUNS ID:082359691
Parent DUNS ID:001963263
Program Officer:
  • Samee Khan
  • (703) 292-8950

Awardee Location

Awardee Cong. District:05

Primary Place of Performance

Organization Name:Harvard University
Street:1033 Massachusetts Ave
Cong. District:05

Abstract at Time of Award

The project aims to build a new kind of storage system for use in busy web sites, combining high performance with ease of programming. The project's key idea is to ask a web site's developers to declare in advance all the ways in which the web site will need to retrieve and process data. This allows the database to prepare all the required outputs in advance, and keep these outputs up to date as new data is inserted into the database. The result is that the web site can read data (and thus generate web pages) efficiently.

The project prototype, called Soup, uses a data-flow graph to keep materialized views up to date as database writes arrive; these views hold the results for the web site software's pre-declared queries. However, as the web site software evolves, it will change the set of queries it needs. Soup uses several novel techniques to handle these changes efficiently: re-use of state across successive versions of the data-flow graph, and partial materialization of views and internal data-flow state. Soup supports transactions by combining optimistic concurrency control with data-flow, and allows scale-up of throughput by spreading data and computation over multiple servers.

Web sites are an important part of modern life, and an enormous effort is invested in building and maintaining them. This effort could be significantly reduced if storage systems were better matched to the needs of web sites. Soup will provide this better match, by combining the ease of use of relational databases with much-increased speed and efficiency.

The project's main results will be a prototype implementation, along with sample applications, documentation, and research papers. The code (Soup and sample applications) will be maintained on GitHub, where anyone can examine and fetch the most recent versions. Documentation will also be maintained on GitHub, and papers will be available on the project web site.
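
The data-flow idea described in the abstract can be illustrated with a minimal sketch. This is not Soup's code or API: the class names `VotesTable` and `CountView`, the vote-counting query, and the delta format are all hypothetical, chosen only to show how a pre-declared query's result might be kept current as writes arrive, so that reads become simple lookups.

```python
from collections import defaultdict


class CountView:
    """Hypothetical materialized view: number of votes per story."""

    def __init__(self):
        self.counts = defaultdict(int)

    def on_delta(self, story_id, sign):
        # sign is +1 for an inserted vote and -1 for a deleted vote.
        self.counts[story_id] += sign
        if self.counts[story_id] == 0:
            # Drop empty entries so the view holds only non-zero counts.
            del self.counts[story_id]

    def read(self, story_id):
        # A read is a lookup; no scan of the base table is needed.
        return self.counts.get(story_id, 0)


class VotesTable:
    """Hypothetical base table that pushes each write to subscribed views."""

    def __init__(self):
        self.rows = set()
        self.subscribers = []

    def subscribe(self, view):
        # Bring a newly declared view up to date with existing rows.
        for _user, story_id in self.rows:
            view.on_delta(story_id, +1)
        self.subscribers.append(view)

    def insert(self, user, story_id):
        row = (user, story_id)
        if row in self.rows:
            return  # duplicate vote: no change, so no delta is sent
        self.rows.add(row)
        for view in self.subscribers:
            view.on_delta(story_id, +1)

    def delete(self, user, story_id):
        row = (user, story_id)
        if row not in self.rows:
            return  # nothing to delete, so no delta is sent
        self.rows.remove(row)
        for view in self.subscribers:
            view.on_delta(story_id, -1)


if __name__ == "__main__":
    votes = VotesTable()
    view = CountView()
    votes.subscribe(view)
    votes.insert("alice", 1)
    votes.insert("bob", 1)
    print(view.read(1))
```

In this sketch the cost of maintaining the count is paid on each write, which mirrors the trade-off the abstract describes: writes do extra work so that page-generating reads stay cheap. A real system along these lines would also need the features the abstract lists (changing queries, partial materialization, transactions, and distribution), none of which are modeled here.
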
We intend to maintain the project repository for at least five years beyond the end of the project. All of these resources will be available from the project web page: