Research Spending & Results

Award Detail

Doing Business As Name:Florida International University
  • Naphtali D Rishe
  • (305) 348-2025
  • Raju Rangaswami
  • Evangelos Christidis
Award Date:06/18/2008
Estimated Total Award Amount: $ NaN
Funds Obligated to Date: $ 232,000
  • FY 2008=$150,000
  • FY 2009=$66,000
  • FY 2010=$16,000
Start Date:06/15/2008
End Date:01/31/2011
Transaction Type:Grant
Awarding Agency Code:4900
Funding Agency Code:4900
CFDA Number:47.070
Primary Program Source:040100 NSF RESEARCH & RELATED ACTIVIT
Award Title or Description:Investigation of Geospatial Data Management on MapReduce Platform
Federal Award ID Number:0837716
DUNS ID:071298814
Parent DUNS ID:159621697

Awardee Location

Street:11200 SW 8TH ST
Awardee Cong. District:26

Primary Place of Performance

Organization Name:Florida International University
Street:11200 SW 8TH ST
Cong. District:26

Abstract at Time of Award

The High Performance Database Research Center at Florida International University is leveraging the Hadoop framework, which implements Google's MapReduce computational paradigm and provides distributed file system services, to serve geospatial imagery and to execute spatial queries with heterogeneous predicates. This work is laying the foundation for high-performance geospatial querying. For instance, a query such as "the percentage of Florida's land mass that has vegetation" can be computed using basic image processing (map operation) on each image tile, followed by a simple summation (reduce operation) across the tiles that comprise the aerial imagery of the Florida land mass. A potentially unbounded number of such semantic queries can thus be computed using the MapReduce paradigm and a large-scale raster imagery dataset. This exploratory work is providing a bridge between geospatial Web services and the MapReduce platform, which has demonstrated success in other data-intensive applications. This work is expected to have a major impact on the field of geospatial data management, and especially on decision support based on geospatial data, by enabling decision support queries that were not previously practical. This will provide a foundation for critical decision support applications in fields such as disaster mitigation and environmental protection. This work is also providing a uniquely comprehensive collection of geospatial data to a broad research community.
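The vegetation query described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the project's Hadoop code: the tile layout (a grid of RGB pixels), the toy `is_vegetation` rule, and all function names are assumptions made for illustration only.

```python
from functools import reduce

def is_vegetation(pixel):
    """Toy classifier (an assumption): green dominates red and blue."""
    r, g, b = pixel
    return g > r and g > b

def map_tile(tile):
    """Map step: count vegetation pixels and total pixels in one tile."""
    veg = sum(1 for row in tile for px in row if is_vegetation(px))
    total = sum(len(row) for row in tile)
    return (veg, total)

def reduce_counts(a, b):
    """Reduce step: sum the per-tile (vegetation, total) counts."""
    return (a[0] + b[0], a[1] + b[1])

def vegetation_percentage(tiles):
    """Combine the map and reduce steps into a single percentage."""
    veg, total = reduce(reduce_counts, map(map_tile, tiles), (0, 0))
    return 100.0 * veg / total if total else 0.0

# Two hypothetical 2x2 tiles standing in for aerial imagery.
GREEN, GRAY = (30, 200, 40), (120, 120, 120)
tiles = [
    [[GREEN, GREEN], [GRAY, GRAY]],  # 2 of 4 pixels are vegetation
    [[GREEN, GRAY], [GRAY, GRAY]],   # 1 of 4 pixels is vegetation
]
print(vegetation_percentage(tiles))  # prints 37.5
```

Because each tile is processed independently and the reduce step is a plain sum, the map calls could in principle run on separate nodes, which is the property the abstract relies on.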

Publications Produced as a Result of this Research


Ariel Cary, Ouri Wolfson, and Naphtali Rishe "Efficient and Scalable Method for Processing Top-k Spatial Boolean Queries" Lecture Notes in Computer Science: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM 2010), v.6187, 2010, p.87.

Publications Produced as Conference Proceedings

Cary, A; Sun, ZG; Hristidis, V; Rishe, N "Experiences on Processing Spatial Data with MapReduce" 21st International Conference on Scientific and Statistical Database Management, v.5566, 2009, p.302.

Project Outcomes Report


This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

MapReduce at Florida International University

NSF Program Director: Xiaoyang Wang

PI: Naphtali Rishe

Co-PIs: Vagelis Hristidis and Raju Rangaswami

Researchers at the NSF Industry-University Research Center CAKE at Florida International University have leveraged their Geospatial Data Server TerraFly project to deploy data and algorithms on the CluE infrastructure and to develop new algorithms with applications in geographic information retrieval, urban improvement, and disaster mitigation.

TerraFly users visualize and query aerial imagery and data layers.  Users virtually "fly" over imagery via a web browser, without any software to install or plug in.  Tools include user-friendly geospatial querying, data drill-down, interfaces with real-time data suppliers, demographic analysis, annotation, route dissemination via autopilots, customizable applications, production of aerial atlases, and an application programming interface (API) for web sites.

The TerraFly project has been featured on TV news programs (including FOX TV News) and in the worldwide press, with coverage by the New York Times, USA Today, NPR, and the journals Science and Nature.

The 40TB TerraFly data collection includes, among others, 1-meter aerial photography of almost the entire United States and 3-inch to 1-foot full-color recent imagery of major urban areas.  The TerraFly vector collection includes 400 million geolocated objects, 50 billion data fields, 40 million polylines, and 120 million polygons, including: all US and Canada roads, the US Census demographic and socioeconomic datasets, 110 million parcels with property lines and ownership data, 15 million records of businesses with company stats and management roles and contacts, 2 million physicians with expertise detail, various public place databases (including the USGS GNIS and NGA GNS), Wikipedia, extensive global environmental data (including daily feeds from NASA and NOAA satellites and the USGS water gauges), and hundreds of other datasets.

In the present project, we used MapReduce to execute and benchmark massive data computations in the GIS domain.

The specific problems that FIU’s team has addressed are:

  1. Scalability in geo-textual search algorithms is an important concern. We tackled the problem of clustering spatial objects in MapReduce to enhance the scalability of spatial searches with text constraints. We implemented a clustering algorithm inspired by X-means clustering, using a distance metric that accounts for spatial and non-spatial similarity at the same time. After clustering, independent spatio-textual indexes are created concurrently, one per cluster, in MapReduce. Our results show that better query processing performance can be attained under certain conditions. The clustering techniques have been refined to achieve higher clustering quality for better scalability.
  2. Non-Negative Matrix Factorization is a popular technique in data mining and machine learning. It has wide applications in environmetrics, image processing, text analysis, and bioinformatics. We have investigated the use of MapReduce to scale up Non-Negative Matrix Factorization algorithms to handle large-scale problems at terabyte or petabyte scale. The resulting algorithms are applicable to the 40TB aerial/satellite imagery collection in TerraFly, to extract useful information for environmental monitoring or urban planning. We have designed and implemented the multiplicative update rule of the algorithm in MapReduce. Our results show fair scalability of our implementation.
  3. The Shortest Paths problem has a long research history. Although exact algorithms have been developed, they are not of practical use for large scale...
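As an illustration of item 2, the sketch below shows one way the standard multiplicative update rule for Non-Negative Matrix Factorization (V ≈ WH) can be phrased as map and reduce steps. The report does not describe how the team partitioned the computation, so the row-wise scheme here is an assumption, as are all function names; the code is plain single-process Python rather than Hadoop code.

```python
from functools import reduce
import random

EPS = 1e-9  # keeps denominators away from zero

def outer(a, b):
    return [[x * y for y in b] for x in a]

def mat_add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def map_row(w_row, v_row):
    """Map step: one row's contribution to W^T V and to W^T W."""
    return outer(w_row, v_row), outer(w_row, w_row)

def reduce_sum(a, b):
    """Reduce step: element-wise sum of the partial matrices."""
    return mat_add(a[0], b[0]), mat_add(a[1], b[1])

def nmf_iteration(V, W, H):
    # H update: H <- H * (W^T V) / (W^T W H), built from per-row partial sums.
    WtV, WtW = reduce(reduce_sum, map(map_row, W, V))
    WtWH = matmul(WtW, H)
    H = [[h * n / (d + EPS) for h, n, d in zip(hr, nr, dr)]
         for hr, nr, dr in zip(H, WtV, WtWH)]
    # W update is row-local: W_i <- W_i * (V_i H^T) / (W_i H H^T).
    Ht = transpose(H)
    HHt = matmul(H, Ht)
    new_W = []
    for w, v in zip(W, V):
        num = matmul([v], Ht)[0]
        den = matmul([w], HHt)[0]
        new_W.append([wi * n / (d + EPS) for wi, n, d in zip(w, num, den)])
    return new_W, H

def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra)) ** 0.5

# Small random non-negative example: V is 6x5, factorization rank is 2.
random.seed(0)
V = [[random.random() for _ in range(5)] for _ in range(6)]
W = [[random.random() for _ in range(2)] for _ in range(6)]
H = [[random.random() for _ in range(5)] for _ in range(2)]
err_before = frobenius_error(V, W, H)
for _ in range(50):
    W, H = nmf_iteration(V, W, H)
err_after = frobenius_error(V, W, H)
print(err_before, err_after)
```

In this layout only the small k-by-n and k-by-k partial matrices need to be summed across workers, while each row of W can be updated where its row of V is stored. Whether the project's implementation followed this layout is not stated in the report.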
