Research Spending & Results

Award Detail

Doing Business As Name:Kansas State University
PD/PI:
  • Cornelia Caragea
  • (814) 308-4974
Award Date:12/01/2017
Estimated Total Award Amount: $ 54,960
Funds Obligated to Date: $ 64,960
  • FY 2018=$10,000
  • FY 2016=$54,960
Start Date:10/26/2017
End Date:08/31/2019
Transaction Type:Grant
Awarding Agency Code:4900
Funding Agency Code:4900
CFDA Number:47.070
Primary Program Source:040100 NSF RESEARCH & RELATED ACTIVIT
Award Title or Description:III: Small: Collaborative Research: Keyphrase Extraction in Document Networks
Federal Award ID Number:1813571
DUNS ID:929773554
Parent DUNS ID:041146432
Program Officer:
  • Maria Zemankova
  • (703) 292-8930

Awardee Location

Awardee Cong. District:01

Primary Place of Performance

Organization Name:Kansas State University
Cong. District:01

Abstract at Time of Award

Keyphrases concisely describe a document using a small set of phrases (i.e., sequences of contiguous words in the document). For example, the keyphrases "social networks" and "interest targeting" quickly provide a high-level topic description (i.e., a summary) of a document about targeting user interests to recommend products, news, and other services in the context of social networks. Given today's very large document collections, keyphrases are important not only for summarizing a document but also for searching and retrieving relevant information. However, keyphrases are not always directly available; they must instead be gleaned from the many details in a document. This project addresses the problem of automatic keyphrase extraction from research papers, which enable the sharing and dissemination of scientific discoveries. The goal of the project is to explore accurate approaches that automatically discover and extract keyphrases using document networks, helping users handle and digest more information in less time in the era of "big data". Educationally, this research will involve training both graduate and undergraduate students in keyphrase extraction, an active research area with high impact in many real-world applications such as online advertising; document categorization, recommendation, and summarization; Web search and discovery; and topic tracking in newswire. Although much research to date has been done on automatic keyphrase extraction, no previous approaches have captured the impact of documents on one another via the citation relation that connects documents in a network. This project will investigate models that take into consideration the linkage between citing and cited documents in a document network and will explore various qualitative and quantitative aspects of the question: "What are the key phrases or concepts in a document?"
Scalable iterative algorithms will be designed and developed that capture different aspects of documents (e.g., topics or concepts), as well as the impact of one document on another (e.g., influence or topic evolution) in a document network. The results of this research will have a direct pipeline to the CiteSeerX digital library. The software, tools, and benchmark datasets developed during the course of this project will be broadly disseminated via the project website. All findings will be shared with the research community through publications in academic journals and presentations at Information Retrieval, Text Mining, and Natural Language Processing conferences.
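
As a purely illustrative sketch (not the project's actual method, which the abstract does not specify), the general idea of an iterative ranking algorithm that also draws on citation-linked documents can be shown with a TextRank-style word graph. Everything here is an assumption made for illustration: the function names, the stopword list, the co-occurrence window, and the `neighbor_weight` that down-weights text from citing or cited documents are all hypothetical.

```python
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "from",
             "and", "to", "is", "are", "we", "with", "by"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def add_cooccurrence_edges(graph, tokens, window, weight):
    """Link each word to the next (window - 1) words with the given weight."""
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if u != v:
                graph[u][v] += weight
                graph[v][u] += weight


def rank_words(target_text, neighbor_texts, window=3,
               neighbor_weight=0.5, damping=0.85, iterations=50):
    """Weighted PageRank over a word graph built from a target document
    plus text from documents linked to it by citation (hypothetical input)."""
    graph = defaultdict(lambda: defaultdict(float))
    add_cooccurrence_edges(graph, tokenize(target_text), window, 1.0)
    for text in neighbor_texts:
        add_cooccurrence_edges(graph, tokenize(text), window, neighbor_weight)

    nodes = list(graph)
    if not nodes:
        return {}
    scores = {w: 1.0 / len(nodes) for w in nodes}
    out_weight = {w: sum(graph[w].values()) for w in nodes}

    for _ in range(iterations):
        updated = {}
        for w in nodes:
            # Each neighbor u passes on its score in proportion to edge weight.
            incoming = sum(scores[u] * graph[u][w] / out_weight[u]
                           for u in graph[w])
            updated[w] = (1 - damping) / len(nodes) + damping * incoming
        scores = updated
    return scores


def top_keywords(target_text, neighbor_texts, k=5):
    """Return the k highest-scoring words that occur in the target document."""
    scores = rank_words(target_text, neighbor_texts)
    target_vocab = set(tokenize(target_text))
    ranked = sorted((w for w in scores if w in target_vocab),
                    key=lambda w: (-scores[w], w))
    return ranked[:k]
```

In this sketch, neighbor documents only reinforce words; candidates are still restricted to the target document's vocabulary, so the result remains an extraction rather than a generation. A call such as `top_keywords(paper_text, [citing_context_1, citing_context_2])` would return single words; scoring multi-word phrases (e.g., by summing the scores of adjacent words) is left out for brevity.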
