Here are some of my current sponsored projects:
NSF CAREER: Reference Resolution for Natural Language Understanding
A major obstacle to building robust systems that extract and interpret information,
summarize texts, and answer questions from them is the need to identify the entities referred to by pronouns or other referential expressions. This
project extends the PI's prior work involving the development of an empirical reference resolution system that relies on several sets of heuristics
that correspond to various forms of reference. In particular, the framework will be extended to learn semantic knowledge that supports
consistency checks. This enhancement will provide high-precision reference resolution and also substantially improve the recall of referential
links. The research will be evaluated using reference-annotated texts and the Penn Treebank corpora. The outcome will be a corpus-based
method for reference resolution for both pronouns and nominal expressions. First, the semantics of all referential noun phrases will be captured.
Then, by extending the empirical environment with bootstrapping, this reference resolution technique should lead to a powerful tool capable of
resolving reference correctly in a large variety of texts. Finally, the tool will be incorporated both in an information extraction system and in a
question answering system, to measure its contribution to the overall performance of these systems. The proposed research departs from
previous approaches to reference resolution in that it promotes data-driven techniques instead of relying on combinations of linguistic and
cognitive aspects of language. Preliminary results indicate that the immediate practical outcome should be a substantial improvement in recall. This research is sponsored by the National Science Foundation.
PI: Dr. Sanda Harabagiu
|
ARDA AQUAINT: Computational Implicatures for Advanced Question Answering
The ability to interpret question implicatures is an important feature of
advanced Question Answering systems. When using a
Question Answering system to find information, a professional analyst
cannot separate his or her intentions and beliefs from the formulation of
the question and therefore incorporates those intentions and beliefs into
the query. Moreover, beyond the question, the analyst
sometimes makes a proposal or an assertion. This implied information,
not recognizable at the syntactic or semantic level, has great
importance in the interpretation of a question, and therefore in the
quality of the answers returned by a Question Answering system. This project is concerned
with the study and development of computational methods that enable the coercion of
implicatures in the context of advanced Question Answering. This project is sponsored by
ARDA.
PI: Dr. Sanda Harabagiu.
|
ARP: Knowledge Mining for Open-Domain Information Extraction
Access to information from large-scale on-line text
collections is currently limited largely to keyword-based searches that retrieve
entire documents or passages containing the query keywords. While such
tools are often satisfactory for retrieving information on general
topics, they provide little support for accessing information
involving specific relationships, events or facts.
Information Extraction (IE) technology enables the generation of
structured, tabular representations of selected relations from large
text collections, representations that can support more detailed
document querying. However, IE systems rely on domain knowledge
and thus require customization every time a new topic is considered. This
explains why, until now, developing extraction systems for a broad
range of relations spanning a large number of semantic domains has
been too expensive and time-consuming. This research is concerned with the
development of the infrastructure that enables open-domain IE. This research is sponsored by the Advanced Research Program of the Texas Higher Education Coordinating Board.
PI: Dr. Sanda Harabagiu
|
NSF CADRE: A Tool for Transforming WordNet into a Core Knowledge Base
This project extends a popular database of English words to make it more useful in such tasks as question answering, information retrieval, and
summarization. WordNet is a lexical database for English that has been widely adopted in artificial intelligence and computational linguistics for a
variety of practical applications. The basic elements of WordNet are sets of words that are linked according to semantic relations: synonymy,
antonymy, super-ordination, and so forth. WordNet is publicly available and widely used, and it is currently being transformed into a multilingual database.
This project develops a set of tools that can be applied to current and future versions of WordNet to extend it for knowledge processing
applications. The extensions enhance the glosses, which
currently contain definitions, comments, and examples.
The enhanced glosses are part-of-speech tagged, syntactically parsed
and semantically disambiguated. In addition, topically related words are clustered by
lexical chains generated on the extended WordNet. This research is sponsored by the National Science Foundation.
PI: Dr. Dan Moldovan, Co-PI: Dr. Sanda Harabagiu.