Paper Citation Analysis

Project report

The goal of the project is to implement a citation analysis system with a graphical user interface for papers in the areas of artificial intelligence and machine learning. The main functionality is topic clustering and the tracking of references between papers. The idea of the interface is to represent search results as cluster diagrams, so that the user can pick a cluster and either see the documents in it, sorted by a relevance measure, or enter the cluster recursively to see its internal structure. To this end we downloaded papers (PDF documents) and bibliographic information from CiteSeerX, converted the PDF documents into plain text for ease of indexing, extracted and organized the citations from each document and its metadata, and created two indexes, one for documents and one for citations. When a user enters a query, the search results are clustered using the Lingo clustering algorithm from Carrot2. The resulting clusters are presented to the user as clickable objects, which allow the user to navigate the cluster hierarchy recursively.
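The two-index design described above can be illustrated with a minimal sketch. It is not the project's implementation: the InvertedIndex class, the citing_papers helper, the record fields, and the sample papers are all hypothetical, and a toy in-memory index stands in for whatever indexing library the system actually uses. The sketch only shows how keeping citations in their own index makes it possible to look up which papers cite a given work.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy in-memory index (an assumption): term -> set of item ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, item_id, text):
        for term in tokenize(text):
            self.postings[term].add(item_id)

    def search(self, query):
        """Return the ids that contain every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return set()
        hits = set(self.postings.get(terms[0], ()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return hits


# Hypothetical sample records; in the real system the text comes from the
# converted PDFs and the citations from the extraction step.
papers = {
    "p1": {
        "text": "Support vector machines for text classification",
        "citations": ["Vapnik. Statistical learning theory"],
    },
    "p2": {
        "text": "Clustering search results with singular value decomposition",
        "citations": [
            "Vapnik. Statistical learning theory",
            "Salton. Introduction to modern information retrieval",
        ],
    },
}

doc_index = InvertedIndex()       # one entry per paper
citation_index = InvertedIndex()  # one entry per (paper, citation position)

for paper_id, record in papers.items():
    doc_index.add(paper_id, record["text"])
    for position, citation in enumerate(record["citations"]):
        citation_index.add((paper_id, position), citation)


def citing_papers(query):
    """Return the ids of papers with at least one citation matching the query."""
    return sorted({paper_id for paper_id, _ in citation_index.search(query)})


print(doc_index.search("clustering"))
print(citing_papers("vapnik"))
```

Keying the citation index by (paper, position) pairs rather than by paper id alone keeps each citation a separate searchable item, so a multi-term query only matches when all terms occur in the same citation.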