Dr. Laitfur Khan and his group of research students conduct top-notch research in data mining, multimedia information management, the Semantic Web, and database systems.

Data Mining

In data mining, we have considered some very challenging and far-reaching clustering and classification problems. Automatic clustering can properly be described as undirected knowledge discovery, or unsupervised learning. The function of a clustering algorithm in an unsupervised learning problem is to identify an optimal partition of a data set, and unsupervised (or self-organizing) learning can be thought of as the development of a suitable clustering for that data set. Recently, we introduced a promising new tree-structured self-organizing neural network called the dynamical growing self-organizing tree (DGSOT). The DGSOT algorithm constructs a hierarchy from top to bottom by division, and it overcomes the drawbacks of traditional hierarchical clustering algorithms (e.g., hierarchical agglomerative clustering). DGSOT has been tested on several domains, including text data and complex bioinformatics data (in the form of 112 rat central nervous system and 3000 gene microarray expression data), and has demonstrated impressive results.
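The idea of building a hierarchy from top to bottom by division can be illustrated with a minimal sketch. This is not the DGSOT algorithm itself: it swaps in a simple recursive two-way split (a deterministic 2-means) as a stand-in, and the function names, the seeding rule, and the `max_leaf_size` stopping rule are all assumptions made for illustration.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def bisect(points, iterations=10):
    """Split points into two groups with a deterministic 2-means (assumed stand-in)."""
    # Seed with the point farthest from the centroid, then the point farthest from that seed.
    c = centroid(points)
    a = max(points, key=lambda p: math.dist(p, c))
    b = max(points, key=lambda p: math.dist(p, a))
    left, right = points, []
    for _ in range(iterations):
        left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
        if not left or not right:
            break  # all points coincide with one seed; nothing to split
        a, b = centroid(left), centroid(right)
    return left, right

def build_tree(points, max_leaf_size=2):
    """Grow the hierarchy top-down: keep dividing a node until it is small enough."""
    node = {"centroid": centroid(points), "points": points, "children": []}
    if len(points) > max_leaf_size:
        left, right = bisect(points)
        if left and right:
            node["children"] = [build_tree(left, max_leaf_size),
                                build_tree(right, max_leaf_size)]
    return node

def leaves(node):
    """Collect the point groups stored at the leaves, left to right."""
    if not node["children"]:
        return [node["points"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]

data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
tree = build_tree(data, max_leaf_size=3)
print(len(leaves(tree)), "leaf clusters")
```

Each node keeps its centroid, so the tree can later be cut at any depth to obtain a coarser or finer clustering.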

For classification, many algorithms are available (e.g., Bayesian networks, decision trees, and so on), but among the most effective is the support vector machine (SVM). However, the training time for an SVM is at least O(N^2) in the dataset size N, which makes it unfavorable for large datasets. By clustering a large dataset, we can not only reduce its size but also choose the most qualified (representative) data points from the whole dataset. Hence, we train the SVM on the clusters' references, which are far fewer than the original data points. Doing so reduces the training time of the SVM, and we gradually improve the accuracy of the classifier by using the hierarchical information and de-clustering the clusters close to the boundaries on the fly.
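The reduction step described above can be sketched as follows. This is a minimal illustration, not the group's implementation: it uses a plain k-means per class in place of the hierarchical clustering, and the names `kmeans`, `cluster_references`, and `k_per_class` are hypothetical. The resulting (reference, label) pairs would then be passed to any SVM trainer.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means seeded with k evenly spaced input points (deterministic)."""
    step = len(points) / k
    centers = [list(points[int(i * step)]) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def cluster_references(samples, labels, k_per_class):
    """Replace each class by at most k cluster centers; returns (reference, label) pairs."""
    reduced = []
    for label in sorted(set(labels)):
        members = [s for s, l in zip(samples, labels) if l == label]
        k = min(k_per_class, len(members))
        reduced += [(center, label) for center in kmeans(members, k)]
    return reduced

# Two synthetic classes of 100 points each, reduced to 4 references per class.
samples = [[i * 0.01, 0] for i in range(100)] + [[i * 0.01, 5] for i in range(100)]
labels = [0] * 100 + [1] * 100
references = cluster_references(samples, labels, k_per_class=4)
print(len(samples), "points ->", len(references), "references")
```

In this toy case 200 points shrink to 8 references, so a trainer whose cost grows as N^2 would do on the order of (200/8)^2 = 625 times less work, at the price of a coarser view of the data until clusters near the boundary are expanded again.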

 

Intrusion Detection

With regard to intrusion detection, we propose a scalable solution that uses DGSOT together with a support vector machine (SVM) for network-based anomaly detection. The SVM is one of the most successful classification algorithms in the data mining area, but its long training time limits its use. We present a study on reducing the training time of SVM, specifically for large data sets, using hierarchical clustering analysis. We use DGSOT for clustering because it overcomes the drawbacks of traditional hierarchical clustering algorithms (e.g., hierarchical agglomerative clustering). Clustering analysis helps find the boundary points between two classes, which are the most qualified data points for training the SVM.
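One simple way to picture "boundary points" is to rank cluster references by their distance to the nearest reference of the other class. The sketch below does exactly that; the ranking rule, the `fraction` parameter, and the function name are assumptions for illustration, since the text does not specify how boundary points are selected.

```python
import math

def boundary_references(references, labels, fraction=0.5):
    """Return indices of the references that lie nearest to the opposite class.

    Assumes at least two distinct labels are present.
    """
    gaps = []
    for i, (ref, label) in enumerate(zip(references, labels)):
        # Distance from this reference to the closest reference of another class.
        gap = min(math.dist(ref, other)
                  for other, other_label in zip(references, labels)
                  if other_label != label)
        gaps.append((gap, i))
    keep = max(1, round(fraction * len(references)))
    return sorted(i for _, i in sorted(gaps)[:keep])

references = [[0, 0], [4, 0], [6, 0], [10, 0]]
labels = ["normal", "normal", "attack", "attack"]
print(boundary_references(references, labels))
```

References selected this way are candidates either for training the SVM directly or for being expanded into their member points before training.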

 

Multimedia Information Management

We propose a novel framework for semantic image annotation. An efficient image annotation and retrieval system is highly desired. Clustering algorithms make it possible to represent the visual features of images with a finite set of symbols. Building on this, many statistical models have been published that analyze the correspondence between visual features and words and discover hidden semantics. These models improve the annotation and retrieval of large image databases. However, image data usually have a large number of dimensions. Traditional clustering algorithms assign equal weights to all of these dimensions and become confounded when dealing with them. We propose a weighted feature selection algorithm as a solution to this problem. For a given cluster, we determine the relevant features based on histogram analysis and assign greater weight to relevant features than to less relevant ones. Furthermore, we extend our approach to capture dependence between neighboring objects/contexts to improve annotation accuracy. We implemented various models that link visual tokens with keywords based on the clustering results of the K-means algorithm, with and without weighted feature selection, and evaluated their performance in terms of precision and recall on a benchmark dataset. The results show that clustering with weighted feature selection performs better than the traditional, unweighted approach for automatic image annotation and retrieval.
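A rough sketch of per-cluster feature weighting follows. The scoring rule is an assumption: it treats a feature as relevant when its histogram within the cluster is sharply peaked, and uses the fraction of members in the most populated bin as the weight. The function names, the bin count, and the assumption that features are scaled to [0, 1] are illustrative and not taken from the published algorithm.

```python
import math

def feature_weights(cluster, bins=4, lo=0.0, hi=1.0):
    """Weight each feature by how peaked its histogram is inside the cluster."""
    weights = []
    for column in zip(*cluster):
        counts = [0] * bins
        for value in column:
            # Clamp to a valid bin so values equal to hi fall in the last bin.
            index = min(bins - 1, max(0, int((value - lo) / (hi - lo) * bins)))
            counts[index] += 1
        weights.append(max(counts) / len(column))
    total = sum(weights)
    return [w / total for w in weights]  # normalize so the weights sum to 1

def weighted_distance(x, center, weights):
    """Euclidean distance in which each dimension is scaled by its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center)))

# Feature 0 is tightly concentrated in this cluster; feature 1 is spread out.
cluster = [[0.10, 0.05], [0.12, 0.40], [0.11, 0.70], [0.13, 0.95]]
weights = feature_weights(cluster)
print(weights)
```

Substituting `weighted_distance` for the plain Euclidean distance in the assignment step of K-means makes the spread-out feature count less when deciding which cluster a visual token belongs to.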

 

Semantic Web

The Semantic Web is a relatively new area of research, and one that can potentially change the way communication takes place, whether between humans or machines. Our long-term goal is to develop a scheme that can exploit both the semantics and the syntax of data in the Semantic Web. Our current research plan is to investigate the data and ontology layers of the Semantic Web and provide a stable security mechanism that is modular and can be leveraged across heterogeneous platforms. We have placed particular emphasis on bringing privacy into the third-party architecture that is widely used by organizations, big and small, for efficient and cost-effective data processing and storage. Because data belonging to a particular organization can be subject to contractual or privacy-related legal obligations, few organizations will be comfortable off-shoring their data unless a secure, privacy-protected scheme is available. We have proposed an AES cipher-based scheme for encrypting data in the Semantic Web that exploits RDF syntax and can work just as well for XML documents.

 


page last updated: January 18, 2008