B.S., Nanjing University
M.S., Georgia Institute of Technology
Ph.D., Georgia Institute of Technology
The last decade has been marked by unprecedented growth in the production of biomedical data. Analyzing these data may reveal systematic biological insights into cell states, disease mechanisms, and treatments. To exploit this wealth of information, a new field of science, bioinformatics, has arisen that fuses biology and medicine on one side with mathematics, statistics, and computer science on the other. My general research interests are in bioinformatics and computational biology, especially in developing innovative algorithmic approaches that use computation to understand life processes. My lab designs and implements data mining and machine learning algorithms for a range of bioinformatics research problems.
1 Gene Expression Data and Proteomic Data Analysis.
Machine learning techniques have been applied to the analysis of gene expression and proteomic data. Biomarker discovery (gene selection) is of primary importance in bioinformatics: out of thousands of genes, only a small fraction is highly correlated with the signature of the pattern we wish to detect. My group designed a novel biomarker discovery algorithm based on replicator dynamics, drawing on neural network and evolutionary computing ideas, that reorders the microarray data matrix to reveal biomarkers or gene signatures. We also designed and implemented novel algorithms to detect “outlier” samples in microarray and proteomic data; such outliers may represent new classes of disease.
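The replicator dynamics idea can be illustrated with a minimal Python sketch. It is not the group's published algorithm: it assumes a gene-gene similarity matrix built from absolute Pearson correlations (a hypothetical choice) and iterates the standard discrete replicator update x_i <- x_i (Ax)_i / (x'Ax) on the probability simplex. Genes that end with large weights form a mutually coherent group, and sorting rows by weight gives one possible reordering of the data matrix. The names replicator_dynamics and reorder_genes are illustrative.

```python
import numpy as np


def replicator_dynamics(similarity, n_iter=1000, tol=1e-10):
    """Discrete replicator dynamics on a non-negative symmetric matrix A.

    Starts at the barycenter of the simplex and repeatedly applies
    x_i <- x_i * (A x)_i / (x^T A x). The weights always sum to 1.
    """
    a = np.asarray(similarity, dtype=float)
    n = a.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        ax = a @ x
        denom = x @ ax
        if denom <= 0:
            break  # no positive similarity left to reinforce
        x_new = x * ax / denom
        converged = np.abs(x_new - x).sum() < tol
        x = x_new
        if converged:
            break
    return x


def reorder_genes(expr):
    """Rank genes (rows of a genes-by-samples array) by replicator weight.

    Hypothetical similarity: |Pearson correlation| between genes, with the
    diagonal zeroed so that a gene does not reinforce itself.
    """
    sim = np.abs(np.corrcoef(expr))
    np.fill_diagonal(sim, 0.0)
    weights = replicator_dynamics(sim)
    order = np.argsort(-weights)  # highest-weight genes first
    return order, weights
```

Genes whose final weight is near zero lie outside the dominant co-expressed group; one simple extension, also an assumption here, is to remove the selected group and rerun the dynamics to extract further groups.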
2 GeneTrek: A Text Mining System to Cluster Genes by Functional Keyword Association from Biomedical Literature.
Partitioning genes into closely related groups has become the first step in practically all statistical analyses of microarray data. Clustering genes by expression profile has received much attention; however, the task of finding functional relationships between specific genes is left to the investigator. Almost every known or postulated piece of information about genes and their roles in biological processes is reported somewhere in the vast published biomedical literature. Therefore, if genes were grouped by shared function extracted from that literature, rather than by expression pattern similarity, investigators might more quickly discover the patterns or themes of biological processes revealed by their microarray experiments and focus on a select group of functionally related genes. My group developed GeneTrek, a text mining system that mines functional keywords for each gene from the biomedical literature and then clusters the genes based on the keywords they share. GeneTrek has two main components: a text analysis component and a gene clustering component. Besides the gene clusters themselves, GeneTrek outputs for each cluster a list of the keywords shared by its genes. These lists can help researchers understand the function of the genes in each cluster and generate new hypotheses. GeneTrek clusters genes correctly, and it has led to the discovery of some new gene-to-gene and gene-to-disease relationships.
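The cluster-then-report-shared-keywords idea can be sketched in a few lines of Python. This is a deliberate simplification, not GeneTrek itself: it assumes each gene already has a set of extracted keywords, measures overlap with Jaccard similarity, links genes whose similarity meets a hypothetical threshold (single linkage via union-find), and reports the keywords common to every gene in each cluster. The gene names and keywords in the example are made up for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two keyword sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_by_keywords(gene_keywords, threshold=0.3):
    """Single-linkage clustering of genes by keyword overlap.

    Returns a sorted list of (genes, shared_keywords) tuples, where
    shared_keywords are the keywords present for every gene in the cluster.
    """
    genes = sorted(gene_keywords)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for g1, g2 in combinations(genes, 2):
        if jaccard(gene_keywords[g1], gene_keywords[g2]) >= threshold:
            parent[find(g1)] = find(g2)  # merge the two clusters

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)

    result = []
    for members in groups.values():
        shared = set.intersection(*(set(gene_keywords[g]) for g in members))
        result.append((sorted(members), sorted(shared)))
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical genes and keywords, for illustration only.
    example = {
        "GENE_A": {"apoptosis", "caspase", "cell death"},
        "GENE_B": {"apoptosis", "caspase", "mitochondria"},
        "GENE_C": {"synapse", "neurotransmitter", "ion channel"},
        "GENE_D": {"synapse", "neurotransmitter", "receptor"},
        "GENE_E": {"lipid", "cholesterol"},
    }
    for members, shared in cluster_by_keywords(example):
        print(members, shared)
```

With single linkage, a long chain of pairwise links can leave a cluster with few keywords common to all members; reporting keywords shared by a majority of genes is one alternative.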
3 Biological Network Analysis for Functional Module Discovery.
Recent advances in high-throughput experiments, together with annotations from the published literature, have provided a wealth of interaction maps for several biomolecular networks, including metabolic, protein-protein, and protein-DNA interaction networks. The architecture of these molecular networks reveals important principles of cellular organization and molecular function. Analyzing such networks, in particular discovering their dense regions, is an important way to identify protein complexes and functional modules. We proposed a new algorithm and two dynamical systems for discovering heavy subgraphs in large biological networks. Experimental results on both simulated graphs and biological networks demonstrated the efficiency and effectiveness of these algorithms and systems.
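To make "discovering dense regions" concrete, the sketch below uses a different, well-known technique rather than the algorithm and dynamical systems described above: greedy peeling (Charikar's 2-approximation) for the densest subgraph of an unweighted graph, with density defined as edges per vertex. It repeatedly deletes a minimum-degree vertex and keeps the densest intermediate vertex set. Treating the network as unweighted is a simplifying assumption; the heavy subgraphs above may carry edge weights.

```python
def densest_subgraph_peeling(edges):
    """Greedy peeling for the densest subgraph (density = |E(S)| / |S|).

    Repeatedly removes a minimum-degree vertex and returns the vertex set
    with the highest density seen, together with that density.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    nodes = set(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density = n_edges / len(nodes) if nodes else 0.0
    best_set = set(nodes)

    while nodes:
        # Linear scan for clarity; ties broken by str(vertex) for determinism.
        u = min(nodes, key=lambda x: (len(adj[x]), str(x)))
        for w in adj[u]:
            adj[w].discard(u)
        n_edges -= len(adj[u])
        nodes.remove(u)
        del adj[u]
        if nodes and n_edges / len(nodes) > best_density:
            best_density = n_edges / len(nodes)
            best_set = set(nodes)
    return best_set, best_density
```

For example, on a four-protein clique with a two-edge tail attached, peeling strips the tail and returns the clique with density 1.5. A bucket queue keyed on degree would make the peeling run in linear time on large networks.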
4 Computer-Aided Drug Discovery.
Developing molecules that interact with disease agents to neutralize them is one of the main goals of modern biomedical research. However, the many complex interactions occurring at the molecular level make a fully rational drug design process extremely difficult. The advent of combinatorial chemistry in the mid-1980s allowed the synthesis of hundreds, thousands, and even millions of new molecular compounds. Because producing and testing every possible molecular combination is impractical, a more refined search is needed, and intelligent computation has become an integral part of the drug development process. My group designed feature selection methods for the molecular representations of chemicals used in drug candidate screening. We also designed and applied various machine learning algorithms to quantitative structure-activity relationship (QSAR) modeling.
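Feature selection for drug candidate screening can be illustrated with one common filter criterion, information gain, applied to binary molecular descriptors (for example, substructure fingerprint bits). This Python sketch assumes active/inactive labels and 0/1 features; it is not claimed to be the specific method or set of criteria the group used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_column, labels):
    """Reduction in label entropy obtained by splitting on a 0/1 feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(feature_column, labels) if x == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


def rank_features(X, y):
    """Rank descriptor columns of X (rows = compounds) by information gain.

    Returns (feature indices sorted by descending gain, list of gains).
    """
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: -scores[j])
    return order, scores
```

For instance, with four compounds labeled [1, 1, 0, 0], a descriptor present exactly in the two actives scores a gain of 1 bit, while one present in one active and one inactive scores 0. The top-ranked descriptors would then serve as inputs to a downstream classifier or QSAR model.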
1. Yongjing Lin, Wenyuan Li, Keke Chen and Ying Liu. A Document Clustering and Ranking System for Exploring MEDLINE Citations. Journal of the American Medical Informatics Association (JAMIA), 14(5): 651-661.
2. Wenyuan Li, Wee-Keong Ng, Ying Liu and Kok-Leong Ong. Enhancing the Effectiveness of Clustering with Spectra Analysis. IEEE Transactions on Knowledge and Data Engineering (TKDE), 19(7): 887-902.
3. Xiuwen Zheng, Hung-Chung Huang, Wenyuan Li, Peng Liu, Quan-Zhen Li, and Ying Liu. Modeling Nonlinearity in Dilution Design Microarray Data. Bioinformatics, 23(11): 1339-1347.
4. Wenyuan Li, Yanxiong Peng, Hung-Chung Huang and Ying Liu. Biomarker Discovery and Visualization in Gene Expression Data with Efficient Generalized Matrix Approximations. Journal of Bioinformatics and Computational Biology (JBCB), 5: 251-279.
5. Wenyuan Li, Yongjing Lin, and Ying Liu. The Structure of Weighted Small-World Network. Physica A: Statistical Mechanics and its Applications, 378: 708-718, 2007.
6. Yanxiong Peng, Wenyuan Li and Ying Liu. A Hybrid Approach for Biomarker Discovery from Microarray Gene Expression Data. Cancer Informatics, 2: 301-311, 2006.
7. MAQC Consortium. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nature Biotechnology, 24(9): 1151-1161, 2006.
8. Wenyuan Li, Ying Liu, H.-C. Huang, Yanxiong Peng, Yongjing Lin, Wee-Keong Ng, and Kok-Leong Ong. Dynamical Systems for Discovering Protein Complexes and Functional Modules from Biological Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4(2): 233-250.
9. Ying Liu, Shamkant B. Navathe, Alex Pivoshenko, Venu Dasigi, Ray Dingledine, and Brian J. Ciliax. Text Analysis of MEDLINE for Discovering Functional Relationships among Genes: Evaluation of Keyword Extraction Weighting Schemes. International Journal of Data Mining and Bioinformatics, 1: 88-110, 2006.
10. Ying Liu. Serum Proteomic Pattern Analysis for Early Cancer Detection. Technology in Cancer Research and Treatment, 5: 61-66, 2006.
11. Ying Liu, Shamkant B. Navathe, Jorge Civera, Venu Dasigi, Ashwin Ram, Brian J. Ciliax, and Ray Dingledine. Text Mining Biomedical Literature for Discovering Gene-to-Gene Relationships: A Comparative Study of Algorithms. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2: 62-76, 2005.
12. Ying Liu. A Comparative Study of Feature Selection Methods for Drug Discovery. Journal of Chemical Information and Computer Sciences, 44: 1823-1828, 2004.
13. Ying Liu. Active learning with support vector machine applied to gene expression data analysis for cancer classification. Journal of Chemical Information and Computer Sciences, 44: 1936-1941, 2004.
- Updated: February 6, 2006