Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/3437
Title: Effective techniques for gene expression data mining
Authors: Ma Chi-hung Patrick
Subjects: Hong Kong Polytechnic University -- Dissertations
Gene expression -- Data processing
Data mining
Issue Date: 2006
Publisher: The Hong Kong Polytechnic University
Abstract: Gene expression data mining, as a new research area, poses new challenges to data mining researchers. Gene expression data are typically very noisy and have very high dimensionality. Traditional data mining techniques were not originally developed to deal with such data, so they may not be the best tools for the bioinformatics problems that involve them. For this reason, new effective techniques are required. In this thesis, we propose some such techniques. In particular, they address two problems: reconstructing gene regulatory networks and clustering gene expression data. The former is concerned with discovering gene interactions in order to infer the structures of gene regulatory networks. The latter is concerned with discovering clusters of co-expressed genes, so that genes with similar expression patterns under different experimental conditions can be identified. To reconstruct gene regulatory networks, we propose an association-discovery technique, based on residual analysis and an information-theoretic measure, to detect whether there are interesting association relationships between genes. Given time-dependent gene expression data, this technique can reveal interesting sequential associations between genes for the effective inference of the structures of gene regulatory networks. The proposed association-discovery technique can also be used to find interesting association relationships between gene expression levels and cluster labels. Based on discovering such relationships, we have developed a two-phase clustering algorithm for gene expression data. This algorithm consists of an initial clustering phase and a second re-clustering phase. With this two-phase approach, the algorithm is able to group genes whose cluster memberships cannot be easily determined by existing methods into the appropriate clusters.
Since the effectiveness of the two-phase clustering algorithm depends, to some extent, on that of the existing clustering method used in the first phase, we have developed a novel evolutionary clustering algorithm, called EvoCluster, that can be used in the first phase to overcome some of the limitations of existing methods. By making use of an evolutionary approach and the association-discovery technique, it not only performs well in the presence of very noisy data but can also discover overlapping clusters. For performance evaluation, the data mining techniques proposed in this thesis have been tested with simulated and real data, and the experimental results show that they are very promising.
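The residual-analysis idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes expression levels have already been discretized into states such as "H" and "L", uses the standard adjusted residual of a contingency table, and treats an absolute value above 1.96 as interesting. The function names, the lag parameter, and the threshold are illustrative assumptions, and the information-theoretic measure is omitted.

```python
from collections import Counter
from math import sqrt


def adjusted_residuals(xs, ys):
    """Return {(x, y): adjusted residual} for paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed co-occurrence counts
    row = Counter(xs)             # marginal counts of the first variable
    col = Counter(ys)             # marginal counts of the second variable
    result = {}
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            variance = (1 - row[x] / n) * (1 - col[y] / n)
            if variance == 0:
                continue  # degenerate case: one variable is constant
            z = (joint[(x, y)] - expected) / sqrt(expected)
            result[(x, y)] = z / sqrt(variance)
    return result


def lagged_associations(gene_a, gene_b, lag=1, threshold=1.96):
    """State pairs (gene_a at t, gene_b at t + lag) with a large residual.

    Assumes lag >= 1 and equal-length sequences (hypothetical interface).
    """
    residuals = adjusted_residuals(gene_a[:-lag], gene_b[lag:])
    return {pair: d for pair, d in residuals.items() if abs(d) > threshold}


# Toy data: gene_b copies the state of gene_a one time step later.
gene_a = ["H", "L"] * 10
gene_b = ["L"] + gene_a[:-1]
found = lagged_associations(gene_a, gene_b, lag=1)
```

In this toy series, a high state of the first gene is always followed by a high state of the second, so the pair ("H", "H") receives a large positive residual and ("H", "L") a large negative one.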
Description: vii, 152 p. : ill. ; 30 cm.
PolyU Library Call No.: [THS] LG51 .H577P COMP 2006 Ma
Rights: All rights reserved.
Type: Thesis
URI: http://hdl.handle.net/10397/3437
Appears in Collections: COMP Theses
PolyU Electronic Theses

Files in This Item:
File                  Description                      Size      Format
b20592863_link.htm    For PolyU Users                  161 B     HTML
b20592863_ir.pdf      For All Users (Non-printable)    2.44 MB   Adobe PDF