Title: A meta-mining approach to discovering regularities, differences, and changes in databases
Authors: Au, Wai-ho
Subjects: Hong Kong Polytechnic University -- Dissertations
Issue Date: 2006
Publisher: The Hong Kong Polytechnic University
Abstract: We propose to mine a set of rules from a collection of rule sets, each rule set having been discovered in a data set by a data mining algorithm. These meta-rules, or rules about rules, represent a kind of knowledge that few existing data mining algorithms were designed to discover. In this study, we define the problems of discovering the regularities, differences, and changes hidden in rule sets and propose a new approach, meta-mining, which mines previous data mining results to discover them. Meta-mining for regularities and for differences in rule sets aims to discover association relationships. Meta-mining for regularities seeks association relationships that appear in only a few records of each data set but in many data sets, and are therefore supported by a sufficiently large number of rules. Meta-mining for differences seeks association relationships that appear in many records but in only a few data sets, and are therefore supported by a sufficiently small number of rules. These two kinds of association relationship could not be distinguished if the data sets were concatenated into a single data set. Associations that many data sets have in common can be discovered as rules in each of those data sets, so their rule sets contain a correspondingly large number of rules supporting them. Because these rules govern regular characteristics of the data sets, we call the rules about them regular meta-rules. In contrast, the rules for some associations are found in only a few data sets, so the rule sets contain a correspondingly small number of rules supporting them. Because these associations help to distinguish the data sets that contain them, we call the rules about them differential meta-rules.
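To make the distinction between regular and differential meta-rules concrete, the sketch below counts, for each association, how many rule sets contain it and how much record support it has in each. It is an illustration only, not the thesis's algorithm: the thesis states that its algorithms use an objective measure rather than user-supplied thresholds, whereas the three cutoffs here (`regular_frac`, `differential_frac`, `strong_support`) and the function name `classify_associations` are hypothetical assumptions chosen for readability.

```python
from collections import defaultdict

# Hypothetical input format: one rule set per data set, mapping an
# association (antecedent, consequent) to its record support in that data set.
RuleSet = dict[tuple[str, str], float]


def classify_associations(rule_sets: list[RuleSet],
                          regular_frac: float = 0.7,
                          differential_frac: float = 0.3,
                          strong_support: float = 0.2) -> dict[tuple[str, str], str]:
    """Label each association 'regular', 'differential', or 'neither'.

    regular: present in at least regular_frac of the rule sets,
             regardless of how few records support it in each.
    differential: present in at most differential_frac of the rule sets,
                  but with record support >= strong_support wherever found.
    All thresholds are illustrative assumptions.
    """
    n_sets = len(rule_sets)
    found_in = defaultdict(list)  # association -> record supports, one per rule set
    for rules in rule_sets:
        for assoc, support in rules.items():
            found_in[assoc].append(support)

    labels = {}
    for assoc, supports in found_in.items():
        share = len(supports) / n_sets  # fraction of data sets yielding this rule
        if share >= regular_frac:
            labels[assoc] = "regular"
        elif share <= differential_frac and min(supports) >= strong_support:
            labels[assoc] = "differential"
        else:
            labels[assoc] = "neither"
    return labels


# Made-up rule sets from four data sets.
rule_sets = [
    {("age=young", "buys=phone"): 0.05, ("region=north", "buys=coat"): 0.40},
    {("age=young", "buys=phone"): 0.06},
    {("age=young", "buys=phone"): 0.04},
    {("age=young", "buys=phone"): 0.05},
]
labels = classify_associations(rule_sets)
print(labels)
```

The phone association has low record support everywhere but appears in every rule set, so it is labeled regular; the coat association appears in only one rule set but with high support there, so it is labeled differential.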
Meta-mining can also reveal changes in rule sets, and this information can be used to discover change meta-rules: regularities governing how rules change over time. Change meta-rules can be used to predict how rules will change in the future, freeing users from dependence on historical data, allowing better planning, and making it possible to obviate or delay undesirable changes.
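As a rough illustration of predicting how a rule changes, the sketch below tracks one rule's confidence over successive time periods, fits an ordinary least-squares trend, and extrapolates one period ahead. This linear trend is a deliberately simple stand-in chosen for illustration; it is not the thesis's method for mining change meta-rules, which the abstract describes in terms of linguistic variables and terms. The function names and the tolerance `tol` are hypothetical.

```python
def fit_trend(values: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line through (t, value) for t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to fit a trend")
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = v_mean - slope * t_mean
    return slope, intercept


def predict_next(values: list[float]) -> float:
    """Extrapolate the fitted trend to the next period (t = n)."""
    slope, intercept = fit_trend(values)
    return slope * len(values) + intercept


def describe_change(values: list[float], tol: float = 0.01) -> str:
    """Map the slope to a coarse linguistic label (hypothetical labels)."""
    slope, _ = fit_trend(values)
    if slope > tol:
        return "strengthening"
    if slope < -tol:
        return "weakening"
    return "stable"


# Made-up confidence of one rule over four periods.
history = [0.80, 0.75, 0.70, 0.65]
print(describe_change(history), round(predict_next(history), 3))
```

Mapping the slope to a word such as "weakening" loosely mirrors the abstract's emphasis on linguistic representations of change.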
A meta-mining approach to discovering regular, differential, and change meta-rules should be able to 1) generate fuzzy sets automatically from data; 2) use linguistic variables and linguistic terms to represent regularities, differences, and changes; 3) exploit the scalability of parallel computer systems; 4) group attributes and select a subset of them; and 5) mine association relationships involving attributes that were not originally contained in the data. To generate fuzzy sets directly from data, we present a new fuzzy partitioning method that maximizes class-attribute interdependence, thereby improving classification results. The method uses an information-theoretic measure to evaluate the interdependence between the class and an attribute. So that association relationships can be represented with easily understood linguistic variables and terms, we propose new algorithms for mining fuzzy rules and meta-rules. These algorithms use an objective measure to discover interesting associations among attributes without requiring the user to supply any thresholds. We also extend them to exploit the scalability of parallel systems so that they can handle very large data sets and rule sets; the parallel algorithms produce the same results as their serial counterparts in a fraction of the time. We also define the problem of attribute clustering and introduce a methodology for solving it. The proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. Partitioning a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis, and clustering attributes in this way reduces the dimensionality of the search space of a mining algorithm.
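Both the fuzzy partitioning method and the attribute clustering method rely on an information-theoretic measure of interdependence. The sketch below assumes one common choice, mutual information normalized by joint entropy, R(X:Y) = I(X:Y) / H(X,Y), which lies in [0, 1]; the abstract does not specify the exact measure, so this normalization is an assumption. `best_cut` is a crisp, single-cut simplification of class-attribute interdependence maximization, not the thesis's fuzzy partitioning method, and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts: Counter, n: int) -> float:
    """Shannon entropy (bits) of an empirical distribution given by counts."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c)


def mutual_information(xs, ys) -> float:
    """I(X:Y) = H(X) + H(Y) - H(X,Y) for two equally long discrete sequences."""
    n = len(xs)
    h_x = entropy(Counter(xs), n)
    h_y = entropy(Counter(ys), n)
    h_xy = entropy(Counter(zip(xs, ys)), n)
    return h_x + h_y - h_xy


def redundancy(xs, ys) -> float:
    """Assumed normalized interdependence: I(X:Y) / H(X,Y), in [0, 1]."""
    n = len(xs)
    h_xy = entropy(Counter(zip(xs, ys)), n)
    return 0.0 if h_xy == 0 else mutual_information(xs, ys) / h_xy


def best_cut(values, labels):
    """Crisp stand-in for fuzzy partitioning: try each midpoint between
    consecutive distinct values and keep the cut whose two-bin
    discretization has the highest redundancy with the class labels."""
    distinct = sorted(set(values))
    best_point, best_score = None, -1.0
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        binned = [v > cut for v in values]
        score = redundancy(binned, labels)
        if score > best_score:
            best_point, best_score = cut, score
    return best_point, best_score


# Made-up numeric attribute and class labels.
cut, score = best_cut([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
print(cut, score)
# Pairwise attribute interdependence, the kind of quantity a clustering
# criterion could aggregate: perfectly dependent vs. independent pairs.
print(redundancy(["x", "x", "y", "y"], ["p", "p", "q", "q"]))
print(redundancy(["x", "x", "y", "y"], ["p", "q", "p", "q"]))
```

Normalizing by joint entropy keeps scores comparable across attributes with different numbers of values, which matters when the same measure is used both against the class and between pairs of attributes.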
To allow the discovery of association relationships involving attributes that are not originally contained in the data, we introduce the concept of transformation functions and propose a formal approach to this problem. The approach can also handle the union of relational and transactional data stored in a relational database. We tested the proposed techniques in extensive experiments on many synthetic and real-world data sets. The results show that they are very effective in mining not only rules from data sets but also meta-rules from rule sets.
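The idea of deriving attributes that are not in the original data, and of combining relational and transactional data, can be sketched as follows: each hypothetical transformation function receives a relational row plus the items from that entity's transactions and returns a new attribute value, producing an extended table that an ordinary rule miner could consume. This is a minimal sketch under those assumptions; the abstract does not give the thesis's formal definition of transformation functions, and `apply_transformations`, the field names, and the sample data are all illustrative.

```python
from collections import defaultdict
from typing import Callable

Record = dict[str, object]
# Hypothetical signature: (relational row, that row's transaction items) -> value
Transform = Callable[[Record, list[str]], object]


def apply_transformations(rows: list[Record],
                          transactions: list[tuple[str, str]],
                          transforms: dict[str, Transform],
                          key: str = "id") -> list[Record]:
    """Return copies of rows extended with one derived attribute per transform."""
    items_by_key: dict[str, list[str]] = defaultdict(list)
    for entity_id, item in transactions:
        items_by_key[entity_id].append(item)

    extended = []
    for row in rows:
        items = items_by_key.get(row[key], [])  # entities with no transactions get []
        new_row = dict(row)  # leave the input rows unmodified
        for name, fn in transforms.items():
            new_row[name] = fn(row, items)
        extended.append(new_row)
    return extended


# Made-up relational rows and transactional data.
customers = [{"id": "c1", "age": 25}, {"id": "c2", "age": 52}]
purchases = [("c1", "milk"), ("c1", "bread"), ("c2", "wine")]
transforms = {
    "n_items": lambda row, items: len(items),            # from transactions
    "buys_bread": lambda row, items: "bread" in items,   # from transactions
    "age_decade": lambda row, items: row["age"] // 10 * 10,  # from the row
}
result = apply_transformations(customers, purchases, transforms)
print(result)
```

Keeping each transformation a plain function of one row and its items means new derived attributes can be added without changing the mining step itself.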
Degree: Ph.D., Dept. of Computing, The Hong Kong Polytechnic University, 2006
Description: xii, 237 leaves : ill. ; 30 cm.
PolyU Library Call No.: [THS] LG51 .H577P COMP 2006 Au
Rights: All rights reserved.
Appears in Collections: COMP Theses
All items in the PolyU Institutional Repository are protected by copyright, with all rights reserved, unless otherwise indicated.
No item in the PolyU IR may be reproduced for commercial or resale purposes.