Sunday, November 22, 2009

Classification Methods


(1) Decision Tree
Decision tree classification involves two main steps:
* Decision tree induction is a technique for partitioning a training set into a set of classification rules. A decision tree consists of nodes and edges: each internal node is labeled with an attribute name, and each outgoing edge is labeled with a possible value (or range of values) for that attribute. The starting node is called the root node. Depending on the outcome of the test at a node, the training samples are partitioned into two or more subsets.
The end result is a set of rules covering all possible cases, each associated with a class label.
To construct a decision tree, we could start with the first attribute, find a suitable threshold, move on to the next attribute, find its threshold, and repeat until every training sample is correctly classified, thus creating a decision tree for the database; a minimal sketch of this process appears after this list. Tree induction algorithms scale up very well for large data sets, and the resulting trees give true insight into the nature of the decision process.
* Tree pruning: When a decision tree is built, many of the branches will reflect anomalies in the training data due to noise or outliers. Tree pruning methods address this problem of overfitting the data. Such methods typically use statistical measures to remove the least reliable branches, generally resulting in faster classification and an improved ability of the tree to correctly classify independent test data.
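To make the induction step concrete, here is a minimal sketch of threshold-based tree building on numeric attributes. It is a toy illustration, not an algorithm from the reference: the entropy-based split criterion, the max_depth cap, and all names and data are choices of this sketch.

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_split(rows, labels):
    """Return the (attribute, threshold) pair minimizing weighted entropy."""
    best, best_score = None, float("inf")
    for a in range(len(rows[0])):
        for t in sorted({row[a] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[a] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[a] > t]
            if not left or not right:
                continue  # useless split: one side is empty
            score = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(labels)
            if score < best_score:
                best, best_score = (a, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively partition the training set; leaves hold a class label."""
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    a, t = split
    left = [(row, lab) for row, lab in zip(rows, labels) if row[a] <= t]
    right = [(row, lab) for row, lab in zip(rows, labels) if row[a] > t]
    return {"attr": a, "thr": t,
            "left": build_tree(*zip(*left), depth + 1, max_depth),
            "right": build_tree(*zip(*right), depth + 1, max_depth)}

def classify(tree, row):
    """Follow edges from the root until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["attr"]] <= tree["thr"] else tree["right"]
    return tree

# Toy data: (age, income) -> buys. Entirely made up for illustration.
X = [(25, 20000), (32, 60000), (45, 80000), (22, 18000), (51, 30000)]
y = ["no", "yes", "yes", "no", "no"]
tree = build_tree(X, y)
print(classify(tree, (30, 70000)))  # prints "yes" for this toy data

Note that the max_depth cap here is only a crude form of pre-pruning; the post-pruning described above would instead remove unreliable branches from a fully grown tree using statistical measures.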
(2) Classification Using Association Rules
Classification rule mining and association rule mining are two important data mining techniques. Classification rule mining aims to discover a small set of rules in the database that form an accurate classifier. Association rule mining finds all rules in the database that satisfy minimum support and minimum confidence constraints. In association rule mining the target of the mining is not predetermined, whereas classification rule mining has one and only one predetermined target: the class. Both techniques are indispensable in practical applications, so great savings and convenience could result if the two could be integrated. One such integrated framework, called associative classification, has been proposed; the integration can be done efficiently and without loss of performance, i.e., without reducing the accuracy of the resultant classifier.[6]
The integration is done by focusing on a special subset of association rules whose right-hand side is restricted to the class attribute. This subset of rules is referred to as the class association rules (CARs); a brute-force sketch of mining them follows the list below. An existing association rule mining algorithm is adapted to mine all the CARs that satisfy the minimum support and minimum confidence constraints. This adaptation is necessary for two main reasons:
1. Unlike the transactional databases normally used in association rule mining, which do not contain many associations, classification data tends to contain a huge number of associations. Adapting the existing association rule mining algorithm to mine only the CARs reduces the number of rules generated and avoids combinatorial explosion.
2. Classification datasets often contain many continuous (numeric) attributes, and mining association rules with continuous attributes is still a major research issue. The adaptation therefore involves discretizing continuous attributes based on the predetermined class target; many good discretization algorithms exist for this purpose.[7]
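As a rough illustration of what a class association rule looks like, here is a brute-force sketch that enumerates small antecedents and keeps those meeting support and confidence thresholds. It is deliberately naive: the Apriori-style adaptation the text refers to prunes candidates far more aggressively, and all thresholds, names, and data here are illustrative.

from collections import defaultdict
from itertools import combinations

def mine_cars(records, classes, min_support=0.3, min_confidence=0.7, max_items=2):
    """records: list of sets of (attribute, value) items; classes: labels.
    Returns rules of the form (antecedent, class, support, confidence)."""
    n = len(records)
    rules = []
    all_items = sorted(set().union(*records))
    for k in range(1, max_items + 1):
        for antecedent in combinations(all_items, k):
            ant = set(antecedent)
            # class labels of all records the antecedent covers
            covered = [c for rec, c in zip(records, classes) if ant <= rec]
            if not covered:
                continue
            counts = defaultdict(int)
            for c in covered:
                counts[c] += 1
            for c, cnt in counts.items():
                support, confidence = cnt / n, cnt / len(covered)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, c, support, confidence))
    return rules

# Tiny made-up weather table
records = [{("outlook", "sunny"), ("wind", "weak")},
           {("outlook", "sunny"), ("wind", "strong")},
           {("outlook", "rain"), ("wind", "weak")},
           {("outlook", "rain"), ("wind", "strong")}]
classes = ["play", "play", "play", "stay"]
for rule in mine_cars(records, classes):
    print(rule)  # e.g. ((('outlook', 'sunny'),), 'play', 0.5, 1.0)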
(3) Bayesian Classification
Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given sample belongs to a particular class.
Bayesian classification is based on Bayes' theorem. Studies comparing classification algorithms have found a simple Bayesian classifier, known as the naive Bayesian classifier, to be comparable in performance with decision tree and neural network classifiers. Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.
Naive Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption, called class conditional independence, is made to simplify the computations involved and, in this sense, is considered "naive". Bayesian belief networks are graphical models that, unlike naive Bayesian classifiers, allow dependencies among subsets of attributes to be represented. Bayesian belief networks can also be used for classification.
There are two main types of Bayesian Classification methods:
* The basic naive Bayes classifier: The most straightforward Bayesian learning method is the naive Bayesian classifier. It uses a set of discriminant functions to estimate the probability that a given instance belongs to a certain class. More specifically, it uses Bayes' rule to compute the probability of each possible value of the target attribute given the instance, assuming the input attributes are conditionally independent given the target attribute.
Because this method is based on the simplistic, and rather unrealistic, assumption that the causes are conditionally independent given the effect, it is known as naive Bayes. A minimal sketch follows.
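Here is one hedged sketch of such a classifier on categorical attributes. The Laplace smoothing used to avoid zero probabilities is a choice of this sketch, not something mandated by the text, and the class/attribute names are invented.

import math
from collections import defaultdict, Counter

class NaiveBayes:
    def fit(self, X, y):
        self.classes = Counter(y)                 # class counts (priors)
        self.n = len(y)
        # counts[c][j][v] = training samples of class c with attribute j == v
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)            # distinct values per attribute
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict(self, row):
        best, best_lp = None, float("-inf")
        for c, nc in self.classes.items():
            lp = math.log(nc / self.n)            # log P(c)
            for j, v in enumerate(row):
                # Laplace-smoothed P(X_j = v | c), assuming conditional independence
                num = self.counts[c][j][v] + 1
                den = nc + len(self.values[j])
                lp += math.log(num / den)
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# Toy (outlook, temperature) -> play data, made up for illustration
X = [("sunny", "hot"), ("rain", "mild"), ("sunny", "mild"), ("rain", "hot")]
y = ["no", "yes", "yes", "no"]
print(NaiveBayes().fit(X, y).predict(("sunny", "mild")))  # "yes"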
* Bayesian belief networks: The naive Bayesian classifier makes the assumption of class conditional independence, i.e., that given the class label of a sample, the values of the attributes are conditionally independent of one another. This assumption simplifies computation, and when it holds true, the naive Bayesian classifier is the most accurate of all classifiers. In practice, however, dependencies can exist between variables. Bayesian belief networks specify joint conditional probability distributions: they allow class conditional independencies to be defined between subsets of variables, and they provide a graphical model of causal relationships on which learning can be performed. These networks are also known as belief networks, Bayesian networks, and probabilistic networks; for brevity, we will refer to them as belief networks.

A belief network is defined by two components. The first is a directed acyclic graph, where each node represents a random variable and each arc represents a probabilistic dependence[1]. The second is a conditional probability table (CPT) for each variable, conditioned on the values of its parents in the graph.
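To illustrate how the graph encodes a joint distribution, here is a tiny hand-built network; the chain structure, node names, and probabilities are all invented for this sketch. Each variable's CPT is indexed by its parents' values, and the joint probability of a complete assignment is the product of one CPT lookup per node.

NETWORK = {
    # node: (parents, CPT mapping parent values -> P(node is True))
    "Cloudy":   ((), {(): 0.5}),
    "Rain":     (("Cloudy",), {(True,): 0.8, (False,): 0.2}),
    "WetGrass": (("Rain",),   {(True,): 0.9, (False,): 0.1}),
}

def joint_probability(assignment):
    """P(assignment) = product over nodes of P(node | its parents)."""
    p = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[par] for par in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

print(joint_probability({"Cloudy": True, "Rain": True, "WetGrass": True}))
# 0.5 * 0.8 * 0.9 = 0.36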
(4) Backpropagation
Backpropagation is a neural network learning algorithm. The field of neural networks was originally kindled by psychologists and neurobiologists who sought to develop and test computational analogues of neurons. Roughly speaking, a neural network is a set of connected input/output units in which each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so as to predict the correct class label of the input samples. Neural network learning is also referred to as connectionist learning because of the connections between units.
Neural networks involve long training times and are therefore more suitable for applications where this is feasible. They require a number of parameters that are typically best determined empirically, such as the network topology or "structure". Neural networks have also been criticized for their poor interpretability, since it is difficult for humans to interpret the symbolic meaning behind the learned weights. These features initially made neural networks less desirable for data mining.
Advantages of neural networks, however, include their high tolerance to noisy data as well as their ability to classify patterns on which they have not been trained. In addition, several algorithms have recently been developed for the extraction of rules from trained neural networks. These factors contribute towards the usefulness of neural networks for classification in data mining.
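For concreteness, here is a toy network with one hidden layer trained by backpropagation to learn XOR. The layer sizes, learning rate, epoch count, and random seed are arbitrary choices of this sketch, and convergence on XOR can depend on the initialization.

import math
import random

random.seed(0)
sig = lambda z: 1.0 / (1.0 + math.exp(-z))    # logistic activation

NI, NH = 2, 3                                  # input and hidden layer sizes
W1 = [[random.uniform(-1, 1) for _ in range(NI)] for _ in range(NH)]
b1 = [0.0] * NH
W2 = [random.uniform(-1, 1) for _ in range(NH)]
b2 = 0.0

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # XOR
lr = 0.5

for epoch in range(20000):
    for x, t in data:
        # forward pass: input -> hidden -> output
        h = [sig(sum(W1[i][j] * x[j] for j in range(NI)) + b1[i]) for i in range(NH)]
        o = sig(sum(W2[i] * h[i] for i in range(NH)) + b2)
        # backward pass: error deltas for the squared-error loss
        d_o = (o - t) * o * (1 - o)
        d_h = [d_o * W2[i] * h[i] * (1 - h[i]) for i in range(NH)]
        # weight updates by gradient descent
        for i in range(NH):
            W2[i] -= lr * d_o * h[i]
            b1[i] -= lr * d_h[i]
            for j in range(NI):
                W1[i][j] -= lr * d_h[i] * x[j]
        b2 -= lr * d_o

for x, t in data:
    h = [sig(sum(W1[i][j] * x[j] for j in range(NI)) + b1[i]) for i in range(NH)]
    o = sig(sum(W2[i] * h[i] for i in range(NH)) + b2)
    print(x, "->", round(o, 2), "(target", t, ")")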

(5) Other classification methods
* K-nearest neighbor classifiers are based on learning by analogy. The training samples are described by n-dimensional numeric attributes, so each sample represents a point in an n-dimensional space, and all of the training samples are stored in an n-dimensional pattern space. When given an unknown sample, a k-nearest neighbor classifier searches the pattern space for the k training samples closest to the unknown sample and assigns the most common class among them; a sketch follows.
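A minimal sketch of how such a classifier labels an unknown point, using Euclidean distance and majority vote; k and the data are illustrative.

import math
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy 2-D pattern space
X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.5, 4.5), (0.9, 1.1)]
y = ["a", "a", "b", "b", "a"]
print(knn_classify(X, y, (1.1, 1.0)))  # "a"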
* Case-based reasoning (CBR) classifiers are instance-based. Unlike nearest neighbor classifiers, which store training samples as points in Euclidean space, the samples or "cases" stored by CBR are complex symbolic descriptions. CBR has also been applied to areas such as engineering and law, where cases are either technical designs or legal rulings, respectively.
When given a new case to classify, a case-based reasoner will first check if an identical training case exists. If one is found, then the accompanying solution to that case is returned. If no identical case is found, then the case-based reasoner will search for training cases having components that are similar to those of the new case. Conceptually, these training cases may be considered as neighbors of the new case. If cases are represented as graphs, this involves searching for subgraphs which are similar to subgraphs within the new case. The case-based reasoner tries to combine the solutions of the neighboring training cases in order to propose a solution for the new case. If incompatibilities arise with the individual solutions, then backtracking to search for other solutions may be necessary. The case-based reasoner may employ background knowledge and problem-solving strategies in order to propose a feasible combined solution.
Challenges in case-based reasoning include finding a good similarity metric (e.g., for matching subgraphs), developing efficient techniques for indexing training cases, and methods for combining solutions.
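One hedged sketch of the retrieval step: cases are symbolic attribute dictionaries, an identical case's solution is reused directly, and otherwise the most similar neighbor's solution is proposed. The overlap-based similarity measure here is a crude stand-in for the richer (e.g., subgraph-based) matching the text mentions, and the case base is invented.

def similarity(case_a, case_b):
    """Fraction of attribute slots on which two symbolic cases agree."""
    keys = set(case_a) | set(case_b)
    return sum(case_a.get(k) == case_b.get(k) for k in keys) / len(keys)

def retrieve(case_base, new_case):
    """Reuse an identical case's solution if one exists; otherwise
    propose the solution of the most similar stored case."""
    for case, solution in case_base:
        if case == new_case:
            return solution                       # identical case found
    best_case, best_solution = max(case_base,
                                   key=lambda cs: similarity(cs[0], new_case))
    return best_solution

# Toy case base of "technical designs"
case_base = [({"material": "steel", "load": "high"}, "truss"),
             ({"material": "wood", "load": "low"}, "beam")]
print(retrieve(case_base, {"material": "steel", "load": "low"}))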
* Genetic algorithms attempt to incorporate ideas of natural evolution. In general, genetic learning starts with an initial population of randomly generated rules, each of which can be represented by a string of bits. The fitness of a rule is typically assessed by its classification accuracy on the training samples; fitter rules survive and produce offspring through genetic operators such as crossover and mutation, as in the sketch below.
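A toy sketch of this loop, where each individual is a bit-string rule over binary attributes, fitness is accuracy on a made-up training set, and the selection, crossover, and mutation operators are kept deliberately simple.

import random
random.seed(1)

TRAIN = [((1, 0, 1), 1), ((1, 1, 1), 1), ((0, 0, 1), 0), ((0, 1, 0), 0)]
N_BITS = 3

def matches(rule, sample):
    # rule bit i set => attribute i must be 1 for the rule to fire
    return all(s == 1 for r, s in zip(rule, sample) if r == 1)

def fitness(rule):
    # accuracy when predicting class 1 iff the rule fires
    return sum((1 if matches(rule, x) else 0) == y for x, y in TRAIN) / len(TRAIN)

def crossover(a, b):
    cut = random.randrange(1, N_BITS)            # single-point crossover
    return a[:cut] + b[cut:]

def mutate(rule, p=0.1):
    return tuple(1 - bit if random.random() < p else bit for bit in rule)

pop = [tuple(random.randint(0, 1) for _ in range(N_BITS)) for _ in range(8)]
for gen in range(20):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:4]                            # simple truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(4)]
    pop = parents + children

best = max(pop, key=fitness)
print(best, fitness(best))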
* Rough set theory can be used for classification to discover structural relationships within imprecise or noisy data. It applies to discrete-valued attributes; continuous-valued attributes must therefore be discretized before it can be used.
Rough set theory is based on the establishment of equivalence classes within the given training data. All of the data samples forming an equivalence class are indiscernible, that is, the samples are identical with respect to the attributes describing the data.
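A small sketch of these two ideas, equivalence classes and the lower/upper approximations rough set theory builds from them, on an invented discrete table.

from collections import defaultdict

def equivalence_classes(rows, attrs):
    """Group row indices that are indiscernible on the chosen attributes."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())

def approximations(rows, attrs, target):
    """Lower/upper approximation of `target` (a set of row indices)."""
    lower, upper = set(), set()
    for block in equivalence_classes(rows, attrs):
        block = set(block)
        if block <= target:
            lower |= block        # block certainly inside the concept
        if block & target:
            upper |= block        # block possibly inside the concept
    return lower, upper

# Toy table: (color, size); rows 0 and 1 are indiscernible on these attributes
rows = [("red", "small"), ("red", "small"), ("blue", "large")]
target = {0}                      # the concept we want to approximate
print(approximations(rows, attrs=[0, 1], target=target))
# lower = set() (rows 0 and 1 cannot be told apart), upper = {0, 1}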
Reference:
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Simon Fraser University, 2000.
