Suppose we want to classify some data points into two classes, as is often required in a machine-learning process. These data points need not be points in $\mathbb{R}^2$; they may be multidimensional points in $\mathbb{R}^p$ (statistics notation) or $\mathbb{R}^n$ (computer science notation). We are interested in whether we can separate them by a hyperplane. Because the separating surface is a hyperplane, this form of classification is known as linear classification. We also want to choose a hyperplane that separates the data points "neatly", with maximum distance to the closest data point from both classes; this distance is called the margin. We desire this property because a greater separation between the two classes lets us classify a newly added data point more reliably. If such a hyperplane exists, it is known as the maximum-margin hyperplane or the optimal hyperplane, and the vectors closest to it are called the support vectors.
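To make the notion of margin concrete, the following is a minimal Python sketch that compares two candidate separating hyperplanes on a small, made-up two-dimensional data set. It assumes a hyperplane is written as w·x − b = 0, which is one common parameterization; the names `w`, `b`, `signed_distance`, and `margin`, as well as the sample points, are illustrative assumptions. The sketch only measures the margin of a given hyperplane; it does not search for the optimal one.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x - b = 0 (assumed form)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) - b) / norm

def margin(w, b, points, labels):
    """Distance from the hyperplane to the closest point.

    Returns None when the hyperplane fails to put every point
    strictly on the side indicated by its label (1 or -1).
    """
    distances = [c * signed_distance(w, b, x) for x, c in zip(points, labels)]
    closest = min(distances)
    return closest if closest > 0 else None

# Made-up, linearly separable sample data.
points = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]

margin_a = margin((1.0, 1.0), 0.0, points, labels)  # diagonal hyperplane
margin_b = margin((1.0, 0.0), 0.0, points, labels)  # vertical hyperplane

# Both hyperplanes separate the classes, but the first leaves more room.
print(margin_a, margin_b)
```

In this sample, the points (1, 1) and (−1, −1) lie closest to the diagonal hyperplane, so they play the role of support vectors for that candidate.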
Formalization
We consider data points of the form $(\mathbf{x}_i, c_i)$, where $c_i$ is either 1 or −1; this constant denotes the class to which the point $\mathbf{x}_i$ belongs. Each $\mathbf{x}_i$ is a $p$-dimensional (statistics notation) or $n$-dimensional (computer science notation) vector of values scaled to a fixed range such as [-1,1]. The scaling is important to guard against variables (attributes) with larger variance that might otherwise dominate the classification. We can view this as training data, which denotes the correct classification that we would like the SVM to eventually distinguish by means of a dividing hyperplane.
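The scaling step and the role of a dividing hyperplane can be sketched in Python as follows. This is a rough illustration under stated assumptions: each attribute is mapped linearly onto [-1, 1] using the minimum and maximum seen in the training data, the hyperplane is assumed to have the common form w·x − b = 0, and `w` and `b` are picked by hand rather than learned by an SVM. The helper names and the sample attributes (height and income) are hypothetical.

```python
def fit_ranges(points):
    """Per-attribute (min, max) pairs taken over the training points."""
    return [(min(column), max(column)) for column in zip(*points)]

def scale(point, ranges):
    """Map each attribute linearly onto [-1, 1]; constant attributes map to 0."""
    scaled = []
    for value, (low, high) in zip(point, ranges):
        if high == low:
            scaled.append(0.0)
        else:
            scaled.append(2.0 * (value - low) / (high - low) - 1.0)
    return scaled

def classify(w, b, x):
    """Return 1 or -1 depending on the side of the hyperplane w.x - b = 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - b >= 0 else -1

# Hypothetical training data: (height in cm, income), with class labels c_i.
training_points = [(150.0, 20000.0), (160.0, 30000.0),
                   (180.0, 90000.0), (190.0, 100000.0)]
training_labels = [-1, -1, 1, 1]

ranges = fit_ranges(training_points)
scaled_points = [scale(p, ranges) for p in training_points]

# Hand-picked hyperplane parameters (an assumption, not a trained model).
w, b = (1.0, 1.0), 0.0
predictions = [classify(w, b, x) for x in scaled_points]
print(scaled_points)
print(predictions)
```

Without the scaling, the income attribute, whose raw values are several orders of magnitude larger than the heights, would dominate the dot product; after scaling, both attributes contribute on a comparable footing.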