

2018-03-28  陆文斌

1.4. Support Vector Machines


Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression, and outlier detection.


The advantages of support vector machines are:


Effective in high dimensional spaces.


Still effective in cases where the number of dimensions is greater than the number of samples.


Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.


Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
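
As a minimal sketch of the custom-kernel point above, a Python callable can be passed as the kernel argument of SVC. The function name my_linear_kernel and the toy data are assumptions made for illustration; the callable is expected to return the Gram matrix between its two inputs.

```python
import numpy as np
from sklearn import svm


def my_linear_kernel(X, Y):
    # Hypothetical custom kernel: a plain dot product, which returns a
    # Gram matrix of shape (n_samples_X, n_samples_Y).
    return np.dot(X, Y.T)


# Toy, linearly separable data (illustrative only).
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])

clf = svm.SVC(kernel=my_linear_kernel)
clf.fit(X, y)

pred = clf.predict(np.array([[0.5, 0.5], [2.5, 2.5]]))
# Indices of the training points kept as support vectors.
n_support = len(clf.support_)
print(pred, n_support)
```

Only the training points indexed by support_ enter the decision function, which is the memory-efficiency point made in the list above.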


The disadvantages of support vector machines include:


If the number of features is much greater than the number of samples, the method is prone to over-fitting and is likely to perform poorly unless the kernel function and regularization term are chosen carefully.


SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (see Scores and probabilities, below).
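
The following sketch shows one way to request probability estimates, assuming a scikit-learn release in which SVC accepts probability=True. The synthetic dataset from make_classification is an assumption chosen only to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data (illustrative only).
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# probability=True triggers the extra internal cross-validation at fit
# time, so fitting is noticeably slower than with the default setting.
clf = SVC(probability=True, random_state=0)
clf.fit(X, y)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba(X[:3])
print(proba)
```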


The support vector machines in scikit-learn support both dense (numpy.ndarray, or anything convertible to one by numpy.asarray) and sparse (any scipy.sparse) sample vectors as input. However, to use an SVM to make predictions for sparse data, it must have been fit on such data. For optimal performance, use C-ordered numpy.ndarray (dense) or scipy.sparse.csr_matrix (sparse) with dtype=float64.
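
A small sketch of the two input layouts recommended above, using toy data that is an assumption of this example. One model is fit and queried on a C-ordered float64 array, the other on a CSR matrix, so each predicts on the same kind of data it was fit on.

```python
import numpy as np
from scipy import sparse
from sklearn.svm import SVC

# Dense input: C-ordered (row-major) array with dtype float64.
X_dense = np.ascontiguousarray(
    [[0, 0], [1, 1], [2, 2], [3, 3]], dtype=np.float64
)
y = np.array([0, 0, 1, 1])

# Sparse input: the same samples stored as a CSR matrix.
X_sparse = sparse.csr_matrix(X_dense)

# Each model is fit on the kind of data it will later predict on.
dense_clf = SVC(kernel="linear").fit(X_dense, y)
sparse_clf = SVC(kernel="linear").fit(X_sparse, y)

dense_pred = dense_clf.predict(X_dense)
sparse_pred = sparse_clf.predict(X_sparse)
print(dense_pred, sparse_pred)
```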


SVC, NuSVC and LinearSVC are classes capable of performing multi-class classification on a dataset.



SVC and NuSVC are similar methods, but they accept slightly different sets of parameters and have different mathematical formulations (see section Mathematical formulation). LinearSVC, on the other hand, is another implementation of Support Vector Classification for the case of a linear kernel. Note that LinearSVC does not accept the keyword kernel, since the kernel is assumed to be linear. It also lacks some of the attributes of SVC and NuSVC, such as support_.
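
The differences described above can be checked with a short sketch. The toy data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])

svc = SVC(kernel="linear").fit(X, y)
linear_svc = LinearSVC().fit(X, y)

# SVC exposes the indices of its support vectors; LinearSVC does not.
svc_has_support = hasattr(svc, "support_")
linear_has_support = hasattr(linear_svc, "support_")

# LinearSVC has no `kernel` parameter, so passing one is a TypeError.
try:
    LinearSVC(kernel="linear")
    kernel_rejected = False
except TypeError:
    kernel_rejected = True

print(svc_has_support, linear_has_support, kernel_rejected)
```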

Like other classifiers, SVC, NuSVC and LinearSVC take as input two arrays: an array X of shape [n_samples, n_features] holding the training samples, and an array y of class labels (strings or integers) of shape [n_samples]:
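
A minimal sketch of this calling convention, using two toy samples and default SVC settings (the data and the string labels "neg" and "pos" are assumptions for illustration):

```python
from sklearn import svm

# X has shape [n_samples, n_features]; y has shape [n_samples].
X = [[0.0, 0.0], [1.0, 1.0]]
y = [0, 1]

clf = svm.SVC()
clf.fit(X, y)
pred = clf.predict([[2.0, 2.0]])

# Class labels may also be strings.
y_str = ["neg", "pos"]
clf_str = svm.SVC().fit(X, y_str)
pred_str = clf_str.predict([[2.0, 2.0]])

print(pred, pred_str)
```

After fitting, predict takes an array with the same number of features as X and returns one label per row.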


