Support Vector Machines


Introduction

In the figure above, the black points and the white points each represent one class, and we want to separate the two classes with a straight line. Clearly the red line separates them best. Why is that?

This brings in a new concept, the margin: for a given separating hyperplane, the margin is determined by the distance from the hyperplane to the closest points on each side.

How do we choose the hyperplane that makes this margin as large as possible (the Max Margin Hyperplane)?

The distance from the separating hyperplane to the nearest point on one side equals its distance to the nearest point on the other side, and the two boundary hyperplanes through those nearest points are parallel.

Finding the maximum-margin hyperplane

A hyperplane can be defined as: W · X + b = 0

Here W is a weight-like vector, X is the feature vector of a given instance, and b is the bias term.

In two dimensions the hyperplane equation can also be written as w0 + w1·x1 + w2·x2 = 0; the bias b is simply w0.

Points above the hyperplane satisfy W · X + b > 0, and points below it satisfy W · X + b < 0.

H1 is the upper boundary, i.e., the upper of the two lines in the figures. Why is the right-hand side 1? The value 1 is only there to tell the upper boundary from the lower one, and the scale can be absorbed by adjusting w0. Each training example is really of the form (x1, x2, …, yi), where yi is the class label.

The two constraints can be combined into one.
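Taking the class labels as yi = ±1, the two boundary constraints and their combined form are:

$$
\begin{aligned}
H_1:\quad & w_0 + w_1 x_1 + w_2 x_2 \ge 1  && \text{for } y_i = +1 \\
H_2:\quad & w_0 + w_1 x_1 + w_2 x_2 \le -1 && \text{for } y_i = -1 \\
\text{combined:}\quad & y_i\,(w_0 + w_1 x_1 + w_2 x_2) \ge 1 && \text{for all } i
\end{aligned}
$$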

The points that lie exactly on these boundaries are called support vectors.

From this we can draw a conclusion: the distance between the separating hyperplane and H1 (or H2) is 1/||W||.

Here ||W|| is the Euclidean norm of W: square the components, sum them, and then take the square root.
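The 1/||W|| above is just the point-to-hyperplane distance formula applied to a point X lying on H1, where |W · X + b| = 1:

$$
d = \frac{\lvert W \cdot X + b \rvert}{\lVert W \rVert} = \frac{1}{\lVert W \rVert}
$$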

So the maximum total margin, from H1 across to H2, is 2/||W||.

We therefore want to maximize 2/||W||, which is the same as minimizing ||W||; the quantity we actually minimize is (1/2)||W||².

The factor of 1/2 and the square are there purely because they make the calculation easier; they do not change where the minimum lies.
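Putting the objective and the constraints together, the problem to solve is:

$$
\min_{W,\,b}\ \frac{1}{2}\lVert W \rVert^{2}
\qquad \text{subject to} \qquad
y_i\,(W \cdot X_i + b) \ge 1,\quad i = 1,\dots,n
$$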

Applying the Lagrangian together with the KKT conditions then yields the equation of the maximum-margin hyperplane (the decision function).

The a and b that appear in it come out of that derivation; the specifics are not worked through here for now.
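In standard form the resulting decision function is the one below, where the aᵢ are the Lagrange multipliers, the Xᵢ with aᵢ > 0 are the support vectors, and b₀ is the solved-for bias; a new point X is classified by the sign of d(X):

$$
d(X) = \sum_{i=1}^{l} y_i\, a_i\, (X_i \cdot X) + b_0
$$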

Usage in Python

```python
from sklearn import svm

x = [[2, 0], [1, 1], [2, 3]]  # feature vectors (three small 2-D points)
y = [0, 0, 1]                 # class label y_i for each point
clf = svm.SVC(kernel='linear')
clf.fit(x, y)

print(clf)

# get support vectors
print(clf.support_vectors_)

# get indices of support vectors
print(clf.support_)    # indices of the support-vector points

# get number of support vectors for each class
print(clf.n_support_)  # how many support vectors each class (each value of y_i) has
```
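As a quick usage sketch, the fitted model can also classify a new point with predict; the point [0, 1] below is just an illustrative value:

```python
# classify a new, unseen point (illustrative value)
print(clf.predict([[0, 1]]))  # should print [0]: the point lies on the class-0 side
```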

Plotting the result

```python
import numpy as np
import pylab as pl
from sklearn import svm

# we create 40 separable points
X = np.r_[np.random.randn(20, 2) - [2, 2], np.random.randn(20, 2) + [2, 2]]
Y = [0] * 20 + [1] * 20

# fit the model
clf = svm.SVC(kernel='linear')
clf.fit(X, Y)

# get the separating hyperplane
w = clf.coef_[0]                      # the learned weight vector w
a = -w[0] / w[1]                      # slope of the separating line
xx = np.linspace(-5, 5)
yy = a * xx - (clf.intercept_[0]) / w[1]  # line through the intercept

# plot the parallels to the separating hyperplane that pass through the support vectors
b = clf.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = clf.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])

print("w: ", w)
print("a: ", a)

# print("xx: ", xx)
# print("yy: ", yy)
print("support_vectors_: ", clf.support_vectors_)
print("clf.coef_: ", clf.coef_)

# switching from the generic n-dimensional parameterization of the hyperplane to the
# 2D-specific equation of a line y = a*x + b: the generic w_0*x + w_1*y + w_2 = 0
# can be rewritten as y = -(w_0/w_1)*x - (w_2/w_1)


# plot the line, the points, and the nearest vectors to the plane
pl.plot(xx, yy, 'k-')
pl.plot(xx, yy_down, 'k--')
pl.plot(xx, yy_up, 'k--')

pl.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
           s=80, facecolors='none')
pl.scatter(X[:, 0], X[:, 1], c=Y, cmap=pl.cm.Paired)

pl.axis('tight')
pl.show()
```

The linearly non-separable case

To handle the linearly non-separable case, first consider an example.

As the figure shows, the data on the left clearly cannot be separated by a straight line, but after being projected into the space on the right it becomes separable. The basic idea for handling linearly non-separable data is therefore to project it into a higher-dimensional space. The key problem then becomes building a mapping function that correctly maps the data into that higher-dimensional space; once a separating hyperplane is found there, mapping back to the original space gives the decision boundary (which, in the original space, is actually a curved surface rather than a flat hyperplane).

Kernel functions serve both purposes: they let us work with the data as if it had been mapped into the higher-dimensional space while keeping the amount of computation small, because the inner products needed there can be computed directly from the original low-dimensional points.
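As a minimal sketch of this idea, an RBF kernel lets an SVC separate XOR-like data that no straight line can split (the toy points below are illustrative values):

```python
import numpy as np
from sklearn import svm

# XOR-like toy data: no straight line in 2D separates the two classes
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = [0, 1, 1, 0]

# the RBF kernel implicitly maps the points into a higher-dimensional space,
# where a separating hyperplane does exist; gamma controls how local the mapping is
clf = svm.SVC(kernel='rbf', gamma=1.0)
clf.fit(X, y)
print(clf.predict(X))  # the four training points should come back as [0 1 1 0]
```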

If we want to handle a problem with more than two classes, we can repeatedly split the data into "this class" versus "all the other classes" and solve a two-class problem each time.
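A minimal sketch of that one-vs-rest idea with scikit-learn's OneVsRestClassifier (the three-class toy data is made up for illustration):

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# toy data with three classes (illustrative values)
X = [[0, 0], [0, 1], [2, 2], [2, 3], [4, 0], [4, 1]]
y = [0, 0, 1, 1, 2, 2]

# OneVsRestClassifier trains one binary SVM per class: "this class" vs. "everything else"
clf = OneVsRestClassifier(SVC(kernel='linear'))
clf.fit(X, y)
print(clf.predict([[2, 2.5]]))  # should fall in class 1's region
```

Note that SVC on its own also handles more than two classes (it uses a one-vs-one scheme internally); the wrapper above just makes the one-vs-rest strategy explicit.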

A face-recognition example

```python
from __future__ import print_function

from time import time
import logging
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import fetch_lfw_people
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC


print(__doc__)

# Display progress logs on stdout
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')


###############################################################################
# Download the data, if not already on disk, and load it as numpy arrays

lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)  # download the face images (cached locally)

# introspect the images arrays to find the shapes (for plotting)
n_samples, h, w = lfw_people.images.shape

# for machine learning we use the data directly (relative pixel
# position information is ignored by this model)
X = lfw_people.data      # feature vectors
n_features = X.shape[1]  # number of features (columns)

# the label to predict is the id of the person
y = lfw_people.target                   # class labels
target_names = lfw_people.target_names  # names of the people in the selected images
n_classes = target_names.shape[0]       # number of classes

print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)


###############################################################################
# Split into a training set and a test set

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25)
# train_test_split splits the instances into a training set and a test set

###############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 150

print("Extracting the top %d eigenfaces from %d faces"
      % (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(n_components=n_components, whiten=True,
          svd_solver='randomized').fit(X_train)
print("done in %0.3fs" % (time() - t0))
# PCA reduces the dimensionality here, because the raw pixel dimensionality
# would be too high to compute with directly

eigenfaces = pca.components_.reshape((n_components, h, w))

print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))


###############################################################################
# Train a SVM classification model

print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
# kernel='rbf' chooses the kernel function; GridSearchCV searches for the best parameter combination
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)


###############################################################################
# Quantitative evaluation of the model quality on the test set

print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))

print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))


###############################################################################
# Qualitative evaluation of the predictions using matplotlib

def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())


# plot the result of the prediction on a portion of the test set

def title(y_pred, y_test, target_names, i):
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue: %s' % (pred_name, true_name)

prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]

plot_gallery(X_test, prediction_titles, h, w)

# plot the gallery of the most significative eigenfaces

eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)

plt.show()
```
