I'm trying to build a classifier and score its performance. Here is my code:
    def svc(train_data, train_labels, test_data, test_labels):
        from sklearn.svm import SVC
        from sklearn.metrics import accuracy_score
        svc = SVC(kernel='linear')
        svc.fit(train_data, train_labels)
        predicted = svc.predict(test_data)
        actual = test_labels
        score = svc.score(test_data, test_labels)
        print ('svc score')
        print (score)
        print ('svc accuracy')
        print (accuracy_score(predicted, actual))
Now, when I run the function svc(X, x, Y, y) with:
    X.shape = (1000, 150)
    x.shape = (1000, )
    Y.shape = (200, 150)
    y.shape = (200, )
I get this error:
          6     predicted = svc.predict(test_classed_data)
          7     actual = test_classed_labels
    ----> 8     score = svc.score(test_classed_data, test_classed_labels)
          9     print ('svc score')
         10     print (score)

    local/lib/python3.4/site-packages/sklearn/base.py in score(self, X, y, sample_weight)
        289         """
        290         from .metrics import accuracy_score
    --> 291         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
        292
        293

        124     if (y_type not in ["binary", "multiclass", "multilabel-indicator",
        125                        "multilabel-sequences"]):
    --> 126         raise ValueError("{0} is not supported".format(y_type))
        127
        128     if y_type in ["binary", "multiclass"]:

    ValueError: continuous is not supported
The thing is, my test_labels (that is, y) look like this:
[ 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 15.5 20.5 20.5 20.5 20.5 20.5 20.5 20.5 20.5 20.5 20.5 20.5 25.5 25.5 25.5 25.5 25.5 25.5 25.5 25.5 25.5 25.5 25.5 30.5 30.5 30.5 30.5 30.5 30.5 30.5 30.5 30.5 30.5 30.5 35.5 35.5 35.5 35.5 35.5 35.5 35.5 35.5 35.5 35.5 35.5... ]
I'm really confused about why SVC can't recognize these as discrete labels, since all the examples I've seen use a similar format and work fine. Please help.
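The y passed to both fit and score should consist of integers or strings that represent class labels. Your labels, such as 15.5, are non-integer floats, which scikit-learn treats as a continuous target; hence the "continuous is not supported" error.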
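To see how scikit-learn categorizes a label array, one option is the `type_of_target` helper from `sklearn.utils.multiclass`. The sketch below uses a small made-up array shaped like the labels in the question (the values are illustrative, not the asker's actual data) and assumes a reasonably recent scikit-learn.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up labels in the same format as the question's y (illustrative only)
float_labels = np.array([15.5, 15.5, 20.5, 20.5, 25.5, 25.5])

# Non-integer floats are categorized as a continuous (regression) target
float_kind = type_of_target(float_labels)

# The same values converted to strings are categorized as class labels
str_kind = type_of_target(float_labels.astype(str))

print(float_kind)
print(str_kind)
```

The first result is the category named in the error message, which is why the accuracy computation inside score refuses the labels.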
For example, if you have two classes, "foo" and 1, you can train an SVM like this:
    >>> import numpy as np
    >>> from sklearn.svm import SVC
    >>> clf = SVC()
    >>> X = np.random.randn(10, 4)
    >>> y = ["foo"] * 5 + [1] * 5
    >>> clf.fit(X, y)
    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
      kernel='rbf', max_iter=-1, probability=False, random_state=None,
      shrinking=True, tol=0.001, verbose=False)
Then test its accuracy:
    >>> X_test = np.random.randn(6, 4)
    >>> y_test = ["foo", 1] * 3
    >>> clf.score(X_test, y_test)
    0.5
Floating-point values are apparently still accepted by fit, but they shouldn't be, because class labels should not be real-valued.
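One possible way to apply this to float labels like those in the question is to map each distinct value to an integer class index with `LabelEncoder` before fitting and scoring. This is a minimal sketch, not the asker's actual setup: the random feature arrays, the array sizes, and all variable names are assumptions made for illustration, and only the label format mirrors the question.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Made-up stand-ins for the real data (hypothetical shapes and values)
train_labels = np.repeat([15.5, 20.5, 25.5], 20)
test_labels = np.repeat([15.5, 20.5, 25.5], 5)
train_data = rng.randn(train_labels.size, 4)
test_data = rng.randn(test_labels.size, 4)

# Map each distinct float value to an integer class index (0, 1, 2, ...)
encoder = LabelEncoder()
train_classes = encoder.fit_transform(train_labels)
test_classes = encoder.transform(test_labels)

# Fit and score on the integer class indices instead of the raw floats
clf = SVC(kernel='linear')
clf.fit(train_data, train_classes)
score = clf.score(test_data, test_classes)
print(score)

# Translate predicted class indices back to the original float values
predicted_values = encoder.inverse_transform(clf.predict(test_data))
```

Converting the labels with `astype(str)` would likely work as well; the encoder is used here only because it makes the round trip back to the original values explicit.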