Question

What kind of input does sklearn's SVC score method require?

Posted by 手机用户2502873393 on 2023-01-06 21:31


I am trying to build a classifier and score its performance. Here is my code:

def svc(train_data, train_labels, test_data, test_labels):
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score
    svc = SVC(kernel='linear')
    svc.fit(train_data, train_labels)
    predicted = svc.predict(test_data)
    actual = test_labels
    score = svc.score(test_data, test_labels)  # mean accuracy on the test set
    print('svc score')
    print(score)
    print('svc accuracy')
    print(accuracy_score(actual, predicted))  # argument order is (y_true, y_pred)

Now when I run the function as svc(X, x, Y, y), where:

X.shape = (1000, 150)
x.shape = (1000, )
Y.shape = (200, 150)
y.shape = (200, )

I get this error:

      6     predicted = svc.predict(test_classed_data)
      7     actual = test_classed_labels
----> 8     score = svc.score(test_classed_data, test_classed_labels)
      9     print ('svc score')
     10     print (score)

local/lib/python3.4/site-packages/sklearn/base.py in score(self, X, y, sample_weight)
    289         """
    290         from .metrics import accuracy_score
--> 291         return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
    292 
    293 

    124     if (y_type not in ["binary", "multiclass", "multilabel-indicator",
    125                        "multilabel-sequences"]):
--> 126         raise ValueError("{0} is not supported".format(y_type))
    127 
    128     if y_type in ["binary", "multiclass"]:

ValueError: continuous is not supported

The thing is, my test_labels (y) look like this:

[ 15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  15.5  20.5
  20.5  20.5  20.5  20.5  20.5  20.5  20.5  20.5  20.5  20.5  25.5  25.5
  25.5  25.5  25.5  25.5  25.5  25.5  25.5  25.5  25.5  30.5  30.5  30.5
  30.5  30.5  30.5  30.5  30.5  30.5  30.5  30.5  35.5  35.5  35.5  35.5
  35.5  35.5  35.5  35.5  35.5  35.5  35.5... ]

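I am really confused as to why SVC does not recognize these as discrete labels, since every example I have seen uses a similar format and works fine. Please help.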

1 answer

The y passed to both the fit and score functions should consist of integers or strings that represent class labels.
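One way to see how scikit-learn interprets a given label array is the type_of_target helper in sklearn.utils.multiclass, which reports the same kind of target type ("continuous", "multiclass", and so on) that appears in the error message. The following is only a minimal sketch; the sample arrays are made up for illustration and are not the asker's data.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical label arrays, for illustration only.
float_labels = np.array([15.5, 15.5, 20.5, 25.5])
int_labels = np.array([15, 15, 20, 25])
str_labels = np.array(["15.5", "15.5", "20.5", "25.5"])

# Non-integer floats are treated as regression (continuous) targets.
print(type_of_target(float_labels))
# Integers and strings are treated as discrete class labels.
print(type_of_target(int_labels))
print(type_of_target(str_labels))
```

If the first call reports a continuous target for your own labels, the accuracy metric used by score will reject them.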

For example, if you have two classes, "foo" and 1, you can train an SVM like this:

>>> import numpy as np
>>> from sklearn.svm import SVC
>>> clf = SVC()
>>> X = np.random.randn(10, 4)
>>> y = ["foo"] * 5 + [1] * 5
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

Then test its accuracy:

>>> X_test = np.random.randn(6, 4)
>>> y_test = ["foo", 1] * 3
>>> clf.score(X_test, y_test)
0.5

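Floating-point values are apparently still accepted by fit, but they should not be, because class labels are not supposed to be real-valued.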
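For float labels like those in the question, one possible workaround is to map each distinct value to an integer class index before calling fit and score. The sketch below uses randomly generated data and made-up variable names (X_train, y_train_idx, and so on), and it assumes that every label in the test set also occurs in the training set and that equal labels are bit-for-bit equal floats.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)

# Hypothetical data: 60 training and 30 test samples with 4 features each,
# labelled with float "class" values like those in the question.
X_train = rng.randn(60, 4)
y_train = np.repeat([15.5, 20.5, 25.5], 20)
X_test = rng.randn(30, 4)
y_test = np.repeat([15.5, 20.5, 25.5], 10)

# Map each distinct float value to an integer class index (0, 1, 2, ...).
classes, y_train_idx = np.unique(y_train, return_inverse=True)

# Reuse the training mapping for the test labels. This assumes every test
# label also occurs in the training labels.
y_test_idx = np.searchsorted(classes, y_test)

clf = SVC(kernel="linear")
clf.fit(X_train, y_train_idx)
score = clf.score(X_test, y_test_idx)
print(score)

# Translate predicted class indices back to the original float values.
predicted_values = classes[clf.predict(X_test)]
```

Converting the labels to strings would serve the same purpose; integer indices are used here only because they make it easy to recover the original values through the classes array.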

Answered 2023-01-06 21:39

劲朋_511
