Multiclass SVM

Crammer and Singer (2001) extended the binary SVM classifier to classification problems with more than two classes. The training problem of the Crammer-Singer multiclass SVM can be expressed as the QP

(1)\[\begin{split}\begin{array}{ll} \mbox{maximize} & -(1/2) \mathbf{Tr}(U^TQU) + \mathbf{Tr}(E^TU) \\ \mbox{subject to} & U \preceq \gamma E \\ & U \mathbf{1}_m = 0 \end{array}\end{split}\]

with variable \(U \in \mathbf{R}^{N \times m}\), where \(N\) is the number of training examples and \(m\) is the number of classes. The \(N \times N\) kernel matrix \(Q\) is given by \(Q_{ij} = K(x_i, x_j)\), where \(K\) is a kernel function and \(x_i^T\) is the i'th row of the \(N \times n\) data matrix \(X\). The vector \(d\) is an \(N\)-vector with the labels of the training examples (i.e. \(d_i \in \{ 0,1,\ldots,m-1 \}\)), and the \(N \times m\) matrix \(E\) is defined by \(E_{ij} = 1\) if \(d_i = j\) and \(E_{ij} = 0\) otherwise.
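As a sketch of how the problem data could be assembled, the snippet below builds the kernel matrix \(Q\) (for the linear kernel) and the indicator matrix \(E\) from a small made-up data set; the array names and sizes are illustrative only.

```python
import numpy as np

# Made-up data set: N = 4 training examples in R^2, m = 3 classes.
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.0, 0.0]])
d = np.array([0, 1, 2, 1])      # labels d_i in {0, ..., m-1}
N, n = X.shape
m = 3

# N x N kernel matrix for the linear kernel K(x_i, x_j) = x_i^T x_j.
Q = X @ X.T

# N x m matrix E with E_ij = 1 if d_i = j and E_ij = 0 otherwise.
E = np.zeros((N, m))
E[np.arange(N), d] = 1.0
```

With these matrices in place, (1) is a QP in the matrix variable \(U\) with componentwise upper bound \(\gamma E\) and the row-sum equality constraint \(U \mathbf{1}_m = 0\).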


A custom solver for the multiclass support vector machine training problem is available as a Python module mcsvm. The module implements the following function:

mcsvm(X, labels, gamma, kernel='linear', sigma=1.0, degree=1)

Solves the problem (1) using a custom KKT solver.

The input arguments are the data matrix \(X\) (with the \(N\) training examples \(x_i^T\) as rows), the label vector \(d\), and the positive parameter \(\gamma\).

Valid kernel functions are:

- the linear kernel: \(K(x_i,x_j) = x_i^Tx_j\);

- the polynomial kernel: \(K(x_i,x_j) = (x_i^Tx_j/\sigma)^d\).

The kernel parameters \(\sigma\) and \(d\) are specified using the input arguments sigma and degree, respectively.
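The two kernel functions can be written directly from their definitions; the function names below are illustrative, not part of the mcsvm module.

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(x_i, x_j) = x_i^T x_j
    return xi @ xj

def poly_kernel(xi, xj, sigma=1.0, degree=1):
    # K(x_i, x_j) = (x_i^T x_j / sigma)^d
    return (xi @ xj / sigma) ** degree
```

Note that the polynomial kernel with sigma = 1.0 and degree = 1 (the defaults) reduces to the linear kernel.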

Returns a function classifier(). If \(Y\) is an \(M \times n\) matrix, then classifier(Y) returns a list of length \(M\) whose k'th element is

\[\operatorname*{arg\,max}_{j=0,\ldots,m-1} \sum_{i=1}^N U_{ij} K(x_i, y_k)\]

where \(y_k^T\) is row \(k\) of \(Y\), \(x_i^T\) is row \(i\) of \(X\), and \(U\) is the optimal solution of the QP (1).
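The decision rule itself is straightforward to evaluate once \(U\) is known. The sketch below implements it directly from the formula, using a made-up \(U\) and the linear kernel (a real \(U\) would come from solving the QP):

```python
import numpy as np

def classify(Y, X, U, kernel):
    """For each row y_k of Y, return argmax_j sum_i U_ij K(x_i, y_k)."""
    predictions = []
    for y in Y:                     # y = y_k, row k of Y
        # scores[j] = sum_i U[i, j] * K(x_i, y)
        scores = sum(U[i] * kernel(x, y) for i, x in enumerate(X))
        predictions.append(int(np.argmax(scores)))
    return predictions

# Toy example: N = 2 training points, m = 2 classes, made-up U.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
U = np.array([[1.0, -1.0], [-1.0, 1.0]])
linear = lambda xi, xj: xi @ xj
preds = classify(np.array([[2.0, 0.0], [0.0, 3.0]]), X, U, linear)
```

For the linear kernel this loop can be vectorized as a single matrix product, since the score matrix is \(Y X^T U\).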