# Multiclass SVM

Crammer and Singer (2001) extended the binary SVM classifier to classification problems with more than two classes. The training problem of the Crammer-Singer multiclass SVM can be expressed as the QP

(1)$\begin{split}\begin{array}{ll} \mbox{maximize} & -(1/2) \mathbf{Tr}(U^TQU) + \mathbf{Tr}(E^TU) \\ \mbox{subject to} & U \preceq \gamma E \\ & U \mathbf{1}_m = 0 \end{array}\end{split}$

with variable $$U \in \mathbf{R}^{N \times m}$$, where $$N$$ is the number of training examples and $$m$$ is the number of classes. The $$N \times N$$ kernel matrix $$Q$$ is given by $$Q_{ij} = K(x_i, x_j)$$, where $$K$$ is a kernel function and $$x_i^T$$ is the $$i$$'th row of the $$N \times n$$ data matrix $$X$$. The labels are collected in an $$N$$-vector $$d$$ with entries $$d_i \in \{ 0,1,\ldots,m-1 \}$$, and $$E$$ is the $$N \times m$$ indicator matrix with $$E_{ij} = 1$$ if $$d_i = j$$ and $$E_{ij} = 0$$ otherwise.
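The problem data in (1) can be assembled directly from these definitions. The following sketch (not part of the mcsvm module; the helper name build_qp_data is hypothetical) forms $$Q$$ and the indicator matrix $$E$$ with NumPy, assuming a linear kernel:

```python
import numpy as np

def build_qp_data(X, d, m):
    """Return the N x N kernel matrix Q and the N x m indicator matrix E."""
    N = X.shape[0]
    Q = X @ X.T                # linear kernel: Q_ij = x_i^T x_j
    E = np.zeros((N, m))
    E[np.arange(N), d] = 1.0   # E_ij = 1 if d_i == j, else 0
    return Q, E
```

For a nonlinear kernel, only the line computing Q changes; E depends solely on the labels.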

### Documentation

A custom solver for the multiclass support vector machine training problem is available as a Python module mcsvm. The module implements the following function:

```
mcsvm(X, labels, gamma, kernel='linear', sigma=1.0, degree=1)
```

Solves the problem (1) using a custom KKT solver.

The input arguments are the data matrix $$X$$ (with the $$N$$ training examples $$x_i^T$$ as rows), the label vector $$d$$ (the labels argument), and the positive parameter $$\gamma$$.

Valid kernel functions are:

- 'linear', the linear kernel: $$K(x_i,x_j) = x_i^Tx_j$$
- 'poly', the polynomial kernel: $$K(x_i,x_j) = (x_i^Tx_j/\sigma)^d$$

The kernel parameters $$\sigma$$ and $$d$$ are specified using the input arguments sigma and degree, respectively.
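Written as plain Python functions, the two kernels look as follows (a sketch, assuming the parameterization above with sigma and degree standing in for $$\sigma$$ and $$d$$):

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(x_i, x_j) = x_i^T x_j
    return float(np.dot(xi, xj))

def poly_kernel(xi, xj, sigma=1.0, degree=1):
    # K(x_i, x_j) = (x_i^T x_j / sigma)^degree
    return (float(np.dot(xi, xj)) / sigma) ** degree
```

With the defaults sigma=1.0 and degree=1, the polynomial kernel reduces to the linear kernel.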

Returns the function classifier(). If $$Y$$ is an $$M \times n$$ matrix with test vectors $$y_k^T$$ as rows, then classifier(Y) returns a list of length $$M$$ whose $$k$$'th element is

$\operatorname*{arg\,max}_{j=0,\ldots,m-1} \sum_{i=1}^N U_{ij} K(x_i, y_k)$

where $$y_k^T$$ is row $$k$$ of $$Y$$, $$x_i^T$$ is row $$i$$ of $$X$$, and $$U$$ is the optimal solution of the QP (1).
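The decision rule above can be sketched as a closure over the optimal $$U$$, the training rows, and the kernel (the helper name make_classifier is hypothetical; mcsvm constructs an equivalent function internally):

```python
import numpy as np

def make_classifier(U, X, K):
    """U: N x m optimal solution of (1); X: N x n training data; K: kernel."""
    def classifier(Y):
        # S[k, j] = sum_i U_ij K(x_i, y_k); predict the argmax over classes j
        S = np.array([[sum(U[i, j] * K(X[i], Y[k]) for i in range(X.shape[0]))
                       for j in range(U.shape[1])]
                      for k in range(Y.shape[0])])
        return list(S.argmax(axis=1))
    return classifier
```

For the linear kernel the double sum collapses to the matrix product $$Y X^T U$$, which is how one would compute the scores in practice.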