
Web Interface

The support vector machine (SVM) algorithm learns to distinguish between two given classes of data. This page allows you to train an SVM on a labeled training set and then use the trained SVM to make predictions about the classifications of an unlabeled test set.

We appreciate suggestions or bug reports.

More documentation is available on file formats and the SVM algorithm, along with a FAQ page.


Please note there are some limitations to how you can use this site. Very large or long-running jobs cannot be run. The number of concurrent or waiting jobs is limited. If you have large data sets, or want to run the SVM many times, try the command line tools. The command line tools also give you access to additional features of the software such as feature selection and built-in cross-validation.

Inputs

Instead of uploading your own files, you can run the SVM on a demonstration data set by checking the demonstration data box and clicking the 'submit' button.

Important! 90% of the problems users have with this site are caused by errors in the input file format. Please read the file format documentation closely before attempting a run. Thank you.

  1. Training data: a file containing a set of fixed-length, real-valued vectors that will serve as the training set.

    This file should be a tab-delimited text file. The first row should contain feature names, and the first column should contain example names. The rest of the file should consist of a matrix of numbers, with each row of the matrix corresponding to an example. The limit on the total number of values in the file (rows * columns) is 1,000,000. Here is an example matrix containing 10 vectors with 4 features each.

  2. Class labels: a file containing the training set classification labels.

    This file should be a tab-delimited text file containing two columns. The first column should contain the same example identifiers as the training set file, in the same order. The second column should contain class labels (1 for the positive class, and -1 for the negative class). Here is an example.

  3. Test data: a file containing the test set vectors.

    This file is similar in format to the training set. It should contain the same number of features as the training set, but may contain different examples. The data must not have any missing values. The limit on the total number of values in the file (rows * columns) is 1,000,000. Here is an example.
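The format rules above can be checked locally before uploading. The following is a minimal Python sketch, not part of the site's software: the function names are hypothetical, it assumes the labels file has no header row (pass has_header=True if yours does), it treats "NaN" entries as missing values, and it assumes only the numeric values count toward the 1,000,000-value limit. It checks that every row has one value per feature, that no values are missing, that label identifiers match the training examples in the same order, that labels are 1 or -1, and that the test set has as many features as the training set.

```python
import csv
import io
import math

MAX_VALUES = 1_000_000  # limit on rows * columns stated for each data file


def _rows(text):
    """Split tab-delimited text into rows, skipping blank lines."""
    return [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]


def read_matrix(text):
    """Parse a data file: a header row of feature names, then one row per
    example with the example name first and real-valued features after."""
    rows = _rows(text)
    features = rows[0][1:]  # first header cell sits above the example names
    names, matrix = [], []
    for row in rows[1:]:
        if len(row) != len(features) + 1:
            raise ValueError(f"{row[0]!r}: expected {len(features)} values")
        values = [float(v) for v in row[1:]]  # empty cells raise ValueError
        if any(math.isnan(v) for v in values):  # assumption: NaN = missing
            raise ValueError(f"{row[0]!r}: missing value")
        names.append(row[0])
        matrix.append(values)
    # Assumption: only the numeric block counts toward the limit.
    if len(matrix) * len(features) > MAX_VALUES:
        raise ValueError("more than 1,000,000 values")
    return features, names, matrix


def read_labels(text, train_names, has_header=False):
    """Parse a two-column label file and check it lines up with the training data."""
    rows = _rows(text)[1 if has_header else 0:]
    if [r[0] for r in rows] != train_names:
        raise ValueError("label IDs must match the training examples, in order")
    labels = [int(r[1]) for r in rows]
    if any(y not in (1, -1) for y in labels):
        raise ValueError("labels must be 1 (positive) or -1 (negative)")
    return labels


# Tiny made-up example files (not the site's demonstration data).
train_txt = "id\tf1\tf2\ngeneA\t0.5\t1.2\ngeneB\t-0.3\t0.8\n"
label_txt = "geneA\t1\ngeneB\t-1\n"
test_txt = "id\tf1\tf2\ngeneC\t0.1\t0.9\n"

features, names, X = read_matrix(train_txt)
y = read_labels(label_txt, names)
test_features, test_names, X_test = read_matrix(test_txt)
assert len(test_features) == len(features), "test set needs the same features"
print(names, y, X_test)
```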

After you click the submit button, the server will produce two output files:

  1. The training set classifications. This tab-delimited output file will contain, for each training set member, the predicted class given by the trained SVM, as well as the corresponding discriminant value, which is proportional to the distance between the given example and the separating hyperplane. It also contains a weight for each example; these weights are the information the SVM uses to classify test examples. Training examples with non-zero weights are support vectors.
  2. The test set classifications, similar to the training set classifications file (but with no weights).
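To illustrate how the weights relate to the discriminant values, here is a minimal Python sketch of the standard SVM decision function, f(x) = sum over i of weight_i * y_i * K(x_i, x) + b, where y_i is the training label. It assumes the weight column holds non-negative weights with the sign coming from the label (if a file stores signed weights, drop the y_i factor), uses the default dot-product kernel, and exposes the bias b as a parameter because the exact contents of the output file are not described here. The function names and numbers are made up for illustration.

```python
def dot(u, v):
    """Default kernel: the plain dot product (linear SVM)."""
    return sum(a * b for a, b in zip(u, v))


def discriminant(x, train_X, train_y, weights, kernel=dot, bias=0.0):
    """Standard SVM decision function:
    f(x) = sum_i weights[i] * train_y[i] * kernel(train_X[i], x) + bias.
    Examples with zero weight contribute nothing, so they are skipped."""
    total = 0.0
    for xi, yi, wi in zip(train_X, train_y, weights):
        if wi != 0:
            total += wi * yi * kernel(xi, x)
    return total + bias


def classify(x, *model, **options):
    """Predicted class is the sign of the discriminant (ties go to +1 here)."""
    return 1 if discriminant(x, *model, **options) >= 0 else -1


# Made-up trained model: three training vectors, their labels, and weights.
train_X = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
train_y = [1, -1, 1]
weights = [0.5, 0.5, 0.0]

support_vectors = [i for i, w in enumerate(weights) if w != 0]
print(support_vectors)                                      # [0, 1]
print(discriminant([3.0, 1.0], train_X, train_y, weights))  # 1.0
print(classify([0.0, 2.0], train_X, train_y, weights))      # -1
```

Dividing f(x) by the length of the hyperplane's normal vector gives the signed distance to the hyperplane, which is why the discriminant is proportional to that distance. Only support vectors enter the sum, so training examples with zero weight could be dropped without changing any prediction.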

You may click 'submit' now to use the default SVM training options.

Training options

You may select alternate options from the following list:

Data adjustment and processing

These options control how the input data is treated. Adjusting the data so that the mean is zero and the variance is one can improve performance if your variables are heterogeneous in scale.
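A minimal Python sketch of this kind of adjustment follows. It assumes the adjustment is applied to each feature (column) separately and uses the population standard deviation; this page does not say exactly how the server computes it, and the function name is hypothetical.

```python
import statistics


def standardize_columns(matrix):
    """Shift and scale each feature (column) to mean 0 and variance 1.
    A constant column is only centred, to avoid dividing by zero."""
    columns = list(zip(*matrix))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]  # population std. dev.
    return [
        [(v - m) / s if s > 0 else v - m for v, m, s in zip(row, means, sds)]
        for row in matrix
    ]


# Made-up data: the second feature's scale is 1000 times the first's.
X = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]]
for row in standardize_columns(X):
    print([round(v, 4) for v in row])
```

After the adjustment both made-up features take the same values (about -1.2247, 0 and 1.2247), so neither dominates dot products merely because of its units.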

Kernel settings

The default kernel uses the simple dot product, making the SVM a linear classifier (the feature space and the input space are equivalent). You may change the kernel to be a polynomial or radial basis function by setting the following options. Kernel matrix normalization (performed by default) adjusts all points to lie on the surface of the unit hypersphere in the feature space.
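The kernels and the normalization step can be written out directly. This Python sketch uses common textbook forms; the parameter names degree, coef0 and width are hypothetical, and this software's exact parameterization (for example, its radial basis width convention) may differ.

```python
import math


def linear(x, z):
    """Default kernel: the dot product x . z."""
    return sum(a * b for a, b in zip(x, z))


def polynomial(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree."""
    return (linear(x, z) + coef0) ** degree


def rbf(x, z, width=1.0):
    """Radial basis kernel exp(-||x - z||^2 / (2 * width^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * width ** 2))


def normalized(kernel):
    """Return K'(x, z) = K(x, z) / sqrt(K(x, x) * K(z, z)), which gives every
    point unit length in feature space (the unit hypersphere)."""
    def k(x, z):
        return kernel(x, z) / math.sqrt(kernel(x, x) * kernel(z, z))
    return k


cosine = normalized(linear)
print(cosine([3.0, 4.0], [3.0, 4.0]))      # 1.0: every point has unit length
print(cosine([1.0, 0.0], [0.0, 1.0]))      # 0.0: orthogonal vectors
print(polynomial([1.0, 2.0], [3.0, 4.0]))  # (11 + 1) ** 2 = 144.0
```

Normalizing the dot-product kernel gives the cosine of the angle between two vectors, which is undefined for an all-zero vector. A radial basis kernel already has K(x, x) = 1, so normalization leaves it unchanged.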

Soft margin options

The soft margin tolerates some errors on the training set. This lets the SVM find a solution even when the data are noisy, and it can help prevent overfitting and so improve generalization performance. You may use either the one-norm or the two-norm soft margin by setting the following options; the two-norm soft margin is the default. Enabling the one-norm soft margin by setting the constraint options disables the two-norm soft margin.
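For reference, the textbook dual problems behind the two soft margins (in the form given by Cristianini and Shawe-Taylor) are sketched below, with weights alpha_i, labels y_i in {1, -1}, kernel K, trade-off parameter C and Kronecker delta delta_ij. This is a standard formulation rather than a statement of exactly what this software solves; in particular, whether it uses a bias term (and hence the equality constraint) and how its options map onto C are assumptions not covered by this page.

```latex
% One-norm soft margin: the weights are bounded above by C
\max_{\alpha}\ \sum_{i} \alpha_i
  - \frac{1}{2} \sum_{i,j} y_i y_j \alpha_i \alpha_j \, K(x_i, x_j)
\quad \text{subject to} \quad \sum_{i} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C

% Two-norm soft margin: no upper bound, but 1/C is added to the kernel diagonal
\max_{\alpha}\ \sum_{i} \alpha_i
  - \frac{1}{2} \sum_{i,j} y_i y_j \alpha_i \alpha_j \left( K(x_i, x_j) + \frac{1}{C}\,\delta_{ij} \right)
\quad \text{subject to} \quad \sum_{i} y_i \alpha_i = 0, \quad \alpha_i \ge 0
```

Under this formulation, the one-norm margin caps each weight at C, while the two-norm margin amounts to training with a kernel matrix whose diagonal is enlarged by 1/C.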

Miscellaneous options

Send reports of problems to paul@chibi.ubc.ca

The SVM software was developed by William Stafford Noble in the Departments of Genome Sciences and Computer Science at the University of Washington and Paul Pavlidis (University of British Columbia). The web server was built and is maintained by Paul Pavlidis (paul@chibi.ubc.ca), with contributions from Ilan Wapinski, Andrew Liu and Phan Lu. The project was funded by National Science Foundation grants DBI-0078523 and ISI-0093302.