Support Vector Machine

Support Vector Machine (SVM) is a powerful supervised machine learning technique that can be used for both classification and regression tasks. Its ability to handle high-dimensional data, generalise to complicated datasets, and resist overfitting makes it widely used in many applications.

The primary principle underlying SVM is to find the hyperplane in feature space that best separates the data points into distinct classes. The hyperplane is the decision boundary that maximises the margin between the two classes, allowing for better generalisation and performance on unseen data.
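
As a minimal sketch of this idea (using scikit-learn's `SVC` and a made-up toy dataset, neither of which the text prescribes), the following fits a linear SVM and reads back the learned hyperplane, its margin width, and the support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable classes (illustrative toy data).
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear SVM learns the hyperplane w . x + b = 0.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]                   # normal vector of the hyperplane
b = clf.intercept_[0]              # bias term
margin = 2.0 / np.linalg.norm(w)   # width of the margin between classes

print("hyperplane: %.3f*x1 + %.3f*x2 + %.3f = 0" % (w[0], w[1], b))
print("margin width:", margin)
print("support vectors:\n", clf.support_vectors_)
```

The printed support vectors are exactly the training points closest to the boundary; moving any other point slightly would not change the learned hyperplane.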

The components of SVM include:

  1. Data: The labelled dataset comprises input features and class labels; each data point is represented by a feature vector.
  2. Feature space: Each data point is represented by its features in an n-dimensional space, where the dimension count (n) is the number of attributes used to describe each data point.
  3. Hyperplane: In two dimensions, the hyperplane is a straight line that separates the data points of one class from those of the other; in higher-dimensional spaces it is a flat affine subspace. The SVM seeks the hyperplane that maximises the margin between the classes.
  4. Margin: The margin is the distance between the hyperplane and the nearest data points in each class, which are known as support vectors. SVM seeks the hyperplane with the greatest margin, leading to better generalisation on unseen data.
  5. Support vectors: These are the data points closest to the hyperplane and have the most influence in determining the decision boundary. Support vectors are key to computing the margin and serve as the SVM model’s backbone.
  6. Kernel function: SVM can handle non-linearly separable data using a kernel function, which implicitly maps the data into a higher-dimensional feature space in which it becomes linearly separable, without ever computing that mapping explicitly. Commonly used kernel functions include the linear, polynomial, radial basis function (RBF), and sigmoid kernels (see the sketch after this list).
  7. Optimisation algorithm: The SVM optimisation algorithm seeks the optimal hyperplane by maximising the margin while minimising classification errors; this amounts to solving a quadratic programming problem for the hyperplane's weights and bias.
  8. Regularisation parameter (C): The regularisation parameter C governs the trade-off between maximising the margin and minimising classification error. A higher C value penalises misclassifications more heavily, producing a narrower margin that fits the training data more tightly and may result in overfitting; a lower C tolerates more misclassifications in exchange for a wider margin.
  9. Decision function: After training, the SVM provides a decision function that takes new data points and predicts their class labels based on which side of the learned hyperplane they fall.
  10. Kernel trick parameters: Depending on the kernel used, additional parameters may be provided. The polynomial kernel, for example, takes a degree parameter, while the RBF kernel has a gamma parameter that influences the shape of the kernel; gamma appears in the sketch below.
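
To make items 6, 9, and 10 concrete, here is a hedged sketch (again assuming scikit-learn; `make_circles` is an illustrative dataset, not one named in the text) contrasting a linear kernel with an RBF kernel on data that is not linearly separable, then querying the decision function:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line separates the classes,
# so a linear kernel struggles while the RBF kernel succeeds.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear", C=1.0).fit(X, y)
rbf = SVC(kernel="rbf", C=1.0, gamma=2.0).fit(X, y)

print("linear kernel accuracy:", linear.score(X, y))  # near chance
print("RBF kernel accuracy:   ", rbf.score(X, y))     # near 1.0

# The decision function returns a signed value for each point;
# its sign determines the predicted class label.
sample = X[:3]
print("decision values:", rbf.decision_function(sample))
print("predictions:    ", rbf.predict(sample))
```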

Understanding these components aids in training and fine-tuning SVM models for various classification problems. Selecting the right kernel function and tuning the hyperparameters are critical to achieving good performance and generalisation in SVM applications.
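
One common way to carry out that selection, sketched here under the assumption that scikit-learn's `GridSearchCV` and the bundled breast-cancer dataset are acceptable stand-ins (the text names no specific tooling or data), is a cross-validated grid search over C, gamma, and the kernel:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Feature scaling matters for SVMs; search kernel, C, and gamma jointly.
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.01, 0.1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("test accuracy:  ", search.score(X_test, y_test))
```

Wrapping the scaler and classifier in a pipeline ensures the scaling statistics are re-fitted inside each cross-validation fold, avoiding leakage from the validation split.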