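A support vector machine (SVM) classifies data by finding the hyperplane that maximizes the margin, that is, the distance between the decision boundary and the nearest training points of each class. The kernel trick lets an SVM separate classes in a high-dimensional feature space without computing the mapping into that space explicitly: the algorithm only needs inner products between points, and a kernel function supplies them directly. Common choices are the polynomial, radial basis function (RBF), and sigmoid kernels. The cost parameter (C) controls the trade-off between a wide margin and misclassified training points: a large C penalizes misclassification heavily and tends to produce a narrower margin, while a small C tolerates more errors in exchange for a wider one.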
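To make the three kernels and the role of C concrete, here is a minimal pure-Python sketch. It is illustrative only and not a trained SVM. The parameter names `gamma`, `coef0`, and `degree`, the helper `soft_margin_objective`, and the toy data are assumptions introduced for this example. The objective shown is the standard soft-margin primal form, half the squared norm of the weights plus C times the summed hinge loss, evaluated for a fixed, hand-picked hyperplane.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, z> + coef0) ** degree"""
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """exp(-gamma * ||x - z||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    """tanh(gamma * <x, z> + coef0)"""
    return math.tanh(gamma * dot(x, z) + coef0)


def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses over the training points."""
    margin_term = 0.5 * dot(w, w)
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return margin_term + C * hinge


# Hypothetical toy data: the third point lies on the wrong side of the boundary.
X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
y = [1, -1, -1]
w, b = (1.0, 0.0), 0.0  # fixed hyperplane x1 = 0, chosen by hand

for C in (0.1, 10.0):
    print(f"C={C}: objective={soft_margin_objective(w, b, X, y, C):.2f}")
```

With the hyperplane held fixed, only the misclassified third point contributes hinge loss, so raising C raises the price of that one error. In an actual optimizer, this pressure is what pushes the solution toward fewer training errors at the expense of margin width.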