Problem
Suppose we train a linear SVM classifier on a binary classification problem with the data points shown in Figure 1 below, which illustrates the samples closest to the boundary. The samples with positive labels are (1, 3), (1, 2), (-1, 2), (-2, 0); the samples with negative labels are (2, 1), (1, 0), (0, -1), (1, -2).
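To explore the setup numerically, the following sketch fits a linear SVM to the eight points above. A hard-margin SVM is approximated here by a very large penalty `C` (an assumption for illustration; scikit-learn has no explicit hard-margin mode):

```python
import numpy as np
from sklearn.svm import SVC

# The eight points from the problem statement.
X = np.array([
    [1, 3], [1, 2], [-1, 2], [-2, 0],   # positive class
    [2, 1], [1, 0], [0, -1], [1, -2],   # negative class
], dtype=float)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

# A very large C approximates a hard-margin linear SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```

On this data the learned boundary is (up to scaling) x2 - x1 = 0: every positive point satisfies x2 - x1 >= 1 and every negative point x2 - x1 <= -1, so the margin width is 2/||w|| = sqrt(2).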
1. If a new positive sample arrives at (0.5, -0.5), will the decision boundary change? If so, which SVM method should you use in this case?
2. In the soft-margin SVM, C is a hyperparameter (see Eqs. 14.10 or 14.11 in Chapter 14.3 of the textbook). What happens when C is very large? What about when it is very small?
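The effect of C can be checked empirically. The sketch below (a minimal experiment, assuming the same eight points as in the problem) refits the soft-margin SVM for several values of C and reports the margin width and support-vector count; a small C shrinks ||w|| (widening the margin and admitting violations), while a large C approaches the hard-margin solution:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 3], [1, 2], [-1, 2], [-2, 0],
              [2, 1], [1, 0], [0, -1], [1, -2]], dtype=float)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

results = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_[0])  # geometric margin width
    n_sv = len(clf.support_)                    # points on or inside the margin
    results[C] = (margin, n_sv)
    print(f"C={C:>6}: margin={margin:.3f}, support vectors={n_sv}")
```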
3. In real-world applications, how would you decide which SVM method to use (hard margin vs. soft margin; linear vs. kernel)?
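One practical signal for the linear-vs-kernel choice is held-out accuracy. The sketch below (an illustrative experiment on a synthetic ring-shaped dataset, not data from the problem) shows a linear SVM failing on non-linearly-separable data while an RBF-kernel SVM succeeds:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic concentric rings: not linearly separable in the input space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear_acc = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr).score(X_te, y_te)
rbf_acc = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr).score(X_te, y_te)
print("linear accuracy:", linear_acc)
print("rbf accuracy:", rbf_acc)
```

When the linear model already scores near the kernel model, the simpler (and faster) linear SVM is usually the better choice.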