Tuesday, November 27, 2018

Kernel Methods and Basis Expansions in Machine Learning






-- Introduction


    When we use an SVM or other models for classification problems in machine learning (ML), life is relatively easy if the data set is linearly separable. However, most real-world problems do not have this property, which is why kernel methods and basis expansions play a significant role in ML. (Note that if the large majority of the data is linearly separable and only a few points affect the decision boundary, we can handle it with a soft-margin SVM.)

    So we start with some data points that are not linearly separable; by applying a kernel function we can map the data to a higher-dimensional space where it becomes linearly separable, and then we can easily classify the data points in that higher-dimensional space.

    Mapping data to a higher dimension basically means computing new features from each example's original features, so this is usually called the primal form of kernel methods. The dual form of kernel methods, in contrast, is a similarity function that corresponds to an inner product of two data points in some higher-dimensional space. We can often apply domain knowledge to the dual form of a kernel function to define what kind of points count as "similar" to each other for a specific problem.
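As a quick illustration, here is a small NumPy sketch (the function names and the example points are illustrative) showing that the quadratic kernel K(x, z) = (1 + x·z)^2, evaluated in dual form, equals the inner product of an explicit 6-dimensional feature mapping, which is its primal form:

import numpy as np

def quadratic_kernel(x, z):
    # Dual form: similarity computed directly from the original features.
    return (1.0 + np.dot(x, z)) ** 2

def quadratic_features(x):
    # Primal form: explicit mapping of a 2-feature point into 6D space.
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(quadratic_kernel(x, z))                                # 4.0
print(np.dot(quadratic_features(x), quadratic_features(z)))  # 4.0, same value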

   The reason we want to use basis expansion on our data is very similar to the reason we use the primal form of kernel methods to map our data: we want to separate the data even when the decision boundary is not linear.


-- An Example of Basis Expansion:


Here is a small example. Assume we have two data features x1 and x2, and the decision boundary is something like x1 = x2^2 + 10, which is quadratic in the original 2D space. We extend our data to a 5D space by applying X = (x1, x2) --> X = (x1, x2, x1^2, x2^2, x1*x2). Our decision boundary is then linear in this 5D space: the original x1 - x2^2 - 10 = 0 becomes wX + b = 0, where w = (1, 0, 0, -1, 0) and b = -10.
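A small NumPy sketch of this expansion (the helper name expand_5d is illustrative); a point on the original quadratic boundary lands exactly on the linear boundary wX + b = 0 in the 5D space:

import numpy as np

def expand_5d(x1, x2):
    # Map (x1, x2) to (x1, x2, x1^2, x2^2, x1*x2).
    return np.array([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

w = np.array([1.0, 0.0, 0.0, -1.0, 0.0])   # linear weights in 5D
b = -10.0

# A point exactly on the original boundary x1 = x2^2 + 10:
x1, x2 = 19.0, 3.0
print(np.dot(w, expand_5d(x1, x2)) + b)    # 0.0 -> on the linear boundary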


-- Kernel Ridge Regression:


For ridge linear regression, the goal is to find the weight vector W that minimizes the loss function:

L(W) = || Y - XW ||^2 + Lambda * || W ||^2        (a)

We can derive the optimal W* in closed form:

W* = (X^T X + Lambda*I)^(-1) X^T Y  =  X^T (X X^T + Lambda*I)^(-1) Y        (b)

After we apply the kernel trick, we do not necessarily know the transformation function f(x); having the inner product of two samples in the new space is enough to give the prediction for a new sample X_new = (X1, X2, ..., Xn):

Y^new = f(X_new)^T W*

Substituting W* with form (b) above, replacing every X by f(X), and writing each inner product f(Xi)·f(Xj) as the kernel value K(Xi, Xj), we get:

--> Y^new = K(Xi, X_new)^T (K + Lambda*I)^(-1) Y

where Lambda is the regularization constant, I is the identity matrix, K is the kernel matrix over all pairs of training samples, and K(Xi, X_new) is the vector of kernel values between each training sample Xi and the new input X_new.
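A minimal NumPy sketch of this closed-form prediction (the function names are mine, and the quadratic polynomial kernel is just one example choice):

import numpy as np

def kernel(a, b):
    # Example kernel: K(a, b) = (1 + a.b)^2
    return (1.0 + np.dot(a, b)) ** 2

def krr_predict(train_x, train_y, new_x, lam=1.0):
    # K: kernel matrix over all pairs of training samples.
    K = np.array([[kernel(xi, xj) for xj in train_x] for xi in train_x])
    # k: kernel values between each training sample Xi and the new input.
    k = np.array([kernel(xi, new_x) for xi in train_x])
    # Y^new = k^T (K + Lambda*I)^(-1) Y
    alpha = np.linalg.solve(K + lam * np.eye(len(train_x)), train_y)
    return np.dot(k, alpha)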

--------------------

Then, given a kernel function K(x1, x2) = (1 + x1*x2)^2, how do we actually perform kernel ridge regression?

-- Here is the main idea for dual form kernels:

We apply the kernel function K to train_x and test_x (all the input data), specifically:

kernel_trainX = K ( train_x , train_x ) 

kernel_testX = K (test_x , train_x) 

model.fit ( kernel_trainX , train_y ) 

Y^predict = model.predict(kernel_testX) 
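One possible way to realize this workflow, sketched with scikit-learn's KernelRidge on a precomputed kernel (the toy data and the helper K are illustrative; KernelRidge's alpha plays the role of Lambda above):

import numpy as np
from sklearn.kernel_ridge import KernelRidge

def K(A, B):
    # K(x1, x2) = (1 + x1.x2)^2, evaluated for every pair of rows of A and B.
    return (1.0 + A @ B.T) ** 2

# Toy data, roughly following y = x^2 + 10.
train_x = np.array([[0.0], [1.0], [2.0], [3.0]])
train_y = np.array([10.0, 11.0, 14.0, 19.0])
test_x = np.array([[1.5], [2.5]])

kernel_trainX = K(train_x, train_x)
kernel_testX = K(test_x, train_x)

model = KernelRidge(alpha=1.0, kernel='precomputed')
model.fit(kernel_trainX, train_y)
y_predict = model.predict(kernel_testX)
print(y_predict)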


How do we compute K(train_x, train_x)? Here is Python code for a general polynomial kernel of any degree:
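A minimal NumPy sketch of such a function (the name polynomial_kernel and its degree parameter are my own):

import numpy as np

def polynomial_kernel(A, B, degree=2):
    # Matrix of kernel values K(a, b) = (1 + a.b)^degree
    # for every row a of A and every row b of B.
    A = np.atleast_2d(A)
    B = np.atleast_2d(B)
    return (1.0 + A @ B.T) ** degree

# kernel_trainX = polynomial_kernel(train_x, train_x, degree=2)
# kernel_testX  = polynomial_kernel(test_x, train_x, degree=2)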





-- The main idea for basis expansion is, again, very similar, except that, as we discussed previously, basis expansion corresponds more to the primal form of kernels:

expanded_trainX =  E( train_x) 

expanded_testX = E(test_x) 

model.fit(expanded_trainX, train_y) 

Y^predict = model.predict(expanded_testX) 

Here is an example that performs a general polynomial expansion of any degree:
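A minimal sketch using scikit-learn's PolynomialFeatures (the wrapper E and the toy data are illustrative):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

def E(X, degree=2):
    # All polynomial terms of the input features up to the given degree,
    # e.g. (x1, x2) -> (1, x1, x2, x1^2, x1*x2, x2^2) for degree 2.
    return PolynomialFeatures(degree=degree).fit_transform(X)

train_x = np.array([[0.0], [1.0], [2.0], [3.0]])
train_y = np.array([10.0, 11.0, 14.0, 19.0])
test_x = np.array([[1.5], [2.5]])

expanded_trainX = E(train_x)
expanded_testX = E(test_x)

model = Ridge(alpha=1.0)
model.fit(expanded_trainX, train_y)
y_predict = model.predict(expanded_testX)
print(y_predict)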




After this concrete example, I hope you have a better understanding of how kernel methods and basis expansions work in ML.


Happy Exploring^^
Ye Jiang
 





