Kernel Methods and Basis Expansions
-- Introduction
We start with data points that are not linearly separable in their original space. By applying a kernel function, we can map the data to a higher-dimensional space where it becomes linearly separable, and then we can easily classify the points in that space.
Mapping data to a higher dimension essentially means computing new features from each example's original features, so this is usually called the primal form of kernel methods. The dual form of kernel methods, in contrast, is a similarity function that corresponds to an inner product of two data points in some higher-dimensional space. We can often apply domain knowledge to the dual form to define which points count as "similar" to each other in a specific problem.
The reason we use basis expansions on our data is very similar to the reason we use the primal form of kernel methods: we want to separate the data even when the decision boundary is not linear.
-- An Example for Basis Expansion:
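Here is a small sketch of the idea (my own toy example; the expand helper and the data are illustrative). We expand a one-dimensional input x into the basis (1, x, x^2); a plain linear model fit on the expanded features can then capture a quadratic relationship that no straight line could fit in the original space.

import numpy as np

def expand(x, degree=2):
    # Map each scalar x to the basis (1, x, x^2, ..., x^degree).
    return np.vstack([x ** d for d in range(degree + 1)]).T

# Toy 1-D data with a quadratic relationship: y = x^2 + noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)
y = x ** 2 + rng.normal(scale=0.1, size=50)

# Plain least squares on the expanded features recovers the curve.
Phi = expand(x, degree=2)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)  # approximately [0, 0, 1]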
-- Kernel Ridge Regression:
For ridge linear regression, the goal is to find the weight vector W that minimizes the loss function:
L(W) = || Y - X W ||^2 + lambda * || W ||^2        (a)
We can derive the optimal W* in closed form:
W* = ( X^T X + lambda * I )^(-1) X^T Y        (b)
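As a quick sanity check on form (b), here is a short numpy sketch (the toy data and the comparison against scikit-learn's Ridge are my own additions):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = rng.normal(size=20)
lam = 0.5

# Closed form (b): W* = (X^T X + lambda * I)^(-1) X^T Y
W_star = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ Y)

# scikit-learn's Ridge minimizes the same loss when fit_intercept=False.
model = Ridge(alpha=lam, fit_intercept=False).fit(X, Y)
print(np.allclose(W_star, model.coef_))  # True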
After we apply the kernel trick, we do not necessarily need to know the transformation function f(x): having the inner product of two samples in the new space suffices to produce a prediction for a new sample x_new with features (x1, x2, ..., xn):
Y^new = W*^T f(x_new)
By substituting W* with form (b) above, replacing every x with f(x), and using the identity (A^T A + lambda*I)^(-1) A^T = A^T (A A^T + lambda*I)^(-1) so that only inner products f(xi) . f(xj) = K(xi, xj) remain, we get the dual-form prediction:

Y^new = Y^T ( K + lambda * I )^(-1) k(x_new)

where lambda is a constant, I is the identity matrix, K is the n-by-n Gram matrix with entries K(xi, xj) over the training samples, and k(x_new) is the vector whose i-th entry is K(xi, x_new) for each training sample xi.
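To make the formula concrete, here is a minimal numpy sketch of the dual-form prediction (the RBF kernel and the toy data are my own assumptions for illustration):

import numpy as np

def kernel(A, B):
    # Example similarity function: RBF kernel, K(a, b) = exp(-||a - b||^2).
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists)

rng = np.random.default_rng(0)
train_x = rng.normal(size=(30, 2))
train_y = rng.normal(size=30)
x_new = rng.normal(size=(1, 2))
lam = 0.1

K = kernel(train_x, train_x)             # n x n Gram matrix K(xi, xj)
k_new = kernel(train_x, x_new).ravel()   # K(xi, x_new) for each training xi
alpha = np.linalg.solve(K + lam * np.eye(len(train_y)), train_y)
y_new = alpha @ k_new                    # Y^T (K + lambda*I)^(-1) k(x_new)
print(y_new)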
--------------------
Then, given a kernel function K(x1, x2) = (1 + x1 . x2)^2 (a polynomial kernel of degree 2), how do we actually perform kernel ridge regression?
-- Here is the main idea for dual form kernels:
We apply the kernel function K to train_x and test_x (all the input data); specifically:
kernel_trainX = K ( train_x , train_x )
kernel_testX = K (test_x , train_x)
model.fit ( kernel_trainX , train_y )
Y^predict = model.predict( kernel_testX )
How do we compute K(train_x, train_x)? Here is Python code for a general polynomial kernel of any degree:
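The sketch below is my reconstruction of that idea; the function name, the toy data, and the use of scikit-learn's Ridge for the fit/predict steps are my own choices:

import numpy as np
from sklearn.linear_model import Ridge

def polynomial_kernel(A, B, degree=2):
    # K(a, b) = (1 + a . b)^degree for every pair of rows in A and B.
    # A: (n_A, n_features), B: (n_B, n_features) -> (n_A, n_B) matrix.
    return (1.0 + A @ B.T) ** degree

# Toy data, just to make the shapes concrete.
rng = np.random.default_rng(0)
train_x = rng.normal(size=(30, 2))
train_y = rng.normal(size=30)
test_x = rng.normal(size=(5, 2))

kernel_trainX = polynomial_kernel(train_x, train_x)   # (30, 30)
kernel_testX = polynomial_kernel(test_x, train_x)     # (5, 30)

model = Ridge(alpha=0.1, fit_intercept=False).fit(kernel_trainX, train_y)
Y_predict = model.predict(kernel_testX)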
-- The main idea for basis expansion is, again, very similar, except that, as we discussed previously, basis expansion works like the primal form of kernels: we transform the features explicitly.
expanded_trainX = E( train_x)
expanded_testX = E(test_x)
model.fit(expanded_trainX, train_y)
Y^predict = model.predict(expanded_testX)
Here is an example of how to perform a general polynomial expansion of any degree:
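The original listing is not reproduced here; this minimal sketch uses scikit-learn's PolynomialFeatures for the expansion step E, which is an implementation choice of mine:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
train_x = rng.normal(size=(30, 2))
train_y = rng.normal(size=30)
test_x = rng.normal(size=(5, 2))

# E(x): expand each sample into all monomials up to the given degree.
expander = PolynomialFeatures(degree=2)
expanded_trainX = expander.fit_transform(train_x)  # columns: 1, x1, x2, x1^2, x1*x2, x2^2
expanded_testX = expander.transform(test_x)

model = Ridge(alpha=0.1).fit(expanded_trainX, train_y)
Y_predict = model.predict(expanded_testX)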
After this concrete example, I hope you have a better understanding of how kernel methods and basis expansions work in ML.
Happy Exploring^^
Ye Jiang