Tuesday, November 27, 2018

CNN with TensorFlow

            An Example of Creating a CNN with TensorFlow




-- Introduction of CNN  


 A Convolutional Neural Network (CNN) is a type of neural network in machine learning that specializes in image recognition. It can find features that are not tied to a specific spot, such as a traffic light anywhere in a picture or a word anywhere within a sentence.

 CNNs are actually inspired by the biology of the visual cortex (thank God!), in which local receptive fields are groups of neurons that only respond to a part of what you see. The overlap of these fields covers the entire visual field, which is similar to the effect of convolution.

 A typical CNN, like the one in the picture at the top, is composed of a few convolutional layers that detect certain features and some pooling layers that reduce the computation in the network. At the end, we use the softmax function to assign a prediction probability to each candidate class.

 For the detailed calculation of a convolutional layer, here is an example using the filter matrix
        [ [ 0 0 1],
         [ 1 0 0],
         [ 1 1 0] ]
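
To make the calculation concrete, here is a small NumPy sketch that slides this 3*3 filter over a made-up 5*5 input with stride 1 and no padding (the input values are purely illustrative):

import numpy as np

# Hypothetical 5*5 black-and-white "image" (values made up for illustration).
image = np.array([[1, 0, 1, 0, 1],
                  [0, 1, 0, 1, 0],
                  [1, 0, 1, 0, 1],
                  [0, 1, 0, 1, 0],
                  [1, 0, 1, 0, 1]])

# The 3*3 filter matrix from above.
kernel = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [1, 1, 0]])

# Slide the filter over every 3*3 patch (stride 1, no padding) and sum the
# element-wise products; each sum becomes one entry of the feature map.
out_size = image.shape[0] - kernel.shape[0] + 1
feature_map = np.zeros((out_size, out_size))
for i in range(out_size):
    for j in range(out_size):
        patch = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(patch * kernel)

print(feature_map)  # the 3*3 feature map produced by this filter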


Note that different filter matrices can detect different features of a given picture (data).


-- Create CNN for handwriting recognition 


 We will use the MNIST dataset to train and test our CNN. The MNIST database is a large database of handwritten digits that is commonly used for training image processing systems. This dataset is actually included inside of TensorFlow, so first let's import all the libraries we need for this task.
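
A minimal sketch of the imports and data loading, assuming the Keras API bundled with TensorFlow:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.utils import to_categorical

# Load the MNIST handwritten digit data bundled with TensorFlow/Keras.
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()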




Since the images were flattened into 1D vectors of 784 pixels, we want to reshape the data into 2D images of 28*28 pixels:
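
A sketch of the reshaping step. Note that the Keras loader used above already returns 28*28 images, so this mainly adds the channel dimension; if the data were loaded as flat 784-pixel vectors, the same call still works. The scaling to [0, 1] is a common extra preprocessing step:

# Reshape to (num_samples, 28, 28, 1) and scale pixel values to [0, 1].
train_images = train_images.reshape(-1, 28, 28, 1).astype('float32') / 255.0
test_images = test_images.reshape(-1, 28, 28, 1).astype('float32') / 255.0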



Note that we actually convert it into [28, 28, 1] data, where the 1 represents the color channels. Since our data is black and white, we only have 1 color channel; otherwise, it would be 3 for a color picture.

Then we will convert our labels into categorical values in the one-hot format:
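
A sketch of this step using Keras's to_categorical:

# One-hot encode the digit labels 0-9 into vectors of length 10.
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)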



After we have pre-processed our data, we can then build our CNN model!
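
A sketch of such a model, following the description below (two convolutional layers, a 2*2 max pooling layer, dropout of 0.25, then a flatten and a 10-way softmax layer); the filter counts and kernel sizes are assumptions:

model = Sequential()
# Two convolutional layers to detect features (32 and 64 filters are assumed).
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
                 input_shape=(28, 28, 1)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
# Reduce computation with 2*2 max pooling and a dropout rate of 0.25.
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# Flatten to 1D and pass into the final 10-way softmax prediction layer.
model.add(Flatten())
model.add(Dense(10, activation='softmax'))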



After adding 2 convolutional layers in the second and third steps, we reduce our computational complexity by using a 2*2 max pooling layer and a random dropout rate of 0.25.

 In the end, we need to flatten our results into 1D and pass them into our final prediction layer, which is a dense layer of size 10 with a softmax activation function.

Then we compile our model with a loss function and an optimizer. Note that Keras actually saves us a lot of time in building this CNN model.
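
A sketch of the compilation step; categorical cross-entropy and Adam are assumed choices here:

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])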




We train our model for 10 epochs. The more epochs the better, but a CNN is very computationally expensive to execute.
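
A sketch of the training call (the batch size is an assumption):

model.fit(train_images, train_labels,
          batch_size=128,
          epochs=10,
          verbose=1,
          validation_data=(test_images, test_labels))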



Finally, we can see that our model reaches over 99% accuracy! That is great, but it did take a lot of time to execute.
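
The final check might look like this:

# Evaluate on the held-out test set.
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=0)
print('Test accuracy:', test_acc)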



Reference:
https://www.mathworks.com/solutions/deep-learning/convolutional-neural-network.html


Happy Learning :_)
Ye Jiang



Kernel Methods and Basis Expansions in Machine Learning


               Kernel Methods and Basis Expansions




-- Introduction


    When we use an SVM or other models to deal with classification problems in machine learning (ML), our life is relatively easy when the data set is linearly separable. However, most real problems do not have such a property, and that is why kernel methods and basis expansions play a significant role in ML. (Note that if the large majority of the data is linearly separable and only a few points affect our decision boundary, we can use a soft-margin SVM to deal with it.)

    So, we start with some data points that are not linearly separable. By applying a kernel function we can map our data to a higher-dimensional space where the data become linearly separable, and then we can easily classify our data points in that high-dimensional space.

    Mapping data to a higher dimension basically means doing some calculation based on each example's features, so we usually call this the primal form of kernel methods. The dual form of kernel methods, in contrast, is a similarity function that corresponds to an inner product of two data points in some higher-dimensional space. We can often apply our domain knowledge to the dual form of kernel functions to define what kinds of points are "similar" to each other in a given problem.

   The reason we want to use basis expansions on our data is very similar to the reason we use the primal form of kernel methods to map our data: we want to separate the data even when the decision boundary is not linear.


-- An Example of Basis Expansion:


Here is a small example. Assume that we have 2 data features x1 and x2, and the decision boundary is something like x1 = x2^2 + 10, which is quadratic in 2D space. We then extend our data to a 5D space by applying: X = (x1, x2) --> X = (x1, x2, x1^2, x2^2, x1*x2). Our decision boundary is linear in this 5D space: the original x1 - x2^2 - 10 = 0 becomes wX + b = 0, where w = (1, 0, 0, -1, 0) and b = -10.
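
As a quick sanity check, here is a small NumPy sketch of this expansion; a point that sits on the original quadratic boundary lands exactly on the linear hyperplane in 5D:

import numpy as np

def expand(x1, x2):
    # Map (x1, x2) to the 5D feature vector (x1, x2, x1^2, x2^2, x1*x2).
    return np.array([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

w = np.array([1, 0, 0, -1, 0])
b = -10

point = expand(14.0, 2.0)    # x1 = 2^2 + 10 = 14, so it lies on the boundary
print(np.dot(w, point) + b)  # prints 0.0: on the linear boundary in 5D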


-- Kernel Ridge Regression:


For ridge linear regression, the goal is to find the weight vector W that minimizes the loss function:

(a)   L(W) = || Y - XW ||^2 + Lambda * || W ||^2

We can derive the optimal W* in closed form:

(b)   W* = ( X^T X + Lambda*I )^(-1) X^T Y

After we apply the kernel trick, we do not necessarily need to know the transformation function f(x); having the inner product of two samples in the new space suffices to give the prediction for a new sample X_new with features (X1, X2, ..., Xn):

Y_new = W* f(X_new)

By substituting W* with form (b) above, replacing every x with f(x) and every inner product f(Xi)·f(Xj) with the kernel K(Xi, Xj), we get

--> Y_new = Y^T ( K + Lambda*I )^(-1) K(Xi, X_new)

where Lambda is a constant, I is the identity matrix, K is the kernel (Gram) matrix of the training samples, and K(Xi, X_new) is the vector of kernel values between each training sample Xi and the new input X_new.

--------------------

Then given a kernel function K( x1, x2 ) = ( 1 + x1*x2 )^2 , how do we actually perform kernel ridge regression?

-- Here is the main idea for dual form kernels:

We will apply the kernel function K to train_x and test_x (all the input data), specifically:

kernel_trainX = K ( train_x , train_x ) 

kernel_testX = K (test_x , train_x) 

model.fit ( kernel_trainX , train_y ) 

Y^predict = model.predict( kernel_testX ) 


How do we compute K(train_x, train_x)? Here is the Python code for a general polynomial kernel of any degree:
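
A sketch of a general polynomial kernel, plus one way to realize the fit/predict pattern above with scikit-learn's KernelRidge and a precomputed kernel matrix (the toy data here is made up purely for illustration):

import numpy as np
from sklearn.kernel_ridge import KernelRidge

def polynomial_kernel(X1, X2, degree=2):
    # X1: (n1, d), X2: (n2, d) -> (n1, n2) matrix of (1 + x1*x2)^degree values.
    return (1.0 + np.dot(X1, X2.T)) ** degree

# Toy data, just for illustration.
rng = np.random.RandomState(0)
train_x, test_x = rng.randn(100, 2), rng.randn(20, 2)
train_y = train_x[:, 0] ** 2 + train_x[:, 1]    # a nonlinear target

kernel_trainX = polynomial_kernel(train_x, train_x, degree=2)
kernel_testX = polynomial_kernel(test_x, train_x, degree=2)

model = KernelRidge(alpha=1.0, kernel='precomputed')
model.fit(kernel_trainX, train_y)
y_predict = model.predict(kernel_testX)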





-- The main idea for basis expansion is, again, very similar, except that, as we discussed previously, basis expansion corresponds more to the primal form of kernels:

expanded_trainX =  E( train_x) 

expanded_testX = E(test_x) 

model.fit(expanded_trainX, train_y) 

Y^predict = model.predict(expanded_testX) 

Here is an example that performs a general polynomial expansion of any degree:
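
A sketch using scikit-learn's PolynomialFeatures as a stand-in for the expansion function E above, on the same kind of toy data:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
train_x, test_x = rng.randn(100, 2), rng.randn(20, 2)
train_y = train_x[:, 0] ** 2 + train_x[:, 1]

# Degree-2 expansion: columns are x1, x2, x1^2, x1*x2, x2^2.
E = PolynomialFeatures(degree=2, include_bias=False)
expanded_trainX = E.fit_transform(train_x)
expanded_testX = E.transform(test_x)

model = Ridge(alpha=1.0)
model.fit(expanded_trainX, train_y)
y_predict = model.predict(expanded_testX)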




After this concrete example, I hope you have a better understanding of how kernel methods and basis expansions work in ML.


Happy Exploring^^
Ye Jiang
 






Saturday, November 3, 2018

Binary Search, First/Last Occurrence and K Closest

                               Binary Search, Easy? 


   You have probably already heard of and written some code for binary search. However, do you really know binary search well enough to handle its many variants? Good for you if your answer is yes, but I would not call someone a decent software engineer if he/she cannot handle it well.


1.  The Classic Binary Search: 

Find a target value in a sorted array and return its index in the array if the target exists; otherwise return -1.
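
Here is a minimal Python sketch of the classic version:

def binary_search(array, target):
    left, right = 0, len(array) - 1
    while left <= right:
        mid = (left + right) // 2
        if array[mid] == target:
            return mid
        elif array[mid] < target:
            left = mid + 1      # target can only be in the right half
        else:
            right = mid - 1     # target can only be in the left half
    return -1                   # target does not exist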




2. The First Occurrence 


Given an array and a target number, return the index of the first occurrence of the target in the array, or -1 if the target does not exist.
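
A Python sketch of one way to do it, using the tricks summarized below:

def first_occurrence(array, target):
    if not array:
        return -1
    left, right = 0, len(array) - 1
    while left < right - 1:          # stop once left and right are adjacent
        mid = (left + right) // 2
        if array[mid] < target:
            left = mid               # move the bound to mid, not mid + 1
        else:
            right = mid              # mid may still be the first occurrence
    if array[left] == target:        # check left first, then right
        return left
    if array[right] == target:
        return right
    return -1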



To summarize the tricks we used here: 

1. Loop while left < right - 1, i.e., stop once left and right are adjacent. 
2. Update the right/left bound to mid instead of mid-1/mid+1. 
3. Check both the left and right values at the end to give the correct result. 


3. The Last Occurrence 


 Given an array and a target number, return the index of the last occurrence of the target in the array, or -1 if the target does not exist. 

Could you come up with a solution based on the first occurrence we just solved?
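
Here is a Python sketch, mirroring the first-occurrence code with the two changes described below:

def last_occurrence(array, target):
    if not array:
        return -1
    left, right = 0, len(array) - 1
    while left < right - 1:
        mid = (left + right) // 2
        if array[mid] > target:
            right = mid
        else:
            left = mid               # go to the right side first
    if array[right] == target:       # check the right pointer before the left
        return right
    if array[left] == target:
        return left
    return -1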



It uses almost the same trick as the first occurrence. The only 2 differences are that we go to the right side of the array first in the while loop, and we check the right pointer before the left pointer at the end. 


4. Closest Number in Sorted Array 


Given a sorted array and a target, find the number closest to the target and output its index.
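
A Python sketch of this variant:

def closest_number(array, target):
    if not array:
        return -1
    left, right = 0, len(array) - 1
    while left < right - 1:          # stop once left and right are adjacent
        mid = (left + right) // 2
        if array[mid] < target:
            left = mid
        else:
            right = mid
    # Compare the absolute differences of the two remaining candidates.
    if abs(array[left] - target) <= abs(array[right] - target):
        return left
    return right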



1. Loop while left < right - 1, i.e., stop once left and right are adjacent. 
2. Check the absolute differences against the target value at the end.


5. K Closest Number in Sorted Array 


Given a sorted array and a target, find the k numbers closest to the target and output them as an array.
You may assume that k is always a valid input, that is, k < array.length. 
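
Here is a Python sketch of one way to solve it, using the largestSmallerOrEqual helper and the two-pointer expansion described in the takeaways below:

def largestSmallerOrEqual(array, target):
    # Binary search for the index of the largest element <= target (-1 if none).
    left, right = 0, len(array) - 1
    while left < right - 1:
        mid = (left + right) // 2
        if array[mid] <= target:
            left = mid
        else:
            right = mid
    if array[right] <= target:
        return right
    if array[left] <= target:
        return left
    return -1

def k_closest(array, target, k):
    left = largestSmallerOrEqual(array, target)
    right = left + 1
    result = []
    # Expand the two pointers outward, taking the closer side each time.
    for _ in range(k):
        if right >= len(array) or (left >= 0 and
                                   target - array[left] <= array[right] - target):
            result.append(array[left])
            left -= 1
        else:
            result.append(array[right])
            right += 1
    return result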




Two takeaways here:

1. We used the binary search technique in the helper function largestSmallerOrEqual 

2. Starting from that position, we expand the left or right pointer to get the optimal result. The logic of how we expand the two pointers is worth practicing. 


6. Binary Search In a Sorted 2D Array 


Search in a sorted matrix: each row of the matrix is sorted in ascending order, and the first element of each row is equal to or larger than the last element of the previous row. 

Return the position if the target is found, otherwise return {-1, -1}. 


The trick here is to virtually convert the 2D array into a 1D array and do the classic binary search, by using: 

By:  row =  mid / cols;  

        col = mid % cols; 
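
A Python sketch (in Python the integer division is //):

def search_matrix(matrix, target):
    if not matrix or not matrix[0]:
        return [-1, -1]
    rows, cols = len(matrix), len(matrix[0])
    # Classic binary search over the virtual 1D array of rows*cols elements.
    left, right = 0, rows * cols - 1
    while left <= right:
        mid = (left + right) // 2
        row, col = mid // cols, mid % cols    # map the 1D index back to 2D
        if matrix[row][col] == target:
            return [row, col]
        elif matrix[row][col] < target:
            left = mid + 1
        else:
            right = mid - 1
    return [-1, -1]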


* Bonus 7: Find the Median of Two Sorted Arrays 


There are two sorted arrays A and B of size m and n respectively.
Find the median of the two sorted arrays. The overall runtime complexity should be O(log (m+n)).

The gist of this question is to realize that the split point i needs to satisfy: 
1. i + j = ( m + n + 1 ) / 2   
2. A[i-1] <= B[j]   
3. B[j-1] <= A[i] 
We use binary search to find such a position i in the smaller sorted array, so the runtime will be O(log(min(m, n))). 
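
A Python sketch of this idea, binary searching over the split point i in the shorter array:

def find_median_sorted_arrays(A, B):
    if len(A) > len(B):
        A, B = B, A                       # always binary search the shorter array
    m, n = len(A), len(B)
    half = (m + n + 1) // 2               # size of the combined left part
    lo, hi = 0, m
    while lo <= hi:
        i = (lo + hi) // 2                # elements taken from A's left part
        j = half - i                      # elements taken from B's left part
        if i < m and B[j - 1] > A[i]:
            lo = i + 1                    # i is too small, move right
        elif i > 0 and A[i - 1] > B[j]:
            hi = i - 1                    # i is too large, move left
        else:                             # correct split: A[i-1] <= B[j] and B[j-1] <= A[i]
            max_left = max(A[i - 1] if i > 0 else float('-inf'),
                           B[j - 1] if j > 0 else float('-inf'))
            if (m + n) % 2 == 1:
                return max_left
            min_right = min(A[i] if i < m else float('inf'),
                            B[j] if j < n else float('inf'))
            return (max_left + min_right) / 2.0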


Next time you need to solve a binary-search-related problem, I hope you have a better sense of it. -_- 

Enjoy Coding^^ 
Ye Jiang