Machine Learning (CPSC 540): Deep Learning

In this lecture, the professor used image recognition to introduce deep learning. I think the model for learning images imitates the heuristics we use to recognize them: when we try to picture something we saw, we combine features to reconstruct it. That is exactly what an autoencoder does.
Encoding is an abstraction process: it extracts features from the image data. Decoding is then a restoration process: it combines the features to produce an image. Why is it plausible to use the same weight vector for both encoding and decoding?
Consider the above linear autoencoder and the image. The activation values are $0.3, 0.5, 0.7, 0, 0, 0$ because the image contains $30\%$ of $W_1$, $50\%$ of $W_2$, and $70\%$ of $W_3$. Here we can interpret each $W_i$ as a feature. Since we have extracted the features of the image, we can restore it from them: the reconstruction is $0.3W_1 + 0.5W_2 + 0.7W_3$, and this restored image is an abstraction of the original.
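The tied-weight idea above can be sketched numerically. This is a minimal illustration, not the lecture's exact model: I assume six orthonormal feature vectors for a hypothetical 9-pixel image, so that encoding with $W$ recovers the mixing coefficients exactly and decoding with $W^\top$ reproduces the image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Six hypothetical feature vectors W_1..W_6 for a 9-pixel image,
# orthonormalized so the activations recover the mixing coefficients exactly.
Q, _ = np.linalg.qr(rng.standard_normal((9, 6)))
W = Q.T                       # shape (6, 9): one feature per row

# An image containing 30% of W_1, 50% of W_2, 70% of W_3.
x = 0.3 * W[0] + 0.5 * W[1] + 0.7 * W[2]

h = W @ x                     # encoding: project the image onto the features
x_hat = W.T @ h               # decoding: recombine features with the SAME weights

print(np.allclose(h[:3], [0.3, 0.5, 0.7]))  # True: activations are the mixture
print(np.allclose(x, x_hat))                # True: x lies in the feature span
```

Because the rows of $W$ are orthonormal, $W W^\top = I$, which is why reusing the same weights for decoding restores the image perfectly here; for general weights the reconstruction is only approximate.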
I wonder why an autoencoder extracts local features rather than global features. I will compare the autoencoder with PCA, which extracts global features. I suspect this happens because of the autoencoder's regularizer. I will program it and test it later. (Question, Machine Learning (CPSC 540))

Next, I think pooling is a summarization step. If we take the maximum of the activation values, this means we remember a feature well when it makes a strong impact. In the lecture it was said that pooling strengthens the robustness of the model, but I have to figure out why in more detail. I also want to know how many filters we should choose for L2 pooling, and which ones. I will also look into how LCN works in vision. (Question, Machine Learning (CPSC 540))
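The two pooling operations mentioned above can be sketched in a few lines. This is a toy 1-D version under my own assumptions (non-overlapping windows, a made-up activation vector), just to make the "summarize" intuition concrete: max pooling keeps only the strongest response per window, while L2 pooling takes the root of the summed squares.

```python
import numpy as np

def pool(acts, size, mode="max"):
    """Summarize non-overlapping windows of `size` activations into one value."""
    windows = acts.reshape(-1, size)
    if mode == "max":                        # keep only the strongest response
        return windows.max(axis=1)
    if mode == "l2":                         # L2 pooling: sqrt of summed squares
        return np.sqrt((windows ** 2).sum(axis=1))
    raise ValueError(mode)

acts = np.array([0.1, 0.9, 0.3, 0.4, 0.0, 0.2])
print(pool(acts, 2, "max"))   # [0.9 0.4 0.2]
print(pool(acts, 2, "l2"))
```

Note that swapping the two activations inside a window leaves both outputs unchanged, which is one way to see the robustness (small translation invariance) claimed in the lecture.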

Finally, the greedy layer-wise training method is very similar to how we learn. For example, we can learn edges and teeth in Africa, and then in Canada we can learn which animals are dangerous based on our previous learning in Africa.
To put it concretely: in the first stage, it breaks the image into parts (= learning edges); in the second stage, it groups the parts (= learning teeth); last, based on these, it learns with logistic regression and a softmax classifier (= learning which animal is dangerous).
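The three stages above can be sketched as stacked tied-weight autoencoders trained one layer at a time. This is a minimal sketch under my own assumptions (random data, sigmoid units, plain gradient descent), not the exact recipe from the lecture; the final softmax stage is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200):
    """Tied-weight autoencoder minimizing ||sigmoid(XW) W^T - X||^2."""
    W = rng.standard_normal((X.shape[1], n_hidden)) * 0.1
    for _ in range(epochs):
        H = sigmoid(X @ W)                  # encode
        E = H @ W.T - X                     # reconstruction error (decode w/ W^T)
        dH = E @ W * H * (1 - H)            # backprop through encoder path
        dW = X.T @ dH + E.T @ H             # tied weights: sum both gradient paths
        W -= lr * dW / len(X)
    return W

X = rng.standard_normal((100, 8))

# Stage 1: learn low-level features ("edges") from the raw inputs.
W1 = train_autoencoder(X, 4)
H1 = sigmoid(X @ W1)

# Stage 2: learn higher-level features ("teeth") from the stage-1 codes.
W2 = train_autoencoder(H1, 2)
H2 = sigmoid(H1 @ W2)

# Stage 3: train logistic regression with a softmax classifier on H2
# ("which animal is dangerous") -- omitted here for brevity.
print(H2.shape)   # (100, 2)
```

Each stage is trained greedily on the output of the previous one, which is exactly the "learn in Africa first, then build on it in Canada" analogy.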

There are so many useful papers in the lecture slides. I will read them later. In particular, I will read the paper on Google's autoencoder and think about the statistical problem of how to distinguish the likelihood from the prior in the objective function. (Question, Machine Learning (CPSC 540))
