Machine Learning (CPSC 540): Random forests
I think we use random forests because in a real problem there is a lot of data and there are many features. In the standard construction, each tree is trained on a bootstrap sample of the $n$ training points (that is, $n$ points drawn with replacement, so roughly 63% of them are unique) and, at each split, only a random subset of the $m$ features is considered, commonly about $\sqrt{m}$ of them. The ensemble of these randomized trees is the random forest.
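The per-tree sampling step described above can be sketched with the standard library alone. This is only an illustration: `sample_for_tree` is a hypothetical helper name, and the $\sqrt{m}$ feature-subset size is the common default rather than something fixed by the notes.

```python
import math
import random

def sample_for_tree(X, y, n_features):
    """Per-tree sampling step of a random forest (hypothetical helper).

    Draws a bootstrap sample of the rows (n points with replacement)
    and a random subset of the columns, as described in the notes.
    """
    n, m = len(X), len(X[0])
    rows = [random.randrange(n) for _ in range(n)]   # bootstrap: n rows with replacement
    cols = random.sample(range(m), n_features)       # random feature subset
    X_sub = [[X[i][j] for j in cols] for i in rows]
    y_sub = [y[i] for i in rows]
    return X_sub, y_sub, cols

# Usage: m = 4 features, subset size sqrt(m) = 2.
X = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
y = [0, 1, 0]
X_sub, y_sub, cols = sample_for_tree(X, y, int(math.sqrt(len(X[0]))))
```

A real implementation would repeat this for each of the $B$ trees and fit a deep decision tree on every `(X_sub, y_sub)` pair.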
Choosing a subset of the training set for each tree is reasonable because our data are only a sample from an effectively infinite population, so there is uncertainty in them anyway. However, each individual tree still has high variance, since it is grown on a limited amount of data with a restricted feature set; by the bias-variance trade-off, a deep tree at least has low bias. Because the data and features are chosen at random, the trees are only weakly correlated with one another, so averaging their predictions reduces the variance while keeping the bias low, and we can expect good results.
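The averaging argument can be made quantitative. If each of the $B$ trees has prediction variance $\sigma^2$ and the trees have pairwise correlation $\rho$, then the variance of their average is

$$\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} T_b\right) = \rho\sigma^2 + \frac{1-\rho}{B}\sigma^2.$$

As $B \to \infty$ the second term vanishes and only $\rho\sigma^2$ remains, so averaging helps exactly to the extent that the random choice of data and features keeps the trees decorrelated.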
There is a theoretical result that, as $n$ tends to infinity, certain random forest variants are consistent even though a single fully grown tree is not.
There might be a smarter method than choosing features uniformly: in the spam-filter example, we could weight each feature by its information gain when sampling. In addition, every time we build a random forest we must decide the depth of the trees, the number of features to consider per split, how many data points to use per tree, and so on. How can we make these decisions? One option is Bayesian optimization: treat the hyperparameter vector as the input of a Gaussian process (GP) and the validation performance of the resulting forest as its output, then search for the hyperparameters the GP predicts will perform best.
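The GP idea above can be sketched on a toy problem. Everything here is illustrative and assumed, not from the notes: `evaluate_forest` is a made-up stand-in for cross-validated forest accuracy, the search is over tree depth only, and the acquisition rule is a simple upper confidence bound.

```python
import numpy as np

def rbf(A, B, length=0.3):
    """Squared-exponential kernel between point sets A (k,d) and B (j,d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X_obs, y_obs, X_cand, noise=1e-6):
    """GP posterior mean and variance at the candidate points."""
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf(X_obs, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(X_cand, X_cand)) - (v ** 2).sum(0)
    return mu, np.maximum(var, 0.0)

def evaluate_forest(depth):
    """Stand-in for cross-validated forest accuracy (made-up objective)."""
    return -(depth - 6.0) ** 2   # pretend depth 6 is the best setting

depths = np.arange(1, 13, dtype=float)       # candidate tree depths
X_cand = ((depths - 1) / 11.0)[:, None]      # scaled to [0, 1] for the kernel

obs = [0, 11]                                # start from the two extremes
y_obs = np.array([evaluate_forest(depths[i]) for i in obs])

for _ in range(8):                           # Bayesian optimization loop
    mu, var = gp_posterior(X_cand[obs], y_obs, X_cand)
    ucb = mu + 2.0 * np.sqrt(var)            # upper-confidence-bound acquisition
    ucb[obs] = -np.inf                       # do not re-evaluate observed settings
    i = int(np.argmax(ucb))
    obs.append(i)
    y_obs = np.append(y_obs, evaluate_forest(depths[i]))

best_depth = depths[obs[int(np.argmax(y_obs))]]
```

The UCB rule trades off exploitation (high posterior mean) against exploration (high posterior variance), which is why the loop tends to close in on the best depth after a handful of expensive forest evaluations.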

