Machine Learning (CPSC 540): linear prediction
Supervised machine learning is, at its core, about finding an optimal function given a training dataset.
Formally, for an input space $X$ and an output space $Y$, with a training dataset of $n$ input-output pairs $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, we seek a hypothesis $h: X \to Y$ in a hypothesis class $H$ that minimizes a cost function $J$: $\hat{h} = \operatorname*{argmin}_{h \in H} J(h)$.
The linear model illustrates the core concepts of supervised machine learning well. For the linear model, we define the hypothesis class $H = \{h_{\mathbf{w}}(\mathbf{x}) = \sum_{i=0}^{d} w_i x_i\}$, where $x_0 = 1$ so that $w_0$ acts as a bias (or offset) term, under the assumption that there are no outliers and the data follows a linear pattern. (For the non-linear case, a different setting is needed.) In matrix form, the linear model is written $\hat{\mathbf{y}} = X\mathbf{w}$. If we use the mean squared error as the cost function, we can find the optimal hypothesis in closed form via the normal equation $\hat{\mathbf{w}} = (X^TX)^{-1}X^T\mathbf{y}$.
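The normal equation above can be sketched in a few lines of NumPy. This is a minimal illustration on a small made-up dataset (the points and variable names are my own, not from the course); note that solving the linear system directly is preferred to forming the explicit inverse.

```python
import numpy as np

# Hypothetical toy data: five points lying roughly on y = 2x + 1.
x_raw = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with a bias column x_0 = 1 prepended to each input.
X = np.column_stack([np.ones_like(x_raw), x_raw])

# Normal equation: w_hat = (X^T X)^{-1} X^T y.
# np.linalg.solve avoids computing the inverse explicitly, which is
# numerically more stable than np.linalg.inv(X.T @ X) @ X.T @ y.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Predictions of the fitted linear model.
y_pred = X @ w_hat
```

Here `w_hat[0]` is the bias $w_0$ and `w_hat[1]` is the slope; for this toy data they come out close to 1 and 2, as expected.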
Finally, I want to talk about dealing with uncertainty. Even if we find the best hypothesis in the class, some uncertainty always remains. The following examples illustrate this.
In the example above, assume that the line is the best linear hypothesis for the dataset. As you can see, there is still uncertainty, marked in red.
The next example shows that a large amount of uncertainty can remain even when we have found the best linear prediction.
How, then, can we deal with this large amount of uncertainty? Let's change the hypothesis class from linear to quadratic. We can then see that the uncertainty becomes much smaller.
Combining the above results, we can conclude that the choice of hypothesis class is important for dealing with uncertainty.


