*gradient descent*, which we're going to illustrate here. To start with, let's consider a simple function with closed-form solution given by \begin{equation} f(\beta) \triangleq \beta^4 - 3\beta^3 + 2. \end{equation} We want to minimize this function with respect to $\beta$. The quick solution to this, as what calculus taught us, is to compute for the first derivative of the function, that is \begin{equation} \frac{\text{d}f(\beta)}{\text{d}\beta}=4\beta^3-9\beta^2. \end{equation} Setting this to 0 to obtain the stationary point gives us \begin{align} \frac{\text{d}f(\beta)}{\text{d}\beta}&\overset{\text{set}}{=}0\nonumber\\ 4\hat{\beta}^3-9\hat{\beta}^2&=0\nonumber\\ 4\hat{\beta}^3&=9\hat{\beta}^2\nonumber\\ 4\hat{\beta}&=9\nonumber\\ \hat{\beta}&=\frac{9}{4}. \end{align} The following plot shows the minimum of the function at $\hat{\beta}=\frac{9}{4}$ (red line in the plot below).

*R Script*Now let's consider minimizing this problem using gradient descent with the following algorithm:

- Initialize $\mathbf{x}_{r},r=0$
**while**$\lVert \mathbf{x}_{r}-\mathbf{x}_{r+1}\rVert > \nu$- $\mathbf{x}_{r+1}\leftarrow \mathbf{x}_{r} - \gamma\nabla f(\mathbf{x}_r)$
- $r\leftarrow r + 1$
**end while****return**$\mathbf{x}_{r}$ and $r$

*gradient*of the cost function, $\gamma$ is the

*learning-rate*parameter of the algorithm, and $\nu$ is the

*precision*parameter. For the function above, let the initial guess be $\hat{\beta}_0=4$ and $\gamma=.001$ with $\nu=.00001$. Then $\nabla f(\hat{\beta}_0)=112$, so that \[\hat{\beta}_1=\hat{\beta}_0-.001(112)=3.888.\] And $|\hat{\beta}_1 - \hat{\beta}_0| = 0.112> \nu$. Repeat the process until at some $r$, $|\hat{\beta}_{r}-\hat{\beta}_{r+1}| \ngtr \nu$. It will turn out that 350 iterations are needed to satisfy the desired inequality, the plot of which is in the following figure with estimated minimum $\hat{\beta}_{350}=2.250483\approx\frac{9}{4}$.

*R Script with Plot*

*Python Script*Obviously the convergence is slow, and we can adjust this by tuning the learning-rate parameter, for example if we try to increase it into $\gamma=.01$ (change

`gamma`

to .01 in the codes above) the algorithm will converge at 42nd iteration. To support that claim, see the steps of its gradient in the plot below.`beta_new`

to .1) with $\gamma=.01$, the algorithm converges at 173rd iteration with estimate $\hat{\beta}_{173}=2.249962\approx\frac{9}{4}$ (see the plot below).*R Script with Contour Plot*

*Python Script*Notice that I did not use ggplot for the contour plot, this is because the plot needs to be updated 23,374 times just to accommodate for the arrows for the trajectory of the gradient vectors, and ggplot is just slow. Finally, we can also visualize the gradient points on the surface as shown in the following figure.

*R Script*In my future blog post, I hope to apply this algorithm on statistical models like linear/nonlinear regression models for simple illustration.

## No comments:

## Post a Comment