Tuesday, 22 December 2015

R and Python: Gradient Descent

One of the problems often dealt in Statistics is minimization of the objective function. And contrary to the linear models, there is no analytical solution for models that are nonlinear on the parameters such as logistic regression, neural networks, and nonlinear regression models (like Michaelis-Menten model). In this situation, we have to use mathematical programming or optimization. And one popular optimization algorithm is the gradient descent, which we're going to illustrate here. To start with, let's consider a simple function with closed-form solution given by $$f(\beta) \triangleq \beta^4 - 3\beta^3 + 2.$$ We want to minimize this function with respect to $\beta$. The quick solution to this, as what calculus taught us, is to compute for the first derivative of the function, that is $$\frac{\text{d}f(\beta)}{\text{d}\beta}=4\beta^3-9\beta^2.$$ Setting this to 0 to obtain the stationary point gives us \begin{align} \frac{\text{d}f(\beta)}{\text{d}\beta}&\overset{\text{set}}{=}0\nonumber\\ 4\hat{\beta}^3-9\hat{\beta}^2&=0\nonumber\\ 4\hat{\beta}^3&=9\hat{\beta}^2\nonumber\\ 4\hat{\beta}&=9\nonumber\\ \hat{\beta}&=\frac{9}{4}. \end{align}

Tuesday, 15 December 2015

R and Python: Theory of Linear Least Squares

In my previous article, we talked about implementations of linear regression models in R, Python and SAS. On the theoretical sides, however, I briefly mentioned the estimation procedure for the parameter $\boldsymbol{\beta}$. So to help us understand how software does the estimation procedure, we'll look at the mathematics behind it. We will also perform the estimation manually in R and in Python, that means we're not going to use any special packages, this will help us appreciate the theory.

Linear Least Squares

Consider the linear regression model, $y_i=f_i(\mathbf{x}|\boldsymbol{\beta})+\varepsilon_i,\quad\mathbf{x}_i=\left[ \begin{array}{cccc} 1&x_{11}&\cdots&x_{1p} \end{array}\right],\quad\boldsymbol{\beta}=\left[\begin{array}{c}\beta_0\\\beta_1\\\vdots\\\beta_p\end{array}\right],$ where $y_i$ is the response or the dependent variable at the $i$th case, $i=1,\cdots, N$. The $f_i(\mathbf{x}|\boldsymbol{\beta})$ is the deterministic part of the model that depends on both the parameters $\boldsymbol{\beta}\in\mathbb{R}^{p+1}$ and the predictor variable $\mathbf{x}_i$, which in matrix form, say $\mathbf{X}$, is represented as follows $\mathbf{X}=\left[ \begin{array}{cccccc} 1&x_{11}&\cdots&x_{1p}\\ 1&x_{21}&\cdots&x_{2p}\\ \vdots&\vdots&\ddots&\vdots\\ 1&x_{N1}&\cdots&x_{Np}\\ \end{array} \right].$