
Posts

R and Python: Gradient Descent

One of the problems often dealt with in Statistics is the minimization of an objective function. Unlike linear models, models that are nonlinear in the parameters, such as logistic regression, neural networks, and nonlinear regression models (like the Michaelis-Menten model), have no analytical solution. In this situation, we have to resort to mathematical programming, or optimization. One popular optimization algorithm is gradient descent, which we're going to illustrate here. To start with, let's consider a simple function with a closed-form solution, given by \begin{equation} f(\beta) \triangleq \beta^4 - 3\beta^3 + 2. \end{equation} We want to minimize this function with respect to $\beta$. The quick solution, as calculus teaches us, is to compute the first derivative of the function, that is, \begin{equation} \frac{\text{d}f(\beta)}{\text{d}\beta}=4\beta^3-9\beta^2. \end{equation} Setting this to 0 to obtain the stationary point gives us \begin{align}…
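As a rough sketch of how the descent itself proceeds (the step size, starting point, and stopping rule below are my own assumptions, not taken from the post): repeatedly move $\beta$ against the gradient until the updates become negligible.

```python
# Minimal gradient descent sketch for f(beta) = beta^4 - 3*beta^3 + 2.
# The learning rate, starting value, and tolerance are illustrative
# assumptions, not values from the post.

def grad(beta):
    """First derivative of f: 4*beta^3 - 9*beta^2."""
    return 4 * beta**3 - 9 * beta**2

beta = 4.0   # assumed starting value
eta = 0.01   # assumed learning rate (step size)
for _ in range(10_000):
    step = eta * grad(beta)
    beta -= step
    if abs(step) < 1e-10:  # stop once updates become negligible
        break

print(beta)  # approaches the stationary point 9/4 = 2.25
```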
Recent posts

R and Python: Theory of Linear Least Squares

In my previous article, we talked about implementations of linear regression models in R, Python, and SAS. On the theoretical side, however, I only briefly mentioned the estimation procedure for the parameter $\boldsymbol{\beta}$. So to help us understand how software does the estimation, we'll look at the mathematics behind it. We will also perform the estimation manually in R and in Python; that is, we won't use any special packages, which will help us appreciate the theory.

Linear Least Squares
Consider the linear regression model, \[ y_i=f_i(\mathbf{x}|\boldsymbol{\beta})+\varepsilon_i,\quad\mathbf{x}_i=\left[ \begin{array}{cccc} 1&x_{i1}&\cdots&x_{ip} \end{array}\right],\quad\boldsymbol{\beta}=\left[\begin{array}{c}\beta_0\\\beta_1\\\vdots\\\beta_p\end{array}\right], \] where $y_i$ is the response or the dependent variable at the $i$th case, $i=1,\cdots, N$. The $f_i(\mathbf{x}|\boldsymbol{\beta})$ is the deterministic part of the model that dep…
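For a concrete sense of the manual estimation the post walks through, here is a minimal NumPy sketch of the normal-equations solution $\hat{\boldsymbol{\beta}}=(\mathbf{X}^{\text{T}}\mathbf{X})^{-1}\mathbf{X}^{\text{T}}\mathbf{y}$ on simulated data (the data and seed are assumptions for illustration; the post uses its own data):

```python
# Manual least-squares estimation via the normal equations,
# beta_hat = (X'X)^(-1) X'y, on made-up data.
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # design matrix with intercept
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=N)

# solve(X'X, X'y) is the numerically safer form of (X'X)^(-1) X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # should be close to beta_true
```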

R, Python, and SAS: Getting Started with Linear Regression

Consider the linear regression model, $$ y_i=f_i(\boldsymbol{x}|\boldsymbol{\beta})+\varepsilon_i, $$ where $y_i$ is the response or the dependent variable at the $i$th case, $i=1,\cdots, N$ and the predictor or the independent variable is the $\boldsymbol{x}$ term defined in the mean function $f_i(\boldsymbol{x}|\boldsymbol{\beta})$. For simplicity, consider the following simple linear regression (SLR) model, $$ y_i=\beta_0+\beta_1x_i+\varepsilon_i. $$ To obtain the (best) estimate of $\beta_0$ and $\beta_1$, we solve for the least residual sum of squares (RSS) given by, $$ S=\sum_{i=1}^{n}\varepsilon_i^2=\sum_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)^2. $$ Now suppose we want to fit the model to the following data, Average Heights and Weights for American Women, where weight is the response and height is the predictor. The data is available in R by default.
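Minimizing $S$ gives the familiar closed-form estimates $\hat{\beta}_1=S_{xy}/S_{xx}$ and $\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}$. Below is a minimal Python sketch of that computation; the height/weight values are transcribed from R's built-in `women` dataset that the post refers to (verify against R before relying on them).

```python
# Closed-form SLR estimates: b1 = S_xy / S_xx, b0 = ybar - b1 * xbar.
# Data transcribed from R's built-in `women` dataset (15 cases).
import numpy as np

height = np.arange(58, 73)  # 58, 59, ..., 72 inches
weight = np.array([115, 117, 120, 123, 126, 129, 132, 135,
                   139, 142, 146, 150, 154, 159, 164])  # pounds

xbar, ybar = height.mean(), weight.mean()
b1 = np.sum((height - xbar) * (weight - ybar)) / np.sum((height - xbar) ** 2)
b0 = ybar - b1 * xbar
print(b0, b1)  # R's lm(weight ~ height) reports about -87.52 and 3.45
```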

Parametric Inference: Karlin-Rubin Theorem

A family of pdfs or pmfs $\{g(t|\theta):\theta\in\Theta\}$ for a univariate random variable $T$ with real-valued parameter $\theta$ has a monotone likelihood ratio (MLR) if, for every $\theta_2>\theta_1$, $g(t|\theta_2)/g(t|\theta_1)$ is a monotone (nonincreasing or nondecreasing) function of $t$ on $\{t:g(t|\theta_1)>0\;\text{or}\;g(t|\theta_2)>0\}$. Note that $c/0$ is defined as $\infty$ if $0< c$.

Consider testing $H_0:\theta\leq \theta_0$ versus $H_1:\theta>\theta_0$. Suppose that $T$ is a sufficient statistic for $\theta$ and the family of pdfs or pmfs $\{g(t|\theta):\theta\in\Theta\}$ of $T$ has an MLR. Then for any $t_0$, the test that rejects $H_0$ if and only if $T >t_0$ is a UMP level $\alpha$ test, where $\alpha=P_{\theta_0}(T >t_0)$.

Example 1
To better understand the theorem, consider a single observation, $X$, from $\mathrm{n}(\theta,1)$, and test the following hypotheses: $$ H_0:\theta\leq \theta_0\quad\mathrm{versus}\quad H_1:\theta>\theta_0. …
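As a quick sanity check of the MLR condition in this normal example (a worked step added here, not part of the original excerpt): for any $\theta_2>\theta_1$, $$ \frac{g(x|\theta_2)}{g(x|\theta_1)}=\frac{\exp\left[-(x-\theta_2)^2/2\right]}{\exp\left[-(x-\theta_1)^2/2\right]}=\exp\left[(\theta_2-\theta_1)x-\tfrac{1}{2}\left(\theta_2^2-\theta_1^2\right)\right], $$ which is increasing in $x$ since $\theta_2-\theta_1>0$. Hence the family has an MLR, and the Karlin-Rubin theorem gives the one-sided test that rejects when $X>t_0$.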

Parametric Inference: Likelihood Ratio Test Problem 2

More on the Likelihood Ratio Test: the following problem is originally from Casella and Berger (2001), exercise 8.12.

Problem
For samples of size $n=1,4,16,64,100$ from a normal population with mean $\mu$ and known variance $\sigma^2$, plot the power function of the following LRTs (Likelihood Ratio Tests). Take $\alpha = .05$.

(a) $H_0:\mu\leq 0$ versus $H_1:\mu>0$
(b) $H_0:\mu=0$ versus $H_1:\mu\neq 0$

Solution
The LRT statistic is given by $$ \lambda(\mathbf{x})=\frac{\displaystyle\sup_{\mu\leq 0}\mathcal{L}(\mu|\mathbf{x})}{\displaystyle\sup_{-\infty<\mu<\infty}\mathcal{L}(\mu|\mathbf{x})}, \;\text{since }\sigma^2\text{ is known}. $$ The denominator can be expanded as follows: $$ \begin{aligned} \sup_{-\infty<\mu<\infty}\mathcal{L}(\mu|\mathbf{x})&=\sup_{-\infty<\mu<\infty}\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]\\ &=\sup_{-\infty<\mu<\infty}\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\displaystyle\sum_{i=1}^{n}\…
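As a sketch of what plotting the power function involves for the one-sided test (assuming $\sigma=1$; the closed form $\beta(\mu)=1-\Phi(z_\alpha-\sqrt{n}\,\mu/\sigma)$ follows from the z test the LRT reduces to, though the plotting details here are my own, not the post's):

```python
# One-sided power function beta(mu) = 1 - Phi(z_alpha - sqrt(n)*mu/sigma),
# plotted for the sample sizes in the problem, assuming sigma = 1.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

alpha, sigma = 0.05, 1.0
z_alpha = norm.ppf(1 - alpha)  # critical value of the z test
mu = np.linspace(-1, 2, 200)

for n in [1, 4, 16, 64, 100]:
    power = 1 - norm.cdf(z_alpha - np.sqrt(n) * mu / sigma)
    plt.plot(mu, power, label=f"n = {n}")

plt.axhline(alpha, linestyle="--")  # power equals alpha at mu = 0
plt.xlabel(r"$\mu$")
plt.ylabel("Power")
plt.legend()
plt.show()
```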

Parametric Inference: Likelihood Ratio Test Problem 1

Another post on mathematical statistics: the problem below is originally from Casella and Berger (2001) (see Reference 1), exercise 8.6.

Problem
Suppose that we have two independent random samples: $X_1,\cdots, X_n$ are exponential($\theta$), and $Y_1,\cdots, Y_m$ are exponential($\mu$).

(a) Find the LRT (Likelihood Ratio Test) of $H_0:\theta=\mu$ versus $H_1:\theta\neq\mu$.
(b) Show that the test in part (a) can be based on the statistic $$ T=\frac{\sum X_i}{\sum X_i+\sum Y_i}. $$
(c) Find the distribution of $T$ when $H_0$ is true.

Solution
The Likelihood Ratio Test is given by $$ \lambda(\mathbf{x},\mathbf{y}) = \frac{\displaystyle\sup_{\theta = \mu,\mu>0}\mathrm{P}(\mathbf{x},\mathbf{y}|\theta,\mu)}{\displaystyle\sup_{\theta > 0,\mu>0}\mathrm{P}(\mathbf{x}, \mathbf{y}|\theta,\mu)}, $$ where the denominator is evaluated as follows: $$ \sup_{\theta > 0,\mu>0}\mathrm{P}(\mathbf{x}, \mathbf{y}|\theta,\mu)= \sup_{\theta > 0}\mathrm{P}(\mathbf{x}|\theta)\sup_{\mu > 0}\mathrm{P…
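A quick way to sanity-check part (c) numerically (my own simulation sketch, not from the post): under $H_0$, $\sum X_i$ and $\sum Y_i$ are independent gamma variables with a common scale, so $T$ should follow a Beta$(n,m)$ distribution.

```python
# Simulation check of part (c): under H0 (theta = mu), the statistic
# T = sum(X) / (sum(X) + sum(Y)) should be Beta(n, m). The sample sizes
# and scale below are arbitrary choices for illustration.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(1)
n, m, theta, reps = 5, 8, 2.0, 100_000

x_sums = rng.exponential(scale=theta, size=(reps, n)).sum(axis=1)
y_sums = rng.exponential(scale=theta, size=(reps, m)).sum(axis=1)
t = x_sums / (x_sums + y_sums)

# Kolmogorov-Smirnov test against Beta(n, m): a large p-value is
# consistent with T ~ Beta(n, m) under H0.
print(kstest(t, "beta", args=(n, m)))
```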

Parametric Inference: The Power Function of the Test

In Statistics, we model random phenomena and make conclusions about their populations. For example, consider an experiment for determining the true heights of the students in a university. Suppose we take a sample from the population of students, and consider testing the null hypothesis that the average height is 5.4 ft against the alternative hypothesis that the average height is greater than 5.4 ft. Mathematically, we can represent this as $H_0:\theta=\theta_0$ vs $H_1:\theta>\theta_0$, where $\theta$ is the true value of the parameter and $\theta_0=5.4$ is the value set by the experimenter. And because we only consider a subset (the sample) of the population for testing the hypotheses, we should expect to commit errors. To understand these errors, suppose the above test results in rejecting $H_0$ given that $\theta\in\Theta_0$, where $\Theta_0$ is the parameter space of the null hypothesis; in other words, we mistakenly reject $H_0$. In this case we committed a Type…
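For reference, the standard definition the post builds toward (added here for context): the power function of a test with rejection region $R$ is $$ \beta(\theta)=P_{\theta}(\mathbf{X}\in R), $$ so for $\theta\in\Theta_0$ the value $\beta(\theta)$ is the probability of a Type I error, while for $\theta$ outside $\Theta_0$ the value $1-\beta(\theta)$ is the probability of a Type II error.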