Friday, 30 January 2015

Multiple Random Variables Problems

To probability lovers: I just want to share (and discuss) a few simple problems I solved from Chapter 4 of Casella, G. and Berger, R.L. (2002), Statistical Inference.
  1. A random point $(X,Y)$ is distributed uniformly on the square with vertices $(1, 1),(1,-1),(-1,1),$ and $(-1,-1)$. That is, the joint pdf is $f(x,y)=\frac{1}{4}$ on the square. Determine the probabilities of the following events.
    1. $X^2 + Y^2 < 1$
    2. $2X-Y>0$
    3. $|X+Y|<1$ (modified since the original $|X+Y|<2$ is trivial.)
    Solutions:
    1. $X^2 + Y^2 < 1$
      We first need to consider the boundary of this inequality inside the square (which has side length 2, so the unit circle fits exactly within it); the plot below shows $X^2 + Y^2 = 1$.
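As a numerical cross-check on the geometric argument, the three probabilities can be estimated by simulation. The sketch below is only an illustration, not part of the book's solution: the function name `estimate_probabilities`, the sample size, and the seed are assumptions made here. Since the density is constant at $\frac{1}{4}$, each probability is the area of the event divided by 4, which suggests values near $\pi/4$, $1/2$, and $3/4$ respectively.

```python
import random

def estimate_probabilities(n_samples=200_000, seed=0):
    """Monte Carlo estimates for a point uniform on [-1, 1] x [-1, 1].

    Returns estimates of P(X^2 + Y^2 < 1), P(2X - Y > 0), P(|X + Y| < 1).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    inside_circle = 0
    above_line = 0
    inside_band = 0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        inside_circle += x * x + y * y < 1.0
        above_line += 2.0 * x - y > 0.0
        inside_band += abs(x + y) < 1.0
    return (
        inside_circle / n_samples,
        above_line / n_samples,
        inside_band / n_samples,
    )

p_circle, p_line, p_band = estimate_probabilities()
print(p_circle, p_line, p_band)
```

The estimates fluctuate from seed to seed, so they should be read as a sanity check on the exact areas rather than as a replacement for them.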

Thursday, 15 January 2015

New Toy: SAS® University Edition

So I started using SAS® University Edition, which is a FREE version of SAS® software. Again, it's FREE, and that's the main reason I want to relearn the language. The software was announced on March 24, 2014, and the download became available in May of that year. For that, I salute Dr. Jim Goodnight. At least we can now learn SAS® without paying the expensive price tag, which matters especially for a single user like me.

The software requires a 64-bit processor and a virtual machine, on top of which it runs. To install it, just follow the instructions in this video. Although the installation in the video is done on Windows, it also works on Mac. Below is a screenshot of my SAS® Studio running in Safari.

Monday, 5 January 2015

R: Canonical Correlation Analysis on Imaging

In imaging, we deal with multivariate data, typically in array form with several spectral bands. Interpreting the correlations across all of these dimensions is very challenging, if not impossible. For example, recall the number of spectral bands in the AVIRIS data we used in the previous post. There are 152 bands, so the full correlation matrix has 152$\cdot$152 = 23104 entries, one for each ordered pair of random variables. How would you interpret that many correlations?
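The count above can be checked with a few lines of arithmetic. This small sketch (variable names are chosen here for illustration) also shows how many of those entries are distinct pairs, since the correlation matrix is symmetric and its diagonal is all ones.

```python
n_bands = 152  # number of AVIRIS spectral bands quoted in the post

# Every entry of the n x n correlation matrix, diagonal included.
matrix_entries = n_bands * n_bands

# Distinct unordered pairs of different bands (upper triangle only).
distinct_pairs = n_bands * (n_bands - 1) // 2

print(matrix_entries, distinct_pairs)  # 23104 11476
```

Even after removing the duplicates, more than eleven thousand pairwise correlations remain, which is still far too many to interpret one by one.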

To tackle this, it may be better to split the variables into two groups and study the relationship between the two sets. Such a statistical procedure can be carried out using canonical correlation analysis (CCA). An example from the health sciences (from Reference 2) involves variables related to exercise and health. On one hand, you have variables associated with exercise: observations such as the climbing rate on a stair stepper, how fast you can run, the amount of weight lifted on the bench press, the number of push-ups per minute, etc. On the other hand, you have health variables such as blood pressure, cholesterol levels, glucose levels, body mass index, etc. So two types of variables are measured, and the relationships between the exercise variables and the health variables are to be studied.
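To make the two-group idea concrete, here is a minimal numerical sketch in Python with NumPy. Everything in it is an assumption for illustration: the data are synthetic, the names `exercise`, `health`, and `canonical_correlations` are hypothetical, and the QR-plus-SVD route is one standard way to compute canonical correlations, not necessarily the procedure this post goes on to use. It also assumes each group has full column rank.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n x p) and Y (n x q).

    Centers both blocks, builds an orthonormal basis for each column space
    with a reduced QR factorization, and returns the singular values of
    Qx^T Qy, which are the canonical correlations in decreasing order.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against tiny rounding overshoot

# Synthetic stand-in data: one shared latent factor links the two groups.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
exercise = np.column_stack([
    latent + 0.1 * rng.normal(size=n),  # driven by the shared factor
    rng.normal(size=n),                 # pure noise
    rng.normal(size=n),                 # pure noise
])
health = np.column_stack([
    latent + 0.1 * rng.normal(size=n),  # driven by the shared factor
    rng.normal(size=n),                 # pure noise
])

rho = canonical_correlations(exercise, health)
print(rho)
```

With this construction, the first canonical correlation should be close to 1 because both groups share the latent factor, while the second should be small. Instead of inspecting every pairwise correlation, one looks at only $\min(p, q)$ numbers.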

Methodology

Mathematically, the procedure is as follows:
  1. Divide the random variables into two groups, and assign these to the following random vectors: \begin{equation}\nonumber \mathbf{X} = [X_1,X_2,\cdots, X_p]^T\;\text{and}\;\mathbf{Y} = [Y_1,Y_2,\cdots, Y_q]^T \end{equation}