Friday, 31 January 2014

Python and R: Is Python really faster than R?

A friend of mine asked me to code the following in R:
  1. Generate a sample of size 10 from a Normal distribution with $\mu$ = 3 and $\sigma^2$ = 5;
  2. Compute $\bar{x}$ and $\bar{x}\mp z_{\alpha/2}\displaystyle\frac{\sigma}{\sqrt{n}}$ using the 95% confidence level;
  3. Repeat the process 100 times; then
  4. Compute the percentage of the confidence intervals containing the true mean.
So here is what I got:

Staying with the default values, one would obtain

The output is a list with two elements, Matrix and Decision. The first column of Matrix holds the computed $\bar{x}$; the second and third columns hold the lower and upper limits of the confidence interval, respectively; and the fourth column is an indicator, with a one if the true mean is contained in the interval and a zero if it is not.

Now, how fast would it be if I were to code this in Python?

I already knew that Python is supposed to beat R in terms of speed (Nathan's post confirms it), but out of curiosity I wanted to see it for myself, which led me to the following Python equivalent:
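The listing itself is not shown here. As a rough idea only, a literal, loop-based translation of the four steps might look something like the sketch below. It is not the original code: the function name `ci_coverage_loop`, its parameters, NumPy's `default_rng`, and the use of `statistics.NormalDist` for the z quantile are all assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np


def ci_coverage_loop(reps=100, n=10, mu=3.0, sigma2=5.0, level=0.95, seed=None):
    """Loop-based sketch: one sample and one confidence interval per iteration.

    Hypothetical helper, not the post's original code.
    """
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(sigma2)
    # z_{alpha/2}, roughly 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / np.sqrt(n)

    # Columns: sample mean, lower limit, upper limit, 1/0 coverage indicator
    matrix = np.zeros((reps, 4))
    for i in range(reps):
        xbar = rng.normal(mu, sigma, size=n).mean()
        lower, upper = xbar - half_width, xbar + half_width
        matrix[i] = [xbar, lower, upper, float(lower <= mu <= upper)]

    # Percentage of intervals that contain the true mean
    coverage = matrix[:, 3].mean() * 100
    return matrix, coverage


if __name__ == "__main__":
    matrix, coverage = ci_coverage_loop()
    print(f"{coverage:.1f}% of the intervals contain the true mean")
```

With a 95% confidence level, the reported percentage should hover around 95, though with only 100 repetitions it will wobble from run to run.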

Computing the elapsed time, we have
  • R
  • Python
As you can see, R finishes in 0.008 seconds while Python takes 0.089 seconds. I am surprised by this! I mean, what is happening with my Python? Increasing to 100000 repetitions,

and Python,

It gets even worse! 64 seconds versus 7 seconds? That's a huge difference. I don't know what is happening here; I did my best to translate the R code literally into Python, and yet R comes out ahead?
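For anyone who wants to repeat this kind of measurement on the Python side, one simple way to take the elapsed time is `time.perf_counter` from the standard library. The sketch below is only an illustration: `time_call` is a hypothetical helper, and `busy_work` is a made-up stand-in for the simulation function, not the code timed above.

```python
import time


def time_call(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def busy_work(reps):
    """Made-up stand-in for the simulation; replace with the real function."""
    return sum(i * i for i in range(reps))


if __name__ == "__main__":
    for reps in (100, 100_000):
        _, seconds = time_call(busy_work, reps)
        print(f"{reps:>7} repetitions: {seconds:.4f} s")
```

A single call like this is noisy for very short runs, so repeating the measurement a few times (or using the `timeit` module) gives a steadier number.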

Any thoughts, guys, especially from the Python gurus?

UPDATE:

I want to include some great suggestions from the comments below. Chad Fulton showed that the Python code above can be optimized into the following:
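Chad's listing is not shown here. To give a flavor of what this sort of optimization usually involves, here is a minimal sketch that replaces the per-repetition loop with NumPy array operations. It is an assumption about the general approach, not his actual code; the name `ci_coverage_vectorized` and its parameters are made up for illustration.

```python
from statistics import NormalDist

import numpy as np


def ci_coverage_vectorized(reps=100, n=10, mu=3.0, sigma2=5.0, level=0.95, seed=None):
    """Vectorized sketch: draw every sample at once instead of looping.

    Hypothetical helper, not the code from the comments.
    """
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(sigma2)
    # z_{alpha/2} is computed once, outside any loop
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / np.sqrt(n)

    # One row per repetition, one column per observation
    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    lower = xbar - half_width
    upper = xbar + half_width
    covered = (lower <= mu) & (mu <= upper)

    # Same four columns as before: mean, lower, upper, 1/0 indicator
    matrix = np.column_stack([xbar, lower, upper, covered.astype(float)])
    return matrix, covered.mean() * 100


if __name__ == "__main__":
    _, coverage = ci_coverage_vectorized(reps=100_000)
    print(f"{coverage:.2f}% of the intervals contain the true mean")
```

The idea is that the heavy lifting moves from the Python interpreter into NumPy's compiled routines, which is typically where the speed-up comes from.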

Willem Ligtenberg translated it into the following R code:

And another version by wiekvoet, using a data frame:

Taking the elapsed time, we have

And in R,

Python :D
