A friend of mine asked me to code the following in R:
Staying with the default values, one would obtain
The output is a list of
Now how fast it would be if I were to code this in Python?
I do have a prior knowledge that Python beats R in terms of speed (confirmed from Nathan's post), but out of curiosity I wasn't satisfied with that fact; and leads me to the following Python equivalent,
Computing the elapsed time, we have
and Python,
Gets even worst! 64 seconds over 7 seconds? That's a huge difference. I don't know what is happening here, but I did my best to literally translate the R codes to Python, and yet R?
Any thoughts guys, especially to the Python gurus?
- Generate samples of size 10 from Normal distribution with \mu = 3 and \sigma^2 = 5;
- Compute the \bar{x} and \bar{x}\mp z_{\alpha/2}\displaystyle\frac{\sigma}{\sqrt{n}} using the 95% confidence level;
- Repeat the process 100 times; then
- Compute the percentage of the confidence intervals containing the true mean.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
case <- function(n = 10, mu = 3, sigma = sqrt(5), p = 0.025, rep = 100){ | |
m <- matrix(NA, nrow = rep, ncol = 4) | |
for(i in 1:rep){ | |
norm <- rnorm(mean = mu, sd = sigma, n = n) | |
xbar <- mean(norm) | |
low <- xbar - qnorm(p = 1 - p) * (sigma/sqrt(n)) | |
up <- xbar + qnorm(p = 1 - p) * (sigma/sqrt(n)) | |
if((mu > low) && (mu < up)){ | |
rem <- 1 | |
} else { | |
rem <- 0 | |
} | |
m[i, ] <- c(xbar, low, up, rem) | |
} | |
inside <- sum(m[, 4]) | |
per <- inside / rep | |
desc <- paste("There are ", inside, " confidence intervals that contain ", | |
"the true mean (", mu, "), that is ", per, " percent of the total CIs", sep = "") | |
return(list(Matrix = m, Decision = desc)) | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
$Matrix | |
[,1] [,2] [,3] [,4] | |
[1,] 3.8195089 2.4336051 5.205413 1 | |
[2,] 4.6144773 3.2285735 6.000381 0 | |
[3,] 2.0731334 0.6872296 3.459037 1 | |
[4,] 3.6641014 2.2781976 5.050005 1 | |
[5,] 3.8393909 2.4534871 5.225295 1 | |
: : : : : | |
: : : : : | |
[95,] 1.9025146 0.5166108 3.288418 1 | |
[96,] 2.4069555 1.0210517 3.792859 1 | |
[97,] 2.3141211 0.9282173 3.700025 1 | |
[98,] 3.2102493 1.8243455 4.596153 1 | |
[99,] 2.7873416 1.4014378 4.173245 1 | |
[100,] 3.1422372 1.7563333 4.528141 1 | |
$Decision | |
[1] "There are 95 confidence intervals that contain the true mean (3), that is 0.95 percent of the total CIs" |
Matrix
and Decision
, wherein the first column of the first list (Matrix
) is the computed \bar{x}; the second and third columns are the lower and upper limits of the confidence interval, respectively; and the fourth column, is an array of ones -- if true mean is contained in the interval and zeros -- true mean not contained.
Now how fast it would be if I were to code this in Python?
I do have a prior knowledge that Python beats R in terms of speed (confirmed from Nathan's post), but out of curiosity I wasn't satisfied with that fact; and leads me to the following Python equivalent,
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import numpy as np | |
import scipy.stats as ss | |
def case(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100): | |
m = np.zeros((rep, 4)) | |
for i in range(rep): | |
norm = np.random.normal(loc = mu, scale = sigma, size = n) | |
xbar = np.mean(norm) | |
low = xbar - ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n)) | |
up = xbar + ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n)) | |
if (mu > low) & (mu < up): | |
rem = 1 | |
else: | |
rem = 0 | |
m[i, :] = [xbar, low, up, rem] | |
inside = np.sum(m[:, 3]) | |
per = inside / rep | |
desc = "There are " + str(inside) + " confidence intervals that contain " \ | |
"the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs" | |
return {"Matrix": m, "Decision": desc} |
- R
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characterssystem.time(case()) #Output user system elapsed 0.008 0.000 0.008 - Python
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersimport time t0 = time.time() case() time.time() - t0 #Output 0.08859896659851074
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
system.time(case(rep = 100000)) | |
#Output | |
user system elapsed | |
7.076 0.000 7.066 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import time | |
t0 = time.time() | |
case(rep = 100000) | |
time.time() - t0 | |
#Output | |
64.88077402114868 |
Any thoughts guys, especially to the Python gurus?
UPDATE:
I just want to include some great suggestions from the comments below. From Chad Fulton, the above python code can be optimized into the following:
Translated to the proceeding R code by Willem Ligtenberg
And another version by wiekvoet using data frame,
Taking the elapsed time, we have
And in R,
Python :D
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import numpy as np | |
import scipy.stats as ss | |
def case2(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100): | |
scaled_crit = ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n)) | |
norm = np.random.normal(loc = mu, scale = sigma, size = (rep, n)) | |
xbar = norm.mean(1) | |
low = xbar - scaled_crit | |
up = xbar + scaled_crit | |
rem = (mu > low) & (mu < up) | |
m = np.c_[xbar, low, up, rem] | |
inside = np.sum(m[:, 3]) | |
per = inside / rep | |
desc = "There are " + str(inside) + " confidence intervals that contain " \ | |
"the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs" | |
return {"Matrix": m, "Decision": desc} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
case2 <- function(n = 10, mu = 3, sigma = sqrt(5), p = 0.025, rep = 100){ | |
scaledCrit <- qnorm(p = 1 - p) * (sigma/sqrt(n)) | |
norm <- matrix(data = rnorm(mean = mu, sd = sigma, n = n*rep), ncol = n, nrow = rep) | |
xbar <- rowMeans(norm) | |
low <- xbar - scaledCrit | |
up <- xbar + scaledCrit | |
rem <- (mu > low) & (mu < up) | |
m <- cbind(xbar, low, up, rem) | |
inside <- sum(m[, 4]) | |
per <- inside / rep | |
desc <- paste("There are ", inside, " confidence intervals that contain ", | |
"the true mean (", mu, "), that is ", per, " percent of the total CIs", sep = "") | |
return(list(Matrix = m, Decision = desc)) | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
case3 <- function(n = 10, mu = 3, sigma = sqrt(5), p = 0.025, rep = 100){ | |
xbar <- rowMeans( | |
matrix(rnorm(mean = mu, sd = sigma, n = n*rep),nrow=rep) | |
) | |
q <- qnorm(p = 1 - p) * (sigma/sqrt(n)) | |
m <- data.frame( | |
xbar, | |
low=xbar+q, | |
high=xbar-q, | |
rem= abs(mu-xbar) < q ) | |
inside <- sum(m$rem) | |
per <- inside / rep | |
desc <- paste("There are ", inside, " confidence intervals that contain ", | |
"the true mean (", mu, "), that is ", per, " percent of the total CIs", sep = "") | |
return(list(Matrix = m, Decision = desc)) | |
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import time | |
t0 = time.time() | |
case2(rep = 100000) | |
time.time() - t0 | |
#Output | |
0.1463780403137207 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
system.time(case2(rep = 100000)) | |
#Output | |
user system elapsed | |
0.296 0.016 0.315 | |
system.time(case3(rep = 100000)) | |
#Output | |
user system elapsed | |
0.300 0.024 0.322 |