
R: How to Encode your Data?

Every experiment starts with data, so the first question is: how do you enter your data into R? There are many ways to do that, but in this post we will consider only the two functions below:
  • the concatenate function, c; and
  • the data.frame function.
The concatenate function, c, combines data points into a single R object known as a vector. The vector holds numbers in the examples below, but c works just as well with character strings or logical values. Its usage is simply

c(..., recursive = FALSE)
where ... denotes the objects to be concatenated. Run ?c for a description of the recursive argument. Let's try an example:

vec1 <- c(0.5, 0.3, 0.1, 0.6, 0.2)
vec1
# OUTPUT
[1] 0.5 0.3 0.1 0.6 0.2
Here we defined a new object, vec1, in the workspace. We can then start working with its entries, for example using the summary function:

summary(vec1)
# OUTPUT
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.10 0.20 0.30 0.34 0.50 0.60
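For a numeric vector, summary reports the minimum, the three quartiles, the mean and the maximum. As a small sketch, the same numbers can be reproduced one by one with quantile, which by default uses the same quantile definition (type 7) that summary uses:

```r
vec1 <- c(0.5, 0.3, 0.1, 0.6, 0.2)

# 0%, 25%, 50%, 75% and 100% quantiles with R's default (type 7) definition
q <- quantile(vec1)
q

# The individual statistics that summary() reports
min(vec1)     # minimum, same as the 0% quantile
median(vec1)  # median, same as the 50% quantile
mean(vec1)    # arithmetic mean
max(vec1)     # maximum, same as the 100% quantile

# as.numeric() strips the labels and class so the two vectors can be compared
all.equal(as.numeric(summary(vec1)),
          c(q[[1]], q[[2]], q[[3]], mean(vec1), q[[4]], q[[5]]))
```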
For measures of dispersion, try this:

var(vec1) # Variance
# OUTPUT
[1] 0.043
sd(vec1) # Standard Deviation
# OUTPUT
[1] 0.2073644
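var returns the sample variance, which divides the sum of squared deviations from the mean by n - 1 rather than n, and sd is its square root. The sketch below rebuilds both values from that definition, using only base R:

```r
vec1 <- c(0.5, 0.3, 0.1, 0.6, 0.2)

n <- length(vec1)                 # number of observations
deviations <- vec1 - mean(vec1)   # distance of each value from the mean

# Sample variance: sum of squared deviations divided by (n - 1)
manual_var <- sum(deviations^2) / (n - 1)
# Standard deviation: square root of the variance
manual_sd <- sqrt(manual_var)

manual_var
manual_sd

# Both agree with the built-in functions
all.equal(manual_var, var(vec1))
all.equal(manual_sd, sd(vec1))
```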
What about the data.frame function? While c combines data points into a single vector, data.frame, as its name suggests, constructs a frame of data: a table in which each column is a variable. Here is an example:

weights <- c(56.4, 45.6, 40.2, 50.1, 51.3)
volunteers <- c("Mirra", "Jeh-Jeh", "Amil", "Ikkah", "NG")
data1 <- data.frame(volunteers, weights)
data1
# OUTPUT
volunteers weights
1 Mirra 56.4
2 Jeh-Jeh 45.6
3 Amil 40.2
4 Ikkah 50.1
5 NG 51.3
Here we defined two R objects, weights and volunteers, and then combined them into a table-like structure called a data frame. To extract a column of data1, use the $ operator followed by the column name:

# extract volunteers
data1$volunteers
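# OUTPUT (R < 4.0.0; since R 4.0.0 stringsAsFactors defaults to FALSE, so the names print as quoted strings without the Levels line)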
[1] Mirra Jeh-Jeh Amil Ikkah NG
Levels: Amil Ikkah Jeh-Jeh Mirra NG
# extract weights
data1$weights
# OUTPUT
[1] 56.4 45.6 40.2 50.1 51.3
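The $ operator is not the only way to pull out a column. The sketch below shows two other base R forms, [[ ]] and [ , ], and sets stringsAsFactors explicitly so that the volunteers column has the same type regardless of R version (the default changed from TRUE to FALSE in R 4.0.0):

```r
weights <- c(56.4, 45.6, 40.2, 50.1, 51.3)
volunteers <- c("Mirra", "Jeh-Jeh", "Amil", "Ikkah", "NG")

# Keep names as plain character strings; use stringsAsFactors = TRUE
# instead to get the factor output (with Levels) shown above
data1 <- data.frame(volunteers, weights, stringsAsFactors = FALSE)

# Three equivalent ways to extract the weights column as a vector
data1$weights
data1[["weights"]]
data1[, "weights"]   # selecting a single column drops to a vector by default

# Show the type of each column
str(data1)
```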
Finally, the mean of the weights is:

mean(data1$weights)
# OUTPUT
[1] 48.72
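Because each column of a data frame is an ordinary vector, the functions used earlier on vec1 apply to it unchanged. A short sketch summarising the weights column, plus base R's with, which evaluates an expression using the columns of a data frame as variables:

```r
weights <- c(56.4, 45.6, 40.2, 50.1, 51.3)
volunteers <- c("Mirra", "Jeh-Jeh", "Amil", "Ikkah", "NG")
data1 <- data.frame(volunteers, weights)

summary(data1$weights)  # minimum, quartiles, mean and maximum
var(data1$weights)      # sample variance
sd(data1$weights)       # standard deviation

# with() lets us drop the data1$ prefix inside the expression
with(data1, mean(weights))
```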