## Posts

Showing posts from 2013

### Book Review: Learning Geospatial Analysis with Python by Joel Lawhead

I decided to read this book because I have been making maps with R, and I wanted to learn the literature and science behind mapping and how to do a proper analysis with it. I also wanted to see what Python can offer in this discipline.

The book has 10 chapters spread over 364 pages. The first three chapters are a long read, with little coding and mostly an introductory discussion of geospatial analysis.

Impression: I like that the author spent three chapters on the overall story (I would say) of geospatial analysis. As a preview, the first chapter is of course the introduction; the second covers the data types, which come in a surprising variety of formats; and the third is all about the libraries and packages used in the field. I am familiar with ArcGIS and QGIS, but this book makes you aware of other tools as well.

The simple illustrations that complement the discussion are very helpful in telling the overall story of the subject. And the s…

### R: Explore ARIMA(2, 2, 2) subclass family on Shiny

I've been thinking that it might be better to explore the three-stage iterative Box-Jenkins modelling of ARIMA (Autoregressive Integrated Moving-Average) models on Shiny. So here is what I got: an app intended for the ARIMA(2, 2, 2) subclass family only.

The app has six tabs:

1. Historical Plot;
2. Identification;
3. Estimation;
4. Diagnostic;
5. Forecast; and
6. Data.

The first tab shows the time plot of the simulated time series. The series can be simulated from any member of the ARIMA(2, 2, 2) subclass family; the order is assigned using the controls in the side panel, and the values of the parameters are set in the text fields right below the plot. For example, ARIMA(1, 1, 1) has two text fields, one for the AR (Autoregressive) parameter and one for the MA (Moving-Average) parameter, as shown below. The default value of every parameter in all models is 0.3.
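To make the ARIMA(1, 1, 1) case concrete, here is a minimal Python sketch (standard library only) of how such a series could be simulated with both parameters at 0.3. This is not the app's code: the function name `simulate_arima_111`, the standard normal noise, and the zero starting values are all assumptions made for illustration.

```python
import random


def simulate_arima_111(n, ar=0.3, ma=0.3, seed=None):
    """Simulate n observations from an ARIMA(1, 1, 1) process.

    The first difference w[t] = y[t] - y[t-1] follows an ARMA(1, 1):
        w[t] = ar * w[t-1] + e[t] + ma * e[t-1]
    where e[t] is standard normal noise (an assumption of this sketch).
    """
    rng = random.Random(seed)
    series = []
    w_prev, e_prev, level = 0.0, 0.0, 0.0
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        w = ar * w_prev + e + ma * e_prev  # ARMA(1, 1) on the differences
        level += w                         # integrate once, since d = 1
        series.append(level)
        w_prev, e_prev = w, e
    return series


series = simulate_arima_111(200, seed=42)
print(len(series), series[:3])
```

Setting `ar=0.0` and `ma=0.0` reduces the sketch to a plain random walk, which is a quick way to sanity-check it.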

The next tab is Identification, the first stage of Box-Jenkins iterative modelling. Here, the model is identified using the correlograms, Autocorrelation Function (ACF) and Partial …

### Book Review: Practical Data Analysis by Hector Cuesta

I have been reading this book since last week, and now I want to share my thoughts about it. I was excited to review it because I had never heard of most of the tools it features, like OpenRefine, MongoDB, and MapReduce. The book has 360 pages and covers a surprising number of topics. Along with it comes a GitHub repository for all the code.

Practical Data Analysis is all about applications of statistical methodologies in computer science. I find it very useful since this was not taught in my statistics classes. In college, we only practiced statistics in fields like sociology, psychology, agriculture, economics, chemistry, biology, industrial engineering, and many others, but not in computer science; we only dealt with it when coding in R or SAS. Hal Varian once said in this video that,

> . . . we've got at least hundred statisticians on Google . . .

And I was curious about that. I mean, what are they doing at Google? What statistical tools do they use? Thanks …

### R: Mapping Super Typhoon Yolanda (Haiyan) Track

After reading Enrico Tonini's post, I decided to map the track of super typhoon Haiyan using OpenStreetMap, maptools, and ggplot2. If mapping with googleVis was possible in only 13 lines, the same can be achieved with the packages I used; but because I played with the aesthetics of the map, I ended up with more than that. The data was taken from Weather Underground, and to be consistent with the units of the JMA Best Track Data, which I used for mapping typhoon Labuyo (Utor), the wind speed in miles per hour (mph) was converted to knots. So here is the final output,

Labels:

- TD - Tropical Depression
- TS - Tropical Storm
- TY - Typhoon
- STY - Super Typhoon

As for the super typhoon, please don't visit my country again. I would like to thank all who prayed for the Philippines, especially the countries that helped us recover from this tragedy.
Codes:
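The original R code is not reproduced in this excerpt. The unit conversion step on its own, from miles per hour to knots, can be sketched in Python as follows. The function name `mph_to_knots` and the sample wind speeds are hypothetical; only the conversion factors (1 mile = 1.609344 km, 1 nautical mile = 1.852 km) are standard.

```python
KM_PER_MILE = 1.609344
KM_PER_NAUTICAL_MILE = 1.852


def mph_to_knots(mph):
    """Convert a wind speed from miles per hour to knots."""
    return mph * KM_PER_MILE / KM_PER_NAUTICAL_MILE


# Hypothetical wind speeds in mph, not values from the Weather Underground data.
for mph in (35, 75, 150):
    print(mph, "mph =", round(mph_to_knots(mph), 1), "knots")
```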

### Python: Venn Diagram

A Venn diagram is very useful for visualizing operations between events/sets. So in this post, we will learn how to draw one in Python. First, we need to install the matplotlib-venn module. Open the terminal or command prompt, and run the following code:

Now that we have it, here are the three set operations I visualized:
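The figures are not reproduced in this excerpt. Before drawing anything, the operations themselves can be checked with Python's built-in set type. The two sets below are made up for illustration, and union, intersection, and difference are assumed here as three typical operations; the original figures may show others.

```python
# Two hypothetical sets, chosen only for illustration.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

union = A | B          # elements in A or B
intersection = A & B   # elements in both A and B
difference = A - B     # elements in A but not in B

print(union, intersection, difference)
```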

### R: Mapping Philippine Earthquakes (October 2013)

Last month, on October 15, 2013 at around 8:12 am (Philippine Time), a magnitude 7.2 earthquake hit Bohol island, destroying infrastructure and killing hundreds of residents. The Philippine Institute of Volcanology and Seismology (PhiVolcs) recorded more than 3000 aftershocks, but only a fraction of these are available in their Earthquake Bulletin. There are 448 data points in total for last month's earthquakes, and here is the final output,

Adding a layer for the two-dimensional density of the data points, we have

### R: Mapping Typhoon Labuyo (Utor) Track

Inspired by James Cheshire's R maps, I tried mapping the track of Typhoon Labuyo (Utor), which is by far the strongest one to hit the Philippines this year. The data, the 2013 RSMC Best Track data, is available at the Japan Meteorological Agency site. The code is available here.

Bonus, the Severe Tropical Storm Auring (Sonamu) path

### LaTeX: Itemize and Enumerate

Previously, we talked about paragraphs, spacing, and indentation. What about lists in $\mathrm{\LaTeX}$?

Just like MS Word, $\mathrm{\LaTeX}$ has both bulleted and alphanumeric environments for lists. A bulleted list is constructed inside the itemize environment. For example,

The code above generates a list with sublists. The outer list uses bullets for its entries (Line 5 and Line 14), while the sublists (Lines 7-12 and Lines 16-20) use hyphens. The itemize environment can hold up to three nested lists, each with its own character for entries. Here is something worth exploring,

This returns Output 1: the first list, with entries in Lines 5-6, uses bullets; a sublist under Line 6 with two entries (Lines 8-9) uses hyphens; the subsublist uses asterisks; and further down, the subsubsublist uses dots (periods). If we go further to a fourth nested list, we get an error that says "Too deeply nested".

### LaTeX: Paragraph, Spacing, and Indentation

Let us visit the history of Statistics (source here), and utilize this for demonstrations of paragraph, spacing, and indentation in $\mathrm{\LaTeX}$. Consider the following:

Executing these, we will have Output 1. This is the simplest way to separate multiple paragraphs in $\mathrm{\LaTeX}$: leave a single blank line between them, that is, two line breaks after the last line of the preceding paragraph.

To extend the distance between passages, say to double line spacing, just put \\ at the end of each one and you will have Output 2. Hence in the code above, the first paragraph will end with Ibrahim Al-Kadi.\\, the second with element in history.\\, and so on. The amount of space can also be specified: a spacing of 0.5 centimeter or 0.2 inch is achieved with \\[0.5cm] and \\[0.2in], respectively. Now what happens when the blank lines are omitted from the code above, leaving only \\ at the end of every paragraph? I want you to explore that.

I have shown you …

### LaTeX: Introduction - First Document

Let us start with a simple $\mathrm{\LaTeX}$ document. Open your text editor (Texmaker, TeXnicCenter, TeXworks, etc.), then go to File>New, and paste the following:

Save this and make sure the file has a .tex extension. You should have something like this

In Texmaker: Press F1 to compile and view the output (Mac users should manually click the blue right arrow next to the paste button, because F1 won't work). The output is a PDF with text on the top-left corner of the page that reads as in Output 1. Now let us investigate these line by line. A $\mathrm{\LaTeX}$ document always starts with \documentclass, which sets the type of document you are about to create. In this case the document class is article, {article}, with the font size option set to 12 point, [12pt].

Every $\mathrm{\LaTeX}$ command starts with a backslash (\), uses braces ({}) to enclose arguments, and uses brackets ([]) to enclose options.

Click here for a list of document classes and options. Proceeding to the…

Are you using MS Word for your papers? And MS PowerPoint for your presentations? Why not try something that is FREE and more powerful than those commercial products? Something that is quite challenging but FUN to use. Introducing $\mathrm{\LaTeX}$, a document markup language similar to HTML but with output in PDF format.

I prefer $\mathrm{\LaTeX}$ because it produces beautiful mathematical equations that MS Word cannot. $\mathrm{\LaTeX}$ is used by publishers like O'Reilly and Springer for books that involve mathematics and statistics.

Now let me guide you on getting started with $\mathrm{\LaTeX}$ on Windows, Mac and Ubuntu.

For Windows users:

1. Go to this site and download what is recommended for your system;
2. follow the installation instructions; after that,
3. download TeXMaker here, and install it. (TeXMaker is a $\mathrm{\LaTeX}$ text editor with many features, including syntax highlighting.)

For Mac users:

Python offers modules such as scipy, numpy, and pandas for data analysis, and I am going to use these as an alternative to R. To get started, I recommend installing the Python IDE Spyder. If you haven't installed Python on your computer yet, don't worry, as it will be installed automatically as well.

1. Open Ubuntu Software Center
2. Search for Spyder
3. Click Install

Once it is successfully installed, open it and try running some arithmetic on the console. Or try the script window and press F5 to execute.

### R: Interval Estimation of the Population Mean

Interval estimation of the population mean can be computed with functions from the following R packages:

- stats - contains t.test;
- TeachingDemos - contains z.test; and,
- BSDA - contains zsum.test and tsum.test.

The t.test function of the stats package is a Student's t-test, and is used when the raw dataset is given. The same goes for z.test, but this function is specifically for the z-test with known population standard deviation. When the dataset is not given and only the summary statistics (mean and standard deviation) are presented, the appropriate functions are zsum.test and tsum.test. Note that t.test and tsum.test implement the same statistical test, as do z.test and zsum.test. Consider the example below,

Example 1. The 2012-2013 SASE scores of the 33 random students from College of Science and Mathematics (CSM) of MSU-IIT were recorded: 84, 93, 101, 86, 82, 86, 88, 94, 89, 94, 93, 83, 95, 86, 94, 87, 91, 96, 89, 79, 99, 98, 81, 80, 88, 100, 90, 100, 81, 98, 87, 95, and 94. …
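The excerpt is cut off before the question is stated, so the following is only a rough Python sketch of what an interval estimate for these scores could look like. It assumes a 95% confidence level and uses the normal critical value with the sample standard deviation, which is a large-sample approximation and not the exact t interval that t.test would report.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

scores = [84, 93, 101, 86, 82, 86, 88, 94, 89, 94, 93,
          83, 95, 86, 94, 87, 91, 96, 89, 79, 99, 98,
          81, 80, 88, 100, 90, 100, 81, 98, 87, 95, 94]

n = len(scores)
x_bar = mean(scores)
s = stdev(scores)                  # sample standard deviation
z = NormalDist().inv_cdf(0.975)    # critical value for an assumed 95% level
margin = z * s / sqrt(n)

print(n, round(x_bar, 2), (round(x_bar - margin, 2), round(x_bar + margin, 2)))
```

Because the t critical value with 32 degrees of freedom is slightly larger than the normal one, the exact t interval would be a little wider than this approximation.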

### R: Measures of Skewness and Kurtosis

Skewness and kurtosis in R are available in the moments package (to install a package, click here), through these functions:

- Skewness - skewness; and,
- Kurtosis - kurtosis.

Example 1. Mirra is interested in the elapsed time (in minutes) she spends riding a tricycle from home, at Simandagit, to school, MSU-TCTO, Sanga-Sanga, over three weeks (excluding weekends). She obtained the following data: 19.09, 19.55, 17.89, 17.73, 25.15, 27.27, 25.24, 21.05, 21.65, 20.92, 22.61, 15.71, 22.04, 22.60, and 24.25. Compute and interpret the skewness and kurtosis.

Interpretation: The skewness here is -0.01565162. This value implies that the distribution of the data is slightly skewed to the left, or negatively skewed. It is skewed to the left because the computed value is negative, and only slightly so because the value is close to zero. For the kurtosis, we have 2.301051, implying that the distribution of the data is platykurtic, since the computed value is less than 3. A graphical illustration of the data is in Figure 1.
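As a cross-check outside R, the two values can be recomputed in plain Python from central moments. This sketch assumes the population form of the moments (dividing by n), with skewness taken as m3 / m2^1.5 and kurtosis as m4 / m2^2; the helper name `central_moment` is mine.

```python
times = [19.09, 19.55, 17.89, 17.73, 25.15, 27.27, 25.24, 21.05,
         21.65, 20.92, 22.61, 15.71, 22.04, 22.60, 24.25]


def central_moment(data, k):
    """k-th central moment, dividing by n (population form)."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)


m2 = central_moment(times, 2)
skewness = central_moment(times, 3) / m2 ** 1.5
kurtosis = central_moment(times, 4) / m2 ** 2   # not excess kurtosis, so compare with 3

print(skewness, kurtosis)
```

Under these assumptions the results should agree with the skewness and kurtosis quoted in the interpretation, up to rounding.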

### R: Measure of Relative Variability

The measure of relative variability is the coefficient of variation (CV), the standard deviation expressed as a percentage of the mean. Unlike measures of absolute variability, the CV is unitless, so it can be used to compare the dispersions of two distributions with different units of measurement. In R, the CV is obtained using the cv function of the raster package.

Example 1. Below are the mean and standard deviation of the number of hours Jacob spends every time he studies Stochastic Processes, along with the corresponding scores he got out of 100 items. Based on this data, should one say that the number of hours he spends studying is more variable than his exam scores, or the other way around?

| Variable | Mean | Standard Deviation |
|---|---|---|
| Study Hours | 25 | 2.6 |
| Scores | 69 | 5.3 |

To determine this, we use the function below

And thus,

Interpretation: It is clear from the computed CVs that the study hours are more variable than the exam scores, even though the standard deviation of the scores is higher than that of the hours spent.
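The arithmetic behind this comparison is short enough to check by hand or with a few lines of Python. The sketch below is not the R function from the original post; it simply applies CV = 100 × sd / mean to the summary statistics for study hours (mean 25, sd 2.6) and scores (mean 69, sd 5.3).

```python
def cv(mean, sd):
    """Coefficient of variation as a percentage: 100 * sd / mean."""
    return 100 * sd / mean


cv_hours = cv(25, 2.6)    # study hours
cv_scores = cv(69, 5.3)   # exam scores

print(round(cv_hours, 2), round(cv_scores, 2))
```

The study hours come out at about 10.4% against roughly 7.7% for the scores, which is what the interpretation relies on.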

### R: Measures of Absolute Variability

Measures of absolute variability deal with the dispersion of the data points. These include the following:

- Range - range;
- Interquartile Range - IQR;
- Quartile Deviation;
- Average Deviation; and,
- Standard Deviation - sd.

These measures of variability are restricted to uniform units of measurement when comparing two distributions.

Example 1. The heights (in centimetres) of the 17 BS Stat students in section A23 of Statistical Inference under Dr. Supe were recorded. The data are the following: 151, 160, 162, 155, 154, 154, 153, 168, 169, 153, 158, 166, 152, 157, 150, 169, and 167. Compute the range, interquartile range, quartile deviation, average deviation, and standard deviation.

The range is computed using the function range, while the interquartile range is obtained by IQR. Thus,
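The R output is not reproduced in this excerpt. As a cross-check, here is a Python sketch of all five measures using the standard library. Two assumptions are worth flagging: the quartiles use the "inclusive" method of `statistics.quantiles`, which for this data gives the same quartiles as linear interpolation between order statistics, and the average deviation is taken as the mean absolute deviation about the mean.

```python
from statistics import mean, quantiles, stdev

heights = [151, 160, 162, 155, 154, 154, 153, 168, 169,
           153, 158, 166, 152, 157, 150, 169, 167]

data_range = max(heights) - min(heights)
q1, _, q3 = quantiles(heights, n=4, method="inclusive")
iqr = q3 - q1
quartile_deviation = iqr / 2
x_bar = mean(heights)
# Assumed definition: mean absolute deviation about the mean.
average_deviation = sum(abs(x - x_bar) for x in heights) / len(heights)
sd = stdev(heights)  # sample standard deviation (divides by n - 1)

print(data_range, iqr, quartile_deviation,
      round(average_deviation, 2), round(sd, 2))
```

Note that R's range function returns the minimum and maximum rather than their difference, so the single number computed here corresponds to diff(range(x)) in R.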

### R: Quartiles, Deciles, and Percentiles

The measures of position such as quartiles, deciles, and percentiles are available through the quantile function. This function has the usage,

where:

- x - the data points;
- probs - the locations to measure, given as probabilities;
- na.rm - if FALSE, NA (Not Available) data points are not ignored;
- names - for attributes; FALSE means no attributes, which speeds up the computation;
- type - the type of quantile algorithm; and,
- ... - further arguments.

Example 1. The junior BS Stat students of MSU-IIT have the following SASE scores: 88, 84, 83, 80, 94, 90, 81, 79, 79, 81, 85, 87, 86, 89, and 92. Determine and interpret the quartiles of these scores.

Interpretation: The first quartile, $Q_1$ (25%), implies that 25% of the SASE scores fall below or equal to 81.0, while the other 75% are above 81.0. The second quartile, $Q_2$ (50%), is the median, and thus half of the scores are below or equal to 85.0, while the other half are above 85.0. The third quartile, $Q_3$ (75%), implies that three-fourths of the data are below or equal to 88.5, while the remaining one-fourth are above 88.5. And…
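The three quartiles quoted in the interpretation can be cross-checked in Python. This is a sketch with the standard library, assuming the "inclusive" method of `statistics.quantiles`, which interpolates linearly between order statistics and reproduces the values above for this data.

```python
from statistics import quantiles

sase = [88, 84, 83, 80, 94, 90, 81, 79, 79, 81, 85, 87, 86, 89, 92]

# n=4 splits the data into four groups, returning the three cut points.
q1, q2, q3 = quantiles(sase, n=4, method="inclusive")
print(q1, q2, q3)
```

Deciles and percentiles follow the same pattern with `n=10` and `n=100`.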

### R: Mean and Median

The mean in R is computed using the function mean, and the median using the function median. Consider the scores of 20 MSU-IIT students in a 100-item Stat 101 exam: 70, 78, 66, 65, 50, 53, 48, 88, 95, 80, 85, 84, 81, 63, 68, 73, 75, 84, 49, and 77. Compute and interpret the mean and median.

Interpretation: Therefore, the average score of the students is 71.6, and half of their scores are less than or equal to 74, while the other half are greater than 74.
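For readers following along in Python instead of R, the same two numbers can be obtained with the standard library's `statistics` module. This is only a cross-check of the figures in the interpretation, not the original R code.

```python
from statistics import mean, median

scores = [70, 78, 66, 65, 50, 53, 48, 88, 95, 80,
          85, 84, 81, 63, 68, 73, 75, 84, 49, 77]

# With 20 scores, the median is the average of the 10th and 11th sorted values.
print(mean(scores), median(scores))
```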

### R: Matrix Operations

Matrix manipulations in R are very useful in Linear Algebra. Below is a list of common yet important functions for matrix operations:

- Transpose - t;
- Multiplication - %*%;
- Determinant - det;
- Inverse - solve, or ginv of the MASS library; and,
- Eigenvalues and Eigenvectors - eigen.

Consider these matrices, $\left[\begin{array}{ccc}3&4&5\\2&1&3\\6&5&4\end{array}\right]$ and $\left[\begin{array}{ccc}6&7&5\\4&5&8\\7&6&6\end{array}\right]$. In R, these would be,

To transpose these, simply use t
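The R code and output are not shown in this excerpt. To see what the transpose and the matrix product of these two matrices look like, here is a small pure-Python sketch. The names `A` and `B` and the helper functions are my own labels, standing in for R's t and %*%.

```python
A = [[3, 4, 5], [2, 1, 3], [6, 5, 4]]
B = [[6, 7, 5], [4, 5, 8], [7, 6, 6]]


def transpose(m):
    """Swap rows and columns."""
    return [list(row) for row in zip(*m)]


def matmul(x, y):
    """Matrix product: each entry is a row of x times a column of y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


print(transpose(A))
print(matmul(A, B))
```

For example, the top-left entry of the product is 3×6 + 4×4 + 5×7 = 69.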

### R: Data Class Conversion

Data in R can be converted from one class to another. The conversion functions are prefixed with as. followed by the name of the data class we wish to convert to. The data classes in R and their conversion functions are the following:

- numeric - as.numeric;
- vector - as.vector;
- character - as.character;
- matrix - as.matrix; and,
- data frame - as.data.frame.

Hence, if one wishes to convert the numeric data points 32, 35, 38, 29, 27, 40, and 33 into characters, this is achieved by

Notice the difference between the output of the data object and that of the converted one, data.ch? The output differs only in this character, ". The quotation marks enclosing every data point indicate that the data is now in character form. And this can be verified using the function class,
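The same idea carries over to Python, where the quotation marks in the printed output also signal that numbers have become text. The sketch below is only an analogy to as.character and class, with `data_ch` standing in for the data.ch object.

```python
data = [32, 35, 38, 29, 27, 40, 33]
data_ch = [str(x) for x in data]   # analogous to as.character(data)

print(data)      # numbers, printed without quotes
print(data_ch)   # strings, printed with quotes
print(type(data[0]).__name__, type(data_ch[0]).__name__)
```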

### R: Enter Data in Matrix Format

A matrix in R is constructed using the matrix, rbind, or cbind function. These functions have the following descriptions:

- matrix - transforms concatenated data into a matrix of compatible dimensions;
- rbind - short for row bind, binds concatenated data points of the same size by row;
- cbind - short for column bind, binds concatenated data points of the same size by column.

Example 1. Consider this matrix, $\left[\begin{array}{ccc} 3&4&5\\ 2&1&3\\ 6&5&4 \end{array}\right]$. Using the matrix function, we can code this as

So here's what happened above: first, the data was concatenated using the c function into the data.a object. Next, we transformed this into a matrix of compatible dimensions, that is, $3\times 3$. Below are the descriptions of the arguments:

- data.a - the data
- nrow - the number of rows
- ncol - the number of columns
- byrow - the orientation in which the data is wrapped into a matrix. If TRUE, it is filled row-wise; otherwise, column-wise.
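The effect of byrow is easiest to see side by side. The following Python sketch mimics the row-wise and column-wise fill with a hypothetical helper `to_matrix`; it assumes the data was concatenated in row order (3, 4, 5, 2, 1, 3, 6, 5, 4), and unlike R it raises an error instead of recycling values when the length does not match.

```python
data_a = [3, 4, 5, 2, 1, 3, 6, 5, 4]


def to_matrix(values, nrow, ncol, byrow=False):
    """Wrap a flat list into nrow x ncol, row-wise or column-wise."""
    if len(values) != nrow * ncol:
        raise ValueError("data length must equal nrow * ncol")
    if byrow:
        return [values[r * ncol:(r + 1) * ncol] for r in range(nrow)]
    # Column-wise fill: entry (r, c) comes from position c * nrow + r.
    return [[values[c * nrow + r] for c in range(ncol)] for r in range(nrow)]


print(to_matrix(data_a, 3, 3, byrow=True))   # the matrix in Example 1
print(to_matrix(data_a, 3, 3))               # its transpose
```

With a square matrix like this one, the column-wise fill is simply the transpose of the row-wise fill.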

### R: Basic Mathematical Functions

R can perform the usual mathematical operations. Below are the functions:

Arithmetic

- `+` - addition
- `-` - subtraction
- `*` - multiplication
- `/` - division

Trigonometry

- `sin` - sine
- `cos` - cosine
- `tan` - tangent
- `asin` - inverse sine
- `acos` - inverse cosine
- `atan` - inverse tangent

Linear Algebra

- `+` - element-wise addition
- `-` - element-wise subtraction
- `*` - element-wise multiplication
- `/` - element-wise division
- `%*%` - matrix multiplication
- `t` - transpose
- `eigen` - eigenvalues and eigenvectors
- `solve` - inverse of a matrix
- `ginv` - generalized inverse, requires the MASS package
- `rbind` - combines vectors of observations as rows of a matrix
- `cbind` - combines vectors of observations as columns of a matrix

Reference:

- Kabacoff, R. I. Matrix Algebra. Quick-R. Retrieved April 20, 2013.
- Trigonometric Functions. R Documentation. R 2.15.1.

To install a package in R, the function to use is install.packages. Let's say we want to install the ggplot2 package; simply run the following:

To install more than one package, use the concatenate function, c

Note that when executing the code above, a dialogue box will pop up asking for the CRAN mirror; just choose the one that's in or near your country. Now to load these packages, run

And to detach these packages, run

### R: Importing Data

There are a number of ways to import data into R, and several formats are supported:

- from Excel to R;
- from SPSS to R; and,
- from Stata to R, and more here.

In this post, we are going to talk about importing the common data formats that we often encounter, such as Excel and text data. Most data are saved in MS Excel, and the best way to import such a file is to save it in CSV format. Below is the procedure:

1. Open your Excel data;
2. go to File > Save As or press Ctrl+Shift+S;
3. name the file anything you want, say "Data". Then before clicking Save, make sure to change the File Format to Comma Delimited Text, and preferably set the directory to the My Documents folder (on Windows).

When saved, this file will have the name "Data.csv". Now open R, and run the following

The argument header = TRUE tells R that the first row of the data contains the labels of the columns. If it is set to FALSE, the first row is not treated as labels but as data points.
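What the header argument changes can be illustrated with a short Python sketch using the standard csv module. The file contents below are hypothetical and read from a string so the example is self-contained; the helper `read_table` is mine and only mimics the idea of header = TRUE/FALSE, not R's read.csv itself.

```python
import csv
import io

# Hypothetical contents of Data.csv, made up for illustration.
raw = "height,weight\n151,48\n160,55\n"


def read_table(text, header=True):
    """Return (labels, rows); labels is None when header is False."""
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        return rows[0], rows[1:]   # first row holds the column labels
    return None, rows              # first row is treated as data


labels, records = read_table(raw, header=True)
print(labels, records)
print(read_table(raw, header=False))
```

With header set to False, the label row shows up as the first record, which is exactly the mistake the argument is there to prevent.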

### R: How to Encode your Data?

Every experiment starts with data, so the question is "how do you enter your data into R?". Well, there are many ways to do that, but for this post, we will only consider the two functions below:

- the concatenate function, c; and,
- the data.frame function.

The concatenate function, c, is used for combining data points into a single R object known as a vector. The usage of this function is simply

Where ... are the objects to be concatenated. Run ?c for a description of the second argument. Let's try an example,

What happened here is that we defined a new object, vec1, in the workspace. We can then start working with its entries, say using summary,

For dispersion, try this,

What about the data.frame function? While the first function combines data points into a single vector, data.frame, as the name suggests, constructs a frame of data points. Here is an example,

What we did here is define two R objects, weights and volunteers, and then combine the two into a table li…

There are two ways to install R in Ubuntu. One is through the terminal, and the other is through the Ubuntu Software Center.

Through the Terminal

1. Press Ctrl+Alt+T to open the Terminal;
2. then execute sudo apt-get update; after that,
3. run sudo apt-get install r-base.

To run R, execute R in the Terminal (see the picture below).

Through Ubuntu Software Center

1. Open Ubuntu Software Center;
2. search for r-base;
3. click Install;
4. then run R by executing R in the Terminal.

Working in the Terminal would be inconvenient, so I suggest downloading a user-friendly interface. For Ubuntu, I recommend the RStudio IDE or RKWard KDE.