Posts

Showing posts from 2014

R: Principal Component Analysis on Imaging

Ever wonder what mathematics lies behind face recognition on gadgets like digital cameras and smartphones? For the most part, it has to do with statistics. One statistical tool capable of this is Principal Component Analysis (PCA). In this post, however, we will not do face recognition (sorry to disappoint you); we reserve that for a future post while I'm still doing research on it. Instead, we go through the basic concept of PCA and use it for data reduction on the spectral bands of an image using R. Let's view it mathematically. Consider a line $L$ in parametric form, described as the set of all vectors $k\cdot\mathbf{u}+\mathbf{v}$ parameterized by $k\in \mathbb{R}$, where $\mathbf{v}$ is a vector orthogonal to a normalized vector $\mathbf{u}$. Below is the graphical equivalent of the statement:

ALUES: Agricultural Land Use Evaluation System, R package

Authors: Arnold R. Salvacion, arsalvacion@gmail.com, Data Analysis and Visualization using R (blog); Al-Ahmadgaid B. Asaad (maintainer), alstated@gmail.com. Agricultural Land Use Evaluation System (ALUES) is an R package that evaluates land suitability for the production of different crops. The package is based on the Food and Agriculture Organization (FAO) and the International Rice Research Institute (IRRI) methodology for land evaluation. Development of ALUES was inspired by a similar tool for land evaluation, the Land Use Suitability Evaluation Tool (LUSET). The package uses a fuzzy logic approach to evaluate the land suitability of a particular area based on inputs such as rainfall, temperature, topography, and soil properties. The membership functions used for fuzzy modeling are the following: Triangular, Trapezoidal, and Gaussian. The methods for computing the overall suitability of

Probability Theory Problems

Let's have fun with probability theory; here is my first problem set in the subject. Problems: It was noted that statisticians who follow the de Finetti school do not accept the Axiom of Countable Additivity, instead adhering to the Axiom of Finite Additivity. Show that the Axiom of Countable Additivity implies Finite Additivity. Although, by itself, the Axiom of Finite Additivity does not imply Countable Additivity, suppose we supplement it with the following. Let $A_1\supset A_2\supset\cdots\supset A_n\supset \cdots$ be an infinite sequence of nested sets whose limit is the empty set, which we denote by $A_n\downarrow\emptyset$. Consider the following: Axiom of Continuity: If $A_n\downarrow\emptyset$, then $P(A_n)\rightarrow 0$. Prove that the Axiom of Continuity and the Axiom of Finite Additivity imply Countable Additivity. Prove each of the following statements. (Assume that any conditioning event has positive probability.) If $P(B)=1$, then $P(A|B)=P(A)$ f

R: k-Means Clustering on Imaging

Enough with the theory we recently published; let's take a break and have fun with an application of Statistics used in Data Mining and Machine Learning: k-means clustering. k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. (Wikipedia, Ref 1.) We will apply this method to an image, grouping its pixels into k different clusters. Below is the image that we are going to use (Colorful Bird, from Wall321). We will use the following packages for input and output: jpeg - Read and write JPEG images; and ggplot2 - An implementation of the Grammar of Graphics.
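To make the idea concrete, here is a minimal sketch of k-means on pixels, written in plain Python rather than the R packages named above. It is not the code from the post: the function name `kmeans`, the toy pixel list, and the fixed iteration count are all assumptions made for illustration. Each pixel is treated as an (R, G, B) point, assigned to the nearest center, and each center is then moved to the mean of its pixels.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(pixels, k, iterations=20, seed=0):
    """Cluster RGB pixels with Lloyd's algorithm; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(pixels, k)  # initial centers drawn from the data
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins the cluster with the nearest center.
        for i, p in enumerate(pixels):
            labels[i] = min(range(k), key=lambda c: squared_distance(p, centers[c]))
        # Update step: each center moves to the mean of its assigned pixels.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centers, labels


# Toy "image": three reddish and three bluish pixels (made-up values in [0, 1]).
pixels = [
    (0.90, 0.10, 0.10), (0.80, 0.20, 0.10), (0.10, 0.10, 0.90),
    (0.20, 0.10, 0.80), (0.85, 0.15, 0.05), (0.15, 0.20, 0.95),
]
centers, labels = kmeans(pixels, k=2)

# Replacing every pixel by its cluster center gives the k-color version of the image.
quantized = [centers[lab] for lab in labels]
```

For a real image, the pixel list would come from reading the file and flattening its rows, and `quantized` would be reshaped back to the image dimensions before plotting.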

Lebesgue Measure and Outer Measure Problems

More proofs, still on Real Analysis. This is my solution, and if you find any errors, do let me know. Problems. Lebesgue Measure: Let $\mu$ be a set function defined for all sets in a $\sigma$-algebra $\mathscr{F}$ with values in $[0,\infty]$. Assume $\mu$ is countably additive over countable disjoint collections of sets in $\mathscr{F}$. Prove that if $A$ and $B$ are two sets in $\mathscr{F}$, with $A\subseteq B$, then $\mu(A)\leq \mu(B)$. This property is called monotonicity. Prove that if there is a set $A$ in the collection $\mathscr{F}$ for which $\mu(A)<\infty$, then $\mu(\emptyset)=0$. Let $\{E_{k}\}_{k=1}^{\infty}$ be a countable collection of sets in $\mathscr{F}$. Prove that $\mu\left(\displaystyle\bigcup_{k=1}^{\infty}E_{k}\right)\leq \displaystyle\sum_{k=1}^{\infty}\mu(E_k)$. Lebesgue Outer Measure: By using a property of outer measure, prove that the interval $[0,1]$ is not countable. Let $A$ be the set of irrational numbers in the interval $[0,1]$. Prove that $

Translation Invariant of Lebesgue Outer Measure

Another proof problem, this time on Real Analysis. Problem: Prove that the Lebesgue outer measure is translation invariant. (Use the property that the length $l$ of an interval is translation invariant.) Solution. Proof. The outer measure is translation invariant if for $y\in \mathbb{R}$, \begin{equation}\nonumber \mu^{*}(A)=\mu^{*}(A+y) \end{equation} Hence, we need to show that Case 1: $\mu^{*}(A)\leq \mu^{*}(A+y)$; and Case 2: $\mu^{*}(A+y)\leq \mu^{*}(A)$. Case 1: Consider a countable collection of intervals $\{I_n\}_{n=1}^{\infty}$, and let \begin{equation}\nonumber W = \left\{\displaystyle\sum_{n=1}^{\infty}l(I_n)\mid A\subseteq\displaystyle\bigcup_{n=1}^{\infty}I_n\right\} \end{equation} Then the outer measure of $A$ is \begin{equation}\nonumber \mu^{*}(A)=\inf W. \end{equation}
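The excerpt stops before the inequalities are established. As a hedged sketch of how the argument can be finished (one standard route, not necessarily the one taken in the full post), note that translating a cover of $A$ by $y$ gives a cover of $A+y$ with the same total length:

```latex
\begin{align*}
A \subseteq \bigcup_{n=1}^{\infty} I_n
  &\implies A + y \subseteq \bigcup_{n=1}^{\infty} (I_n + y), \\
\mu^{*}(A+y)
  &\le \sum_{n=1}^{\infty} l(I_n + y)
   = \sum_{n=1}^{\infty} l(I_n).
\end{align*}
% Taking the infimum over all covers of A gives one inequality:
\begin{equation*}
\mu^{*}(A+y) \le \inf W = \mu^{*}(A).
\end{equation*}
% Applying the same bound to the set A + y and the shift -y gives the other:
\begin{equation*}
\mu^{*}(A) = \mu^{*}\bigl((A+y) + (-y)\bigr) \le \mu^{*}(A+y).
\end{equation*}
```

Together the two inequalities give $\mu^{*}(A)=\mu^{*}(A+y)$.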

R: Image Analysis using EBImage

Currently, I am taking Statistics for Image Analysis in my master's program, and have been exploring this topic in R. One package with capabilities in this field is EBImage from Bioconductor, which will be showcased in this post. Installation: For those using Ubuntu, you are likely to encounter this error: It has something to do with the tiff.h C header file, but it's not that serious, since mytechscribblings has an effective solution for this; do check that out. Importing Data: To import a raw image, consider the following code:

Monotonic Sequential Continuity

This problem is a continuation of my previous post on Monotonic Sequence. Problem: Prove the following: If $\{A_k\}$ is monotone, then \begin{equation} \mathrm{P}\left(\displaystyle\lim_{n\to\infty} A_n\right)=\displaystyle\lim_{n\to \infty}\mathrm{P}(A_n). \end{equation} Solution. Proof. If $\{A_k\}$ is monotone, then \begin{equation}\nonumber \mathrm{P}\left(\lim_{n\to \infty} A_n\right) = \begin{cases} \displaystyle\mathrm{P}\left(\bigcup_{k=1}^\infty A_k\right)&\text{if}\;\{A_k\}\;\text{is expanding}\\ \displaystyle\mathrm{P}\left(\bigcap_{k=1}^\infty A_k\right)&\text{if}\;\{A_k\}\;\text{is contracting} \end{cases}. \end{equation} So if $\{A_k\}$ is expanding, then we can write $\displaystyle\bigcup_{k=1}^\infty A_k$ as a disjoint union, \begin{eqnarray} \displaystyle\bigcup_{k=1}^\infty A_k &=& A_1\cup (A_2\cap A_1^c)\cup (A_3\cap A_2^c)\cup \cdots\nonumber\\ &=& A_1\cup (A_2\backslash A_1)\cup (A_3\backslash A_2)\cup \cdots\nonumber \end{eqnarray}
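The excerpt ends at the disjoint decomposition. A hedged sketch of the step that usually follows in the expanding case (the full post may word it differently) applies countable additivity to the disjoint pieces and recognizes the partial sums as $\mathrm{P}(A_n)$:

```latex
\begin{align*}
\mathrm{P}\left(\bigcup_{k=1}^{\infty} A_k\right)
  &= \mathrm{P}(A_1) + \sum_{k=2}^{\infty} \mathrm{P}(A_k \backslash A_{k-1}) \\
  &= \lim_{n\to\infty}\left[\mathrm{P}(A_1)
     + \sum_{k=2}^{n} \mathrm{P}(A_k \backslash A_{k-1})\right] \\
  &= \lim_{n\to\infty} \mathrm{P}(A_n),
\end{align*}
% The last equality holds because, for an expanding sequence,
% A_n = A_1 \cup (A_2 \backslash A_1) \cup \cdots \cup (A_n \backslash A_{n-1})
% is itself a finite disjoint union.
```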

Monotonic Sequence

Analysis with Programming has recently been accepted as a contributing blog on Mathblogging.org, a blogosphere aiming to be the best place to discover mathematical writing on the web. As my first post as a member of that site, I will work through a proof in probability theory. This problem, by the way, is part of my first homework in my master's program. This is my solution, and if you find errors, do let me know. Problem: If $\{A_k\}$ is either expanding or contracting, we say that it is monotone, and for a monotone sequence $\{A_k\}$, $\displaystyle\lim_{n\to \infty} A_n$ is defined as follows: \begin{equation}\nonumber \lim_{n\to \infty} A_n = \begin{cases} \displaystyle\bigcup_{k=1}^\infty A_k&\text{if}\;\{A_k\}\;\text{is expanding}\\[0.3cm] \displaystyle\bigcap_{k=1}^\infty A_k&\text{if}\;\{A_k\}\;\text{is contracting} \end{cases}. \end{equation} Prove the above equation. Solution. Proof. If $\{A_k\}$ is either expanding or contracting, then for an infinite seque

LaTeX: Using gnuplot for Plotting Functions

$\mathrm{\LaTeX}$ has the capability to draw beautiful graphics. This feature is made possible by the TikZ package. Here is the plot of $f(x) = x$. In $\mathrm{\LaTeX}$, everything has to be coded, from the axes, to the labels, to the points on the $xy$-plane; that explains why it takes four lines of code for a single, very simple plot.

R and Python Meetups, Philippines

There will be upcoming meetups for R User Group Philippines and the Python Philippines (PythonPH) Community. Below are the details:
R Meetup
topic: R for SAS users, and planning of RUG activities
venue: 9/F Sun Life Centre, 5th Avenue corner Rizal Drive, Bonifacio Global City, 1634, Taguig
date: Thursday, June 19, 2014, 7:00 pm
outline: Introducing R to SAS users; common SAS functions used at PPD - c/o Mark Javellosa; group discussion on equivalent packages in R; and sharing of experiences of actual SAS converts.
Question? Ask here.

R: Text Mining on Twitter #PrayForMH370 Malaysia Airlines

Warning: Twitter has redesigned the interface of its developers' page, so the screenshots below are now outdated. This has nothing to do with the code, however, so you can still use it. It's been two weeks of search and rescue operations for Malaysia Airlines Flight MH370 since it vanished from radar on March 8, 2014. Wherever they are, we hope and pray for them. Photo from VENUS - Wall of Hope & Prayers for MH370. In this post, we are going to do text mining on tweets containing #PrayForMH370 from March 8 to March 20, 2014, using the Twitter API. First, we need to authenticate with the Twitter API to obtain the data. In the tutorial that follows, the idea and code for Twitter authentication are based on Julianhi's amazing blog, and I am going to replicate his code here to save a copy of it.

Python: Numerical Descriptions of the Data

We are going to explore the basics of Statistics using Python. We'll go through the following: Import the data; Apply summary statistics; Compute other measures of variability (variance and coefficient of variation); Compute other measures of position (percentile and decile); Estimate the skewness and kurtosis; and, as a bonus, Visualize the histogram. Data: volume of palay (rice) production from five provinces (Abra, Apayao, Benguet, Ifugao, and Kalinga) in the Cordillera region of Luzon, Philippines. To import this, execute the following: To check the first and last five entries of the data, use the head() and tail() methods, respectively; and to apply the summary statistics, use the describe() method,
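The measures listed above can be sketched using only Python's standard library. This is an illustration under stated assumptions, not the post's code: the `production` values are made up (they are not the palay dataset), and the skewness and kurtosis shown are the simple moment-based versions, which may differ slightly from the bias-corrected estimates other libraries report.

```python
import statistics

# Hypothetical production volumes for one province; NOT the post's data.
production = [12, 15, 11, 40, 13, 18, 10, 55, 14, 17]

n = len(production)
mean = statistics.mean(production)
median = statistics.median(production)

# Measures of variability.
variance = statistics.variance(production)  # sample variance (n - 1 denominator)
stdev = statistics.stdev(production)        # sample standard deviation
cv = stdev / mean                           # coefficient of variation

# Measures of position: cut points that split the data into equal-probability groups.
deciles = statistics.quantiles(production, n=10)       # 9 cut points
percentiles = statistics.quantiles(production, n=100)  # 99 cut points

# Moment-based skewness and excess kurtosis (population moments, no bias correction).
m2 = sum((x - mean) ** 2 for x in production) / n
m3 = sum((x - mean) ** 3 for x in production) / n
m4 = sum((x - mean) ** 4 for x in production) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print(f"mean={mean:.2f} median={median:.2f} variance={variance:.2f} cv={cv:.3f}")
print(f"skewness={skewness:.3f} excess kurtosis={excess_kurtosis:.3f}")
```

`statistics.quantiles` requires Python 3.8 or later. With a tabular dataset of several provinces, the same calculations would be repeated per column.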

RUG-Philippines Meetup: Markov Switching Models in R

To the R users based in the Philippines, there will be an upcoming meetup. Here are the details:
topics: Markov Switching Models in R, by Ohly Santos; How to use the optim function in R, by Joe Brillantes
venue: 9/F Sun Life Centre, 5th Avenue corner Rizal Drive, Bonifacio Global City, 1634, Taguig
date: Friday, March 28, 2014, 7:00 PM to 9:00 PM (Philippine Standard Time)
deadline for RSVP: Sunday, March 23, 2014, 11:59 PM (Philippine Standard Time)
important: Please fill out the form before the deadline for RSVP. Your admittance to the venue will depend on whether or not you're on the list that will be submitted on March 24, 2014. The event is still open to anyone interested, as long as you're on the list.
Final note: do join the R Users Group - Philippines on Meetup.

Mathematica: Introducing the Wolfram Language

Finally, here it is: check out the video below as Stephen Wolfram showcases the Wolfram Language. In my previous post, I said that I used Wolfram Mathematica for about a year before I embraced R. And frankly, I've been in love with Mathematica; it never stops surprising me every time I use it. You can have beautiful, interactive 3D plots, or any type of plot, in just a few lines of code; you can symbolically derive maximum likelihood estimates for a distribution; and much more fun stuff. In fact, I have screencasts about Mathematica on YouTube:

LaTeX: How to install TeX Live - qtree package in Ubuntu 12.10

There is a question on TeX - StackExchange that has no direct solution for installing the qtree TeX Live package in Ubuntu. I want to answer that in this post, and then just drop a link to this article in the comment section of that question. So here is what I did: Open the Ubuntu Dash, and search for Ubuntu Software Center; In the Software Center, search for qtree; Select the first entry (Humanities Packages), and click on More Info to confirm that qtree is indeed included in this item; Finally, click . There you go, you can try it now. Here is a simple Statistics problem from the Elementary Statistics book of the MSU-IIT Department of Mathematics and Statistics that uses a tree diagram,

R: Fun with surf3D function

There is one package that I've been longing for: a package that gives me the power to manipulate and do any 3D stuff in R. I tried persp and wireframe, but I find them difficult to use, especially on complicated mathematical functions, like doing parametric plots. I was frustrated about that, since I envy the 3D graphics of Wolfram Mathematica a lot, which I exploited for about a year before I embraced R. However, that has come to an end after Joseph Rickert introduced the plot3D package (authored by Karline Soetaert) in his post. For the moment, we will be playing with the surf3D function. Here is the first one, the Mollusc Shell surface plot: With parametric equations: $$  \begin{eqnarray} x(u,v)&=&\left(1.16^v\right)(1 + \cos(u))\cos(v);\nonumber\\ y(u,v)&=&\left(-1.16^v\right)(1 + \cos(u))\sin(v);\nonumber\\ z(u,v)&=&\left(-2\times 1.16^v\right)(1 + \sin(u));\nonumber \end{eqnarray} $$where $u\in[0, 2\pi],\,v\in[-15, 6]$. B

R: Animating 2D and 3D plots

One great package in R is animation, by Yihui Xie. And just for fun, we are going to explore it. Our aim is to create simple animated 2D and 3D plots. Here is the first one, 2D of course. The code: It's a piece of cake, right? The function we used for wrapping the plot is saveGIF; this function basically collects all the plots made and uses them as frames of the GIF file. In other words, the above plot was generated 100 times in a loop through the curve function, and in every iteration we increased the limits of the x-axis; playing the generated plots in sequence therefore animates the x-axis moving toward positive values. What about 3-dimensional? Speechless,

Python and R: Is Python really faster than R?

A friend of mine asked me to code the following in R: Generate samples of size 10 from a Normal distribution with $\mu$ = 3 and $\sigma^2$ = 5; Compute the $\bar{x}$ and $\bar{x}\mp z_{\alpha/2}\displaystyle\frac{\sigma}{\sqrt{n}}$ using the 95% confidence level; Repeat the process 100 times; then Compute the percentage of the confidence intervals containing the true mean. So here is what I got. Staying with the default values, one would obtain: The output is a list of Matrix and Decision, wherein the first column of the first list (Matrix) is the computed $\bar{x}$; the second and third columns are the lower and upper limits of the confidence interval, respectively; and the fourth column is an array of ones (if the true mean is contained in the interval) and zeros (if it is not). Now, how fast would it be if I were to code this in Python?
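The four steps described above can be sketched in Python with only the standard library. This is a minimal illustration, not the code from the post, and it says nothing about the timing comparison: the function name `ci_coverage`, its parameter names, and the seed are assumptions made here.

```python
import random
from statistics import NormalDist, mean


def ci_coverage(n=10, mu=3.0, sigma2=5.0, reps=100, conf=0.95, seed=None):
    """Simulate z-intervals for the mean (known variance) and report coverage.

    Returns (rows, coverage), where each row is
    (xbar, lower, upper, 1 if the interval contains mu else 0).
    """
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}, about 1.96 for 95%
    half_width = z * sigma / n ** 0.5
    rows = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        lower, upper = xbar - half_width, xbar + half_width
        rows.append((xbar, lower, upper, int(lower <= mu <= upper)))
    coverage = sum(row[3] for row in rows) / reps
    return rows, coverage


rows, coverage = ci_coverage(seed=1)
print(f"{coverage:.0%} of the intervals contain the true mean")
```

Because the variance is treated as known, every interval has the same width; only its center $\bar{x}$ changes from sample to sample. Over many repetitions the reported percentage should be close to 95%.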