Friday, 12 September 2014

R: k-Means Clustering on Imaging

Enough with the theory we recently published, let's take a break and have fun on the application of Statistics used in Data Mining and Machine Learning, the k-Means Clustering.
k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. (Wikipedia, Ref 1.)
We will apply this method to an image, wherein we group the pixels into k different clusters. Below is the image that we are going to use,
Colorful Bird From Wall321
We will utilize the following packages for input and output:
  1. jpeg - Read and write JPEG images; and,
  2. ggplot2 - An implementation of the Grammar of Graphics.

Download and Read the Image

Let's get started by downloading the image to our workspace, and tell R that our data is a JPEG file.

Cleaning the Data

Extract the necessary information from the image and organize this for our computation:

The image is represented by large array of pixels with dimension rows by columns by channels -- red, green, and blue or RGB.


Plot the original image using the following codes:


Apply k-Means clustering on the image:

Plot the clustered colours:

Possible clusters of pixels on different k-Means:

Originalk = 6
Table 1: Different k-Means Clustering.
k = 5k = 4
k = 3k = 2

I suggest you try it!


  1. K-means clustering. Wikipedia. Retrieved September 11, 2014.

No comments:

Post a Comment