
R - Dendrogram



A dendrogram (or tree diagram) is a network structure. It consists of a root node that gives rise to several nodes connected by edges, or branches. The last nodes of the hierarchy are called leaves. When a dendrogram shows the result of a hierarchical clustering, the height at which two branches join indicates the distance between the groups they connect.
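The root, node and leaf terminology maps directly onto R's dendrogram objects. The sketch below assumes a tiny made-up dataset of four values and a hypothetical helper, count_leaves(), which walks the tree recursively: a node is a leaf when is.leaf() returns TRUE, and otherwise its children are the elements of the node.

```r
#Tiny made-up dataset: four samples with one variable each
x <- matrix(c(0, 1, 5, 6), ncol = 1,
            dimnames = list(c("a", "b", "c", "d"), "value"))

#Build a dendrogram object from a hierarchical clustering
dend <- as.dendrogram(hclust(dist(x)))

#Hypothetical helper: count the leaves by walking down from the root
count_leaves <- function(node) {
  if (is.leaf(node)) return(1L)
  total <- 0L
  #An internal node is a list of its child nodes
  for (child in node) total <- total + count_leaves(child)
  total
}

count_leaves(dend)      # 4
attr(dend, "members")   # 4: each node also stores how many leaves it holds
labels(dend)            # leaf labels, in plotting order
```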

Basic Dendrogram

Hierarchical clustering is a common task in data science and can be performed with the hclust() function in R. The following examples guide you through the process: preparing the data, running the clustering, and building a chart to visualize the result.

Example:

The input dataset is a matrix in which each row is a sample and each column is a variable. Clustering is performed on a distance matrix that gives the distance between every pair of samples. This matrix can be computed with the dist() function, or derived from the correlations returned by the cor() function. The hclust() function then performs the hierarchical clustering.
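Strictly speaking, dist() returns a compact "dist" object that stores only the lower triangle; as.matrix() expands it into the full square, symmetric matrix. A correlation is a similarity rather than a distance, so it has to be converted first. The sketch below uses made-up random data and assumes 1 - correlation as the conversion, which is one common convention rather than the only option.

```r
set.seed(42)
#Made-up data: 5 samples (rows) x 4 variables (columns)
x <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("sample_", 1:5), paste0("var", 1:4)))

#Euclidean distances between rows, stored as a compact "dist" object
d <- dist(x)
m <- as.matrix(d)       # full 5 x 5 square matrix with zeros on the diagonal

#Correlation-based distance between samples:
#cor() correlates columns, so transpose to correlate the samples (rows)
d_cor <- as.dist(1 - cor(t(x)))

#Either distance object can be passed to hclust()
hc_euc <- hclust(d)
hc_cor <- hclust(d_cor)
```

Both trees have n - 1 = 4 merge steps; they usually differ because Euclidean distance compares magnitudes while correlation compares the shape of each sample's profile.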

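#Dataset: 20 samples (rows) x 10 variables (columns) of random integers
set.seed(1)  #make the random sample reproducible
data <- matrix(sample(seq(1, 2000), 200), ncol = 10)
rownames(data) <- paste0("sample_", seq(1, 20))
colnames(data) <- paste0("variable", seq(1, 10))

#Euclidean distance (the default method) between samples, using variables 4 to 8
d <- dist(data[, 4:8], diag = TRUE)

#Open a PNG graphics device; the plot is written to "Dendrogram.png"
png(file = "Dendrogram.png")

#Hierarchical clustering with hclust (complete linkage by default)
hc <- hclust(d)

#Plot the result as a dendrogram
plot(hc)

#Close the device to save the file
dev.off()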

The output of the above code will be:

[Figure: Dendrogram]
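To see how the heights in the plot relate to the distances, the sketch below clusters four made-up points on a line whose distances are easy to check by hand. With hclust()'s default complete linkage, two clusters are joined at the largest distance between any of their members.

```r
#Four made-up points on a line: two tight pairs that are far apart
x <- matrix(c(0, 1, 5, 6), ncol = 1,
            dimnames = list(c("p1", "p2", "p3", "p4"), NULL))

hc <- hclust(dist(x))   #method = "complete" is the default

hc$merge    #which observations (negative) or clusters (positive) join at each step
hc$height   #1 1 6: each pair merges at distance 1, then the pairs join at |6 - 0| = 6
```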

Zoom on a group

It is possible to zoom in on a specific part of the tree: convert the clustering result into a dendrogram object with as.dendrogram(), then select the branch of interest with the [[ ]] operator.
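For a binary tree, dhc[[1]] and dhc[[2]] are the two branches directly below the root, and the operator can be chained to go deeper. The sketch below assumes the same four made-up points used for illustration elsewhere; it only inspects the branches and does not plot them.

```r
#Made-up data: two tight pairs of points on a line
x <- matrix(c(0, 1, 5, 6), ncol = 1,
            dimnames = list(c("p1", "p2", "p3", "p4"), NULL))
dhc <- as.dendrogram(hclust(dist(x)))

#The root has two children, each of them a (sub)dendrogram
length(dhc)   # 2
left  <- dhc[[1]]
right <- dhc[[2]]

#The two branches split the leaves without overlap
attr(left, "members") + attr(right, "members")   # 4, same as attr(dhc, "members")
labels(right)   #leaf labels of the second group

#Chaining [[ ]] zooms further: first child of the second group
deeper <- dhc[[2]][[1]]
is.leaf(deeper)   # TRUE here, because each group holds only two points
```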

Example:

In the example below, the dendrogram is zoomed in on the second group.

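#Dataset: 20 samples (rows) x 10 variables (columns) of random integers
set.seed(1)  #make the random sample reproducible
data <- matrix(sample(seq(1, 2000), 200), ncol = 10)
rownames(data) <- paste0("sample_", seq(1, 20))
colnames(data) <- paste0("variable", seq(1, 10))

#Euclidean distance (the default method) between samples, using variables 4 to 8
d <- dist(data[, 4:8], diag = TRUE)

#Open a PNG graphics device; the plot is written to "Dendrogram.png"
png(file = "Dendrogram.png")

#Hierarchical clustering with hclust
hc <- hclust(d)

#Store the dendrogram in an object so that [[ ]] can select its branches
dhc <- as.dendrogram(hc)

#Set the plot margins (bottom, left, top, right) in lines of text
par(mar = c(4, 4, 2, 2))

#Plot only the second branch below the root
plot(dhc[[2]], main = "Zoomed on the second group")

#Close the device to save the file
dev.off()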

The output of the above code will be:

[Figure: Dendrogram, zoomed on the second group]
