
R - Mean, Median & Mode



R has a rich set of built-in functions for performing statistical operations. In this section, we discuss how to calculate the mean, median and mode in R.

Mean

The R mean() function is used to calculate the arithmetic mean of the elements of a given numeric vector. The syntax for using this function is given below:

Syntax

mean(x, trim = 0, na.rm = FALSE)

Parameters

x Required. Specify the numeric vector.
trim Optional. Specify a fraction (0 to 0.5) of observations to be trimmed from each end of x. It is used to drop observations from both ends of the sorted vector before the mean is computed. Default is 0.
na.rm Optional. Specify TRUE to remove NA or NaN values before the computation. Default is FALSE.

Example:

The example below shows the usage of the mean() function.

v <- c(10, 15, 20, 25, 30, 35)
cat("The vector contains:\n")
print(v)
cat("mean of all elements of the vector:", mean(v), "\n")

m <- matrix(c(10, 20, 30, 40, 50, 60), ncol=2)
cat("\nThe matrix contains:\n")
print(m)
cat("mean of all elements of the matrix:", mean(m))
cat("\nmean along first column of the matrix:", mean(m[,1]))

The output of the above code will be:

The vector contains:
[1] 10 15 20 25 30 35
mean of all elements of the vector: 22.5 

The matrix contains:
     [,1] [,2]
[1,]   10   40
[2,]   20   50
[3,]   30   60
mean of all elements of the matrix: 35
mean along first column of the matrix: 20
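As a quick cross-check, the arithmetic mean is simply the sum of the elements divided by their count. The short sketch below reuses the same vector and matrix, verifies this with sum() and length(), and uses base R's colMeans() to get the mean of every column at once instead of indexing one column at a time.

```r
v <- c(10, 15, 20, 25, 30, 35)
m <- matrix(c(10, 20, 30, 40, 50, 60), ncol = 2)

# The arithmetic mean is the sum divided by the number of elements
manual_mean <- sum(v) / length(v)
cat("sum(v) / length(v):", manual_mean, "\n")
cat("same as mean(v)?", isTRUE(all.equal(manual_mean, mean(v))), "\n")

# colMeans() returns the mean of each column of a matrix
cat("column means of the matrix:", colMeans(m), "\n")
```

Here colMeans(m) gives 20 and 50, the means of the first and second columns.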

Using trim parameter

The trim parameter takes a fraction between 0 and 0.5. It removes that fraction of observations from each end of the sorted vector before the mean is computed.

Example:

In the example below, the trim parameter is used to trim 25% of the observations from each end of the vector. The sorted vector contains -45, -25, -10, 5, 12, 23, 23, 40. With 8 observations, trim=0.25 removes 8 × 0.25 = 2 values from each end (-45, -25 and 23, 40), so the mean is calculated over -10, 5, 12, 23, which gives 30/4 = 7.5.

v <- c(-10, 12, 23, -25, 23, 5, 40, -45)
cat("The vector contains:\n")
print(v)
cat("mean of all elements of the vector:", mean(v), "\n")
cat("mean after trimming 25% from each side:", 
       mean(v, trim=0.25), "\n")

The output of the above code will be:

The vector contains:
[1] -10  12  23 -25  23   5  40 -45
mean of all elements of the vector: 2.875 
mean after trimming 25% from each side: 7.5 
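To see what trim does, the sketch below re-implements a trimmed mean by hand: sort the vector, drop floor(n * trim) values from each end, and average what remains. The function name trimmed_mean_manual is hypothetical and only illustrates the idea; it assumes there are no NA values and that trim is at least 0 and below 0.5. The last two lines show that at the upper limit, trim = 0.5, mean() gives the same result as the median.

```r
# Hypothetical helper: trimmed mean computed by hand
# (assumes no NA values and 0 <= trim < 0.5)
trimmed_mean_manual <- function(x, trim) {
  n <- length(x)
  k <- floor(n * trim)         # number of values dropped from each end
  s <- sort(x)
  mean(s[(k + 1):(n - k)])     # average of the values left in the middle
}

v <- c(-10, 12, 23, -25, 23, 5, 40, -45)
cat("manual trimmed mean:", trimmed_mean_manual(v, 0.25), "\n")
cat("mean(v, trim=0.25):", mean(v, trim = 0.25), "\n")

# At the upper limit trim = 0.5, mean() returns the median
cat("mean(v, trim=0.5):", mean(v, trim = 0.5), "\n")
cat("median(v):", median(v), "\n")
```

For this vector, both trimmed means are 7.5, and both trim=0.5 and median() give 8.5, the average of the two middle values 5 and 12.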

Using na.rm parameter

The na.rm parameter can be set to TRUE to remove NA or NaN values before the computation.

Example:

Consider the example below to see the usage of the na.rm parameter.

v1 <- c(10, 20, NA)
v2 <- c(10, 20, NaN)

cat("mean of all elements of v1:", mean(v1), "\n")
cat("mean after removing NA:", mean(v1, na.rm=TRUE), "\n")

cat("\nmean of all elements of v2:", mean(v2), "\n")
cat("mean after removing NaN:", mean(v2, na.rm=TRUE), "\n")

The output of the above code will be:

mean of all elements of v1: NA 
mean after removing NA: 15 

mean of all elements of v2: NaN 
mean after removing NaN: 15 
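Setting na.rm = TRUE has the same effect as filtering out the missing values yourself. Because is.na() returns TRUE for both NA and NaN, the sketch below drops them with logical indexing. It also counts how many values were removed, which is often worth checking before discarding data. The vector is a made-up example.

```r
v <- c(10, NA, 20, NaN, 30)

# is.na() flags both NA and NaN
is_missing <- is.na(v)
cat("missing values found:", sum(is_missing), "\n")

# Removing them manually gives the same result as na.rm = TRUE
cat("mean(v[!is.na(v)]):", mean(v[!is_missing]), "\n")
cat("mean(v, na.rm=TRUE):", mean(v, na.rm = TRUE), "\n")
```

Both approaches average only 10, 20 and 30, giving 20.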

Median

The R median() function is used to calculate the median of a given numeric vector. The syntax for using this function is given below:

Syntax

median(x, na.rm = FALSE)

Parameters

x Required. Specify the numeric vector.
na.rm Optional. Specify TRUE to remove NA or NaN values before the computation. Default is FALSE.

Example:

The example below shows the usage of the median() function.

v <- c(10, 15, 20, 25, 30, 35)
cat("The vector contains:\n")
print(v)
cat("median of the vector:", median(v), "\n")

m <- matrix(c(10, 20, 30, 40, 50, 60), ncol=2)
cat("\nThe matrix contains:\n")
print(m)
cat("median of the matrix:", median(m))
cat("\nmedian along first column of the matrix:", median(m[,1]))

The output of the above code will be:

The vector contains:
[1] 10 15 20 25 30 35
median of the vector: 22.5 

The matrix contains:
     [,1] [,2]
[1,]   10   40
[2,]   20   50
[3,]   30   60
median of the matrix: 35
median along first column of the matrix: 20
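The median is the middle value of the sorted data. When the number of elements is even, it is the average of the two middle values, which is why the six-element vector above has a median of (20 + 25) / 2 = 22.5. The hypothetical helper median_manual below spells out that rule, assuming a numeric vector without NA values, so it can be compared with median().

```r
# Hypothetical helper: median computed by hand (assumes no NA values)
median_manual <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                  # odd length: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2   # even length: average of the two middle values
  }
}

v_even <- c(10, 15, 20, 25, 30, 35)
v_odd  <- c(35, 10, 25)
cat("median_manual(v_even):", median_manual(v_even), "\n")
cat("median(v_even):", median(v_even), "\n")
cat("median_manual(v_odd):", median_manual(v_odd), "\n")
cat("median(v_odd):", median(v_odd), "\n")
```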

Using na.rm parameter

The na.rm parameter can be set to TRUE to remove NA or NaN values before the computation.

Example:

Consider the example below to see the usage of the na.rm parameter.

v1 <- c(10, 20, NA)
v2 <- c(10, 20, NaN)

cat("median of v1:", median(v1), "\n")
cat("median after removing NA:", median(v1, na.rm=TRUE), "\n")

cat("\nmedian of v2:", median(v2), "\n")
cat("median after removing NaN:", median(v2, na.rm=TRUE), "\n")

The output of the above code will be:

median of v1: NA 
median after removing NA: 15 

median of v2: NA 
median after removing NaN: 15 
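To compute the median of every column of a matrix, apply() can call median() on each column (MARGIN = 2). Extra arguments given to apply(), such as na.rm = TRUE here, are passed on to median(). The matrix below is a made-up example with one NA in its second column.

```r
# Example matrix with one missing value in the second column
m <- matrix(c(10, 20, 30, 40, NA, 60), ncol = 2)
print(m)

# MARGIN = 2 applies median() to each column; na.rm = TRUE is forwarded to median()
cat("column medians:", apply(m, 2, median), "\n")
cat("column medians, NA removed:", apply(m, 2, median, na.rm = TRUE), "\n")
```

Without na.rm, the second column's median is NA; with it, the median is computed over 40 and 60, giving 50.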

Mode

In statistics, the mode is the value that occurs most often in a dataset. R does not have a built-in function to calculate it (R's mode() function returns the storage mode of an object, not the statistical mode). However, we can define our own function to calculate the mode of a given dataset.

Example:

Consider the example below, where a Mode() function is defined to calculate the mode of its argument. It works with numeric vectors as well as character vectors.

#creating a function to calculate mode
Mode <- function(x) {
  uniqx <- unique(x)
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

#creating a vector with numbers
v1 <- c(10, 20, 30, 40, 30, 30, 50)
#creating a vector with characters
v2 <- c("this", "is", "a", "dog", "this", "this")

#calculating mode of vector v1
cat("mode of vector v1:", Mode(v1), "\n")
#calculating mode of vector v2
cat("mode of vector v2:", Mode(v2), "\n")

The output of the above code will be:

mode of vector v1: 30 
mode of vector v2: this 
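When several values share the highest count, Mode() returns only the one that appears first in the vector, because which.max() picks the first maximum. If every tied value is needed, a variant such as the hypothetical all_modes() below can compare each count against max() instead. The vector v3 is a made-up example with two tied values.

```r
# Mode() as defined above: returns the first value with the highest count
Mode <- function(x) {
  uniqx <- unique(x)
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

# Hypothetical variant: returns every value that has the highest count
all_modes <- function(x) {
  uniqx <- unique(x)
  counts <- tabulate(match(x, uniqx))   # occurrences of each unique value
  uniqx[counts == max(counts)]
}

v3 <- c(1, 2, 2, 3, 3)
cat("Mode(v3):", Mode(v3), "\n")
cat("all_modes(v3):", all_modes(v3), "\n")
```

Here 2 and 3 each occur twice, so Mode(v3) returns 2 while all_modes(v3) returns both 2 and 3.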
