
R - Data Structures



Understanding data structures is essential in R. They are the objects that you will create and manipulate on a day-to-day basis.

R has several basic data structures, which are listed below:

  • Vectors
  • Lists
  • Matrices
  • Arrays
  • Factors
  • Data Frames

R's base data structures can be categorized by their dimensionality (1-D, 2-D, or n-D) and by whether they are homogeneous (all elements of the same type) or heterogeneous (elements of different types).

Dimension   Homogeneous     Heterogeneous
1-D         Atomic vector   List
2-D         Matrix          Data frame
n-D         Array
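
The classification in the table can be checked directly in R. The following is a minimal sketch; the object names (v, l, m, a, df) and their contents are chosen here purely for illustration.

```r
# illustrative objects, one per cell of the table
v  <- c(1.5, 2.5, 3.5)                      # 1-D, homogeneous
l  <- list(1.5, "a", TRUE)                  # 1-D, heterogeneous
m  <- matrix(1:6, nrow = 2)                 # 2-D, homogeneous
a  <- array(1:24, dim = c(2, 3, 4))         # n-D, homogeneous
df <- data.frame(x = 1:2, y = c("a", "b"))  # 2-D, heterogeneous

# is.atomic() is TRUE for the homogeneous structures
print(is.atomic(v))
print(is.atomic(m))
print(is.atomic(a))

# lists and data frames are both built on the list type
print(is.list(l))
print(is.list(df))

# dim() is NULL for the 1-D vector and gives the extents otherwise
print(dim(v))
print(dim(m))
print(dim(a))
print(dim(df))
```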

Vectors

Vectors are the most common and basic data structure in R. They are one-dimensional, homogeneous data structures. There are six types of atomic vectors: logical, integer, double, complex, character, and raw.

Example:

In the example below, a vector is created using the c() function.

#creating a vector.
vec <- c("Red", "Blue", "Green")

#printing the vector
print(vec)

#printing the class of the vector
print(class(vec))

The output of the above code will be:

[1] "Red"   "Blue"  "Green"
[1] "character"
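
To make the atomic types concrete, the sketch below builds one small vector of each type and inspects it with typeof(). The last lines show what homogeneity means in practice: when values of different types are passed to c(), R coerces them to a single common type. The variable name mixed is illustrative.

```r
# one vector of each atomic type
print(typeof(c(TRUE, FALSE)))      # logical
print(typeof(c(1L, 2L)))           # integer
print(typeof(c(1.5, 2)))           # double
print(typeof(c(1+2i, 3i)))         # complex
print(typeof(c("a", "b")))         # character
print(typeof(as.raw(c(1, 255))))   # raw

# mixing types in c() coerces everything to one common type
mixed <- c(1, "a", TRUE)
print(mixed)
print(typeof(mixed))
```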

Lists

Lists are heterogeneous data structures whose elements can be of different types. The elements of a list can be numbers, character strings, vectors, matrices, arrays, functions, and even other lists.

Example:

In the example below, a list is created and printed.

#creating vectors using the c() function
numbers <- c(10, 20, 30)
colors <- c("Red", "Blue", "Green")

#creating an atomic vector of length one
NoOfColors <- 3

#combining all the created objects
#into a list using the list() function
MyList <- list(numbers, colors, NoOfColors, 1:5)

#printing the list
print(MyList)

The output of the above code will be:

[[1]]
[1] 10 20 30

[[2]]
[1] "Red"   "Blue"  "Green"

[[3]]
[1] 3

[[4]]
[1] 1 2 3 4 5
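
Once a list exists, its elements can be pulled out again. The sketch below is a small illustration of the usual access operators; the list my_list and its element names are made up for this example.

```r
# a list mixing a numeric vector, a character vector and a single number
# (the element names are illustrative)
my_list <- list(sizes = c(10, 20, 30),
                colors = c("Red", "Blue", "Green"),
                count = 3)

# [[ ]] and $ extract the element itself
print(my_list[[2]])
print(my_list$sizes)

# [ ] returns a shorter list, not the element
sub <- my_list[1]
print(class(sub))

# each element keeps its own class
print(sapply(my_list, class))
```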

Matrices

Matrices are two-dimensional, homogeneous data structures. A matrix is not a separate type of object but simply an atomic vector with a dimension attribute that gives the number of rows and columns. As with atomic vectors, all elements of a matrix must be of the same data type.

A matrix can be created by passing a vector to the matrix() function.

Example:

In the example below, a matrix is created and printed.

#creating a matrix
mat <- matrix( c(10, 20, 30, 40, 50, 60), 
              nrow = 2, ncol = 3, 
              byrow = TRUE)

#printing the matrix
print(mat)

The output of the above code will be:

     [,1] [,2] [,3]
[1,]   10   20   30
[2,]   40   50   60
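
The statement that a matrix is just an atomic vector with dimensions can be tried out directly. In the sketch below, assigning to dim() turns a plain vector into a matrix; the names v and m are illustrative. It also shows that values are filled column by column unless byrow = TRUE is given.

```r
# start from a plain atomic vector
v <- c(10, 20, 30, 40, 50, 60)

# assigning a dim attribute turns it into a 2 x 3 matrix
dim(v) <- c(2, 3)
print(is.matrix(v))

# values are filled column by column by default
print(v)

# matrix() with byrow = TRUE fills row by row instead
m <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2, ncol = 3, byrow = TRUE)

# the element in row 2, column 1 differs between the two fill orders
print(v[2, 1])
print(m[2, 1])
```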

Arrays

Arrays are n-dimensional, homogeneous data structures. While matrices are confined to two dimensions, arrays can have any number of dimensions. For example, an array with dimensions (2, 3, 3) contains 3 matrices, each with 2 rows and 3 columns. The array() function takes a dim argument that specifies the size of each dimension.

Example:

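In the example below, an array is created using a vector. Because the vector has fewer elements than the array, its values are recycled until the array is filled.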

#creating an array
arr <- array(c(10, 20, 30, 40), dim = c(3,3,2))

#printing the array
print(arr)

The output of the above code will be:

, , 1

     [,1] [,2] [,3]
[1,]   10   40   30
[2,]   20   10   40
[3,]   30   20   10

, , 2

     [,1] [,2] [,3]
[1,]   20   10   40
[2,]   30   20   10
[3,]   40   30   20
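
Elements of an array are addressed with one index per dimension. The following sketch uses a hypothetical array arr3 with dimensions (2, 3, 4) to show single-element access and the extraction of a whole layer; the final lines repeat the recycled array from the example above under the illustrative name short.

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers
arr3 <- array(1:24, dim = c(2, 3, 4))
print(dim(arr3))

# one index per dimension selects a single element
print(arr3[2, 3, 1])

# leaving the first two indices empty selects a whole layer,
# which is an ordinary 2 x 3 matrix
layer2 <- arr3[, , 2]
print(layer2)
print(is.matrix(layer2))

# when the data vector is shorter than the array, it is recycled
short <- array(c(10, 20, 30, 40), dim = c(3, 3, 2))
print(short[1, 2, 1])
```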

Factors

Factors are data objects used to represent categorical data. A factor stores the input vector together with its distinct values, which are called levels. The levels are always stored as character strings, regardless of whether the input vector is numeric, character, or logical. Factors are useful in statistical modeling.

Factors are created using the factor() function. The nlevels() function returns the number of levels.

Example:

In the example below, a factor is created using a vector.

#creating a vector
gender <- c("Male", "Female", "Female", "Male", "Male")

#creating a factor object
fac <- factor(gender)

#printing the factor and its number of levels
print(fac)
print(nlevels(fac))

The output of the above code will be:

[1] Male   Female Female Male   Male  
Levels: Female Male
[1] 2
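
How a factor stores its data can be inspected with a few base functions. The sketch below rebuilds the factor from the example and looks at its levels and its internal integer codes; num_fac is an illustrative name for a factor built from numeric input.

```r
gender <- c("Male", "Female", "Female", "Male", "Male")
fac <- factor(gender)

# levels are the distinct values, sorted, stored as character strings
print(levels(fac))

# internally each element is an integer code pointing into the levels
print(as.integer(fac))

# table() counts the observations per level
print(table(fac))

# numeric input still yields character levels
num_fac <- factor(c(3, 1, 3, 2))
print(levels(num_fac))
print(class(levels(num_fac)))
```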

Data Frames

Data frames are used to store tabular data. They are two-dimensional, heterogeneous data structures. Unlike a matrix, each column in a data frame can contain a different type of data. Internally, a data frame is a list of vectors of equal length, one per column.

Data frames are created using the data.frame() function.

Example:

In the example below, a data frame is created which contains three columns.

#creating a data frame
Info <-  data.frame(
 Name = c("John", "Marry", "Kim", "Ramesh"), 
 City = c("London", "New York", "Paris", "Mumbai"),
 Age = c(28, 30, 25, 31)
)

print(Info)

The output of the above code will be:

    Name     City Age
1   John   London  28
2  Marry New York  30
3    Kim    Paris  25
4 Ramesh   Mumbai  31
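
Because a data frame is a list of equal-length columns, the usual list and vector tools apply to it. The sketch below rebuilds the data frame from the example and inspects it. Passing stringsAsFactors = FALSE is an addition made here so that the character columns stay character in R versions before 4.0 as well; the name older is illustrative.

```r
Info <- data.frame(
  Name = c("John", "Marry", "Kim", "Ramesh"),
  City = c("London", "New York", "Paris", "Mumbai"),
  Age = c(28, 30, 25, 31),
  stringsAsFactors = FALSE  # keep text columns as character in all R versions
)

# a data frame is a list whose elements are the columns
print(is.list(Info))

# str() shows the type of each column
str(Info)

# $ extracts one column as a vector
print(mean(Info$Age))

# rows can be filtered with a logical condition
older <- Info[Info$Age > 27, ]
print(older$Name)
```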
