Introduction
Data structures in R are tools for storing and organizing multiple values.
They organize stored data so that it can be used more effectively. Data structures differ in their number of dimensions and in whether the data they hold must be of a single type (homogeneous) or may be of mixed types (heterogeneous). The primary data structures are:
Vectors
Lists
Data frames
Matrices
Arrays
Factors
Data structures
1. Vectors
Discussed in a previous post
2. Lists
Lists are objects/containers that hold elements of the same or different types. They can contain strings, numbers, vectors, matrices, functions, or other lists. Lists are created with the list() function.
Examples
a. Three element list
list_1 <- list(10, 30, 50)
b. Single element list
list_2 <- list(c(10, 30, 50))
c. Three element list
list_3 <- list(1:3, c(50,40), 3:-5)
d. List with elements of different types
list_4 <- list(c("a", "b", "c"), 5:-1)
e. List which contains a list
list_5 <- list(c("a", "b", "c"), 5:-1, list_1)
f. Set names for the list elements
names(list_5)
NULL
names(list_5) <- c("character vector", "numeric vector", "list")
names(list_5)
[1] "character vector" "numeric vector" "list"
g. Access elements
list_5[[1]]
[1] "a" "b" "c"
list_5[["character vector"]]
[1] "a" "b" "c"
h. Length of list
length(list_1)
[1] 3
length(list_5)
[1] 3
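The access examples above use double brackets. As a minimal sketch of how that differs from single brackets and from `$`, the snippet below uses a hypothetical list (`demo_list` and its element names are made up for illustration and do not appear elsewhere in this post):

```r
# Hypothetical list used only for this illustration
demo_list <- list(letters = c("a", "b", "c"), numbers = 5:-1)

# Single brackets return a list that contains the selected element
sub_list <- demo_list[1]
class(sub_list)

# Double brackets return the element itself
element <- demo_list[[1]]
class(element)

# $ is shorthand for [[ ]] with a name
same_result <- identical(demo_list$numbers, demo_list[["numbers"]])
same_result

# Chain a second index to reach a value inside the element
second_number <- demo_list[["numbers"]][2]
second_number
```

In short, `[ ]` keeps the list wrapper, while `[[ ]]` and `$` pull out what is stored inside it.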
3. Data frames
A data frame is one of the most common data objects used to store tabular data in R. Tabular data has rows representing observations and columns representing variables. A data frame is a list of equal-length vectors, one per column. Different columns can hold different types of data, but within each column the elements must be of the same type. The most common data frame characteristics are listed below:
• Columns should have a name;
• Row names should be unique;
• Various data can be stored (such as numeric, factor, and character);
• The individual columns should contain the same number of data items.
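The characteristics above can be checked directly. The following is a small sketch with made-up data (the `people` data frame and its columns are hypothetical); `stringsAsFactors = FALSE` is passed explicitly so the result does not depend on the R version:

```r
# Hypothetical data, used only to illustrate the points above
people <- data.frame(
  name   = c("Ann", "Bob", "Cy"),
  score  = c(90.5, 82, 77),
  passed = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# A data frame is built on a list, one element per column
is.list(people)

# Each column has its own type
column_classes <- sapply(people, class)
column_classes

# Columns with incompatible lengths (3 and 2) are rejected
result <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "error"
)
result
```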
Creation of data frames
level <- c("Low", "Mid", "High")
language <- c("R", "RStudio", "Shiny")
age <- c(25, 36, 47)
df_1 <- data.frame(level, language, age)
Functions used to manipulate data frames
a. Number of rows
nrow(df_1)
[1] 3
b. Number of columns
ncol(df_1)
[1] 3
c. Dimensions
dim(df_1)
[1] 3 3
d. Class of data frame
class(df_1)
[1] "data.frame"
e. Column names
colnames(df_1)
[1] "level" "language" "age"
f. Row names
rownames(df_1)
[1] "1" "2" "3"
g. Top and bottom values
head(df_1, n=2)
  level language age
1   Low        R  25
2   Mid  RStudio  36
tail(df_1, n=2)
  level language age
2   Mid  RStudio  36
3  High    Shiny  47
h. Access columns
df_1$level
[1] "Low" "Mid" "High"
i. Access individual elements
df_1[3, 2]
[1] "Shiny"
df_1[2, 1:2]
  level language
2   Mid  RStudio
j. Access columns with index
df_1[, 3]
[1] 25 36 47
df_1[, c("language")]
[1] "R" "RStudio" "Shiny"
k. Access rows with index
df_1[2, ]
  level language age
2   Mid  RStudio  36
4. Matrices
A matrix is a rectangular two-dimensional (2D) homogeneous data structure made up of rows and columns. All of its elements must be of the same type, most often numeric, and they are arranged in a fixed number of rows and columns. Matrices are generally used for mathematical and statistical applications.
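To make "homogeneous" concrete, here is a short sketch (the object names are illustrative only) showing that mixing types coerces every element to a common type, and that values fill a matrix column by column unless `byrow = TRUE` is set:

```r
# An integer matrix
num_matrix <- matrix(1:4, nrow = 2)
typeof(num_matrix)

# Adding one string coerces every element to character
mixed_matrix <- matrix(c(1, 2, "three", 4), nrow = 2)
typeof(mixed_matrix)
mixed_matrix[1, 1]

# Values fill column by column by default; byrow = TRUE fills row by row
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row[1, ]
```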
a. Creation of matrices
m1 <- matrix(1:9, nrow = 3, ncol = 3)
m2 <- matrix(21:29, nrow = 3, ncol = 3)
m3 <- matrix(1:12, nrow = 2, ncol = 6)
b. Obtain the dimensions of the matrices
# m1
nrow(m1)
[1] 3
ncol(m1)
[1] 3
dim(m1)
[1] 3 3
# m3
nrow(m3)
[1] 2
ncol(m3)
[1] 6
dim(m3)
[1] 2 6
c. Arithmetic with matrices
m1 + m2
[,1] [,2] [,3]
[1,] 22 28 34
[2,] 24 30 36
[3,] 26 32 38
m1 - m2
[,1] [,2] [,3]
[1,] -20 -20 -20
[2,] -20 -20 -20
[3,] -20 -20 -20
m1 * m2
[,1] [,2] [,3]
[1,] 21 96 189
[2,] 44 125 224
[3,] 69 156 261
m1 / m2
[,1] [,2] [,3]
[1,] 0.04761905 0.1666667 0.2592593
[2,] 0.09090909 0.2000000 0.2857143
[3,] 0.13043478 0.2307692 0.3103448
m1 == m2
[,1] [,2] [,3]
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
d. Matrix multiplication
m5 <- matrix(1:10, nrow = 5)
m6 <- matrix(43:34, nrow = 5)
# The * operator multiplies element by element; it is not the matrix product
m5 * m6
[,1] [,2]
[1,] 43 228
[2,] 84 259
[3,] 123 288
[4,] 160 315
[5,] 195 340
# m5 %*% m6 will not work because the dimensions do not conform:
# m5 has 2 columns but m6 has 5 rows.
# The matrix m6 needs to be transposed first.
m5 %*% t(m6)
[,1] [,2] [,3] [,4] [,5]
[1,] 271 264 257 250 243
[2,] 352 343 334 325 316
[3,] 433 422 411 400 389
[4,] 514 501 488 475 462
[5,] 595 580 565 550 535
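The dimension rule for `%*%` is that the number of columns on the left must equal the number of rows on the right. The sketch below reuses the same values as `m5` and `m6` under the illustrative names `a` and `b`, and shows the failing case alongside the two products that do conform:

```r
a <- matrix(1:10, nrow = 5)    # 5 x 2
b <- matrix(43:34, nrow = 5)   # 5 x 2

# a %*% b fails: a has 2 columns but b has 5 rows
attempt <- tryCatch(a %*% b, error = function(e) "non-conformable")
attempt

# Transposing either side makes the inner dimensions match
outer_dims <- dim(a %*% t(b))   # (5 x 2) %*% (2 x 5)
inner_dims <- dim(t(a) %*% b)   # (2 x 5) %*% (5 x 2)
outer_dims
inner_dims

# crossprod(a, b) computes t(a) %*% b
same_product <- all.equal(crossprod(a, b), t(a) %*% b)
same_product
```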
e. Generate an identity matrix
diag(5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 0 0 0
[2,] 0 1 0 0 0
[3,] 0 0 1 0 0
[4,] 0 0 0 1 0
[5,] 0 0 0 0 1
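As a quick check of what makes the identity matrix useful, this sketch (with an illustrative matrix `m`) confirms that multiplying by it leaves a matrix unchanged. It also shows that `diag()` extracts the diagonal when it is given a matrix instead of a number:

```r
m <- matrix(1:9, nrow = 3)
i3 <- diag(3)

# Multiplying by the identity on either side returns the original values
right_ok <- all(m %*% i3 == m)
left_ok <- all(i3 %*% m == m)
right_ok
left_ok

# Given a matrix, diag() returns its diagonal
m_diagonal <- diag(m)
m_diagonal
```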
f. Column and row names
colnames(m5)
NULL
rownames(m6)
NULL
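Both calls return NULL because no names have been set. A minimal sketch of assigning names and then indexing by them follows; the matrix `scores` and its labels are hypothetical:

```r
scores <- matrix(1:6, nrow = 2)

# Assign row and column names
rownames(scores) <- c("r1", "r2")
colnames(scores) <- c("a", "b", "c")

# Names can be used in place of numeric indices
picked <- scores["r2", "b"]
picked

# dimnames() returns both sets of names as a list
dimnames(scores)
```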
5. Arrays
An array is a multidimensional vector that stores homogeneous data. It can be thought of as a stack of matrices and can store data in more than 2 dimensions (n-dimensional). A three-dimensional array is composed of rows by columns by matrices. Example: an array with dimensions dim = c(2,3,3) has 2 rows, 3 columns, and 3 matrices.
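The dim = c(2,3,3) example can be sketched as follows. The name `arr_demo` and the values 1:18 are chosen only for illustration:

```r
arr_demo <- array(1:18, dim = c(2, 3, 3))

# 2 rows, 3 columns, 3 matrices
dim(arr_demo)

# 2 * 3 * 3 = 18 values in total
length(arr_demo)

# The third matrix holds the last six values
third_matrix <- arr_demo[, , 3]
third_matrix

# Row 2, column 3 of the third matrix
last_value <- arr_demo[2, 3, 3]
last_value

# A matrix is itself a two-dimensional array
is.array(matrix(1:4, nrow = 2))
```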
a. Creating arrays
arr_1 <- array(1:12, dim = c(2,3,2))
arr_1
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
b. Filter array by index
arr_1[1, , ]
[,1] [,2]
[1,] 1 7
[2,] 3 9
[3,] 5 11
arr_1[1, , 1]
[1] 1 3 5
arr_1[, , 1]
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
6. Factors
Factors store categorical data, whether the values are strings or integers. Each distinct category becomes a level, and the values themselves are stored internally as integer codes that point to those levels. This form of data storage is useful for statistical modeling. Examples include TRUE or FALSE and male or female.
vector <- c("Male", "Female")
factor_1 <- factor(vector)
factor_1
[1] Male Female
Levels: Female Male
OR
factor_2 <- as.factor(vector)
factor_2
[1] Male Female
Levels: Female Male
as.numeric(factor_2)
[1] 2 1
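The integer codes follow the order of the levels, which is sorted alphabetically by default. That is why "Male" maps to 2 above. As a sketch with illustrative names, the `levels` argument of `factor()` can set a different order:

```r
sizes <- c("Mid", "Low", "High", "Low")

# Default: levels are the sorted unique values
default_factor <- factor(sizes)
levels(default_factor)

# Explicit level order
custom_factor <- factor(sizes, levels = c("Low", "Mid", "High"))
levels(custom_factor)

# The integer codes follow the chosen level order
codes <- as.integer(custom_factor)
codes

# Counts per level, reported in level order
table(custom_factor)
```

Setting the order explicitly matters when categories have a natural ranking that differs from alphabetical order, as with Low, Mid, and High.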