R – Data Structures

Data Types / Data Structures

R has several types of data structures. A data structure is the way the values in our data set are organized and stored.

The data itself can be numeric, character, or a mix of both.

 

 1. Vectors

Vectors are data structures in R whose elements are all of the same type.

A vector can be numeric, character, or logical.

 

To create a numeric vector, we create a variable and assign values to it:

example <- c(24,34,54)

Here, example is the name of the variable.

<- is the assignment operator; it stores the value on its right under the name on its left.

c() is a function that combines (concatenates) its arguments.

Inside c() are the numbers that are combined to make a vector.

To check the type of the vector we created, we can use the class() function:

class(example)
[1] "numeric"

So the output is numeric.

Now let's check the class of the variable example2:
example2 <- c("I", "Love","R")

class(example2)
[1] "character"

example2 comes out to be a character vector. Notice that the elements inside c() are enclosed in double quotation marks: "I", "Love", "R".

In the numeric vector we did not use quotation marks, so R interprets elements written without "" (such as 24 or TRUE) as numeric or logical values.
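Because a vector holds only one type, mixing types inside c() makes R convert every element to a common type. The short sketch below illustrates this; the variable names mixed_chr and mixed_num are made up for the example.

```r
# Mixing types in c() forces every element to one common type.
mixed_chr <- c(24, "R", TRUE)    # a string is present, so everything becomes character
mixed_num <- c(24, TRUE, FALSE)  # no strings, so the logicals become 1 and 0

print(class(mixed_chr))  # "character"
print(class(mixed_num))  # "numeric"
print(mixed_chr)
print(mixed_num)
```

So if class() reports character for a vector you expected to be numeric, a quoted value has probably slipped in.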

 

Logical vector can be created in R as follows:

test <- c(TRUE, TRUE, FALSE, FALSE)
class(test)
[1] "logical"

Notice that TRUE is not the same as true, because R is case-sensitive. If we try the lowercase form, the following error comes up:

test1 <- c(true, false)
Error: object 'true' not found
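Logical vectors do not have to be typed by hand; they also come out of comparisons. Here is a small sketch that reuses the numbers from the earlier numeric example. The names marks and above_30 are only illustrative.

```r
# A comparison on a vector is done element by element and returns a logical vector.
marks <- c(24, 34, 54)
above_30 <- marks > 30

print(class(above_30))        # "logical"

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values.
n_above <- sum(above_30)
print(n_above)

# A logical vector can be used to keep only the elements where the test is TRUE.
kept <- marks[above_30]
print(kept)
```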

 

2. Factors 

Factors are the data type used for categorical data. Variables of this kind are often called nominal variables.

For example: grades (A+, A, B, C ..), gender (M, F), etc.

To create a factor in R, write the following code:
gender <- factor(c("M","F","M","M","F"))
Output:

gender
[1] M F M M F
Levels: F M

By default, the levels are listed in sorted (alphabetical) order.

To see the details of the gender factor, we use the str() function:

str(gender)
Factor w/ 2 levels "F","M": 2 1 2 2 1

The str() (structure) function shows the integer code stored for each element: F is level 1 and M is level 2.
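The pieces that str() prints together can also be pulled out one at a time. The following sketch uses the base R functions levels(), as.integer() and table() on the same gender factor; the names on the left-hand side are just for illustration.

```r
gender <- factor(c("M", "F", "M", "M", "F"))

# The distinct categories, in the order R stores them.
gender_levels <- levels(gender)
print(gender_levels)

# The integer code behind each element (position of its level).
gender_codes <- as.integer(gender)
print(gender_codes)

# How many times each level occurs.
gender_counts <- table(gender)
print(gender_counts)
```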

For now it is enough to understand the concept; we will use levels in later analysis, and then you will see why factors are useful.

Now, if we wish to make M level 1 and F level 2, we can write:
gender <- factor(c("M","F","M","M","F"),levels = c("M","F"))
str(gender)
Factor w/ 2 levels "M","F": 1 2 1 1 2

The level order changes when we pass a second argument to factor(): the levels argument sets the order chosen by the user, whereas by default factor() orders the levels alphabetically.
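The levels argument matters most when the categories have a natural order, like the grades mentioned earlier. As a sketch, assuming we also pass ordered = TRUE (an argument of factor() not used above), R will treat the levels as ranked from lowest to highest. The grade values here are made up.

```r
# Levels are given from lowest to highest; ordered = TRUE makes the ranking meaningful.
grades <- factor(c("B", "A", "C", "A"),
                 levels = c("C", "B", "A"),
                 ordered = TRUE)

print(levels(grades))
print(as.integer(grades))   # codes follow the order given in levels

# Ordered factors can be compared: is the first grade lower than the second?
first_is_lower <- grades[1] < grades[2]
print(first_is_lower)

# The highest grade present in the data.
best <- as.character(max(grades))
print(best)
```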

 

3. Data Frames

Data frames hold the kind of data you would generally find in Excel, i.e. data arranged in rows and columns.

So far we have been dealing with single vectors, but it's time to build our own Excel-like table, which is called a data frame in R.

The following code creates 4 new variables:
name <- c("Vaibhav", "Bruno","Rocksy")
Passed <- c(TRUE, FALSE, TRUE)
age <- c(23,2,2)
Gender <- c("M","M","F")
name is a character vector,
Passed is a logical vector,
age is a numeric vector,
Gender is a character vector that holds categorical data.

We want the Gender variable to be a factor, so we convert it by writing:

Gender <- factor(Gender)
str(Gender)
Factor w/ 2 levels "F","M": 2 2 1

Now that our Gender variable has become a factor, let's combine all of these into a data frame.

dataframe <- data.frame(name, Passed, age, Gender)

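The variable dataframe now holds our data, combined with the help of the data.frame() function.
Let's see what is in dataframe:

dataframe
     name Passed age Gender
1 Vaibhav   TRUE  23      M
2   Bruno  FALSE   2      M
3  Rocksy   TRUE   2      F

So, we finally have data in rows and columns.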
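Once the data is in rows and columns, we can pick out parts of it. The sketch below rebuilds the same data frame so that it runs on its own, and then uses dim(), the $ operator and square brackets. The names on the left-hand side (size, ages, second_row, passed_names) are only illustrative.

```r
name <- c("Vaibhav", "Bruno", "Rocksy")
Passed <- c(TRUE, FALSE, TRUE)
age <- c(23, 2, 2)
Gender <- factor(c("M", "M", "F"))
dataframe <- data.frame(name, Passed, age, Gender)

# Number of rows and columns.
size <- dim(dataframe)
print(size)

# One column, returned as a vector.
ages <- dataframe$age
print(ages)

# One row: [row, column], leaving the column part empty keeps all columns.
second_row <- dataframe[2, ]
print(second_row)

# Rows where Passed is TRUE, keeping only the name column.
# as.character() is used so the result is plain text in any R version.
passed_names <- as.character(dataframe[dataframe$Passed, "name"])
print(passed_names)
```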

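Notice that data.frame() combines the vectors column-wise: each vector becomes one column. Now let's look at the structure (output as printed by R versions before 4.0.0):

str(dataframe)
'data.frame': 3 obs. of 4 variables:
$ name : Factor w/ 3 levels "Bruno","Rocksy",..: 3 1 2
$ Passed: logi TRUE FALSE TRUE
$ age : num 23 2 2
$ Gender: Factor w/ 2 levels "F","M": 2 2 1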

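Did you notice how our character variable name has been changed to a factor?

In R versions before 4.0.0, data.frame() converts character vectors to factors by default (stringsAsFactors = TRUE); since R 4.0.0 the default is FALSE. To keep name as a character vector in any version, set the argument explicitly:

dataframe <- data.frame(name, Passed, age, Gender, stringsAsFactors = FALSE)

stringsAsFactors = FALSE is a named argument that tells data.frame() not to convert character vectors to factors.
So, let's see the structure of dataframe now: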

str(dataframe)
'data.frame': 3 obs. of 4 variables:
$ name : chr "Vaibhav" "Bruno" "Rocksy"
$ Passed: logi TRUE FALSE TRUE
$ age : num 23 2 2
$ Gender: Factor w/ 2 levels "F","M": 2 2 1

So, name remains the way we wanted it and we have made our first data frame.
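To see the effect of this argument side by side, we can build the data frame twice and compare the class of the name column. This is only a sketch; the names df_factor and df_char are made up, and both calls set stringsAsFactors explicitly so the result does not depend on the R version.

```r
name <- c("Vaibhav", "Bruno", "Rocksy")
age <- c(23, 2, 2)

# Character columns are converted to factors.
df_factor <- data.frame(name, age, stringsAsFactors = TRUE)

# Character columns are left as they are.
df_char <- data.frame(name, age, stringsAsFactors = FALSE)

print(class(df_factor$name))  # "factor"
print(class(df_char$name))    # "character"
```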

 

 

Congratulations on learning the types of data structures.
