This is a companion glossary for a previous post on working with large data sets. It highlights the R functions and arguments most relevant to reading in and working with such data.
- Loading the data file:
- Set directory
- Standard form
- setwd("…")
- Mac
- Windows
- Simplified
- on Windows, "~" replaces "C:/Users/UserName/Documents"
- Additional information
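A minimal sketch of setting and then restoring the working directory. The folder used here is a temporary directory standing in for a real project path, which is an assumption made so the snippet runs anywhere. Forward slashes work in paths on both Mac and Windows.

```r
# Hypothetical folder: a temporary directory stands in for a real project path
project_dir <- tempdir()

# setwd() returns the previous working directory invisibly, so it can be saved
old_dir <- setwd(project_dir)

# Confirm where R will now look for files
getwd()

# "~" is shorthand for the home directory; path.expand() shows what it resolves to
path.expand("~")

# Restore the original working directory when done
setwd(old_dir)
```

Saving the value returned by setwd() is a convenient way to get back to where the session started.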
- Read file
- read table
- read.table("file", header = TRUE, sep = " ")
- Works with .txt and .csv (set sep to match the file's delimiter, e.g. "," for .csv)
- read csv
- read.csv("file")
- functions the same as read.table, with defaults suited to .csv files (header = TRUE, sep = ",")
- read excel file
- read.xls("file", sheet = "sheet")
- similar to read.csv but for .xlsx files
- need to load a package that can read .xlsx files (read.xls comes from the gdata package)
- need to specify the sheet (tab) being read in
- Choose file
- read.csv(file.choose())
- opens a file dialog to choose the file manually
- Scan
- scan("file", what = <type>, sep = " ")
- scan should not be used for larger data sets
- Additional information
- Extensive descriptions of reading data
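The reading functions above can be tried on a small throwaway file. This is a sketch under assumptions: the file names, column names, and values are invented for illustration, and the files are written to a temporary location first. file.choose() and read.xls are left out because they need an interactive session or an extra package.

```r
# Write a small, made-up table to a temporary file so the example is self-contained
csv_file <- tempfile(fileext = ".csv")
sample_data <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
write.csv(sample_data, csv_file, row.names = FALSE)

# read.table() needs the separator spelled out for comma-separated files
from_table <- read.table(csv_file, header = TRUE, sep = ",")

# read.csv() uses header = TRUE and sep = "," by default
from_csv <- read.csv(csv_file)

# Both calls return a data frame with the same columns
str(from_table)
str(from_csv)

# scan() returns a plain vector rather than a data frame
num_file <- tempfile(fileext = ".txt")
writeLines("1 2 3 4", num_file)
numbers <- scan(num_file, what = numeric(), sep = " ")
```

For a first look at a large file, the nrows argument of read.table and read.csv limits how many rows are read.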
- Cleaning up the data set:
- Blank values
- NA values
- na.omit(data)
- creates a new data set identical to the original but without the rows containing NA values
- na.exclude(data)
- Functions similarly to na.omit(data)
- Both return the object with rows containing NAs removed
- na.fail(data)
- Checks for NAs and returns the tested object if none are found; otherwise it signals an error
- na.pass(data)
- Passes over NAs and returns the object unchanged
- Additional information
- More information with examples
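The four NA helpers above can be compared on a small example. This is only a sketch: the data frame and its values are made up. It also shows one place where na.omit and na.exclude differ, namely when they are passed as the na.action of a model fit.

```r
# Made-up data frame: rows 2 and 3 each contain one missing value
data <- data.frame(x = c(1, 2, NA, 4, 5),
                   y = c(2.1, NA, 6.2, 8.1, 9.9))

complete <- na.omit(data)   # drops every row that contains an NA
nrow(complete)              # 3

unchanged <- na.pass(data)  # returned as-is, NAs included

# na.fail() stops with an error when NAs are present, so wrap it in tryCatch()
result <- tryCatch(na.fail(data), error = function(e) "NAs found")

# Inside a model fit, na.exclude pads residuals with NA for the dropped rows,
# while na.omit returns residuals for the complete rows only
fit_omit    <- lm(y ~ x, data = data, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = data, na.action = na.exclude)
length(residuals(fit_omit))     # 3
length(residuals(fit_exclude))  # 5
```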
- Converting factors and numerics
- as.character(data)
- converts all factors to character strings
- Additional information
- as.numeric(data)
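- converts characters into a numeric vector; for a factor it returns the internal level codes, so use as.numeric(as.character(data)) to recover the labelled values
- identical to as.double (as.real is defunct in current versions of R)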
- Additional information
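One pitfall is worth checking when converting factors: as.numeric applied directly to a factor gives the level codes rather than the values printed on screen. A minimal sketch, using a made-up factor whose labels look like numbers:

```r
# Made-up factor whose labels look like numbers
readings <- factor(c("10", "20", "10", "35"))

as.character(readings)               # "10" "20" "10" "35"
as.numeric(readings)                 # 1 2 1 3  (level codes, not the labels)
as.numeric(as.character(readings))   # 10 20 10 35

# Store the converted values for later use
values <- as.numeric(as.character(readings))
```

Going through as.character first is the usual way to get the labelled numbers back out of a factor.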
- Using the data:
- plotting
- Basic plot function
- plot(x,y)
- Other useful arguments inside the plot function
- type = " "
- plot type
- e.g. "l" for a line plot
- main = " "
- main plot title
- xlab = " "
- x-axis title
- ylab = " "
- y-axis title
- col = " "
- line color
- e.g. "blue"
- lwd = <num>
- line width/thickness
- Arguments outside plot function
- grid()
- turns on grid lines for plot
- par(new = TRUE/FALSE)
- for plotting multiple lines or data sets on one plot
- TRUE before the next plot() call to draw it over the existing plot; FALSE (the default) to start a fresh plot
- Additional information
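The plotting arguments above can be combined as in the following sketch. The data, title, and colors are invented for illustration, and the plot is written to a temporary PDF only so the snippet also runs without a display; drop the pdf() and dev.off() calls to plot interactively.

```r
# Made-up data: two curves over the same x values
x  <- seq(0, 10, by = 0.5)
y1 <- sin(x)
y2 <- cos(x)

# Hypothetical output file in a temporary location
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# First curve, with title, axis labels, color, and line width set
plot(x, y1, type = "l", main = "Two curves", xlab = "x", ylab = "y",
     col = "blue", lwd = 2, ylim = c(-1, 1))
grid()  # add grid lines to the current plot

# par(new = TRUE) makes the next plot() call draw over the existing plot;
# the axes and labels are switched off so they are not drawn twice
par(new = TRUE)
plot(x, y2, type = "l", col = "red", lwd = 2, ylim = c(-1, 1),
     axes = FALSE, xlab = "", ylab = "")

invisible(dev.off())  # close the device so the file is written
```

Both calls use the same ylim so the two curves share one scale. For simply adding a second line, lines(x, y2) is a common alternative to par(new = TRUE).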