R is a powerful tool that outperforms software like Microsoft Excel in statistical computing, data analysis, and visualization. It enables faster data loading, automated cleaning, and advanced statistical and predictive analysis. However, before diving into analysis, the first critical step is importing data into R efficiently. Data can come from various sources—CSV files, Excel spreadsheets, databases, web APIs, or even proprietary statistical software formats like SPSS, Stata, and SAS. Handling these different file types effectively is key to ensuring a smooth workflow.
I’ve put together a clear and comprehensive tutorial on importing data into R to simplify the process. From basic text files to more advanced SPSS and SAS formats, this guide will help you seamlessly load your data. Keep reading to learn how to import your files into R with ease!
What You’ll Learn in This Guide
In this guide, I will walk you through the best methods for importing different types of data into R, including:
- ✅ Flat files (CSV, TXT, TSV)
- ✅ Excel files (XLSX, XLS)
- ✅ Structured data (JSON, XML, HTML)
- ✅ Database connections (SQL, SQLite, MySQL, PostgreSQL)
- ✅ Statistical software formats (SPSS, Stata, SAS)
- ✅ Matlab files (MAT)
- ✅ Google Sheets (cloud-based data)
- ✅ Quantmod data (financial data)
- ✅ Large Datasets
By the end of this guide, you’ll be able to:
- Use the right package for each file type.
- Avoid common pitfalls in data importing.
- Optimize data imports for speed and efficiency.
- Handle large datasets without crashing R.
No matter your level of expertise, mastering data import will streamline your workflow and allow you to focus on data analysis and visualization rather than struggling with file compatibility issues.
Let’s get started! 🚀
First Step: Package CheatSheet & Installation
In R, different packages are used to import various types of data formats. Here’s a list of commonly used packages based on the type of data source:
Base R Functions (No Extra Packages Required)
- read.table(), read.csv(), read.delim(), read.csv2() → Read text-based data (CSV, TSV)
- scan() → Reads data values from a file into a vector
- file() → Opens raw file connections
Read Text, CSV, and Excel Files
Format | Package | Function |
---|---|---|
CSV/TSV/Delimited Files | utils | read.csv() , read.delim() |
CSV (Efficient) | data.table | fread() |
CSV/TSV (Tidy Approach) | readr | read_csv() , read_tsv() |
Excel (.xlsx, .xls) | readxl | read_excel() |
Excel (Write Support) | writexl | write_xlsx() |
Google Sheets | googlesheets4 | read_sheet() |
Read from Databases
Database | Package | Function |
---|---|---|
MySQL, PostgreSQL, SQLite | DBI , RMySQL , RPostgreSQL , RSQLite | dbConnect() , dbReadTable() |
Microsoft SQL Server | odbc , RODBC | odbcConnect() , sqlQuery() |
Google BigQuery | bigrquery | dbConnect() , bq_table_download() |
Read JSON, XML, and Web Data
Format | Package | Function |
---|---|---|
JSON | jsonlite | fromJSON() |
XML | XML , xml2 | xmlParse() , read_xml() |
Web Scraping | rvest | read_html() , html_table() |
Read Statistical & Specialized File Formats
Format | Package | Function |
---|---|---|
SPSS (.sav) | haven , foreign | read_sav() , read.spss() |
Stata (.dta) | haven , foreign | read_dta() , read.dta() |
SAS (.sas7bdat) | haven | read_sas() |
MATLAB (.mat) | R.matlab | readMat() |
HDF5 | hdf5r , rhdf5 | h5read() |
Read Big Data & Parallel Processing
Format | Package | Function |
---|---|---|
Large CSV | data.table | fread() |
Apache Parquet | arrow | read_parquet() |
Spark Data | sparklyr | spark_read_csv() |
Hadoop | rhdfs | hdfs.read() |
Read GIS & Spatial Data
Format | Package | Function |
---|---|---|
Shapefiles | sf | st_read() |
GeoJSON | geojsonio | geojson_read() |
Raster Data | raster , terra | raster() |
To install multiple packages at once, use the c() function:
install.packages(c('tidyverse',
'foreign',
'haven',
'quantmod',
'googlesheets4',
'ff',
'R.matlab'), dependencies=TRUE)
By including the dependencies=TRUE argument in install.packages(), we ensure that all required dependencies are installed along with the main packages.
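If you re-run a setup script often, you may prefer to install only the packages that are not yet present. A minimal base R sketch (the package list mirrors the one above):

```r
pkgs <- c("tidyverse", "foreign", "haven", "quantmod",
          "googlesheets4", "ff", "R.matlab")

# Compare against the packages already installed on this machine
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install only what is missing, along with dependencies
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, dependencies = TRUE)
}
```

This keeps repeated runs fast, since install.packages() is skipped entirely when everything is already in place.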
Now that your environment is ready, you’re all set to start importing data into R effortlessly!
Importing Flat Files (CSV, TXT, TSV)
CSV (Comma-Separated Values) Files
CSV data in this example:
Movie_ID,Title,Genre,Director,Release_Year,IMDb_Rating,Box_Office_Millions
1,Inception,Sci-Fi,Christopher Nolan,2010,8.8,829
2,The Matrix,Sci-Fi,Lana Wachowski,1999,8.7,466
3,Interstellar,Sci-Fi,Christopher Nolan,2014,8.6,677
4,The Dark Knight,Action,Christopher Nolan,2008,9.0,1005
5,Pulp Fiction,Crime,Quentin Tarantino,1994,8.9,213
CSV files are one of the most common data formats. The readr package from the tidyverse provides fast and efficient functions to read CSV files.
library(readr)
# Import CSV file
df <- read_csv("movies_dataset.csv") # Automatically detects column types
head(df, 5)

If your CSV file has no header row, supply column names with the col_names argument (the first line of the file is then read as data):
df <- read_csv("movies_dataset.csv", col_names = c("Movie_ID", "Title", "Genre", "Director", "Release_Year", "IMDb_Rating", "Box_Office_Millions"))
head(df, 5)
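Rather than relying on automatic type detection, you can pin down each column's type with the col_types argument. A self-contained sketch (it first writes a few of the sample rows shown above to a temporary file so it is runnable anywhere):

```r
library(readr)

# Write three of the sample rows to a temporary CSV
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Movie_ID,Title,Genre,Director,Release_Year,IMDb_Rating,Box_Office_Millions",
  "1,Inception,Sci-Fi,Christopher Nolan,2010,8.8,829",
  "2,The Matrix,Sci-Fi,Lana Wachowski,1999,8.7,466",
  "3,Interstellar,Sci-Fi,Christopher Nolan,2014,8.6,677"
), tmp)

# Spell out every column type instead of letting read_csv() guess
df <- read_csv(tmp, col_types = cols(
  Movie_ID            = col_integer(),
  Title               = col_character(),
  Genre               = col_character(),
  Director            = col_character(),
  Release_Year        = col_integer(),
  IMDb_Rating         = col_double(),
  Box_Office_Millions = col_double()
))
```

Explicit col_types silences the column-specification message and catches files whose types drift from what you expect.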
For base R users, you can use:
df <- read.csv("movies_dataset.csv", stringsAsFactors = FALSE)
head(df, 5)
In R, you can also use the read.table() function to import the movies_dataset.csv file. Since CSV files are comma-separated, you need to specify the separator (sep=",") and ensure that headers are read correctly (header=TRUE). Here’s how you can do it:
# Set the file path (update this if needed)
file_path <- "movies_dataset.csv"
# Read the CSV file
movies_df <- read.table(file_path, sep=",", header=TRUE, stringsAsFactors=FALSE)
# Display the first few rows
head(movies_df, 5)
- file_path: The location of your CSV file.
- sep=",": Specifies that the separator is a comma (important for CSV files).
- header=TRUE: Ensures that the first row is treated as column names.
- stringsAsFactors=FALSE: Prevents automatic conversion of character columns into factors.

Importing TXT and TSV (Tab-Separated Values) Files
TXT data in this example:
[Intro]
Yeah, look
You ever feel like you’re trapped in a cage?
Like the whole world's watching but you’re stuck on the stage?
[Verse 1]
I was just a kid with a dream and a pen
Tryna scribble out my pain, make it bleed through again
Momma workin’ two jobs, pops gone in the wind
So I put my life in bars, let the therapy begin
[Chorus]
They don’t know what I’ve been through
Every scar, every battle that I lived through
I just fight, I just rise, I continue
Yeah, they don’t know, they don’t know
[Verse 2]
See, I came from the bottom, no silver spoon
Had to hustle just to eat, yeah, real life goon
Teachers said I wouldn’t make it, now they watch from the side
While I'm tearing up these stages with the fire inside
[Chorus]
They don’t know what I’ve been through
Every scar, every battle that I lived through
I just fight, I just rise, I continue
Yeah, they don’t know, they don’t know
[Outro]
Now they wanna see me fall, but I stand too tall
Through the pain, through the hate, yeah, I conquer it all
We will use an alternative to read.table(), the read.delim() function, to load the text file as an R dataframe. Other alternatives include read.csv(), read.csv2(), and read.delim2().
Note: By default, read.delim() separates values using a tab (sep = "\t").
Since the text file contains lyrics and has no header row, we need to specify header = FALSE so that the first line is read as data rather than being treated as column names.
Additionally, you can customize the dataframe with other parameters, such as fill, which pads rows of unequal length with blank fields.
# Read the lyrics file; each line becomes one row (no header)
df <- read_delim("lyrics.txt", delim = "\t", col_names = FALSE)
head(df, 5)
# For a semicolon-separated file you would instead use:
# df <- read_delim("data.txt", delim = ";")

For base R (quote = "" stops apostrophes in the lyrics from being treated as quoting characters):
df <- read.table("lyrics.txt", sep="\t", header=FALSE, quote="", stringsAsFactors=FALSE)
Importing Excel Files (.xlsx, .xls)
Excel data in this example:

Movie_ID Title Genre Director Release_Year IMDb_Rating Box_Office_Millions
1 Inception Sci-Fi Christopher Nolan 2010 8,8 829
2 The Matrix Sci-Fi Lana Wachowski 1999 8,7 466
3 Interstellar Sci-Fi Christopher Nolan 2014 8,6 677
4 The Dark Knight Action Christopher Nolan 2008 9 1005
5 Pulp Fiction Crime Quentin Tarantino 1994 8,9 213
6 Fight Club Drama David Fincher 1999 8,8 101
7 Forrest Gump Drama Robert Zemeckis 1994 8,8 678
8 The Shawshank Redemption Drama Frank Darabont 1994 9,3 58
9 The Godfather Crime Francis Ford Coppola 1972 9,2 250
10 Gladiator Action Ridley Scott 2000 8,5 460
Excel files require the readxl package (which does not depend on Java).
library(readxl)
# Read Excel file
df <- read_excel("movies_dataset.xlsx")
head(df,5)
# Specify sheet name or index
df <- read_excel("movies_dataset.xlsx", sheet = "Movies")
df <- read_excel("movies_dataset.xlsx", sheet = 1)

For writing data to an Excel file:
library(writexl)
write_xlsx(df, "output.xlsx")
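write_xlsx() also accepts a named list of data frames, writing each one to its own worksheet. A small self-contained sketch (the data frames, sheet names, and output file name are illustrative):

```r
library(writexl)

# Two toy data frames; each named list element becomes one worksheet
movies  <- data.frame(Title = c("Inception", "The Matrix"),
                      Release_Year = c(2010, 1999))
ratings <- data.frame(Title = c("Inception", "The Matrix"),
                      IMDb_Rating = c(8.8, 8.7))

# Sheet names are taken from the list element names
write_xlsx(list(Movies = movies, Ratings = ratings), "output_multi.xlsx")
```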
For openxlsx (another popular package):
library(openxlsx)
df <- read.xlsx("movies_dataset.xlsx", sheet = 1)
head(df,5)
Importing JSON Files
JSON data in this example:
[
{
"Movie_ID":1,
"Title":"Inception",
"Genre":"Sci-Fi",
"Director":"Christopher Nolan",
"Release_Year":2010,
"IMDb_Rating":8.8,
"Box_Office_Millions":829
},
{
"Movie_ID":2,
"Title":"The Matrix",
"Genre":"Sci-Fi",
"Director":"Lana Wachowski",
"Release_Year":1999,
"IMDb_Rating":8.7,
"Box_Office_Millions":466
},
{
"Movie_ID":3,
"Title":"Interstellar",
"Genre":"Sci-Fi",
"Director":"Christopher Nolan",
"Release_Year":2014,
"IMDb_Rating":8.6,
"Box_Office_Millions":677
},
{
"Movie_ID":4,
"Title":"The Dark Knight",
"Genre":"Action",
"Director":"Christopher Nolan",
"Release_Year":2008,
"IMDb_Rating":9.0,
"Box_Office_Millions":1005
},
{
"Movie_ID":5,
"Title":"Pulp Fiction",
"Genre":"Crime",
"Director":"Quentin Tarantino",
"Release_Year":1994,
"IMDb_Rating":8.9,
"Box_Office_Millions":213
},
{
"Movie_ID":6,
"Title":"Fight Club",
"Genre":"Drama",
"Director":"David Fincher",
"Release_Year":1999,
"IMDb_Rating":8.8,
"Box_Office_Millions":101
},
{
"Movie_ID":7,
"Title":"Forrest Gump",
"Genre":"Drama",
"Director":"Robert Zemeckis",
"Release_Year":1994,
"IMDb_Rating":8.8,
"Box_Office_Millions":678
},
{
"Movie_ID":8,
"Title":"The Shawshank Redemption",
"Genre":"Drama",
"Director":"Frank Darabont",
"Release_Year":1994,
"IMDb_Rating":9.3,
"Box_Office_Millions":58
},
{
"Movie_ID":9,
"Title":"The Godfather",
"Genre":"Crime",
"Director":"Francis Ford Coppola",
"Release_Year":1972,
"IMDb_Rating":9.2,
"Box_Office_Millions":250
},
{
"Movie_ID":10,
"Title":"Gladiator",
"Genre":"Action",
"Director":"Ridley Scott",
"Release_Year":2000,
"IMDb_Rating":8.5,
"Box_Office_Millions":460
}
]
JSON files are widely used in web APIs. The jsonlite package helps read and parse JSON data into R.
library(jsonlite)
# Read JSON file
df <- fromJSON("movies_dataset.json")
print(df[2])
# Read JSON from a URL
df <- fromJSON("https://api.example.com/data.json")

To convert an R dataframe to JSON:
json_data <- toJSON(df, pretty=TRUE)
write(json_data, file = "output.json")
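If the JSON contains nested objects, fromJSON() can flatten them into ordinary columns. A self-contained sketch using an inline JSON string (the nested "Details" structure is illustrative):

```r
library(jsonlite)

# Each record nests a "Details" object
json_text <- '[
  {"Title": "Inception",  "Details": {"Year": 2010, "Rating": 8.8}},
  {"Title": "The Matrix", "Details": {"Year": 1999, "Rating": 8.7}}
]'

# flatten = TRUE promotes Details.Year and Details.Rating
# to top-level columns of the resulting data frame
df <- fromJSON(json_text, flatten = TRUE)
names(df)
```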
Importing XML Files
XML data in this example:
<?xml version="1.0" ?>
<Movies>
<Movie>
<Movie_ID>1</Movie_ID>
<Title>Inception</Title>
<Genre>Sci-Fi</Genre>
<Director>Christopher Nolan</Director>
<Release_Year>2010</Release_Year>
<IMDb_Rating>8.8</IMDb_Rating>
<Box_Office_Millions>829</Box_Office_Millions>
</Movie>
<Movie>
<Movie_ID>2</Movie_ID>
<Title>The Matrix</Title>
<Genre>Sci-Fi</Genre>
<Director>Lana Wachowski</Director>
<Release_Year>1999</Release_Year>
<IMDb_Rating>8.7</IMDb_Rating>
<Box_Office_Millions>466</Box_Office_Millions>
</Movie>
<Movie>
<Movie_ID>3</Movie_ID>
<Title>Interstellar</Title>
<Genre>Sci-Fi</Genre>
<Director>Christopher Nolan</Director>
<Release_Year>2014</Release_Year>
<IMDb_Rating>8.6</IMDb_Rating>
<Box_Office_Millions>677</Box_Office_Millions>
</Movie>
<Movie>
<Movie_ID>4</Movie_ID>
<Title>The Dark Knight</Title>
<Genre>Action</Genre>
<Director>Christopher Nolan</Director>
<Release_Year>2008</Release_Year>
<IMDb_Rating>9.0</IMDb_Rating>
<Box_Office_Millions>1005</Box_Office_Millions>
</Movie>
<Movie>
<Movie_ID>5</Movie_ID>
<Title>Pulp Fiction</Title>
<Genre>Crime</Genre>
<Director>Quentin Tarantino</Director>
<Release_Year>1994</Release_Year>
<IMDb_Rating>8.9</IMDb_Rating>
<Box_Office_Millions>213</Box_Office_Millions>
</Movie>
<Movie>
<Movie_ID>6</Movie_ID>
<Title>Fight Club</Title>
<Genre>Drama</Genre>
<Director>David Fincher</Director>
<Release_Year>1999</Release_Year>
<IMDb_Rating>8.8</IMDb_Rating>
<Box_Office_Millions>101</Box_Office_Millions>
</Movie>
<Movie>
<Movie_ID>7</Movie_ID>
<Title>Forrest Gump</Title>
<Genre>Drama</Genre>
<Director>Robert Zemeckis</Director>
<Release_Year>1994</Release_Year>
<IMDb_Rating>8.8</IMDb_Rating>
<Box_Office_Millions>678</Box_Office_Millions>
</Movie>
<Movie>
<Movie_ID>8</Movie_ID>
<Title>The Shawshank Redemption</Title>
<Genre>Drama</Genre>
<Director>Frank Darabont</Director>
<Release_Year>1994</Release_Year>
<IMDb_Rating>9.3</IMDb_Rating>
<Box_Office_Millions>58</Box_Office_Millions>
</Movie>
<Movie>
<Movie_ID>9</Movie_ID>
<Title>The Godfather</Title>
<Genre>Crime</Genre>
<Director>Francis Ford Coppola</Director>
<Release_Year>1972</Release_Year>
<IMDb_Rating>9.2</IMDb_Rating>
<Box_Office_Millions>250</Box_Office_Millions>
</Movie>
<Movie>
<Movie_ID>10</Movie_ID>
<Title>Gladiator</Title>
<Genre>Action</Genre>
<Director>Ridley Scott</Director>
<Release_Year>2000</Release_Year>
<IMDb_Rating>8.5</IMDb_Rating>
<Box_Office_Millions>460</Box_Office_Millions>
</Movie>
</Movies>
For XML data, the xml2 package provides easy parsing.
library(xml2)
# Read XML file
xml_data <- read_xml("movies_dataset.xml")
# Extract specific nodes
nodes <- xml_find_all(xml_data, "//Movie")
# Convert XML data into a dataframe
df <- data.frame(
Title = xml_text(xml_find_all(nodes, "Title")),
Genre = xml_text(xml_find_all(nodes, "Genre")),
Release_Year = as.numeric(xml_text(xml_find_all(nodes, "Release_Year")))
)

Importing an HTML Table from a URL
R provides several functions to read HTML tables directly from a webpage. The rvest and XML packages are commonly used for this purpose. Below, I’ll demonstrate how to extract the highest-grossing films table from Wikipedia.
Using rvest
The rvest package is a modern and user-friendly way to scrape web data in R.
# Install and load the rvest package
install.packages("rvest")
library(rvest)
# Define the URL
url <- "https://en.wikipedia.org/wiki/List_of_highest-grossing_films"
# Read the HTML content
webpage <- read_html(url)
# Extract all tables from the page
tables <- html_table(webpage, fill = TRUE)
# Display the first few rows of the first table
head(tables[[1]],10)

Using XML
The XML package provides another way to extract tables from an HTML page.
# Install and load the XML package
install.packages("XML")
library(XML)
library(RCurl)
url <- getURL("https://en.wikipedia.org/wiki/List_of_highest-grossing_films")
tables <- readHTMLTable(url)
highest_grossing_films <- tables[[1]]  # use [[ ]] to get a data frame rather than a one-element list
highest_grossing_films
Choosing the Right Table
Wikipedia pages often contain multiple tables. The extracted tables are stored in a list, so you may need to inspect them and select the one that contains the required data.
# Check how many tables were extracted
length(tables)
# View the structure of the first table
str(tables[[1]])
# Assign the correct table to a dataframe
highest_grossing_films <- tables[[1]]
This method allows you to efficiently scrape HTML tables from Wikipedia or other websites and store them in an R dataframe for further analysis.
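One follow-up step you will almost always need: scraped columns arrive as formatted text (for example, grosses like "$2,923,706,026"), so convert them to numbers before analysis. A minimal base R sketch on made-up values:

```r
# Gross figures as they typically come out of an HTML table
gross <- c("$2,923,706,026", "$2,797,501,328")

# Strip the dollar signs and thousands separators, then convert
gross_numeric <- as.numeric(gsub("[$,]", "", gross))
gross_numeric  # 2923706026 2797501328
```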
Importing SQL Databases (MySQL, PostgreSQL, SQLite)
R provides database connections through DBI and database-specific drivers.
SQLite Example
library(DBI)
# Connect to SQLite database
con <- dbConnect(RSQLite::SQLite(), "database.sqlite")
# List tables
dbListTables(con)
# Read table into R
df <- dbReadTable(con, "employees")
# Run custom SQL query
df <- dbGetQuery(con, "SELECT * FROM employees WHERE Age > 30")
# Disconnect
dbDisconnect(con)
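When query values come from user input, prefer a parameterized query over pasting strings together; DBI supports placeholders through the params argument. A self-contained sketch using an in-memory SQLite database (table and column names are illustrative):

```r
library(DBI)

# In-memory database so the example leaves no files behind
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "employees",
             data.frame(Name = c("Ana", "Bob"), Age = c(35, 28)))

# "?" is a placeholder filled in safely from params (no SQL injection)
df <- dbGetQuery(con, "SELECT * FROM employees WHERE Age > ?",
                 params = list(30))

dbDisconnect(con)
df  # only the row for Ana (Age 35) matches
```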
MySQL Example
library(RMySQL)
con <- dbConnect(MySQL(), dbname="your_db", host="localhost", user="root", password="password")
df <- dbGetQuery(con, "SELECT * FROM employees")
dbDisconnect(con)
PostgreSQL Example
library(RPostgres)
con <- dbConnect(RPostgres::Postgres(), dbname="your_db", host="localhost", user="postgres", password="password")
df <- dbGetQuery(con, "SELECT * FROM employees")
dbDisconnect(con)
Importing SPSS, Stata, and SAS Files
For statistical software formats like SPSS (.sav), Stata (.dta), and SAS (.sas7bdat), the haven and foreign packages are commonly used. You can download the .sav file I used for this example from here (it’s “Text Messages.sav”).
SPSS (.sav) Files
Using haven:
library(haven)
df <- read_sav("Text Messages.sav")
head(df,12)

Using foreign:
library(foreign)
df <- read.spss("Text Messages.sav", to.data.frame = TRUE)
head(df,12)

Stata (.dta) Files
For this example, I used the airline.dta from here.
library(haven)
df <- read_dta("airline.dta") # Using haven
head(df,10)

library(foreign)
df <- read.dta("airline.dta") # Using foreign
head(df,10)

SAS (.sas7bdat) Files
For this example, I used the AP_VOTECAST_2018_DATA.sas7bdat from here (you must create an account to access it).
library(haven)
df <- read_sas("AP_VOTECAST_2018_DATA.sas7bdat") # Using haven
head(df,10)

XPT (.xpt) files:
library(haven)
df <- read_xpt("data.xpt")
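haven imports SPSS and Stata value labels as labelled vectors rather than factors; use as_factor() when you want regular R factors. A self-contained sketch with a hand-built labelled vector (the codes and labels are illustrative):

```r
library(haven)

# A labelled vector like those produced by read_sav() / read_dta()
group <- labelled(c(1, 2, 1, 2), labels = c(Male = 1, Female = 2))

# Convert the coded values to a regular factor with readable levels
group_factor <- as_factor(group)
levels(group_factor)  # "Male" "Female"
```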
Importing Matlab Files (.mat)
Matlab files can be imported using the R.matlab package.
library(R.matlab)
# Read Matlab file
data <- readMat("movies_dataset.mat")
# Extract variables
df <- as.data.frame(data$Title)
df

Importing Data from Google Sheets
Google Sheets data can be imported using the googlesheets4 package. If your Google Sheet is publicly accessible (or shared with “Anyone with the link”), you can directly use something like this:
library(googlesheets4)
sheet_url = "https://docs.google.com/spreadsheets/d/1B-ZN2cgyKj-2R3I5EYhdLL3hkLlFRmf_jbyiMLaVHkg/edit?usp=sharing"
# Authorize and read Google Sheets
df <- read_sheet(sheet_url, sheet = "Music Codes")
df
You can also read a range of cells:
library(googlesheets4)
sheet_url = "https://docs.google.com/spreadsheets/d/1B-ZN2cgyKj-2R3I5EYhdLL3hkLlFRmf_jbyiMLaVHkg/edit?usp=sharing"
# Authorize and read Google Sheets
df <- read_sheet(sheet_url, sheet = "Music Codes", range = "A1:B20") # Reads only cells from A1 to B20
df
If the Google Sheet is private, you’ll need to authenticate with your Google account:
library(googlesheets4)
sheet_url = "https://docs.google.com/spreadsheets/d/1B-ZN2cgyKj-2R3I5EYhdLL3hkLlFRmf_jbyiMLaVHkg/edit?usp=sharing"
# Authenticate with Google
gs4_auth()
# Read the private sheet
df <- read_sheet(sheet_url)
df
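If you only ever read public sheets, you can skip the sign-in prompt altogether with gs4_deauth(), which tells googlesheets4 to make unauthenticated requests (only publicly shared sheets are readable in this mode):

```r
library(googlesheets4)

# Opt out of Google authentication entirely
gs4_deauth()

sheet_url <- "https://docs.google.com/spreadsheets/d/1B-ZN2cgyKj-2R3I5EYhdLL3hkLlFRmf_jbyiMLaVHkg/edit?usp=sharing"
df <- read_sheet(sheet_url)
```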
Importing Financial Data with Quantmod
The quantmod package is a powerful financial modeling and trading framework for R. We will use it to download the latest trading data into a dataframe.
You can retrieve historical stock prices for Apple (AAPL) from Yahoo Finance using the getSymbols() function.
library(quantmod)
# Get Apple stock data from Yahoo Finance
getSymbols("AAPL", src = "yahoo")
# View the first few rows
head(AAPL)

To retrieve more specific historical stock data for Apple (AAPL), we will use the getSymbols() function, specifying the from and to dates along with the desired data frequency.
library(quantmod)
getSymbols("AAPL", src="yahoo", from = "2025/2/1", to = "2025/2/13", periodicity = "daily")
head(AAPL,5)
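Once the series is loaded, quantmod's helper functions make the OHLC columns easy to work with, e.g. Cl() for closing prices and dailyReturn() for simple returns (this sketch downloads live data, so the values you see will differ):

```r
library(quantmod)

# Download AAPL and compute daily returns from the series
getSymbols("AAPL", src = "yahoo")

closing_prices <- Cl(AAPL)        # the AAPL.Close column as an xts series
returns        <- dailyReturn(AAPL)

head(returns, 5)
```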

Importing Large Datasets into R Efficiently
When working with large datasets in R, selecting the right function can significantly impact loading time and memory usage. Below are various approaches to importing large datasets efficiently.
In this section, we will be looking at popular functions used for loading CSV files greater than 1 GB. We are using the US Accidents (2016 – 2023) dataset from Kaggle, which is about 3.06GB in size and has 7,728,394 records.
Using read.table() with a ZIP File
Instead of extracting the ZIP file manually, we can use the unz() function to read the CSV directly from the archive. This saves time and disk space.
# Load data directly from a ZIP file
file <- unz("US Accidents.zip", "US_Accidents_March23.csv")
# Read the first 700 rows with the comma separator specified
df <- read.table(file, header = TRUE, sep = ",", nrows = 700)
💡 Tip: By default, read.table() assumes whitespace-separated values, so always specify sep="," for CSV files.
Note: I have limited the number of rows to 700 because read.table() requires a tabular structure, which is not maintained in row 732 of this file.

Using read_csv() from readr
The readr package provides the read_csv() function, which supports multi-threading for faster data loading. By default, read_csv() detects the number of available processor cores and uses multiple threads to read large CSV files efficiently. Use n_max (readr’s equivalent of nrows) to limit the number of records.
library(readr)
file <- unz("US Accidents.zip", "US_Accidents_March23.csv")
# Read the full dataset (can take ~1 minute)
df <- read_csv(file)
# To load only 10,000 rows, re-create the connection first
# (the previous read consumes it):
file <- unz("US Accidents.zip", "US_Accidents_March23.csv")
df <- read_csv(file, n_max = 10000)
💡 Tip: To speed up loading, adjust the number of threads. Refer to the function documentation for details.
library(readr)
# Unzip the file
unzip("US Accidents.zip")
# Read the full dataset (can take ~1 minute)
df <- read_csv("US_Accidents_March23.csv", num_threads = parallel::detectCores(), n_max = 10000)

Using ff for Memory-Efficient Loading
The ff package loads large datasets in chunks, reducing memory usage and improving performance.
library(ff)
# Load data using read.table.ffdf() for large datasets
df <- read.table.ffdf(file = "US_Accidents_March23.csv", nrows = 10000, header = TRUE, sep = ",")
# Preview the first 5 rows and first 10 columns
df[1:5, 1:10]
💡 Tip: The ff package is ideal for handling datasets that exceed system memory.

Using fread() from data.table
The fread() function from the data.table package is the fastest option for reading large datasets. It automatically detects the file format and column types, requiring minimal manual configuration.
library(data.table)
# Load the first 10,000 rows efficiently
df <- fread("US_Accidents_March23.csv", sep = ",", nrows = 10000, na.strings = c("NA", "N/A", ""), stringsAsFactors = FALSE)
💡 Tip: fread() is significantly faster than read.csv() and can automatically determine the file structure.
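For very wide files, you can cut load time and memory further by reading only the columns you need via fread()'s select argument. A self-contained sketch (it writes a tiny CSV first; the column names are illustrative):

```r
library(data.table)

# Write a small four-column CSV so the sketch is runnable anywhere
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(ID = 1:3,
                  Severity = c(2, 3, 2),
                  State = c("CA", "TX", "CA"),
                  Description = c("a", "b", "c")), tmp)

# Read back only the two columns of interest
df <- fread(tmp, select = c("ID", "State"))
names(df)  # "ID" "State"
```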
Reading from Large Datasets CheatSheet
Method | Best For | Pros | Cons |
---|---|---|---|
read.table() | Standard datasets | Works with ZIP files | Slow for large datasets |
read_csv() | Large CSV files | Faster than read.table() | Requires readr package |
ff | Extremely large datasets | Efficient memory usage | More complex syntax |
fread() | Best overall performance | Very fast, auto-detects format | Requires data.table package |
For small to medium datasets, read_csv() works well. For very large datasets, fread() is the best choice due to its speed and efficiency.
Conclusion
This guide covers the most common ways to import data into R, from simple CSV files to complex SQL databases and statistical software formats. Mastering these techniques will allow you to work seamlessly with data from multiple sources, making your analysis more efficient and powerful.
By leveraging R’s extensive package ecosystem, you can automate data imports, clean and manipulate data efficiently, and focus on analysis rather than struggling with file compatibility issues.
FAQ
1. What types of data files can I import into R?
R supports a wide range of data formats, including:
- CSV files (.csv)
- Excel files (.xlsx, .xls)
- Text files (.txt, .dat)
- R data files (.rds, .RData)
- JSON files (.json)
- XML files (.xml)
- SPSS, Stata, and SAS files (.sav, .dta, .sas7bdat)
- SQL databases
2. How do I import a CSV file into R?
You can use the read.csv() function:
data <- read.csv("path/to/your/file.csv", header = TRUE, stringsAsFactors = FALSE)
Alternatively, you can use readr::read_csv() for faster performance:
library(readr)
data <- read_csv("path/to/your/file.csv")
3. How do I import an Excel file into R?
Use the readxl package:
library(readxl)
data <- read_excel("path/to/your/file.xlsx", sheet = 1)
For writing Excel files, use the writexl package:
library(writexl)
write_xlsx(data, "output.xlsx")
4. How do I import a text file into R?
For space- or tab-delimited text files, use:
data <- read.table("path/to/your/file.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
For more flexibility, use readr::read_delim():
data <- read_delim("path/to/your/file.txt", delim = "\t")
5. How do I import JSON data into R?
Use the jsonlite package:
library(jsonlite)
data <- fromJSON("path/to/your/file.json")
6. How do I import data from an SQL database?
Use the DBI package for database connections:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), "database.sqlite")
data <- dbGetQuery(con, "SELECT * FROM table_name")
dbDisconnect(con)
7. How do I load R-specific data files (.rds, .RData)?
For .rds files:
data <- readRDS("path/to/your/file.rds")
For .RData files:
load("path/to/your/file.RData") # Loads all objects stored in the file
8. How do I check if my data was imported correctly?
After importing, use these functions to inspect the data:
head(data) # View first few rows
str(data) # Check structure and data types
summary(data) # Get summary statistics
dim(data) # Get dimensions (rows & columns)
9. What should I do if my file is not found or there’s an error?
- Check the file path – ensure it’s correct and properly formatted (use file.choose() to select a file interactively).
- Ensure the file exists – run file.exists("path/to/file") to check.
- Check delimiters – if the data isn’t structured correctly, specify the separator explicitly (sep="," for commas, sep="\t" for tabs).
- Handle encoding issues – use encoding="UTF-8" if special characters cause problems.
10. How do I handle missing values after importing data?
Use functions like:
sum(is.na(data)) # Count missing values
data_clean <- na.omit(data) # Remove rows with missing values
data[is.na(data)] <- 0 # Replace missing values with 0