Title: | A Swiss-Army Knife for Data I/O |
---|---|
Description: | Streamlined data import and export by making assumptions that the user is probably willing to make: 'import()' and 'export()' determine the data format from the file extension, reasonable defaults are used for data import and export, web-based import is natively supported (including from SSL/HTTPS), compressed files can be read directly, and fast import packages are used where appropriate. An additional convenience function, 'convert()', provides a simple method for converting between file types. |
Authors: | Jason Becker [aut], Chung-hong Chan [aut, cre], David Schoch [aut], Geoffrey CH Chan [ctb], Thomas J. Leeper [aut], Christopher Gandrud [ctb], Andrew MacDonald [ctb], Ista Zahn [ctb], Stanislaus Stadlmann [ctb], Ruaridh Williamson [ctb], Patrick Kennedy [ctb], Ryan Price [ctb], Trevor L Davis [ctb], Nathan Day [ctb], Bill Denney [ctb], Alex Bokov [ctb], Hugo Gruson [ctb] |
Maintainer: | Chung-hong Chan |
License: | GPL-2 |
Version: | 1.2.4 |
Built: | 2025-01-16 06:10:22 UTC |
Source: | https://github.com/gesistsa/rio |
Convert labelled variables to character or factor
characterize(x, ...)

factorize(x, ...)

## Default S3 method:
characterize(x, ...)

## S3 method for class 'data.frame'
characterize(x, ...)

## Default S3 method:
factorize(x, coerce_character = FALSE, ...)

## S3 method for class 'data.frame'
factorize(x, ...)
x |
A vector or data frame. |
... |
additional arguments passed to methods |
coerce_character |
A logical indicating whether to additionally coerce character columns to factor (in |
characterize converts a vector with a labels attribute of named levels into a character vector. factorize does the same but to factors. This can be useful at two stages of a data workflow: (1) importing labelled data from metadata-rich file formats (e.g., Stata or SPSS), and (2) exporting such data to plain text files (e.g., CSV) in a way that preserves information.
a character vector (for characterize) or factor vector (for factorize)
## vector method
x <- structure(1:4, labels = c("A" = 1, "B" = 2, "C" = 3))
characterize(x)
factorize(x)

## data frame method
x <- data.frame(v1 = structure(1:4, labels = c("A" = 1, "B" = 2, "C" = 3)),
                v2 = structure(c(1,0,0,1), labels = c("foo" = 0, "bar" = 1)))
str(factorize(x))
str(characterize(x))

## Application
csv_file <- tempfile(fileext = ".csv")
## comparison of exported file contents
import(export(x, csv_file))
import(export(factorize(x), csv_file))
This function constructs a data frame from a data file using import() and uses export() to write the data to disk in the format indicated by the file extension.
convert(in_file, out_file, in_opts = list(), out_opts = list())
in_file |
A character string naming an input file. |
out_file |
A character string naming an output file. |
in_opts |
A named list of options to be passed to |
out_opts |
A named list of options to be passed to |
A character string containing the name of the output file (invisibly).
Luca Braglia has created a Shiny app called rioweb that provides access to the file conversion features of rio through a web browser.
## For demo, a temp. file path is created with the file extension .dta (Stata)
dta_file <- tempfile(fileext = ".dta")
## .csv
csv_file <- tempfile(fileext = ".csv")
## .xlsx
xlsx_file <- tempfile(fileext = ".xlsx")

## Create a Stata data file
export(mtcars, dta_file)

## convert Stata to CSV and open converted file
convert(dta_file, csv_file)
import(csv_file)

## correct an erroneous file format
export(mtcars, xlsx_file, format = "tsv") ## DON'T DO THIS
## import(xlsx_file) ## ERROR
## convert the file by specifying `in_opts`
convert(xlsx_file, xlsx_file, in_opts = list(format = "tsv"))
import(xlsx_file)

## convert from the command line:
## Rscript -e "rio::convert('mtcars.dta', 'mtcars.csv')"
Write data.frame to a file
export(x, file, format, ...)
x |
A data frame, matrix or a single-item list of data frame to be written into a file. Exceptions to this rule are that |
file |
A character string naming a file. Must specify |
format |
An optional character string containing the file format, which can be used to override the format inferred from |
... |
Additional arguments for the underlying export functions. This can be used to specify non-standard arguments. See examples. |
This function exports a data frame or matrix into a file with file format based on the file extension (or the manually specified format, if format is specified).
The output file can be compressed or written into an archive simply by adding an appropriate additional extension to the file argument, such as: “mtcars.csv.tar”, “mtcars.csv.zip”, or “mtcars.csv.gz”.
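As a minimal sketch of the compressed-output behavior described above (the temporary path is purely illustrative, and gzip support is assumed to be available in the installed rio), a data frame can be written to a `.csv.gz` file and read back directly:

```r
library(rio)

# Hypothetical temp path; the trailing ".gz" requests gzip compression
gz_file <- tempfile(fileext = ".csv.gz")
export(mtcars, gz_file)

# import() reads the compressed file directly, using the inner ".csv" format
roundtrip <- import(gz_file)
dim(roundtrip)
```

Row names are not written to the CSV by default, so the round-tripped data frame carries only the columns of `mtcars`.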
export supports many file formats. See the documentation for the underlying export functions for optional arguments that can be passed via ...
Comma-separated data (.csv), using data.table::fwrite()
Pipe-separated data (.psv), using data.table::fwrite()
Tab-separated data (.tsv), using data.table::fwrite()
SAS (.sas7bdat), using haven::write_sas().
SAS XPORT (.xpt), using haven::write_xpt().
SPSS (.sav), using haven::write_sav()
SPSS compressed (.zsav), using haven::write_sav()
Stata (.dta), using haven::write_dta(). Note that variable/column names containing dots (.) are not allowed and will produce an error.
Excel (.xlsx), using writexl::write_xlsx(). x can also be a list of data frames; the list entry names are used as sheet names.
R syntax object (.R), using base::dput() (by default) or base::dump() (if format = 'dump')
Saved R objects (.RData,.rda), using base::save(). In this case, x can be a data frame, a named list of objects, an R environment, or a character vector containing the names of objects if a corresponding envir argument is specified.
Serialized R objects (.rds), using base::saveRDS(). In this case, x can be any serializable R object.
Serialized R objects (.qs), using qs::qsave(), which is significantly faster than .rds. This can be any R object (not just a data frame).
"XBASE" database files (.dbf), using foreign::write.dbf()
Weka Attribute-Relation File Format (.arff), using foreign::write.arff()
Fixed-width format data (.fwf), using utils::write.table() with row.names = FALSE, quote = FALSE, and col.names = FALSE
CSVY (CSV with a YAML metadata header) using data.table::fwrite().
Apache Arrow Parquet (.parquet), using nanoparquet::write_parquet()
Feather R/Python interchange format (.feather), using arrow::write_feather()
Fast storage (.fst), using fst::write.fst()
JSON (.json), using jsonlite::toJSON(). In this case, x can be a variety of R objects, based on class mapping conventions in this paper: https://arxiv.org/abs/1403.2805.
Matlab (.mat), using rmatio::write.mat()
OpenDocument Spreadsheet (.ods, .fods), using readODS::write_ods() or readODS::write_fods().
HTML (.html), using a custom method based on xml2::xml_add_child() to create a simple HTML table and xml2::write_xml() to write to disk.
XML (.xml), using a custom method based on xml2::xml_add_child() to create a simple XML tree and xml2::write_xml() to write to disk.
YAML (.yml), using yaml::write_yaml(); by default, the content is written as UTF-8. This might not work on some older systems, e.g. the default Windows locale for R <= 4.2.
Clipboard export (on Windows and Mac OS), using utils::write.table() with row.names = FALSE
When exporting a data set that contains label attributes (e.g., if imported from an SPSS or Stata file) to a plain text file, characterize() can be a useful pre-processing step that records value labels into the resulting file (e.g., export(characterize(x), "file.csv")) rather than the numeric values.
Use export_list() to export a list of dataframes to separate files.
The name of the output file as a character string (invisibly).
characterize(), import(), convert(), export_list()
## For demo, a temp. file path is created with the file extension .csv
csv_file <- tempfile(fileext = ".csv")
## .xlsx
xlsx_file <- tempfile(fileext = ".xlsx")

## create CSV to import
export(iris, csv_file)

## You can certainly export your data with the file name, which is not a variable:
## export(mtcars, "car_data.csv")

## pass arguments to the underlying function
## data.table::fwrite is the underlying function and `col.names` is an argument
export(iris, csv_file, col.names = FALSE)

## export a list of data frames as worksheets
export(list(a = mtcars, b = iris), xlsx_file)

# NOT RECOMMENDED

## specify `format` to override default format
export(iris, xlsx_file, format = "csv") ## That's confusing
## You can also specify only the format; in the following case
## "mtcars.dta" is written [also confusing]

## export(mtcars, format = "stata")
Use export() to export a list of data frames to a vector of file names or a filename pattern.
export_list(x, file, archive = "", ...)
x |
A list of data frames to be written to files. |
file |
A character vector string containing a single file name with a |
archive |
character. Either empty string (default) to save files in current directory, a path to a (new) directory, or a .zip/.tar file to compress all files into an archive. |
... |
Additional arguments passed to |
export() can export a list of data frames to a single multi-dataset file (e.g., an Rdata or Excel .xlsx file). Use export_list to export such a list to multiple files.
The name(s) of the output file(s) as a character vector (invisibly).
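The filename-pattern form can be sketched as follows. This is an illustrative example only: the list names `first` and `second` and the use of the session temporary directory are assumptions made so that nothing is written to the working directory.

```r
library(rio)

# A small named list of data frames; the names fill the "%s" placeholder
dfs <- list(first = head(mtcars), second = head(iris))

# Hypothetical output location: a "%s" pattern inside the session temp directory
pattern <- file.path(tempdir(), "%s.csv")
out <- export_list(dfs, file = pattern)

# The returned character vector holds one output path per list element
basename(out)

# clean up the files written above
unlink(out)
```

Passing a character vector of the same length as the list instead of a pattern gives explicit control over each file name.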
import(), import_list(), export()
## For demo, a temp. file path is created with the file extension .xlsx
xlsx_file <- tempfile(fileext = ".xlsx")
export(
    list(
        mtcars1 = mtcars[1:10, ],
        mtcars2 = mtcars[11:20, ],
        mtcars3 = mtcars[21:32, ]
    ),
    xlsx_file
)

# import a single file from multi-object workbook
import(xlsx_file, sheet = "mtcars1")
# import all worksheets, the return value is a list
import_list(xlsx_file)
library('datasets')
export(list(mtcars1 = mtcars[1:10,],
            mtcars2 = mtcars[11:20,],
            mtcars3 = mtcars[21:32,]),
    xlsx_file <- tempfile(fileext = ".xlsx")
)

# import all worksheets
list_of_dfs <- import_list(xlsx_file)

# re-export as separate named files
## export_list(list_of_dfs, file = c("file1.csv", "file2.csv", "file3.csv"))

# re-export as separate files using a name pattern; using the names in the list
## This will be written as "mtcars1.csv", "mtcars2.csv", "mtcars3.csv"
## export_list(list_of_dfs, file = "%s.csv")
gather_attrs moves variable-level attributes to the data frame level and spread_attrs reverses that operation.
gather_attrs(x)

spread_attrs(x)
x |
A data frame. |
import() attempts to standardize the return value from the various import functions to the extent possible, thus providing a uniform data structure regardless of what import package or function is used. It achieves this by storing any optional variable-related attributes at the variable level (i.e., an attribute for mtcars$mpg is stored in attributes(mtcars$mpg) rather than attributes(mtcars)). gather_attrs moves these to the data frame level (i.e., in attributes(mtcars)). spread_attrs moves attributes back to the variable level.
x, with variable-level attributes stored at the data frame level.
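A small sketch of the round trip may help. The data frame `d` and its "label" attribute are made up for illustration; real imports from Stata or SPSS typically carry richer attributes.

```r
library(rio)

# Toy data frame with a hypothetical variable-level "label" attribute
d <- data.frame(mpg = mtcars$mpg, cyl = mtcars$cyl)
attr(d$mpg, "label") <- "Miles per gallon"

# Move variable-level attributes up to the data frame
g <- gather_attrs(d)
str(attributes(g)$label)

# Move them back down to the individual variables
s <- spread_attrs(g)
attr(s$mpg, "label")
```

After `gather_attrs()`, the labels are expected to sit in a list on the data frame, keyed by variable name; `spread_attrs()` should restore them to the columns.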
A utility function to retrieve the file information of a filename, path, or URL.
get_info(file)

get_ext(file)
file |
A character string containing a filename, file path, or URL. |
For get_info(), a list is returned with the following slots:
input: file extension or information used to identify the possible file format
format: file format, see the format argument of import()
type: "import" (supported by default); "suggest" (supported by suggested packages, see install_formats()); "enhance" and "known" are not directly supported; NA is unsupported
format_name: name of the format
import_function: the function used to import this file
export_function: the function used to export this file
file: file
For get_ext(), just input (usually the file extension) is returned; retained for backward compatibility.
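Because get_info() returns a plain list, individual slots can be pulled out by name. The sketch below uses a file name that does not need to exist, since only the name is inspected; the values printed depend on the installed version of rio.

```r
library(rio)

# Only the file name is inspected; "starwars.xlsx" need not exist on disk
info <- get_info("starwars.xlsx")

info$format           # format code, usable as the `format` argument of import()
info$type             # whether the format is supported by default or via Suggests
info$import_function  # function rio would use to read this file

# get_ext() returns just the `input` slot
get_ext("starwars.xlsx")
```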
get_info("starwars.xlsx")
get_info("starwars.ods")
get_info("https://github.com/ropensci/readODS/raw/v2.1/starwars.ods")
get_info("~/duran_duran_rio.mp3")
get_ext("clipboard") ## "clipboard"
get_ext("https://github.com/ropensci/readODS/raw/v2.1/starwars.ods")
Read in a data.frame from a file. Exceptions to this rule are Rdata, RDS, and JSON input file formats, which return the originally saved object without changing its class.
import(
  file,
  format,
  setclass = getOption("rio.import.class", "data.frame"),
  which,
  ...
)
file |
A character string naming a file, URL, or single-file (can be Gzip or Bzip2 compressed), .zip or .tar archive. |
format |
An optional character string code of file format, which can be used to override the format inferred from |
setclass |
An optional character vector specifying one or more classes to set on the import. By default, the return object is always a “data.frame”. Allowed values include “tbl_df”, “tbl”, or “tibble” (if using tibble), “arrow”, “arrow_table” (if using arrow table; the suggested package |
which |
This argument is used to control import from multi-object files; as a rule |
... |
Additional arguments passed to the underlying import functions. For example, this can control column classes for delimited file types, or control the use of haven for Stata and SPSS or readxl for Excel (.xlsx) format. See details below. |
This function imports a data frame or matrix from a data file with the file format based on the file extension (or the manually specified format, if format is specified).
import supports the following file formats:
Comma-separated data (.csv), using data.table::fread()
Pipe-separated data (.psv), using data.table::fread()
Tab-separated data (.tsv), using data.table::fread()
SAS (.sas7bdat), using haven::read_sas()
SAS XPORT (.xpt), using haven::read_xpt()
SPSS (.sav), using haven::read_sav()
SPSS compressed (.zsav), using haven::read_sav().
Stata (.dta), using haven::read_dta()
SPSS Portable Files (.por), using haven::read_por().
Excel (.xls and .xlsx), using readxl::read_xlsx() or readxl::read_xls(). Use which to specify a sheet number.
R syntax object (.R), using base::dget(), see trust below.
Saved R objects (.RData,.rda), using base::load() for single-object .Rdata files. Use which to specify an object name for multi-object .Rdata files. This can be any R object (not just a data frame), see trust below.
Serialized R objects (.rds), using base::readRDS(). This can be any R object (not just a data frame), see trust below.
Serialized R objects (.qs), using qs::qread(), which is significantly faster than .rds. This can be any R object (not just a data frame).
Epiinfo (.rec), using foreign::read.epiinfo()
Minitab (.mtp), using foreign::read.mtp()
Systat (.syd), using foreign::read.systat()
"XBASE" database files (.dbf), using foreign::read.dbf()
Weka Attribute-Relation File Format (.arff), using foreign::read.arff()
Data Interchange Format (.dif), using utils::read.DIF()
Fortran data (no recognized extension), using utils::read.fortran()
Fixed-width format data (.fwf), using a faster version of utils::read.fwf() that requires a widths argument and by default in rio has stringsAsFactors = FALSE
CSVY (CSV with a YAML metadata header) using data.table::fread().
Apache Arrow Parquet (.parquet), using nanoparquet::read_parquet()
Feather R/Python interchange format (.feather), using arrow::read_feather()
Fast storage (.fst), using fst::read.fst()
JSON (.json), using jsonlite::fromJSON()
Matlab (.mat), using rmatio::read.mat()
EViews (.wf1), using hexView::readEViews()
OpenDocument Spreadsheet (.ods, .fods), using readODS::read_ods() or readODS::read_fods(). Use which to specify a sheet number.
Single-table HTML documents (.html), using xml2::read_html(). There is no standard HTML table and we have only tested this with HTML tables exported with this package. HTML tables will only be read correctly if the HTML file can be converted to a list via xml2::as_list(). This import feature is not robust, especially for HTML tables in the wild. Please use a proper web scraping framework, e.g. rvest.
Shallow XML documents (.xml), using xml2::read_xml(). The data structure will only be read correctly if the XML file can be converted to a list via xml2::as_list().
YAML (.yml), using yaml::yaml.load()
Clipboard import, using utils::read.table() with row.names = FALSE
Google Sheets, as Comma-separated data (.csv)
GraphPad Prism (.pzfx) using pzfx::read_pzfx()
import attempts to standardize the return value from the various import functions to the extent possible, thus providing a uniform data structure regardless of what import package or function is used. It achieves this by storing any optional variable-related attributes at the variable level (i.e., an attribute for mtcars$mpg is stored in attributes(mtcars$mpg) rather than attributes(mtcars)). If you would prefer these attributes to be stored at the data.frame-level (i.e., in attributes(mtcars)), see gather_attrs().
After importing metadata-rich file formats (e.g., from Stata or SPSS), it may be helpful to recode labelled variables to character or factor using characterize() or factorize() respectively.
A data frame. If setclass is used, this data frame may have additional class attribute values, such as “tibble” or “data.table”.
For serialization formats (.R, .RDS, and .RData), please note that you should only load these files from trusted sources. This is because these formats are not necessarily for storing rectangular data and can also be used to store many things, e.g. code. Importing these files could lead to arbitrary code execution. Please read the security principles by the R Project (Plummer, 2024). When importing these files via rio, you should affirm that you trust these files, i.e. trust = TRUE. See example below. If this affirmation is missing, the current version assumes trust to be true for backward compatibility and a deprecation notice will be printed. In the next major release (2.0.0), you must explicitly affirm your trust when importing these files.
For compressed archives (zip and tar, where a compressed file can contain multiple files), it is possible to come to a situation where the parameter which is used twice to indicate two different concepts. For example, it is unclear for .xlsx.zip whether which refers to the selection of an exact file in the archive or the selection of an exact sheet in the decompressed Excel file. In these cases, rio assumes that which is only used for the selection of file. After the selection of file with which, rio will return the first item, e.g. the first sheet.
Please note, however, .gz and .bz2 (e.g. .xlsx.gz) are compressed, but not archive format. In those cases, which is used the same way as the non-compressed format, e.g. selection of sheet for Excel.
For csv and txt files with row names exported from export(), it may be helpful to specify row.names as the column of the table which contains row names. See example below.
Plummer, M (2024). Statement on CVE-2024-27322. https://blog.r-project.org/2024/05/10/statement-on-cve-2024-27322/
import_list(), characterize(), gather_attrs(), export(), convert()
## For demo, a temp. file path is created with the file extension .csv
csv_file <- tempfile(fileext = ".csv")
## .xlsx
xlsx_file <- tempfile(fileext = ".xlsx")
## create CSV to import
export(iris, csv_file)
## specify `format` to override default format: see export()
export(iris, xlsx_file, format = "csv")

## basic
import(csv_file)

## You can certainly import your data with the file name, which is not a variable:
## import("starwars.csv"); import("mtcars.xlsx")

## Override the default format
## import(xlsx_file) # Error, it is actually not an Excel file
import(xlsx_file, format = "csv")

## import CSV as a `data.table`
import(csv_file, setclass = "data.table")

## import CSV as a tibble (or "tbl_df")
import(csv_file, setclass = "tbl_df")

## pass arguments to underlying import function
## data.table::fread is the underlying import function and `nrows` is its argument
import(csv_file, nrows = 20)

## data.table::fread has an argument `data.table` to set the class explicitly to data.table. The
## argument setclass, however, takes precedence over such undocumented features.
class(import(csv_file, setclass = "tibble", data.table = TRUE))

## the default import class can be set with options(rio.import.class = "data.table")
## options(rio.import.class = "tibble"), or options(rio.import.class = "arrow")

## Security
rds_file <- tempfile(fileext = ".rds")
export(iris, rds_file)

## You should only import serialized formats from trusted sources
## In this case, you can trust it because it's generated by you.
import(rds_file, trust = TRUE)
Use import() to import a list of data frames from a vector of file names or from a multi-object file (Excel workbook, .Rdata file, compressed directory in a zip file or tar archive, or HTML file).
import_list(
  file,
  setclass = getOption("rio.import.class", "data.frame"),
  which,
  rbind = FALSE,
  rbind_label = "_file",
  rbind_fill = TRUE,
  ...
)
file |
A character string containing a single file name for a multi-object file (e.g., Excel workbook, zip file, tar archive, or HTML file), or a vector of file paths for multiple files to be imported. |
setclass |
An optional character vector specifying one or more classes to set on the import. By default, the return object is always a “data.frame”. Allowed values include “tbl_df”, “tbl”, or “tibble” (if using tibble), “arrow”, “arrow_table” (if using arrow table; the suggested package |
which |
If |
rbind |
A logical indicating whether to pass the import list of data frames through |
rbind_label |
If |
rbind_fill |
If |
... |
Additional arguments passed to |
When file is a vector of file paths and any files are missing, those files are ignored (with warnings) and this function will not raise any error. For compressed files, the file name must also contain information about the file format of all compressed files, e.g. files.csv.zip for this function to work.
If rbind=FALSE (the default), a list of data frames. Otherwise, that list is passed to data.table::rbindlist() with fill = TRUE and returns a data frame object of class set by the setclass argument; if this operation fails, the list is returned.
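As a hedged illustration of the rbind = TRUE path with a vector of files (the two temporary CSV files are created only for this sketch), the combined result is expected to carry an extra column named by rbind_label, which defaults to "_file" in the usage above:

```r
library(rio)

# Two hypothetical CSV files holding different slices of the same data
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
export(head(mtcars, 3), f1)
export(tail(mtcars, 3), f2)

# Import both files and stack them into a single data frame
combined <- import_list(c(f1, f2), rbind = TRUE)

nrow(combined)
"_file" %in% names(combined)

# clean up the files written above
unlink(c(f1, f2))
```

The label column makes it possible to trace each row back to the file it came from after stacking.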
For serialization formats (.R, .RDS, and .RData), please note that you should only load these files from trusted sources. This is because these formats are not necessarily for storing rectangular data and can also be used to store many things, e.g. code. Importing these files could lead to arbitrary code execution. Please read the security principles by the R Project (Plummer, 2024). When importing these files via rio, you should affirm that you trust these files, i.e. trust = TRUE. See example below. If this affirmation is missing, the current version assumes trust to be true for backward compatibility and a deprecation notice will be printed. In the next major release (2.0.0), you must explicitly affirm your trust when importing these files.
For compressed archives (zip and tar, where a compressed file can contain multiple files), it is possible to come to a situation where the parameter which is used twice to indicate two different concepts. For example, it is unclear for .xlsx.zip whether which refers to the selection of an exact file in the archive or the selection of an exact sheet in the decompressed Excel file. In these cases, rio assumes that which is only used for the selection of file. After the selection of file with which, rio will return the first item, e.g. the first sheet.
Please note, however, .gz and .bz2 (e.g. .xlsx.gz) are compressed, but not archive format. In those cases, which is used the same way as the non-compressed format, e.g. selection of sheet for Excel.
Plummer, M (2024). Statement on CVE-2024-27322. https://blog.r-project.org/2024/05/10/statement-on-cve-2024-27322/
import(), export_list(), export()
## For demo, a temp. file path is created with the file extension .xlsx
xlsx_file <- tempfile(fileext = ".xlsx")
export(
    list(
        mtcars1 = mtcars[1:10, ],
        mtcars2 = mtcars[11:20, ],
        mtcars3 = mtcars[21:32, ]
    ),
    xlsx_file
)

# import a single file from multi-object workbook
import(xlsx_file, sheet = "mtcars1")
# import all worksheets, the return value is a list
import_list(xlsx_file)

# import and rbind all worksheets, the return value is a data frame
import_list(xlsx_file, rbind = TRUE)
Not all suggested packages are installed by default. These packages are not installed or loaded by default in order to create a slimmer and faster package build, install, and load. Use show_unsupported_formats() to check all unsupported formats. install_formats() installs all missing ‘Suggests’ dependencies for rio that expand its support to the full range of supported import and export formats.
install_formats(...)

show_unsupported_formats()
... |
Additional arguments passed to |
For show_unsupported_formats(), if there are any unsupported formats, it returns TRUE invisibly; otherwise FALSE. For install_formats(), it returns TRUE invisibly if the installation is successful; otherwise it errors.
if (interactive()) {
    install_formats()
}
The aim of rio is to make data file input and output as easy as possible. export() and import() serve as a Swiss-army knife for painless data I/O for data from almost any file format by inferring the data structure from the file extension, natively reading web-based data sources, setting reasonable defaults for import and export, and relying on efficient data import and export packages. An additional convenience function, convert(), provides a simple method for converting between file types.
Note that some of rio's functionality is provided by ‘Suggests’ dependencies, meaning they are not installed by default. Use install_formats() to make sure these packages are available for use.
Maintainer: Chung-hong Chan (ORCID)
Authors:
Jason Becker
David Schoch (ORCID)
Thomas J. Leeper (ORCID)
Other contributors:
Geoffrey CH Chan [contributor]
Christopher Gandrud [contributor]
Andrew MacDonald [contributor]
Ista Zahn [contributor]
Stanislaus Stadlmann [contributor]
Ruaridh Williamson [contributor]
Patrick Kennedy [contributor]
Ryan Price [contributor]
Trevor L Davis [contributor]
Nathan Day [contributor]
Bill Denney (ORCID) [contributor]
Alex Bokov (ORCID) [contributor]
Hugo Gruson (ORCID) [contributor]
datamods provides Shiny modules for importing data via rio.
GREA provides an RStudio add-in to import data using rio.
import(), import_list(), export(), export_list(), convert(), install_formats()
# export
library("datasets")
export(mtcars, csv_file <- tempfile(fileext = ".csv")) # comma-separated values
export(mtcars, rds_file <- tempfile(fileext = ".rds")) # R serialized
export(mtcars, sav_file <- tempfile(fileext = ".sav")) # SPSS

# import
x <- import(csv_file)
y <- import(rds_file)
z <- import(sav_file)

# convert sav (SPSS) to dta (Stata)
convert(sav_file, dta_file <- tempfile(fileext = ".dta"))

# cleanup
unlink(c(csv_file, rds_file, sav_file, dta_file))