The core advantage of rio is that it makes assumptions that the user is probably willing to make. Eight of these are important:
rio uses the file extension of a file name to determine what kind of file it is. This is the same logic used by the Windows operating system, for example, in determining what application is associated with a given file type. By removing the need to manually match a file type (which a beginner may not recognize) to a particular import or export function, rio allows almost all common data formats to be read with the same function. And if a file extension is incorrect, users can force a particular import method by specifying the format argument.
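As a minimal sketch of overriding the extension, the example below writes comma-separated data to a temporary file with a made-up extension and then tells import() how to read it. The ".mydata" extension and the use of the built-in mtcars data are assumptions chosen purely for illustration.

```r
library(rio)

# Write CSV content to a file whose (hypothetical) extension says nothing
# useful about its format.
path <- tempfile(fileext = ".mydata")
write.csv(head(mtcars), path, row.names = FALSE)

# Force the CSV import method instead of relying on the file extension.
dat <- import(path, format = "csv")
str(dat)
```

The same format argument is also accepted by export(), so the override works in both directions.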
rio uses data.table::fread() for text-delimited files to automatically determine the file format regardless of the extension. So, a CSV that is actually tab-separated will still be correctly imported. It’s also crazy fast.
rio, wherever possible, does not import character strings as factors.
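The two assumptions above can be illustrated together. This is a sketch, assuming the built-in iris data and a temporary file: it writes tab-separated text under a ".csv" name, imports it without any hints, and then checks how the text column came back.

```r
library(rio)

# A file named ".csv" that is actually tab-separated.
path <- tempfile(fileext = ".csv")
write.table(head(iris), path, sep = "\t", row.names = FALSE, quote = FALSE)

# No delimiter is specified; detection is left to the underlying reader.
dat <- import(path)

# Inspect the result: five separate columns are expected, and the
# Species column should be character rather than factor.
names(dat)
class(dat$Species)
```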
rio supports web-based imports natively, including from SSL (HTTPS) URLs, from shortened URLs, from URLs that lack proper extensions, and from (public) Google Documents Spreadsheets.
rio imports from single-file .zip and .tar archives automatically, without the need to explicitly decompress them. Export to compressed directories is also supported.
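A short round trip through a compressed file might look like the following. This is a sketch that assumes a zip utility is available on the system (R's zip() relies on one) and uses a temporary path and the mtcars data for illustration only.

```r
library(rio)

# The ".csv.zip" suffix asks for a CSV file wrapped in a zip archive.
# Assumption: a command-line zip program is installed and on the PATH.
path <- file.path(tempdir(), "mtcars.csv.zip")
export(mtcars, path)

# Import reads the single file inside the archive directly.
dat <- import(path)
dim(dat)
```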
rio wraps a variety of faster, more streamlined I/O packages than those provided by base R or the foreign package. It uses data.table for delimited formats, haven for SAS, Stata, and SPSS files, smarter and faster fixed-width file import and export routines, and readxl and writexl for reading and writing Excel workbooks.
rio stores metadata from rich file formats (SPSS, Stata, etc.) in variable-level attributes in a consistent form regardless of file type or underlying import function. These attributes are identified as:
label: a description of the variable
labels: a vector mapping numeric values to the character strings those values represent
format: a character string describing the variable storage type in the original file

The gather_attrs() function makes it easy to move variable-level attributes to the data frame level (and spread_attrs() reverses that gathering process). These can be useful, especially during file conversion, to more easily modify attributes that are handled differently across file formats. As an example, the following idiom can be used to trim SPSS value labels to the 32-character maximum allowed by Stata:
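    library(rio)

    # Move variable-level attributes up to the data frame level
    dat <- gather_attrs(rio::import("data.sav"))
    # Truncate the names of each variable's value labels to 32 characters
    attr(dat, "labels") <- lapply(attributes(dat)$labels, function(x) {
        if (!is.null(x)) {
            names(x) <- substring(names(x), 1, 32)
        }
        x
    })
    # Push the attributes back down to the variables and write a Stata file
    export(spread_attrs(dat), "data.dta")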
In addition, two functions (added in v0.5.5) provide easy ways to create character and factor variables from these “labels” attributes. characterize() converts a single variable or all variables in a data frame that have “labels” attributes into character vectors based on the mapping of values to value labels. factorize() does the same but returns factor variables. This can be especially helpful for converting these rich file formats into open formats (e.g., export(characterize(import("file.dta")), "file.csv")).
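To make the two helpers concrete without needing an SPSS or Stata file, the sketch below builds a labelled vector by hand. The variable name rating and its Low/Medium/High labels are invented for illustration; real imports would carry whatever labels the source file defines.

```r
library(rio)

# A numeric vector with a "labels" attribute, mimicking what an import
# from a rich file format would produce (values here are made up).
rating <- structure(c(1, 2, 3, 2),
                    labels = c(Low = 1, Medium = 2, High = 3))

# Replace the numeric codes with their value labels.
rating_chr <- characterize(rating)
rating_fct <- factorize(rating)

# Both functions also accept a data frame and convert every column
# that has a "labels" attribute.
df <- data.frame(id = 1:4)
df$rating <- rating
df_chr <- characterize(df)
```

Columns without a "labels" attribute, such as id here, are left unchanged.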
rio imports and exports files based on an internal S3 class infrastructure. This means that other packages can contain extensions to rio by registering S3 methods. These methods should take the form .import.rio_X() and .export.rio_X(), where X is the file extension of a file type. An example is provided in the rio.db package.