R: Load a SparkDataFrame

read.df {SparkR}

R Documentation

Load a SparkDataFrame

Description

Returns the dataset in a data source as a SparkDataFrame.

Usage

## Default S3 method:
read.df(path = NULL, source = NULL, schema = NULL,
  na.strings = "NA", ...)

## Default S3 method:
loadDF(path = NULL, source = NULL, schema = NULL,
  ...)

Arguments

`path`	The path of the files to load.
`source`	The name of the external data source.
`schema`	The data schema, defined as a structType or a DDL-formatted string.
`na.strings`	The string value interpreted as NA when source is "csv". Defaults to "NA".
`...`	Additional named properties specific to the external data source.

Details

The data source is specified by source and a set of options (...). If source is not specified, the default data source configured by "spark.sql.sources.default" is used.
Similar to R's read.csv, when source is "csv", a value of "NA" is interpreted as NA by default.
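
The two behaviors described above can be sketched as follows. This is a minimal illustration, not part of the original examples: it assumes a local Spark installation where the R temporary directory is readable by Spark, and the sample data and the names csv_path, out_path, df_default, df_custom and df_roundtrip are hypothetical.

```r
library(SparkR)
sparkR.session()

# Hypothetical sample data, written to a temporary file for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "alice,1.5", "bob,NA", "carol,missing"), csv_path)

# Default na.strings = "NA": the "NA" field in bob's row is read as missing.
df_default <- read.df(csv_path, source = "csv", header = "true")

# Custom na.strings: "missing" is read as missing, "NA" stays a plain string.
df_custom <- read.df(csv_path, source = "csv", header = "true",
                     na.strings = "missing")

# No source given: both calls fall back to "spark.sql.sources.default".
out_path <- tempfile()
write.df(df_default, path = out_path)
df_roundtrip <- read.df(out_path)
```

Calling head() on df_default and df_custom should show which fields were treated as missing in each case.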

Value

SparkDataFrame

Note

read.df since 1.4.0

loadDF since 1.6.0

Examples

## Not run: 
sparkR.session()
df1 <- read.df("path/to/file.json", source = "json")
schema <- structType(structField("name", "string"),
                     structField("info", "map<string,double>"))
# mapTypeJsonPath is assumed to hold the path to a JSON file; it is not defined here.
df2 <- read.df(mapTypeJsonPath, "json", schema, multiLine = TRUE)
df3 <- loadDF("data/test_table", "parquet", mergeSchema = "true")
stringSchema <- "name STRING, info MAP<STRING, DOUBLE>"
df4 <- read.df(mapTypeJsonPath, "json", stringSchema, multiLine = TRUE)
## End(Not run)

[Package SparkR version 2.3.3 Index]