sparkR.session {SparkR}

R Documentation

Get the existing SparkSession or initialize a new SparkSession.

Description

SparkSession is the entry point into SparkR. sparkR.session returns the existing SparkSession if there is one, and otherwise initializes a new SparkSession. Additional Spark properties can be set as named parameters in ...; these take priority over the values given in master, appName, and the named list sparkConfig.
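
The precedence rule above can be illustrated in plain R without starting Spark. This is only a sketch: resolve_conf is a hypothetical helper, not part of SparkR, and it models the documented rule rather than SparkR's internals. The relative order between master/appName and sparkConfig is an assumption of the sketch; the description only states that named properties in ... win.

```r
# Hypothetical helper (not part of SparkR): models the documented precedence,
# where named properties in `...` override master, appName and sparkConfig.
resolve_conf <- function(master = "", appName = "SparkR",
                         sparkConfig = list(), ...) {
  # Start from the dedicated master/appName arguments.
  conf <- list(spark.master = master, spark.app.name = appName)
  # Layer the sparkConfig entries on top (ordering assumed for this sketch).
  conf <- modifyList(conf, sparkConfig)
  # Named properties passed through `...` are applied last, so they win.
  modifyList(conf, list(...))
}

conf <- resolve_conf(
  master = "local[2]",
  sparkConfig = list(spark.executor.memory = "2g"),
  spark.master = "local[4]",
  spark.executor.memory = "4g"
)
str(conf)
```

Here both spark.master and spark.executor.memory end up with the values passed as named properties, while spark.app.name keeps the appName default.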

Usage

sparkR.session(master = "", appName = "SparkR",
  sparkHome = Sys.getenv("SPARK_HOME"), sparkConfig = list(),
  sparkJars = "", sparkPackages = "", enableHiveSupport = TRUE, ...)

Arguments

`master`	the Spark master URL.
`appName`	application name to register with cluster manager.
`sparkHome`	Spark Home directory.
`sparkConfig`	named list of Spark configuration to set on worker nodes.
`sparkJars`	character vector of jar files to pass to the worker nodes.
`sparkPackages`	character vector of Maven package coordinates (groupId:artifactId:version).
`enableHiveSupport`	enable support for Hive; falls back to a session without Hive support if Spark was not built with it. Once set, this cannot be turned off on an existing session.
`...`	named Spark properties passed to the method.
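
As a sketch of how the arguments above fit together, the call below spells them out by name instead of relying on their position. The application name and the memory setting are illustrative values only, and the session is started only when the SparkR package and a SPARK_HOME installation are found, so the snippet is harmless on a machine without Spark.

```r
# Collect the arguments in a plain list so the call is easy to inspect and reuse.
session_args <- list(
  master = "local[2]",
  appName = "SparkR-example",  # illustrative application name
  sparkConfig = list(spark.executor.memory = "2g"),
  enableHiveSupport = FALSE    # request a session without Hive support
)

# Start the session only when SparkR and a Spark installation are available.
if (requireNamespace("SparkR", quietly = TRUE) &&
    nzchar(Sys.getenv("SPARK_HOME"))) {
  library(SparkR)
  do.call(sparkR.session, session_args)
  # Stop the session again once it is no longer needed.
  sparkR.session.stop()
}
```

Because enableHiveSupport cannot be changed on an existing session, the sketch stops the session at the end, so that a later call can choose a different setting.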

Details

For details on how to initialize and use SparkR, refer to the SparkR programming guide at http://spark.apache.org/docs/latest/sparkr.html#starting-up-sparksession.

Note

sparkR.session since 2.0.0

Examples

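## Not run:
# Start a session with the default settings, or reuse the existing one.
sparkR.session()
# `path` must point to a JSON file that Spark can read.
df <- read.json(path)

# Positional arguments: master, appName, sparkHome.
sparkR.session("local[2]", "SparkR", "/home/spark")
# Additionally pass sparkConfig, sparkJars and sparkPackages.
sparkR.session("yarn-client", "SparkR", "/home/spark",
               list(spark.executor.memory="4g"),
               c("one.jar", "two.jar", "three.jar"),
               c("com.databricks:spark-avro_2.10:2.0.1"))
# Set Spark properties through named parameters in `...`.
sparkR.session(spark.master = "yarn-client", spark.executor.memory = "4g")
## End(Not run)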

[Package SparkR version 2.0.1 Index]