read.jdbc {SparkR}                                        R Documentation
Description:

Additional JDBC database connection properties can be set via '...'.
Usage:

read.jdbc(sqlContext, url, tableName, partitionColumn = NULL,
          lowerBound = NULL, upperBound = NULL, numPartitions = 0L,
          predicates = list(), ...)
Arguments:

sqlContext       SQLContext to use

url              JDBC database url of the form 'jdbc:subprotocol:subname'

tableName        the name of the table in the external database

partitionColumn  the name of a column of integral type that will be used for
                 partitioning

lowerBound       the minimum value of 'partitionColumn' used to decide the
                 partition stride

upperBound       the maximum value of 'partitionColumn' used to decide the
                 partition stride

numPartitions    the number of partitions. This, together with 'lowerBound'
                 (inclusive) and 'upperBound' (exclusive), forms the partition
                 strides for the generated WHERE clause expressions used to
                 split the column 'partitionColumn' evenly (see the sketch
                 after this table). Defaults to SparkContext.defaultParallelism
                 when unset.

predicates       a list of conditions in the WHERE clause; each one defines one
                 partition
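The following is a minimal, illustrative sketch (not part of SparkR) of how
'lowerBound', 'upperBound', and 'numPartitions' translate into per-partition
ranges; the actual WHERE clause generation happens inside Spark's JDBC data
source and may differ in detail.

strideBounds <- function(lowerBound, upperBound, numPartitions) {
  # Width of each stride; the first and last partitions are left open-ended,
  # so rows below 'lowerBound' or at/above 'upperBound' are still read.
  stride <- (upperBound - lowerBound) %/% numPartitions
  starts <- lowerBound + stride * seq_len(numPartitions - 1)
  data.frame(partition = seq_len(numPartitions),
             from = c(-Inf, starts),
             to   = c(starts, Inf))
}

strideBounds(0, 10000, 4)
#   partition from   to
# 1         1 -Inf 2500
# 2         2 2500 5000
# 3         3 5000 7500
# 4         4 7500  Inf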
Details:

Only one of 'partitionColumn' or 'predicates' should be set. Partitions of the
table will be retrieved in parallel based on 'numPartitions' or by the
predicates.

Don't create too many partitions in parallel on a large cluster; otherwise
Spark might crash your external database systems.
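As an illustrative sketch (the table and column names are placeholders,
mirroring the examples below), a 'predicates' list with two conditions yields
two partitions, each fetching the rows that match its condition:

## Not run:
##D # Two predicates -> two partitions, read in parallel (placeholder values).
##D df <- read.jdbc(sqlContext, jdbcUrl, "table",
##D                 predicates = list("field <= 123", "field > 123"),
##D                 user = "username")
## End(Not run)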
Value:

SparkDataFrame
Examples:

## Not run:
##D sc <- sparkR.init()
##D sqlContext <- sparkRSQL.init(sc)
##D jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
##D df <- read.jdbc(sqlContext, jdbcUrl, "table",
##D                 predicates = list("field<=123"), user = "username")
##D df2 <- read.jdbc(sqlContext, jdbcUrl, "table2", partitionColumn = "index",
##D                  lowerBound = 0, upperBound = 10000,
##D                  user = "username", password = "password")
## End(Not run)
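A further sketch of passing additional JDBC connection properties through
'...'; the 'driver' property names a JDBC driver class, and the URL, table,
and credentials here are placeholders, not from the original page.

## Not run:
##D # Extra connection properties (e.g. the JDBC driver class) go through '...'.
##D df3 <- read.jdbc(sqlContext, jdbcUrl, "table3", partitionColumn = "index",
##D                  lowerBound = 0, upperBound = 10000, numPartitions = 4L,
##D                  user = "username", password = "password",
##D                  driver = "com.mysql.jdbc.Driver")
## End(Not run)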