public class SQLContext
extends Object
implements scala.Serializable
As of Spark 2.0, this is replaced by SparkSession. However, we are keeping the class here for backward compatibility.
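For new code, the replacement entry point is SparkSession; a minimal migration sketch in Java (the master URL, app name, and input path below are placeholders):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public class SQLContextMigration {
  public static void main(String[] args) {
    // SparkSession.builder() replaces the SQLContext constructors since Spark 2.0.
    SparkSession spark = SparkSession.builder()
        .master("local[*]")                  // placeholder master
        .appName("sqlcontext-migration")     // placeholder app name
        .getOrCreate();

    Dataset<Row> people = spark.read().json("examples/people.json");  // placeholder path
    people.createOrReplaceTempView("people");
    spark.sql("SELECT * FROM people").show();

    // Legacy code that still expects a SQLContext can obtain one from the session.
    SQLContext legacy = spark.sqlContext();
    spark.stop();
  }
}
```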
| Modifier and Type | Class and Description |
|---|---|
| class | SQLContext.implicits$ :: Experimental :: (Scala-specific) Implicit methods available in Scala for converting common Scala objects into DataFrames. |
| Constructor and Description |
|---|
| SQLContext(JavaSparkContext sparkContext) Deprecated. Use SparkSession.builder instead. Since 2.0.0. |
| SQLContext(SparkContext sc) Deprecated. Use SparkSession.builder instead. Since 2.0.0. |
| Modifier and Type | Method and Description |
|---|---|
| Dataset<Row> | applySchema(JavaRDD<?> rdd, Class<?> beanClass) Deprecated. Use createDataFrame instead. Since 1.3.0. |
| Dataset<Row> | applySchema(JavaRDD<Row> rowRDD, StructType schema) Deprecated. Use createDataFrame instead. Since 1.3.0. |
| Dataset<Row> | applySchema(RDD<?> rdd, Class<?> beanClass) Deprecated. Use createDataFrame instead. Since 1.3.0. |
| Dataset<Row> | applySchema(RDD<Row> rowRDD, StructType schema) Deprecated. Use createDataFrame instead. Since 1.3.0. |
| Dataset<Row> | baseRelationToDataFrame(BaseRelation baseRelation) |
| void | cacheTable(String tableName) Caches the specified table in-memory. |
| static void | clearActive() Deprecated. Use SparkSession.clearActiveSession instead. Since 2.0.0. |
| void | clearCache() Removes all cached tables from the in-memory cache. |
| Dataset<Row> | createDataFrame(JavaRDD<?> rdd, Class<?> beanClass) |
| Dataset<Row> | createDataFrame(JavaRDD<Row> rowRDD, StructType schema) |
| Dataset<Row> | createDataFrame(java.util.List<?> data, Class<?> beanClass) |
| Dataset<Row> | createDataFrame(java.util.List<Row> rows, StructType schema) |
| Dataset<Row> | createDataFrame(RDD<?> rdd, Class<?> beanClass) |
| <A extends scala.Product> Dataset<Row> | createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$1) |
| Dataset<Row> | createDataFrame(RDD<Row> rowRDD, StructType schema) |
| <A extends scala.Product> Dataset<Row> | createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$2) |
| <T> Dataset<T> | createDataset(java.util.List<T> data, Encoder<T> evidence$5) |
| <T> Dataset<T> | createDataset(RDD<T> data, Encoder<T> evidence$4) |
| <T> Dataset<T> | createDataset(scala.collection.Seq<T> data, Encoder<T> evidence$3) |
| Dataset<Row> | createExternalTable(String tableName, String path) |
| Dataset<Row> | createExternalTable(String tableName, String source, java.util.Map<String,String> options) |
| Dataset<Row> | createExternalTable(String tableName, String source, scala.collection.immutable.Map<String,String> options) |
| Dataset<Row> | createExternalTable(String tableName, String path, String source) |
| Dataset<Row> | createExternalTable(String tableName, String source, StructType schema, java.util.Map<String,String> options) |
| Dataset<Row> | createExternalTable(String tableName, String source, StructType schema, scala.collection.immutable.Map<String,String> options) |
| void | dropTempTable(String tableName) |
| Dataset<Row> | emptyDataFrame() Returns a DataFrame with no rows or columns. |
| ExperimentalMethods | experimental() :: Experimental :: A collection of methods that are considered experimental, but can be used to hook into the query planner for advanced functionality. |
| scala.collection.immutable.Map<String,String> | getAllConfs() Return all the configuration properties that have been set (i.e. not the default). |
| String | getConf(String key) Return the value of Spark SQL configuration property for the given key. |
| String | getConf(String key, String defaultValue) Return the value of Spark SQL configuration property for the given key. |
| static SQLContext | getOrCreate(SparkContext sparkContext) Deprecated. Use SparkSession.builder instead. Since 2.0.0. |
| SQLContext.implicits$ | implicits() Accessor for nested Scala object. |
| boolean | isCached(String tableName) Returns true if the table is currently cached in-memory. |
| Dataset<Row> | jdbc(String url, String table) Deprecated. As of 1.4.0, replaced by read().jdbc(). |
| Dataset<Row> | jdbc(String url, String table, String[] theParts) Deprecated. As of 1.4.0, replaced by read().jdbc(). |
| Dataset<Row> | jdbc(String url, String table, String columnName, long lowerBound, long upperBound, int numPartitions) Deprecated. As of 1.4.0, replaced by read().jdbc(). |
| Dataset<Row> | jsonFile(String path) Deprecated. As of 1.4.0, replaced by read().json(). |
| Dataset<Row> | jsonFile(String path, double samplingRatio) Deprecated. As of 1.4.0, replaced by read().json(). |
| Dataset<Row> | jsonFile(String path, StructType schema) Deprecated. As of 1.4.0, replaced by read().json(). |
| Dataset<Row> | jsonRDD(JavaRDD<String> json) Deprecated. As of 1.4.0, replaced by read().json(). |
| Dataset<Row> | jsonRDD(JavaRDD<String> json, double samplingRatio) Deprecated. As of 1.4.0, replaced by read().json(). |
| Dataset<Row> | jsonRDD(JavaRDD<String> json, StructType schema) Deprecated. As of 1.4.0, replaced by read().json(). |
| Dataset<Row> | jsonRDD(RDD<String> json) Deprecated. As of 1.4.0, replaced by read().json(). |
| Dataset<Row> | jsonRDD(RDD<String> json, double samplingRatio) Deprecated. As of 1.4.0, replaced by read().json(). |
| Dataset<Row> | jsonRDD(RDD<String> json, StructType schema) Deprecated. As of 1.4.0, replaced by read().json(). |
| ExecutionListenerManager | listenerManager() An interface to register custom QueryExecutionListeners that listen for execution metrics. |
| Dataset<Row> | load(String path) Deprecated. As of 1.4.0, replaced by read().load(path). |
| Dataset<Row> | load(String source, java.util.Map<String,String> options) Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load(). |
| Dataset<Row> | load(String source, scala.collection.immutable.Map<String,String> options) Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load(). |
| Dataset<Row> | load(String path, String source) Deprecated. As of 1.4.0, replaced by read().format(source).load(path). |
| Dataset<Row> | load(String source, StructType schema, java.util.Map<String,String> options) Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load(). |
| Dataset<Row> | load(String source, StructType schema, scala.collection.immutable.Map<String,String> options) Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load(). |
| SQLContext | newSession() Returns a SQLContext as new session, with separated SQL configurations, temporary tables, registered functions, but sharing the same SparkContext, cached data and other things. |
| Dataset<Row> | parquetFile(scala.collection.Seq<String> paths) Deprecated. Use read.parquet() instead. Since 1.4.0. |
| Dataset<Row> | parquetFile(String... paths) Deprecated. As of 1.4.0, replaced by read().parquet(). |
| Dataset<Row> | range(long end) |
| Dataset<Row> | range(long start, long end) |
| Dataset<Row> | range(long start, long end, long step) |
| Dataset<Row> | range(long start, long end, long step, int numPartitions) |
| DataFrameReader | read() |
| DataStreamReader | readStream() |
| static void | setActive(SQLContext sqlContext) Deprecated. Use SparkSession.setActiveSession instead. Since 2.0.0. |
| void | setConf(java.util.Properties props) Set Spark SQL configuration properties. |
| void | setConf(String key, String value) Set the given Spark SQL configuration property. |
| SparkContext | sparkContext() |
| SparkSession | sparkSession() |
| Dataset<Row> | sql(String sqlText) |
| StreamingQueryManager | streams() |
| Dataset<Row> | table(String tableName) |
| String[] | tableNames() |
| String[] | tableNames(String databaseName) |
| Dataset<Row> | tables() |
| Dataset<Row> | tables(String databaseName) |
| UDFRegistration | udf() A collection of methods for registering user-defined functions (UDF). |
| void | uncacheTable(String tableName) Removes the specified table from the in-memory cache. |
public SQLContext(SparkContext sc)
public SQLContext(JavaSparkContext sparkContext)
public static SQLContext getOrCreate(SparkContext sparkContext)
This function can be used to create a singleton SQLContext object that can be shared across the JVM.
If there is an active SQLContext for the current thread, it will be returned instead of the global one.
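A brief usage sketch, assuming an existing SparkContext named sc (the method itself is deprecated in favor of SparkSession.builder):

```java
// Returns the thread's active SQLContext if one was set, otherwise a JVM-wide instance.
SQLContext ctx = SQLContext.getOrCreate(sc);

// Repeated calls from the same thread normally hand back the same instance.
SQLContext again = SQLContext.getOrCreate(sc);
```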
Parameters: sparkContext - (undocumented)

public static void setActive(SQLContext sqlContext)
Parameters: sqlContext - (undocumented)

public static void clearActive()
public Dataset<Row> parquetFile(String... paths)
Deprecated. As of 1.4.0, replaced by read().parquet().
Loads a Parquet file, returning the result as a DataFrame. This function returns an empty DataFrame if no paths are passed in.
Parameters: paths - (undocumented)

public SparkSession sparkSession()
public SparkContext sparkContext()
public SQLContext newSession()
Returns a SQLContext as new session, with separated SQL configurations, temporary tables, registered functions, but sharing the same SparkContext, cached data and other things.
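A short sketch of the session isolation described above, assuming an existing SQLContext named sqlContext (the configuration key is only illustrative):

```java
// Both sessions share the underlying SparkContext and cached data.
SQLContext session1 = sqlContext.newSession();
SQLContext session2 = sqlContext.newSession();

// SQL configurations are per session: the change below is invisible to session2.
session1.setConf("spark.sql.shuffle.partitions", "10");
String inSession1 = session1.getConf("spark.sql.shuffle.partitions");          // "10"
String inSession2 = session2.getConf("spark.sql.shuffle.partitions", "200");   // falls back to the default
```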
public ExecutionListenerManager listenerManager()
An interface to register custom QueryExecutionListeners that listen for execution metrics.

public void setConf(java.util.Properties props)
Parameters: props - (undocumented)

public void setConf(String key, String value)
Parameters: key - (undocumented), value - (undocumented)

public String getConf(String key)
Parameters: key - (undocumented)

public String getConf(String key, String defaultValue)
Return the value of Spark SQL configuration property for the given key. If the key is not set yet, return defaultValue.
Parameters: key - (undocumented), defaultValue - (undocumented)

public scala.collection.immutable.Map<String,String> getAllConfs()
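Taken together, the configuration accessors above can be used as sketched below, assuming an existing SQLContext named sqlContext (the property values are illustrative):

```java
import java.util.Properties;

// Set and read back a single Spark SQL property.
sqlContext.setConf("spark.sql.shuffle.partitions", "50");
String partitions = sqlContext.getConf("spark.sql.shuffle.partitions");

// Supply a fallback for a key that may not have been set explicitly.
String threshold = sqlContext.getConf("spark.sql.autoBroadcastJoinThreshold", "10485760");

// Bulk-set several properties at once from java.util.Properties.
Properties props = new Properties();
props.setProperty("spark.sql.shuffle.partitions", "200");
sqlContext.setConf(props);

// Snapshot of everything that has been set explicitly, as an immutable Scala map.
scala.collection.immutable.Map<String, String> all = sqlContext.getAllConfs();
```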
public ExperimentalMethods experimental()
public Dataset<Row> emptyDataFrame()
Returns a DataFrame with no rows or columns.
public UDFRegistration udf()
The following example registers a Scala closure as UDF:

```scala
sqlContext.udf.register("myUDF", (arg1: Int, arg2: String) => arg2 + arg1)
```

The following example registers a UDF in Java:

```java
sqlContext.udf().register("myUDF",
    new UDF2<Integer, String, String>() {
      @Override
      public String call(Integer arg1, String arg2) {
        return arg2 + arg1;
      }
    }, DataTypes.StringType);
```

Or, to use Java 8 lambda syntax:

```java
sqlContext.udf().register("myUDF",
    (Integer arg1, String arg2) -> arg2 + arg1,
    DataTypes.StringType);
```
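Once registered, the function can be called by name from SQL; a small follow-up sketch (the view and column names are placeholders):

```java
// Assumes a temporary view "people" with an integer column "age" and a string column "name".
Dataset<Row> withUdf = sqlContext.sql("SELECT myUDF(age, name) AS tagged FROM people");
withUdf.show();
```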
public boolean isCached(String tableName)
Parameters: tableName - (undocumented)

public void cacheTable(String tableName)
Parameters: tableName - (undocumented)

public void uncacheTable(String tableName)
Parameters: tableName - (undocumented)

public void clearCache()
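A minimal sketch of the caching helpers above, assuming a temporary table named "people" is already registered on an existing SQLContext named sqlContext:

```java
sqlContext.cacheTable("people");                  // pin the table's data in memory
boolean cached = sqlContext.isCached("people");   // true once the table is cached

sqlContext.uncacheTable("people");                // evict just this table
sqlContext.clearCache();                          // or evict every cached table
```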
public SQLContext.implicits$ implicits()
public <A extends scala.Product> Dataset<Row> createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$1)
public <A extends scala.Product> Dataset<Row> createDataFrame(scala.collection.Seq<A> data, scala.reflect.api.TypeTags.TypeTag<A> evidence$2)
public Dataset<Row> baseRelationToDataFrame(BaseRelation baseRelation)
public Dataset<Row> createDataFrame(RDD<Row> rowRDD, StructType schema)
public <T> Dataset<T> createDataset(scala.collection.Seq<T> data, Encoder<T> evidence$3)
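The createDataset overloads pair the input data with an Encoder; a minimal Java sketch using the java.util.List variant from the method summary (sqlContext is an existing SQLContext):

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;

// Encoders provides ready-made encoders for common JVM types.
Dataset<String> letters =
    sqlContext.createDataset(Arrays.asList("a", "b", "c"), Encoders.STRING());
letters.show();
```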
public Dataset<Row> createDataFrame(JavaRDD<Row> rowRDD, StructType schema)
public Dataset<Row> createDataFrame(java.util.List<Row> rows, StructType schema)
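For the row-plus-schema overload directly above, a minimal Java sketch (field names and values are arbitrary; sqlContext is an existing SQLContext):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Declare the schema explicitly: one string column and one integer column.
StructType schema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("name", DataTypes.StringType, false),
    DataTypes.createStructField("age", DataTypes.IntegerType, false)
});

List<Row> rows = Arrays.asList(
    RowFactory.create("Alice", 29),
    RowFactory.create("Bob", 31));

Dataset<Row> people = sqlContext.createDataFrame(rows, schema);
people.createOrReplaceTempView("people");
```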
public DataFrameReader read()
public DataStreamReader readStream()
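A short sketch of the reader entry points, which replace the deprecated jsonFile, parquetFile, and load methods detailed below (paths and formats are placeholders; sqlContext is an existing SQLContext):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Batch reads via read().
Dataset<Row> logs = sqlContext.read().json("data/logs.json");
Dataset<Row> events = sqlContext.read()
    .format("csv")
    .option("header", "true")
    .load("data/events.csv");

// Streaming reads via readStream() build a streaming Dataset instead.
Dataset<Row> incoming = sqlContext.readStream()
    .format("text")
    .load("data/incoming/");
```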
public Dataset<Row> createExternalTable(String tableName, String path, String source)
public Dataset<Row> createExternalTable(String tableName, String source, java.util.Map<String,String> options)
public Dataset<Row> createExternalTable(String tableName, String source, scala.collection.immutable.Map<String,String> options)
public Dataset<Row> createExternalTable(String tableName, String source, StructType schema, java.util.Map<String,String> options)
public Dataset<Row> createExternalTable(String tableName, String source, StructType schema, scala.collection.immutable.Map<String,String> options)
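A hedged sketch of the (tableName, source, options) overload above; the table name, format, and path are placeholders, and sqlContext is an existing SQLContext:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Register a catalog table backed by existing files rather than managed storage.
Map<String, String> options = new HashMap<>();
options.put("path", "/data/events/parquet");   // illustrative file location

Dataset<Row> external = sqlContext.createExternalTable("ext_events", "parquet", options);
sqlContext.sql("SELECT COUNT(*) FROM ext_events").show();
```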
public void dropTempTable(String tableName)
public StreamingQueryManager streams()
public String[] tableNames()
public String[] tableNames(String databaseName)
public Dataset<Row> applySchema(RDD<Row> rowRDD, StructType schema)
public Dataset<Row> applySchema(JavaRDD<Row> rowRDD, StructType schema)
public Dataset<Row> applySchema(RDD<?> rdd, Class<?> beanClass)
public Dataset<Row> applySchema(JavaRDD<?> rdd, Class<?> beanClass)
public Dataset<Row> parquetFile(scala.collection.Seq<String> paths)
public Dataset<Row> jsonFile(String path)
Deprecated. As of 1.4.0, replaced by read().json().
Loads a JSON file, returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
Parameters: path - (undocumented)

public Dataset<Row> jsonFile(String path, StructType schema)
Deprecated. As of 1.4.0, replaced by read().json().
Loads a JSON file and applies the given schema, returning the result as a DataFrame.
Parameters: path - (undocumented), schema - (undocumented)

public Dataset<Row> jsonFile(String path, double samplingRatio)
Deprecated. As of 1.4.0, replaced by read().json().
Parameters: path - (undocumented), samplingRatio - (undocumented)

public Dataset<Row> jsonRDD(RDD<String> json)
Deprecated. As of 1.4.0, replaced by read().json().
Loads an RDD[String] storing JSON objects, returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
Parameters: json - (undocumented)

public Dataset<Row> jsonRDD(JavaRDD<String> json)
Deprecated. As of 1.4.0, replaced by read().json().
Loads an RDD[String] storing JSON objects, returning the result as a DataFrame. It goes through the entire dataset once to determine the schema.
Parameters: json - (undocumented)

public Dataset<Row> jsonRDD(RDD<String> json, StructType schema)
Deprecated. As of 1.4.0, replaced by read().json().
Loads an RDD[String] storing JSON objects and applies the given schema, returning the result as a DataFrame.
Parameters: json - (undocumented), schema - (undocumented)

public Dataset<Row> jsonRDD(JavaRDD<String> json, StructType schema)
Deprecated. As of 1.4.0, replaced by read().json().
Loads an RDD[String] storing JSON objects and applies the given schema, returning the result as a DataFrame.
Parameters: json - (undocumented), schema - (undocumented)

public Dataset<Row> jsonRDD(RDD<String> json, double samplingRatio)
Deprecated. As of 1.4.0, replaced by read().json().
Loads an RDD[String] storing JSON objects, returning the result as a DataFrame.
Parameters: json - (undocumented), samplingRatio - (undocumented)

public Dataset<Row> jsonRDD(JavaRDD<String> json, double samplingRatio)
Deprecated. As of 1.4.0, replaced by read().json().
Loads an RDD[String] storing JSON objects, returning the result as a DataFrame.
Parameters: json - (undocumented), samplingRatio - (undocumented)

public Dataset<Row> load(String path)
Deprecated. As of 1.4.0, replaced by read().load(path).
Parameters: path - (undocumented)

public Dataset<Row> load(String path, String source)
Deprecated. As of 1.4.0, replaced by read().format(source).load(path).
Parameters: path - (undocumented), source - (undocumented)

public Dataset<Row> load(String source, java.util.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().
Parameters: source - (undocumented), options - (undocumented)

public Dataset<Row> load(String source, scala.collection.immutable.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load().
Parameters: source - (undocumented), options - (undocumented)

public Dataset<Row> load(String source, StructType schema, java.util.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
Parameters: source - (undocumented), schema - (undocumented), options - (undocumented)

public Dataset<Row> load(String source, StructType schema, scala.collection.immutable.Map<String,String> options)
Deprecated. As of 1.4.0, replaced by read().format(source).schema(schema).options(options).load().
Parameters: source - (undocumented), schema - (undocumented), options - (undocumented)

public Dataset<Row> jdbc(String url, String table)
Deprecated. As of 1.4.0, replaced by read().jdbc().
Construct a DataFrame representing the database table accessible via JDBC URL url named table.
Parameters: url - (undocumented), table - (undocumented)

public Dataset<Row> jdbc(String url, String table, String columnName, long lowerBound, long upperBound, int numPartitions)
Deprecated. As of 1.4.0, replaced by read().jdbc().
Construct a DataFrame representing the database table accessible via JDBC URL url named table. Partitions of the table will be retrieved in parallel based on the parameters passed to this function.
Parameters:
columnName - the name of a column of integral type that will be used for partitioning
lowerBound - the minimum value of columnName used to decide partition stride
upperBound - the maximum value of columnName used to decide partition stride
numPartitions - the number of partitions; the range minValue-maxValue will be split evenly into this many partitions
url - (undocumented)
table - (undocumented)

public Dataset<Row> jdbc(String url, String table, String[] theParts)
Deprecated. As of 1.4.0, replaced by read().jdbc().
Construct a DataFrame representing the database table accessible via JDBC URL url named table. The theParts parameter gives a list of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the DataFrame.
Parameters: url - (undocumented), table - (undocumented), theParts - (undocumented)
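Because all three jdbc overloads are deprecated, the same reads are normally expressed through read().jdbc(); a hedged sketch with placeholder URL, table, credentials, and partitioning values (sqlContext is an existing SQLContext):

```java
import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Properties connProps = new Properties();
connProps.setProperty("user", "reporting");       // placeholder credentials
connProps.setProperty("password", "secret");

String url = "jdbc:postgresql://db.example.com:5432/sales";   // placeholder JDBC URL

// Whole-table read in a single partition.
Dataset<Row> orders = sqlContext.read().jdbc(url, "orders", connProps);

// Parallel read partitioned on an integral column between lowerBound and upperBound.
Dataset<Row> ordersParallel = sqlContext.read()
    .jdbc(url, "orders", "order_id", 1L, 1000000L, 8, connProps);

// Predicate-based partitioning, mirroring the theParts overload above.
String[] predicates = { "region = 'EU'", "region = 'US'" };
Dataset<Row> byRegion = sqlContext.read().jdbc(url, "orders", predicates, connProps);
```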