pyspark.sql.Catalog

class pyspark.sql.Catalog(sparkSession)

User-facing catalog API, accessible through SparkSession.catalog.

This is a thin wrapper around its Scala implementation org.apache.spark.sql.catalog.Catalog.
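
All of the examples below go through this attribute; a minimal setup sketch, assuming a local PySpark installation:

>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()  # the session assumed by every sketch below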

Methods

cacheTable(tableName)

Caches the specified table in-memory.
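
A minimal sketch, assuming the session above and a hypothetical view name:

>>> spark.range(10).createOrReplaceTempView("people")
>>> spark.catalog.cacheTable("people")  # lazy: data is materialized on the first action
>>> spark.catalog.isCached("people")
True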

clearCache()

Removes all cached tables from the in-memory cache.
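
Continuing the sketch above:

>>> spark.catalog.clearCache()  # uncaches every table and view in one call
>>> spark.catalog.isCached("people")
False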

createExternalTable(tableName[, path, …])

Creates an external table based on the dataset in a data source. Deprecated; use createTable() instead (see the sketch there).

createTable(tableName[, path, source, …])

Creates a table based on the dataset in a data source.
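
A sketch, assuming parquet data already written under a hypothetical path; the table name is also hypothetical:

>>> spark.range(5).write.mode("overwrite").parquet("/tmp/my_table_data")
>>> df = spark.catalog.createTable("my_table", path="/tmp/my_table_data", source="parquet")
>>> df.count()
5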

currentDatabase()

Returns the current default database in this session.
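
In a fresh session this is the built-in default database:

>>> spark.catalog.currentDatabase()
'default'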

dropGlobalTempView(viewName)

Drops the global temporary view with the given view name in the catalog.
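
Global temporary views live in the global_temp database and are shared across sessions; a minimal sketch with a hypothetical view name:

>>> spark.range(1).createGlobalTempView("my_global")
>>> spark.table("global_temp.my_global").count()
1
>>> spark.catalog.dropGlobalTempView("my_global")  # the view is gone for every session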

dropTempView(viewName)

Drops the local temporary view with the given view name in the catalog.
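
A minimal sketch with a hypothetical view name:

>>> spark.range(1).createOrReplaceTempView("my_view")
>>> spark.catalog.dropTempView("my_view")  # also uncaches the view if it was cached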

isCached(tableName)

Returns true if the table is currently cached in-memory.
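
For example:

>>> spark.range(3).createOrReplaceTempView("nums")
>>> spark.catalog.isCached("nums")
False
>>> spark.catalog.cacheTable("nums")
>>> spark.catalog.isCached("nums")
True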

listColumns(tableName[, dbName])

Returns a list of columns for the given table/view in the specified database. If no database is specified, the current database is used.
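
A sketch; each entry is a row-like object with fields such as name, dataType, and nullable:

>>> spark.range(1).selectExpr("id", "id * 2 AS doubled").createOrReplaceTempView("pairs")
>>> [c.name for c in spark.catalog.listColumns("pairs")]
['id', 'doubled']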

listDatabases()

Returns a list of databases available across all sessions.
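
In a fresh session only the default database exists:

>>> [db.name for db in spark.catalog.listDatabases()]
['default']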

listFunctions([dbName])

Returns a list of functions registered in the specified database. If no database is specified, the current database is used.
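
The result includes built-in functions as well as registered UDFs:

>>> "abs" in [f.name for f in spark.catalog.listFunctions()]
True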

listTables([dbName])

Returns a list of tables/views in the specified database (the current database if none is given), including temporary views.
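
Temporary views are listed alongside catalog tables:

>>> spark.range(1).createOrReplaceTempView("temp_tbl")
>>> "temp_tbl" in [t.name for t in spark.catalog.listTables()]
True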

recoverPartitions(tableName)

Recovers all the partitions of the given table and updates the catalog.
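
A sketch, assuming a hypothetical partitioned data-source table whose partition directories were added directly on the filesystem:

>>> spark.catalog.recoverPartitions("parted_events")  # scans the table location and registers any new partitions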

refreshByPath(path)

Invalidates and refreshes all the cached data (and the associated metadata) for any DataFrame that contains the given data source path.
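
A sketch with a hypothetical path:

>>> spark.catalog.refreshByPath("/data/events")  # affected DataFrames re-read the files on their next action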

refreshTable(tableName)

Invalidates and refreshes all the cached data and metadata of the given table.
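
A sketch with a hypothetical table name:

>>> spark.catalog.refreshTable("my_table")  # invalidation is immediate; reloading happens lazily on next access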

registerFunction(name, f[, returnType])

An alias for spark.udf.register(); deprecated since 2.3.0 in favor of calling spark.udf.register() directly.
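
Because the alias is deprecated, the equivalent call goes through spark.udf.register() directly (the function name here is hypothetical):

>>> from pyspark.sql.types import IntegerType
>>> plus_one = spark.udf.register("plus_one", lambda x: x + 1, IntegerType())
>>> spark.sql("SELECT plus_one(41)").first()[0]
42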

setCurrentDatabase(dbName)

Sets the current default database in this session.
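
A sketch with a hypothetical database name:

>>> _ = spark.sql("CREATE DATABASE IF NOT EXISTS analytics")  # spark.sql returns a DataFrame, ignored here
>>> spark.catalog.setCurrentDatabase("analytics")
>>> spark.catalog.currentDatabase()
'analytics'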

uncacheTable(tableName)

Removes the specified table from the in-memory cache.
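
Continuing the cacheTable() sketch:

>>> spark.catalog.cacheTable("people")
>>> spark.catalog.uncacheTable("people")
>>> spark.catalog.isCached("people")
False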