Computes the absolute value.
1.3.0
Computes the cosine inverse of the given column; the returned angle is in the range 0.0 through pi.
1.4.0
Computes the cosine inverse of the given value; the returned angle is in the range 0.0 through pi.
1.4.0
Aggregate function: returns the approximate number of distinct items in a group.
1.3.0
Aggregate function: returns the approximate number of distinct items in a group.
1.3.0
Aggregate function: returns the approximate number of distinct items in a group.
1.3.0
Aggregate function: returns the approximate number of distinct items in a group.
1.3.0
Creates a new array column. The input columns must all have the same data type.
1.4.0
Creates a new array column. The input columns must all have the same data type.
1.4.0
Returns a sort expression based on ascending order of the column.
// Sort by dept in ascending order, and then age in descending order.
df.sort(asc("dept"), desc("age"))
1.3.0
Computes the sine inverse of the given column; the returned angle is in the range -pi/2 through pi/2.
1.4.0
Computes the sine inverse of the given value; the returned angle is in the range -pi/2 through pi/2.
1.4.0
Computes the tangent inverse of the given column.
1.4.0
Computes the tangent inverse of the given value.
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).
1.4.0
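The rectangular-to-polar conversion described in the entries above can be illustrated in plain Scala. This is a minimal sketch that uses scala.math rather than Spark columns; the object and method names (PolarSketch, toPolar) are hypothetical and not part of the API documented here.

```scala
object PolarSketch {
  // Convert rectangular (x, y) to polar (r, theta); theta is in radians.
  def toPolar(x: Double, y: Double): (Double, Double) = {
    val r = math.hypot(x, y)      // distance from the origin
    val theta = math.atan2(y, x)  // scala.math.atan2 takes y first, then x
    (r, theta)
  }
}
```

Note that scala.math.atan2 takes the y coordinate as its first argument; the argument order of the overloads documented here is not stated in the text, so it is worth checking before relying on it.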
Aggregate function: returns the average of the values in a group.
1.3.0
Aggregate function: returns the average of the values in a group.
1.3.0
Computes bitwise NOT.
1.4.0
Call a Scala function of 10 arguments as a user-defined function (UDF). This requires you to specify the return data type.
1.3.0
Call a Scala function of 9 arguments as a user-defined function (UDF). This requires you to specify the return data type.
1.3.0
Call a Scala function of 8 arguments as a user-defined function (UDF). This requires you to specify the return data type.
1.3.0
Call a Scala function of 7 arguments as a user-defined function (UDF). This requires you to specify the return data type.
1.3.0
Call a Scala function of 6 arguments as a user-defined function (UDF). This requires you to specify the return data type.
1.3.0
Call a Scala function of 5 arguments as a user-defined function (UDF). This requires you to specify the return data type.
1.3.0
Call a Scala function of 4 arguments as a user-defined function (UDF). This requires you to specify the return data type.
1.3.0
Call a Scala function of 3 arguments as a user-defined function (UDF). This requires you to specify the return data type.
1.3.0
Call a Scala function of 2 arguments as a user-defined function (UDF). This requires you to specify the return data type.
1.3.0
Call a Scala function of 1 argument as a user-defined function (UDF). This requires you to specify the return data type.
1.3.0
Call a Scala function of 0 arguments as a user-defined function (UDF). This requires you to specify the return data type.
1.3.0
Call a user-defined function. Example:
import org.apache.spark.sql._
val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
val sqlContext = df.sqlContext
sqlContext.udf.register("simpleUdf", (v: Int) => v * v)
df.select($"id", callUdf("simpleUdf", $"value"))
1.4.0
Computes the cube-root of the given column.
1.4.0
Computes the cube-root of the given value.
1.4.0
Computes the ceiling of the given column.
1.4.0
Computes the ceiling of the given value.
1.4.0
Returns the first column that is not null.
df.select(coalesce(df("a"), df("b")))
1.3.0
Returns a Column based on the given column name.
1.3.0
Returns a Column based on the given column name.
Computes the cosine of the given column.
1.4.0
Computes the cosine of the given value.
1.4.0
Computes the hyperbolic cosine of the given column.
1.4.0
Computes the hyperbolic cosine of the given value.
1.4.0
Aggregate function: returns the number of items in a group.
1.3.0
Aggregate function: returns the number of items in a group.
1.3.0
Aggregate function: returns the number of distinct items in a group.
1.3.0
Aggregate function: returns the number of distinct items in a group.
1.3.0
Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are at or below the current row.
N = total number of rows in the partition
cumeDist(x) = number of values before (and including) x / N
This is equivalent to the CUME_DIST function in SQL.
1.4.0
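To make the cumulative-distribution formula above concrete, here is a minimal plain-Scala sketch over a single partition of integers. It models only the arithmetic, not Spark's execution; CumeDistSketch and its method are hypothetical names, and ascending ordering of the window is an assumption.

```scala
object CumeDistSketch {
  // For each value x: (number of values <= x) / (total number of rows).
  def cumeDist(partition: Seq[Int]): Seq[Double] = {
    val n = partition.size.toDouble
    partition.map(x => partition.count(_ <= x) / n)
  }
}
```

For the partition 1, 2, 2, 3 this yields 0.25, 0.75, 0.75, 1.0: tied rows share the same value because each one counts every row up to and including its peers.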
Window function: returns the rank of rows within a window partition, without any gaps.
The difference between rank and denseRank is that denseRank leaves no gaps in ranking sequence when there are ties. That is, if you were ranking a competition using denseRank and had three people tie for second place, you would say that all three were in second place and that the next person came in third.
This is equivalent to the DENSE_RANK function in SQL.
1.4.0
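The competition example above can be checked with a small plain-Scala model of the two ranking schemes. This is a sketch only: RankSketch and its methods are hypothetical names, and the input is assumed to be already sorted in the window's order.

```scala
object RankSketch {
  // rank: position of the first row with the same value, so ties leave gaps.
  def rank(sorted: Seq[Int]): Seq[Int] =
    sorted.map(x => sorted.indexOf(x) + 1)

  // denseRank: position among the distinct values, so ties leave no gaps.
  def denseRank(sorted: Seq[Int]): Seq[Int] = {
    val distinctValues = sorted.distinct
    sorted.map(x => distinctValues.indexOf(x) + 1)
  }
}
```

With scores 100, 90, 90, 90, 80 the three tied rows get 2 under both schemes; the last row gets 5 under rank and 3 under denseRank.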
Returns a sort expression based on the descending order of the column.
// Sort by dept in ascending order, and then age in descending order.
df.sort(asc("dept"), desc("age"))
1.3.0
Computes the exponential of the given column.
1.4.0
Computes the exponential of the given value.
1.4.0
Creates a new row for each element in the given array or map column.
Computes the exponential of the given column minus one.
1.4.0
Computes the exponential of the given value minus one.
1.4.0
Aggregate function: returns the first value of a column in a group.
1.3.0
Aggregate function: returns the first value in a group.
1.3.0
Computes the floor of the given column.
1.4.0
Computes the floor of the given value.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
1.4.0
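The phrase "without intermediate overflow or underflow" is easiest to see by comparing a naive formula with scala.math.hypot on extreme inputs. The sketch below is plain Scala, independent of Spark; HypotSketch and its method names are hypothetical.

```scala
object HypotSketch {
  // Naive form: squaring can overflow to Infinity or underflow to 0.0.
  def naive(a: Double, b: Double): Double = math.sqrt(a * a + b * b)

  // math.hypot avoids the intermediate squares leaving the Double range.
  def safe(a: Double, b: Double): Double = math.hypot(a, b)
}
```

For a = 3e200 and b = 4e200 the naive form returns Infinity because a * a exceeds the largest Double, while the hypot form stays finite and close to 5e200.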
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
1.4.0
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
1.4.0
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
1.4.0
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
1.4.0
Aggregate function: returns the last value of the column in a group.
1.3.0
Aggregate function: returns the last value in a group.
1.3.0
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
1.4.0
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
1.4.0
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
1.4.0
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
1.4.0
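The offset semantics of the LAG and LEAD entries can be modelled on an ordinary Scala sequence standing in for one ordered window partition. This is a sketch under assumptions, not Spark's implementation: LagLeadSketch and its methods are hypothetical names, and offset is assumed to be non-negative.

```scala
object LagLeadSketch {
  // Value `offset` rows before each row, or defaultValue when too few rows precede it.
  def lag[A](rows: Seq[A], offset: Int, defaultValue: A): Seq[A] =
    rows.indices.map(i => if (i - offset >= 0) rows(i - offset) else defaultValue)

  // Value `offset` rows after each row, or defaultValue when too few rows follow it.
  def lead[A](rows: Seq[A], offset: Int, defaultValue: A): Seq[A] =
    rows.indices.map(i => if (i + offset < rows.size) rows(i + offset) else defaultValue)
}
```

For rows 10, 20, 30 with an offset of one and a default of 0, lag gives 0, 10, 20 and lead gives 20, 30, 0.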
Creates a Column of literal value.
Computes the natural logarithm of the given column.
1.4.0
Computes the natural logarithm of the given value.
1.4.0
Computes the logarithm of the given value in Base 10.
1.4.0
Computes the logarithm of the given value in Base 10.
1.4.0
Computes the natural logarithm of the given column plus one.
1.4.0
Computes the natural logarithm of the given value plus one.
1.4.0
Converts a string expression to lower case.
1.3.0
Aggregate function: returns the maximum value of the column in a group.
1.3.0
Aggregate function: returns the maximum value of the expression in a group.
1.3.0
Aggregate function: returns the average of the values in a group. Alias for avg.
1.4.0
Aggregate function: returns the average of the values in a group. Alias for avg.
1.4.0
Aggregate function: returns the minimum value of the column in a group.
1.3.0
Aggregate function: returns the minimum value of the expression in a group.
1.3.0
A column expression that generates monotonically increasing 64-bit integers.
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the DataFrame has fewer than 1 billion partitions, and each partition has fewer than 8 billion records.
As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
1.4.0
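The bit layout described above can be reproduced with a shift and an addition. The following plain-Scala sketch only mirrors the arithmetic from the description; MonotonicIdSketch and its id method are hypothetical names, not part of the API.

```scala
object MonotonicIdSketch {
  // Upper 31 bits hold the partition ID; lower 33 bits hold the record number.
  def id(partitionId: Int, recordNumber: Long): Long =
    (partitionId.toLong << 33) + recordNumber
}
```

Applying it to two partitions with three records each reproduces the IDs listed in the example: 0, 1, 2, 8589934592, 8589934593, 8589934594.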
Unary minus, i.e. negate the expression.
// Select the amount column and negate all values.
// Scala:
df.select( -df("amount") )
// Java:
df.select( negate(df.col("amount")) );
1.3.0
Inversion of boolean expression, i.e. NOT.
// Scala: select rows that are not active (isActive === false)
df.filter( !df("isActive") )
// Java:
df.filter( not(df.col("isActive")) );
1.3.0
Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.
This is equivalent to the NTILE function in SQL.
1.4.0
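The quartile example above can be sketched in plain Scala by assigning group ids to the row positions of one ordered partition. NtileSketch is a hypothetical name, and the handling of row counts that do not divide evenly (earlier groups get one extra row, following the usual SQL NTILE convention) is an assumption that the text does not state.

```scala
object NtileSketch {
  // Returns the group id (1 to n) for each of rowCount ordered rows.
  def ntile(rowCount: Int, n: Int): Seq[Int] = {
    val base = rowCount / n   // minimum rows per group
    val extra = rowCount % n  // assumed: the first `extra` groups get one more row
    (1 to n).flatMap { group =>
      val size = if (group <= extra) base + 1 else base
      Seq.fill(size)(group)
    }
  }
}
```

With 8 rows and n = 4, each quarter of the rows receives one group id: 1, 1, 2, 2, 3, 3, 4, 4.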
Window function: returns the relative rank (i.e. percentile) of rows within a window partition.
This is computed by:
(rank of row in its partition - 1) / (number of rows in the partition - 1)
This is equivalent to the PERCENT_RANK function in SQL.
1.4.0
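The relative-rank formula above can be evaluated by hand in plain Scala. This is an illustrative sketch only: PercentRankSketch is a hypothetical name, the input is assumed to be sorted in the window's order, and returning 0.0 for a single-row partition (where the denominator would be zero) is an assumption.

```scala
object PercentRankSketch {
  // (rank of row in its partition - 1) / (number of rows in the partition - 1)
  def percentRank(sorted: Seq[Int]): Seq[Double] = {
    val n = sorted.size
    sorted.map { x =>
      val rank = sorted.indexOf(x) + 1  // ties share the rank of their first row
      if (n == 1) 0.0 else (rank - 1).toDouble / (n - 1)
    }
  }
}
```

For the partition 10, 20, 20, 30 the ranks are 1, 2, 2, 4, so the relative ranks are 0, 1/3, 1/3 and 1.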
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Returns the value of the first argument raised to the power of the second argument.
1.4.0
Generate a random column with i.i.d. samples from U[0.0, 1.0].
1.4.0
Generate a random column with i.i.d. samples from U[0.0, 1.0].
1.4.0
Generate a column with i.i.d. samples from the standard normal distribution.
1.4.0
Generate a column with i.i.d. samples from the standard normal distribution.
1.4.0
Window function: returns the rank of rows within a window partition.
The difference between rank and denseRank is that denseRank leaves no gaps in ranking sequence when there are ties. That is, if you were ranking a competition using denseRank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. With rank, the next person would instead come in fifth.
This is equivalent to the RANK function in SQL.
1.4.0
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
1.4.0
Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
1.4.0
Window function: returns a sequential number starting at 1 within a window partition.
This is equivalent to the ROW_NUMBER function in SQL.
1.4.0
Computes the signum of the given column.
1.4.0
Computes the signum of the given value.
1.4.0
Computes the sine of the given column.
1.4.0
Computes the sine of the given value.
1.4.0
Computes the hyperbolic sine of the given column.
1.4.0
Computes the hyperbolic sine of the given value.
1.4.0
Partition ID of the Spark task.
Note that this is non-deterministic because it depends on data partitioning and task scheduling.
1.4.0
Computes the square root of the specified float value.
1.3.0
Creates a new struct column that composes multiple input columns.
1.4.0
Creates a new struct column. The input column must be a column in a DataFrame, or a derived column expression that is named (i.e. aliased).
1.4.0
Aggregate function: returns the sum of all values in the given column.
1.3.0
Aggregate function: returns the sum of all values in the expression.
1.3.0
Aggregate function: returns the sum of distinct values in the expression.
1.3.0
Aggregate function: returns the sum of distinct values in the expression.
1.3.0
Computes the tangent of the given column.
1.4.0
Computes the tangent of the given value.
1.4.0
Computes the hyperbolic tangent of the given column.
1.4.0
Computes the hyperbolic tangent of the given value.
1.4.0
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
1.4.0
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
1.4.0
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
1.4.0
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
1.4.0
Defines a user-defined function (UDF) of 10 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 9 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 8 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 7 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 6 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 5 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 4 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 3 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 2 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 1 argument. The data types are automatically inferred based on the function's signature.
1.3.0
Defines a user-defined function (UDF) of 0 arguments. The data types are automatically inferred based on the function's signature.
1.3.0
Converts a string expression to upper case.
1.3.0
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
// Example: encoding gender string column into integer.
// Scala:
people.select(when(people("gender") === "male", 0)
  .when(people("gender") === "female", 1)
  .otherwise(2))
// Java:
people.select(when(col("gender").equalTo("male"), 0)
  .when(col("gender").equalTo("female"), 1)
  .otherwise(2))
1.4.0
:: Experimental :: Functions available for DataFrame.
1.3.0