A data type that can be accumulated, i.e. has a commutative and associative "add" operation, but where the result type, R, may be different from the element type being added, T.
Create an Accumulable shared variable of the given type, to which tasks can "add" values with add.
Create an Accumulable shared variable of the given type, to which tasks can "add" values with add.
Create an Accumulable shared variable, to which tasks can add values with +=.
Create an Accumulable shared variable, with a name for display in the Spark UI.
Information about an Accumulable modified during a task or stage.
A simpler value of Accumulable where the result type being accumulated is the same as the types of elements being merged, i.e. variables that are only "added" to through an associative operation.
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
Create an Accumulator double variable, which tasks can "add" values to using the add method.
Create an Accumulator double variable, which tasks can "add" values to using the add method.
Create an Accumulator variable of a given type, which tasks can "add" values to using the add method.
Create an Accumulator variable of a given type, which tasks can "add" values to using the add method.
Create an Accumulator variable of a given type, which tasks can "add" values to using the += method.
Create an Accumulator variable of a given type, with a name for display in the Spark UI.
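A minimal sketch of the accumulator pattern these entries describe, assuming a live SparkContext named sc; the variable name and data are illustrative only:

    // Create an Accumulator integer variable; tasks may only "add" to it.
    val errorCount = sc.accumulator(0, "errorCount")  // name shows in the Spark UI
    sc.parallelize(1 to 100).foreach { n =>
      if (n % 10 == 0) errorCount += 1  // += is the Scala shorthand for add
    }
    // Only the driver may read the merged value.
    println(errorCount.value)  // 10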
A simpler version of AccumulableParam where the only data type you can add in is the same type as the accumulated value.
Add a StreamingListener object for receiving system events related to streaming.
Add a StreamingListener object for receiving system events related to streaming.
Aggregates on the entire DataFrame without groups.
Aggregates on the entire DataFrame without groups.
Aggregates on the entire DataFrame without groups.
Aggregates on the entire DataFrame without groups.
Aggregates on the entire DataFrame without groups.
Aggregates vertices in messages that have the same ids using reduceFunc, returning a VertexRDD co-indexed with this.
A filter that evaluates to true iff both left and right evaluate to true.
Returns a new vector with 1.0 (bias) appended to the input vector.
Constructs a standalone VertexRDD (one that is not set up for efficient joins with an EdgeRDD) from an RDD of vertex-attribute pairs.
Constructs a VertexRDD from an RDD of vertex-attribute pairs.
Constructs a VertexRDD from an RDD of vertex-attribute pairs.
Selects a column and returns it as a Column.
Creates an ArrayType object with the given element type.
Creates a MapType object with the given key type and value type.
Extracts a StructField of the given name.
Returns a StructType containing StructFields of the given names, preserving the original order of fields.
Returns the BlockManagerId for the given configuration.
Creates a new StructField of type array.
Returns a new DataFrame with an alias set.
Returns a new DataFrame with an alias set.
An enum-like type for attribute types: AttributeType$.Numeric, AttributeType$.Nominal, and AttributeType$.Binary.
Replaced by awaitTerminationOrTimeout(Long).
Replaced by awaitTerminationOrTimeout(Long).
Creates a new StructField of type binary.
Defaults numBins to 0.
The data type representing Array[Byte] values.
Specialized version of Param[Boolean] for Java.
The data type representing Boolean values.
Configuration options for GradientBoostedTrees.
Broadcast a read-only variable to the cluster, returning a Broadcast object for reading it in distributed functions.
Broadcast a read-only variable to the cluster, returning a Broadcast object for reading it in distributed functions.
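A short sketch of the broadcast pattern described above, assuming a live SparkContext named sc; the lookup table is a stand-in for any read-only value:

    // Ship a read-only lookup table to every executor exactly once.
    val countryNames = sc.broadcast(Map("DE" -> "Germany", "FR" -> "France"))
    val codes = sc.parallelize(Seq("DE", "FR", "DE"))
    // Tasks read the broadcast value instead of capturing a per-task copy.
    val names = codes.map(code => countryNames.value.getOrElse(code, "unknown"))
    println(names.collect().mkString(", "))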
Bucketizer maps a column of continuous features to a column of feature buckets.
Builds the Metadata instance.
Returns an RDD[Row] containing all rows within this relation.
Returns an RDD[Row] containing all rows within this relation.
Returns an RDD[Row] containing all rows within this relation.
The data type representing Byte values.
Persist this RDD with the default storage level (MEMORY_ONLY).
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other.
Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in this and b is in other.
Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform distribution, in which each category has an expected frequency of 1 / observed.size.
Clear the job's list of files added by addFile so that they do not get downloaded to any new nodes.
Clear the job's list of files added by addFile so that they do not get downloaded to any new nodes.
Clear the job's list of JARs added by addJar so that they do not get downloaded to any new nodes.
Clear the job's list of JARs added by addJar so that they do not get downloaded to any new nodes.
Clears the threshold so that predict will output raw prediction scores.
Clears the threshold so that predict will output raw prediction scores.
Closes the current OutputWriter.
Return a new RDD that is reduced into numPartitions partitions.
Return a new RDD that is reduced into numPartitions partitions.
Return a new RDD that is reduced into numPartitions partitions.
Return a new RDD that is reduced into numPartitions partitions.
Return a new RDD that is reduced into numPartitions partitions.
Return a new RDD that is reduced into numPartitions partitions.
Return a new RDD that is reduced into numPartitions partitions.
Returns a new DataFrame that has exactly numPartitions partitions.
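A hedged illustration of the repartitioning entries above; the numbers are arbitrary and sc is an existing SparkContext:

    val rdd = sc.parallelize(1 to 1000, 100)
    // coalesce shrinks to fewer partitions without a full shuffle...
    val fewer = rdd.coalesce(10)
    // ...while repartition(n) always shuffles and can also grow the count.
    val more = rdd.repartition(200)
    println(s"${fewer.partitions.length}, ${more.partitions.length}")  // 10, 200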
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
For each key k in this or other, return a resulting RDD that contains a tuple with the list of values for that key in this as well as other.
For each key k in this or other1 or other2, return a resulting RDD that contains a tuple with the list of values for that key in this, other1 and other2.
For each key k in this or other1 or other2 or other3, return a resulting RDD that contains a tuple with the list of values for that key in this, other1, other2 and other3.
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'cogroup' between RDDs of this DStream and other DStream.
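A small, assumption-laden example of cogroup on pair RDDs (sc is an existing SparkContext; the data is invented):

    val clicks = sc.parallelize(Seq(("u1", "home"), ("u1", "cart")))
    val buys   = sc.parallelize(Seq(("u1", 9.99), ("u2", 5.00)))
    // For each key, cogroup yields the values from both RDDs side by side.
    clicks.cogroup(buys).collect().foreach { case (user, (pages, orders)) =>
      println(s"$user -> pages=${pages.mkString(",")} orders=${orders.mkString(",")}")
    }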
Returns a Column.
Returns a Column based on the given column name.
Return an RDD that contains all matching values by applying f.
Returns an array that contains all of the Rows in this DataFrame.
Returns an array that contains all of the Rows in this DataFrame.
The asynchronous version of collect, which returns a future for retrieving an array containing all of the elements in this RDD.
Returns a Java list that contains all of the Rows in this DataFrame.
Returns a Column based on the given column name.
A FutureAction for actions that could trigger multiple Spark jobs.
Provides the RDD[(VertexId, VD)] equivalent output.
Computes the Gramian matrix A^T A.
The SparkContext that this RDD was created on.
The SparkContext that this RDD was created on.
Return the StreamingContext associated with this DStream.
Make a copy of the current Row object.
Java-friendly version of corr().
Java-friendly version of corr().
Returns the number of rows in the DataFrame.
The asynchronous version of count, which returns a future for counting the number of elements in this RDD.
Creates a Row from the given arguments.
Creates an ArrayType by specifying the data type of elements (elementType).
Creates an ArrayType by specifying the data type of elements (elementType) and whether the array contains null values (containsNull).
Replaced by write().jdbc().
Creates a MapType by specifying the data type of keys (keyType) and values (valueType).
Creates a MapType by specifying the data type of keys (keyType), the data type of values (valueType), and whether values contain any null value (valueContainsNull).
Creates a StructField by specifying the name (name), data type (dataType) and whether values of this field can be null values (nullable).
Creates a StructType with the given StructFields (fields).
Creates a StructType with the given StructFields (fields).
Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them.
Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them.
Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them.
Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them.
Functionality for working with missing data in DataFrames.
Interface used to load a DataFrame from external storage systems (e.g. file systems, key-value stores).
Statistic functions for DataFrames.
Interface used to write a DataFrame to external storage systems (e.g. file systems, key-value stores).
Creates a new StructField of type date.
The data type representing java.sql.Date values.
Creates a new StructField of type decimal.
Creates a new StructField of type decimal.
Decision tree model for classification.
Decision tree learning algorithm for classification.
Decision tree model for regression.
Decision tree learning algorithm for regression.
Use JavaSparkContext.defaultMinPartitions() instead.
See DecisionTree.
See DecisionTree.
Creates a matrix in DenseMatrix format from the supplied values.
Creates a matrix in Matrix format from the supplied values.
For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other.
For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other.
Returns a new DataFrame that contains only the unique rows from this DataFrame.
Create an Accumulator double variable, which tasks can "add" values to using the add method.
Create an Accumulator double variable, which tasks can "add" values to using the add method.
Specialized version of Param[Array[Double]] for Java.
Specialized version of Param[Double] for Java.
The data type representing Double values.
Returns a new DataFrame with a column dropped.
Returns a new DataFrame with a column dropped.
Returns a new DataFrame that drops rows containing any null values.
Returns a new DataFrame that drops rows containing null values.
Returns a new DataFrame that drops rows containing any null values in the specified columns.
Returns a new DataFrame that drops rows containing any null values in the specified columns.
Returns a new DataFrame that drops rows containing null values in the specified columns.
Returns a new DataFrame that drops rows containing null values in the specified columns.
Returns a new DataFrame that drops rows containing less than minNonNulls non-null values.
Returns a new DataFrame that drops rows containing less than minNonNulls non-null values in the specified columns.
Returns a new DataFrame that drops rows containing less than minNonNulls non-null values in the specified columns.
Returns a new DataFrame that contains only the unique rows from this DataFrame.
Returns a new DataFrame with duplicate rows removed, considering only the subset of columns.
Returns a new DataFrame with duplicate rows removed, considering only the subset of columns.
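A hedged sketch of the null-handling entries above; df stands for any existing DataFrame assumed to have columns "name" and "age":

    // Drop rows with a null in any column, then only those null in "age".
    val noNulls = df.na.drop()
    val withAge = df.na.drop(Seq("age"))
    // Deduplicate rows, considering only the "name" column.
    val uniqueNames = df.dropDuplicates(Seq("name"))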
EdgeRDD[ED, VD] extends RDD[Edge[ED]] by storing the edges in columnar format on each partition for performance.
Returns a DataFrame with no rows or columns.
Class for calculating entropy during binary classification.
A filter that evaluates to true iff the attribute evaluates to a value equal to value.
Returns a new DataFrame containing rows in this frame but not in another frame.
Returns a new DataFrame where each row has been expanded to zero or more rows by the provided function.
Returns a new DataFrame where a single column has been expanded to zero or more rows by the provided function.
Java-friendly version of RandomRDDs.exponentialRDD(org.apache.spark.SparkContext, double, long, int, long).
RandomRDDs.exponentialJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, long) with the default seed.
RandomRDDs.exponentialJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, long) with the default number of partitions and the default seed.
Java-friendly version of RandomRDDs.exponentialVectorRDD(org.apache.spark.SparkContext, double, long, int, int, long).
RandomRDDs.exponentialJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, int, long) with the default seed.
RandomRDDs.exponentialJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, int, long) with the default number of partitions and the default seed.
Generates an RDD comprised of i.i.d. samples from the exponential distribution with the input mean.
Generates an RDD[Vector] with vectors containing i.i.d. samples drawn from the exponential distribution with the input mean.
extractParamMap with no extra values.
Generate an identity matrix in DenseMatrix format.
Generate a dense identity matrix in Matrix format.
Returns a new DataFrame that replaces null values in numeric columns with value.
Returns a new DataFrame that replaces null values in string columns with value.
Returns a new DataFrame that replaces null values in specified numeric columns.
Returns a new DataFrame that replaces null values in specified numeric columns.
Returns a new DataFrame that replaces null values in specified string columns.
Returns a new DataFrame that replaces null values in specified string columns.
Returns a new DataFrame that replaces null values.
Returns a new DataFrame that replaces null values.
Returns an RDD containing only the elements in the inclusive range lower to upper.
Computes a PCAModel that contains the principal components of the input vectors.
Java-friendly version of fit().
Returns a new RDD by first applying a function to all rows of this DataFrame, and then flattening the results.
Specialized version of Param[Float] for Java.
The data type representing Float values.
Applies a function f to all rows.
Replaced by foreachRDD.
Replaced by foreachRDD.
Applies a function f to all the active elements of dense and sparse matrix.
Applies a function f to all the active elements of dense and sparse vector.
The asynchronous version of the foreach action, which applies a function f to all the elements of this RDD.
Applies a function f to each partition of this DataFrame.
The asynchronous version of the foreachPartition action, which applies a function f to each partition of this RDD.
Constructs a default instance with default parameters {minSupport: 0.3, numPartitions: same as the input data}.
Replaced by DataType.fromJson().
Generate a SparseMatrix from Coordinate List (COO) format.
Converts a DStream to a Java-friendly JavaDStream.
Constructs a VertexRDD containing all vertices referred to in edges.
Converts an InputDStream to a Java-friendly JavaInputDStream.
Converts an InputDStream of pairs to a Java-friendly JavaPairInputDStream.
Gets the AttributeType object from its name.
Converts a ReceiverInputDStream to a Java-friendly JavaReceiverInputDStream.
Converts a ReceiverInputDStream to a Java-friendly JavaReceiverInputDStream.
Creates an attribute from a StructField instance.
Perform a full outer join of this and other.
Perform a full outer join of this and other.
Perform a full outer join of this and other.
Perform a full outer join of this and other.
Perform a full outer join of this and other.
Perform a full outer join of this and other.
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'full outer join' between RDDs of this DStream and other DStream.
Java-friendly version of RandomRDDs.gammaRDD(org.apache.spark.SparkContext, double, double, long, int, long).
RandomRDDs.gammaJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, long) with the default seed.
RandomRDDs.gammaJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, long) with the default number of partitions and the default seed.
Java-friendly version of RandomRDDs.gammaVectorRDD(org.apache.spark.SparkContext, double, double, long, int, int, long).
RandomRDDs.gammaJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, int, long) with the default seed.
RandomRDDs.gammaJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, int, long) with the default number of partitions and the default seed.
Generates an RDD comprised of i.i.d. samples from the gamma distribution with the input shape and scale.
Generates an RDD[Vector] with vectors containing i.i.d. samples drawn from the gamma distribution with the input shape and scale.
Gradient-Boosted Trees (GBTs) model for classification.
Gradient-Boosted Trees (GBTs) learning algorithm for classification.
Gradient-Boosted Trees (GBTs) learning algorithm for regression.
Get the absolute path of a file added through SparkContext.addFile().
Alias for getDocConcentration.
Alias for getTopicConcentration.
An expression that gets a field by name in a StructType.
An expression that gets an item at position ordinal out of an array, or gets a value by key key in a MapType.
Returns the value at position i of map type as a java.util.Map.
Returns job information, or null if the job info could not be found or was garbage collected.
Returns job information, or None if the job info could not be found or was garbage collected.
Returns the value at position i of array type as a java.util.List.
Get the number of values, either from numValues or from values.
Replaced by getOrCreate without JavaStreamingContextFactory.
Replaced by getOrCreate without JavaStreamingContextFactory.
Replaced by getOrCreate without JavaStreamingContextFactory.
Get the root directory that contains files added through SparkContext.addFile().
Returns stage information, or null if the stage info could not be found or was garbage collected.
Returns stage information, or None if the stage info could not be found or was garbage collected.
Returns the value at position i of struct type as a Row object.
Class for calculating the Gini impurity during binary classification.
A class that implements Stochastic Gradient Boosting for regression and binary classification.
An implementation of Graph to support computation on graphs.
Provides utilities for loading Graphs from files.
Graph.
Implicitly extracts the GraphOps member from a graph.
A filter that evaluates to true iff the attribute evaluates to a value greater than value.
A filter that evaluates to true iff the attribute evaluates to a value greater than or equal to value.
Create rows by cols grid graph with each vertex connected to its row+1 and col+1 neighbors.
Groups the DataFrame using the specified columns, so we can run aggregation on them.
Groups the DataFrame using the specified columns, so we can run aggregation on them.
Groups the DataFrame using the specified columns, so we can run aggregation on them.
Groups the DataFrame using the specified columns, so we can run aggregation on them.
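A brief groupBy/agg sketch matching the entries above; df is assumed to have columns "dept" and "salary":

    import org.apache.spark.sql.functions._
    // Group by department, then aggregate within each group.
    val summary = df.groupBy("dept").agg(avg("salary"), max("salary"))
    summary.show()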
Return a new DStream by applying groupByKey to each RDD.
Return a new DStream by applying groupByKey to each RDD.
Return a new DStream by applying groupByKey on each RDD of this DStream.
Return a new DStream by applying groupByKey to each RDD.
Return a new DStream by applying groupByKey to each RDD.
Return a new DStream by applying groupByKey on each RDD.
Return a new DStream by applying groupByKey over a sliding window.
Return a new DStream by applying groupByKey over a sliding window.
Return a new DStream by applying groupByKey over a sliding window on this DStream.
Return a new DStream by applying groupByKey over a sliding window on this DStream.
Return a new DStream by applying groupByKey over a sliding window.
Return a new DStream by applying groupByKey over a sliding window.
Return a new DStream by applying groupByKey over a sliding window on this DStream.
Return a new DStream by applying groupByKey over a sliding window on this DStream.
A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy.
A BaseRelation that provides much of the common code required for formats that store their data to an HDFS compatible filesystem.
Get an RDD for a Hadoop-readable dataset using the older MapReduce API (org.apache.hadoop.mapred).
A Partitioner that implements hash-based partitioning using Java's Object.hashCode.
Represents any object that has a collection of OffsetRanges.
Indicates whether this Model has a corresponding parent.
Returns the first n rows.
A BroadcastFactory implementation that uses a HTTP server as the broadcast mechanism.
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.
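The hypot entries correspond to the SQL function of the same name; a small illustrative use (the points DataFrame and its column names are assumptions):

    import org.apache.spark.sql.functions.hypot
    // Distance from the origin, computed without intermediate overflow.
    val dist = points.select(hypot(points("x"), points("y")).as("r"))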
A filter that evaluates to true iff the attribute evaluates to one of the values in the array.
Represents a row of an IndexedRowMatrix.
PartitionStrategy.
Version of inRange() which uses inclusive bounds by default: [lowerBound, upperBound].
Replaced by write().mode(SaveMode.Append|SaveMode.Overwrite).saveAsTable(tableName).
Replaced by write().mode(SaveMode.Append).saveAsTable(tableName).
Inserts the content of the DataFrame to the specified table.
Replaced by write().jdbc().
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
The data type representing Int values.
Returns a new DataFrame containing rows only in both this frame and another frame.
Specialized version of Param[Int] for Java.
An InformationGainStats object used to denote that the current split doesn't satisfy the minimum info gain or the minimum number of instances per node.
Returns true if the collect and take methods can be run locally (without any Spark executors).
Tests whether this attribute is nominal, true for NominalAttribute and BinaryAttribute.
A filter that evaluates to true iff the attribute evaluates to a non-null value.
A filter that evaluates to true iff the attribute evaluates to null.
Tests whether this attribute is numeric, true for NumericAttribute and BinaryAttribute.
Return true if the given config matches spark.*.port or spark.port.*.
Java-friendly version of categoryMaps.
A Java-friendly interface to DStream, the basic abstraction in Spark Streaming that represents a continuous stream of data.
A Java-friendly interface to InputDStream.
A Java-friendly interface to a DStream of key-value pairs, which provides extra methods like reduceByKey and join.
A Java-friendly interface to InputDStream of key-value pairs.
A Java-friendly interface to ReceiverInputDStream, the abstract class for defining any input stream that receives data over the network.
Java-friendly wrapper for Params.
Returns the content of the DataFrame as a JavaRDD of Rows.
A Java-friendly interface to ReceiverInputDStream, the abstract class for defining any input stream that receives data over the network.
A Java-friendly version of SparkContext that returns JavaRDDs and works with Java collections instead of Scala ones.
A Java-friendly version of StreamingContext which is the main entry point for Spark Streaming functionality.
Java-friendly version of topicDistributions.
Construct a DataFrame representing the database table accessible via JDBC URL url named table and connection properties.
Construct a DataFrame representing the database table accessible via JDBC URL url named table.
Construct a DataFrame representing the database table accessible via JDBC URL url named table using connection properties.
Saves the content of the DataFrame to an external database table via JDBC.
Replaced by read().jdbc().
Replaced by read().jdbc().
Replaced by read().jdbc().
Cartesian join with another DataFrame.
Return an RDD containing all pairs of elements with matching keys in this and other.
Return an RDD containing all pairs of elements with matching keys in this and other.
Return an RDD containing all pairs of elements with matching keys in this and other.
Return an RDD containing all pairs of elements with matching keys in this and other.
Return an RDD containing all pairs of elements with matching keys in this and other.
Inner join with another DataFrame.
Inner equi-join with another DataFrame using the given column.
Inner join with another DataFrame, using the given join expression.
Join with another DataFrame, using the given join expression.
Return a new DStream by applying 'join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'join' between RDDs of this DStream and other DStream.
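An illustrative DataFrame join for the entries above (the users and orders tables and their column names are assumptions):

    // Inner equi-join on the shared key; pass a join type for outer joins.
    val joined = users.join(orders, users("id") === orders("userId"), "left_outer")
    joined.show()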
Loads a JSON file (one object per line) and returns the result as a DataFrame.
Loads a JavaRDD[String] storing JSON objects (one object per record) and returns the result as a DataFrame.
Loads an RDD[String] storing JSON objects (one object per record) and returns the result as a DataFrame.
Saves the content of the DataFrame in JSON format at the specified path.
Replaced by read().json().
Replaced by read().json().
Replaced by read().json().
Replaced by read().json().
Replaced by read().json().
Replaced by read().json().
Replaced by read().json().
Replaced by read().json().
Replaced by read().json().
Creates tuples of the elements in this RDD by applying f.
Creates tuples of the elements in this RDD by applying f.
A Spark serializer that uses the Kryo serialization library.
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row.
Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row.
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row.
Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row.
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row.
Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row.
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row.
Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row.
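A sketch of lag over a window, assuming a DataFrame of daily prices with columns "ticker", "day" and "close"; in early Spark releases these window functions required a HiveContext:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.lag
    val w = Window.partitionBy("ticker").orderBy("day")
    // Previous day's close, or null on the first row of each partition.
    val withPrev = prices.withColumn("prevClose", lag(prices("close"), 1).over(w))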
Perform a left outer join of this and other.
Perform a left outer join of this and other.
Perform a left outer join of this and other.
Perform a left outer join of this and other.
Perform a left outer join of this and other.
Perform a left outer join of this and other.
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'left outer join' between RDDs of this DStream and other DStream.
A filter that evaluates to true iff the attribute evaluates to a value less than value.
A filter that evaluates to true iff the attribute evaluates to a value less than or equal to value.
Returns a new DataFrame by taking the first n rows.
Model produced by LinearRegression.
Creates a Column of literal value.
Returns the dataset stored at path as a DataFrame, for data sources that require a path (e.g. data backed by a local or distributed file system).
Returns the dataset specified by the given data source and options as a DataFrame, for data sources that don't require a path (e.g. external key-value stores).
Replaced by read().load(path).
Replaced by read().format(source).load(path).
Replaced by read().format(source).options(options).load().
Replaced by read().format(source).options(options).load().
Replaced by read().format(source).schema(schema).options(options).load().
Replaced by read().format(source).schema(schema).options(options).load().
Use RDD.saveAsTextFile(java.lang.String) for saving and MLUtils.loadLabeledPoints(org.apache.spark.SparkContext, java.lang.String, int) for loading.
Loads labeled points saved using RDD[LabeledPoint].saveAsTextFile.
Loads labeled points saved using RDD[LabeledPoint].saveAsTextFile with the default number of partitions.
Loads vectors saved using RDD[Vector].saveAsTextFile.
Loads vectors saved using RDD[Vector].saveAsTextFile with the default number of partitions.
Model produced by LogisticRegression.
A LogisticRegressionModel with weights and intercept for binary classification.
Java-friendly version of RandomRDDs.logNormalRDD(org.apache.spark.SparkContext, double, double, long, int, long).
RandomRDDs.logNormalJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, long) with the default seed.
RandomRDDs.logNormalJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, long) with the default number of partitions and the default seed.
Java-friendly version of RandomRDDs.logNormalVectorRDD(org.apache.spark.SparkContext, double, double, long, int, int, long).
RandomRDDs.logNormalJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, int, long) with the default seed.
RandomRDDs.logNormalJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, double, long, int, int, long) with the default number of partitions and the default seed.
Generates an RDD comprised of i.i.d. samples from the log normal distribution with the input mean and standard deviation.
Generates an RDD[Vector] with vectors containing i.i.d. samples drawn from a log normal distribution.
Specialized version of Param[Long] for Java.
The data type representing Long values.
Return the list of values in the RDD for key key.
Return the list of values in the RDD for key key.
LZ4 implementation of CompressionCodec.
LZF implementation of CompressionCodec.
Retrieve an RpcEndpointRef which is located in the driver via its name.
Creates a new StructField of type map.
Restricts the graph to only the vertices and edges that are also in other, but keeps the attributes from this graph.
Factory methods for Matrix.
Param for metric name in evaluation ("rmse" (default), "mse", "r2", and "mae").
Helper object that creates instances of Duration representing a given number of milliseconds.
For each VertexId present in both this and other, minus will act as a set difference operation returning only those unique VertexId's present in this.
For each VertexId present in both this and other, minus will act as a set difference operation returning only those unique VertexId's present in this.
Helper object that creates instances of Duration representing a given number of minutes.
A fitted model, i.e., a Transformer produced by an Estimator.
Left multiplies this BlockMatrix to other, another BlockMatrix.
MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean, variance, minimum, maximum, counts, and nonzero counts for samples in sparse or dense vector format in an online fashion.
Returns a DataFrameNaFunctions for working with missing data.
Replaced by receiverStream.
Get an RDD for a Hadoop-readable dataset using the new MapReduce API (org.apache.hadoop.mapreduce).
Creates a new SerializerInstance.
When writing to a HadoopFsRelation, this method gets called by each task on executor side to instantiate new OutputWriters.
Java-friendly version of RandomRDDs.normalRDD(org.apache.spark.SparkContext, long, int, long).
RandomRDDs.normalJavaRDD(org.apache.spark.api.java.JavaSparkContext, long, int, long) with the default seed.
RandomRDDs.normalJavaRDD(org.apache.spark.api.java.JavaSparkContext, long, int, long) with the default number of partitions and the default seed.
Java-friendly version of RandomRDDs.normalVectorRDD(org.apache.spark.SparkContext, long, int, int, long).
RandomRDDs.normalJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, long, int, int, long) with the default seed.
RandomRDDs.normalJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, long, int, int, long) with the default number of partitions and the default seed.
Generates an RDD comprised of i.i.d. samples from the standard normal distribution.
Generates an RDD[Vector] with vectors containing i.i.d. samples drawn from the standard normal distribution.
A filter that evaluates to true iff child is evaluated to false.
Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition.
The data type representing NULL values.
Creates a RankingMetrics instance (for Java users).
Generate a DenseMatrix consisting of ones.
Generate a DenseMatrix consisting of ones.
Model produced by OneVsRest.
The associated GraphOps object.
A filter that evaluates to true iff at least one of left or right evaluates to true.
Returns a new DataFrame sorted by the given expressions.
Returns a new DataFrame sorted by the given expressions.
Returns a new DataFrame sorted by the given expressions.
Returns a new DataFrame sorted by the given expressions.
Creates a WindowSpec with the ordering defined.
Creates a WindowSpec with the ordering defined.
Creates a WindowSpec with the ordering defined.
Creates a WindowSpec with the ordering defined.
Defines the ordering columns in a WindowSpec.
Defines the ordering columns in a WindowSpec.
Defines the ordering columns in a WindowSpec.
Defines the ordering columns in a WindowSpec.
Joins the vertices with entries in the table RDD and merges the results using mapFunc.
OutputWriter is used together with HadoopFsRelation for persisting rows to the underlying file system.
A factory that produces OutputWriters.
Factory methods for common validation functions for Param.isValid.
Loads a Parquet file, returning the result as a DataFrame.
Loads a Parquet file, returning the result as a DataFrame.
Saves the content of the DataFrame in Parquet format at the specified path.
Replaced by read().parquet().
Parses a string produced by Vector.toString into a Vector.
Repartitions the edges in the graph according to partitionStrategy.
Repartitions the edges in the graph according to partitionStrategy.
Creates a WindowSpec with the partitioning defined.
Creates a WindowSpec with the partitioning defined.
Creates a WindowSpec with the partitioning defined.
Creates a WindowSpec with the partitioning defined.
Defines the partitioning columns in a WindowSpec.
Defines the partitioning columns in a WindowSpec.
Defines the partitioning columns in a WindowSpec.
Defines the partitioning columns in a WindowSpec.
Coalesce the partitions of a parent RDD (prev) into fewer partitions, so that each partition of this RDD computes one or more of the parent ones.
If partitionsRDD already has a partitioner, use it.
Assigns edges to partitions using a 2D partitioning of the sparse edge adjacency matrix, guaranteeing a 2 * sqrt(numParts) - 1 bound on vertex replication.
The regex pattern used to match delimiters if gaps is true or tokens if gaps is false.
Model fitted by PCA that can project vectors to a low-dimensional space using PCA.
A stage in a pipeline, either an Estimator or a Transformer.
Java-friendly version of RandomRDDs.poissonRDD(org.apache.spark.SparkContext, double, long, int, long).
RandomRDDs.poissonJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, long) with the default seed.
RandomRDDs.poissonJavaRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, long) with the default number of partitions and the default seed.
Java-friendly version of RandomRDDs.poissonVectorRDD(org.apache.spark.SparkContext, double, long, int, int, long).
RandomRDDs.poissonJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, int, long) with the default seed.
RandomRDDs.poissonJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, double, long, int, int, long) with the default number of partitions and the default seed.
Generates an RDD comprised of i.i.d. samples from the Poisson distribution with the input mean.
Generates an RDD[Vector] with vectors containing i.i.d. samples drawn from the Poisson distribution with the input mean.
Java-friendly version of predict().
Java-friendly version of MatrixFactorizationModel.predict.
Prepares a write job and returns an OutputWriterFactory.
Puts a Metadata.
Puts a Metadata array.
Generate a DenseMatrix consisting of i.i.d. uniform random numbers.
Generate a DenseMatrix consisting of i.i.d. uniform random numbers.
Generate a DenseMatrix consisting of i.i.d. gaussian random numbers.
Generate a DenseMatrix consisting of i.i.d. gaussian random numbers.
Generate a Vector of given length containing random numbers between 0.0 and 1.0.
Random Forest learning algorithm for classification and regression.
Random Forest model for classification.
Random Forest learning algorithm for classification.
Random Forest model for regression.
Random Forest learning algorithm for regression.
Generates an RDD comprised of i.i.d. samples produced by the input RandomDataGenerator.
Trait for random data generators that generate i.i.d. samples from some distribution.
Randomly splits this DataFrame with the provided weights.
Randomly splits this DataFrame with the provided weights.
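A typical use of randomSplit for a train/test split (the weights and seed are arbitrary; df is any existing DataFrame):

    // 70/30 split; the seed makes the split reproducible.
    val Array(train, test) = df.randomSplit(Array(0.7, 0.3), seed = 42L)
    println(s"train=${train.count()}, test=${test.count()}")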
Generates an RDD[Vector] with vectors containing i.i.d. samples produced by the input RandomDataGenerator.
Creates a DataFrame with a single LongType column named id, containing elements in a range from start to end (exclusive), increased by step every element.
Defines the frame boundaries, from start (inclusive) to end (inclusive).
A Partitioner that partitions sortable records by range into roughly equal ranges.
Returns the content of the DataFrame as an RDD of Rows.
Abstract class for defining any InputDStream that has to start a receiver on worker nodes to receive external data.
Return a new DStream by applying reduceByKey to each RDD.
Return a new DStream by applying reduceByKey to each RDD.
Return a new DStream by applying reduceByKey to each RDD.
Return a new DStream by applying reduceByKey to each RDD.
Return a new DStream by applying reduceByKey to each RDD.
Return a new DStream by applying reduceByKey to each RDD.
Return a new DStream by applying reduceByKey over a sliding window on this DStream.
Return a new DStream by applying reduceByKey over a sliding window.
Return a new DStream by applying reduceByKey over a sliding window.
Return a new DStream by applying reduceByKey over a sliding window.
Return a new DStream by applying reduceByKey over a sliding window.
Return a new DStream by applying reduceByKey over a sliding window.
Return a new DStream by applying reduceByKey over a sliding window on this DStream.
Return a new DStream by applying reduceByKey over a sliding window.
Return a new DStream by applying reduceByKey over a sliding window.
Return a new DStream by applying reduceByKey over a sliding window.
Return a new DStream by applying reduceByKey over a sliding window.
Return a new DStream by applying reduceByKey over a sliding window.
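A hedged sliding-window sketch for the entries above, assuming a running StreamingContext and a DStream named pairs of (word, 1) tuples:

    import org.apache.spark.streaming.Seconds
    // Sum counts over the last 30 seconds, recomputed every 10 seconds.
    val windowed = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b,
                                              Seconds(30), Seconds(10))
    windowed.print()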
A regex based tokenizer that extracts tokens either by using the provided regex pattern to split the text (default) or repeatedly matching the regex (if gaps is true).
Registers the given DataFrame as a temporary table in the catalog.
Registers this DataFrame as a temporary table using the given name.
Returns a new DataFrame that has exactly numPartitions partitions.
Replaces values matching keys in replacement map with the corresponding values.
Replaces values matching keys in replacement map with the corresponding values.
Replaces values matching keys in replacement map.
Replaces values matching keys in replacement map.
A ShuffleMapTask that completed successfully earlier, but we lost the executor before the stage completed.
Returns a new VertexRDD reflecting a reversal of all edge directions in the corresponding EdgeRDD.
Perform a right outer join of this and other.
Perform a right outer join of this and other.
Perform a right outer join of this and other.
Perform a right outer join of this and other.
Perform a right outer join of this and other.
Perform a right outer join of this and other.
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
Return a new DStream by applying 'right outer join' between RDDs of this DStream and other DStream.
Create a multi-dimensional rollup for the current DataFrame using the specified columns, so we can run aggregation on them.
Create a multi-dimensional rollup for the current DataFrame using the specified columns, so we can run aggregation on them.
Create a multi-dimensional rollup for the current DataFrame using the specified columns, so we can run aggregation on them.
Create a multi-dimensional rollup for the current DataFrame using the specified columns, so we can run aggregation on them.
A factory class used to construct Row objects.
Defines the frame boundaries, from start (inclusive) to end (inclusive).
See http://public.research.att.com/~volinsky/netflix/kdd08koren.pdf.
Java-friendly version of run().
The input data should be cached for high performance, because this is an iterative algorithm.
Java-friendly version of run().
A Java-friendly version of PowerIterationClustering.run.
Java-friendly API for org.apache.spark.mllib.tree.GradientBoostedTrees!#run.
Run a job on a given set of partitions of an RDD, but take a function of type Iterator[T] => U instead of (TaskContext, Iterator[T]) => U.
Calls run() and returns exactly the same result.
Java-friendly API for org.apache.spark.mllib.tree.GradientBoostedTrees!#runWithValidation.
Returns a new DataFrame by sampling a fraction of rows.
Returns a new DataFrame by sampling a fraction of rows, using a random seed.
Replaced by write().save(path).
Replaced by write().mode(mode).save(path).
Replaced by write().format(source).save(path).
Replaced by write().format(source).mode(mode).save(path).
Replaced by write().format(source).mode(mode).options(options).save(path).
Replaced by write().format(source).mode(mode).options(options).save(path).
Saves the contents of this DataFrame at the specified path.
Saves the contents of this DataFrame as the specified table.
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class supporting the key and value types K and V in this RDD.
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class supporting the key and value types K and V in this RDD.
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class supporting the key and value types K and V in this RDD.
Output the RDD to any Hadoop-supported file system, using a Hadoop OutputFormat class supporting the key and value types K and V in this RDD.
Save each RDD in this DStream as a Hadoop file.
Save each RDD in this DStream as a Hadoop file.
Save each RDD in this DStream as a Hadoop file.
Save each RDD in this DStream as a Hadoop file.
Save each RDD in this DStream as a Hadoop file.
Output the RDD to any Hadoop-supported file system, using a new Hadoop API OutputFormat (mapreduce.OutputFormat) object supporting the key and value types K and V in this RDD.
Output the RDD to any Hadoop-supported file system, using a new Hadoop API OutputFormat (mapreduce.OutputFormat) object supporting the key and value types K and V in this RDD.
Save each RDD in this DStream as a Hadoop file.
Save each RDD in this DStream as a Hadoop file.
Save each RDD in this DStream as a Hadoop file.
Save each RDD in this DStream as a Hadoop file.
Save each RDD in this DStream as a Hadoop file.
Replaced by write().parquet().
Replaced by write().saveAsTable(tableName).
Replaced by write().mode(mode).saveAsTable(tableName).
Replaced by write().format(source).saveAsTable(tableName).
Replaced by write().mode(mode).saveAsTable(tableName).
Replaced by write().format(source).mode(mode).options(options).saveAsTable(tableName).
Replaced by write().format(source).mode(mode).options(options).saveAsTable(tableName).
Saves the contents of this DataFrame as the specified table.
Use RDD.saveAsTextFile(java.lang.String) for saving and MLUtils.loadLabeledPoints(org.apache.spark.SparkContext, java.lang.String, int) for loading.
sparkContext.
Returns the schema of this DataFrame.
Helper object that creates instances of Duration representing a given number of seconds.
Alias for setDocConcentration().
(default: 1.0).
Alias for setTopicConcentration().
Use LBFGS.setNumIterations(int) instead.
Sets the minimal support level (default: 0.3).
The data type representing Short values.
Displays the DataFrame in a tabular form.
Displays the DataFrame in a tabular form.
A FutureAction holding the result of an action that triggers a single job.
Snappy implementation of CompressionCodec.
Wrapper over SnappyOutputStream which guards against write-after-close and double-close issues.
Returns a new DataFrame sorted by the specified column, all in ascending order.
Returns a new DataFrame sorted by the given expressions.
Returns a new DataFrame sorted by the specified column, all in ascending order.
Returns a new DataFrame sorted by the given expressions.
Resolves paths to files added through SparkContext.addFile().
Creates a matrix in SparseMatrix format from the supplied values.
Generate a sparse identity matrix in Matrix format.
Generate an identity matrix in SparseMatrix format.
Generate a SparseMatrix consisting of i.i.d. gaussian random numbers.
Generate a SparseMatrix consisting of i.i.d. uniform random numbers.
Generate a SparseMatrix consisting of i.i.d. gaussian random numbers.
Generate a SparseMatrix consisting of i.i.d. uniform random numbers.
DataFrames.
Column.
Returns a DataFrameStatFunctions for working statistic functions support.
Return a StatCounter object that captures the mean, variance and count of the RDD's elements in one operation.
Return a StatCounter object that captures the mean, variance and count of the RDD's elements in one operation.
Strategy.
Creates a new StructField of type string.
Specialized version of Param[Array[String]] for Java.
A filter that evaluates to true iff the attribute evaluates to a string that contains the string value.
A filter that evaluates to true iff the attribute evaluates to a string that ends with value.
Model fitted by StringIndexer.
A filter that evaluates to true iff the attribute evaluates to a string that starts with value.
The data type representing String values.
Creates a new StructField of type struct.
Creates a new StructField of type struct.
A StructType object can be constructed by StructType(fields: Seq[StructField]).
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other.
Return an RDD with the elements from this that are not in other.
Return an RDD with the pairs from this whose keys are not in other.
Return an RDD with the pairs from this whose keys are not in other.
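A tiny illustration of subtract versus subtractByKey (sc is an existing SparkContext; the data is invented):

    val a = sc.parallelize(Seq(("k1", 1), ("k2", 2)))
    val b = sc.parallelize(Seq(("k2", 99)))
    a.subtract(b).collect()       // keeps ("k1",1) and ("k2",2): whole pairs differ
    a.subtractByKey(b).collect()  // keeps only ("k1",1): key "k2" is removed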
Returns the specified table as a DataFrame.
Returns the first n rows in the DataFrame.
The asynchronous version of the take action, which returns a future for retrieving the first num elements of this RDD.
Creates a new StructField of type timestamp.
The data type representing java.sql.Timestamp values.
Use JavaRDDLike.collect() instead.
Generate a DenseMatrix from the given SparseMatrix.
Returns a new DataFrame with columns renamed.
Returns a new DataFrame with columns renamed.
Converts the edge and vertex properties into an EdgeTriplet for convenience.
Returns the content of the DataFrame as a JavaRDD of Rows.
Returns the content of the DataFrame as a RDD of JSON strings.
A Broadcast implementation that uses a BitTorrent-like protocol to do a distributed transfer of the broadcasted data to the executors.
Replaced by toDF().
Generate a SparseMatrix from the given DenseMatrix.
Converts to a StructField with some existing metadata.
Converts to a StructField.
Train a model given an RDD of (label, features) pairs.
Train a model given an RDD of (label, features) pairs.
Train a model given an RDD of (label, features) pairs.
Java-friendly API for GradientBoostedTrees$.train(org.apache.spark.rdd.RDD, org.apache.spark.mllib.tree.configuration.BoostingStrategy).
Java-friendly API for DecisionTree$.trainClassifier(org.apache.spark.rdd.RDD, int, scala.collection.immutable.Map, java.lang.String, int, int).
Java-friendly API for RandomForest$.trainClassifier(org.apache.spark.rdd.RDD, org.apache.spark.mllib.tree.configuration.Strategy, int, java.lang.String, int).
Java-friendly API for DecisionTree$.trainRegressor(org.apache.spark.rdd.RDD, scala.collection.immutable.Map, java.lang.String, int, int).
Java-friendly API for RandomForest$.trainRegressor(org.apache.spark.rdd.RDD, org.apache.spark.mllib.tree.configuration.Strategy, int, java.lang.String, int).
Transforms dataset by reading from featuresCol, and appending new columns as specified by parameters: predicted labels as predictionCol of type Double, and raw predictions (confidences) as rawPredictionCol of type Vector.
Transforms dataset by reading from featuresCol, calling predict(), and storing the predictions as a new column predictionCol.
Transposes this BlockMatrix.
JavaRDDLike.treeAggregate(U, org.apache.spark.api.java.function.Function2, org.apache.spark.api.java.function.Function2, int) with suggested depth 2.
Use RDD.treeAggregate(U, scala.Function2, scala.Function2, int, scala.reflect.ClassTag) instead.
JavaRDDLike.treeReduce(org.apache.spark.api.java.function.Function2, int) with suggested depth 2.
Use RDD.treeReduce(scala.Function2, int) instead.
Java-friendly version of RandomRDDs.uniformRDD(org.apache.spark.SparkContext, long, int, long).
RandomRDDs.uniformJavaRDD(org.apache.spark.api.java.JavaSparkContext, long, int, long) with the default seed.
RandomRDDs.uniformJavaRDD(org.apache.spark.api.java.JavaSparkContext, long, int, long) with the default number of partitions and the default seed.
Java-friendly version of RandomRDDs.uniformVectorRDD(org.apache.spark.SparkContext, long, int, int, long).
RandomRDDs.uniformJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, long, int, int, long) with the default seed.
RandomRDDs.uniformJavaVectorRDD(org.apache.spark.api.java.JavaSparkContext, long, int, int, long) with the default number of partitions and the default seed.
Generates an RDD comprised of i.i.d. samples from the uniform distribution U(0.0, 1.0).
Generates an RDD[Vector] with vectors containing i.i.d. samples drawn from the uniform distribution on U(0.0, 1.0).
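A short sketch of the RandomRDDs generators indexed above (sc is an existing SparkContext; the sizes are arbitrary):

    import org.apache.spark.mllib.random.RandomRDDs
    // One million i.i.d. samples from N(0, 1) in 10 partitions, fixed seed.
    val normals = RandomRDDs.normalRDD(sc, 1000000L, 10, seed = 7L)
    val uniforms = RandomRDDs.uniformRDD(sc, 1000000L)  // U(0.0, 1.0)
    println(normals.mean())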
Returns a new DataFrame containing union of rows in this frame and another frame.
Validates the block matrix info against the matrix data (blocks) and throws an exception if any error is found.
Factory methods for Vector.
Extends RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by pre-indexing the entries for fast, efficient joins.
Returns the List of values (for Java and Python).
Returns the List of values (for Java and Python).
Returns a new DataFrame by adding a column.
Returns a new DataFrame with a column renamed.
Include the content of an existing Metadata instance.
A model represented as a Map(String, Vector), i.e. a map from words to their vector representations.
Model fitted by Word2Vec.
Interface for saving the content of the DataFrame out into external storage.
WriteAheadLog.
Generate a DenseMatrix consisting of zeros.
Generate a Matrix consisting of zeros.