org.apache.spark.sql.SQLContext
Uses the ExtractEquiJoinKeys pattern to find joins where at least some of the predicates can be evaluated by matching hash keys.
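As a rough illustration of what equi-join key extraction does, the Scala sketch below splits a conjunction of join predicates into left keys, right keys, and a residual condition. It matches only when at least one equality between a left-side column and a right-side column exists. The expression model (Attr, EqualTo, GreaterThan, extract) is hypothetical and is not the Catalyst ExtractEquiJoinKeys implementation.

```scala
// Hypothetical, simplified expression model; NOT Spark's Catalyst classes.
object EquiJoinKeySketch {
  sealed trait Expr
  // isLeft records which join input the column comes from.
  final case class Attr(name: String, isLeft: Boolean) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr

  // An equality between one left column and one right column is a hash key pair,
  // returned in (leftKey, rightKey) order regardless of how it was written.
  private def asKeyPair(e: Expr): Option[(Attr, Attr)] = e match {
    case EqualTo(l @ Attr(_, true), r @ Attr(_, false)) => Some((l, r))
    case EqualTo(r @ Attr(_, false), l @ Attr(_, true)) => Some((l, r))
    case _                                              => None
  }

  /** Returns (leftKeys, rightKeys, residual), or None when no key pair exists. */
  def extract(predicates: Seq[Expr]): Option[(Seq[Attr], Seq[Attr], Seq[Expr])] = {
    val pairs    = predicates.flatMap(p => asKeyPair(p))
    // Everything that is not a key pair must still be evaluated after matching.
    val residual = predicates.filter(p => asKeyPair(p).isEmpty)
    if (pairs.isEmpty) None
    else Some((pairs.map(_._1), pairs.map(_._2), residual))
  }
}
```

A join whose only predicate is a non-equality (for example, left.x > right.y) yields None, so a hash-based join would not be chosen for it.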
This strategy applies a simple optimization based on the estimated physical sizes of the two join sides. When planning a joins.BroadcastHashJoin, if one side has an estimated physical size smaller than the user-settable threshold org.apache.spark.sql.SQLConf.AUTO_BROADCASTJOIN_THRESHOLD, the planner marks that side as the build relation and the other side as the stream side. The build table is broadcast to every executor involved in the join as an org.apache.spark.broadcast.Broadcast object. If both estimates exceed the threshold, they are instead used to decide the build side of a joins.ShuffledHashJoin.
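The following Scala sketch models the size-based decision described above using plain Long byte estimates. All names (planHashJoin, BroadcastHash, ShuffledHash, BuildLeft, BuildRight) are hypothetical, and several details are assumptions rather than confirmed planner behavior: a non-positive threshold disables broadcasting, the right side is checked first, a size equal to the threshold still counts as broadcastable, and the shuffled join builds the smaller side.

```scala
// Hypothetical model of the size-based hash join planning rule; not Spark's API.
object HashJoinPlanningSketch {
  sealed trait BuildSide
  case object BuildLeft  extends BuildSide
  case object BuildRight extends BuildSide

  sealed trait PlannedJoin
  final case class BroadcastHash(build: BuildSide) extends PlannedJoin
  final case class ShuffledHash(build: BuildSide)  extends PlannedJoin

  /** Chooses a join strategy from estimated physical sizes, in bytes. */
  def planHashJoin(leftSize: Long, rightSize: Long, threshold: Long): PlannedJoin = {
    // Assumption: a threshold <= 0 turns broadcasting off entirely.
    val broadcastEnabled = threshold > 0
    if (broadcastEnabled && rightSize <= threshold) BroadcastHash(BuildRight)
    else if (broadcastEnabled && leftSize <= threshold) BroadcastHash(BuildLeft)
    // Both sides too large: shuffle, and build the hash table on the smaller side.
    else if (rightSize <= leftSize) ShuffledHash(BuildRight)
    else ShuffledHash(BuildLeft)
  }
}
```

For example, a 10-byte right side against a 100-byte threshold is broadcast, while two sides of 500 and 300 bytes fall through to a shuffled join that builds on the 300-byte side.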
Used to build table scan operators where complex projection and filtering are done using separate physical operators. This function returns the given scan operator with Project and Filter nodes added only when needed. For example, a Project operator is used only when the final output requires evaluating complex expressions, or when further columns can be pruned after filtering.
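A simplified Scala model of where the Project and Filter nodes end up is sketched below. Expressions are represented only by their text and the column names they reference; Expr, Scan, Filter, Project, and pruneFilterProject here are hypothetical stand-ins, not Spark's physical operators or its actual method signature.

```scala
// Hypothetical plan model for illustrating Project/Filter placement; not Spark classes.
object PruneFilterProjectSketch {
  // An expression's text, the columns it reads, and whether it is a bare column.
  final case class Expr(text: String, references: Set[String], isColumn: Boolean)

  sealed trait Plan
  final case class Scan(columns: Seq[String]) extends Plan
  final case class Filter(conditions: Seq[String], child: Plan) extends Plan
  final case class Project(exprs: Seq[String], child: Plan) extends Plan

  def pruneFilterProject(projectList: Seq[Expr], filters: Seq[Expr]): Plan = {
    val projectSet = projectList.flatMap(_.references).toSet
    val filterSet  = filters.flatMap(_.references).toSet
    // Add a Filter node only when there is something to filter.
    def withFilter(child: Plan): Plan =
      if (filters.isEmpty) child else Filter(filters.map(_.text), child)

    if (projectList.forall(_.isColumn) && filterSet.subsetOf(projectSet)) {
      // Plain columns only, and the filter reads nothing extra:
      // the scan can emit the final columns directly, so no Project is needed.
      withFilter(Scan(projectList.map(_.text)))
    } else {
      // Read every column either step needs, filter, then evaluate/trim with a Project.
      val required = (projectSet ++ filterSet).toSeq.sorted
      Project(projectList.map(_.text), withFilter(Scan(required)))
    }
  }
}
```

In the first branch a Project would be a no-op, so it is omitted; in the second, the scan reads the union of projected and filtered columns and the Project evaluates the complex expressions or drops columns only the filter needed.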
The prunePushedDownFilters parameter is used to remove the filters that the filter pushdown optimization can eliminate.
The required attributes for both filtering and expression evaluation are passed to the provided scanBuilder function so that it can avoid unnecessary column materialization.
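The two function-valued parameters can be illustrated with another hypothetical Scala model. Here prunePushedDownFilters drops the predicates that the data source is assumed to evaluate itself, and scanBuilder receives the union of the projected columns and every column referenced by a filter. Pred, Scan, Filter, and scanWithFilters are illustrative names; reading the columns of pushed-down filters anyway is a conservative assumption of this sketch.

```scala
// Hypothetical sketch of the prunePushedDownFilters / scanBuilder parameters; not Spark's API.
object PushdownSketch {
  final case class Pred(text: String, references: Set[String])

  sealed trait Plan
  final case class Scan(columns: Seq[String]) extends Plan
  final case class Filter(conditions: Seq[Pred], child: Plan) extends Plan

  def scanWithFilters(
      projected: Seq[String],
      filters: Seq[Pred],
      prunePushedDownFilters: Seq[Pred] => Seq[Pred],
      scanBuilder: Seq[String] => Plan): Plan = {
    // Required columns: everything projected plus everything any filter reads.
    val required  = (projected ++ filters.flatMap(_.references)).distinct
    // Only filters the source did not already handle are evaluated again.
    val remaining = prunePushedDownFilters(filters)
    val scan      = scanBuilder(required)
    if (remaining.isEmpty) scan else Filter(remaining, scan)
  }
}
```

Passing the scan constructor as a function keeps the column-selection logic independent of any particular data source.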