sampleBy {SparkR}    R Documentation

Returns a stratified sample without replacement

Description

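Returns a stratified sample, without replacement, based on the sampling fraction given for each stratum. Each row is kept independently with probability equal to its stratum's fraction, so the number of rows drawn from a stratum is approximate rather than exact.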
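The per-row selection rule can be mimicked on a local R data.frame. This is a minimal sketch, assuming the behaviour of Spark's underlying DataFrameStatFunctions.sampleBy: a row is kept when a uniform random draw falls below its stratum's fraction. The helper name `sample_by_local` is hypothetical and is not part of SparkR, and R's `set.seed`/`runif` will not reproduce the rows Spark would pick for the same seed.

```r
# Hypothetical base-R helper that mimics sampleBy's per-row rule on a
# local data.frame; it is NOT part of SparkR.
sample_by_local <- function(x, col, fractions, seed) {
  set.seed(seed)
  strata <- as.character(x[[col]])
  # Look up each row's fraction; strata missing from `fractions` get 0.
  p <- vapply(strata, function(s) {
    if (!is.null(fractions[[s]])) fractions[[s]] else 0
  }, numeric(1), USE.NAMES = FALSE)
  # Keep a row when its uniform draw is below its stratum's fraction,
  # so each row is sampled at most once (no replacement).
  keep <- runif(nrow(x)) < p
  x[keep, , drop = FALSE]
}

df <- data.frame(key = rep(c("a", "b", "c"), each = 1000), value = 1:3000)
s <- sample_by_local(df, "key", list(a = 0.4, b = 0.6), seed = 36)
table(s$key)  # roughly 400 "a" rows and 600 "b" rows; no "c" rows
```

Because each row is kept independently, the per-stratum counts vary around fraction × stratum size instead of hitting it exactly.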

Usage

sampleBy(x, col, fractions, seed)

## S4 method for signature 'SparkDataFrame,character,list,numeric'
sampleBy(x, col, fractions, seed)

Arguments

x

A SparkDataFrame

col

A character string naming the column that defines the strata.

fractions

A named list giving the sampling fraction for each stratum: each name is a value of col and each element is a fraction between 0 and 1. If a stratum is not specified, its fraction is treated as zero.
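A fractions argument is an ordinary named R list. The sketch below uses a hypothetical helper, `make_fractions` (not part of SparkR), to build one from vectors of stratum values and rates, and checks that every rate lies in [0, 1], the range Spark's Scala implementation of sampleBy requires. A stratum left out of the list simply has no entry.

```r
# Hypothetical helper (not part of SparkR): build a named fractions list
# and validate that each fraction lies in [0, 1].
make_fractions <- function(strata, rates) {
  stopifnot(length(strata) == length(rates), !anyDuplicated(strata))
  if (any(rates < 0 | rates > 1)) {
    stop("each fraction must be between 0 and 1")
  }
  setNames(as.list(rates), as.character(strata))
}

fractions <- make_fractions(c("a", "b"), c(0.4, 0.6))
str(fractions)
# Stratum "c" has no entry, so sampleBy treats its fraction as zero
# and returns none of its rows.
is.null(fractions[["c"]])
```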

seed

Numeric seed for the random number generator.

Value

A new SparkDataFrame that represents the stratified sample

Note

sampleBy since 1.6.0

See Also

Other stat functions: approxQuantile, approxQuantile,SparkDataFrame,character,numeric,numeric-method; corr, corr,Column-method, corr,SparkDataFrame-method; cov, cov,SparkDataFrame-method, cov,characterOrColumn-method, covar_samp, covar_samp,characterOrColumn,characterOrColumn-method; crosstab, crosstab,SparkDataFrame,character,character-method; freqItems, freqItems,SparkDataFrame,character-method

Examples

## Not run: 
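##D df <- read.json("/path/to/file.json")
##D # sample about 40% of the rows with key "a" and 60% of those with key "b"
##D fractions <- list("a" = 0.4, "b" = 0.6)
##D sample <- sampleBy(df, "key", fractions, 36)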
## End(Not run)

[Package SparkR version 2.2.0 Index]