Testing PySpark#
In order to run PySpark tests, you should build Spark itself first via Maven or SBT. For example,
build/mvn -DskipTests clean package
build/sbt -Phive clean package
After that, the PySpark test cases can be run using python/run-tests. For example,
python/run-tests --python-executables=python3
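If you only want to run the tests for one component, you can pass a test module name via the --modules option (pyspark-sql below is shown as an example; the available module names are defined in dev/sparktestsupport):
python/run-tests --modules=pyspark-sql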
Note that you may need to set the OBJC_DISABLE_INITIALIZE_FORK_SAFETY environment variable to YES if you are running tests on macOS.
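For example, before running the tests:
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES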
Please see the guidance on how to build Spark, run tests for a module, or run individual tests.
Running Individual PySpark Tests#
You can run a specific test using python/run-tests, for example, as below:
python/run-tests --testnames pyspark.sql.tests.test_arrow
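To narrow the run further, you can also pass a test class, or a class and method, after the module name (the class and method below are illustrative):
python/run-tests --testnames 'pyspark.sql.tests.test_arrow ArrowTests.test_createDataFrame_toggle'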
Please refer to Testing PySpark for more details.
Running Tests using GitHub Actions#
You can run the full PySpark tests by using GitHub Actions in your own forked GitHub repository with a few clicks. Please refer to Running tests in your forked repository using GitHub Actions for more details.
Running Tests for Spark Connect#
Running Tests for Python Client#
In order to test changes in the Protobuf definitions, for example, at spark/connector/connect/common/src/main/protobuf/spark/connect, you should first regenerate the Python Protobuf client by running dev/connect-gen-protos.sh.
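After regenerating the client, one way to verify the changes is to run the Spark Connect Python tests (this assumes the pyspark-connect test module name, which may differ across Spark versions):
python/run-tests --modules=pyspark-connect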
Running PySpark Shell with Python Client#
For an Apache Spark distribution you built locally:
bin/pyspark --remote "local[*]"
For the Apache Spark release:
bin/pyspark --remote "local[*]" --packages org.apache.spark:spark-connect_2.13:$SPARK_VERSION
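Here $SPARK_VERSION should match the release you are running against; for example, with a hypothetical 4.0.0 release:
bin/pyspark --remote "local[*]" --packages org.apache.spark:spark-connect_2.13:4.0.0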