pyspark.sql.functions.sentences
pyspark.sql.functions.sentences(string: ColumnOrName, language: Optional[ColumnOrName] = None, country: Optional[ColumnOrName] = None) → pyspark.sql.column.Column
Splits a string into an array of sentences, where each sentence is an array of words. The ‘language’ and ‘country’ arguments are optional; if omitted, the default locale is used.
New in version 3.2.0.
Changed in version 3.4.0: Supports Spark Connect.
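- Parameters
string : Column or str
a string to be split.
language : Column or str, optional
the language of the locale.
country : Column or str, optional
the country of the locale.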
- Returns
Column
arrays of split sentences.
Examples
>>> from pyspark.sql.functions import lit, sentences
>>> df = spark.createDataFrame([["This is an example sentence."]], ["string"])
>>> df.select(sentences(df.string, lit("en"), lit("US"))).show(truncate=False)
+-----------------------------------+
|sentences(string, en, US)          |
+-----------------------------------+
|[[This, is, an, example, sentence]]|
+-----------------------------------+
>>> df = spark.createDataFrame([["Hello world. How are you?"]], ["s"])
>>> df.select(sentences("s")).show(truncate=False)
+---------------------------------+
|sentences(s, , )                 |
+---------------------------------+
|[[Hello, world], [How, are, you]]|
+---------------------------------+
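The nested return shape (an outer array of sentences, each an inner array of words with punctuation dropped) can be illustrated without a Spark session. The sketch below is a rough, locale-unaware approximation in plain Python: `approx_sentences` is a hypothetical helper, not part of PySpark, and its regex-based splitting is an assumption made for illustration. Spark's locale-aware splitting may differ on inputs such as abbreviations or non-English text.

```python
import re


def approx_sentences(text):
    """Hypothetical stand-in that mimics the nested result shape of sentences().

    Assumption: sentences end at '.', '!' or '?' followed by whitespace.
    This is NOT how Spark splits text; it only illustrates the output shape.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Within each sentence keep word characters and apostrophes; drop punctuation.
    return [re.findall(r"[\w']+", part) for part in parts if part]


result = approx_sentences("Hello world. How are you?")
print(result)  # [['Hello', 'world'], ['How', 'are', 'you']]
```

For the two inputs used in the examples above, this sketch produces the same nesting as the Spark output shown; that agreement should not be assumed for arbitrary text.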