pyspark.pandas.DataFrame.map

DataFrame.map(func)

Apply a function to a DataFrame elementwise.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

New in version 4.0.0: DataFrame.applymap was deprecated and renamed to DataFrame.map.

Note

This API executes the function once to infer the type, which is potentially expensive, for instance, when the dataset is created after aggregations or sorting.

To avoid this, specify the return type in func, for instance, as below:

>>> import numpy as np
>>> def square(x) -> np.int32:
...     return x ** 2

pandas-on-Spark uses return type hints and does not try to infer the type.
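For example (a minimal sketch; the psdf name and the integer values below are illustrative), the hinted square function can be applied directly, and pandas-on-Spark uses np.int32 as the result type instead of executing the function first to infer it:

>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame([[1, 2], [3, 4]])
>>> psdf.map(square)
   0   1
0  1   4
1  9  16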

Parameters
func : callable

Python function that returns a single value from a single value.

Returns
DataFrame

Transformed DataFrame.

Examples

>>> import pyspark.pandas as ps
>>> df = ps.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
       0      1
0  1.000  2.120
1  3.356  4.567
>>> def str_len(x) -> int:
...     return len(str(x))
>>> df.map(str_len)
   0  1
0  3  4
1  5  5
>>> def power(x) -> float:
...     return x ** 2
>>> df.map(power)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489

You can omit type hints and let pandas-on-Spark infer the return type.

>>> df.map(lambda x: x ** 2)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489