Databricks Certified Associate Developer for Apache Spark — Question 35
The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and apply it to column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
assessPerformanceUDF = udf(assessPerformance)
storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
Answer options
- A. The assessPerformance() operation is not properly registered as a UDF.
- B. The withColumn() operation is not appropriate here – UDFs should be applied by iterating over rows instead.
- C. UDFs can only be applied via SQL and not through the DataFrame API.
- D. The return type of the assessPerformanceUDF() is not specified in the udf() operation.
- E. The assessPerformance() operation should be used on column customerSatisfaction rather than the assessPerformanceUDF() operation.
Correct answer: D
Explanation
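The correct answer is D. When no return type is passed, udf() defaults to a return type of StringType, so the UDF as written is not declared as returning an integer even though assessPerformance() does. Passing the type as the second argument, as in udf(assessPerformance, IntegerType()), fixes this. Option A is incorrect because udf() is a valid way to create a UDF for use with the DataFrame API; registering with spark.udf.register() is only needed to call the UDF from SQL. Option B is incorrect because withColumn() is a standard way to apply a UDF to a column. Option C is incorrect because UDFs can be used through the DataFrame API as well as through SQL. Option E is incorrect because assessPerformance() is a plain Python function that expects a value rather than a Column, so it has to be wrapped as a UDF before Spark can apply it row by row.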
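A minimal sketch of the corrected code block follows. The question does not show the body of assessPerformance() or the contents of storesDF, so the threshold logic, the storeId column, and the sample rows are assumptions made up for illustration. The helper name demo() is also hypothetical; it only wraps the Spark part so that the plain function can be tried without a Spark installation.

```python
def assessPerformance(customerSatisfaction):
    """Hypothetical body: the question only says this function returns an integer."""
    if customerSatisfaction is None:
        return None  # Spark passes a SQL NULL to a Python UDF as None
    return 1 if customerSatisfaction >= 4.0 else 0


def demo():
    # Imported here so the plain function above can be used without PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import IntegerType

    spark = (
        SparkSession.builder.master("local[1]").appName("udf-sketch").getOrCreate()
    )

    # Made-up sample rows standing in for storesDF.
    storesDF = spark.createDataFrame(
        [(1, 4.5), (2, 3.2), (3, None)],
        "storeId INT, customerSatisfaction DOUBLE",
    )

    # The second argument declares the return type; without it,
    # udf() falls back to StringType.
    assessPerformanceUDF = udf(assessPerformance, IntegerType())

    result = storesDF.withColumn(
        "result", assessPerformanceUDF(col("customerSatisfaction"))
    )
    result.printSchema()
    result.show()

    spark.stop()
```

Calling demo() on a machine with PySpark installed prints the schema and the rows; the schema is the quickest place to check that the new column was declared with the intended type.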