Databricks Certified Associate Developer for Apache Spark — Question 149
Which of the following code blocks creates a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance() and applies it to Column customerSatisfaction in DataFrame storesDF?
Answer options
- A. assessPerformanceUDF = udf(assessPerformance, IntegerType) storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
- B. assessPerformanceUDF = udf(assessPerformance, IntegerType()) storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
- C. assessPerformanceUDF - udf(assessPerformance) storesDF.withColumn("result", assessPerformance(col("customerSatisfaction")))
- D. assessPerformanceUDF = udf(assessPerformance) storesDF.withColumn("result", assessPerformanceUDF(col("customerSatisfaction")))
- E. assessPerformanceUDF = udf(assessPerformance, IntegerType()) storesDF.withColumn("result", assessPerformance(col("customerSatisfaction")))
Correct answer: B
Explanation
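Option B is correct: it passes an instance of the return type, IntegerType(), to udf() and then calls the resulting assessPerformanceUDF on col("customerSatisfaction"). Option A passes the class IntegerType without parentheses, whereas udf() expects a DataType instance or a type string. Option C uses "-" where the assignment "=" belongs, so assessPerformanceUDF is never defined, and it also calls the plain Python function assessPerformance on the Column instead of the UDF. Option D omits the return type, so the UDF falls back to the default StringType rather than producing an integer column. Option E defines the UDF correctly but then applies the plain Python function assessPerformance to the Column instead of assessPerformanceUDF.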