Databricks Certified Associate Developer for Apache Spark — Question 209
A data engineer is working with the DataFrame num_df:
num_df = spark.range(5).toDF("num")
The engineer is using the Python UDF:
def cubefunc(val):
    return val ** 3
Which code fragment registers and uses this UDF as a Spark SQL function to work with the DataFrame num_df?
Answer options
- A. spark.udf.register("cubeudf", cubefunc, IntegerType())
     num_df.selectExpr("cubeudf(num)")
- B. cubeudf = udf(cubefunc)
     num_df.select(cubeudf(col("num")))
- C. spark.udf.register("cubeudf", cubefunc, DoubleType())
     num_df.selectExpr("cubeudf(num)")
- D. cubeudf = udf(cubefunc)
     num_df.selectExpr(cubeudf(num)")
Correct answer: C
Explanation
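The correct answer is C. spark.udf.register("cubeudf", cubefunc, DoubleType()) registers the Python function under the name cubeudf so that SQL expressions can call it, and selectExpr("cubeudf(num)") then invokes it by that name on num_df. Option A follows the same pattern but declares IntegerType(), a 32-bit type that cannot hold the cube of every possible input, even though the cubes of 0 through 4 fit. Option B wraps the function with udf() and applies it through the DataFrame API, so it is never registered as a Spark SQL function. Option D also skips registration, and its selectExpr call has an unbalanced quote, which is a syntax error.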
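The return-type reasoning above can be checked without a Spark session. The following is a minimal plain-Python sketch, not Spark code: it calls cubefunc on the same inputs that spark.range(5) produces and looks for the largest non-negative input whose cube still fits in a 32-bit signed integer. The names INT32_MAX and largest are introduced here for illustration only and do not come from the question.

```python
def cubefunc(val):
    return val ** 3

# Same inputs as the rows of spark.range(5): 0, 1, 2, 3, 4
cubes = [cubefunc(n) for n in range(5)]
print(cubes)  # [0, 1, 8, 27, 64]

# cubefunc returns plain Python ints for integer inputs
print(all(type(c) is int for c in cubes))  # True

# Upper bound of a 32-bit signed integer (hypothetical helper name)
INT32_MAX = 2**31 - 1

# Largest non-negative input whose cube still fits in 32 bits
largest = 0
while cubefunc(largest + 1) <= INT32_MAX:
    largest += 1
print(largest)  # 1290
```

For the five rows in num_df the results are small integers, so the 32-bit limit only matters for much larger inputs than the question uses.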