Question

A data engineer is working on the num_df DataFrame:

num_df = spark.range(5).toDF("num")

The engineer is using the Python UDF:

def cubefunc(val):
    return val ** 3

Which code fragment registers and uses this UDF as a Spark SQL function to work with the DataFrame num_df?

Accepted Answer

Correct answer: C

spark.udf.register("cubeudf", cubefunc, DoubleType())
num_df.selectExpr("cubeudf(num)")

Option C is correct because spark.udf.register registers cubefunc under the SQL name cubeudf with a declared return type of DoubleType(), and selectExpr then calls the function by that registered name inside a SQL expression. Option A uses IntegerType(), which may not accommodate all possible cube results. Option B does not register the UDF for use in SQL expressions. Option D has a syntax error and does not properly register the UDF.
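The accepted fragment can be sketched as a small helper. This is a minimal illustration, not a reference implementation: the helper name register_and_use is hypothetical, and it assumes the caller passes in an active SparkSession. The PySpark import sits inside the helper only so that cubefunc can be checked on its own without a Spark installation.

```python
def cubefunc(val):
    # The UDF from the question: cube the input value.
    return val ** 3


def register_and_use(spark):
    """Register cubefunc as the SQL function 'cubeudf' and apply it to num_df.

    `spark` is assumed to be an active SparkSession supplied by the caller;
    this helper and its name are illustrative only.
    """
    # Imported here so cubefunc above stays usable without PySpark installed.
    from pyspark.sql.types import DoubleType

    # spark.range(5) yields a single LongType column, renamed here to "num".
    num_df = spark.range(5).toDF("num")

    # Register the Python function under a name that SQL expressions can call.
    spark.udf.register("cubeudf", cubefunc, DoubleType())

    # selectExpr parses its argument as SQL, so the registered name resolves.
    return num_df.selectExpr("cubeudf(num)")
```

With a running session, register_and_use(spark).show() would display the result. One caveat, stated as an assumption about runtime behavior: cubefunc returns a Python int for these inputs, and depending on the Spark version and whether Arrow-optimized Python UDFs are enabled, an int returned under a declared DoubleType() may come back as null rather than being coerced. Returning float(val ** 3) is one way to sidestep that.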

Databricks Certified Associate Developer for Apache Spark — Question 209
