Databricks Certified Associate Developer for Apache Spark — Question 64
The code block shown below contains an error. The code block is intended to create and register a SQL UDF named “ASSESS_PERFORMANCE” using the Scala function assessPerformance() and apply it to column customerSatisfaction in the table stores. Identify the error.
Code block:
spark.udf.register("ASSESS_PERFORMANCE", assessPerformance)
spark.sql("SELECT customerSatisfaction, assessPerformance(customerSatisfaction) AS result FROM stores")
Answer options
- A. The customerSatisfaction column cannot be called twice inside the SQL statement.
- B. Registered UDFs cannot be applied inside of a SQL statement.
- C. The order of the arguments to spark.udf.register() should be reversed.
- D. The wrong SQL function is used to compute column result - it should be ASSESS_PERFORMANCE instead of assessPerformance.
- E. There is no sql() operation - the DataFrame API must be used to apply the UDF assessPerformance().
Correct answer: D
Explanation
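The correct answer is D. spark.udf.register() exposes the function to SQL only under the name passed as its first argument, ASSESS_PERFORMANCE. The Scala identifier assessPerformance is unknown to the SQL engine, so Spark cannot resolve the function and the query fails. Case is not the issue: Spark SQL resolves function names case-insensitively, so assess_performance would also work, but assessPerformance does not match because it lacks the underscore. The other options are incorrect: the same column can be referenced more than once in a SELECT list (A), registered UDFs can be called from SQL statements (B), spark.udf.register() takes the name first and the function second, exactly as written (C), and spark.sql() is a valid way to run the query (E).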
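A minimal corrected sketch follows, assuming a local SparkSession, an integer customerSatisfaction column, and a hypothetical body for assessPerformance() (the question does not show one). The sample rows standing in for the stores table are likewise invented for illustration. The only substantive change from the question's code is that the SQL query calls the function by its registered name.

```scala
import org.apache.spark.sql.SparkSession

object AssessPerformanceExample {
  // Hypothetical stand-in for assessPerformance(); the question does not show its body.
  // Assumes customerSatisfaction is an integer score.
  def assessPerformance(customerSatisfaction: Int): String =
    if (customerSatisfaction >= 8) "good" else "needs improvement"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("assess-performance-udf")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data registered as a temp view named "stores".
    Seq(("s1", 9), ("s2", 5))
      .toDF("storeId", "customerSatisfaction")
      .createOrReplaceTempView("stores")

    // Register the Scala method under the SQL name ASSESS_PERFORMANCE.
    // `assessPerformance _` eta-expands the method into an Int => String function.
    spark.udf.register("ASSESS_PERFORMANCE", assessPerformance _)

    // SQL sees only the registered name, not the Scala identifier.
    spark.sql(
      "SELECT customerSatisfaction, ASSESS_PERFORMANCE(customerSatisfaction) AS result FROM stores"
    ).show()

    // Function names resolve case-insensitively, so this also works.
    // assessPerformance(...) would still fail: it lacks the underscore.
    spark.sql(
      "SELECT customerSatisfaction, assess_performance(customerSatisfaction) AS result FROM stores"
    ).show()

    spark.stop()
  }
}
```

Writing `assessPerformance _` explicitly turns the method into a function value, which avoids relying on automatic eta-expansion while the compiler chooses among the overloads of spark.udf.register().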