Databricks Certified Associate Developer for Apache Spark — Question 160
The code block shown below contains an error. The code block is intended to create and register a SQL UDF named "ASSESS_PERFORMANCE" using the Python function assessPerformance() and apply it to column customerSatisfaction in table stores. Identify the error.
Code block:
spark.udf.register("ASSESS_PERFORMANCE", assessPerformance)
spark.sql("SELECT customerSatisfaction, assessPerformance(customerSatisfaction) AS result FROM stores")
Answer options
- A. There is no sql() operation — the DataFrame API must be used to apply the UDF assessPerformance().
- B. The order of the arguments to spark.udf.register() should be reversed.
- C. The customerSatisfaction column cannot be called twice inside the SQL statement.
- D. Registered UDFs cannot be applied inside of a SQL statement.
- E. The wrong SQL function is used to compute column result — it should be ASSESS_PERFORMANCE instead of assessPerformance.
Correct answer: E
Explanation
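The correct answer is E. The call spark.udf.register("ASSESS_PERFORMANCE", assessPerformance) makes the function available to SQL under the name ASSESS_PERFORMANCE. The name assessPerformance refers only to the Python function and is not registered as a SQL function, so the query as written cannot resolve it. The other options are incorrect: spark.sql() is a valid operation (A); spark.udf.register() takes the SQL name first and the Python function second, exactly as written (B); a column can be referenced more than once in a SQL statement (C); and registered UDFs can be used in SQL statements, which is the purpose of registering them (D).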
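For illustration, the corrected code might look like the sketch below. The question does not show the body of assessPerformance() or the contents of the stores table, so the rating rule, the sample rows, the storeId column, the run_demo() helper, and the local SparkSession settings are all assumptions made up for this example.

```python
def assessPerformance(score):
    """Hypothetical rating rule; the question does not show the real function body."""
    if score is None:
        return None
    return "good" if score >= 4 else "needs improvement"


def run_demo():
    """Register the UDF and call it from SQL (requires pyspark to be installed)."""
    # Imported here so the plain Python function above works without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.master("local[1]").appName("udf-demo").getOrCreate()

    # Made-up sample rows standing in for the "stores" table.
    stores = spark.createDataFrame(
        [(1, 5), (2, 2)], ["storeId", "customerSatisfaction"]
    )
    stores.createOrReplaceTempView("stores")

    # First argument: the name SQL will see. Second: the Python callable.
    spark.udf.register("ASSESS_PERFORMANCE", assessPerformance, StringType())

    # The query uses the registered name, not the Python function name.
    result = spark.sql(
        "SELECT customerSatisfaction, "
        "ASSESS_PERFORMANCE(customerSatisfaction) AS result FROM stores"
    )
    result.show()
    spark.stop()
```

Calling run_demo() in an environment with PySpark available should print one result row per store. Replacing ASSESS_PERFORMANCE with assessPerformance in the query string reproduces the error from the question, because only the first argument to spark.udf.register() becomes visible to SQL.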