Databricks Certified Machine Learning Associate — Question 12
Which of the following is a benefit of using vectorized pandas UDFs instead of standard PySpark UDFs?
Answer options
- A. The vectorized pandas UDFs allow for the use of type hints
- B. The vectorized pandas UDFs process data in batches rather than one row at a time
- C. The vectorized pandas UDFs allow for pandas API use inside of the function
- D. The vectorized pandas UDFs work on distributed DataFrames
- E. The vectorized pandas UDFs process data in memory rather than spilling to disk
Correct answer: B
Explanation
The correct answer is B because vectorized pandas UDFs are designed to process data in batches, which significantly improves performance compared to traditional UDFs that handle one row at a time. The other options, while potentially true about vectorized pandas UDFs, do not specifically highlight the key performance benefit of batch processing.