Databricks Certified Machine Learning Associate — Question 25
Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?
Answer options
- A. pandas API on Spark DataFrames are single-node versions of Spark DataFrames with additional metadata
- B. pandas API on Spark DataFrames are more performant than Spark DataFrames
- C. pandas API on Spark DataFrames are made up of Spark DataFrames and additional metadata
- D. pandas API on Spark DataFrames are less mutable versions of Spark DataFrames
- E. pandas API on Spark DataFrames are unrelated to Spark DataFrames
Correct answer: C
Explanation
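The correct answer is C. A pandas API on Spark DataFrame wraps a Spark DataFrame: the data itself lives in the underlying distributed Spark DataFrame, and the wrapper adds metadata, most notably the index and the pandas-style column labels. Pandas-like operations use this metadata and are translated into Spark operations. Option A is wrong because pandas API on Spark DataFrames are distributed across the cluster, not single-node. Option E is wrong because they are built directly on Spark DataFrames and can be converted to and from them. Option B is wrong because the extra metadata, especially maintaining an index, can add overhead, so pandas API on Spark DataFrames are not generally more performant than native Spark DataFrames. Option D is misleading: native Spark DataFrames are immutable, while pandas API on Spark DataFrames expose pandas-style mutations such as column assignment (the underlying Spark DataFrames stay immutable), so they are not "less mutable" versions of Spark DataFrames.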