
Question

A data scientist has written a data cleaning notebook that utilizes the pandas library, but their colleague has suggested that they refactor their notebook to scale with big data.
Which of the following approaches can the data scientist take to spend the least amount of time refactoring their notebook to scale with big data?

Accepted Answer

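Correct answer: E. They can refactor their notebook to utilize the pandas API on Spark. The correct answer is E because the pandas API on Spark (`pyspark.pandas`, included in Apache Spark since version 3.2) provides a pandas-compatible DataFrame interface that runs on Spark. The data scientist can therefore reuse their existing pandas code and knowledge while the work is distributed across a cluster; in many cases the main change is importing `pyspark.pandas` in place of `pandas`. Options A, B, C, and D involve more substantial changes to the notebook and require learning new APIs, which would take more time to implement.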
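The sketch below illustrates why the refactor is small. It assumes a hypothetical cleaning function `clean`, hypothetical columns `item` and `price`, and PySpark 3.2 or later for the Spark path. The cleaning logic is written once against pandas-style methods (`dropna`, `drop_duplicates`, `assign`, `astype`), which `pyspark.pandas` DataFrames also provide, so only the data-loading step changes. Note that not every pandas method is supported by the pandas API on Spark, and some behave slightly differently, so a real notebook may still need small adjustments.

```python
import pandas as pd


def clean(df):
    """Hypothetical cleaning steps written against the pandas API.

    These methods also exist on pyspark.pandas DataFrames, so the same
    function can be applied to a pandas-on-Spark DataFrame unchanged.
    """
    df = df.dropna(subset=["price"])                 # drop rows with no price
    df = df.drop_duplicates()                        # remove exact duplicate rows
    df = df.assign(price=df["price"].astype(float))  # normalize the dtype
    return df


def clean_on_spark(path):
    """Run the same cleaning on Spark via the pandas API on Spark.

    `path` is a hypothetical CSV location. Requires PySpark 3.2+ and a
    Spark environment (e.g. a Databricks cluster); the import is done
    lazily so this module still loads where PySpark is not installed.
    """
    import pyspark.pandas as ps

    psdf = ps.read_csv(path)  # read directly into a distributed DataFrame
    return clean(psdf)        # identical cleaning code, executed by Spark


# Local run with plain pandas on a tiny example frame
raw = pd.DataFrame(
    {"item": ["a", "a", "b", "c"], "price": ["1.5", "1.5", None, "3"]}
)
cleaned = clean(raw)
print(cleaned)
```

Reading with `ps.read_csv` (rather than building a pandas DataFrame first and converting it) keeps the data distributed from the start, which matters once the data no longer fits in a single machine's memory.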

Databricks Certified Machine Learning Associate — Question 26

Answer options

Correct answer: E

Explanation