Question

A data scientist is working on a project that requires processing large amounts of structured data, performing SQL queries, and applying machine learning algorithms. The data scientist is considering using Apache Spark for this task. Which combination of Apache Spark modules should the data scientist use in this scenario?

Accepted Answer

Correct answer: D. Spark DataFrames, Spark SQL, and MLlib. Spark DataFrames and Spark SQL are designed for processing structured data and running SQL queries efficiently. MLlib is Spark's machine learning library, so it covers the machine learning algorithms the scenario calls for. The other options either include modules the scenario does not need or leave out Spark SQL or MLlib.

Databricks Certified Associate Developer for Apache Spark — Question 206
