A data scientist is working with a massive dataset that exceeds the memory capacity of a…

Question

A data scientist is working with a massive dataset that exceeds the memory capacity of a single machine. The data scientist is considering using Apache Spark™ instead of processing the data using traditional single-machine programming languages like standard Python scripts. Which two advantages does Apache Spark™ offer over a normal single-machine language in this scenario? (Choose two.)

Accepted Answer

Correct answer: B, D

B. It has built-in fault tolerance, allowing it to recover seamlessly from node failures during computation.
D. It can distribute data processing tasks across a cluster of machines, enabling horizontal scalability.

The correct answers, B and D, highlight Apache Spark™'s ability to recover from node failures and its capability to distribute processing tasks across a cluster, both of which are crucial for handling datasets too large for a single machine. Options A and C are incorrect because Spark does require some coding and is designed to utilize memory effectively. Option E is incorrect because Spark can run on commodity hardware.
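The two properties named in B and D can be illustrated with a toy analogy in standard-library Python. This is only a sketch and does not use Spark or its API: threads stand in for cluster nodes, and every name here (`split_into_partitions`, `FlakyWorker`, `run_with_retries`, `distributed_sum`) is hypothetical. The data is split into partitions that are processed in parallel, and a partition whose task fails is simply retried. Spark applies the same idea at cluster scale, retrying failed tasks and recomputing lost partitions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into contiguous chunks of roughly equal size."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


class FlakyWorker:
    """Hypothetical worker that fails once on the chosen partition indices."""

    def __init__(self, fail_once_on=()):
        self._pending_failures = set(fail_once_on)

    def sum_partition(self, index, partition):
        if index in self._pending_failures:
            # Simulate a node dying the first time it handles this partition.
            self._pending_failures.discard(index)
            raise RuntimeError(f"simulated node failure on partition {index}")
        return sum(partition)


def run_with_retries(task, index, partition, max_attempts=3):
    """Re-run a failed task, giving up after max_attempts tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(index, partition)
        except RuntimeError:
            if attempt == max_attempts:
                raise


def distributed_sum(data, num_partitions=4, fail_once_on=()):
    """Sum data by processing partitions in parallel, retrying failures."""
    partitions = split_into_partitions(data, num_partitions)
    worker = FlakyWorker(fail_once_on)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        futures = [
            pool.submit(run_with_retries, worker.sum_partition, i, part)
            for i, part in enumerate(partitions)
        ]
        partial_sums = [future.result() for future in futures]
    # Combine the per-partition results into the final answer.
    return sum(partial_sums)
```

For example, `distributed_sum(list(range(1, 101)), 4, fail_once_on={1, 3})` returns 5050, the same result as a run with no failures, because the two failed partitions are retried rather than lost.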

Databricks Certified Associate Developer for Apache Spark — Question 180
