Databricks Certified Associate Developer for Apache Spark — Question 180
A data scientist is working with a massive dataset that exceeds the memory capacity of a single machine. The data scientist is considering using Apache Spark™ instead of processing the data using traditional single-machine programming languages like standard Python scripts.
Which two advantages does Apache Spark™ offer over a normal single-machine language in this scenario? (Choose two.)
Answer options
- A. It eliminates the need to write any code, automatically handling all data processing.
- B. It has built-in fault tolerance, allowing it to recover seamlessly from node failures during computation.
- C. It processes data solely on disk storage, reducing the need for memory resources.
- D. It can distribute data processing tasks across a cluster of machines, enabling horizontal scalability.
- E. It requires specialized hardware to run, making it unsuitable for commodity hardware clusters.
Correct answer: B, D
Explanation
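B and D are correct. Apache Spark™ splits a dataset into partitions and processes them in parallel across a cluster, so capacity grows by adding machines rather than by upgrading a single one (D), and it can recompute the work lost when a node fails instead of restarting the whole job (B). Option A is incorrect because Spark still requires code written against its APIs. Option C is incorrect because Spark is designed to process data in memory where possible and uses disk only as a fallback, for example when data does not fit in memory. Option E is incorrect because Spark runs on clusters of commodity hardware.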
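The two correct options can be illustrated with a toy sketch in plain Python. This is not Spark and uses none of its APIs: threads stand in for cluster nodes, and every name here (`split_into_partitions`, `FlakyWorker`, `run_with_retries`, `distributed_sum_of_squares`) is hypothetical. It only mimics the idea of splitting data into partitions (option D) and re-running a partition whose worker failed (option B).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into roughly equal contiguous chunks."""
    size, rem = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < rem else 0)
        parts.append(data[start:end])
        start = end
    return parts


class FlakyWorker:
    """Hypothetical worker that fails once on each chosen partition index."""

    def __init__(self, fail_once_on):
        self.pending_failures = set(fail_once_on)

    def sum_of_squares(self, index, partition):
        if index in self.pending_failures:
            # Simulate a node dying the first time it handles this partition.
            self.pending_failures.discard(index)
            raise RuntimeError(f"simulated node failure on partition {index}")
        return sum(x * x for x in partition)


def run_with_retries(worker, index, partition, max_attempts=3):
    """Recompute only the failed partition, not the whole job."""
    for attempt in range(1, max_attempts + 1):
        try:
            return worker.sum_of_squares(index, partition)
        except RuntimeError:
            if attempt == max_attempts:
                raise


def distributed_sum_of_squares(data, num_partitions=4, fail_once_on=(1,)):
    """Process partitions in parallel and combine the partial results."""
    partitions = split_into_partitions(data, num_partitions)
    worker = FlakyWorker(fail_once_on)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        futures = [
            pool.submit(run_with_retries, worker, i, part)
            for i, part in enumerate(partitions)
        ]
        partials = [f.result() for f in futures]
    return sum(partials)


if __name__ == "__main__":
    print(distributed_sum_of_squares(list(range(10))))
```

In this sketch the simulated failure on partition 1 affects only that partition: it is retried while the others complete normally, and the final result is unchanged. Spark applies the same principle at cluster scale, with real machines in place of threads.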