Databricks Certified Associate Developer for Apache Spark — Question 181
A data scientist at a large e-commerce company needs to process and analyze 2 TB of daily customer transaction data. The company wants to implement real-time fraud detection and personalized product recommendations. It currently processes this data with a traditional relational database system, which is struggling to keep up with the growing data volume and velocity.
Which feature of Apache Spark effectively addresses the challenge posed in this scenario?
Answer options
- A. Ability to process small datasets efficiently
- B. Support for SQL queries on structured data
- C. In-memory computation and parallel processing capabilities
- D. Built-in machine learning libraries
Correct answer: C
Explanation
The correct answer is C. Apache Spark keeps intermediate data in memory and distributes work in parallel across the nodes of a cluster, so it can process large volumes of data quickly, which the fraud detection and recommendation workloads require. Option A does not apply because the problem is a large dataset, not a small one. Option B describes something the existing relational database already provides, so it does not address the scalability problem. Option D is useful for building the fraud and recommendation models, but on its own it does not solve the problem of processing large volumes of data in real time.
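The idea behind option C can be illustrated with a minimal sketch in plain Python. It is a conceptual analogy, not Spark's API. The data is split into in-memory partitions, each partition is aggregated independently, and the partial results are merged. The transaction rows, the function names, and the use of worker threads in place of Spark executors are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical transactions as (customer_id, amount); not taken from the question.
transactions = [
    ("c1", 20.0), ("c2", 5.0), ("c1", 30.0),
    ("c3", 12.5), ("c2", 7.5), ("c3", 2.5),
]

def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into in-memory partitions."""
    return [rows[i::num_partitions] for i in range(num_partitions)]

def partial_totals(partition):
    """Aggregate one partition without looking at any other partition."""
    totals = Counter()
    for customer_id, amount in partition:
        totals[customer_id] += amount
    return totals

def total_per_customer(rows, num_partitions=3):
    """Sum amounts per customer by aggregating partitions, then merging."""
    partitions = split_into_partitions(rows, num_partitions)
    # Worker threads stand in for Spark executors in this analogy.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_totals, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # adds the per-partition sums together
    return dict(merged)

totals = total_per_customer(transactions)
print(totals)
```

Because each partition is aggregated independently, adding more workers (in Spark, more executors across more machines) lets the same logic scale to larger inputs. A single-node relational database cannot scale this way.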