A data scientist at a large e-commerce company needs to process and analyze 2 TB of daily…

Question

A data scientist at a large e-commerce company needs to process and analyze 2 TB of daily customer transaction data. The company wants to implement real-time fraud detection and personalized product recommendations. To process their data the company uses a traditional relational database system, which is struggling to handle the increasing data volume and velocity. Which feature of Apache Spark effectively addresses the challenge posed in this scenario?

Accepted Answer

Correct answer: C. C. In-memory computation and parallel processing capabilities — The correct answer is C because Apache Spark's in-memory computation and parallel processing features allow it to handle large volumes of data quickly and efficiently, which is essential for real-time processing needs. Options A and B are less relevant as they do not address the scalability issues faced by the company, while option D, while useful for machine learning, does not directly tackle the challenge of processing large datasets in real time.

Databricks Certified Associate Developer for Apache Spark — Question 181

Answer options

Correct answer: C

Explanation